Add ProxyConfig.DIRECT sentinel for direct-then-proxy escalation
Allow "direct" or None in proxy_config list to explicitly try without a proxy before escalating to proxy servers. The retry loop already handled None as direct — this exposes it as a clean user-facing API via ProxyConfig.DIRECT.
This commit is contained in:
@@ -347,6 +347,8 @@ class GeolocationConfig:
|
||||
return GeolocationConfig.from_dict(config_dict)
|
||||
|
||||
class ProxyConfig:
|
||||
DIRECT = "direct" # Sentinel: use in proxy_config list to mean "no proxy"
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
server: str,
|
||||
@@ -1498,7 +1500,9 @@ class CrawlerRunConfig():
|
||||
if isinstance(proxy_config, list):
|
||||
normalized = []
|
||||
for p in proxy_config:
|
||||
if isinstance(p, dict):
|
||||
if p is None or p == "direct":
|
||||
normalized.append(None)
|
||||
elif isinstance(p, dict):
|
||||
normalized.append(ProxyConfig.from_dict(p))
|
||||
elif isinstance(p, str):
|
||||
normalized.append(ProxyConfig.from_string(p))
|
||||
|
||||
@@ -21,7 +21,7 @@ All anti-bot retry options live on `CrawlerRunConfig`:
|
||||
|
||||
| Parameter | Type | Default | Description |
|
||||
|---|---|---|---|
|
||||
| `proxy_config` | `ProxyConfig`, `list[ProxyConfig]`, or `None` | `None` | Single proxy or ordered list of proxies to try. Each retry round iterates through the full list. |
|
||||
| `proxy_config` | `ProxyConfig`, `list[ProxyConfig]`, or `None` | `None` | Single proxy or ordered list of proxies to try. Each retry round iterates through the full list. Use `"direct"` or `ProxyConfig.DIRECT` in a list to explicitly try without a proxy. |
|
||||
| `max_retries` | `int` | `0` | Number of retry rounds when blocking is detected. `0` = no retries. |
|
||||
| `fallback_fetch_function` | `async (str) -> str` | `None` | Async function called as last resort. Takes URL, returns raw HTML. |
|
||||
|
||||
@@ -95,6 +95,31 @@ config = CrawlerRunConfig(
|
||||
)
|
||||
```
|
||||
|
||||
### Direct-First, Then Proxies
|
||||
|
||||
Try without a proxy first, then escalate to proxies if blocked. Use `ProxyConfig.DIRECT` (or the string `"direct"`) in the list to represent a no-proxy attempt.
|
||||
|
||||
```python
|
||||
config = CrawlerRunConfig(
|
||||
max_retries=1,
|
||||
proxy_config=[
|
||||
ProxyConfig.DIRECT, # Try without proxy first
|
||||
ProxyConfig(
|
||||
server="http://datacenter-proxy.example.com:8080",
|
||||
username="user",
|
||||
password="pass",
|
||||
),
|
||||
ProxyConfig(
|
||||
server="http://residential-proxy.example.com:9090",
|
||||
username="user",
|
||||
password="pass",
|
||||
),
|
||||
],
|
||||
)
|
||||
```
|
||||
|
||||
With this setup, each round tries direct first, then datacenter, then residential. With `max_retries=1`, worst case is 2 rounds x 3 steps = 6 attempts.
|
||||
|
||||
### Proxy List (Escalation)
|
||||
|
||||
Pass a list of proxies. They're tried in order — first one that works wins. Within each retry round, the entire list is tried again.
|
||||
|
||||
Reference in New Issue
Block a user