Unify proxy_config to accept list, add crawl_stats tracking

- proxy_config on CrawlerRunConfig now accepts a single ProxyConfig or
  a list of ProxyConfig objects tried in order (first success wins)
- Remove is_fallback from ProxyConfig and fallback_proxy_configs from
  CrawlerRunConfig — proxy escalation handled entirely by list order
- Add _get_proxy_list() normalizer for the retry loop
- Add CrawlResult.crawl_stats with attempts, retries, proxies_used,
  fallback_fetch_used, and resolved_by for billing and observability
- Set success=False with error_message when all attempts are blocked
- Simplify retry loop — no more is_fallback stashing logic
- Update docs and tests to reflect new API
unclecode
2026-02-14 07:53:46 +00:00
parent 72b546c48d
commit 875207287e
6 changed files with 141 additions and 115 deletions


@@ -276,12 +276,11 @@ class CrawlerRunConfig:
- See [Identity Based Crawling](../advanced/identity-based-crawling.md#7-locale-timezone-and-geolocation-control)
10. **Proxy Configuration**:
- **`proxy_config`**: Proxy server configuration (ProxyConfig object or dict) e.g. {"server": "...", "username": "...", "password"}. Set `is_fallback=True` to only use the proxy when anti-bot blocking is detected.
- **`proxy_config`**: Single `ProxyConfig` or `list[ProxyConfig]` — proxies tried in order. Pass a list for automatic escalation.
- **`proxy_rotation_strategy`**: Strategy for rotating proxies during crawls
11. **Anti-Bot Retry & Fallback** (see [Anti-Bot & Fallback](../advanced/anti-bot-and-fallback.md)):
- **`max_retries`**: Number of retry rounds when blocking is detected (default: 0)
- **`fallback_proxy_configs`**: List of fallback proxies tried in order within each retry round
- **`max_retries`**: Number of retry rounds when blocking is detected (default: 0). Each round tries all proxies in `proxy_config`.
- **`fallback_fetch_function`**: Async function called as last resort — takes URL, returns raw HTML
12. **Page Interaction Parameters**:
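The retry semantics described above — `max_retries` rounds, each round trying every proxy in `proxy_config` in order, with `fallback_fetch_function` as a last resort and `success=False` when everything is blocked — can be sketched as a standalone loop. The stats field names come from the commit message; the control flow and return shape are assumptions for illustration, not crawl4ai's actual implementation:

```python
import asyncio
from typing import Awaitable, Callable, Optional

async def crawl_with_retries(
    url: str,
    fetch: Callable[[str, Optional[str]], Awaitable[Optional[str]]],
    proxies: "list[str]",                 # normalized proxy list (may be empty)
    max_retries: int = 0,                 # extra rounds after the first attempt
    fallback_fetch: Optional[Callable[[str], Awaitable[str]]] = None,
):
    # crawl_stats fields named in the commit: attempts, retries,
    # proxies_used, fallback_fetch_used, resolved_by.
    stats = {"attempts": 0, "retries": 0, "proxies_used": [],
             "fallback_fetch_used": False, "resolved_by": None}
    candidates = proxies or [None]        # no proxy configured: one direct attempt
    for rnd in range(1 + max_retries):
        if rnd > 0:
            stats["retries"] += 1
        for proxy in candidates:          # escalation is purely list order
            stats["attempts"] += 1
            if proxy is not None and proxy not in stats["proxies_used"]:
                stats["proxies_used"].append(proxy)
            html = await fetch(url, proxy)
            if html is not None:          # fetch not blocked by anti-bot
                stats["resolved_by"] = proxy or "direct"
                return html, True, stats
    if fallback_fetch is not None:        # last resort: raw-HTML fetcher
        stats["fallback_fetch_used"] = True
        html = await fallback_fetch(url)
        stats["resolved_by"] = "fallback_fetch"
        return html, True, stats
    # All attempts blocked: success=False with an error message
    return None, False, stats
```

Note the simplification the commit describes: with escalation driven entirely by list order, there is no `is_fallback` flag to stash and restore between rounds.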