Fix proxy escalation: don't re-raise on first proxy exception when chain has alternatives

When proxy_config is a list (an escalation chain) and the first proxy raises
an exception (timeout, connection error, browser crash), the retry loop
now moves on to the next proxy instead of re-raising immediately.
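The new control flow can be modelled outside crawl4ai with a small, self-contained sketch. Everything in it is hypothetical: `crawl_with_escalation`, `fetch`, and `ProxyError` are illustrative names, not crawl4ai APIs, and the real loop in `AsyncWebCrawler` also records stats, block reasons, and log output. The sketch only shows the ordering: each exception is recorded, the next proxy is tried, and the exception is re-raised only for a single proxy with a single attempt.

```python
from typing import Callable, List, Optional, Tuple


class ProxyError(Exception):
    """Hypothetical stand-in for a timeout, connection error, or browser crash."""


def crawl_with_escalation(
    proxies: List[Optional[str]],
    max_attempts: int,
    fetch: Callable[[Optional[str]], str],
) -> Tuple[Optional[str], List[Optional[str]], Optional[str]]:
    """Try every proxy in order, for up to max_attempts rounds.

    Returns (result, proxies_tried, last_error). A proxy that raises is
    recorded and skipped; the exception is re-raised only when there is a
    single proxy and a single attempt.
    """
    tried: List[Optional[str]] = []
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        for proxy in proxies:
            tried.append(proxy)
            try:
                return fetch(proxy), tried, None
            except Exception as err:
                last_error = str(err)
                # Non-chain crawl: surface the real error to the caller.
                if len(proxies) <= 1 and max_attempts <= 1:
                    raise
                # Otherwise fall through to the next proxy (or next round).
    return None, tried, last_error


# Usage: the first-tier proxy times out, the ISP proxy succeeds.
def fake_fetch(proxy: Optional[str]) -> str:
    if proxy == "http://datacenter:8080":
        raise ProxyError("timeout")
    return f"<html>via {proxy}</html>"


result, tried, _ = crawl_with_escalation(
    ["http://datacenter:8080", "http://isp:8080", "http://residential:8080"],
    max_attempts=1,
    fetch=fake_fetch,
)
print(result)  # <html>via http://isp:8080</html>
print(tried)   # ['http://datacenter:8080', 'http://isp:8080']
```

Returning the list of tried proxies makes it easy to check that the fallback tiers were actually reached, rather than only that the crawl eventually succeeded.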

Previously, an exception raised on the first proxy of the first attempt
(_p_idx == 0 and _attempt == 0) was always re-raised. That aborted the
whole escalation chain, so the ISP, residential, and other fallback proxies
were never tried. This made the proxy list effectively useless for sites
where the first-tier proxy fails with an exception rather than returning a
blocked response.
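For contrast, a hypothetical model of the pre-fix guard shows why the chain never advanced: the very first exception escapes before any fallback proxy is called, even when a second attempt is configured. `crawl_old_guard` and `flaky_fetch` are illustrative names, not crawl4ai code.

```python
from typing import Callable, List, Optional


def crawl_old_guard(
    proxies: List[Optional[str]],
    max_attempts: int,
    fetch: Callable[[Optional[str]], str],
) -> Optional[str]:
    """Hypothetical model of the pre-fix guard around the proxy loop."""
    for attempt in range(max_attempts):
        for p_idx, proxy in enumerate(proxies):
            try:
                return fetch(proxy)
            except Exception:
                if p_idx > 0 or attempt > 0:
                    continue  # later proxies/attempts were allowed to fail over
                raise  # p_idx == 0 and attempt == 0: the whole chain stops here
    return None


calls: List[Optional[str]] = []


def flaky_fetch(proxy: Optional[str]) -> str:
    calls.append(proxy)
    if proxy == "http://datacenter:8080":
        raise TimeoutError("proxy timed out")
    return f"<html>via {proxy}</html>"


raised: Optional[str] = None
try:
    crawl_old_guard(["http://datacenter:8080", "http://isp:8080"], 2, flaky_fetch)
except TimeoutError as err:
    raised = str(err)

print(raised)  # proxy timed out
print(calls)   # ['http://datacenter:8080']
```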

The raise is preserved only when there is a single proxy and a single
attempt (len(_proxy_list) <= 1 and _max_attempts <= 1), so simple non-chain
crawls still get immediate error propagation.
unclecode
2026-02-15 09:55:55 +00:00
parent d028a889d0
commit 45d8e1450f

@@ -513,15 +513,17 @@ class AsyncWebCrawler:
                         "blocked": True,
                         "reason": str(_crawl_err),
                     })
-                    if _p_idx > 0 or _attempt > 0:
-                        self.logger.error_status(
-                            url=url,
-                            error=f"Proxy {_proxy.server if _proxy else 'direct'} failed: {_crawl_err}",
-                            tag="ANTIBOT",
-                        )
-                        _block_reason = str(_crawl_err)
-                    else:
-                        raise  # First attempt on first proxy propagates normally
+                    self.logger.error_status(
+                        url=url,
+                        error=f"Proxy {_proxy.server if _proxy else 'direct'} failed: {_crawl_err}",
+                        tag="ANTIBOT",
+                    )
+                    _block_reason = str(_crawl_err)
+                    # If this is the only proxy and only attempt, re-raise
+                    # so the caller gets the real error (not a silent swallow).
+                    # But if there are more proxies or retries to try, continue.
+                    if len(_proxy_list) <= 1 and _max_attempts <= 1:
+                        raise
 
         # Restore original proxy_config
         config.proxy_config = _original_proxy_config