Fix proxy escalation: don't re-raise on first proxy exception when chain has alternatives
When proxy_config is a list (an escalation chain) and the first proxy raises an exception (timeout, connection error, browser crash), the retry loop now continues to the next proxy instead of immediately re-raising. Previously, an exception with _p_idx == 0 and _attempt == 0 was always re-raised, which broke the entire escalation chain: the ISP, residential, and fallback proxies were never tried. This made the proxy list effectively useless for sites where the first-tier proxy fails with an exception rather than a blocked response. The raise is preserved when there is only a single proxy and a single attempt (len(_proxy_list) <= 1 and _max_attempts <= 1), so simple non-chain crawls still get immediate error propagation.
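The control flow described above can be sketched in isolation. This is a minimal illustration, not the crawler's actual code: the function `crawl_with_escalation`, the `fetch` callable, the loop order (attempts outside, proxies inside), and the final `RuntimeError` are all assumptions made for the example. Only the re-raise condition mirrors the commit.

```python
from typing import Callable, Optional, Sequence


def crawl_with_escalation(
    proxy_list: Sequence[Optional[str]],
    max_attempts: int,
    fetch: Callable[[Optional[str]], str],
) -> str:
    """Try each proxy in order, for up to max_attempts rounds (hypothetical helper)."""
    block_reason = ""
    for _attempt in range(max_attempts):
        for proxy in proxy_list:
            try:
                return fetch(proxy)
            except Exception as err:
                block_reason = str(err)
                # Only one proxy and only one attempt: nothing is left to try,
                # so the caller gets the real error instead of a swallowed one.
                if len(proxy_list) <= 1 and max_attempts <= 1:
                    raise
                # Otherwise fall through to the next proxy or the next attempt.
    # Assumption: how the real crawler reports an exhausted chain is not shown
    # in this commit; raising here just keeps the sketch simple.
    raise RuntimeError(f"all proxies failed: {block_reason}")


def flaky_fetch(proxy: Optional[str]) -> str:
    """Stand-in fetcher: the first-tier proxy always times out."""
    if proxy == "datacenter":
        raise TimeoutError("timed out")
    return f"ok via {proxy}"


result = crawl_with_escalation(["datacenter", "residential"], 1, flaky_fetch)
```

With the old condition, the `TimeoutError` from the first proxy would have propagated immediately and the second proxy would never have been tried.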
@@ -513,15 +513,17 @@ class AsyncWebCrawler:
             "blocked": True,
             "reason": str(_crawl_err),
         })
-        if _p_idx > 0 or _attempt > 0:
-            self.logger.error_status(
-                url=url,
-                error=f"Proxy {_proxy.server if _proxy else 'direct'} failed: {_crawl_err}",
-                tag="ANTIBOT",
-            )
-            _block_reason = str(_crawl_err)
-        else:
-            raise  # First attempt on first proxy propagates normally
+        self.logger.error_status(
+            url=url,
+            error=f"Proxy {_proxy.server if _proxy else 'direct'} failed: {_crawl_err}",
+            tag="ANTIBOT",
+        )
+        _block_reason = str(_crawl_err)
+        # If this is the only proxy and only attempt, re-raise
+        # so the caller gets the real error (not a silent swallow).
+        # But if there are more proxies or retries to try, continue.
+        if len(_proxy_list) <= 1 and _max_attempts <= 1:
+            raise
 
 # Restore original proxy_config
 config.proxy_config = _original_proxy_config