Fix proxy escalation: don't re-raise on first proxy exception when chain has alternatives
When proxy_config is a list (an escalation chain) and the first proxy raises an exception (timeout, connection error, browser crash), the retry loop now continues to the next proxy instead of immediately re-raising.

Previously, an exception on the first attempt of the first proxy (_p_idx == 0 and _attempt == 0) was always re-raised, which broke the entire escalation chain: the ISP, residential, and fallback proxies were never tried. This made the proxy list effectively useless for sites where the first-tier proxy fails with an exception rather than a blocked response.

The raise is preserved when there is only a single proxy and a single attempt (len(_proxy_list) <= 1 and _max_attempts <= 1), so simple non-chain crawls still get immediate error propagation.
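The control flow described above can be sketched in isolation. This is a simplified illustration, not the code in AsyncWebCrawler: the function `crawl_with_escalation`, the `fetch` callback, and the `flaky_fetch` helper are hypothetical names, proxies are plain strings rather than proxy config objects, and the final `RuntimeError` after the chain is exhausted is an assumption of this sketch rather than something the commit shows.

```python
from typing import Callable, Optional, Sequence


def crawl_with_escalation(
    proxy_list: Sequence[Optional[str]],
    max_attempts: int,
    fetch: Callable[[Optional[str]], str],
) -> str:
    """Try each proxy in order, moving on when one raises (hypothetical helper)."""
    block_reason = ""
    for proxy in proxy_list:
        for _attempt in range(max_attempts):
            try:
                return fetch(proxy)
            except Exception as crawl_err:
                block_reason = f"Proxy {proxy or 'direct'} failed: {crawl_err}"
                # Only one proxy and only one attempt: nothing is left to try,
                # so let the real error propagate to the caller.
                if len(proxy_list) <= 1 and max_attempts <= 1:
                    raise
                # Otherwise fall through to the next attempt or the next proxy.
    # Assumption of this sketch: report exhaustion with a generic error.
    raise RuntimeError(f"All proxies failed; last reason: {block_reason}")


def flaky_fetch(proxy: Optional[str]) -> str:
    """Hypothetical fetcher: the first-tier proxy always times out."""
    if proxy == "datacenter":
        raise TimeoutError("timed out")
    return f"ok via {proxy}"


print(crawl_with_escalation(["datacenter", "residential"], 1, flaky_fetch))
# prints "ok via residential"
```

With the old condition (`_p_idx > 0 or _attempt > 0` guarding the continue path), the first `TimeoutError` in this example would have been re-raised and the residential proxy would never have been tried.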
@@ -513,15 +513,17 @@ class AsyncWebCrawler:
                 "blocked": True,
                 "reason": str(_crawl_err),
             })
-            if _p_idx > 0 or _attempt > 0:
-                self.logger.error_status(
-                    url=url,
-                    error=f"Proxy {_proxy.server if _proxy else 'direct'} failed: {_crawl_err}",
-                    tag="ANTIBOT",
-                )
-                _block_reason = str(_crawl_err)
-            else:
-                raise  # First attempt on first proxy propagates normally
+            self.logger.error_status(
+                url=url,
+                error=f"Proxy {_proxy.server if _proxy else 'direct'} failed: {_crawl_err}",
+                tag="ANTIBOT",
+            )
+            _block_reason = str(_crawl_err)
+            # If this is the only proxy and only attempt, re-raise
+            # so the caller gets the real error (not a silent swallow).
+            # But if there are more proxies or retries to try, continue.
+            if len(_proxy_list) <= 1 and _max_attempts <= 1:
+                raise
 
     # Restore original proxy_config
     config.proxy_config = _original_proxy_config