Fix proxy escalation: don't re-raise on first proxy exception when chain has alternatives

When proxy_config is a list (escalation chain) and the first proxy throws
an exception (timeout, connection error, browser crash), the retry loop
now continues to the next proxy instead of immediately re-raising.

Previously, any exception raised while _p_idx == 0 and _attempt == 0 was
re-raised unconditionally, which stopped the escalation chain at the first
proxy: the ISP/Residential/fallback proxies were never tried. This made
the proxy list effectively useless for sites where the first-tier proxy
fails with an exception rather than a blocked response.

The raise is preserved when there is only a single proxy and a single
attempt (len(_proxy_list) <= 1 and _max_attempts <= 1), so that simple
non-chain crawls still get immediate error propagation.
unclecode
2026-02-15 09:55:55 +00:00
parent d028a889d0
commit 45d8e1450f

@@ -513,15 +513,17 @@ class AsyncWebCrawler:
         "blocked": True,
         "reason": str(_crawl_err),
     })
-    if _p_idx > 0 or _attempt > 0:
-        self.logger.error_status(
-            url=url,
-            error=f"Proxy {_proxy.server if _proxy else 'direct'} failed: {_crawl_err}",
-            tag="ANTIBOT",
-        )
-        _block_reason = str(_crawl_err)
-    else:
-        raise  # First attempt on first proxy propagates normally
+    self.logger.error_status(
+        url=url,
+        error=f"Proxy {_proxy.server if _proxy else 'direct'} failed: {_crawl_err}",
+        tag="ANTIBOT",
+    )
+    _block_reason = str(_crawl_err)
+    # If this is the only proxy and only attempt, re-raise
+    # so the caller gets the real error (not a silent swallow).
+    # But if there are more proxies or retries to try, continue.
+    if len(_proxy_list) <= 1 and _max_attempts <= 1:
+        raise

 # Restore original proxy_config
 config.proxy_config = _original_proxy_config
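
The control flow described in the commit message can be illustrated with a small standalone sketch. This is not the AsyncWebCrawler code: the function name crawl_with_escalation, the fetch callable, the nesting order of attempts and proxies, and the final RuntimeError are all assumptions made for illustration. Only the re-raise condition (a single proxy and a single attempt) is taken from the diff.

```python
from typing import Callable, Optional, Sequence


def crawl_with_escalation(
    fetch: Callable[[Optional[str]], str],   # hypothetical fetcher, not a crawl4ai API
    proxy_list: Sequence[Optional[str]],
    max_attempts: int = 1,
) -> str:
    """Try each proxy in order and return the first successful result."""
    last_error: Optional[Exception] = None
    # Assumption: attempts form the outer loop and proxies the inner loop.
    for _attempt in range(max_attempts):
        for _p_idx, proxy in enumerate(proxy_list):
            try:
                return fetch(proxy)
            except Exception as err:
                last_error = err
                # Only proxy and only attempt: nothing else to try,
                # so let the caller see the real error immediately.
                if len(proxy_list) <= 1 and max_attempts <= 1:
                    raise
                # Otherwise remember the failure and escalate.
    # Assumption: the sketch raises once every option is exhausted;
    # the real crawler may report a blocked result instead.
    raise RuntimeError(f"all proxies failed: {last_error}")
```

With a chain such as ["datacenter", "residential"], an exception from the first entry no longer ends the crawl, and the second entry is tried. With a single-entry list and max_attempts left at 1, the original exception still propagates unchanged.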