# Undetected Browser Mode

## Overview
Crawl4AI offers two powerful features to help you access websites protected by bot detection:

- **Stealth Mode** - Uses playwright-stealth to modify browser fingerprints and behaviors
- **Undetected Browser Mode** - An advanced browser adapter with deep-level patches for evading sophisticated bot detection
This guide covers both features and helps you choose the right approach for your needs.
## Anti-Bot Features Comparison
| Feature | Regular Browser | Stealth Mode | Undetected Browser |
|---|---|---|---|
| WebDriver Detection | ❌ | ✅ | ✅ |
| Navigator Properties | ❌ | ✅ | ✅ |
| Plugin Emulation | ❌ | ✅ | ✅ |
| CDP Detection | ❌ | Partial | ✅ |
| Deep Browser Patches | ❌ | ❌ | ✅ |
| Performance Impact | None | Minimal | Moderate |
| Setup Complexity | None | None | Minimal |
## When to Use Each Approach

### Use Regular Browser + Stealth Mode When:
- Sites have basic bot detection (checking navigator.webdriver, plugins, etc.)
- You need good performance with basic protection
- Sites check for common automation indicators
### Use Undetected Browser When:
- Sites employ sophisticated bot detection services (Cloudflare, DataDome, etc.)
- Stealth mode alone isn't sufficient
- You're willing to trade some performance for better evasion
### Best Practice: Progressive Enhancement

1. Start with: Regular browser + Stealth mode
2. If blocked: Switch to Undetected browser
3. If still blocked: Combine Undetected browser + Stealth mode
## Stealth Mode
Stealth mode is the simpler anti-bot solution that works with both regular and undetected browsers:
```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig

async def main():
    # Enable stealth mode with regular browser
    browser_config = BrowserConfig(
        enable_stealth=True,  # Simple flag to enable
        headless=False        # Better for avoiding detection
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun("https://example.com")

asyncio.run(main())
```
### What Stealth Mode Does:

- Removes the `navigator.webdriver` flag
- Modifies browser fingerprints
- Emulates realistic plugin behavior
- Adjusts navigator properties
- Fixes common automation leaks
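To spot-check these patches yourself, you can inject a bit of JavaScript and read the result back from the crawled page. The snippet below is a minimal sketch, not part of the stealth API; writing `navigator.webdriver` into the page title is just one convenient way to surface the value in `result.html`:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def check_webdriver_flag():
    browser_config = BrowserConfig(enable_stealth=True, headless=False)
    # Expose navigator.webdriver in the page title so it appears in result.html
    run_config = CrawlerRunConfig(
        js_code="document.title = 'webdriver=' + String(navigator.webdriver);"
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun("https://example.com", config=run_config)
        # With stealth enabled, this should print False
        print("webdriver=true" in result.html)

asyncio.run(check_webdriver_flag())
```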
## Undetected Browser Mode
For sites with sophisticated bot detection that stealth mode can't bypass, use the undetected browser adapter:
### Key Features

- **Drop-in Replacement**: Uses the same API as regular browser mode
- **Enhanced Stealth**: Built-in patches to evade common detection methods
- **Browser Adapter Pattern**: Seamlessly switch between regular and undetected modes
- **Automatic Installation**: `crawl4ai-setup` installs all necessary browser dependencies
### Quick Start
```python
import asyncio
from crawl4ai import (
    AsyncWebCrawler,
    BrowserConfig,
    CrawlerRunConfig,
    UndetectedAdapter
)
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

async def main():
    # Create the undetected adapter
    undetected_adapter = UndetectedAdapter()

    # Create browser config
    browser_config = BrowserConfig(
        headless=False,  # Headless mode is easier to detect
        verbose=True,
    )

    # Create the crawler strategy with the undetected adapter
    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
        browser_config=browser_config,
        browser_adapter=undetected_adapter
    )

    # Create the crawler with our custom strategy
    async with AsyncWebCrawler(
        crawler_strategy=crawler_strategy,
        config=browser_config
    ) as crawler:
        # Your crawling code here
        result = await crawler.arun(
            url="https://example.com",
            config=CrawlerRunConfig()
        )
        print(result.markdown[:500])

asyncio.run(main())
```
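Note the division of labor: the adapter is passed to the crawler strategy, not to the crawler itself, so an existing pipeline only changes at the point where the strategy is constructed. Everything downstream (the `arun()` call, `CrawlerRunConfig`, result handling) works exactly as it does in regular mode.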
## Combining Both Features
For maximum evasion, combine stealth mode with undetected browser:
```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, UndetectedAdapter
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

async def main():
    # Create browser config with stealth enabled
    browser_config = BrowserConfig(
        enable_stealth=True,  # Enable stealth mode
        headless=False
    )

    # Create undetected adapter
    adapter = UndetectedAdapter()

    # Create strategy with both features
    strategy = AsyncPlaywrightCrawlerStrategy(
        browser_config=browser_config,
        browser_adapter=adapter
    )

    async with AsyncWebCrawler(
        crawler_strategy=strategy,
        config=browser_config
    ) as crawler:
        result = await crawler.arun("https://protected-site.com")

asyncio.run(main())
```
## Examples

### Example 1: Basic Stealth Mode
```python
import asyncio
import base64
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def test_stealth_mode():
    # Simple stealth mode configuration
    browser_config = BrowserConfig(
        enable_stealth=True,
        headless=False
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://bot.sannysoft.com",
            config=CrawlerRunConfig(screenshot=True)
        )

        if result.success:
            print("✓ Successfully accessed bot detection test site")
            # Save screenshot to verify detection results
            if result.screenshot:
                with open("stealth_test.png", "wb") as f:
                    f.write(base64.b64decode(result.screenshot))
                print("✓ Screenshot saved - check for green (passed) tests")

asyncio.run(test_stealth_mode())
```
### Example 2: Undetected Browser Mode
```python
import asyncio
from crawl4ai import (
    AsyncWebCrawler,
    BrowserConfig,
    CrawlerRunConfig,
    CrawlResult,
    DefaultMarkdownGenerator,
    PruningContentFilter,
    UndetectedAdapter
)
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

async def main():
    # Create browser config
    browser_config = BrowserConfig(
        headless=False,
        verbose=True,
    )

    # Create the undetected adapter
    undetected_adapter = UndetectedAdapter()

    # Create the crawler strategy with the undetected adapter
    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
        browser_config=browser_config,
        browser_adapter=undetected_adapter
    )

    # Create the crawler with our custom strategy
    async with AsyncWebCrawler(
        crawler_strategy=crawler_strategy,
        config=browser_config
    ) as crawler:
        # Configure the crawl
        crawler_config = CrawlerRunConfig(
            markdown_generator=DefaultMarkdownGenerator(
                content_filter=PruningContentFilter()
            ),
            capture_console_messages=True,  # Test adapter console capture
        )

        # Test on a site that typically detects bots
        print("Testing undetected adapter...")
        result: CrawlResult = await crawler.arun(
            url="https://www.helloworld.org",
            config=crawler_config
        )

        print(f"Status: {result.status_code}")
        print(f"Success: {result.success}")
        print(f"Console messages captured: {len(result.console_messages or [])}")
        print(f"Markdown content (first 500 chars):\n{result.markdown.raw_markdown[:500]}")

if __name__ == "__main__":
    asyncio.run(main())
```
## Browser Adapter Pattern
The undetected browser support is implemented using an adapter pattern, allowing seamless switching between different browser implementations:
```python
# Regular browser adapter (default)
from crawl4ai import PlaywrightAdapter
regular_adapter = PlaywrightAdapter()

# Undetected browser adapter
from crawl4ai import UndetectedAdapter
undetected_adapter = UndetectedAdapter()
```
The adapter handles:
- JavaScript execution
- Console message capture
- Error handling
- Browser-specific optimizations
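Because both adapters expose the same interface, choosing between them can be reduced to a single flag. The helper below is a convenience sketch built only from the classes shown above; `make_strategy` is our name, not part of the crawl4ai API:

```python
from crawl4ai import BrowserConfig, PlaywrightAdapter, UndetectedAdapter
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

def make_strategy(browser_config: BrowserConfig,
                  undetected: bool = False) -> AsyncPlaywrightCrawlerStrategy:
    """Build a crawler strategy with either the regular or undetected adapter."""
    adapter = UndetectedAdapter() if undetected else PlaywrightAdapter()
    return AsyncPlaywrightCrawlerStrategy(
        browser_config=browser_config,
        browser_adapter=adapter,
    )
```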
## Best Practices

1. **Avoid Headless Mode**: Detection is easier in headless mode.

   ```python
   browser_config = BrowserConfig(headless=False)
   ```

2. **Use Reasonable Delays**: Don't rush through pages.

   ```python
   crawler_config = CrawlerRunConfig(
       wait_time=3.0,  # Wait 3 seconds after page load
       delay_before_return_html=2.0  # Additional delay
   )
   ```

3. **Rotate User Agents**: You can customize user agents.

   ```python
   browser_config = BrowserConfig(
       headers={"User-Agent": "your-user-agent"}
   )
   ```

4. **Handle Failures Gracefully**: Some sites may still detect and block.

   ```python
   if not result.success:
       print(f"Crawl failed: {result.error_message}")
   ```
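Putting these practices together, a hardened baseline might look like the sketch below. The parameter names come from the snippets above; the user-agent value is a placeholder to replace with a realistic string:

```python
from crawl4ai import BrowserConfig, CrawlerRunConfig

browser_config = BrowserConfig(
    enable_stealth=True,
    headless=False,
    headers={"User-Agent": "your-user-agent"},  # placeholder - substitute a real UA
)
crawler_config = CrawlerRunConfig(
    wait_time=3.0,                 # Wait 3 seconds after page load
    delay_before_return_html=2.0,  # Additional delay before capturing HTML
)
```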
## Advanced Usage Tips

### Progressive Detection Handling
```python
from crawl4ai import AsyncWebCrawler, BrowserConfig, UndetectedAdapter
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

async def crawl_with_progressive_evasion(url):
    # Step 1: Try regular browser with stealth
    browser_config = BrowserConfig(
        enable_stealth=True,
        headless=False
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url)
        if result.success and "Access Denied" not in result.html:
            return result

    # Step 2: If blocked, try undetected browser
    print("Regular + stealth blocked, trying undetected browser...")
    adapter = UndetectedAdapter()
    strategy = AsyncPlaywrightCrawlerStrategy(
        browser_config=browser_config,
        browser_adapter=adapter
    )

    async with AsyncWebCrawler(
        crawler_strategy=strategy,
        config=browser_config
    ) as crawler:
        result = await crawler.arun(url)
        return result
```
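For completeness, a call site for this helper might look like the following (the URL is illustrative):

```python
import asyncio

result = asyncio.run(crawl_with_progressive_evasion("https://protected-site.com"))
print(result.success)
```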
## Installation

The undetected browser dependencies are automatically installed when you run:

```bash
crawl4ai-setup
```
This command installs all necessary browser dependencies for both regular and undetected modes.
## Limitations
- Performance: Slightly slower than regular mode due to additional patches
- Headless Detection: Some sites can still detect headless mode
- Resource Usage: May use more resources than regular mode
- Not 100% Guaranteed: Advanced anti-bot services are constantly evolving
## Troubleshooting

### Browser Not Found

Run the setup command:

```bash
crawl4ai-setup
```
### Detection Still Occurring
Try combining with other features:
```python
crawler_config = CrawlerRunConfig(
    simulate_user=True,  # Add user simulation
    magic=True,          # Enable magic mode
    wait_time=5.0,       # Longer waits
)
```
### Performance Issues
If experiencing slow performance:
```python
# Use undetected mode selectively, only for protected sites
if is_protected_site(url):  # is_protected_site() is your own heuristic
    adapter = UndetectedAdapter()
else:
    adapter = PlaywrightAdapter()  # Default adapter
```
## Future Plans
Note: In future versions of Crawl4AI, we may enable stealth mode and undetected browser by default to provide better out-of-the-box success rates. For now, users should explicitly enable these features when needed.
## Conclusion
Crawl4AI provides flexible anti-bot solutions:
- Start Simple: Use regular browser + stealth mode for most sites
- Escalate if Needed: Switch to undetected browser for sophisticated protection
- Combine for Maximum Effect: Use both features together when facing the toughest challenges
Remember:
- Always respect robots.txt and website terms of service
- Use appropriate delays to avoid overwhelming servers
- Consider the performance trade-offs of each approach
- Test progressively to find the minimum necessary evasion level
## See Also
- Advanced Features - Overview of all advanced features
- Proxy & Security - Using proxies with anti-bot features
- Session Management - Maintaining sessions across requests
- Identity Based Crawling - Additional anti-detection strategies
- Anti-Bot Detection & Fallback - Automatic retry and proxy escalation when blocking is detected