Add comprehensive tests for anti-bot strategies and extended features

- Implemented `test_adapter_verification.py` to verify correct usage of browser adapters.
- Created `test_all_features.py` for a comprehensive suite covering URL seeding, adaptive crawling, browser adapters, proxy rotation, and dispatchers.
- Developed `test_anti_bot_strategy.py` to validate the functionality of various anti-bot strategies.
- Added `test_antibot_simple.py` for simple testing of anti-bot strategies using async web crawling.
- Introduced `test_bot_detection.py` to assess adapter performance against bot detection mechanisms.
- Compiled `test_final_summary.py` to provide a detailed summary of all tests and their results.
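At its core, an adapter-verification test like `test_adapter_verification.py` reduces to checking that the adapter chosen at construction time is the one the strategy ends up holding. A minimal sketch of that idea, using hypothetical stand-in classes rather than the repository's actual test code (the real suite exercises crawl4ai's `AsyncPlaywrightCrawlerStrategy` and adapter classes):

```python
class PlaywrightAdapter:
    """Stand-in for the default browser adapter."""
    name = "playwright"

class UndetectedAdapter:
    """Stand-in for an anti-bot adapter."""
    name = "undetected"

class CrawlerStrategy:
    """Stand-in for AsyncPlaywrightCrawlerStrategy: falls back to the
    plain Playwright adapter when none is supplied."""
    def __init__(self, browser_adapter=None):
        self.browser_adapter = browser_adapter or PlaywrightAdapter()

def test_default_adapter():
    # No adapter given: the strategy should default to Playwright.
    assert CrawlerStrategy().browser_adapter.name == "playwright"

def test_explicit_adapter():
    # An explicit adapter must not be silently replaced by the default.
    strategy = CrawlerStrategy(browser_adapter=UndetectedAdapter())
    assert strategy.browser_adapter.name == "undetected"
```

The second assertion is the one that catches the bug class this commit targets: an adapter passed in by the caller being dropped during crawler initialization.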
AHMET YILMAZ
2025-10-07 18:51:13 +08:00
parent f00e8cbf35
commit 201843a204
23 changed files with 5265 additions and 96 deletions


```diff
@@ -56,14 +56,23 @@ async def get_crawler(
     if psutil.virtual_memory().percent >= MEM_LIMIT:
         raise MemoryError("RAM pressure new browser denied")
-    # Create strategy with the specified adapter
-    strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=cfg, browser_adapter=adapter or PlaywrightAdapter()
-    )
     # Create crawler - let it initialize the strategy with proper logger
+    # Pass browser_adapter as a kwarg so AsyncWebCrawler can use it when creating the strategy
     crawler = AsyncWebCrawler(
-        config=cfg, crawler_strategy=strategy, thread_safe=False
+        config=cfg,
+        thread_safe=False
     )
+    # Set the browser adapter on the strategy after crawler initialization
+    if adapter:
+        # Create a new strategy with the adapter and the crawler's logger
+        from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
+        crawler.crawler_strategy = AsyncPlaywrightCrawlerStrategy(
+            browser_config=cfg,
+            logger=crawler.logger,
+            browser_adapter=adapter
+        )
     await crawler.start()
     POOL[sig] = crawler
     LAST_USED[sig] = time.time()
```
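Stripped of crawl4ai specifics, the hunk implements a crawler pool with a memory guard: reuse an instance when the config signature matches, and refuse to grow the pool when RAM pressure is too high. A self-contained sketch of that pattern, where `FakeCrawler` and `memory_percent` are hypothetical stand-ins for `AsyncWebCrawler` and `psutil.virtual_memory().percent`:

```python
import time

MEM_LIMIT = 90.0   # percent of RAM above which no new browser is created
POOL = {}          # config signature -> crawler instance
LAST_USED = {}     # config signature -> last-checkout timestamp

def memory_percent():
    # Stand-in for psutil.virtual_memory().percent; fixed for illustration.
    return 40.0

class FakeCrawler:
    """Minimal stand-in for AsyncWebCrawler."""
    def __init__(self, sig):
        self.sig = sig
        self.started = False
    def start(self):
        self.started = True

def get_crawler(sig):
    # Reuse a pooled crawler when the signature matches.
    if sig in POOL:
        LAST_USED[sig] = time.time()
        return POOL[sig]
    # Refuse to grow the pool under memory pressure, as in the hunk above.
    if memory_percent() >= MEM_LIMIT:
        raise MemoryError("RAM pressure new browser denied")
    crawler = FakeCrawler(sig)
    crawler.start()
    POOL[sig] = crawler
    LAST_USED[sig] = time.time()
    return crawler
```

Keying the pool on a config signature is what makes reuse safe: two callers with identical browser configs share one instance, while a different config (or adapter) produces a distinct signature and its own crawler.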