crawl4ai/tests/test_llm_extraction_parallel_issue_1055.py at f6f7f1b5516e47b85e850183636132c589311419

Files

ntohidi a30548a98f This commit resolves issue #1055 where LLM extraction was blocking async

execution, causing URLs to be processed sequentially instead of in parallel.

  Changes:
  - Added aperform_completion_with_backoff() using litellm.acompletion for async LLM calls
  - Implemented arun() method in ExtractionStrategy base class with thread pool fallback
  - Created async arun() and aextract() methods in LLMExtractionStrategy using asyncio.gather
  - Updated AsyncWebCrawler.arun() to detect and use arun() when available
  - Added comprehensive test suite to verify parallel execution

  Impact:
  - LLM extraction now runs truly in parallel across multiple URLs
  - Significant performance improvement for multi-URL crawls with LLM strategies
  - Backward compatible - existing extraction strategies continue to work
  - No breaking changes to public API

  Technical details:
  - Uses litellm.acompletion for non-blocking LLM calls
  - Leverages asyncio.gather for concurrent chunk processing
  - Maintains backward compatibility via asyncio.to_thread fallback
  - Works seamlessly with MemoryAdaptiveDispatcher and other dispatchers

2025-11-06 11:22:45 +01:00

6.7 KiB

Raw Blame History

View Raw

6.7 KiB Raw Blame History

6.7 KiB

Raw Blame History