crawl4ai/tests at fix/dfs_deep_crawling

ayrisdev/crawl4ai


ntohidi a30548a98f This commit resolves issue #1055, where LLM extraction was blocking async execution, causing URLs to be processed sequentially instead of in parallel.

  Changes:
  - Added aperform_completion_with_backoff() using litellm.acompletion for async LLM calls
  - Implemented arun() method in ExtractionStrategy base class with thread pool fallback
  - Created async arun() and aextract() methods in LLMExtractionStrategy using asyncio.gather
  - Updated AsyncWebCrawler.arun() to detect and use arun() when available
  - Added comprehensive test suite to verify parallel execution
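The listing names aperform_completion_with_backoff() but does not show its body. Below is a minimal sketch of an async retry-with-exponential-backoff wrapper, assuming retries are triggered by a transient error class. The names `acall_with_backoff`, `TransientError`, and `flaky_llm_call` are hypothetical, and the stand-in coroutine takes the place of a real `litellm.acompletion` call; this is not the crawl4ai implementation.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as a rate-limit response."""


async def acall_with_backoff(make_call, max_attempts=3, base_delay=0.5):
    """Await make_call(), retrying on TransientError with exponential backoff.

    make_call is a zero-argument callable that returns a fresh coroutine on
    every call (in real code this might wrap litellm.acompletion; assumption).
    """
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the last error
            # Wait base_delay * 2**attempt plus small jitter. asyncio.sleep
            # yields to the event loop, so other tasks keep running meanwhile.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 10)
            await asyncio.sleep(delay)


async def demo():
    calls = {"n": 0}

    # Hypothetical flaky call: fails twice, then succeeds.
    async def flaky_llm_call():
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("rate limited")
        return "ok"

    result = await acall_with_backoff(flaky_llm_call, max_attempts=5, base_delay=0.01)
    return result, calls["n"]


print(asyncio.run(demo()))  # ('ok', 3)
```

Because the wait uses `asyncio.sleep` rather than `time.sleep`, a retrying request does not stall other URLs being processed on the same event loop.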

  Impact:
  - LLM extraction now runs truly in parallel across multiple URLs
  - Significant performance improvement for multi-URL crawls with LLM strategies
  - Backward compatible: existing extraction strategies continue to work
  - No breaking changes to public API

  Technical details:
  - Uses litellm.acompletion for non-blocking LLM calls
  - Leverages asyncio.gather for concurrent chunk processing
  - Maintains backward compatibility via asyncio.to_thread fallback
  - Works seamlessly with MemoryAdaptiveDispatcher and other dispatchers

2025-11-06 11:22:45 +01:00


fix: allow custom LLM providers for adaptive crawler embedding config. ref: #1291

2025-09-09 12:49:55 +08:00

#1375 : refactor(proxy) Deprecate 'proxy' parameter in BrowserConfig and enhance proxy string parsing

2025-08-28 17:21:49 +08:00

async_assistant

test(async_assistant): add new tests for extract pipeline

2025-06-23 10:44:27 +08:00

fix(browser_profiler): improve keyboard input handling

2025-06-12 14:33:12 +03:00

feat(cli): add command line interface with comprehensive features

2025-02-10 16:58:52 +08:00

#1167 Add PHP MIME types to ContentTypeFilter for better file handling

2025-06-09 11:49:33 +08:00

feat: Add hooks utility for function-based hooks with Docker client integration. ref #1377

2025-10-13 12:34:08 +08:00

fix: remove_overlay_elements functionality by calling injected JS function. ref: #1396

2025-09-29 20:40:08 +05:30

refactor(crawler): improve HTML handling and cleanup codebase

2025-02-07 21:56:27 +08:00

Release prep (#749)

2025-02-28 19:53:35 +08:00

feat(docker): update Docker deployment for v0.6.0

2025-04-22 22:35:25 +08:00

#1375 : refactor(proxy) Deprecate 'proxy' parameter in BrowserConfig and enhance proxy string parsing

2025-08-28 17:21:49 +08:00

chore(profile-test): fix filename typo ( test_crteate_profile.py → test_create_profile.py )

2025-06-12 14:38:32 +03:00

#1375 : refactor(proxy) Deprecate 'proxy' parameter in BrowserConfig and enhance proxy string parsing

2025-08-28 17:21:49 +08:00

test(releases): Add test cases for release 0.7.0

2025-07-11 22:27:18 +08:00

__init__.py

- Test all methods

2024-05-14 21:27:41 +08:00

check_dependencies.py

feat: add stealth mode and enhance undetected browser support

2025-07-17 16:59:10 +08:00

docker_example.py

docs: remove CRAWL4AI_API_TOKEN references and use correct endpoints in Docker example scripts (#1015)

2025-08-09 19:37:22 +05:30

test_arun_many.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_cli_docs.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

test_config_matching_only.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_config_selection.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_docker_api_with_llm_provider.py

feat(docker): add flexible LLM provider configuration

2025-08-05 14:09:54 +08:00

test_docker.py

Fix: README.md urls list

2025-04-29 16:26:35 +02:00

test_link_extractor.py

feat: cleanup unused code and enhance documentation for v0.7.1

2025-07-17 11:35:16 +02:00

test_llm_extraction_parallel_issue_1055.py

This commit resolves issue #1055 where LLM extraction was blocking async

2025-11-06 11:22:45 +01:00

test_llm_simple_url.py

refactor: Update LLMTableExtraction examples and tests

2025-08-15 18:47:31 +08:00

test_llmtxt.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

test_main.py

Fix: README.md urls list

2025-04-29 16:26:35 +02:00

test_memory_macos.py

refactor(utils): move memory utilities to utils and update imports

2025-08-17 19:14:55 +08:00

test_multi_config.py

fix: Correct URL matcher fallback behavior and improve memory monitoring

2025-08-03 16:50:54 +08:00

test_normalize_url.py

#1103 fix(url): enhance URL normalization to handle invalid schemes and trailing slashes

2025-05-19 13:51:16 +08:00

test_preserve_https_for_internal_links.py

feat: add preserve_https_for_internal_links flag to maintain HTTPS during crawling. Ref #1410

2025-08-28 17:38:40 +08:00

test_pyopenssl_security_fix.py

test: add verification tests for pyOpenSSL security update

2025-10-23 06:57:25 +00:00

test_pyopenssl_update.py

test: add verification tests for pyOpenSSL security update

2025-10-23 06:57:25 +00:00

test_scraping_strategy.py

Release prep (#749)

2025-02-28 19:53:35 +08:00

test_virtual_scroll.py

feat: Add virtual scroll support for modern web scraping

2025-06-29 20:41:37 +08:00

test_web_crawler.py

Update all documentation to import extraction strategies directly from crawl4ai.

2025-06-10 18:08:27 +08:00

test_webhook_feature.sh

test: add comprehensive webhook feature test script

2025-10-22 00:35:07 +00:00

WEBHOOK_TEST_README.md

test: add comprehensive webhook feature test script

2025-10-22 00:35:07 +00:00