crawl4ai/tests at 418bfcfd3b168a000f75583335c2bfa4098b86c9

ayrisdev/crawl4ai

unclecode 418bfcfd3b Fix redirected_url containing raw HTML content for raw: URLs

When using raw: URLs without a base_url, redirected_url was incorrectly
set to the entire raw HTML string (potentially 300KB+) instead of None.

Changes:
- async_crawler_strategy.py: Don't fall back to url for raw:/file:// URLs
  in fast path, browser path, and HTTP strategy
- async_crawler_strategy.py: Skip page.url assignment for local content
  (would return "about:blank")
- async_webcrawler.py: Don't fall back to url for raw: URLs in crawl
  result and cached result paths
- Add comprehensive test suite for redirected_url handling
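
The fallback rule described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `is_local_content` and `resolve_redirected_url`, the `final_url` parameter, and the choice to return `base_url` for local content are all assumptions made for this example.

```python
from typing import Optional

# Prefixes treated as "local content" in this sketch (assumption based on
# the commit message, which mentions raw: and file:// URLs).
LOCAL_PREFIXES = ("raw:", "file://")


def is_local_content(url: str) -> bool:
    """Return True when the input is inline HTML or a local file."""
    return url.startswith(LOCAL_PREFIXES)


def resolve_redirected_url(
    url: str,
    final_url: Optional[str] = None,
    base_url: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: pick the value to report as redirected_url.

    For local content the browser's page URL would be "about:blank" and
    the input "url" may be the whole HTML string, so neither is a useful
    fallback. Only base_url (possibly None) is returned in that case.
    """
    if is_local_content(url):
        return base_url
    # Regular web URLs: prefer the post-redirect URL, else the request URL.
    return final_url or url


raw_input = "raw:<html><body>" + "x" * 1000 + "</body></html>"
print(resolve_redirected_url(raw_input))  # None, not the HTML string
print(resolve_redirected_url("https://example.com", "https://example.com/home"))
```

The point of the design is that the request `url` is only a safe fallback when it is a real address; for `raw:` input it is the payload itself.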

2026-01-20 00:45:15 +00:00

Release/v0.7.6 (#1556)

2025-10-22 20:41:06 +08:00

Fix: capture current page URL to reflect JavaScript navigation and add test for delayed redirects. ref #1268

2025-12-02 13:00:54 +01:00

async_assistant

test(async_assistant): add new tests for extract pipeline

2025-06-23 10:44:27 +08:00

Fix CDP connection handling: support WS URLs and proper cleanup

2025-12-18 22:04:52 +08:00

cache_validation

Some debugging for caching

2025-12-21 04:48:03 +00:00

feat(cli): add command line interface with comprehensive features

2025-02-10 16:58:52 +08:00

Add crash recovery for deep crawl strategies

2025-12-22 14:51:10 +00:00

#1167 Add PHP MIME types to ContentTypeFilter for better file handling

2025-06-09 11:49:33 +08:00

Enhance authentication flow by implementing JWT token retrieval and adding authorization headers to API requests

2026-01-12 13:46:32 +01:00

Merge branch 'develop' into fix/wrong_url_raw

2025-11-24 13:54:07 +02:00

refactor(crawler): improve HTML handling and cleanup codebase

2025-02-07 21:56:27 +08:00

Release prep (#749)

2025-02-28 19:53:35 +08:00

feat(docker): update Docker deployment for v0.6.0

2025-04-22 22:35:25 +08:00

Release/v0.7.6 (#1556)

2025-10-22 20:41:06 +08:00

chore(profile-test): fix filename typo (test_crteate_profile.py → test_create_profile.py)

2025-06-12 14:38:32 +03:00

Updates on proxy rotation and proxy configuration

2025-12-26 12:45:57 +00:00

test(releases): Add test cases for release 0.7.0

2025-07-11 22:27:18 +08:00

#1559: Add tests for sitemap parsing and URL normalization in AsyncUrlSeeder

2025-11-10 14:15:54 +08:00

__init__.py

- Test all methods

2024-05-14 21:27:41 +08:00

check_dependencies.py

refactor: replace PyPDF2 with pypdf across the codebase. ref #1412

2025-12-03 10:59:18 +01:00

docker_example.py

docs: remove CRAWL4AI_API_TOKEN references and use correct endpoints in Docker example scripts (#1015)

2025-08-09 19:37:22 +05:30

test_arun_many.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_cli_docs.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

test_config_matching_only.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_config_selection.py

feat: Add URL-specific crawler configurations for multi-URL crawling

2025-08-02 19:10:36 +08:00

test_docker_api_with_llm_provider.py

feat(docker): add flexible LLM provider configuration

2025-08-05 14:09:54 +08:00

test_docker.py

Fix: README.md urls list

2025-04-29 16:26:35 +02:00

test_link_extractor.py

feat: cleanup unused code and enhance documentation for v0.7.1

2025-07-17 11:35:16 +02:00

test_llm_extraction_parallel_issue_1055.py

This commit resolves issue #1055 where LLM extraction was blocking async

2025-11-06 11:22:45 +01:00

test_llm_simple_url.py

refactor: Update LLMTableExtraction examples and tests

2025-08-15 18:47:31 +08:00

test_llmtxt.py

Apply Ruff Corrections

2025-01-13 19:19:58 +08:00

test_main.py

Fix: README.md urls list

2025-04-29 16:26:35 +02:00

test_memory_macos.py

refactor(utils): move memory utilities to utils and update imports

2025-08-17 19:14:55 +08:00

test_multi_config.py

fix: Correct URL matcher fallback behavior and improve memory monitoring

2025-08-03 16:50:54 +08:00

test_normalize_url.py

#1103 fix(url): enhance URL normalization to handle invalid schemes and trailing slashes

2025-05-19 13:51:16 +08:00

test_prefetch_integration.py

Add prefetch mode for two-phase deep crawling

2025-12-25 01:55:08 +00:00

test_prefetch_mode.py

Add prefetch mode for two-phase deep crawling

2025-12-25 01:55:08 +00:00

test_prefetch_regression.py

Add prefetch mode for two-phase deep crawling

2025-12-25 01:55:08 +00:00

test_preserve_https_for_internal_links.py

feat: add preserve_https_for_internal_links flag to maintain HTTPS during crawling. Ref #1410

2025-08-28 17:38:40 +08:00

test_pyopenssl_security_fix.py

test: add verification tests for pyOpenSSL security update

2025-10-23 06:57:25 +00:00

test_pyopenssl_update.py

test: add verification tests for pyOpenSSL security update

2025-10-23 06:57:25 +00:00

test_raw_html_browser.py

Add browser pipeline support for raw:/file:// URLs

2025-12-27 12:32:42 +00:00

test_raw_html_edge_cases.py

Add browser pipeline support for raw:/file:// URLs

2025-12-27 12:32:42 +00:00

test_raw_html_redirected_url.py

Fix redirected_url containing raw HTML content for raw: URLs

2026-01-20 00:45:15 +00:00

test_scraping_strategy.py

Release prep (#749)

2025-02-28 19:53:35 +08:00

test_virtual_scroll.py

feat: Add virtual scroll support for modern web scraping

2025-06-29 20:41:37 +08:00

test_web_crawler.py

Update all documentation to import extraction strategies directly from crawl4ai.

2025-06-10 18:08:27 +08:00

test_webhook_feature.sh

test: add comprehensive webhook feature test script

2025-10-22 00:35:07 +00:00

WEBHOOK_TEST_README.md

test: add comprehensive webhook feature test script

2025-10-22 00:35:07 +00:00