Commit Graph

  • c854e2b899 Fix simulate_user destroying page content via ArrowDown keypress (develop) unclecode 2026-02-19 15:03:28 +00:00
  • 8df3541ac4 Skip anti-bot checks and fallback for raw: URLs unclecode 2026-02-19 14:05:56 +00:00
  • 94a77eea30 Move test_repro_1640.py to tests/browser/ unclecode 2026-02-19 06:33:46 +00:00
  • 2060c7e965 Fix browser recycling deadlock under sustained concurrent load (#1640) unclecode 2026-02-19 06:27:25 +00:00
  • 13048a106b Add Tier 3 structural integrity check to anti-bot detector unclecode 2026-02-18 06:59:22 +00:00
  • c9cb0160cf Add token usage tracking to generate_schema / agenerate_schema unclecode 2026-02-18 06:44:17 +00:00
  • 8576331d4e Add Shadow DOM flattening and reorder js_code execution pipeline unclecode 2026-02-18 06:43:00 +00:00
  • 4fb02f8b50 Warn LLM against hashed/generated CSS class names in schema prompts unclecode 2026-02-17 12:02:58 +00:00
  • d267c650cb Add source (sibling selector) support to JSON extraction strategies unclecode 2026-02-17 09:04:40 +00:00
  • ccd24aa824 Fix fallback fetch: run when all proxies crash, skip re-check, never return None unclecode 2026-02-15 10:55:00 +00:00
  • 45d8e1450f Fix proxy escalation: don't re-raise on first proxy exception when chain has alternatives unclecode 2026-02-15 09:55:55 +00:00
  • d028a889d0 Make proxy_config a property so direct assignment also normalizes unclecode 2026-02-14 13:16:36 +00:00
  • 879553955c Add ProxyConfig.DIRECT sentinel for direct-then-proxy escalation unclecode 2026-02-14 10:25:07 +00:00
  • 875207287e Unify proxy_config to accept list, add crawl_stats tracking unclecode 2026-02-14 07:53:46 +00:00
  • 72b546c48d Add anti-bot detection, retry, and fallback system unclecode 2026-02-14 05:24:07 +00:00
  • fdd989785f Sync sec-ch-ua with User-Agent and keep WebGL alive in stealth mode unclecode 2026-02-13 04:10:47 +00:00
  • 112f44a97d Fix proxy auth for persistent browser contexts unclecode 2026-02-12 11:19:29 +00:00
  • 1a24ac785e Refactor from_kwargs to respect set_defaults and use __init__ defaults unclecode 2026-02-11 13:35:36 +00:00
  • 3fc7730aaf Add remove_consent_popups flag and fix from_kwargs dict deserialization unclecode 2026-02-11 12:46:47 +00:00
  • 44b8afb6dc Improve schema generation prompt for sibling-based layouts unclecode 2026-02-10 08:34:22 +00:00
  • fbc52813a4 Add tests, docs, and contributors for PRs #1463 and #1435 unclecode 2026-02-06 09:30:19 +00:00
  • 37a49c5315 Merge PR #1435: Add redirected_status_code to CrawlResult unclecode 2026-02-06 09:23:54 +00:00
  • 0aacafed0a Merge PR #1463: Add configurable device_scale_factor for screenshot quality unclecode 2026-02-06 09:19:42 +00:00
  • 719e83e105 Update PR todolist — refresh open PRs, add 6 new, classify unclecode 2026-02-06 09:06:13 +00:00
  • 3401dd1620 Fix browser recycling under high concurrency — version-based approach unclecode 2026-02-05 07:48:12 +00:00
  • c046918bb4 Add memory-saving mode, browser recycling, and CDP leak fixes unclecode 2026-02-04 02:00:13 +00:00
  • 4e56f3e00d Add contributing guide and update mkdocs navigation for community resources ntohidi 2026-02-03 09:46:54 +01:00
  • 0bfcf080dd Add contributors from PRs #1133, #729 unclecode 2026-02-02 07:56:37 +00:00
  • b962699c0d Add contributors from PRs #973, #1073, #931 unclecode 2026-02-02 07:14:12 +00:00
  • ffd3face6b Remove duplicate PROMPT_EXTRACT_BLOCKS definition in prompts.py unclecode 2026-02-02 07:04:35 +00:00
  • c790231aba Fix browser context memory leak — signature shrink + LRU eviction (#943) unclecode 2026-02-01 14:23:04 +00:00
  • bb523b6c6c Merge PRs #1077, #1281 — bs4 deprecation and proxy auth fix unclecode 2026-02-01 07:06:39 +00:00
  • 980dc73156 Merge PR #1281: Fix proxy auth ERR_INVALID_AUTH_CREDENTIALS unclecode 2026-02-01 07:05:00 +00:00
  • 98aea2fb46 Merge PR #1077: Fix bs4 deprecation warning (text -> string) unclecode 2026-02-01 07:04:31 +00:00
  • a56dd07559 Merge PRs #1667, #1296, #1364 — CLI deep-crawl, env var, script tags unclecode 2026-02-01 06:53:53 +00:00
  • 312cef8633 Fix PR #1296: restore .crawl4ai subfolder in VersionManager path unclecode 2026-02-01 06:22:16 +00:00
  • a244e4d781 Merge PR #1364: Fix script tag removal losing adjacent text in cleaned_html unclecode 2026-02-01 06:22:10 +00:00
  • 0f83b05a2d Merge PR #1296: Fix VersionManager ignoring CRAWL4_AI_BASE_DIRECTORY env var unclecode 2026-02-01 06:21:40 +00:00
  • 37995d4d3f Merge PR #1667: Fix deep-crawl CLI outputting only the first page unclecode 2026-02-01 06:21:25 +00:00
  • dc4ae73221 Merge PRs #1714, #1721, #1719, #1717 and fix base tag pipeline unclecode 2026-02-01 05:41:33 +00:00
  • 5cd0648d71 Merge PR #1717: Allow local embeddings by removing OpenAI fallback unclecode 2026-02-01 05:02:18 +00:00
  • 9172581416 Merge PR #1719: Include GoogleSearchCrawler script.js in package distribution unclecode 2026-02-01 05:02:05 +00:00
  • c39e796a18 Merge PR #1721: Fix <base> tag ignored in html2text relative link resolution unclecode 2026-02-01 05:01:52 +00:00
  • ccab926f1f Merge PR #1714: Replace tf-playwright-stealth with playwright-stealth unclecode 2026-02-01 05:01:31 +00:00
  • 43738c9ed2 Fix can_process_url() to receive normalized URL in deep crawl strategies unclecode 2026-02-01 03:45:52 +00:00
  • ee717dc019 Add contributor for PR #1746 and fix test pytest marker unclecode 2026-02-01 03:10:32 +00:00
  • 7c5933e2e7 Merge PR #1746: Fix sitemap-only URL seeding avoiding Common Crawl calls unclecode 2026-02-01 02:57:06 +00:00
  • 5be0d2d75e Add contributor and docs for force_viewport_screenshot feature unclecode 2026-02-01 01:10:20 +00:00
  • e19492a82e Merge PR #1694: feat: add force viewport screenshot unclecode 2026-02-01 01:05:52 +00:00
  • 55a2cc8181 Document set_defaults/get_defaults/reset_defaults in config guides unclecode 2026-01-31 11:46:53 +00:00
  • 13a414802b Add set_defaults/get_defaults/reset_defaults to config classes unclecode 2026-01-31 11:44:07 +00:00
  • 19b9140c68 Improve CDP connection handling unclecode 2026-01-31 11:07:26 +00:00
  • 694ba44a04 Added fix for URL Seeder forcing Common Crawl index in case of a "sitemap" ChiragBellara 2026-01-30 09:33:30 -08:00
  • 0104db6de2 Fix critical RCE via deserialization and eval() in /crawl endpoint unclecode 2026-01-30 08:46:01 +00:00
  • ad5ebf166a Merge pull request #1718 from YuriNachos/fix/issue-1704-default-logger Nasrin 2026-01-29 13:03:11 +01:00
  • 034bddf557 Merge pull request #1733 from jose-blockchain/fix/1686-docker-health-version Nasrin 2026-01-29 12:55:24 +01:00
  • 911bbce8b1 Fix agenerate_schema() JSON parsing for Anthropic models unclecode 2026-01-29 11:38:53 +00:00
  • 0a17fe8f19 Improve page tracking with global CDP endpoint-based tracking unclecode 2026-01-28 09:30:20 +00:00
  • 9b52c1490b Fix page reuse race condition when create_isolated_context=False unclecode 2026-01-28 01:43:21 +00:00
  • 656b938ef8 Merge branch 'main' into develop unclecode 2026-01-27 01:58:45 +00:00
  • 55de32d925 Add CycloneDX SBOM and generation script (main) unclecode 2026-01-27 01:45:42 +00:00
  • 21e6c418be Fix: Keep storage_state.json in profile shrink unclecode 2026-01-26 13:06:31 +00:00
  • 18d2ef4a24 Fix: Disable cookie encryption for portable profiles unclecode 2026-01-26 12:57:17 +00:00
  • ef226f5787 Add: Cloud CLI module for profile management unclecode 2026-01-25 09:35:48 +00:00
  • 94e19a4c72 Enhance browser profile management capabilities unclecode 2026-01-24 08:02:52 +00:00
  • 79ebfce913 Refactor HTML block delimiter to use config constant unclecode 2026-01-24 04:19:50 +00:00
  • 2d5e5306c5 Add support for parallel URL processing in extraction utilities unclecode 2026-01-24 04:13:39 +00:00
  • b0b3ca1222 Refactor extraction strategy internals and improve error handling unclecode 2026-01-24 03:08:41 +00:00
  • 777d0878f2 Update security contact emails in SECURITY.md ntohidi 2026-01-22 09:53:24 +01:00
  • fbfbc6995c Fix deep crawl cancellation example to use DFS for precise control unclecode 2026-01-22 06:25:34 +00:00
  • 1e2b7fe7e6 Add documentation and example for deep crawl cancellation unclecode 2026-01-22 06:10:54 +00:00
  • f6897d1429 Add cancellation support for deep crawl strategies unclecode 2026-01-22 06:08:25 +00:00
  • c9a271a3ff Merge branch 'fix/1686-docker-health-version' of https://github.com/jose-blockchain/crawl4ai into fix/1686-docker-health-version José 2026-01-20 23:45:13 +01:00
  • 9123f65140 Fix #1686: Use dynamic version from crawl4ai package in health endpoint José 2026-01-20 23:40:38 +01:00
  • fe1c1cb0bc Fix #1686: Use dynamic version from crawl4ai package in health endpoint José 2026-01-20 23:40:38 +01:00
  • 418bfcfd3b Fix redirected_url containing raw HTML content for raw: URLs unclecode 2026-01-20 00:31:12 +00:00
  • 857b1ed23b Merge branch 'main' into develop ntohidi 2026-01-19 13:25:56 +01:00
  • f6f7f1b551 Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712) Nasrin 2026-01-17 14:19:15 +01:00
  • 2016d669a9 fix: Respect <base> tag for relative link resolution in html2text Yurii Chukhlib 2026-01-17 11:17:28 +01:00
  • 232f00752c fix: Initialize default logger in AsyncPlaywrightCrawlerStrategy Yurii Chukhlib 2026-01-17 11:14:42 +01:00
  • ef8f0c6096 fix: Include GoogleSearchCrawler script.js in package distribution Yurii Chukhlib 2026-01-17 11:15:30 +01:00
  • 2a04fc319a fix: Allow local embeddings by removing OpenAI fallback in EmbeddingStrategy Yurii Chukhlib 2026-01-17 11:10:33 +01:00
  • 624dfe7af5 fix: Replace tf-playwright-stealth with playwright-stealth dependency Yurii Chukhlib 2026-01-17 11:06:44 +01:00
  • a5354f267a Merge branch 'develop' into release/v0.8.0 (v0.8.0, docker-rebuild-v0.8.0, release/v0.8.0) ntohidi 2026-01-16 11:34:24 +01:00
  • 6090629ee0 Fix: Enable litellm.drop_params for O-series/GPT-5 model compatibility unclecode 2026-01-16 09:56:38 +00:00
  • a00da6557b Add async agenerate_schema method for schema generation unclecode 2026-01-16 06:19:33 +00:00
  • 177e298af0 Update security researcher acknowledgment with a hyperlink for Neo by ProjectDiscovery ntohidi 2026-01-14 14:19:23 +01:00
  • f09146c435 Release v0.8.0: The v0.8.0 Update ntohidi 2026-01-14 13:46:42 +01:00
  • 315eae9e6f Add examples for deep crawl crash recovery and prefetch mode in documentation ntohidi 2026-01-14 12:58:44 +01:00
  • 530cde351f Add release notes for v0.8.0, detailing breaking changes, security fixes, new features, bug fixes, and documentation updates unclecode 2026-01-12 13:45:42 +00:00
  • 122b4fe3f0 Add release notes for v0.7.9, detailing breaking changes, security fixes, new features, bug fixes, and documentation updates ntohidi 2026-01-12 13:46:39 +01:00
  • acfab80dd4 Enhance authentication flow by implementing JWT token retrieval and adding authorization headers to API requests ntohidi 2026-01-12 13:46:32 +01:00
  • f24396c23e Fix critical RCE and LFI vulnerabilities in Docker API deployment unclecode 2026-01-12 04:14:37 +00:00
  • cee79a8129 feat: add force viewport screenshot TheRedRad 2026-01-06 21:12:17 +01:00
  • 6b2dca76c3 Docs: Add multi-sample schema generation section unclecode 2026-01-04 12:50:08 +00:00
  • 0d3f9e65b0 Add MEMORY.md to gitignore unclecode 2025-12-30 03:04:30 +00:00
  • db61ab8559 Update URL seeder docs with smart TTL cache parameters unclecode 2025-12-30 03:03:41 +00:00
  • 3d78001c30 Add smart TTL cache for sitemap URL seeder unclecode 2025-12-30 01:59:09 +00:00
  • 2550f3d2d5 Add browser pipeline support for raw:/file:// URLs unclecode 2025-12-27 12:32:42 +00:00
  • a43256b27a Add proxy support to HTTP crawler strategy unclecode 2025-12-26 13:17:28 +00:00