c854e2b899
Fix simulate_user destroying page content via ArrowDown keypress
develop
unclecode
2026-02-19 15:03:28 +00:00
8df3541ac4
Skip anti-bot checks and fallback for raw: URLs
unclecode
2026-02-19 14:05:56 +00:00
94a77eea30
Move test_repro_1640.py to tests/browser/
unclecode
2026-02-19 06:33:46 +00:00
2060c7e965
Fix browser recycling deadlock under sustained concurrent load (#1640)
unclecode
2026-02-19 06:27:25 +00:00
13048a106b
Add Tier 3 structural integrity check to anti-bot detector
unclecode
2026-02-18 06:59:22 +00:00
c9cb0160cf
Add token usage tracking to generate_schema / agenerate_schema
unclecode
2026-02-18 06:44:17 +00:00
8576331d4e
Add Shadow DOM flattening and reorder js_code execution pipeline
unclecode
2026-02-18 06:43:00 +00:00
4fb02f8b50
Warn LLM against hashed/generated CSS class names in schema prompts
unclecode
2026-02-17 12:02:58 +00:00
d267c650cb
Add source (sibling selector) support to JSON extraction strategies
unclecode
2026-02-17 09:04:40 +00:00
ccd24aa824
Fix fallback fetch: run when all proxies crash, skip re-check, never return None
unclecode
2026-02-15 10:55:00 +00:00
45d8e1450f
Fix proxy escalation: don't re-raise on first proxy exception when chain has alternatives
unclecode
2026-02-15 09:55:55 +00:00
d028a889d0
Make proxy_config a property so direct assignment also normalizes
unclecode
2026-02-14 13:16:36 +00:00
879553955c
Add ProxyConfig.DIRECT sentinel for direct-then-proxy escalation
unclecode
2026-02-14 10:25:07 +00:00
875207287e
Unify proxy_config to accept list, add crawl_stats tracking
unclecode
2026-02-14 07:53:46 +00:00
72b546c48d
Add anti-bot detection, retry, and fallback system
unclecode
2026-02-14 05:24:07 +00:00
fdd989785f
Sync sec-ch-ua with User-Agent and keep WebGL alive in stealth mode
unclecode
2026-02-13 04:10:47 +00:00
112f44a97d
Fix proxy auth for persistent browser contexts
unclecode
2026-02-12 11:19:29 +00:00
1a24ac785e
Refactor from_kwargs to respect set_defaults and use __init__ defaults
unclecode
2026-02-11 13:35:36 +00:00
3fc7730aaf
Add remove_consent_popups flag and fix from_kwargs dict deserialization
unclecode
2026-02-11 12:46:47 +00:00
44b8afb6dc
Improve schema generation prompt for sibling-based layouts
unclecode
2026-02-10 08:34:22 +00:00
fbc52813a4
Add tests, docs, and contributors for PRs #1463 and #1435
unclecode
2026-02-06 09:30:19 +00:00
37a49c5315
Merge PR #1435: Add redirected_status_code to CrawlResult
unclecode
2026-02-06 09:23:54 +00:00
0aacafed0a
Merge PR #1463: Add configurable device_scale_factor for screenshot quality
unclecode
2026-02-06 09:19:42 +00:00
719e83e105
Update PR todolist — refresh open PRs, add 6 new, classify
unclecode
2026-02-06 09:06:13 +00:00
3401dd1620
Fix browser recycling under high concurrency — version-based approach
unclecode
2026-02-05 07:48:12 +00:00
c046918bb4
Add memory-saving mode, browser recycling, and CDP leak fixes
unclecode
2026-02-04 02:00:13 +00:00
4e56f3e00d
Add contributing guide and update mkdocs navigation for community resources
ntohidi
2026-02-03 09:46:54 +01:00
0bfcf080dd
Add contributors from PRs #1133, #729
unclecode
2026-02-02 07:56:37 +00:00
b962699c0d
Add contributors from PRs #973, #1073, #931
unclecode
2026-02-02 07:14:12 +00:00
ffd3face6b
Remove duplicate PROMPT_EXTRACT_BLOCKS definition in prompts.py
unclecode
2026-02-02 07:04:35 +00:00
c790231aba
Fix browser context memory leak — signature shrink + LRU eviction (#943)
unclecode
2026-02-01 14:23:04 +00:00
bb523b6c6c
Merge PRs #1077, #1281 — bs4 deprecation and proxy auth fix
unclecode
2026-02-01 07:06:39 +00:00
980dc73156
Merge PR #1281: Fix proxy auth ERR_INVALID_AUTH_CREDENTIALS
unclecode
2026-02-01 07:05:00 +00:00
98aea2fb46
Merge PR #1077: Fix bs4 deprecation warning (text -> string)
unclecode
2026-02-01 07:04:31 +00:00
a56dd07559
Merge PRs #1667, #1296, #1364 — CLI deep-crawl, env var, script tags
unclecode
2026-02-01 06:53:53 +00:00
312cef8633
Fix PR #1296: restore .crawl4ai subfolder in VersionManager path
unclecode
2026-02-01 06:22:16 +00:00
a244e4d781
Merge PR #1364: Fix script tag removal losing adjacent text in cleaned_html
unclecode
2026-02-01 06:22:10 +00:00
0f83b05a2d
Merge PR #1296: Fix VersionManager ignoring CRAWL4_AI_BASE_DIRECTORY env var
unclecode
2026-02-01 06:21:40 +00:00
37995d4d3f
Merge PR #1667: Fix deep-crawl CLI outputting only the first page
unclecode
2026-02-01 06:21:25 +00:00
dc4ae73221
Merge PRs #1714, #1721, #1719, #1717 and fix base tag pipeline
unclecode
2026-02-01 05:41:33 +00:00
5cd0648d71
Merge PR #1717: Allow local embeddings by removing OpenAI fallback
unclecode
2026-02-01 05:02:18 +00:00
9172581416
Merge PR #1719: Include GoogleSearchCrawler script.js in package distribution
unclecode
2026-02-01 05:02:05 +00:00
c39e796a18
Merge PR #1721: Fix <base> tag ignored in html2text relative link resolution
unclecode
2026-02-01 05:01:52 +00:00
ccab926f1f
Merge PR #1714: Replace tf-playwright-stealth with playwright-stealth
unclecode
2026-02-01 05:01:31 +00:00
43738c9ed2
Fix can_process_url() to receive normalized URL in deep crawl strategies
unclecode
2026-02-01 03:45:52 +00:00
ee717dc019
Add contributor for PR #1746 and fix test pytest marker
unclecode
2026-02-01 03:10:32 +00:00
7c5933e2e7
Merge PR #1746: Fix sitemap-only URL seeding avoiding Common Crawl calls
unclecode
2026-02-01 02:57:06 +00:00
5be0d2d75e
Add contributor and docs for force_viewport_screenshot feature
unclecode
2026-02-01 01:10:20 +00:00
e19492a82e
Merge PR #1694: feat: add force viewport screenshot
unclecode
2026-02-01 01:05:52 +00:00
55a2cc8181
Document set_defaults/get_defaults/reset_defaults in config guides
unclecode
2026-01-31 11:46:53 +00:00
13a414802b
Add set_defaults/get_defaults/reset_defaults to config classes
unclecode
2026-01-31 11:44:07 +00:00
19b9140c68
Improve CDP connection handling
unclecode
2026-01-31 11:07:26 +00:00
694ba44a04
Added fix for URL Seeder forcing Common Crawl index in case of a "sitemap"
ChiragBellara
2026-01-30 09:33:30 -08:00
0104db6de2
Fix critical RCE via deserialization and eval() in /crawl endpoint
unclecode
2026-01-30 08:46:01 +00:00
ad5ebf166a
Merge pull request #1718 from YuriNachos/fix/issue-1704-default-logger
Nasrin
2026-01-29 13:03:11 +01:00
034bddf557
Merge pull request #1733 from jose-blockchain/fix/1686-docker-health-version
Nasrin
2026-01-29 12:55:24 +01:00
911bbce8b1
Fix agenerate_schema() JSON parsing for Anthropic models
unclecode
2026-01-29 11:38:53 +00:00
0a17fe8f19
Improve page tracking with global CDP endpoint-based tracking
unclecode
2026-01-28 09:30:20 +00:00
9b52c1490b
Fix page reuse race condition when create_isolated_context=False
unclecode
2026-01-28 01:43:21 +00:00
656b938ef8
Merge branch 'main' into develop
unclecode
2026-01-27 01:58:45 +00:00
55de32d925
Add CycloneDX SBOM and generation script
main
unclecode
2026-01-27 01:45:42 +00:00
21e6c418be
Fix: Keep storage_state.json in profile shrink
unclecode
2026-01-26 13:06:31 +00:00
18d2ef4a24
Fix: Disable cookie encryption for portable profiles
unclecode
2026-01-26 12:57:17 +00:00
ef226f5787
Add: Cloud CLI module for profile management
unclecode
2026-01-25 09:35:48 +00:00
94e19a4c72
Enhance browser profile management capabilities
unclecode
2026-01-24 08:02:52 +00:00
79ebfce913
Refactor HTML block delimiter to use config constant
unclecode
2026-01-24 04:19:50 +00:00
2d5e5306c5
Add support for parallel URL processing in extraction utilities
unclecode
2026-01-24 04:13:39 +00:00
b0b3ca1222
Refactor extraction strategy internals and improve error handling
unclecode
2026-01-24 03:08:41 +00:00
777d0878f2
Update security contact emails in SECURITY.md
ntohidi
2026-01-22 09:53:24 +01:00
fbfbc6995c
Fix deep crawl cancellation example to use DFS for precise control
unclecode
2026-01-22 06:25:34 +00:00
1e2b7fe7e6
Add documentation and example for deep crawl cancellation
unclecode
2026-01-22 06:10:54 +00:00
f6897d1429
Add cancellation support for deep crawl strategies
unclecode
2026-01-22 06:08:25 +00:00
c9a271a3ff
Merge branch 'fix/1686-docker-health-version' of https://github.com/jose-blockchain/crawl4ai into fix/1686-docker-health-version
José
2026-01-20 23:45:13 +01:00
9123f65140
Fix #1686: Use dynamic version from crawl4ai package in health endpoint
José
2026-01-20 23:40:38 +01:00
fe1c1cb0bc
Fix #1686: Use dynamic version from crawl4ai package in health endpoint
José
2026-01-20 23:40:38 +01:00
418bfcfd3b
Fix redirected_url containing raw HTML content for raw: URLs
unclecode
2026-01-20 00:31:12 +00:00
857b1ed23b
Merge branch 'main' into develop
ntohidi
2026-01-19 13:25:56 +01:00
f6f7f1b551
Release v0.8.0: Crash Recovery, Prefetch Mode & Security Fixes (#1712)
Nasrin
2026-01-17 14:19:15 +01:00
2016d669a9
fix: Respect <base> tag for relative link resolution in html2text
Yurii Chukhlib
2026-01-17 11:17:28 +01:00
232f00752c
fix: Initialize default logger in AsyncPlaywrightCrawlerStrategy
Yurii Chukhlib
2026-01-17 11:14:42 +01:00
ef8f0c6096
fix: Include GoogleSearchCrawler script.js in package distribution
Yurii Chukhlib
2026-01-17 11:15:30 +01:00
2a04fc319a
fix: Allow local embeddings by removing OpenAI fallback in EmbeddingStrategy
Yurii Chukhlib
2026-01-17 11:10:33 +01:00
624dfe7af5
fix: Replace tf-playwright-stealth with playwright-stealth dependency
Yurii Chukhlib
2026-01-17 11:06:44 +01:00
a5354f267a
Merge branch 'develop' into release/v0.8.0
v0.8.0
docker-rebuild-v0.8.0
release/v0.8.0
ntohidi
2026-01-16 11:34:24 +01:00
6090629ee0
Fix: Enable litellm.drop_params for O-series/GPT-5 model compatibility
unclecode
2026-01-16 09:56:38 +00:00
a00da6557b
Add async agenerate_schema method for schema generation
unclecode
2026-01-16 06:19:33 +00:00
177e298af0
Update security researcher acknowledgment with a hyperlink for Neo by ProjectDiscovery
ntohidi
2026-01-14 14:19:23 +01:00
f09146c435
Release v0.8.0: The v0.8.0 Update
ntohidi
2026-01-14 13:46:42 +01:00
315eae9e6f
Add examples for deep crawl crash recovery and prefetch mode in documentation
ntohidi
2026-01-14 12:58:44 +01:00
530cde351f
Add release notes for v0.8.0, detailing breaking changes, security fixes, new features, bug fixes, and documentation updates
unclecode
2026-01-12 13:45:42 +00:00
122b4fe3f0
Add release notes for v0.7.9, detailing breaking changes, security fixes, new features, bug fixes, and documentation updates
ntohidi
2026-01-12 13:46:39 +01:00
acfab80dd4
Enhance authentication flow by implementing JWT token retrieval and adding authorization headers to API requests
ntohidi
2026-01-12 13:46:32 +01:00
f24396c23e
Fix critical RCE and LFI vulnerabilities in Docker API deployment
unclecode
2026-01-12 04:14:37 +00:00
cee79a8129
feat: add force viewport screenshot
TheRedRad
2026-01-06 21:12:17 +01:00
6b2dca76c3
Docs: Add multi-sample schema generation section
unclecode
2026-01-04 12:50:08 +00:00
0d3f9e65b0
Add MEMORY.md to gitignore
unclecode
2025-12-30 03:04:30 +00:00
db61ab8559
Update URL seeder docs with smart TTL cache parameters
unclecode
2025-12-30 03:03:41 +00:00
3d78001c30
Add smart TTL cache for sitemap URL seeder
unclecode
2025-12-30 01:59:09 +00:00
2550f3d2d5
Add browser pipeline support for raw:/file:// URLs
unclecode
2025-12-27 12:32:42 +00:00
a43256b27a
Add proxy support to HTTP crawler strategy
unclecode
2025-12-26 13:17:28 +00:00