Commit Graph

  • 11b310edef Merge pull request #1378 from unclecode/fix/exit_with_q Nasrin 2025-08-13 14:16:47 +08:00
  • 489981e670 Merge pull request #1390 from unclecode/fix/docker-raw-html Nasrin 2025-08-13 13:56:33 +08:00
  • b92be4ef66 Merge pull request #1371 from unclecode/bug/proxy_config Nasrin 2025-08-12 16:55:52 +08:00
  • 7c0edaf266 Merge pull request #1384 from unclecode/fix/update_docker_examples Nasrin 2025-08-12 16:53:42 +08:00
  • dfcfd8ae57 fix(dispatcher): enable true concurrency for fast-completing tasks in arun_many. REF: #560 ntohidi 2025-08-12 16:51:22 +08:00
  • 955110a8b0 Merge branch 'develop' of https://github.com/unclecode/crawl4ai into develop ntohidi 2025-08-12 12:22:25 +08:00
  • f30811b524 fix: Check for raw: and raw:// URLs before auto-appending https:// prefix - Add raw HTML URL validation alongside http/https checks - Fix URL preprocessing logic to handle raw: and raw:// prefixes - Update error message and add comprehensive test cases Soham Kukreti 2025-08-11 22:10:53 +05:30
  • 8146d477e9 Merge branch 'main' into develop ntohidi 2025-08-11 18:56:15 +08:00
  • 96c4b0de67 fix(browser_manager): serialize new_page on persistent context to avoid races ref #1198 - Add _page_lock and guarded creation; handle empty context.pages safely - Prevents BrowserContext.new_page “Target page/context closed” during concurrent arun_many ntohidi 2025-08-11 18:55:43 +08:00
  • 57c14db7cb Merge pull request #1381 from unclecode/fix/base-tag-link-resolution Nasrin 2025-08-11 18:32:32 +08:00
  • 88a9fbbb7e fix(deep-crawl): BestFirst priority inversion; remove pre-scoring truncation. ref #1253 fix/deep-crawl-scoring ntohidi 2025-08-11 18:16:57 +08:00
  • be63c98db3 feat(docker): add user-provided hooks support to Docker API feature/docker-hooks ntohidi 2025-08-11 13:25:17 +08:00
  • cd2dd68e4c docs: remove CRAWL4AI_API_TOKEN references and use correct endpoints in Docker example scripts (#1015) Soham Kukreti 2025-08-09 19:15:11 +05:30
  • f0ce7b2710 feat: add v0.7.3 release notes, changelog updates, and documentation for new features UncleCode 2025-08-09 21:04:18 +08:00
  • 21f79fe166 Release v0.7.3: Merge release branch v0.7.3 UncleCode 2025-08-09 20:11:35 +08:00
  • a9a2d798b4 feat: update sponsorship tier details and add custom arrangements note release/v0.7.3 unclecode 2025-08-09 20:10:32 +08:00
  • 612270fcb0 feat: add scheduling link to contact information in SPONSORS.md unclecode 2025-08-09 20:05:59 +08:00
  • bc099fdd76 Merge branch 'main' into release/v0.7.3 unclecode 2025-08-09 19:30:46 +08:00
  • 18504d782e Add Founding Sponsors section and update README with detailed project information unclecode 2025-08-09 19:11:32 +08:00
  • ad547607b9 feat: add GitHub Sponsors support with 4 tiers unclecode 2025-08-09 17:57:47 +08:00
  • 18ad3ef159 fix: Implement base tag support in link extraction (#1147) - Extract base href from <head><base> tag using XPath in _process_element method - Use base URL as the primary URL for link normalization when present - Add error handling with logging for malformed or problematic base tags - Maintain backward compatibility when no base tag is present - Add test to verify the functionality of the base tag extraction. Soham Kukreti 2025-08-08 20:00:11 +05:30
  • b61b2ee676 feat(browser-profiler): implement cross-platform keyboard listeners and improve quit handling AHMET YILMAZ 2025-08-08 11:18:34 +08:00
  • 0541b61405 feat(browser-profiler): implement cross-platform keyboard listeners and improve quit handling fix/exit_with_q AHMET YILMAZ 2025-08-08 11:18:34 +08:00
  • 66925eb1d6 fix(deep_crawling): fix priority queue ordering and link truncation in BestFirstCrawlingStrategy - ref #1253 fix/deep-crawl-scoring-priority ntohidi 2025-08-07 15:28:43 +08:00
  • 89cf5aba2b #1057 : enhance ProxyConfig initialization to support dict and string formats bug/proxy_config AHMET YILMAZ 2025-08-06 18:34:23 +08:00
  • 6b0b5301ba Release v0.7.3: ntohidi 2025-08-06 17:52:01 +08:00
  • 7a8190ecb6 Fix examples in README.md Nezar Ali 2025-08-06 11:58:29 +03:00
  • 64f37792a7 Merge pull request #1170 from prokopis3/fix/create-profile Nasrin 2025-08-06 16:29:14 +08:00
  • 6735c68288 Merge pull request #1170 from prokopis3/fix/create-profile Nasrin 2025-08-06 16:29:14 +08:00
  • a5bcac4c9d feat(docs): enhance table data access example with a real url ntohidi 2025-08-06 15:19:37 +08:00
  • 45d8327d23 Merge pull request #1366 from unclecode/fix/update-tables-documentation Nasrin 2025-08-06 15:15:24 +08:00
  • 437395e490 Merge branch 'feat/undetected-browser' into develop-future ntohidi 2025-08-06 15:03:30 +08:00
  • fddae303fb docs: Update README.md and modify Media and Tables Documentation. (#1271) - Update Table-to-DataFrame Extraction example in README.md - Replace old method of accessing tables via result.media directly with result.tables in the documentation - Remove tables section from links & media page. - Add tables section to crawler result page. Soham Kukreti 2025-08-05 23:23:17 +05:30
  • 660d7011b9 In obtaining cleaned_html, the tag "script" needs to be processed separately. lizhuxiong 2025-08-05 16:27:03 +08:00
  • 6d3444ba17 In obtaining cleaned_html, the tag "script" needs to be processed separately. lizhuxiong 2025-08-05 16:18:34 +08:00
  • ff6ea41ac3 feat(docker): add flexible LLM provider configuration ntohidi 2025-08-05 14:09:54 +08:00
  • 31a435fb0e Merge branch 'develop' of https://github.com/unclecode/crawl4ai into develop ntohidi 2025-08-04 19:12:19 +08:00
  • 5de6a28055 Merge pull request #1361 from unclecode/fix/crawler-result-docs Nasrin 2025-08-04 19:12:09 +08:00
  • de1561ad14 Merge branch 'develop' of https://github.com/unclecode/crawl4ai into develop ntohidi 2025-08-04 19:04:50 +08:00
  • 337b588732 Merge pull request #1358 from shonenada/patch-1 Nasrin 2025-08-04 19:04:42 +08:00
  • 7a6ad547f0 Squashed commit of the following: ntohidi 2025-08-04 19:02:01 +08:00
  • e6692b987d docs: Update CrawlResult documentation with missing fields. - Add missing fields: fit_html, js_execution_result, redirected_url, network_requests, console_messages, tables Soham Kukreti 2025-08-04 15:40:33 +05:30
  • 307fe28b32 fix: Correct URL matcher fallback behavior and improve memory monitoring ntohidi 2025-08-03 16:50:54 +08:00
  • 438a103b17 Fix typos in examples.md Yaoda Liu 2025-08-03 14:33:10 +08:00
  • a03e68fa2f feat: Add URL-specific crawler configurations for multi-URL crawling ntohidi 2025-08-02 19:10:36 +08:00
  • 864d87afb2 Merge pull request #1339 from charlaie/fix-sitemap-redirect Nasrin 2025-07-31 15:21:03 +08:00
  • 508b6fc233 fix: Enable following redirects in sitemap fetching for seeder Charlie C 2025-07-25 15:57:09 +08:00
  • 8a906fcad0 fix(dependencies): Update and clean up package versions in pyproject.toml, the bundle size will be much smaller. next UncleCode 2025-07-29 19:56:27 +08:00
  • 54ae10d957 feat(extraction_strategy): Enhance schema generation with improved validation and task description handling UncleCode 2025-07-29 19:33:36 +08:00
  • 8e3c411a3e Merge branch 'main' into main Emmanuel Ferdman 2025-07-29 14:05:35 +03:00
  • e3281935bc fix: Add write permissions for GitHub release creation UncleCode 2025-07-25 18:22:45 +08:00
  • 48647300b4 chore: Bump version to 0.7.2 v0.7.2 release/v0.7.2 UncleCode 2025-07-25 17:42:48 +08:00
  • 9f9ea3bb3b chore: Clean up test artifacts and disable test workflow release/v0.7.1 UncleCode 2025-07-25 17:31:52 +08:00
  • d58b93c207 fix: Re-enable multi-platform Docker builds for ARM64 support UncleCode 2025-07-25 16:38:11 +08:00
  • e2b4705010 fix: Use hardcoded Docker repository name to avoid masking issues UncleCode 2025-07-25 15:52:26 +08:00
  • 4a1abd5086 fix: Handle existing version on Test PyPI gracefully UncleCode 2025-07-25 15:41:16 +08:00
  • 04258cd4f2 fix: Speed up Docker test builds by using single platform and caching UncleCode 2025-07-25 15:37:44 +08:00
  • 84e462d9f8 Merge remote-tracking branch 'origin/develop' UncleCode 2025-07-25 15:35:53 +08:00
  • 9546773a07 fix: Move sentence-transformers to optional dependencies UncleCode 2025-07-24 21:24:40 +08:00
  • 66a979ad11 fix: Install dependencies before version check in workflows UncleCode 2025-07-24 21:01:36 +08:00
  • 0c31e91b53 feat: Add CI/CD workflows for automated PyPI and Docker releases UncleCode 2025-07-24 20:58:43 +08:00
  • 843457a9cb Refactor adaptive crawling state management UncleCode 2025-07-24 20:11:43 +08:00
  • 1b6a31f88f fix: encode PDF results to base64 in /crawl endpoint. ref #1301 ntohidi 2025-07-23 13:52:18 +02:00
  • b8c261780f Merge pull request #1319 from volumetric/fix_for_bug_#1310 Nasrin 2025-07-23 12:45:12 +02:00
  • db6ad7a79d fix: update links in README and C4A-Script documentation for accuracy ntohidi 2025-07-23 09:47:18 +02:00
  • 004d514f33 Merge pull request #1265 from unclecode/feature/nasrin-cli-deep-crawl Nasrin 2025-07-23 09:40:33 +02:00
  • d1de82a332 feat(crawl4ai): Implement SMART cache mode UncleCode 2025-07-21 21:19:37 +08:00
  • 8a04351406 feat(crawl4ai): Update to version 0.7.1 with improvements and new tests UncleCode 2025-07-18 16:27:19 +08:00
  • 3a9e2c716e Removed the incorrect reference in browser_config variable Vinit Agrawal 2025-07-18 10:01:00 +05:30
  • 0163bd797c Merge branch 'release/v0.7.1' v0.7.1 unclecode 2025-07-17 17:42:04 +08:00
  • 26bad799e4 chore: update version to 0.7.1 ntohidi 2025-07-17 11:37:41 +02:00
  • cf8badfe27 feat: cleanup unused code and enhance documentation for v0.7.1 ntohidi 2025-07-17 11:35:16 +02:00
  • 805c498adf docs: add simple anti-bot examples feat/undetected-browser unclecode 2025-07-17 17:05:35 +08:00
  • 6a728cbe5b feat: add stealth mode and enhance undetected browser support unclecode 2025-07-17 16:59:10 +08:00
  • ccbe3c105c refactor: improve link scoring output format in release notes ntohidi 2025-07-17 09:13:20 +02:00
  • 761c19d54b Merge pull request #1307 from unclecode/fix/json-infinity-serialization Nasrin 2025-07-16 13:34:25 +02:00
  • 14b0ecb137 Merge pull request #1305 from unclecode/fix/release-notes-demo-code Nasrin 2025-07-16 13:33:53 +02:00
  • 65902a4773 feat: Enhance stealth compatibility with new and legacy APIs, add configuration support fix/playwright-stealth AHMET YILMAZ 2025-07-16 17:41:47 +08:00
  • 0eaa9f9895 fix: handle infinity values in JSON serialization for API responses fix/json-infinity-serialization ntohidi 2025-07-15 13:49:07 +02:00
  • 1d1970ae69 docs: Update release notes and docs for v0.7.0 with the correct parameters and explanations fix/release-notes-demo-code ntohidi 2025-07-15 11:32:04 +02:00
  • 205df1e330 docs: Fix virtual scroll configuration ntohidi 2025-07-15 10:29:47 +02:00
  • 2640dc73a5 docs: Enhance session management example for dynamic content crawling with improved JavaScript handling and extraction schema. ref #226 ntohidi 2025-07-15 10:19:29 +02:00
  • 58024755c5 docs: Update adaptive crawling parameters and examples in README and release notes ntohidi 2025-07-15 10:15:05 +02:00
  • 5c13baf574 feat: Add stealth option to BrowserConfig for enhanced browser behavior AHMET YILMAZ 2025-07-15 15:48:23 +08:00
  • d2759824ef fix: Update playwright-stealth to v2.0.0+ compatibility AHMET YILMAZ 2025-07-15 15:09:53 +08:00
  • 5c33cbcca2 feat: add undetected browser support with adapter pattern unclecode 2025-07-14 17:29:50 +08:00
  • 83b323f13a fix VersionManager not using CRAWL4_AI_BASE_DIRECTORY Vladimir Mandic 2025-07-12 17:40:34 -04:00
  • dd5ee752cf docs: Add missing documentation pages to mkdocs.yml UncleCode 2025-07-12 19:58:26 +08:00
  • bde1bba6a2 docs: Add missing documentation pages to mkdocs.yml UncleCode 2025-07-12 19:56:33 +08:00
  • 7b80eb6b99 docs: Add missing documentation pages to mkdocs.yml UncleCode 2025-07-12 19:55:35 +08:00
  • 14f690d751 docs: Update documentation for v0.7.0 release UncleCode 2025-07-12 19:08:17 +08:00
  • 7b9ba3015f Merge branch 'release/v0.7.0' - The Adaptive Intelligence Update v0.7.0 UncleCode 2025-07-12 18:54:20 +08:00
  • 0c8bb742b7 Release v0.7.0-r1: The Adaptive Intelligence Update release/v0.7.0 UncleCode 2025-07-12 18:51:13 +08:00
  • ba2ed53ff1 test(releases): Add test cases for release 0.7.0 UncleCode 2025-07-11 22:27:18 +08:00
  • a93efcb650 Merge PR #1285: 2025 APR, MAY, and JUN bug fixes UncleCode 2025-07-11 21:22:34 +08:00
  • 8794852a26 Merge PR #1285: 2025 APR, MAY, and JUN bug fixes UncleCode 2025-07-11 21:22:03 +08:00
  • fb25a4a769 docs(examples): update crawl4ai showcase script UncleCode 2025-07-11 20:55:37 +08:00
  • afe852935e fix: show /llm API response in playground. ref #1288 next-MAY ntohidi 2025-07-09 16:59:17 +02:00
  • 0ebce590f8 Merge branch '2025-JUN-1' into next-MAY ntohidi 2025-07-09 09:41:03 +02:00
  • 026e96a2df feat: Add social media and community links to README and index documentation ntohidi 2025-07-08 15:48:40 +02:00