Commit Graph

  • 36429a63de fix: Improve comments for article metadata extraction in extract_metadata functions. ref #1105 ntohidi 2025-07-08 12:54:33 +02:00
  • a3d41c7951 fix: Clarify description of 'use_stemming' parameter in markdown generation documentation ref #1086 ntohidi 2025-07-08 12:24:33 +02:00
  • 1d6efb622d Fix proxy authentication ERR_INVALID_AUTH_CREDENTIALS Gary 2025-07-08 17:55:28 +08:00
  • fee4c5c783 fix: Consolidate import statements in local-files.md for clarity ntohidi 2025-07-08 11:46:24 +02:00
  • 0f210f6e02 Merge branch '2025-MAY-2' into next-MAY ntohidi 2025-07-08 11:46:13 +02:00
  • 1a73fb60db feat(crawl4ai): Implement adaptive crawling feature next-JUN UncleCode 2025-07-04 15:16:53 +08:00
  • 74705c1f67 Move release scripts to private .scripts folder UncleCode 2025-07-04 15:02:25 +08:00
  • 048d9b0f5b feat: Implement nightly build script and update version handling UncleCode 2025-07-03 20:53:03 +08:00
  • ee25c771d8 feat(cli): add deep crawling options with configurable strategies and max pages. ref #874 feature/nasrin-cli-deep-crawl ntohidi 2025-07-02 14:07:23 +02:00
  • a353515271 feat: Add virtual scroll support for modern web scraping UncleCode 2025-06-29 20:41:37 +08:00
  • 539a324cf6 refactor(link_extractor): remove link_extractor and rename to link_preview UncleCode 2025-06-27 21:54:22 +08:00
  • 5c9c305dbf feat: Add advanced link head extraction with three-layer scoring system (#1) UncleCode 2025-06-27 20:06:04 +08:00
  • 02f3127ded Track Stargazers (#1249) Aravind 2025-06-25 19:56:19 +05:30
  • e528086341 test(async_assistant): add new tests for extract pipeline UncleCode 2025-06-23 10:44:27 +08:00
  • 414f16e975 fix: Update pdf and screenshot usage documentation. ref #1230 ntohidi 2025-06-18 19:05:44 +02:00
  • b7a6e02236 fix: Update pdf and screenshot usage documentation. ref #1230 ntohidi 2025-06-18 19:04:32 +02:00
  • 9332326457 feat: Add PDF parsing documentation and navigation entry 2025-JUN-1 AHMET YILMAZ 2025-06-16 18:18:32 +08:00
  • 6cd34b3157 Merge branch '2025-MAY-2' of https://github.com/unclecode/crawl4ai into 2025-MAY-2 ntohidi 2025-06-13 11:26:17 +02:00
  • 871d4f1158 fix(extraction_strategy): rename response variable to content for clarity in LLMExtractionStrategy. ref #1146 ntohidi 2025-06-13 11:26:05 +02:00
  • c4d625fb3c chore(profile-test): fix filename typo ( test_crteate_profile.py → test_create_profile.py ) prokopis3 2025-06-12 14:38:32 +03:00
  • ef722766f0 fix(browser_profiler): improve keyboard input handling prokopis3 2025-06-12 14:33:12 +03:00
  • dc85481180 refactor: Update LLM extraction example with the updated structure ntohidi 2025-06-12 12:23:03 +02:00
  • 5d9213a0e9 fix: Update JavaScript execution in AsyncPlaywrightCrawlerStrategy to handle script errors and add basic download test case. ref #1215 ntohidi 2025-06-12 12:21:40 +02:00
  • c0fd36982d Update all documentation to import extraction strategies directly from crawl4ai. UncleCode 2025-06-10 18:08:27 +08:00
  • 4679ee023d fix: Enhance URLPatternFilter to enforce path boundary checks for prefix matching. ref #1003 ntohidi 2025-06-10 11:19:18 +02:00
  • f9b7090084 Merge pull request #1186 from zimmski/fix-typo-provoder Nasrin 2025-06-10 10:26:45 +02:00
  • cab457e9c7 Merge branch 'next' of https://github.com/unclecode/crawl4ai into next UncleCode 2025-06-10 15:54:20 +08:00
  • 2a0c0ed18d chore(deps): add httpx extras (#1195) UncleCode 2025-06-08 10:06:38 +02:00
  • c73a130c50 Set memory_wait_timeout default to 10 minutes (#1193) UncleCode 2025-06-08 07:53:09 +02:00
  • ef6f4329fa Add use_stemming option to BM25ContentFilter (#1192) UncleCode 2025-06-08 06:57:37 +02:00
  • 4eb90b41b6 Refactor Crawl4AI Assistant: Rename Schema Builder to Click2Crawl, update UI elements, and remove deprecated files feature/c4a-script UncleCode 2025-06-10 15:40:26 +08:00
  • 9442597f81 #1127: Improve URL handling and normalization in scraping strategies AHMET YILMAZ 2025-06-10 11:57:06 +08:00
  • 0ac12da9f3 feat: Major Chrome Extension overhaul with Click2Crawl, instant Schema extraction, and modular architecture UncleCode 2025-06-09 23:18:27 +08:00
  • 74b06d4b80 #1167 Add PHP MIME types to ContentTypeFilter for better file handling AHMET YILMAZ 2025-06-05 11:29:35 +08:00
  • 40640badad feat: add Script Builder to Chrome Extension and reorganize LLM context files UncleCode 2025-06-08 22:02:12 +08:00
  • 926592649e Add Crawl4AI Assistant Chrome Extension UncleCode 2025-06-08 18:34:05 +08:00
  • b870bfdb6c chore(deps): add httpx extras (#1195) UncleCode 2025-06-08 10:06:38 +02:00
  • f54db649c5 chore(deps): add httpx extras codex/add-httpx-and-https-http2]-packages UncleCode 2025-06-08 10:06:13 +02:00
  • 6f3a0ea38e Create "Apps" section in documentation and Add interactive c4a-script playground and LLM context builder for Crawl4AI UncleCode 2025-06-08 15:48:17 +08:00
  • 451b0d6c9a Set memory_wait_timeout default to 10 minutes (#1193) UncleCode 2025-06-08 07:53:09 +02:00
  • c8456d8a01 Set memory_wait_timeout default to 10 minutes codex/add-memory_wait_timeout-parameter-to-memoryadaptivedispatche UncleCode 2025-06-08 07:52:21 +02:00
  • 8b215e17af Add use_stemming option to BM25ContentFilter (#1192) UncleCode 2025-06-08 06:57:37 +02:00
  • 5100dd28be Add use_stemming option to BM25ContentFilter codex/add-use_stemming-parameter-to-bm25contentfiler UncleCode 2025-06-08 06:56:33 +02:00
  • b4bb0ccea0 Update simple-crawling.md UncleCode 2025-06-08 11:33:28 +08:00
  • 08a2cdae53 Add C4A-Script support and documentation UncleCode 2025-06-07 23:07:19 +08:00
  • ca03acbc82 Add new commands for the Crawl4ai script transpiler and create an interactive tutorial that lets users go through multiple steps and apply the syntax to automate the page. Fix some issues and add several new commands for setting input values, variables, clearing input fields, and more. UncleCode 2025-06-06 23:03:26 +08:00
  • 3f6f2e998c feat(script): add new scripting capabilities and documentation UncleCode 2025-06-06 17:16:53 +08:00
  • 5ac19a61d7 feat: Implement max_scroll_steps parameter for full page scanning. ref: #1168 ntohidi 2025-06-05 16:40:34 +02:00
  • 022cc2d92a fix, Typo Markus Zimmermann 2025-06-05 15:30:38 +02:00
  • e731596315 docs(tutorial_url_seeder): refine summary and next steps, enhance agentic design patterns section UncleCode 2025-06-05 16:20:58 +08:00
  • 641526af81 docs(tutorial_url_seeder): add advanced agentic patterns and implementation examples UncleCode 2025-06-05 16:07:05 +08:00
  • 82a25c037a feat(async_url_seeder): add smart URL filtering to exclude nonsense URLs UncleCode 2025-06-05 15:46:24 +08:00
  • c6fc5c0518 docs(linkdin, url_seeder): update and reorganize LinkedIn data discovery and URL seeder documentation UncleCode 2025-06-05 15:06:25 +08:00
  • b5c2732f88 Add BBC Sport Research Assistant pipeline example UncleCode 2025-06-04 23:23:21 +08:00
  • 09fd3e152a fix: Import os and adjust file saving path in URL seeder demo UncleCode 2025-06-03 23:34:11 +08:00
  • 3f9424e884 Update CHANGELOG UncleCode 2025-06-03 23:27:31 +08:00
  • 3048cc1ff9 feat: Add AsyncUrlSeeder for intelligent URL discovery and filtering UncleCode 2025-06-03 23:27:12 +08:00
  • fcc2abe4db (fix): Update document about LLM extraction strategy to use LLMConfig. REF #1146 ntohidi 2025-06-03 12:53:59 +02:00
  • cc95d3abd4 Fix raw URL parsing logic to correctly handle "raw://" and "raw:" prefixes. REF #1118 ntohidi 2025-06-03 11:19:08 +02:00
  • 5ce3e682f3 Merge pull request #752 from jl-martins/fix-raw-url-parsing Nasrin 2025-06-03 11:10:29 +02:00
  • 28125c1980 Merge branch 'next' into 2025-MAY-2 ntohidi 2025-06-02 20:26:40 +02:00
  • 773ed7b281 Merge branch '2025-APR-1' into 2025-MAY-2 ntohidi 2025-06-02 20:25:58 +02:00
  • 58c1e17170 Merge branch 'main' into fix-raw-url-parsing João Martins 2025-05-30 13:03:25 +01:00
  • 4bcb7171a3 fix(browser_profiler): cross-platform 'q' to quit prokopis3 2025-05-30 14:43:18 +03:00
  • 2b3b728dcd fix(metadata): improve title extraction with fallbacks for edge cases. REF #995 feature/scraping-strategy ntohidi 2025-05-28 10:17:50 +02:00
  • bfec5156ad Refactor content scraping strategies: comment out WebScrapingStrategy references and update to use LXMLWebScrapingStrategy across multiple files. Bring WebScrapingStrategy methods to LXMLWebScrapingStrategy ntohidi 2025-05-27 17:32:45 +02:00
  • 2b2ef12e25 #1156: Refactor completion function calls to use asynchronous version feature/async-llm-extaction Ahmed-Tawfik94 2025-05-27 15:10:34 +08:00
  • b55e27d2ef fix: changed error variable name in handle_crawl_request, docker api ntohidi 2025-05-26 11:08:23 +02:00
  • d9b3db925a Refactor extraction and completion functions to support asynchronous execution Ahmed-Tawfik94 2025-05-26 16:01:38 +08:00
  • 3b766e1aac Add Google Colab button to LinkedIn Prospect Wizard README UncleCode 2025-05-26 14:35:06 +08:00
  • c3b7b7e918 Add linkedin example ipynb. UncleCode 2025-05-25 17:55:22 +08:00
  • 7d0b447e1c Update setup script to clarify virtual display setup message UncleCode 2025-05-25 16:55:18 +08:00
  • 33b0e222ca Add Colab utilities and rename setup function for clarity UncleCode 2025-05-25 16:50:56 +08:00
  • 1fc45ffac8 Fix temperature typo and enhance LinkedIn extraction with Colab support UncleCode 2025-05-25 16:47:12 +08:00
  • 9c2cc7f73c Fix BM25ContentFilter documentation to use language parameter instead of use_stemming (#1152) devin-ai-integration[bot] 2025-05-25 10:02:13 +08:00
  • c8d28316b9 Fix BM25ContentFilter documentation to use language parameter instead of use_stemming devin/1748137705-fix-bm25contentfilter-docs Devin AI 2025-05-25 01:51:21 +00:00
  • 1c5e76d51a Adjust positioning and set only core component as selected item by default UncleCode 2025-05-24 20:49:44 +08:00
  • 7665a6832f Add LLMContext article and update JS to not show all components. UncleCode 2025-05-24 20:46:24 +08:00
  • a06710ff03 Adding LLMContext generator to website. UncleCode 2025-05-24 20:37:09 +08:00
  • ad078c3f18 fix(pdf): add timeout to PDF downloads to prevent hanging (#1141) unclecode 2025-05-23 16:05:44 +08:00
  • 400a6621ee Add debug folder to gitignore unclecode 2025-05-23 10:43:05 +08:00
  • 3d46d89759 docs: fix https://github.com/unclecode/crawl4ai/issues/1109 Aravind Karnam 2025-05-22 17:21:42 +05:30
  • da8f0dbb93 fix(browser_profiler): change logger print to info for consistent logging in interactive manager ntohidi 2025-05-22 11:25:51 +02:00
  • 33a0c7a17a fix(logger): add RED color to LogColor enum for enhanced logging options ntohidi 2025-05-22 11:17:28 +02:00
  • bf56787874 refactor(browser): remove commented-out code for clarity UncleCode 2025-05-21 20:32:40 +08:00
  • 08ad7ef257 feat(browser): improve browser session management and profile handling UncleCode 2025-05-21 20:23:17 +08:00
  • 984524ca1c fix(auth): add token authorization header in request preparation to ensure authenticated requests are made Ahmed-Tawfik94 2025-05-21 13:26:11 +08:00
  • 1c0ce41328 Fix managed browser page retrieval when no pages (#1137) UncleCode 2025-05-20 21:12:32 +08:00
  • 0e840aea2b Fix managed browser page retrieval when no pages codex/fix-indexerror-in-browser-manager-py-with-use-managed-browse UncleCode 2025-05-20 21:06:12 +08:00
  • cb8d581e47 fix(docs): update CrawlerRunConfig to use CacheMode for bypassing cache. REF: #1125 ntohidi 2025-05-19 18:03:05 +02:00
  • a55c2b3f88 refactor(logging): update extraction logging to use url_status method Ahmed-Tawfik94 2025-05-19 16:32:22 +08:00
  • ce09648af1 Merge pull request #1054 from Sacristaan/feature/readme_example Ahmed Tawfik 2025-05-19 14:20:21 +08:00
  • a97654270b #1086 fix(markdown): update BM25 filter to use language parameter for stemming Ahmed-Tawfik94 2025-05-19 14:11:46 +08:00
  • b4fc60a555 #1103 fix(url): enhance URL normalization to handle invalid schemes and trailing slashes Ahmed-Tawfik94 2025-05-19 13:51:16 +08:00
  • 137ac014fb #1105 :fix(metadata): optimize article metadata extraction using XPath for improved performance Ahmed-Tawfik94 2025-05-19 13:48:02 +08:00
  • faa98eefbc #1105 got fixed (metadata now matches with meta property article:*) Ahmed-Tawfik94 2025-05-19 11:35:13 +08:00
  • 6029097114 feat: add VNC streaming support codex/add-vnc-streaming-endpoint-to-docker-server UncleCode 2025-05-17 19:12:15 +08:00
  • 85ac6fa523 Merge branch 'next' of https://github.com/unclecode/crawl4ai into next UncleCode 2025-05-17 19:04:03 +08:00
  • becc4624bb feat(favicon): add new favicon images for improved branding UncleCode 2025-05-17 19:03:51 +08:00
  • 754ba731fa Fix chunk splitting utilities (#1122) UncleCode 2025-05-17 15:06:53 +08:00