36429a63de
fix: Improve comments for article metadata extraction in extract_metadata functions. ref #1105
ntohidi
2025-07-08 12:54:33 +02:00
a3d41c7951
fix: Clarify description of 'use_stemming' parameter in markdown generation documentation ref #1086
ntohidi
2025-07-08 12:24:33 +02:00
1d6efb622d
Fix proxy authentication ERR_INVALID_AUTH_CREDENTIALS
Gary
2025-07-08 17:55:28 +08:00
fee4c5c783
fix: Consolidate import statements in local-files.md for clarity
ntohidi
2025-07-08 11:46:24 +02:00
0f210f6e02
Merge branch '2025-MAY-2' into next-MAY
ntohidi
2025-07-08 11:46:13 +02:00
1a73fb60db
feat(crawl4ai): Implement adaptive crawling feature
next-JUN
UncleCode
2025-07-04 15:16:53 +08:00
74705c1f67
Move release scripts to private .scripts folder
UncleCode
2025-07-04 15:02:25 +08:00
048d9b0f5b
feat: Implement nightly build script and update version handling
UncleCode
2025-07-03 20:53:03 +08:00
ee25c771d8
feat(cli): add deep crawling options with configurable strategies and max pages. ref #874
feature/nasrin-cli-deep-crawl
ntohidi
2025-07-02 14:07:23 +02:00
a353515271
feat: Add virtual scroll support for modern web scraping
UncleCode
2025-06-29 20:41:37 +08:00
539a324cf6
refactor(link_extractor): remove link_extractor and rename to link_preview
UncleCode
2025-06-27 21:54:22 +08:00
5c9c305dbf
feat: Add advanced link head extraction with three-layer scoring system (#1 )
UncleCode
2025-06-27 20:06:04 +08:00
02f3127ded
Track Stargazers (#1249)
Aravind
2025-06-25 19:56:19 +05:30
e528086341
test(async_assistant): add new tests for extract pipeline
UncleCode
2025-06-23 10:44:27 +08:00
414f16e975
fix: Update pdf and screenshot usage documentation. ref #1230
ntohidi
2025-06-18 19:05:44 +02:00
b7a6e02236
fix: Update pdf and screenshot usage documentation. ref #1230
ntohidi
2025-06-18 19:04:32 +02:00
9332326457
feat: Add PDF parsing documentation and navigation entry
2025-JUN-1
AHMET YILMAZ
2025-06-16 18:18:32 +08:00
6cd34b3157
Merge branch '2025-MAY-2' of https://github.com/unclecode/crawl4ai into 2025-MAY-2
ntohidi
2025-06-13 11:26:17 +02:00
871d4f1158
fix(extraction_strategy): rename response variable to content for clarity in LLMExtractionStrategy. ref #1146
ntohidi
2025-06-13 11:26:05 +02:00
c4d625fb3c
chore(profile-test): fix filename typo ( test_crteate_profile.py → test_create_profile.py )
prokopis3
2025-06-12 14:38:32 +03:00
ef722766f0
fix(browser_profiler): improve keyboard input handling
prokopis3
2025-06-12 14:33:12 +03:00
dc85481180
refactor: Update LLM extraction example with the updated structure
ntohidi
2025-06-12 12:23:03 +02:00
5d9213a0e9
fix: Update JavaScript execution in AsyncPlaywrightCrawlerStrategy to handle script errors and add basic download test case. ref #1215
ntohidi
2025-06-12 12:21:40 +02:00
c0fd36982d
Update all documentation to import extraction strategies directly from crawl4ai.
UncleCode
2025-06-10 18:08:27 +08:00
4679ee023d
fix: Enhance URLPatternFilter to enforce path boundary checks for prefix matching. ref #1003
ntohidi
2025-06-10 11:19:18 +02:00
f9b7090084
Merge pull request #1186 from zimmski/fix-typo-provoder
Nasrin
2025-06-10 10:26:45 +02:00
cab457e9c7
Merge branch 'next' of https://github.com/unclecode/crawl4ai into next
UncleCode
2025-06-10 15:54:20 +08:00
2a0c0ed18d
chore(deps): add httpx extras (#1195)
UncleCode
2025-06-08 10:06:38 +02:00
c73a130c50
Set memory_wait_timeout default to 10 minutes (#1193)
UncleCode
2025-06-08 07:53:09 +02:00
ef6f4329fa
Add use_stemming option to BM25ContentFilter (#1192)
UncleCode
2025-06-08 06:57:37 +02:00
4eb90b41b6
Refactor Crawl4AI Assistant: Rename Schema Builder to Click2Crawl, update UI elements, and remove deprecated files
feature/c4a-script
UncleCode
2025-06-10 15:40:26 +08:00
9442597f81
#1127: Improve URL handling and normalization in scraping strategies
AHMET YILMAZ
2025-06-10 11:57:06 +08:00
0ac12da9f3
feat: Major Chrome Extension overhaul with Click2Crawl, instant Schema extraction, and modular architecture
UncleCode
2025-06-09 23:18:27 +08:00
74b06d4b80
#1167 Add PHP MIME types to ContentTypeFilter for better file handling
AHMET YILMAZ
2025-06-05 11:29:35 +08:00
40640badad
feat: add Script Builder to Chrome Extension and reorganize LLM context files
UncleCode
2025-06-08 22:02:12 +08:00
926592649e
Add Crawl4AI Assistant Chrome Extension
UncleCode
2025-06-08 18:34:05 +08:00
b870bfdb6c
chore(deps): add httpx extras (#1195)
UncleCode
2025-06-08 10:06:38 +02:00
f54db649c5
chore(deps): add httpx extras
codex/add-httpx-and-https-http2]-packages
UncleCode
2025-06-08 10:06:13 +02:00
6f3a0ea38e
Create "Apps" section in documentation and Add interactive c4a-script playground and LLM context builder for Crawl4AI
UncleCode
2025-06-08 15:48:17 +08:00
451b0d6c9a
Set memory_wait_timeout default to 10 minutes (#1193)
UncleCode
2025-06-08 07:53:09 +02:00
c8456d8a01
Set memory_wait_timeout default to 10 minutes
codex/add-memory_wait_timeout-parameter-to-memoryadaptivedispatche
UncleCode
2025-06-08 07:52:21 +02:00
8b215e17af
Add use_stemming option to BM25ContentFilter (#1192)
UncleCode
2025-06-08 06:57:37 +02:00
5100dd28be
Add use_stemming option to BM25ContentFilter
codex/add-use_stemming-parameter-to-bm25contentfiler
UncleCode
2025-06-08 06:56:33 +02:00
b4bb0ccea0
Update simple-crawling.md
UncleCode
2025-06-08 11:33:28 +08:00
08a2cdae53
Add C4A-Script support and documentation
UncleCode
2025-06-07 23:07:19 +08:00
ca03acbc82
Add some new commands for the Crawl4ai script transpiler and creating an interactive tutorial that allows users to go through multiple steps and apply the syntax to automate the page. Fixed some issues and add several new commands for setting input values, variables, clearing input fields, and more.
UncleCode
2025-06-06 23:03:26 +08:00
3f6f2e998c
feat(script): add new scripting capabilities and documentation
UncleCode
2025-06-06 17:16:53 +08:00
5ac19a61d7
feat: Implement max_scroll_steps parameter for full page scanning. ref: #1168
ntohidi
2025-06-05 16:40:34 +02:00
022cc2d92a
fix, Typo
Markus Zimmermann
2025-06-05 15:30:38 +02:00
e731596315
docs(tutorial_url_seeder): refine summary and next steps, enhance agentic design patterns section
UncleCode
2025-06-05 16:20:58 +08:00
641526af81
docs(tutorial_url_seeder): add advanced agentic patterns and implementation examples
UncleCode
2025-06-05 16:07:05 +08:00
82a25c037a
feat(async_url_seeder): add smart URL filtering to exclude nonsense URLs
UncleCode
2025-06-05 15:46:24 +08:00
c6fc5c0518
docs(linkdin, url_seeder): update and reorganize LinkedIn data discovery and URL seeder documentation
UncleCode
2025-06-05 15:06:25 +08:00
b5c2732f88
Add BBC Sport Research Assistant pipeline example
UncleCode
2025-06-04 23:23:21 +08:00
09fd3e152a
fix: Import os and adjust file saving path in URL seeder demo
UncleCode
2025-06-03 23:34:11 +08:00
3f9424e884
Update CHANGELOG
UncleCode
2025-06-03 23:27:31 +08:00
3048cc1ff9
feat: Add AsyncUrlSeeder for intelligent URL discovery and filtering
UncleCode
2025-06-03 23:27:12 +08:00
fcc2abe4db
(fix): Update document about LLM extraction strategy to use LLMConfig. REF #1146
ntohidi
2025-06-03 12:53:59 +02:00
cc95d3abd4
Fix raw URL parsing logic to correctly handle "raw://" and "raw:" prefixes. REF #1118
ntohidi
2025-06-03 11:19:08 +02:00
5ce3e682f3
Merge pull request #752 from jl-martins/fix-raw-url-parsing
Nasrin
2025-06-03 11:10:29 +02:00
28125c1980
Merge branch 'next' into 2025-MAY-2
ntohidi
2025-06-02 20:26:40 +02:00
773ed7b281
Merge branch '2025-APR-1' into 2025-MAY-2
ntohidi
2025-06-02 20:25:58 +02:00
58c1e17170
Merge branch 'main' into fix-raw-url-parsing
João Martins
2025-05-30 13:03:25 +01:00
4bcb7171a3
fix(browser_profiler): cross-platform 'q' to quit
prokopis3
2025-05-30 14:43:18 +03:00
2b3b728dcd
fix(metadata): improve title extraction with fallbacks for edge cases. REF #995
feature/scraping-strategy
ntohidi
2025-05-28 10:17:50 +02:00
bfec5156ad
Refactor content scraping strategies: comment out WebScrapingStrategy references and update to use LXMLWebScrapingStrategy across multiple files. Bring WebScrapingStrategy methods to LXMLWebScrapingStrategy
ntohidi
2025-05-27 17:32:45 +02:00
2b2ef12e25
#1156: Refactor completion function calls to use asynchronous version
feature/async-llm-extaction
Ahmed-Tawfik94
2025-05-27 15:10:34 +08:00
b55e27d2ef
fix: changed error variable name handle_crawl_request, docker api
ntohidi
2025-05-26 11:08:23 +02:00
d9b3db925a
Refactor extraction and completion functions to support asynchronous execution
Ahmed-Tawfik94
2025-05-26 16:01:38 +08:00
3b766e1aac
Add Google Colab button to LinkedIn Prospect Wizard README
UncleCode
2025-05-26 14:35:06 +08:00
c3b7b7e918
Add linkedin example ipynb.
UncleCode
2025-05-25 17:55:22 +08:00
7d0b447e1c
Update setup script to clarify virtual display setup message
UncleCode
2025-05-25 16:55:18 +08:00
33b0e222ca
Add Colab utilities and rename setup function for clarity
UncleCode
2025-05-25 16:50:56 +08:00
1fc45ffac8
Fix temperature typo and enhance LinkedIn extraction with Colab support
UncleCode
2025-05-25 16:47:12 +08:00
9c2cc7f73c
Fix BM25ContentFilter documentation to use language parameter instead of use_stemming (#1152)
devin-ai-integration[bot]
2025-05-25 10:02:13 +08:00
c8d28316b9
Fix BM25ContentFilter documentation to use language parameter instead of use_stemming
devin/1748137705-fix-bm25contentfilter-docs
Devin AI
2025-05-25 01:51:21 +00:00
1c5e76d51a
Adjust positioning and set only core component as selected item by default
UncleCode
2025-05-24 20:49:44 +08:00
7665a6832f
Add LLMContext article and update JS to not show all components.
UncleCode
2025-05-24 20:46:24 +08:00
a06710ff03
Adding LLMContext generator to website.
UncleCode
2025-05-24 20:37:09 +08:00
ad078c3f18
fix(pdf): add timeout to PDF downloads to prevent hanging (#1141)
unclecode
2025-05-23 16:05:44 +08:00
400a6621ee
Add debug folder to gitignore
unclecode
2025-05-23 10:43:05 +08:00
3d46d89759
docs: fix https://github.com/unclecode/crawl4ai/issues/1109
Aravind Karnam
2025-05-22 17:21:42 +05:30
da8f0dbb93
fix(browser_profiler): change logger print to info for consistent logging in interactive manager
ntohidi
2025-05-22 11:25:51 +02:00
33a0c7a17a
fix(logger): add RED color to LogColor enum for enhanced logging options
ntohidi
2025-05-22 11:17:28 +02:00
bf56787874
refactor(browser): remove commented-out code for clarity
UncleCode
2025-05-21 20:32:40 +08:00
08ad7ef257
feat(browser): improve browser session management and profile handling
UncleCode
2025-05-21 20:23:17 +08:00
984524ca1c
fix(auth): add token authorization header in request preparation to ensure authenticated requests are made
Ahmed-Tawfik94
2025-05-21 13:26:11 +08:00
1c0ce41328
Fix managed browser page retrieval when no pages (#1137)
UncleCode
2025-05-20 21:12:32 +08:00
0e840aea2b
Fix managed browser page retrieval when no pages
codex/fix-indexerror-in-browser-manager-py-with-use-managed-browse
UncleCode
2025-05-20 21:06:12 +08:00
cb8d581e47
fix(docs): update CrawlerRunConfig to use CacheMode for bypassing cache. REF: #1125
ntohidi
2025-05-19 18:03:05 +02:00
a55c2b3f88
refactor(logging): update extraction logging to use url_status method
Ahmed-Tawfik94
2025-05-19 16:32:22 +08:00
ce09648af1
Merge pull request #1054 from Sacristaan/feature/readme_example
Ahmed Tawfik
2025-05-19 14:20:21 +08:00
a97654270b
#1086 fix(markdown): update BM25 filter to use language parameter for stemming
Ahmed-Tawfik94
2025-05-19 14:11:46 +08:00
b4fc60a555
#1103 fix(url): enhance URL normalization to handle invalid schemes and trailing slashes
Ahmed-Tawfik94
2025-05-19 13:51:16 +08:00
137ac014fb
#1105: fix(metadata): optimize article metadata extraction using XPath for improved performance
Ahmed-Tawfik94
2025-05-19 13:48:02 +08:00
faa98eefbc
#1105 got fixed (metadata now matches with meta property article:*)
Ahmed-Tawfik94
2025-05-19 11:35:13 +08:00
6029097114
feat: add VNC streaming support
codex/add-vnc-streaming-endpoint-to-docker-server
UncleCode
2025-05-17 19:12:15 +08:00
85ac6fa523
Merge branch 'next' of https://github.com/unclecode/crawl4ai into next
UncleCode
2025-05-17 19:04:03 +08:00
becc4624bb
feat(favicon): add new favicon images for improved branding
UncleCode
2025-05-17 19:03:51 +08:00
754ba731fa
Fix chunk splitting utilities (#1122)
UncleCode
2025-05-17 15:06:53 +08:00