crawl4ai

Author	SHA1	Message	Date
devin-ai-integration[bot]	9c2cc7f73c	Fix BM25ContentFilter documentation to use language parameter instead of use_stemming (#1152) Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: UncleCode <unclecode@kidocode.com>	2025-05-25 10:02:13 +08:00
UncleCode	1c5e76d51a	Adjust positioning and set only core component as selected item by default	2025-05-24 20:49:44 +08:00
UncleCode	7665a6832f	Add LLMContext article and update JS to not show all components.	2025-05-24 20:46:24 +08:00
UncleCode	a06710ff03	Adding LLMContext generator to website.	2025-05-24 20:37:09 +08:00
Aravind Karnam	3d46d89759	docs: fix https://github.com/unclecode/crawl4ai/issues/1109	2025-05-22 17:21:42 +05:30
ntohidi	cb8d581e47	fix(docs): update CrawlerRunConfig to use CacheMode for bypassing cache. REF: #1125	2025-05-19 18:03:05 +02:00
Ahmed-Tawfik94	a97654270b	#1086 fix(markdown): update BM25 filter to use language parameter for stemming	2025-05-19 14:11:46 +08:00
UncleCode	becc4624bb	feat(favicon): add new favicon images for improved branding	2025-05-17 19:03:51 +08:00
UncleCode	ac9981a1f5	feat(favicon): add favicon image and update mkdocs configuration	2025-05-16 21:59:23 +08:00
UncleCode	83ef15fd47	feat(favicon): add favicon.ico for improved branding	2025-05-16 21:55:07 +08:00
UncleCode	a3cb938675	feat(theme): enable dark color mode in mkdocs configuration	2025-05-16 21:44:56 +08:00
UncleCode	9b60988232	feat(feedback): add feedback modal styles and integrate into mkdocs configuration	2025-05-16 21:25:10 +08:00
UncleCode	baca2df8df	feat(analytics): add Google Tag Manager script and gtag.js for tracking	2025-05-16 20:49:02 +08:00
Aravind Karnam	2b17f234f8	docs: update direct passing of content_filter to CrawlerRunConfig and instead pass it via MarkdownGenerator. Ref: #603	2025-05-07 15:20:36 +05:30
Aravind Karnam	39e3b792a1	Merge branch 'next' into 2025-APR-1	2025-05-07 10:25:25 +05:30
UncleCode	9b5ccac76e	feat(extraction): add RegexExtractionStrategy for pattern-based extraction Add new RegexExtractionStrategy for fast, zero-LLM extraction of common data types: - Built-in patterns for emails, URLs, phones, dates, and more - Support for custom regex patterns - LLM-assisted pattern generation utility - Optimized HTML preprocessing with fit_html field - Enhanced network response body capture Breaking changes: None	2025-05-02 21:15:24 +08:00
Aravind Karnam	094201ab2a	Merge next + resolve conflicts	2025-04-23 19:44:50 +05:30
UncleCode	ad4dfb21e1	Remove "rc1"	2025-04-23 21:00:00 +08:00
UncleCode	7784b2468e	feat(docs): enhance Ask AI button UX and add v0.6.0 release notes Improve Ask AI button with better mobile support, animations, and positioning: - Add button animations and hover effects - Improve mobile responsiveness - Add icon to button - Fix positioning logic for different viewport sizes - Add keyboard (Escape) support Add comprehensive v0.6.0 release documentation: - Create detailed release notes - Update blog index with latest release - Document all major features and breaking changes BREAKING CHANGE: Documentation structure updated with new v0.6.0 section	2025-04-23 20:07:03 +08:00
UncleCode	37fd80e4b9	feat(docs): add mobile-friendly navigation menu Implements a responsive hamburger menu for mobile devices with the following changes: - Add new mobile_menu.js for handling mobile navigation - Update layout.css with mobile-specific styles and animations - Enhance README with updated geolocation example - Register mobile_menu.js in mkdocs.yml The mobile menu includes: - Hamburger button animation - Slide-out sidebar - Backdrop overlay - Touch-friendly navigation - Proper event handling	2025-04-23 19:44:25 +08:00
UncleCode	949a93982e	feat(docs): update documentation and disable Ask AI feature Major documentation updates including: - Add comprehensive code examples page - Add video tutorial to homepage - Update Docker deployment instructions for v0.6.0 - Temporarily disable Ask AI feature - Add table border styling - Update site version to v0.6.x BREAKING CHANGE: Ask AI feature temporarily disabled pending launch	2025-04-23 19:02:39 +08:00
UncleCode	b0aa8bc9f7	Update README	2025-04-22 23:21:42 +08:00
UncleCode	4812f08a73	feat(docker): update Docker deployment for v0.6.0 Major updates to Docker deployment infrastructure: - Switch default port to 11235 for all services - Add MCP (Model Context Protocol) support with WebSocket/SSE endpoints - Simplify docker-compose.yml with auto-platform detection - Update documentation with new features and examples - Consolidate configuration and improve resource management BREAKING CHANGE: Default port changed from 8020 to 11235. Update your configurations and deployment scripts accordingly.	2025-04-22 22:35:25 +08:00
unclecode	f3ebb38edf	Merge PR #899 into next, resolve conflicts in server.py and docs/browser-crawler-config.md	2025-04-22 14:56:47 +08:00
UncleCode	0007aea204	Update changelog	2025-04-21 23:21:49 +08:00
UncleCode	b5c25731e6	feat(browser): add geolocation, locale and timezone support Add support for controlling browser geolocation, locale and timezone settings: - New GeolocationConfig class for managing GPS coordinates - Add locale and timezone_id parameters to CrawlerRunConfig - Update browser context creation to handle location settings - Add example script for geolocation usage - Update documentation with location-based identity features This enables more precise control over browser identity and location reporting.	2025-04-21 23:20:59 +08:00
ntohidi	14a31456ef	fix(docs): update browser-crawler-config example to include LLMContentFilter and DefaultMarkdownGenerator, fix syntax errors	2025-04-21 13:59:49 +02:00
Aravind Karnam	b27bb367e8	merge next. Resolve conflicts. Fix some import errors and error handling in server.py	2025-04-19 20:27:47 +05:30
UncleCode	7db6b468d9	feat(markdown): add content source selection for markdown generation Adds a new content_source parameter to MarkdownGenerationStrategy that allows selecting which HTML content to use for markdown generation: - cleaned_html (default): uses post-processed HTML - raw_html: uses original webpage HTML - fit_html: uses preprocessed HTML for schema extraction Changes include: - Added content_source parameter to MarkdownGenerationStrategy - Updated AsyncWebCrawler to handle HTML source selection - Added examples and tests for the new feature - Updated documentation with new parameter details BREAKING CHANGE: Renamed cleaned_html parameter to input_html in generate_markdown() method signature to better reflect its generalized purpose	2025-04-17 20:13:53 +08:00
Aravind Karnam	eed7f88f29	Merge branch 'next' into 2025-MAR-ALPHA-1	2025-04-17 10:50:02 +05:30
UncleCode	230f22da86	refactor(proxy): move ProxyConfig to async_configs and improve LLM token handling Moved ProxyConfig class from proxy_strategy.py to async_configs.py for better organization. Improved LLM token handling with new PROVIDER_MODELS_PREFIXES. Added test cases for deep crawling and proxy rotation. Removed docker_config from BrowserConfig as it's handled separately. BREAKING CHANGE: ProxyConfig import path changed from crawl4ai.proxy_strategy to crawl4ai	2025-04-15 22:27:18 +08:00
UncleCode	cd7ff6f9c1	feat(docs): add AI assistant interface and code copy button Add new AI assistant chat interface with features: - Real-time chat with markdown support - Chat history management - Citation tracking - Selection-to-query functionality Also adds code copy button to documentation code blocks and adjusts layout/styling. Breaking changes: None	2025-04-14 23:00:47 +08:00
UncleCode	c56974cf59	feat(docs): enhance documentation UI with ToC and GitHub stats Add new features to documentation UI: - Add table of contents with scroll spy functionality - Add GitHub repository statistics badge - Implement new centered layout system with fixed sidebar - Add conditional Playwright installation based on CRAWL4AI_MODE Breaking changes: None	2025-04-14 20:46:32 +08:00
ntohidi	1f3b1251d0	docs(cli): add Crawl4AI CLI installation instructions to the CLI guide	2025-04-14 12:16:31 +02:00
Aravind Karnam	022f5c9e25	Merged next branch	2025-04-12 10:47:02 +05:30
UncleCode	108b2a8bfb	Fixed capturing console messages for case the url is the local file. Update docker configuration (work in progress)	2025-04-10 23:22:38 +08:00
unclecode	66ac07b4f3	feat(crawler): add network request and console message capturing Implement comprehensive network request and console message capturing functionality: - Add capture_network_requests and capture_console_messages config parameters - Add network_requests and console_messages fields to models - Implement Playwright event listeners to capture requests, responses, and console output - Create detailed documentation and examples - Add comprehensive tests This feature enables deep visibility into web page activity for debugging, security analysis, performance profiling, and API discovery in web applications.	2025-04-10 16:03:48 +08:00
UncleCode	a2061bf31e	feat(crawler): add MHTML capture functionality Add ability to capture web pages as MHTML format, which includes all page resources in a single file. This enables complete page archival and offline viewing. - Add capture_mhtml parameter to CrawlerRunConfig - Implement MHTML capture using CDP in AsyncPlaywrightCrawlerStrategy - Add mhtml field to CrawlResult and AsyncCrawlResponse models - Add comprehensive tests for MHTML capture functionality - Update documentation with MHTML capture details - Add exclude_all_images option for better memory management Breaking changes: None	2025-04-09 15:39:04 +08:00
Aravind Karnam	529a79725e	docs: remove hallucinations from docs for CrawlerRunConfig + Add chunking strategy docs in the table	2025-03-18 16:14:00 +05:30
Aravind Karnam	cbb8755972	Merge branch 'next' into 2025-MAR-ALPHA-1	2025-03-13 10:42:22 +05:30
UncleCode	9547bada3a	feat(content): add target_elements parameter for selective content extraction Adds new target_elements parameter to CrawlerRunConfig that allows more flexible content selection than css_selector. This enables focusing markdown generation and data extraction on specific elements while still processing the entire page for links and media. Key changes: - Added target_elements list parameter to CrawlerRunConfig - Modified WebScrapingStrategy and LXMLWebScrapingStrategy to handle target_elements - Updated documentation with examples and comparison between css_selector and target_elements - Fixed table extraction in content_scraping_strategy.py BREAKING CHANGE: Table extraction logic has been modified to better handle thead/tbody structures	2025-03-10 18:54:51 +08:00
UncleCode	9d69fce834	feat(scraping): add smart table extraction and analysis capabilities Add comprehensive table detection and extraction functionality to the web scraping system: - Implement intelligent table detection algorithm with scoring system - Add table extraction with support for headers, rows, captions - Update models to include tables in Media class - Add table_score_threshold configuration option - Add documentation and examples for table extraction - Include crypto analysis example demonstrating table usage This change enables users to extract structured data from HTML tables while intelligently filtering out layout tables.	2025-03-09 21:31:33 +08:00
UncleCode	4aeb7ef9ad	refactor(proxy): consolidate proxy configuration handling Moves ProxyConfig from configs/ directory into proxy_strategy.py to improve code organization and reduce fragmentation. Updates all imports and type hints to reflect the new location. Key changes: - Moved ProxyConfig class from configs/proxy_config.py to proxy_strategy.py - Updated type hints in async_configs.py to support ProxyConfig - Fixed proxy configuration handling in browser_manager.py - Updated documentation and examples to use new import path BREAKING CHANGE: ProxyConfig import path has changed from crawl4ai.configs to crawl4ai.proxy_strategy	2025-03-07 23:14:11 +08:00
UncleCode	a68cbb232b	feat(browser): add standalone CDP browser launch and lxml extraction strategy Add new features to enhance browser automation and HTML extraction: - Add CDP browser launch capability with customizable ports and profiles - Implement JsonLxmlExtractionStrategy for faster HTML parsing - Add CLI command 'crwl cdp' for launching standalone CDP browsers - Support connecting to external CDP browsers via URL - Optimize selector caching and context-sensitive queries BREAKING CHANGE: LLMConfig import path changed from crawl4ai.types to crawl4ai	2025-03-07 20:55:56 +08:00
UncleCode	baee4949d3	refactor(llm): rename LlmConfig to LLMConfig for consistency Rename LlmConfig to LLMConfig across the codebase to follow consistent naming conventions. Update all imports and usages to use the new name. Update documentation and examples to reflect the change. BREAKING CHANGE: LlmConfig has been renamed to LLMConfig. Users need to update their imports and usage.	2025-03-05 14:17:04 +08:00
Aravind Karnam	504207faa6	docs: update text in llm-strategies.md to reflect new changes in LlmConfig	2025-03-03 19:24:44 +05:30
UncleCode	d024749633	refactor(deep-crawl): add max_pages limit and improve crawl control Add max_pages parameter to all deep crawling strategies to limit total pages crawled. Add score_threshold parameter to BFS/DFS strategies for quality control. Remove legacy parameter handling in AsyncWebCrawler. Improve error handling and logging in crawl strategies. BREAKING CHANGE: Removed support for legacy parameters in AsyncWebCrawler.run_many()	2025-03-03 21:51:11 +08:00
Aravind	f14e4a4b67	Merge pull request #776 from jawshoeadan/patch-1 Fix LiteLLM branding and link	2025-03-03 19:01:30 +05:30
Aravind Karnam	1e819cdb26	fixes: https://github.com/unclecode/crawl4ai/issues/774	2025-03-03 11:53:15 +05:30
jawshoeadan	5edfea279d	Fix LiteLLM branding and link	2025-03-02 16:58:00 +01:00


210 Commits