crawl4ai

Author	SHA1	Message	Date
Aravind Karnam	aaf05910eb	fix: removed unnecessary imports and installs	2025-05-06 15:53:55 +05:30
Aravind Karnam	a0555d5fa6	merge:from next branch	2025-05-06 15:16:47 +05:30
Aravind Karnam	38ebcbb304	fix: provide support for local llm by adding it to the arguments	2025-05-05 10:34:38 +05:30
UncleCode	9b5ccac76e	feat(extraction): add RegexExtractionStrategy for pattern-based extraction Add new RegexExtractionStrategy for fast, zero-LLM extraction of common data types: - Built-in patterns for emails, URLs, phones, dates, and more - Support for custom regex patterns - LLM-assisted pattern generation utility - Optimized HTML preprocessing with fit_html field - Enhanced network response body capture Breaking changes: None	2025-05-02 21:15:24 +08:00
Aravind Karnam	87d4b0fff4	format bash scripts properly so copy & paste may work without issues	2025-05-02 17:21:09 +05:30
Aravind Karnam	bd5a9ac632	updated readme with arguments for litellm	2025-05-02 17:04:42 +05:30
Aravind Karnam	6650b2f34a	fix: replace openAI with litellm to support multiple llm providers	2025-05-02 16:51:15 +05:30
Aravind Karnam	5cc58f9bb3	fix: 1. duplicate verbose flag 2. inconsistency in argument name --profile-name 3. duplicate initialisation of env_defaults	2025-05-02 16:40:58 +05:30
Aravind Karnam	baf7f6a6f5	fix: typo in readme	2025-05-02 16:33:11 +05:30
UncleCode	94e9959fe0	feat(docker-api): add job-based polling endpoints for crawl and LLM tasks Implements new asynchronous endpoints for handling long-running crawl and LLM tasks: - POST /crawl/job and GET /crawl/job/{task_id} for crawl operations - POST /llm/job and GET /llm/job/{task_id} for LLM operations - Added Redis-based task management with configurable TTL - Moved schema definitions to dedicated schemas.py - Added example polling client demo_docker_polling.py This change allows clients to handle long-running operations asynchronously through a polling pattern rather than holding connections open.	2025-05-01 21:24:52 +08:00
Aravind Karnam	7c2fd5202e	fix: incorrect params and commands in linkedin app readme	2025-05-01 18:27:03 +05:30
UncleCode	50f0b83fcd	feat(linkedin): add prospect-wizard app with scraping and visualization Add new LinkedIn prospect discovery tool with three main components: - c4ai_discover.py for company and people scraping - c4ai_insights.py for org chart and decision maker analysis - Interactive graph visualization with company/people exploration Features include: - Configurable LinkedIn search and scraping - Org chart generation with decision maker scoring - Interactive network graph visualization - Company similarity analysis - Chat interface for data exploration Requires: crawl4ai, openai, sentence-transformers, networkx	2025-04-30 19:38:25 +08:00
UncleCode	9499164d3c	feat(browser): improve browser profile management and cleanup Enhance browser profile handling with better process cleanup and documentation: - Add process cleanup for existing Chromium instances on Windows/Unix - Fix profile creation by passing complete browser config - Add comprehensive documentation for browser and CLI components - Add initial profile creation test - Bump version to 0.6.3 This change improves reliability when managing browser profiles and provides better documentation for developers.	2025-04-29 23:04:32 +08:00
UncleCode	2140d9aca4	fix(browser): correct headless mode default behavior Modify BrowserConfig to respect explicit headless parameter setting instead of forcing True. Update version to 0.6.2 and clean up code formatting in examples. BREAKING CHANGE: BrowserConfig no longer defaults to headless=True when explicitly set to False	2025-04-26 21:09:50 +08:00
UncleCode	ccec40ed17	feat(models): add dedicated tables field to CrawlResult - Add tables field to CrawlResult model while maintaining backward compatibility - Update async_webcrawler.py to extract tables from media and pass to tables field - Update crypto_analysis_example.py to use the new tables field - Add /config/dump examples to demo_docker_api.py - Bump version to 0.6.1	2025-04-24 18:36:25 +08:00
UncleCode	ad4dfb21e1	Remove "rc1"	2025-04-23 21:00:00 +08:00
UncleCode	7784b2468e	feat(docs): enhance Ask AI button UX and add v0.6.0 release notes Improve Ask AI button with better mobile support, animations, and positioning: - Add button animations and hover effects - Improve mobile responsiveness - Add icon to button - Fix positioning logic for different viewport sizes - Add keyboard (Escape) support Add comprehensive v0.6.0 release documentation: - Create detailed release notes - Update blog index with latest release - Document all major features and breaking changes BREAKING CHANGE: Documentation structure updated with new v0.6.0 section	2025-04-23 20:07:03 +08:00
UncleCode	146f9d415f	Update README	2025-04-23 19:50:33 +08:00
UncleCode	37fd80e4b9	feat(docs): add mobile-friendly navigation menu Implements a responsive hamburger menu for mobile devices with the following changes: - Add new mobile_menu.js for handling mobile navigation - Update layout.css with mobile-specific styles and animations - Enhance README with updated geolocation example - Register mobile_menu.js in mkdocs.yml The mobile menu includes: - Hamburger button animation - Slide-out sidebar - Backdrop overlay - Touch-friendly navigation - Proper event handling	2025-04-23 19:44:25 +08:00
UncleCode	949a93982e	feat(docs): update documentation and disable Ask AI feature Major documentation updates including: - Add comprehensive code examples page - Add video tutorial to homepage - Update Docker deployment instructions for v0.6.0 - Temporarily disable Ask AI feature - Add table border styling - Update site version to v0.6.x BREAKING CHANGE: Ask AI feature temporarily disabled pending launch	2025-04-23 19:02:39 +08:00
UncleCode	c4f5651199	chore(deps): upgrade to Python 3.12 and prepare for 0.6.0 release - Update Docker base image to Python 3.12-slim-bookworm - Bump version from 0.6.0rc1 to 0.6.0 - Update documentation to reflect release version changes - Fix license specification in pyproject.toml and setup.py - Clean up code formatting in demo_docker_api.py BREAKING CHANGE: Base Python version upgraded from 3.10 to 3.12	2025-04-23 16:35:15 +08:00
UncleCode	b0aa8bc9f7	Update README	2025-04-22 23:21:42 +08:00
UncleCode	4812f08a73	feat(docker): update Docker deployment for v0.6.0 Major updates to Docker deployment infrastructure: - Switch default port to 11235 for all services - Add MCP (Model Context Protocol) support with WebSocket/SSE endpoints - Simplify docker-compose.yml with auto-platform detection - Update documentation with new features and examples - Consolidate configuration and improve resource management BREAKING CHANGE: Default port changed from 8020 to 11235. Update your configurations and deployment scripts accordingly.	2025-04-22 22:35:25 +08:00
unclecode	f3ebb38edf	Merge PR #899 into next, resolve conflicts in server.py and docs/browser-crawler-config.md	2025-04-22 14:56:47 +08:00
UncleCode	0007aea204	Update changelog	2025-04-21 23:21:49 +08:00
UncleCode	b5c25731e6	feat(browser): add geolocation, locale and timezone support Add support for controlling browser geolocation, locale and timezone settings: - New GeolocationConfig class for managing GPS coordinates - Add locale and timezone_id parameters to CrawlerRunConfig - Update browser context creation to handle location settings - Add example script for geolocation usage - Update documentation with location-based identity features This enables more precise control over browser identity and location reporting.	2025-04-21 23:20:59 +08:00
Aravind Karnam	b27bb367e8	merge next. Resolve conflicts. Fix some import errors and error handling in server.py	2025-04-19 20:27:47 +05:30
UncleCode	3bf78ff47a	refactor(docker-demo): enhance error handling and output formatting Improve the Docker API demo script with better error handling, more detailed output, and enhanced visualization: - Add detailed error messages and stack traces for debugging - Implement better status code handling and display - Enhance JSON output formatting with monokai theme and word wrap - Add depth information display for deep crawls - Improve proxy usage reporting - Fix port number inconsistency No breaking changes.	2025-04-17 22:32:58 +08:00
UncleCode	fd899f66aa	Merge branch 'next-fix-markdown-source' into next	2025-04-17 20:16:15 +08:00
UncleCode	30ec4f571f	feat(docs): add comprehensive Docker API demo script Add a new example script demonstrating Docker API usage with extensive features: - Basic crawling with single/multi URL support - Markdown generation with various filters - Parameter demonstrations (CSS, JS, screenshots, SSL, proxies) - Extraction strategies using CSS and LLM - Deep crawling capabilities with streaming - Integration examples with proxy rotation and SSL certificate fetching Also includes minor formatting improvements in async_webcrawler.py	2025-04-17 20:16:11 +08:00
UncleCode	7db6b468d9	feat(markdown): add content source selection for markdown generation Adds a new content_source parameter to MarkdownGenerationStrategy that allows selecting which HTML content to use for markdown generation: - cleaned_html (default): uses post-processed HTML - raw_html: uses original webpage HTML - fit_html: uses preprocessed HTML for schema extraction Changes include: - Added content_source parameter to MarkdownGenerationStrategy - Updated AsyncWebCrawler to handle HTML source selection - Added examples and tests for the new feature - Updated documentation with new parameter details BREAKING CHANGE: Renamed cleaned_html parameter to input_html in generate_markdown() method signature to better reflect its generalized purpose	2025-04-17 20:13:53 +08:00
Aravind Karnam	eed7f88f29	Merge branch 'next' into 2025-MAR-ALPHA-1	2025-04-17 10:50:02 +05:30
UncleCode	230f22da86	refactor(proxy): move ProxyConfig to async_configs and improve LLM token handling Moved ProxyConfig class from proxy_strategy.py to async_configs.py for better organization. Improved LLM token handling with new PROVIDER_MODELS_PREFIXES. Added test cases for deep crawling and proxy rotation. Removed docker_config from BrowserConfig as it's handled separately. BREAKING CHANGE: ProxyConfig import path changed from crawl4ai.proxy_strategy to crawl4ai	2025-04-15 22:27:18 +08:00
UncleCode	cd7ff6f9c1	feat(docs): add AI assistant interface and code copy button Add new AI assistant chat interface with features: - Real-time chat with markdown support - Chat history management - Citation tracking - Selection-to-query functionality Also adds code copy button to documentation code blocks and adjusts layout/styling. Breaking changes: None	2025-04-14 23:00:47 +08:00
UncleCode	c56974cf59	feat(docs): enhance documentation UI with ToC and GitHub stats Add new features to documentation UI: - Add table of contents with scroll spy functionality - Add GitHub repository statistics badge - Implement new centered layout system with fixed sidebar - Add conditional Playwright installation based on CRAWL4AI_MODE Breaking changes: None	2025-04-14 20:46:32 +08:00
Aravind Karnam	022f5c9e25	Merged next branch	2025-04-12 10:47:02 +05:30
UncleCode	108b2a8bfb	Fixed capturing console messages for case the url is the local file. Update docker configuration (work in progress)	2025-04-10 23:22:38 +08:00
unclecode	66ac07b4f3	feat(crawler): add network request and console message capturing Implement comprehensive network request and console message capturing functionality: - Add capture_network_requests and capture_console_messages config parameters - Add network_requests and console_messages fields to models - Implement Playwright event listeners to capture requests, responses, and console output - Create detailed documentation and examples - Add comprehensive tests This feature enables deep visibility into web page activity for debugging, security analysis, performance profiling, and API discovery in web applications.	2025-04-10 16:03:48 +08:00
UncleCode	a2061bf31e	feat(crawler): add MHTML capture functionality Add ability to capture web pages as MHTML format, which includes all page resources in a single file. This enables complete page archival and offline viewing. - Add capture_mhtml parameter to CrawlerRunConfig - Implement MHTML capture using CDP in AsyncPlaywrightCrawlerStrategy - Add mhtml field to CrawlResult and AsyncCrawlResponse models - Add comprehensive tests for MHTML capture functionality - Update documentation with MHTML capture details - Add exclude_all_images option for better memory management Breaking changes: None	2025-04-09 15:39:04 +08:00
UncleCode	9038e9acbd	Merge branch 'main' into next	2025-04-08 17:43:42 +08:00
UncleCode	e1d9e2489c	refactor(docs): update import statement in quickstart.py for improved clarity	2025-04-05 23:12:06 +08:00
UncleCode	b1693b1c21	Remove old quickstart files	2025-04-05 23:10:25 +08:00
UncleCode	49d904ca0a	refactor(docs): enhance quickstart_examples.py with improved configuration and file handling	2025-04-05 22:57:45 +08:00
UncleCode	ca9351252a	refactor(docs): update import paths and clean up example code in quickstart_examples.py	2025-04-05 22:55:56 +08:00
UncleCode	935d9d39f8	Add quickstart example set	2025-04-05 21:37:25 +08:00
Aravind Karnam	9e16a4bb26	Merge next and resolve conflicts	2025-04-02 12:18:23 +05:30
UncleCode	c635f6b9a2	refactor(browser): reorganize browser strategies and improve Docker implementation Reorganize browser strategy code into separate modules for better maintainability and separation of concerns. Improve Docker implementation with: - Add Alpine and Debian-based Dockerfiles for better container options - Enhance Docker registry to share configuration with BuiltinBrowserStrategy - Add CPU and memory limits to container configuration - Improve error handling and logging - Update documentation and examples BREAKING CHANGE: DockerConfig, DockerRegistry, and DockerUtils have been moved to new locations and their APIs have been updated.	2025-03-27 21:35:13 +08:00
Aravind Karnam	efa73257c5	Merge branch 'next' into 2025-MAR-ALPHA-1	2025-03-24 21:57:29 +05:30
UncleCode	4ab0893ffb	feat(browser): implement modular browser management system Adds a new browser management system with strategy pattern implementation: - Introduces BrowserManager class with strategy pattern support - Adds PlaywrightBrowserStrategy, CDPBrowserStrategy, and BuiltinBrowserStrategy - Implements BrowserProfileManager for profile management - Adds PagePoolConfig for browser page pooling - Includes comprehensive test suite for all browser strategies BREAKING CHANGE: Browser management has been moved to browser/ module. Direct usage of browser_manager.py and browser_profiler.py is deprecated.	2025-03-21 22:50:00 +08:00
Aravind Karnam	8cecbec7a7	Merge branch 'next' into 2025-MAR-ALPHA-1	2025-03-20 17:07:53 +05:30
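
Note on 94e9959fe0 (job-based polling): the commit describes submitting work with POST /crawl/job and then checking GET /crawl/job/{task_id} instead of holding a connection open. The sketch below illustrates that client-side pattern only. It is not taken from demo_docker_polling.py; the helper names (`poll_job`, `make_fake_server`), the `status` field, and the values `processing`, `completed`, and `failed` are assumptions made for illustration, and an in-memory fake stands in for the server so the example runs without a network.

```python
import time
from typing import Any, Callable, Dict


def poll_job(
    submit: Callable[[], str],
    fetch_status: Callable[[str], Dict[str, Any]],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a job, then poll until it reaches a terminal state or times out.

    `submit` stands in for POST /crawl/job and must return a task id.
    `fetch_status` stands in for GET /crawl/job/{task_id}.
    The "status" key and its values are assumed, not confirmed by the source.
    """
    task_id = submit()
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(task_id)
        if status.get("status") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        sleep(interval)


def make_fake_server(polls_until_done: int = 3):
    """Build an in-memory stand-in for the two endpoints (hypothetical)."""
    state = {"polls": 0}

    def submit() -> str:
        return "task-123"

    def fetch_status(task_id: str) -> Dict[str, Any]:
        state["polls"] += 1
        if state["polls"] >= polls_until_done:
            return {"task_id": task_id, "status": "completed"}
        return {"task_id": task_id, "status": "processing"}

    return submit, fetch_status
```

Against a real deployment, `submit` and `fetch_status` would wrap HTTP calls to the server (4812f08a73 lists 11235 as the default port); the actual request and response shapes should be checked against schemas.py in the repository.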


252 Commits
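
Note on 9b5ccac76e (pattern-based extraction): the commit describes zero-LLM extraction using built-in patterns for common data types plus custom regex patterns. The sketch below shows the general idea with Python's standard `re` module. It is not the RegexExtractionStrategy implementation or its API; the `PATTERNS` table, the `extract` function, and the deliberately simple regular expressions are assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional, Pattern

# Simplified illustrative patterns; real-world email and URL matching
# usually needs stricter expressions than these.
PATTERNS: Dict[str, Pattern[str]] = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "url": re.compile(r"https?://[^\s<>\"']+"),
}


def extract(
    text: str,
    custom: Optional[Dict[str, Pattern[str]]] = None,
) -> List[Dict[str, str]]:
    """Return every match as a {"label", "value"} dict.

    Built-in patterns run first, followed by any caller-supplied ones.
    """
    patterns = dict(PATTERNS)
    if custom:
        patterns.update(custom)
    results: List[Dict[str, str]] = []
    for label, pattern in patterns.items():
        for match in pattern.finditer(text):
            results.append({"label": label, "value": match.group(0)})
    return results
```

A custom pattern is passed as an extra dictionary entry, for example `extract(text, custom={"price": re.compile(r"\$\d+(?:\.\d{2})?")})`.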