Adds a new wait_for_timeout parameter to CrawlerRunConfig that allows specifying
a separate timeout for the wait_for condition, independent of the page_timeout.
This provides more granular control over waiting behaviors in the crawler.
Also removes unused colorama dependency and updates LinkedIn crawler example.
BREAKING CHANGE: LinkedIn crawler example now uses different wait_for_images timing
Add session_id feature to allow reusing browser pages across multiple crawls.
Add support for view-source: protocol in URL handling.
Fix browser config reference and string formatting issues.
Update examples to demonstrate new session management features.
BREAKING CHANGE: Browser page handling now persists when using session_id
Add new RegexExtractionStrategy for fast, zero-LLM extraction of common data types:
- Built-in patterns for emails, URLs, phones, dates, and more
- Support for custom regex patterns
- LLM-assisted pattern generation utility
- Optimized HTML preprocessing with fit_html field
- Enhanced network response body capture
Breaking changes: None
Implements new asynchronous endpoints for handling long-running crawl and LLM tasks:
- POST /crawl/job and GET /crawl/job/{task_id} for crawl operations
- POST /llm/job and GET /llm/job/{task_id} for LLM operations
- Added Redis-based task management with configurable TTL
- Moved schema definitions to dedicated schemas.py
- Added example polling client demo_docker_polling.py
This change allows clients to handle long-running operations asynchronously through a polling pattern rather than holding connections open.
Add new LinkedIn prospect discovery tool with three main components:
- c4ai_discover.py for company and people scraping
- c4ai_insights.py for org chart and decision maker analysis
- Interactive graph visualization with company/people exploration
Features include:
- Configurable LinkedIn search and scraping
- Org chart generation with decision maker scoring
- Interactive network graph visualization
- Company similarity analysis
- Chat interface for data exploration
Requires: crawl4ai, openai, sentence-transformers, networkx
Enhance browser profile handling with better process cleanup and documentation:
- Add process cleanup for existing Chromium instances on Windows/Unix
- Fix profile creation by passing complete browser config
- Add comprehensive documentation for browser and CLI components
- Add initial profile creation test
- Bump version to 0.6.3
This change improves reliability when managing browser profiles and provides better documentation for developers.
Modify BrowserConfig to respect explicit headless parameter setting instead of forcing True. Update version to 0.6.2 and clean up code formatting in examples.
BREAKING CHANGE: BrowserConfig no longer defaults to headless=True when explicitly set to False
- Add tables field to CrawlResult model while maintaining backward compatibility
- Update async_webcrawler.py to extract tables from media and pass to tables field
- Update crypto_analysis_example.py to use the new tables field
- Add /config/dump examples to demo_docker_api.py
- Bump version to 0.6.1
Improve Ask AI button with better mobile support, animations, and positioning:
- Add button animations and hover effects
- Improve mobile responsiveness
- Add icon to button
- Fix positioning logic for different viewport sizes
- Add keyboard (Escape) support
Add comprehensive v0.6.0 release documentation:
- Create detailed release notes
- Update blog index with latest release
- Document all major features and breaking changes
BREAKING CHANGE: Documentation structure updated with new v0.6.0 section
Implements a responsive hamburger menu for mobile devices with the following changes:
- Add new mobile_menu.js for handling mobile navigation
- Update layout.css with mobile-specific styles and animations
- Enhance README with updated geolocation example
- Register mobile_menu.js in mkdocs.yml
The mobile menu includes:
- Hamburger button animation
- Slide-out sidebar
- Backdrop overlay
- Touch-friendly navigation
- Proper event handling
- Update Docker base image to Python 3.12-slim-bookworm
- Bump version from 0.6.0rc1 to 0.6.0
- Update documentation to reflect release version changes
- Fix license specification in pyproject.toml and setup.py
- Clean up code formatting in demo_docker_api.py
BREAKING CHANGE: Base Python version upgraded from 3.10 to 3.12
Major updates to Docker deployment infrastructure:
- Switch default port to 11235 for all services
- Add MCP (Model Context Protocol) support with WebSocket/SSE endpoints
- Simplify docker-compose.yml with auto-platform detection
- Update documentation with new features and examples
- Consolidate configuration and improve resource management
BREAKING CHANGE: Default port changed from 8020 to 11235. Update your configurations and deployment scripts accordingly.
Add support for controlling browser geolocation, locale and timezone settings:
- New GeolocationConfig class for managing GPS coordinates
- Add locale and timezone_id parameters to CrawlerRunConfig
- Update browser context creation to handle location settings
- Add example script for geolocation usage
- Update documentation with location-based identity features
This enables more precise control over browser identity and location reporting.
Improve the Docker API demo script with better error handling, more detailed output,
and enhanced visualization:
- Add detailed error messages and stack traces for debugging
- Implement better status code handling and display
- Enhance JSON output formatting with monokai theme and word wrap
- Add depth information display for deep crawls
- Improve proxy usage reporting
- Fix port number inconsistency
No breaking changes.
Add a new example script demonstrating Docker API usage with extensive features:
- Basic crawling with single/multi URL support
- Markdown generation with various filters
- Parameter demonstrations (CSS, JS, screenshots, SSL, proxies)
- Extraction strategies using CSS and LLM
- Deep crawling capabilities with streaming
- Integration examples with proxy rotation and SSL certificate fetching
Also includes minor formatting improvements in async_webcrawler.py
Adds a new content_source parameter to MarkdownGenerationStrategy that allows
selecting which HTML content to use for markdown generation:
- cleaned_html (default): uses post-processed HTML
- raw_html: uses original webpage HTML
- fit_html: uses preprocessed HTML for schema extraction
Changes include:
- Added content_source parameter to MarkdownGenerationStrategy
- Updated AsyncWebCrawler to handle HTML source selection
- Added examples and tests for the new feature
- Updated documentation with new parameter details
BREAKING CHANGE: Renamed cleaned_html parameter to input_html in generate_markdown()
method signature to better reflect its generalized purpose
Moved ProxyConfig class from proxy_strategy.py to async_configs.py for better organization.
Improved LLM token handling with new PROVIDER_MODELS_PREFIXES.
Added test cases for deep crawling and proxy rotation.
Removed docker_config from BrowserConfig as it's handled separately.
BREAKING CHANGE: ProxyConfig import path changed from crawl4ai.proxy_strategy to crawl4ai
Add new AI assistant chat interface with features:
- Real-time chat with markdown support
- Chat history management
- Citation tracking
- Selection-to-query functionality
Also adds code copy button to documentation code blocks and adjusts layout/styling.
Breaking changes: None
Add new features to documentation UI:
- Add table of contents with scroll spy functionality
- Add GitHub repository statistics badge
- Implement new centered layout system with fixed sidebar
- Add conditional Playwright installation based on CRAWL4AI_MODE
Breaking changes: None