- Implemented `test_adapter_verification.py` to verify correct usage of browser adapters.
- Created `test_all_features.py` for a comprehensive suite covering URL seeding, adaptive crawling, browser adapters, proxy rotation, and dispatchers.
- Developed `test_anti_bot_strategy.py` to validate the functionality of various anti-bot strategies.
- Added `test_antibot_simple.py` for simple testing of anti-bot strategies using async web crawling.
- Introduced `test_bot_detection.py` to assess adapter performance against bot detection mechanisms.
- Compiled `test_final_summary.py` to provide a detailed summary of all tests and their results.
- Implemented demo_proxy_rotation.py to showcase various proxy rotation strategies and their integration with the API.
- Included multiple demos demonstrating round robin, random, least used, failure-aware, and streaming strategies.
- Added error handling and real-world scenario examples for e-commerce price monitoring.
- Created quick_proxy_test.py to validate API integration without real proxies, testing parameter acceptance, invalid strategy rejection, and optional parameters.
- Ensured both scripts provide informative output and usage instructions.
The library no longer supports Python 3.9 and so it was important to drop all references to python 3.9.
Following changes have been made:
- pyproject.toml: set requires-python to ">=3.10"; remove 3.9 classifier
- setup.py: set python_requires to ">=3.10"; remove 3.9 classifier
- docs: update Python version mentions
- deploy/docker/c4ai-doc-context.md: options -> 3.10, 3.11, 3.12, 3.13
- Updated ProxyConfig.from_string to support multiple proxy formats, including URLs with credentials.
- Deprecated the 'proxy' parameter in BrowserConfig, replacing it with 'proxy_config' for better flexibility.
- Added warnings for deprecated usage and clarified behavior when both parameters are provided.
- Updated documentation and tests to reflect changes in proxy configuration handling.
- Return comprehensive error messages along with status codes for api internal errors.
- Fix fit_html property serialization issue in both /crawl and /crawl/stream endpoints
- Add sanitization to ensure fit_html is always JSON-serializable (string or None)
- Add comprehensive error handling test suite.
Implement hierarchical configuration for LLM parameters with support for:
- Temperature control (0.0-2.0) to adjust response creativity
- Custom base_url for proxy servers and alternative endpoints
- 4-tier priority: request params > provider env > global env > defaults
Add helper functions in utils.py, update API schemas and handlers,
support environment variables (LLM_TEMPERATURE, OPENAI_TEMPERATURE, etc.),
and provide comprehensive documentation with examples.
Previously, the system incorrectly used OPENAI_API_KEY for all LLM providers
due to a hardcoded api_key_env fallback in config.yml. This caused authentication
errors when using non-OpenAI providers like Gemini.
Changes:
- Remove api_key_env from config.yml to let litellm handle provider-specific env vars
- Simplify get_llm_api_key() to return None, allowing litellm to auto-detect keys
- Update validate_llm_provider() to trust litellm's built-in key detection
- Update documentation to reflect the new automatic key handling
The fix leverages litellm's existing capability to automatically find the correct
environment variable for each provider (OPENAI_API_KEY, GEMINI_API_TOKEN, etc.)
without manual configuration.
ref #1291
- Add raw HTML URL validation alongside http/https checks
- Fix URL preprocessing logic to handle raw: and raw:// prefixes
- Update error message and add comprehensive test cases
- Support LLM_PROVIDER env var to override default provider (openai/gpt-4o-mini)
- Add optional 'provider' parameter to API endpoints for per-request overrides
- Implement provider validation to ensure API keys exist
- Update documentation and examples with new configuration options
Closes the need to hardcode providers in config.yml
- Bump version to 0.7.0
- Add release notes and demo files
- Update README with v0.7.0 features
- Update Docker configurations for v0.7.0-r1
- Move v0.7.0 demo files to releases_review
- Fix BM25 scoring bug in URLSeeder
Major features:
- Adaptive Crawling with pattern learning
- Virtual Scroll support for infinite pages
- Link Preview with 3-layer scoring
- Async URL Seeder for massive discovery
- Performance optimizations
- Fixed widespread typo: `temprature` → `temperature` across LLMConfig and related files
- Enhanced CSS/XPath selector guidance for more reliable LinkedIn data extraction
- Added Google Colab display server support for running Crawl4AI in notebook environments
- Improved browser debugging with verbose startup args logging
- Updated LinkedIn schemas and HTML snippets for better parsing accuracy
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Revised the description for the `f` parameter in the `/mcp/md` tool schema to use lowercase enum values
(`raw`, `fit`, `bm25`, `llm`) for consistency with the actual `enum` definition. This change prevents
LLM-based clients (e.g., Gemini via LibreChat) from generating uppercase values like `"FIT"`, which
caused 422 validation errors due to strict case-sensitive matching.
Implements new asynchronous endpoints for handling long-running crawl and LLM tasks:
- POST /crawl/job and GET /crawl/job/{task_id} for crawl operations
- POST /llm/job and GET /llm/job/{task_id} for LLM operations
- Added Redis-based task management with configurable TTL
- Moved schema definitions to dedicated schemas.py
- Added example polling client demo_docker_polling.py
This change allows clients to handle long-running operations asynchronously through a polling pattern rather than holding connections open.
- Add tables field to CrawlResult model while maintaining backward compatibility
- Update async_webcrawler.py to extract tables from media and pass to tables field
- Update crypto_analysis_example.py to use the new tables field
- Add /config/dump examples to demo_docker_api.py
- Bump version to 0.6.1
- Update Docker base image to Python 3.12-slim-bookworm
- Bump version from 0.6.0rc1 to 0.6.0
- Update documentation to reflect release version changes
- Fix license specification in pyproject.toml and setup.py
- Clean up code formatting in demo_docker_api.py
BREAKING CHANGE: Base Python version upgraded from 3.10 to 3.12
Major updates to Docker deployment infrastructure:
- Switch default port to 11235 for all services
- Add MCP (Model Context Protocol) support with WebSocket/SSE endpoints
- Simplify docker-compose.yml with auto-platform detection
- Update documentation with new features and examples
- Consolidate configuration and improve resource management
BREAKING CHANGE: Default port changed from 8020 to 11235. Update your configurations and deployment scripts accordingly.