crawl4ai

Author	SHA1	Message	Date
AHMET YILMAZ	201843a204	Add comprehensive tests for anti-bot strategies and extended features - Implemented `test_adapter_verification.py` to verify correct usage of browser adapters. - Created `test_all_features.py` for a comprehensive suite covering URL seeding, adaptive crawling, browser adapters, proxy rotation, and dispatchers. - Developed `test_anti_bot_strategy.py` to validate the functionality of various anti-bot strategies. - Added `test_antibot_simple.py` for simple testing of anti-bot strategies using async web crawling. - Introduced `test_bot_detection.py` to assess adapter performance against bot detection mechanisms. - Compiled `test_final_summary.py` to provide a detailed summary of all tests and their results.	2025-10-07 18:51:13 +08:00
AHMET YILMAZ	f00e8cbf35	Add demo script for proxy rotation and quick test suite - Implemented demo_proxy_rotation.py to showcase various proxy rotation strategies and their integration with the API. - Included multiple demos demonstrating round robin, random, least used, failure-aware, and streaming strategies. - Added error handling and real-world scenario examples for e-commerce price monitoring. - Created quick_proxy_test.py to validate API integration without real proxies, testing parameter acceptance, invalid strategy rejection, and optional parameters. - Ensured both scripts provide informative output and usage instructions.	2025-10-06 13:40:38 +08:00
AHMET YILMAZ	5dc34dd210	feat: enhance crawling functionality with anti-bot strategies and headless mode options (Browser adapters, 12. Undetected/stealth browser)	2025-10-03 18:02:10 +08:00
AHMET YILMAZ	1a8e0236af	feat(adaptive-crawling): implement adaptive crawling endpoints and integrate with server	2025-10-01 15:53:56 +08:00
AHMET YILMAZ	1ea021b721	feat(api): add seed URL endpoint and related request model	2025-09-30 13:35:08 +08:00
ntohidi	fef715a891	Merge branch 'feature/docker-hooks' into develop	2025-09-25 14:11:46 +08:00
ntohidi	159207b86f	feat(docker): Add temperature and base_url parameters for LLM configuration. ref #1035 Implement hierarchical configuration for LLM parameters with support for: - Temperature control (0.0-2.0) to adjust response creativity - Custom base_url for proxy servers and alternative endpoints - 4-tier priority: request params > provider env > global env > defaults Add helper functions in utils.py, update API schemas and handlers, support environment variables (LLM_TEMPERATURE, OPENAI_TEMPERATURE, etc.), and provide comprehensive documentation with examples.	2025-08-26 16:44:07 +08:00
ntohidi	be63c98db3	feat(docker): add user-provided hooks support to Docker API Implements comprehensive hooks functionality allowing users to provide custom Python functions as strings that execute at specific points in the crawling pipeline. Key Features: - Support for all 8 crawl4ai hook points: • on_browser_created: Initialize browser settings • on_page_context_created: Configure page context • before_goto: Pre-navigation setup • after_goto: Post-navigation processing • on_user_agent_updated: User agent modification handling • on_execution_started: Crawl execution initialization • before_retrieve_html: Pre-extraction processing • before_return_html: Final HTML processing Implementation Details: - Created UserHookManager for validation, compilation, and safe execution - Added IsolatedHookWrapper for error isolation and timeout protection - AST-based validation ensures code structure correctness - Sandboxed execution with restricted builtins for security - Configurable timeout (1-120 seconds) prevents infinite loops - Comprehensive error handling ensures hooks don't crash main process - Execution tracking with detailed statistics and logging API Changes: - Added HookConfig schema with code and timeout fields - Extended CrawlRequest with optional hooks parameter - Added /hooks/info endpoint for hook discovery - Updated /crawl and /crawl/stream endpoints to support hooks Safety Features: - Malformed hooks return clear validation errors - Hook errors are isolated and reported without stopping crawl - Execution statistics track success/failure/timeout rates - All hook results are JSON-serializable Testing: - Comprehensive test suite covering all 8 hooks - Error handling and timeout scenarios validated - Authentication, performance, and content extraction examples - 100% success rate in production testing Documentation: - Added extensive hooks section to docker-deployment.md - Security warnings about user-provided code risks - Real-world examples using httpbin.org, GitHub, BBC - Best practices and troubleshooting guide ref #1377	2025-08-11 13:25:17 +08:00
ntohidi	ff6ea41ac3	feat(docker): add flexible LLM provider configuration - Support LLM_PROVIDER env var to override default provider (openai/gpt-4o-mini) - Add optional 'provider' parameter to API endpoints for per-request overrides - Implement provider validation to ensure API keys exist - Update documentation and examples with new configuration options Closes the need to hardcode providers in config.yml	2025-08-05 14:09:54 +08:00
ntohidi	e0fbd2b0a0	fix(schema): update `f` parameter description to use lowercase enum values. REF: #1070 Revised the description for the `f` parameter in the `/mcp/md` tool schema to use lowercase enum values (`raw`, `fit`, `bm25`, `llm`) for consistency with the actual `enum` definition. This change prevents LLM-based clients (e.g., Gemini via LibreChat) from generating uppercase values like `"FIT"`, which caused 422 validation errors due to strict case-sensitive matching.	2025-05-15 10:45:23 +02:00
UncleCode	94e9959fe0	feat(docker-api): add job-based polling endpoints for crawl and LLM tasks Implements new asynchronous endpoints for handling long-running crawl and LLM tasks: - POST /crawl/job and GET /crawl/job/{task_id} for crawl operations - POST /llm/job and GET /llm/job/{task_id} for LLM operations - Added Redis-based task management with configurable TTL - Moved schema definitions to dedicated schemas.py - Added example polling client demo_docker_polling.py This change allows clients to handle long-running operations asynchronously through a polling pattern rather than holding connections open.	2025-05-01 21:24:52 +08:00
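
Commit 94e9959fe0 describes a polling pattern: a client submits a job, then repeatedly asks for its status instead of holding a connection open. The sketch below illustrates only the client-side loop and is not taken from `demo_docker_polling.py`. The function name `poll_job`, the `status` field, and the terminal values `completed` and `failed` are assumptions made for illustration; the actual response shape of `GET /crawl/job/{task_id}` should be checked against the server. The status fetcher is passed in as a callable so the loop can be exercised without a running server.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal states; the real API may use different values.
TERMINAL_STATES = ("completed", "failed")


def poll_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() until it reports a terminal state or time runs out.

    fetch_status is expected to wrap something like a GET on
    /crawl/job/{task_id} and return the decoded JSON body.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status()
        # The "status" key is an assumed field name, not confirmed by the source.
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal state in time")
        sleep(interval)


# Usage with a stand-in fetcher that finishes on the third call.
_fake_responses = iter(
    [
        {"status": "processing"},
        {"status": "processing"},
        {"status": "completed", "result": {"pages": 1}},
    ]
)
final = poll_job(lambda: next(_fake_responses), interval=0.0, sleep=lambda _: None)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP library, so the same function could wrap either the crawl or the LLM job endpoint.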

11 Commits
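
Commit 159207b86f states a 4-tier priority for LLM parameters: request params > provider env > global env > defaults, with temperature limited to 0.0-2.0. A minimal sketch of that lookup order follows. It is not the helper added to `utils.py`: the function name `resolve_temperature`, the rule that the provider prefix is the part before `/` (as in `openai/gpt-4o-mini`), and the fallback default of 0.7 are all assumptions. Only the variable names `LLM_TEMPERATURE` and `OPENAI_TEMPERATURE` come from the commit message.

```python
import os
from typing import Mapping, Optional

# Range stated in the commit message.
MIN_TEMPERATURE = 0.0
MAX_TEMPERATURE = 2.0


def resolve_temperature(
    request_value: Optional[float],
    provider: str,
    env: Optional[Mapping[str, str]] = None,
    default: float = 0.7,  # assumed placeholder, not the project's real default
) -> float:
    """Pick a temperature using: request > provider env > global env > default."""
    env = os.environ if env is None else env
    # Assumption: "openai/gpt-4o-mini" maps to the OPENAI_ prefix.
    prefix = provider.split("/", 1)[0].upper()
    candidates = [
        request_value,                      # tier 1: per-request parameter
        env.get(f"{prefix}_TEMPERATURE"),   # tier 2: provider-specific env var
        env.get("LLM_TEMPERATURE"),         # tier 3: global env var
        default,                            # tier 4: built-in default
    ]
    # Take the first tier that is actually set; 0.0 counts as set.
    raw = next(c for c in candidates if c is not None and c != "")
    value = float(raw)
    if not MIN_TEMPERATURE <= value <= MAX_TEMPERATURE:
        raise ValueError(f"temperature {value} outside 0.0-2.0")
    return value


# Usage: the request value wins over both environment tiers.
example_env = {"OPENAI_TEMPERATURE": "0.9", "LLM_TEMPERATURE": "1.1"}
chosen = resolve_temperature(0.2, "openai/gpt-4o-mini", env=example_env)
```

Passing `env` explicitly makes the resolution order easy to test without touching the real process environment.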