crawl4ai

Author	SHA1	Message	Date
AHMET YILMAZ	00e9904609	feat: Add table extraction strategies and API documentation - Implemented table extraction strategies: default, LLM, financial, and none in utils.py. - Created new API documentation for table extraction endpoints and strategies. - Added integration tests for table extraction functionality covering various strategies and error handling. - Developed quick test script for rapid validation of table extraction features.	2025-10-17 12:30:37 +08:00
AHMET YILMAZ	3877335d89	Profiling/monitoring: Add interactive monitoring dashboard and integration tests for monitoring endpoints - Implemented an interactive monitoring dashboard in `demo_monitoring_dashboard.py` for real-time statistics, profiling session management, and system resource monitoring. - Created a quick test script `test_monitoring_quick.py` to verify the functionality of monitoring endpoints. - Developed comprehensive integration tests in `test_monitoring_endpoints.py` covering health checks, statistics, profiling sessions, and real-time streaming. - Added error handling and user-friendly output for better usability in the dashboard.	2025-10-16 16:48:13 +08:00
AHMET YILMAZ	674d0741da	feat: Add HTTP-only crawling endpoints and related models - Introduced HTTPCrawlRequest and HTTPCrawlRequestWithHooks models for HTTP-only crawling. - Implemented /crawl/http and /crawl/http/stream endpoints for fast, lightweight crawling without browser rendering. - Enhanced server.py to handle HTTP crawl requests and streaming responses. - Updated utils.py to disable memory wait timeout for testing. - Expanded API documentation to include new HTTP crawling features. - Added tests for HTTP crawling endpoints, including error handling and streaming responses.	2025-10-15 17:45:58 +08:00
AHMET YILMAZ	aebf5a3694	Add link analysis tests and integration tests for /links/analyze endpoint - Implemented `test_link_analysis` in `test_docker.py` to validate link analysis functionality. - Created `test_link_analysis.py` with comprehensive tests for link analysis, including basic functionality, configuration options, error handling, performance, and edge cases. - Added integration tests in `test_link_analysis_integration.py` to verify the /links/analyze endpoint, including health checks, authentication, and error handling.	2025-10-14 19:58:25 +08:00
AHMET YILMAZ	8cca9704eb	feat: add comprehensive type definitions and improve test coverage Add new type definitions file with extensive Union type aliases for all core components including AsyncUrlSeeder, SeedingConfig, and various crawler strategies. Enhance test coverage with improved bot detection tests, Docker-based testing, and extended features validation. The changes provide better type safety and more robust testing infrastructure for the crawling framework.	2025-10-13 18:49:01 +08:00
AHMET YILMAZ	201843a204	Add comprehensive tests for anti-bot strategies and extended features - Implemented `test_adapter_verification.py` to verify correct usage of browser adapters. - Created `test_all_features.py` for a comprehensive suite covering URL seeding, adaptive crawling, browser adapters, proxy rotation, and dispatchers. - Developed `test_anti_bot_strategy.py` to validate the functionality of various anti-bot strategies. - Added `test_antibot_simple.py` for simple testing of anti-bot strategies using async web crawling. - Introduced `test_bot_detection.py` to assess adapter performance against bot detection mechanisms. - Compiled `test_final_summary.py` to provide a detailed summary of all tests and their results.	2025-10-07 18:51:13 +08:00
AHMET YILMAZ	f00e8cbf35	Add demo script for proxy rotation and quick test suite - Implemented demo_proxy_rotation.py to showcase various proxy rotation strategies and their integration with the API. - Included multiple demos demonstrating round robin, random, least used, failure-aware, and streaming strategies. - Added error handling and real-world scenario examples for e-commerce price monitoring. - Created quick_proxy_test.py to validate API integration without real proxies, testing parameter acceptance, invalid strategy rejection, and optional parameters. - Ensured both scripts provide informative output and usage instructions.	2025-10-06 13:40:38 +08:00
AHMET YILMAZ	5dc34dd210	feat: enhance crawling functionality with anti-bot strategies and headless mode options (Browser adapters, 12. Undetected/stealth browser)	2025-10-03 18:02:10 +08:00
AHMET YILMAZ	1a8e0236af	feat(adaptive-crawling): implement adaptive crawling endpoints and integrate with server	2025-10-01 15:53:56 +08:00
AHMET YILMAZ	a62cfeebd9	feat(adaptive-crawling): implement adaptive crawling endpoints and job management	2025-09-30 18:17:40 +08:00
AHMET YILMAZ	1ea021b721	feat(api): add seed URL endpoint and related request model	2025-09-30 13:35:08 +08:00
ntohidi	fef715a891	Merge branch 'feature/docker-hooks' into develop	2025-09-25 14:11:46 +08:00
AHMET YILMAZ	a1950afd98	#1505 fix(api): update config handling to only set base config if not provided by user	2025-09-22 17:19:27 +08:00
Nasrin	3899ac3d3b	Merge pull request #1464 from unclecode/fix/proxy_deprecation Fix/proxy deprecation	2025-09-16 15:48:45 +08:00
Nasrin	f8eaf01ed1	Merge pull request #1467 from unclecode/fix/request-crawl-stream Fix: request /crawl with stream: true issue	2025-09-11 17:40:43 +08:00
AHMET YILMAZ	1874a7b8d2	fix: update option labels in request builder for clarity	2025-09-05 17:06:25 +08:00
Nasrin	0482c1eafc	Merge pull request #1469 from unclecode/fix/docker-jwt Fix(auth): Fixed Docker JWT authentication	2025-09-04 15:00:15 +08:00
AHMET YILMAZ	6a3b3e9d38	Commit without API	2025-09-03 17:02:40 +08:00
Nasrin	bc6d8147d2	Merge pull request #1451 from unclecode/fix/remove-python3.9-version Remove python 3.9 from supported versions and require Python >= 3.10	2025-09-02 16:50:40 +08:00
ntohidi	6e728096fa	fix(auth): fixed Docker JWT authentication. ref #1442	2025-09-01 12:48:16 +08:00
Soham Kukreti	70f473b84d	fix: drop Python 3.9 support and require Python >=3.10. The library no longer supports Python 3.9 and so it was important to drop all references to python 3.9. Following changes have been made: - pyproject.toml: set requires-python to ">=3.10"; remove 3.9 classifier - setup.py: set python_requires to ">=3.10"; remove 3.9 classifier - docs: update Python version mentions - deploy/docker/c4ai-doc-context.md: options -> 3.10, 3.11, 3.12, 3.13	2025-08-28 19:31:19 +05:30
AHMET YILMAZ	f7a3366f72	#1375: refactor(proxy) Deprecate 'proxy' parameter in BrowserConfig and enhance proxy string parsing - Updated ProxyConfig.from_string to support multiple proxy formats, including URLs with credentials. - Deprecated the 'proxy' parameter in BrowserConfig, replacing it with 'proxy_config' for better flexibility. - Added warnings for deprecated usage and clarified behavior when both parameters are provided. - Updated documentation and tests to reflect changes in proxy configuration handling.	2025-08-28 17:21:49 +08:00
Soham Kukreti	2ad3fb5fc8	feat(docker): improve docker error handling - Return comprehensive error messages along with status codes for api internal errors. - Fix fit_html property serialization issue in both /crawl and /crawl/stream endpoints - Add sanitization to ensure fit_html is always JSON-serializable (string or None) - Add comprehensive error handling test suite.	2025-08-26 23:18:35 +05:30
ntohidi	159207b86f	feat(docker): Add temperature and base_url parameters for LLM configuration. ref #1035 Implement hierarchical configuration for LLM parameters with support for: - Temperature control (0.0-2.0) to adjust response creativity - Custom base_url for proxy servers and alternative endpoints - 4-tier priority: request params > provider env > global env > defaults Add helper functions in utils.py, update API schemas and handlers, support environment variables (LLM_TEMPERATURE, OPENAI_TEMPERATURE, etc.), and provide comprehensive documentation with examples.	2025-08-26 16:44:07 +08:00
ntohidi	95051020f4	fix(docker): Fix LLM API key handling for multi-provider support Previously, the system incorrectly used OPENAI_API_KEY for all LLM providers due to a hardcoded api_key_env fallback in config.yml. This caused authentication errors when using non-OpenAI providers like Gemini. Changes: - Remove api_key_env from config.yml to let litellm handle provider-specific env vars - Simplify get_llm_api_key() to return None, allowing litellm to auto-detect keys - Update validate_llm_provider() to trust litellm's built-in key detection - Update documentation to reflect the new automatic key handling The fix leverages litellm's existing capability to automatically find the correct environment variable for each provider (OPENAI_API_KEY, GEMINI_API_TOKEN, etc.) without manual configuration. ref #1291	2025-08-21 14:01:04 +08:00
Nasrin	ef174a4c7a	Merge pull request #1104 from emmanuel-ferdman/main fix(docker-api): migrate to modern datetime library API	2025-08-20 10:57:39 +08:00
Soham Kukreti	f30811b524	fix: Check for raw: and raw:// URLs before auto-appending https:// prefix - Add raw HTML URL validation alongside http/https checks - Fix URL preprocessing logic to handle raw: and raw:// prefixes - Update error message and add comprehensive test cases	2025-08-11 22:10:53 +05:30
ntohidi	be63c98db3	feat(docker): add user-provided hooks support to Docker API Implements comprehensive hooks functionality allowing users to provide custom Python functions as strings that execute at specific points in the crawling pipeline. Key Features: - Support for all 8 crawl4ai hook points: • on_browser_created: Initialize browser settings • on_page_context_created: Configure page context • before_goto: Pre-navigation setup • after_goto: Post-navigation processing • on_user_agent_updated: User agent modification handling • on_execution_started: Crawl execution initialization • before_retrieve_html: Pre-extraction processing • before_return_html: Final HTML processing Implementation Details: - Created UserHookManager for validation, compilation, and safe execution - Added IsolatedHookWrapper for error isolation and timeout protection - AST-based validation ensures code structure correctness - Sandboxed execution with restricted builtins for security - Configurable timeout (1-120 seconds) prevents infinite loops - Comprehensive error handling ensures hooks don't crash main process - Execution tracking with detailed statistics and logging API Changes: - Added HookConfig schema with code and timeout fields - Extended CrawlRequest with optional hooks parameter - Added /hooks/info endpoint for hook discovery - Updated /crawl and /crawl/stream endpoints to support hooks Safety Features: - Malformed hooks return clear validation errors - Hook errors are isolated and reported without stopping crawl - Execution statistics track success/failure/timeout rates - All hook results are JSON-serializable Testing: - Comprehensive test suite covering all 8 hooks - Error handling and timeout scenarios validated - Authentication, performance, and content extraction examples - 100% success rate in production testing Documentation: - Added extensive hooks section to docker-deployment.md - Security warnings about user-provided code risks - Real-world examples using httpbin.org, GitHub, BBC - Best practices and troubleshooting guide ref #1377	2025-08-11 13:25:17 +08:00
ntohidi	ff6ea41ac3	feat(docker): add flexible LLM provider configuration - Support LLM_PROVIDER env var to override default provider (openai/gpt-4o-mini) - Add optional 'provider' parameter to API endpoints for per-request overrides - Implement provider validation to ensure API keys exist - Update documentation and examples with new configuration options Closes the need to hardcode providers in config.yml	2025-08-05 14:09:54 +08:00
Emmanuel Ferdman	8e3c411a3e	Merge branch 'main' into main	2025-07-29 14:05:35 +03:00
ntohidi	1b6a31f88f	fix: encode PDF results to base64 in /crawl endpoint. ref #1301	2025-07-23 13:52:18 +02:00
UncleCode	0c8bb742b7	Release v0.7.0-r1: The Adaptive Intelligence Update - Bump version to 0.7.0 - Add release notes and demo files - Update README with v0.7.0 features - Update Docker configurations for v0.7.0-r1 - Move v0.7.0 demo files to releases_review - Fix BM25 scoring bug in URLSeeder Major features: - Adaptive Crawling with pattern learning - Virtual Scroll support for infinite pages - Link Preview with 3-layer scoring - Async URL Seeder for massive discovery - Performance optimizations	2025-07-12 18:51:13 +08:00
ntohidi	afe852935e	fix: show /llm API response in playground. ref #1288	2025-07-09 16:59:17 +02:00
ntohidi	0ebce590f8	Merge branch '2025-JUN-1' into next-MAY	2025-07-09 09:41:03 +02:00
ntohidi	0f210f6e02	Merge branch '2025-MAY-2' into next-MAY	2025-07-08 11:46:13 +02:00
ntohidi	b7a6e02236	fix: Update pdf and screenshot usage documentation. ref #1230	2025-06-18 19:04:32 +02:00
UncleCode	c0fd36982d	Update all documentation to import extraction strategies directly from crawl4ai.	2025-06-10 18:08:27 +08:00
Nasrin	f9b7090084	Merge pull request #1186 from zimmski/fix-typo-provoder fix, Typo	2025-06-10 10:26:45 +02:00
UncleCode	2a0c0ed18d	chore(deps): add httpx extras (#1195)	2025-06-10 15:47:03 +08:00
UncleCode	c73a130c50	Set memory_wait_timeout default to 10 minutes (#1193)	2025-06-10 15:47:03 +08:00
Markus Zimmermann	022cc2d92a	fix, Typo	2025-06-05 15:30:38 +02:00
ntohidi	28125c1980	Merge branch 'next' into 2025-MAY-2	2025-06-02 20:26:40 +02:00
ntohidi	773ed7b281	Merge branch '2025-APR-1' into 2025-MAY-2	2025-06-02 20:25:58 +02:00
ntohidi	b55e27d2ef	fix: changed error variable name in handle_crawl_request, Docker API	2025-05-26 11:08:23 +02:00
UncleCode	1fc45ffac8	Fix temperature typo and enhance LinkedIn extraction with Colab support - Fixed widespread typo: `temprature` → `temperature` across LLMConfig and related files - Enhanced CSS/XPath selector guidance for more reliable LinkedIn data extraction - Added Google Colab display server support for running Crawl4AI in notebook environments - Improved browser debugging with verbose startup args logging - Updated LinkedIn schemas and HTML snippets for better parsing accuracy 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-05-25 16:47:12 +08:00
ntohidi	cb8d581e47	fix(docs): update CrawlerRunConfig to use CacheMode for bypassing cache. REF: #1125	2025-05-19 18:03:05 +02:00
ntohidi	e0fbd2b0a0	fix(schema): update `f` parameter description to use lowercase enum values. REF: #1070 Revised the description for the `f` parameter in the `/mcp/md` tool schema to use lowercase enum values (`raw`, `fit`, `bm25`, `llm`) for consistency with the actual `enum` definition. This change prevents LLM-based clients (e.g., Gemini via LibreChat) from generating uppercase values like `"FIT"`, which caused 422 validation errors due to strict case-sensitive matching.	2025-05-15 10:45:23 +02:00
Emmanuel Ferdman	1e1c887a2f	fix(docker-api): migrate to modern datetime library API Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>	2025-05-13 00:04:58 -07:00
Aravind Karnam	2b17f234f8	docs: update direct passing of content_filter to CrawlerRunConfig and instead pass it via MarkdownGenerator. Ref: #603	2025-05-07 15:20:36 +05:30
UncleCode	94e9959fe0	feat(docker-api): add job-based polling endpoints for crawl and LLM tasks Implements new asynchronous endpoints for handling long-running crawl and LLM tasks: - POST /crawl/job and GET /crawl/job/{task_id} for crawl operations - POST /llm/job and GET /llm/job/{task_id} for LLM operations - Added Redis-based task management with configurable TTL - Moved schema definitions to dedicated schemas.py - Added example polling client demo_docker_polling.py This change allows clients to handle long-running operations asynchronously through a polling pattern rather than holding connections open.	2025-05-01 21:24:52 +08:00


88 Commits