ntohidi
be63c98db3
feat(docker): add user-provided hooks support to Docker API
Implements comprehensive hooks functionality allowing users to provide custom Python
functions as strings that execute at specific points in the crawling pipeline.
Key Features:
- Support for all 8 crawl4ai hook points:
• on_browser_created: Initialize browser settings
• on_page_context_created: Configure page context
• before_goto: Pre-navigation setup
• after_goto: Post-navigation processing
• on_user_agent_updated: User agent modification handling
• on_execution_started: Crawl execution initialization
• before_retrieve_html: Pre-extraction processing
• before_return_html: Final HTML processing
Implementation Details:
- Created UserHookManager for validation, compilation, and safe execution
- Added IsolatedHookWrapper for error isolation and timeout protection
- AST-based validation ensures code structure correctness
- Sandboxed execution with restricted builtins for security
- Configurable timeout (1-120 seconds) prevents infinite loops
- Comprehensive error handling ensures hooks don't crash main process
- Execution tracking with detailed statistics and logging
API Changes:
- Added HookConfig schema with code and timeout fields
- Extended CrawlRequest with optional hooks parameter
- Added /hooks/info endpoint for hook discovery
- Updated /crawl and /crawl/stream endpoints to support hooks
Safety Features:
- Malformed hooks return clear validation errors
- Hook errors are isolated and reported without stopping crawl
- Execution statistics track success/failure/timeout rates
- All hook results are JSON-serializable
Testing:
- Comprehensive test suite covering all 8 hooks
- Error handling and timeout scenarios validated
- Authentication, performance, and content extraction examples
- 100% success rate in production testing
Documentation:
- Added extensive hooks section to docker-deployment.md
- Security warnings about user-provided code risks
- Real-world examples using httpbin.org, GitHub, BBC
- Best practices and troubleshooting guide
ref #1377
2025-08-11 13:25:17 +08:00
..
2025-07-04 15:16:53 +08:00
2025-06-29 20:41:37 +08:00
2025-07-17 16:59:10 +08:00
2025-02-10 16:58:52 +08:00
2025-05-01 21:24:52 +08:00
2025-04-17 20:13:53 +08:00
2025-07-17 16:59:10 +08:00
2025-06-05 16:20:58 +08:00
2025-06-10 18:08:27 +08:00
2025-06-10 18:08:27 +08:00
2025-06-10 18:08:27 +08:00
2025-03-13 22:15:15 +08:00
2025-01-13 19:19:58 +08:00
2025-02-28 19:53:35 +08:00
2025-03-20 12:13:59 +08:00
2024-06-04 22:43:09 +08:00
2025-01-13 19:19:58 +08:00
2025-03-12 19:05:24 +08:00
2025-04-24 18:36:25 +08:00
2025-03-04 20:57:33 +08:00
2025-08-03 16:50:54 +08:00
2025-03-10 18:54:51 +08:00
2025-02-19 17:23:25 +08:00
2025-01-13 19:19:58 +08:00
2025-08-11 13:25:17 +08:00
2025-03-13 22:15:15 +08:00
2025-02-19 17:23:25 +08:00
2025-06-10 18:08:27 +08:00
2025-03-18 17:20:24 +05:30
2025-07-17 16:59:10 +08:00
2025-05-08 17:13:35 +08:00
2025-01-13 19:19:58 +08:00
2025-03-02 21:33:33 +08:00
2025-01-13 19:19:58 +08:00
2025-07-17 11:35:16 +02:00
2025-06-12 12:23:03 +02:00
2025-03-07 20:55:56 +08:00
2025-04-10 23:22:38 +08:00
2025-02-09 18:49:10 +08:00
2025-04-15 22:27:18 +08:00
2025-06-10 18:08:27 +08:00
2025-06-10 18:08:27 +08:00
2025-06-10 18:08:27 +08:00
2025-03-20 12:13:59 +08:00
2025-05-02 21:15:24 +08:00
2025-01-13 19:19:58 +08:00
2025-01-13 19:19:58 +08:00
2024-09-24 20:52:08 +08:00
2025-08-04 19:02:01 +08:00
2025-02-28 19:53:35 +08:00
2025-05-08 17:13:35 +08:00
2025-07-17 17:05:35 +08:00
2025-01-13 19:19:58 +08:00
2025-07-17 16:59:10 +08:00
2025-07-17 16:59:10 +08:00
2025-07-17 16:59:10 +08:00
2024-12-10 17:55:29 +08:00
2025-06-10 18:08:27 +08:00
2024-12-10 20:10:39 +08:00
2025-06-10 18:08:27 +08:00
2025-07-17 16:59:10 +08:00
2025-04-21 23:20:59 +08:00
2025-06-29 20:41:37 +08:00