ntohidi
be63c98db3
feat(docker): add user-provided hooks support to Docker API
Implements comprehensive hooks functionality allowing users to provide custom Python
functions as strings that execute at specific points in the crawling pipeline.
Key Features:
- Support for all 8 crawl4ai hook points:
• on_browser_created: Initialize browser settings
• on_page_context_created: Configure page context
• before_goto: Pre-navigation setup
• after_goto: Post-navigation processing
• on_user_agent_updated: User agent modification handling
• on_execution_started: Crawl execution initialization
• before_retrieve_html: Pre-extraction processing
• before_return_html: Final HTML processing
Implementation Details:
- Created UserHookManager for validation, compilation, and safe execution
- Added IsolatedHookWrapper for error isolation and timeout protection
- AST-based validation ensures code structure correctness
- Sandboxed execution with restricted builtins for security
- Configurable timeout (1-120 seconds) prevents infinite loops
- Comprehensive error handling ensures hooks don't crash main process
- Execution tracking with detailed statistics and logging
API Changes:
- Added HookConfig schema with code and timeout fields
- Extended CrawlRequest with optional hooks parameter
- Added /hooks/info endpoint for hook discovery
- Updated /crawl and /crawl/stream endpoints to support hooks
Safety Features:
- Malformed hooks return clear validation errors
- Hook errors are isolated and reported without stopping crawl
- Execution statistics track success/failure/timeout rates
- All hook results are JSON-serializable
Testing:
- Comprehensive test suite covering all 8 hooks
- Error handling and timeout scenarios validated
- Authentication, performance, and content extraction examples
- 100% success rate in production testing
Documentation:
- Added extensive hooks section to docker-deployment.md
- Security warnings about user-provided code risks
- Real-world examples using httpbin.org, GitHub, BBC
- Best practices and troubleshooting guide
ref #1377
2025-08-11 13:25:17 +08:00
..
2025-07-04 15:16:53 +08:00
2025-08-04 19:02:01 +08:00
2025-06-23 10:44:27 +08:00
2025-03-31 21:55:07 +08:00
2025-02-10 16:58:52 +08:00
2025-06-09 11:49:33 +08:00
2025-08-11 13:25:17 +08:00
2025-07-09 09:41:03 +02:00
2025-02-07 21:56:27 +08:00
2025-02-28 19:53:35 +08:00
2025-04-22 22:35:25 +08:00
2025-04-21 22:22:02 +08:00
2025-04-29 23:04:32 +08:00
2025-07-11 22:27:18 +08:00
2024-05-14 21:27:41 +08:00
2025-07-17 16:59:10 +08:00
2025-04-29 16:26:35 +02:00
2025-08-02 19:10:36 +08:00
2025-01-13 19:19:58 +08:00
2025-08-02 19:10:36 +08:00
2025-08-02 19:10:36 +08:00
2025-08-05 14:09:54 +08:00
2025-04-29 16:26:35 +02:00
2025-07-17 11:35:16 +02:00
2025-01-13 19:19:58 +08:00
2025-04-29 16:26:35 +02:00
2025-08-03 16:50:54 +08:00
2025-08-03 16:50:54 +08:00
2025-05-19 13:51:16 +08:00
2025-02-28 19:53:35 +08:00
2025-06-29 20:41:37 +08:00
2025-06-10 18:08:27 +08:00