ntohidi
be63c98db3
feat(docker): add user-provided hooks support to Docker API
Implements comprehensive hooks functionality allowing users to provide custom Python
functions as strings that execute at specific points in the crawling pipeline.
Key Features:
- Support for all 8 crawl4ai hook points:
  - on_browser_created: Initialize browser settings
  - on_page_context_created: Configure page context
  - before_goto: Pre-navigation setup
  - after_goto: Post-navigation processing
  - on_user_agent_updated: User agent modification handling
  - on_execution_started: Crawl execution initialization
  - before_retrieve_html: Pre-extraction processing
  - before_return_html: Final HTML processing
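To make the eight hook points concrete, here is a minimal sketch of checking a user-supplied mapping against them. The mapping shape (hook-point name to Python source string), the helper name `unknown_hook_points`, and the sample hook signatures are assumptions for illustration only, not the API added in this commit.

```python
from typing import Dict, List

# Hook point names exactly as listed in this commit message.
HOOK_POINTS = (
    "on_browser_created",
    "on_page_context_created",
    "before_goto",
    "after_goto",
    "on_user_agent_updated",
    "on_execution_started",
    "before_retrieve_html",
    "before_return_html",
)


def unknown_hook_points(hooks: Dict[str, str]) -> List[str]:
    """Return the keys of `hooks` that are not known hook points, sorted."""
    return sorted(name for name in hooks if name not in HOOK_POINTS)


# Hypothetical user payload: hook-point name -> source string.
# The function signatures inside the strings are assumed, not confirmed.
example_hooks = {
    "before_goto": "async def hook(page, context, url, **kwargs):\n    return page\n",
    "after_load": "async def hook(page, **kwargs):\n    return page\n",
}

# "after_load" is not one of the eight hook points, so it is reported.
rejected = unknown_hook_points(example_hooks)
```

Rejecting unknown names up front is one way to surface typos as validation errors instead of silently ignoring a hook.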
Implementation Details:
- Created UserHookManager for validation, compilation, and safe execution
- Added IsolatedHookWrapper for error isolation and timeout protection
- AST-based validation ensures code structure correctness
- Sandboxed execution with restricted builtins for security
- Configurable timeout (1-120 seconds) prevents infinite loops
- Comprehensive error handling ensures hooks don't crash main process
- Execution tracking with detailed statistics and logging
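The validate, compile, and run steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code of UserHookManager or IsolatedHookWrapper: the function names, the builtins allowlist, and the result dictionary shape are all hypothetical.

```python
import ast
import asyncio

# Assumed allowlist; the real set of permitted builtins is not given here.
SAFE_BUILTINS = {"len": len, "str": str, "int": int, "dict": dict,
                 "list": list, "range": range, "isinstance": isinstance}


def validate_hook_source(source):
    """AST check: source must parse and define a top-level async function."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return False, f"Syntax error: {exc.msg}"
    if not any(isinstance(n, ast.AsyncFunctionDef) for n in tree.body):
        return False, "Hook must define a top-level async function"
    return True, "ok"


def compile_hook(source):
    """Compile the hook in a namespace that exposes only SAFE_BUILTINS."""
    ok, message = validate_hook_source(source)
    if not ok:
        raise ValueError(message)
    tree = ast.parse(source)
    name = next(n.name for n in tree.body
                if isinstance(n, ast.AsyncFunctionDef))
    namespace = {"__builtins__": dict(SAFE_BUILTINS), "asyncio": asyncio}
    exec(compile(tree, "<user_hook>", "exec"), namespace)
    return namespace[name]


async def run_hook(hook, *args, timeout=5.0):
    """Run a hook with a timeout; report errors instead of raising them."""
    try:
        result = await asyncio.wait_for(hook(*args), timeout=timeout)
        return {"status": "success", "result": result}
    except asyncio.TimeoutError:
        return {"status": "timeout", "result": None}
    except Exception as exc:  # isolate hook failures from the caller
        return {"status": "failed", "error": str(exc)}
```

Two caveats on this sketch: `asyncio.wait_for` can only cancel a hook at an `await` point, so a purely CPU-bound loop would not be interrupted by it, and replacing `__builtins__` narrows what a hook can reach but is not a complete security boundary on its own.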
API Changes:
- Added HookConfig schema with code and timeout fields
- Extended CrawlRequest with optional hooks parameter
- Added /hooks/info endpoint for hook discovery
- Updated /crawl and /crawl/stream endpoints to support hooks
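As a rough picture of the request shape described above, the sketch below models the two schemas with standard-library dataclasses. Only the `code` and `timeout` fields, the optional `hooks` parameter, and the 1-120 second range come from this commit message; the `urls` field, the default timeout of 30, the type of `code`, and the `to_payload` helper are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class HookConfig:
    # Assumed shape: hook-point name -> Python source string.
    code: Dict[str, str]
    # Assumed default; the commit only states the 1-120 second range.
    timeout: int = 30

    def __post_init__(self):
        if not 1 <= self.timeout <= 120:
            raise ValueError("timeout must be between 1 and 120 seconds")


@dataclass
class CrawlRequest:
    urls: List[str]                      # hypothetical field name
    hooks: Optional[HookConfig] = None   # hooks stay optional


def to_payload(request: CrawlRequest) -> str:
    """Serialize a request to JSON, e.g. as a body for POST /crawl."""
    return json.dumps(asdict(request))


request = CrawlRequest(
    urls=["https://httpbin.org/html"],
    hooks=HookConfig(
        code={"before_goto": "async def hook(page, **kwargs):\n    return page\n"},
        timeout=10,
    ),
)
payload = to_payload(request)
```

Keeping `hooks` optional means existing clients that send no hooks would be unaffected by the new field.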
Safety Features:
- Malformed hooks return clear validation errors
- Hook errors are isolated and reported without stopping crawl
- Execution statistics track success/failure/timeout rates
- All hook results are JSON-serializable
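One way to track success, failure, and timeout rates while keeping the output JSON-serializable is sketched below. The class name `HookStats`, the status labels, and the summary keys are hypothetical and are not taken from the implementation in this commit.

```python
import json
from collections import Counter


class HookStats:
    """Count hook outcomes and expose a JSON-serializable summary."""

    STATUSES = ("success", "failed", "timeout")

    def __init__(self):
        self._counts = Counter()

    def record(self, status):
        """Record one hook execution outcome."""
        if status not in self.STATUSES:
            raise ValueError(f"unknown status: {status}")
        self._counts[status] += 1

    def summary(self):
        """Return counts and rates using only ints, floats, and strings."""
        total = sum(self._counts.values())
        out = {"total": total}
        for status in self.STATUSES:
            count = self._counts[status]
            out[status] = count
            out[f"{status}_rate"] = count / total if total else 0.0
        return out


stats = HookStats()
for outcome in ("success", "success", "success", "timeout"):
    stats.record(outcome)

# The summary contains only plain types, so json.dumps accepts it directly.
report = json.dumps(stats.summary())
```

Restricting the summary to plain dictionaries, numbers, and strings is what lets it travel in an API response without a custom encoder.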
Testing:
- Comprehensive test suite covering all 8 hooks
- Error handling and timeout scenarios validated
- Authentication, performance, and content extraction examples
- 100% success rate in production testing
Documentation:
- Added extensive hooks section to docker-deployment.md
- Security warnings about user-provided code risks
- Real-world examples using httpbin.org, GitHub, BBC
- Best practices and troubleshooting guide
ref #1377
2025-08-11 13:25:17 +08:00