Fix critical RCE and LFI vulnerabilities in Docker API deployment

Security fixes for vulnerabilities reported by ProjectDiscovery:

1. Remote Code Execution via Hooks (CVE pending)
   - Remove __import__ from allowed_builtins in hook_manager.py
   - Prevents arbitrary module imports (os, subprocess, etc.)
   - Hooks now disabled by default via CRAWL4AI_HOOKS_ENABLED env var

2. Local File Inclusion via file:// URLs (CVE pending)
   - Add URL scheme validation to /execute_js, /screenshot, /pdf, /html
   - Block file://, javascript:, data: and other dangerous schemes
   - Only allow http://, https://, and raw: (where appropriate)

3. Security hardening
   - Add CRAWL4AI_HOOKS_ENABLED=false as default (opt-in for hooks)
   - Add security warning comments in config.yml
   - Add validate_url_scheme() helper for consistent validation

Testing:
   - Add unit tests (test_security_fixes.py) - 16 tests
   - Add integration tests (run_security_tests.py) for live server

Affected endpoints:
   - POST /crawl (hooks disabled by default)
   - POST /crawl/stream (hooks disabled by default)
   - POST /execute_js (URL validation added)
   - POST /screenshot (URL validation added)
   - POST /pdf (URL validation added)
   - POST /html (URL validation added)

Breaking changes:
   - Hooks require CRAWL4AI_HOOKS_ENABLED=true to function
   - file:// URLs no longer work on API endpoints (use library directly)
This commit is contained in:
unclecode
2026-01-12 04:14:37 +00:00
parent 6b2dca76c3
commit f24396c23e
5 changed files with 399 additions and 4 deletions

View File

@@ -37,6 +37,10 @@ rate_limiting:
storage_uri: "memory://" # Use "redis://localhost:6379" for production
# Security Configuration
# WARNING: For production deployments, enable security and use proper SECRET_KEY:
# - Set jwt_enabled: true for authentication
# - Set SECRET_KEY environment variable to a secure random value
# - Set CRAWL4AI_HOOKS_ENABLED=true only if you need hooks (RCE risk)
security:
enabled: false
jwt_enabled: false