Fix critical RCE and LFI vulnerabilities in Docker API deployment

Security fixes for vulnerabilities reported by ProjectDiscovery:

1. Remote Code Execution via Hooks (CVE pending)
   - Remove __import__ from allowed_builtins in hook_manager.py
   - Prevents arbitrary module imports (os, subprocess, etc.)
   - Hooks now disabled by default via CRAWL4AI_HOOKS_ENABLED env var

2. Local File Inclusion via file:// URLs (CVE pending)
   - Add URL scheme validation to /execute_js, /screenshot, /pdf, /html
   - Block file://, javascript:, data: and other dangerous schemes
   - Only allow http://, https://, and raw: (where appropriate)

3. Security hardening
   - Add CRAWL4AI_HOOKS_ENABLED=false as default (opt-in for hooks)
   - Add security warning comments in config.yml
   - Add validate_url_scheme() helper for consistent validation

Testing:
   - Add unit tests (test_security_fixes.py) - 16 tests
   - Add integration tests (run_security_tests.py) for live server

Affected endpoints:
   - POST /crawl (hooks disabled by default)
   - POST /crawl/stream (hooks disabled by default)
   - POST /execute_js (URL validation added)
   - POST /screenshot (URL validation added)
   - POST /pdf (URL validation added)
   - POST /html (URL validation added)

Breaking changes:
   - Hooks require CRAWL4AI_HOOKS_ENABLED=true to function
   - file:// URLs no longer work on API endpoints (use library directly)
This commit is contained in:
unclecode
2026-01-12 04:14:37 +00:00
parent 6b2dca76c3
commit f24396c23e
5 changed files with 399 additions and 4 deletions

View File

@@ -117,18 +117,18 @@ class UserHookManager:
"""
try:
# Create a safe namespace for the hook
# Use a more complete builtins that includes __import__
# SECURITY: No __import__ to prevent arbitrary module imports (RCE risk)
import builtins
safe_builtins = {}
# Add safe built-in functions
# Add safe built-in functions (no __import__ for security)
allowed_builtins = [
'print', 'len', 'str', 'int', 'float', 'bool',
'list', 'dict', 'set', 'tuple', 'range', 'enumerate',
'zip', 'map', 'filter', 'any', 'all', 'sum', 'min', 'max',
'sorted', 'reversed', 'abs', 'round', 'isinstance', 'type',
'getattr', 'hasattr', 'setattr', 'callable', 'iter', 'next',
'__import__', '__build_class__' # Required for exec
'__build_class__' # Required for class definitions in exec
]
for name in allowed_builtins: