# Migration Guide: Upgrading to Crawl4AI v0.8.0
This guide helps you upgrade from v0.7.x to v0.8.0, with special attention to breaking changes and security updates.
## Quick Summary
| Change | Impact | Action Required |
|---|---|---|
| Hooks disabled by default | Docker API users with hooks | Set `CRAWL4AI_HOOKS_ENABLED=true` |
| `file://` URLs blocked | Docker API users reading local files | Use the Python library directly |
| Security fixes | All Docker API users | Update immediately |
## Step 1: Update the Package

### PyPI Installation

```bash
pip install --upgrade crawl4ai
```

### Docker Installation

```bash
docker pull unclecode/crawl4ai:latest
# or
docker pull unclecode/crawl4ai:0.8.0
```

### From Source

```bash
git pull origin main
pip install -e .
```
## Step 2: Check for Breaking Changes

### Are You Affected?

You ARE affected if you:

- Use the Docker API deployment
- Use the `hooks` parameter in `/crawl` requests
- Use `file://` URLs via API endpoints

You are NOT affected if you:

- Only use Crawl4AI as a Python library
- Don't use hooks in your API calls
- Don't use `file://` URLs via the API
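A quick way to check is to grep your API client code for the affected parameters. A sketch, where the `/tmp` demo directory stands in for your own integration code:

```shell
# Pre-upgrade audit sketch: list files that send the "hooks" parameter or
# file:// URLs to the API. The demo directory below is only for
# illustration -- point grep at your real client code instead.
demo=/tmp/c4ai_audit_demo
mkdir -p "$demo"
printf '%s\n' '{"urls": ["https://example.com"], "hooks": {"code": {}}}' > "$demo/request.json"
grep -rl -e '"hooks"' -e 'file://' "$demo"
```

Any file it lists needs the migration steps below.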
## Step 3: Migrate Hooks Usage

### Before v0.8.0

Hooks worked by default:

```bash
# This worked without any configuration
curl -X POST http://localhost:11235/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "hooks": {
      "code": {
        "on_page_context_created": "async def hook(page, context, **kwargs):\n    await context.add_cookies([...])\n    return page"
      }
    }
  }'
```
### After v0.8.0

You must explicitly enable hooks:

#### Option A: Environment Variable (Recommended)

```bash
# In your Docker run command
export CRAWL4AI_HOOKS_ENABLED=true
```

```yaml
# docker-compose.yml
services:
  crawl4ai:
    image: unclecode/crawl4ai:0.8.0
    environment:
      - CRAWL4AI_HOOKS_ENABLED=true
```
#### Option B: Kubernetes

```yaml
env:
  - name: CRAWL4AI_HOOKS_ENABLED
    value: "true"
```
### Security Warning

Only enable hooks if:

- You trust all users who can access the API
- The API is not exposed to the public internet
- You have other authentication/authorization in place
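If you must enable hooks, one mitigation is to bind the published port to the loopback interface so the API is unreachable from other hosts. A docker-compose sketch, assuming the default port 11235 used throughout this guide:

```yaml
services:
  crawl4ai:
    image: unclecode/crawl4ai:0.8.0
    ports:
      - "127.0.0.1:11235:11235"  # reachable only from this host
    environment:
      - CRAWL4AI_HOOKS_ENABLED=true
```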
## Step 4: Migrate file:// URL Usage

### Before v0.8.0

```bash
# This worked via the API
curl -X POST http://localhost:11235/execute_js \
  -d '{"url": "file:///var/data/page.html", "scripts": ["document.title"]}'
```
### After v0.8.0

#### Option A: Use the Python Library Directly

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def process_local_file():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="file:///var/data/page.html",
            config=CrawlerRunConfig(js_code=["document.title"])
        )
        return result
```
#### Option B: Use the `raw:` Protocol for HTML Content

If you already have the HTML content, you can still use the API:

```bash
# Read the file content and send it with the raw: prefix
HTML_CONTENT=$(cat /var/data/page.html)
curl -X POST http://localhost:11235/html \
  -H "Content-Type: application/json" \
  -d "{\"url\": \"raw:$HTML_CONTENT\"}"
```
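Note that the shell interpolation above breaks if the HTML contains double quotes or newlines. A safer sketch builds the JSON body in Python; the `raw:` prefix and `/html` endpoint are as above, and the HTML string is a stand-in for your file's content:

```python
import json

# Stand-in for open("/var/data/page.html").read()
html = '<html><head><title>Quarterly "Report"</title></head><body>...</body></html>'

# json.dumps escapes the quotes and newlines that break naive shell interpolation
payload = json.dumps({"url": f"raw:{html}"})
print(payload)
```

Send `payload` as the request body with your HTTP client of choice.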
#### Option C: Create a Preprocessing Service

```python
# preprocessing_service.py
from fastapi import FastAPI
from crawl4ai import AsyncWebCrawler

app = FastAPI()

@app.post("/process-local")
async def process_local(file_path: str):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=f"file://{file_path}")
        return result.model_dump()
```
## Step 5: Review Security Configuration

### Recommended Production Settings

```yaml
# config.yml
security:
  enabled: true
  jwt_enabled: true
  https_redirect: true  # if behind an HTTPS proxy
  trusted_hosts:
    - "your-domain.com"
    - "api.your-domain.com"
```
### Environment Variables

```bash
# Required for JWT authentication
export SECRET_KEY="your-secure-random-key-minimum-32-characters"

# Only if you need hooks
export CRAWL4AI_HOOKS_ENABLED=true
```

### Generate a Secure Secret Key

```python
import secrets
print(secrets.token_urlsafe(32))
```
## Step 6: Test Your Integration

### Quick Validation Script

```python
import asyncio
import aiohttp

async def test_upgrade():
    base_url = "http://localhost:11235"

    # Test 1: Basic crawl should work
    async with aiohttp.ClientSession() as session:
        async with session.post(
            f"{base_url}/crawl",
            json={"urls": ["https://example.com"]}
        ) as resp:
            assert resp.status == 200, "Basic crawl failed"
            print("✓ Basic crawl works")

    # Test 2: Hooks should be blocked (unless enabled)
    async with aiohttp.ClientSession() as session:
        async with session.post(
            f"{base_url}/crawl",
            json={
                "urls": ["https://example.com"],
                "hooks": {"code": {"on_page_context_created": "async def hook(page, context, **kwargs): return page"}}
            }
        ) as resp:
            if resp.status == 403:
                print("✓ Hooks correctly blocked (default)")
            elif resp.status == 200:
                print("! Hooks enabled - ensure this is intentional")

    # Test 3: file:// should be blocked
    async with aiohttp.ClientSession() as session:
        async with session.post(
            f"{base_url}/execute_js",
            json={"url": "file:///etc/passwd", "scripts": ["1"]}
        ) as resp:
            assert resp.status == 400, "file:// should be blocked"
            print("✓ file:// URLs correctly blocked")

asyncio.run(test_upgrade())
```
## Troubleshooting

### "Hooks are disabled" Error

Symptom: API returns 403 with "Hooks are disabled"

Solution: Set `CRAWL4AI_HOOKS_ENABLED=true` if you need hooks

### "URL must start with http://, https://" Error

Symptom: API returns 400 when using `file://` URLs

Solution: Use the Python library directly or the `raw:` protocol

### Authentication Errors After Enabling JWT

Symptom: API returns 401 Unauthorized

Solution:

1. Get a token: `POST /token` with your email
2. Include the token in requests: `Authorization: Bearer <token>`
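The two steps above can be sketched in Python; the token value and field name here are placeholders for what `POST /token` actually returns in your deployment:

```python
# Placeholder for the JSON returned by POST /token -- in a real client you
# would obtain this via an HTTP request carrying your email.
token_response = {"access_token": "example.jwt.token", "token_type": "bearer"}

# Every subsequent request must carry the token as a Bearer header
headers = {"Authorization": f"Bearer {token_response['access_token']}"}
print(headers["Authorization"])
```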
## Rollback Plan

If you need to roll back:

```bash
# PyPI
pip install crawl4ai==0.7.6

# Docker
docker pull unclecode/crawl4ai:0.7.6
```

Warning: Rolling back re-exposes the security vulnerabilities. Only do this temporarily while fixing integration issues.
## Getting Help

- GitHub Issues: github.com/unclecode/crawl4ai/issues
- Security Issues: See SECURITY.md
- Documentation: docs.crawl4ai.com
## Changelog Reference

For the complete list of changes, see CHANGELOG.md and RELEASE_NOTES_v0.8.0.md.