Add release notes for v0.8.0, detailing breaking changes, security fixes, new features, bug fixes, and documentation updates

Documentation for v0.8.0 release:
- SECURITY.md: Security policy and vulnerability reporting guidelines
- RELEASE_NOTES_v0.8.0.md: Comprehensive release notes
- migration/v0.8.0-upgrade-guide.md: Step-by-step migration guide
- security/GHSA-DRAFT-RCE-LFI.md: GitHub security advisory drafts
- CHANGELOG.md: Updated with v0.8.0 changes

Breaking changes documented:
- Docker API hooks disabled by default (CRAWL4AI_HOOKS_ENABLED)
- file:// URLs blocked on Docker API endpoints

Security fixes credited to Neo by ProjectDiscovery
2026-01-12 13:45:42 +00:00
parent 122b4fe3f0
commit 530cde351f
6 changed files with 877 additions and 335 deletions
--- a/docs/migration/v0.8.0-upgrade-guide.md
+++ b/docs/migration/v0.8.0-upgrade-guide.md
@@ -0,0 +1,301 @@
+# Migration Guide: Upgrading to Crawl4AI v0.8.0
+
+This guide helps you upgrade from v0.7.x to v0.8.0, with special attention to breaking changes and security updates.
+
+## Quick Summary
+
+| Change | Impact | Action Required |
+|--------|--------|-----------------|
+| Hooks disabled by default | Docker API users with hooks | Set `CRAWL4AI_HOOKS_ENABLED=true` |
+| file:// URLs blocked | Docker API users reading local files | Use the Python library directly |
+| Security fixes | All Docker API users | Update immediately |
+
+---
+
+## Step 1: Update the Package
+
+### PyPI Installation
+
+```bash
+pip install --upgrade crawl4ai
+```
+
+### Docker Installation
+
+```bash
+docker pull unclecode/crawl4ai:latest
+# or
+docker pull unclecode/crawl4ai:0.8.0
+```
+
+### From Source
+
+```bash
+git pull origin main
+pip install -e .
+```
+
+---
+
+## Step 2: Check for Breaking Changes
+
+### Are You Affected?
+
+**You ARE affected if you:**
+- Use the Docker API deployment
+- Use the `hooks` parameter in `/crawl` requests
+- Use `file://` URLs via API endpoints
+
+**You are NOT affected if you:**
+- Only use Crawl4AI as a Python library
+- Don't use hooks in your API calls
+- Don't use `file://` URLs via the API
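+
+To audit existing client code, a small helper can flag request payloads that rely on either breaking change. This is a minimal sketch, not part of Crawl4AI: the function name `find_breaking_usage` is hypothetical, and it assumes payloads shaped like the examples in this guide (a `urls` list or a single `url`, plus an optional `hooks` object).
+
+```python
+def find_breaking_usage(payload: dict) -> list[str]:
+    """Return the v0.8.0 breaking changes that a request payload depends on."""
+    issues = []
+    # Hooks are rejected unless CRAWL4AI_HOOKS_ENABLED=true is set on the server
+    if payload.get("hooks"):
+        issues.append("hooks")
+    # Collect URLs from both payload shapes used in this guide
+    urls = list(payload.get("urls", []))
+    if "url" in payload:
+        urls.append(payload["url"])
+    # file:// URLs are blocked on Docker API endpoints
+    if any(str(u).lower().startswith("file://") for u in urls):
+        issues.append("file_url")
+    return issues
+
+
+print(find_breaking_usage({"urls": ["https://example.com"]}))
+print(find_breaking_usage({"url": "file:///var/data/page.html", "hooks": {"code": {}}}))
+```
+
+An empty list means the payload should behave the same after the upgrade; any other result points to Step 3 or Step 4 below.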
+
+---
+
+## Step 3: Migrate Hooks Usage
+
+### Before v0.8.0
+
+Hooks worked by default:
+
+```bash
+# This worked without any configuration
+curl -X POST http://localhost:11235/crawl \
+  -H "Content-Type: application/json" \
+  -d '{
+    "urls": ["https://example.com"],
+    "hooks": {
+      "code": {
+        "on_page_context_created": "async def hook(page, context, **kwargs):\n    await context.add_cookies([...])\n    return page"
+      }
+    }
+  }'
+```
+
+### After v0.8.0
+
+You must explicitly enable hooks:
+
+**Option A: Environment Variable (Recommended)**
+```bash
+# Pass the variable to the container in your docker run command
+docker run -p 11235:11235 -e CRAWL4AI_HOOKS_ENABLED=true unclecode/crawl4ai:0.8.0
+```
+
+```yaml
+# docker-compose.yml
+services:
+  crawl4ai:
+    image: unclecode/crawl4ai:0.8.0
+    environment:
+      - CRAWL4AI_HOOKS_ENABLED=true
+```
+
+**Option B: For Kubernetes**
+```yaml
+env:
+  - name: CRAWL4AI_HOOKS_ENABLED
+    value: "true"
+```
+
+### Security Warning
+
+Hooks run user-supplied Python code on the server. Only enable them if all of the following are true:
+- You trust all users who can access the API
+- The API is not exposed to the public internet
+- You have other authentication/authorization in place
+
+---
+
+## Step 4: Migrate file:// URL Usage
+
+### Before v0.8.0
+
+```bash
+# This worked via API
+curl -X POST http://localhost:11235/execute_js \
+  -H "Content-Type: application/json" \
+  -d '{"url": "file:///var/data/page.html", "scripts": ["document.title"]}'
+```
+
+### After v0.8.0
+
+**Option A: Use the Python Library Directly**
+
+```python
+from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
+
+async def process_local_file():
+    async with AsyncWebCrawler() as crawler:
+        result = await crawler.arun(
+            url="file:///var/data/page.html",
+            config=CrawlerRunConfig(js_code=["document.title"])
+        )
+        return result
+```
+
+**Option B: Use raw: Protocol for HTML Content**
+
+If you have the HTML content, you can still use the API:
+
+```bash
+# Read the file, JSON-escape it with jq (requires jq), and send it as raw: content
+jq -Rs '{url: ("raw:" + .)}' /var/data/page.html |
+  curl -X POST http://localhost:11235/html \
+    -H "Content-Type: application/json" \
+    -d @-
+```
+
+**Option C: Create a Preprocessing Service**
+
+```python
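+# preprocessing_service.py
+from pathlib import Path
+
+from fastapi import FastAPI, HTTPException
+from crawl4ai import AsyncWebCrawler
+
+app = FastAPI()
+# Hypothetical allow-listed directory; adjust to your deployment
+ALLOWED_DIR = Path("/var/data").resolve()
+
+@app.post("/process-local")
+async def process_local(file_path: str):
+    # Resolve the path and reject anything outside ALLOWED_DIR;
+    # otherwise this endpoint reintroduces arbitrary local file reads
+    path = Path(file_path).resolve()
+    if not path.is_relative_to(ALLOWED_DIR):
+        raise HTTPException(status_code=400, detail="Path not allowed")
+    async with AsyncWebCrawler() as crawler:
+        result = await crawler.arun(url=f"file://{path}")
+        return result.model_dump()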
+
+---
+
+## Step 5: Review Security Configuration
+
+### Recommended Production Settings
+
+```yaml
+# config.yml
+security:
+  enabled: true
+  jwt_enabled: true
+  https_redirect: true  # If behind HTTPS proxy
+  trusted_hosts:
+    - "your-domain.com"
+    - "api.your-domain.com"
+```
+
+### Environment Variables
+
+```bash
+# Required for JWT authentication
+export SECRET_KEY="your-secure-random-key-minimum-32-characters"
+
+# Only if you need hooks
+export CRAWL4AI_HOOKS_ENABLED=true
+```
+
+### Generate a Secure Secret Key
+
+```python
+import secrets
+print(secrets.token_urlsafe(32))
+```
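+
+Before deploying, it can help to fail fast when the key is missing or too short. The following is a minimal sketch, assuming the 32-character minimum suggested by the placeholder value above; `check_secret_key` and `MIN_KEY_LENGTH` are hypothetical names, not part of Crawl4AI.
+
+```python
+import os
+import secrets
+
+MIN_KEY_LENGTH = 32  # assumption taken from the placeholder value above
+
+
+def check_secret_key(key) -> bool:
+    """Return True if the key is set and at least MIN_KEY_LENGTH characters long."""
+    return bool(key) and len(key) >= MIN_KEY_LENGTH
+
+
+if check_secret_key(os.environ.get("SECRET_KEY")):
+    print("SECRET_KEY looks OK")
+else:
+    # Suggest a freshly generated replacement instead of failing silently
+    print("SECRET_KEY is missing or too short; try:", secrets.token_urlsafe(32))
+```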
+
+---
+
+## Step 6: Test Your Integration
+
+### Quick Validation Script
+
+```python
+import asyncio
+import aiohttp
+
+async def test_upgrade():
+    base_url = "http://localhost:11235"
+
+    # Test 1: Basic crawl should work
+    async with aiohttp.ClientSession() as session:
+        async with session.post(
+            f"{base_url}/crawl",
+            json={"urls": ["https://example.com"]}
+        ) as resp:
+            assert resp.status == 200, "Basic crawl failed"
+            print("✓ Basic crawl works")
+
+    # Test 2: Hooks should be blocked (unless enabled)
+    async with aiohttp.ClientSession() as session:
+        async with session.post(
+            f"{base_url}/crawl",
+            json={
+                "urls": ["https://example.com"],
+                "hooks": {"code": {"on_page_context_created": "async def hook(page, context, **kwargs): return page"}}
+            }
+        ) as resp:
+            if resp.status == 403:
+                print("✓ Hooks correctly blocked (default)")
+            elif resp.status == 200:
+                print("! Hooks enabled - ensure this is intentional")
+            else:
+                print(f"✗ Unexpected status for hooks request: {resp.status}")
+
+    # Test 3: file:// should be blocked
+    async with aiohttp.ClientSession() as session:
+        async with session.post(
+            f"{base_url}/execute_js",
+            json={"url": "file:///etc/passwd", "scripts": ["1"]}
+        ) as resp:
+            assert resp.status == 400, "file:// should be blocked"
+            print("✓ file:// URLs correctly blocked")
+
+asyncio.run(test_upgrade())
+```
+
+---
+
+## Troubleshooting
+
+### "Hooks are disabled" Error
+
+**Symptom**: API returns 403 with "Hooks are disabled"
+
+**Solution**: Set `CRAWL4AI_HOOKS_ENABLED=true` if you need hooks
+
+### "URL must start with http://, https://" Error
+
+**Symptom**: API returns 400 when using `file://` URLs
+
+**Solution**: Use the Python library directly or the `raw:` protocol
+
+### Authentication Errors After Enabling JWT
+
+**Symptom**: API returns 401 Unauthorized
+
+**Solution**:
+1. Get a token: `POST /token` with your email
+2. Include token in requests: `Authorization: Bearer <token>`
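+
+The two steps above can be sketched with the standard library. This is an illustration under stated assumptions, not the documented client: the guide only says to `POST /token` with your email, so the `{"email": ...}` request body and the `access_token` response field are assumptions, and the helper names are hypothetical.
+
+```python
+import json
+import urllib.request
+
+BASE_URL = "http://localhost:11235"
+
+
+def build_token_request(email: str, base_url: str = BASE_URL) -> urllib.request.Request:
+    """Build the POST /token request. The {"email": ...} body shape is an assumption."""
+    body = json.dumps({"email": email}).encode("utf-8")
+    return urllib.request.Request(
+        f"{base_url}/token",
+        data=body,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def auth_headers(token: str) -> dict:
+    """Headers to attach to every authenticated API request."""
+    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
+
+
+def fetch_token(email: str, base_url: str = BASE_URL) -> str:
+    """Request a token from a running server. The "access_token" field is an assumption."""
+    with urllib.request.urlopen(build_token_request(email, base_url)) as resp:
+        return json.load(resp)["access_token"]
+```
+
+With a server running, `auth_headers(fetch_token("user@example.com"))` yields the headers to pass along with `/crawl` requests.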
+
+---
+
+## Rollback Plan
+
+If you need to roll back:
+
+```bash
+# PyPI
+pip install crawl4ai==0.7.6
+
+# Docker
+docker pull unclecode/crawl4ai:0.7.6
+```
+
+**Warning**: Rolling back re-exposes the security vulnerabilities. Only do this temporarily while fixing integration issues.
+
+---
+
+## Getting Help
+
+- **GitHub Issues**: [github.com/unclecode/crawl4ai/issues](https://github.com/unclecode/crawl4ai/issues)
+- **Security Issues**: See [SECURITY.md](../../SECURITY.md)
+- **Documentation**: [docs.crawl4ai.com](https://docs.crawl4ai.com)
+
+---
+
+## Changelog Reference
+
+For a complete list of changes, see:
+- [Release Notes v0.8.0](../RELEASE_NOTES_v0.8.0.md)
+- [CHANGELOG.md](../../CHANGELOG.md)