# Migration Guide: Upgrading to Crawl4AI v0.8.0
This guide helps you upgrade from v0.7.x to v0.8.0, with special attention to breaking changes and security updates.
## Quick Summary
| Change | Impact | Action Required |
|---|---|---|
| Hooks disabled by default | Docker API users with hooks | Set `CRAWL4AI_HOOKS_ENABLED=true` |
| `file://` URLs blocked | Docker API users reading local files | Use the Python library directly |
| Security fixes | All Docker API users | Update immediately |
## Step 1: Update the Package

### PyPI Installation

```bash
pip install --upgrade crawl4ai
```

### Docker Installation

```bash
docker pull unclecode/crawl4ai:latest
# or
docker pull unclecode/crawl4ai:0.8.0
```

### From Source

```bash
git pull origin main
pip install -e .
```
## Step 2: Check for Breaking Changes

### Are You Affected?

You ARE affected if you:

- Use the Docker API deployment
- Use the `hooks` parameter in `/crawl` requests
- Use `file://` URLs via API endpoints

You are NOT affected if you:

- Only use Crawl4AI as a Python library
- Don't use hooks in your API calls
- Don't use `file://` URLs via the API
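A quick way to check is to grep your API client code for the affected parameters. A sketch, where the `/tmp` demo directory stands in for your own integration code:

```shell
# Pre-upgrade audit sketch: list files that send the "hooks" parameter or
# file:// URLs to the API. The demo directory below is only for
# illustration -- point grep at your real client code instead.
demo=/tmp/c4ai_audit_demo
mkdir -p "$demo"
printf '%s\n' '{"urls": ["https://example.com"], "hooks": {"code": {}}}' > "$demo/request.json"
grep -rl -e '"hooks"' -e 'file://' "$demo"
```

Any file it lists needs the migration steps below.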
## Step 3: Migrate Hooks Usage

### Before v0.8.0

Hooks worked by default:

```bash
# This worked without any configuration
curl -X POST http://localhost:11235/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "hooks": {
      "code": {
        "on_page_context_created": "async def hook(page, context, **kwargs):\n    await context.add_cookies([...])\n    return page"
      }
    }
  }'
```
### After v0.8.0

You must explicitly enable hooks:

#### Option A: Environment Variable (Recommended)

```bash
# In your Docker run command
export CRAWL4AI_HOOKS_ENABLED=true
```

```yaml
# docker-compose.yml
services:
  crawl4ai:
    image: unclecode/crawl4ai:0.8.0
    environment:
      - CRAWL4AI_HOOKS_ENABLED=true
```
#### Option B: Kubernetes

```yaml
env:
  - name: CRAWL4AI_HOOKS_ENABLED
    value: "true"
```
### Security Warning

Only enable hooks if:

- You trust all users who can access the API
- The API is not exposed to the public internet
- You have other authentication/authorization in place
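If you must enable hooks, one mitigation is to bind the published port to the loopback interface so the API is unreachable from other hosts. A docker-compose sketch, assuming the default port 11235 used throughout this guide:

```yaml
services:
  crawl4ai:
    image: unclecode/crawl4ai:0.8.0
    ports:
      - "127.0.0.1:11235:11235"  # reachable only from this host
    environment:
      - CRAWL4AI_HOOKS_ENABLED=true
```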
## Step 4: Migrate file:// URL Usage

### Before v0.8.0

```bash
# This worked via the API
curl -X POST http://localhost:11235/execute_js \
  -d '{"url": "file:///var/data/page.html", "scripts": ["document.title"]}'
```
### After v0.8.0

#### Option A: Use the Python Library Directly

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def process_local_file():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="file:///var/data/page.html",
            config=CrawlerRunConfig(js_code=["document.title"])
        )
        return result
```
#### Option B: Use the `raw:` Protocol for HTML Content

If you already have the HTML content, you can still use the API:

```bash
# Read the file content and send it with the raw: prefix
HTML_CONTENT=$(cat /var/data/page.html)
curl -X POST http://localhost:11235/html \
  -H "Content-Type: application/json" \
  -d "{\"url\": \"raw:$HTML_CONTENT\"}"
```
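Note that the shell interpolation above breaks if the HTML contains double quotes or newlines. A safer sketch builds the JSON body in Python; the `raw:` prefix and `/html` endpoint are as above, and the HTML string is a stand-in for your file's content:

```python
import json

# Stand-in for open("/var/data/page.html").read()
html = '<html><head><title>Quarterly "Report"</title></head><body>...</body></html>'

# json.dumps escapes the quotes and newlines that break naive shell interpolation
payload = json.dumps({"url": f"raw:{html}"})
print(payload)
```

Send `payload` as the request body with your HTTP client of choice.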
#### Option C: Create a Preprocessing Service

```python
# preprocessing_service.py
from fastapi import FastAPI
from crawl4ai import AsyncWebCrawler

app = FastAPI()

@app.post("/process-local")
async def process_local(file_path: str):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=f"file://{file_path}")
        return result.model_dump()
```
## Step 5: Review Security Configuration

### Recommended Production Settings

```yaml
# config.yml
security:
  enabled: true
  jwt_enabled: true
  https_redirect: true  # if behind an HTTPS proxy
  trusted_hosts:
    - "your-domain.com"
    - "api.your-domain.com"
```
### Environment Variables

```bash
# Required for JWT authentication
export SECRET_KEY="your-secure-random-key-minimum-32-characters"

# Only if you need hooks
export CRAWL4AI_HOOKS_ENABLED=true
```

### Generate a Secure Secret Key

```python
import secrets
print(secrets.token_urlsafe(32))
```
## Step 6: Test Your Integration

### Quick Validation Script

```python
import asyncio
import aiohttp

async def test_upgrade():
    base_url = "http://localhost:11235"

    # Test 1: Basic crawl should work
    async with aiohttp.ClientSession() as session:
        async with session.post(
            f"{base_url}/crawl",
            json={"urls": ["https://example.com"]}
        ) as resp:
            assert resp.status == 200, "Basic crawl failed"
            print("✓ Basic crawl works")

    # Test 2: Hooks should be blocked (unless enabled)
    async with aiohttp.ClientSession() as session:
        async with session.post(
            f"{base_url}/crawl",
            json={
                "urls": ["https://example.com"],
                "hooks": {"code": {"on_page_context_created": "async def hook(page, context, **kwargs): return page"}}
            }
        ) as resp:
            if resp.status == 403:
                print("✓ Hooks correctly blocked (default)")
            elif resp.status == 200:
                print("! Hooks enabled - ensure this is intentional")

    # Test 3: file:// should be blocked
    async with aiohttp.ClientSession() as session:
        async with session.post(
            f"{base_url}/execute_js",
            json={"url": "file:///etc/passwd", "scripts": ["1"]}
        ) as resp:
            assert resp.status == 400, "file:// should be blocked"
            print("✓ file:// URLs correctly blocked")

asyncio.run(test_upgrade())
```
## Troubleshooting

### "Hooks are disabled" Error

Symptom: API returns 403 with "Hooks are disabled"

Solution: Set `CRAWL4AI_HOOKS_ENABLED=true` if you need hooks

### "URL must start with http://, https://" Error

Symptom: API returns 400 when using `file://` URLs

Solution: Use the Python library directly or the `raw:` protocol

### Authentication Errors After Enabling JWT

Symptom: API returns 401 Unauthorized

Solution:

1. Get a token: `POST /token` with your email
2. Include the token in requests: `Authorization: Bearer <token>`
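The two steps above can be sketched in Python; the token value and field name here are placeholders for what `POST /token` actually returns in your deployment:

```python
# Placeholder for the JSON returned by POST /token -- in a real client you
# would obtain this via an HTTP request carrying your email.
token_response = {"access_token": "example.jwt.token", "token_type": "bearer"}

# Every subsequent request must carry the token as a Bearer header
headers = {"Authorization": f"Bearer {token_response['access_token']}"}
print(headers["Authorization"])
```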
## Rollback Plan

If you need to roll back:

```bash
# PyPI
pip install crawl4ai==0.7.6

# Docker
docker pull unclecode/crawl4ai:0.7.6
```

Warning: Rolling back re-exposes the security vulnerabilities. Only do this temporarily while fixing integration issues.
## Getting Help

- GitHub Issues: github.com/unclecode/crawl4ai/issues
- Security Issues: See SECURITY.md
- Documentation: docs.crawl4ai.com
## Changelog Reference

For the complete list of changes, see CHANGELOG.md and RELEASE_NOTES_v0.8.0.md.