feat: add Script Builder to Chrome Extension and reorganize LLM context files

This commit introduces significant enhancements to the Crawl4AI ecosystem: Chrome Extension - Script Builder (Alpha): - Add recording functionality to capture user interactions (clicks, typing, scrolling) - Implement smart event grouping for cleaner script generation - Support export to both JavaScript and C4A script formats - Add timeline view for visualizing and editing recorded actions - Include wait commands (time-based and element-based) - Add saved flows functionality for reusing automation scripts - Update UI with consistent dark terminal theme (Dank Mono font, green/pink accents) - Release new extension versions: v1.1.0, v1.2.0, v1.2.1 LLM Context Builder Improvements: - Reorganize context files from llmtxt/ to llm.txt/ with better structure - Separate diagram templates from text content (diagrams/ and txt/ subdirectories) - Add comprehensive context files for all major Crawl4AI components - Improve file naming convention for better discoverability Documentation Updates: - Update apps index page to match main documentation theme - Standardize color scheme: "Available" tags use primary color (#50ffff) - Change "Coming Soon" tags to dark gray for better visual hierarchy - Add interactive two-column layout for extension landing page - Include code examples for both Schema Builder and Script Builder features Technical Improvements: - Enhance event capture mechanism with better element selection - Add support for contenteditable elements and complex form interactions - Implement proper scroll event handling for both window and element scrolling - Add meta key support for keyboard shortcuts - Improve selector generation for more reliable element targeting The Script Builder is released as Alpha, acknowledging potential bugs while providing early access to this powerful automation recording feature.
2025-06-08 22:02:12 +08:00
parent 926592649e
commit 40640badad
72 changed files with 28600 additions and 100986 deletions
--- a/docs/md_v2/assets/llm.txt/txt/docker.txt
+++ b/docs/md_v2/assets/llm.txt/txt/docker.txt
@@ -0,0 +1,826 @@
+## Docker Deployment
+
+Complete Docker deployment guide with pre-built images, API endpoints, configuration, and MCP integration.
+
+### Quick Start with Pre-built Images
+
+```bash
+# Pull latest image
+docker pull unclecode/crawl4ai:latest
+
+# Setup LLM API keys
+cat > .llm.env << EOL
+OPENAI_API_KEY=sk-your-key
+ANTHROPIC_API_KEY=your-anthropic-key
+GROQ_API_KEY=your-groq-key
+GEMINI_API_TOKEN=your-gemini-token
+EOL
+
+# Run with LLM support
+docker run -d \
+  -p 11235:11235 \
+  --name crawl4ai \
+  --env-file .llm.env \
+  --shm-size=1g \
+  unclecode/crawl4ai:latest
+
+# Basic run (no LLM)
+docker run -d \
+  -p 11235:11235 \
+  --name crawl4ai \
+  --shm-size=1g \
+  unclecode/crawl4ai:latest
+
+# Check health
+curl http://localhost:11235/health
+```
+
+### Docker Compose Deployment
+
+```bash
+# Clone and setup
+git clone https://github.com/unclecode/crawl4ai.git
+cd crawl4ai
+cp deploy/docker/.llm.env.example .llm.env
+# Edit .llm.env with your API keys
+
+# Run pre-built image
+IMAGE=unclecode/crawl4ai:latest docker compose up -d
+
+# Build locally
+docker compose up --build -d
+
+# Build with all features
+INSTALL_TYPE=all docker compose up --build -d
+
+# Build with GPU support
+ENABLE_GPU=true docker compose up --build -d
+
+# Stop service
+docker compose down
+```
+
+### Manual Build with Multi-Architecture
+
+```bash
+# Clone repository
+git clone https://github.com/unclecode/crawl4ai.git
+cd crawl4ai
+
+# Build for current architecture
+docker buildx build -t crawl4ai-local:latest --load .
+
+# Build for multiple architectures
+docker buildx build --platform linux/amd64,linux/arm64 \
+  -t crawl4ai-local:latest --load .
+
+# Build with specific features
+docker buildx build \
+  --build-arg INSTALL_TYPE=all \
+  --build-arg ENABLE_GPU=false \
+  -t crawl4ai-local:latest --load .
+
+# Run custom build
+docker run -d \
+  -p 11235:11235 \
+  --name crawl4ai-custom \
+  --env-file .llm.env \
+  --shm-size=1g \
+  crawl4ai-local:latest
+```
+
+### Build Arguments
+
+```bash
+# Available build options
+docker buildx build \
+  --build-arg INSTALL_TYPE=all \     # default|all|torch|transformer
+  --build-arg ENABLE_GPU=true \      # true|false
+  --build-arg APP_HOME=/app \        # Install path
+  --build-arg USE_LOCAL=true \       # Use local source
+  --build-arg GITHUB_REPO=url \      # Git repo if USE_LOCAL=false
+  --build-arg GITHUB_BRANCH=main \   # Git branch
+  -t crawl4ai-custom:latest --load .
+```
+
+### Core API Endpoints
+
+```python
+# Main crawling endpoints
+import requests
+import json
+
+# Basic crawl
+payload = {
+    "urls": ["https://example.com"],
+    "browser_config": {"type": "BrowserConfig", "params": {"headless": True}},
+    "crawler_config": {"type": "CrawlerRunConfig", "params": {"cache_mode": "bypass"}}
+}
+response = requests.post("http://localhost:11235/crawl", json=payload)
+
+# Streaming crawl
+payload["crawler_config"]["params"]["stream"] = True
+response = requests.post("http://localhost:11235/crawl/stream", json=payload)
+
+# Health check
+response = requests.get("http://localhost:11235/health")
+
+# API schema
+response = requests.get("http://localhost:11235/schema")
+
+# Metrics (Prometheus format)
+response = requests.get("http://localhost:11235/metrics")
+```
+
+### Specialized Endpoints
+
+```python
+# HTML extraction (preprocessed for schema)
+response = requests.post("http://localhost:11235/html", 
+    json={"url": "https://example.com"})
+
+# Screenshot capture
+response = requests.post("http://localhost:11235/screenshot", json={
+    "url": "https://example.com",
+    "screenshot_wait_for": 2,
+    "output_path": "/path/to/save/screenshot.png"
+})
+
+# PDF generation
+response = requests.post("http://localhost:11235/pdf", json={
+    "url": "https://example.com",
+    "output_path": "/path/to/save/document.pdf"
+})
+
+# JavaScript execution
+response = requests.post("http://localhost:11235/execute_js", json={
+    "url": "https://example.com",
+    "scripts": [
+        "return document.title",
+        "return Array.from(document.querySelectorAll('a')).map(a => a.href)"
+    ]
+})
+
+# Markdown generation
+response = requests.post("http://localhost:11235/md", json={
+    "url": "https://example.com",
+    "f": "fit",  # raw|fit|bm25|llm
+    "q": "extract main content",  # query for filtering
+    "c": "0"  # cache: 0=bypass, 1=use
+})
+
+# LLM Q&A
+response = requests.get("http://localhost:11235/llm/https://example.com?q=What is this page about?")
+
+# Library context (for AI assistants)
+response = requests.get("http://localhost:11235/ask", params={
+    "context_type": "all",  # code|doc|all
+    "query": "how to use extraction strategies",
+    "score_ratio": 0.5,
+    "max_results": 20
+})
+```
+
+### Python SDK Usage
+
+```python
+import asyncio
+from crawl4ai.docker_client import Crawl4aiDockerClient
+from crawl4ai import BrowserConfig, CrawlerRunConfig, CacheMode
+
+async def main():
+    async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
+        # Non-streaming crawl
+        results = await client.crawl(
+            ["https://example.com"],
+            browser_config=BrowserConfig(headless=True),
+            crawler_config=CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
+        )
+        
+        for result in results:
+            print(f"URL: {result.url}, Success: {result.success}")
+            print(f"Content length: {len(result.markdown)}")
+        
+        # Streaming crawl
+        stream_config = CrawlerRunConfig(stream=True, cache_mode=CacheMode.BYPASS)
+        async for result in await client.crawl(
+            ["https://example.com", "https://python.org"],
+            browser_config=BrowserConfig(headless=True),
+            crawler_config=stream_config
+        ):
+            print(f"Streamed: {result.url} - {result.success}")
+        
+        # Get API schema
+        schema = await client.get_schema()
+        print(f"Schema available: {bool(schema)}")
+
+asyncio.run(main())
+```
+
+### Advanced API Configuration
+
+```python
+# Complex extraction with LLM
+payload = {
+    "urls": ["https://example.com"],
+    "browser_config": {
+        "type": "BrowserConfig",
+        "params": {
+            "headless": True,
+            "viewport": {"type": "dict", "value": {"width": 1200, "height": 800}}
+        }
+    },
+    "crawler_config": {
+        "type": "CrawlerRunConfig",
+        "params": {
+            "extraction_strategy": {
+                "type": "LLMExtractionStrategy",
+                "params": {
+                    "llm_config": {
+                        "type": "LLMConfig",
+                        "params": {
+                            "provider": "openai/gpt-4o-mini",
+                            "api_token": "env:OPENAI_API_KEY"
+                        }
+                    },
+                    "schema": {
+                        "type": "dict",
+                        "value": {
+                            "type": "object",
+                            "properties": {
+                                "title": {"type": "string"},
+                                "content": {"type": "string"}
+                            }
+                        }
+                    },
+                    "instruction": "Extract title and main content"
+                }
+            },
+            "markdown_generator": {
+                "type": "DefaultMarkdownGenerator",
+                "params": {
+                    "content_filter": {
+                        "type": "PruningContentFilter",
+                        "params": {"threshold": 0.6}
+                    }
+                }
+            }
+        }
+    }
+}
+
+response = requests.post("http://localhost:11235/crawl", json=payload)
+```
+
+### CSS Extraction Strategy
+
+```python
+# CSS-based structured extraction
+schema = {
+    "name": "ProductList",
+    "baseSelector": ".product",
+    "fields": [
+        {"name": "title", "selector": "h2", "type": "text"},
+        {"name": "price", "selector": ".price", "type": "text"},
+        {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"}
+    ]
+}
+
+payload = {
+    "urls": ["https://example-shop.com"],
+    "browser_config": {"type": "BrowserConfig", "params": {"headless": True}},
+    "crawler_config": {
+        "type": "CrawlerRunConfig",
+        "params": {
+            "extraction_strategy": {
+                "type": "JsonCssExtractionStrategy",
+                "params": {
+                    "schema": {"type": "dict", "value": schema}
+                }
+            }
+        }
+    }
+}
+
+response = requests.post("http://localhost:11235/crawl", json=payload)
+data = response.json()
+extracted = json.loads(data["results"][0]["extracted_content"])
+```
+
+### MCP (Model Context Protocol) Integration
+
+```bash
+# Add Crawl4AI as MCP provider to Claude Code
+claude mcp add --transport sse c4ai-sse http://localhost:11235/mcp/sse
+
+# List MCP providers
+claude mcp list
+
+# Test MCP connection
+python tests/mcp/test_mcp_socket.py
+
+# Available MCP endpoints
+# SSE: http://localhost:11235/mcp/sse
+# WebSocket: ws://localhost:11235/mcp/ws
+# Schema: http://localhost:11235/mcp/schema
+```
+
+Available MCP tools:
+- `md` - Generate markdown from web content
+- `html` - Extract preprocessed HTML  
+- `screenshot` - Capture webpage screenshots
+- `pdf` - Generate PDF documents
+- `execute_js` - Run JavaScript on web pages
+- `crawl` - Perform multi-URL crawling
+- `ask` - Query Crawl4AI library context
+
+### Configuration Management
+
+```yaml
+# config.yml structure
+app:
+  title: "Crawl4AI API"
+  version: "1.0.0"
+  host: "0.0.0.0"
+  port: 11235
+  timeout_keep_alive: 300
+
+llm:
+  provider: "openai/gpt-4o-mini"
+  api_key_env: "OPENAI_API_KEY"
+
+security:
+  enabled: false
+  jwt_enabled: false
+  trusted_hosts: ["*"]
+
+crawler:
+  memory_threshold_percent: 95.0
+  rate_limiter:
+    base_delay: [1.0, 2.0]
+  timeouts:
+    stream_init: 30.0
+    batch_process: 300.0
+  pool:
+    max_pages: 40
+    idle_ttl_sec: 1800
+
+rate_limiting:
+  enabled: true
+  default_limit: "1000/minute"
+  storage_uri: "memory://"
+
+logging:
+  level: "INFO"
+  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+```
+
+### Custom Configuration Deployment
+
+```bash
+# Method 1: Mount custom config
+docker run -d -p 11235:11235 \
+  --name crawl4ai-custom \
+  --env-file .llm.env \
+  --shm-size=1g \
+  -v $(pwd)/my-config.yml:/app/config.yml \
+  unclecode/crawl4ai:latest
+
+# Method 2: Build with custom config
+# Edit deploy/docker/config.yml then build
+docker buildx build -t crawl4ai-custom:latest --load .
+```
+
+### Monitoring and Health Checks
+
+```bash
+# Health endpoint
+curl http://localhost:11235/health
+
+# Prometheus metrics
+curl http://localhost:11235/metrics
+
+# Configuration validation
+curl -X POST http://localhost:11235/config/dump \
+  -H "Content-Type: application/json" \
+  -d '{"code": "CrawlerRunConfig(cache_mode=\"BYPASS\", screenshot=True)"}'
+```
+
+### Playground Interface
+
+Access the interactive playground at `http://localhost:11235/playground` for:
+- Testing configurations with visual interface
+- Generating JSON payloads for REST API
+- Converting Python config to JSON format
+- Testing crawl operations directly in browser
+
+### Async Job Processing
+
+```python
+# Submit job for async processing
+import time
+
+# Submit crawl job
+response = requests.post("http://localhost:11235/crawl/job", json=payload)
+task_id = response.json()["task_id"]
+
+# Poll for completion
+while True:
+    result = requests.get(f"http://localhost:11235/crawl/job/{task_id}")
+    status = result.json()
+    
+    if status["status"] in ["COMPLETED", "FAILED"]:
+        break
+    time.sleep(1.5)
+
+print("Final result:", status)
+```
+
+### Production Deployment
+
+```bash
+# Production-ready deployment
+docker run -d \
+  --name crawl4ai-prod \
+  --restart unless-stopped \
+  -p 11235:11235 \
+  --env-file .llm.env \
+  --shm-size=2g \
+  --memory=8g \
+  --cpus=4 \
+  -v /path/to/custom-config.yml:/app/config.yml \
+  unclecode/crawl4ai:latest
+
+# With Docker Compose for production
+version: '3.8'
+services:
+  crawl4ai:
+    image: unclecode/crawl4ai:latest
+    ports:
+      - "11235:11235"
+    environment:
+      - OPENAI_API_KEY=${OPENAI_API_KEY}
+    volumes:
+      - ./config.yml:/app/config.yml
+    shm_size: 2g
+    deploy:
+      resources:
+        limits:
+          memory: 8G
+          cpus: '4'
+    restart: unless-stopped
+```
+
+### Configuration Validation and JSON Structure
+
+```python
+# Method 1: Create config objects and dump to see expected JSON structure
+from crawl4ai import BrowserConfig, CrawlerRunConfig, LLMConfig, CacheMode
+from crawl4ai.extraction_strategy import JsonCssExtractionStrategy, LLMExtractionStrategy
+import json
+
+# Create browser config and see JSON structure
+browser_config = BrowserConfig(
+    headless=True,
+    viewport_width=1280,
+    viewport_height=720,
+    proxy="http://user:pass@proxy:8080"
+)
+
+# Get JSON structure
+browser_json = browser_config.dump()
+print("BrowserConfig JSON structure:")
+print(json.dumps(browser_json, indent=2))
+
+# Create crawler config with extraction strategy
+schema = {
+    "name": "Articles",
+    "baseSelector": ".article",
+    "fields": [
+        {"name": "title", "selector": "h2", "type": "text"},
+        {"name": "content", "selector": ".content", "type": "html"}
+    ]
+}
+
+crawler_config = CrawlerRunConfig(
+    cache_mode=CacheMode.BYPASS,
+    screenshot=True,
+    extraction_strategy=JsonCssExtractionStrategy(schema),
+    js_code=["window.scrollTo(0, document.body.scrollHeight);"],
+    wait_for="css:.loaded"
+)
+
+crawler_json = crawler_config.dump()
+print("\nCrawlerRunConfig JSON structure:")
+print(json.dumps(crawler_json, indent=2))
+```
+
+### Reverse Validation - JSON to Objects
+
+```python
+# Method 2: Load JSON back to config objects for validation
+from crawl4ai.async_configs import from_serializable_dict
+
+# Test JSON structure by converting back to objects
+test_browser_json = {
+    "type": "BrowserConfig",
+    "params": {
+        "headless": True,
+        "viewport_width": 1280,
+        "proxy": "http://user:pass@proxy:8080"
+    }
+}
+
+try:
+    # Convert JSON back to object
+    restored_browser = from_serializable_dict(test_browser_json)
+    print(f"✅ Valid BrowserConfig: {type(restored_browser)}")
+    print(f"Headless: {restored_browser.headless}")
+    print(f"Proxy: {restored_browser.proxy}")
+except Exception as e:
+    print(f"❌ Invalid BrowserConfig JSON: {e}")
+
+# Test complex crawler config JSON
+test_crawler_json = {
+    "type": "CrawlerRunConfig", 
+    "params": {
+        "cache_mode": "bypass",
+        "screenshot": True,
+        "extraction_strategy": {
+            "type": "JsonCssExtractionStrategy",
+            "params": {
+                "schema": {
+                    "type": "dict",
+                    "value": {
+                        "name": "Products",
+                        "baseSelector": ".product",
+                        "fields": [
+                            {"name": "title", "selector": "h3", "type": "text"}
+                        ]
+                    }
+                }
+            }
+        }
+    }
+}
+
+try:
+    restored_crawler = from_serializable_dict(test_crawler_json)
+    print(f"✅ Valid CrawlerRunConfig: {type(restored_crawler)}")
+    print(f"Cache mode: {restored_crawler.cache_mode}")
+    print(f"Has extraction strategy: {restored_crawler.extraction_strategy is not None}")
+except Exception as e:
+    print(f"❌ Invalid CrawlerRunConfig JSON: {e}")
+```
+
+### Using Server's /config/dump Endpoint for Validation
+
+```python
+import requests
+
+# Method 3: Use server endpoint to validate configuration syntax
+def validate_config_with_server(config_code: str) -> dict:
+    """Validate configuration using server's /config/dump endpoint"""
+    response = requests.post(
+        "http://localhost:11235/config/dump",
+        json={"code": config_code}
+    )
+    
+    if response.status_code == 200:
+        print("✅ Valid configuration syntax")
+        return response.json()
+    else:
+        print(f"❌ Invalid configuration: {response.status_code}")
+        print(response.json())
+        return None
+
+# Test valid configuration
+valid_config = """
+CrawlerRunConfig(
+    cache_mode=CacheMode.BYPASS,
+    screenshot=True,
+    js_code=["window.scrollTo(0, document.body.scrollHeight);"],
+    wait_for="css:.content-loaded"
+)
+"""
+
+result = validate_config_with_server(valid_config)
+if result:
+    print("Generated JSON structure:")
+    print(json.dumps(result, indent=2))
+
+# Test invalid configuration (should fail)
+invalid_config = """
+CrawlerRunConfig(
+    cache_mode="invalid_mode",
+    screenshot=True,
+    js_code=some_function()  # This will fail
+)
+"""
+
+validate_config_with_server(invalid_config)
+```
+
+### Configuration Builder Helper
+
+```python
+def build_and_validate_request(urls, browser_params=None, crawler_params=None):
+    """Helper to build and validate complete request payload"""
+    
+    # Create configurations
+    browser_config = BrowserConfig(**(browser_params or {}))
+    crawler_config = CrawlerRunConfig(**(crawler_params or {}))
+    
+    # Build complete request payload
+    payload = {
+        "urls": urls if isinstance(urls, list) else [urls],
+        "browser_config": browser_config.dump(),
+        "crawler_config": crawler_config.dump()
+    }
+    
+    print("✅ Complete request payload:")
+    print(json.dumps(payload, indent=2))
+    
+    # Validate by attempting to reconstruct
+    try:
+        test_browser = from_serializable_dict(payload["browser_config"])
+        test_crawler = from_serializable_dict(payload["crawler_config"])
+        print("✅ Payload validation successful")
+        return payload
+    except Exception as e:
+        print(f"❌ Payload validation failed: {e}")
+        return None
+
+# Example usage
+payload = build_and_validate_request(
+    urls=["https://example.com"],
+    browser_params={"headless": True, "viewport_width": 1280},
+    crawler_params={
+        "cache_mode": CacheMode.BYPASS,
+        "screenshot": True,
+        "word_count_threshold": 10
+    }
+)
+
+if payload:
+    # Send to server
+    response = requests.post("http://localhost:11235/crawl", json=payload)
+    print(f"Server response: {response.status_code}")
+```
+
+### Common JSON Structure Patterns
+
+```python
+# Pattern 1: Simple primitive values
+simple_config = {
+    "type": "CrawlerRunConfig",
+    "params": {
+        "cache_mode": "bypass",  # String enum value
+        "screenshot": True,      # Boolean
+        "page_timeout": 60000   # Integer
+    }
+}
+
+# Pattern 2: Nested objects
+nested_config = {
+    "type": "CrawlerRunConfig", 
+    "params": {
+        "extraction_strategy": {
+            "type": "LLMExtractionStrategy",
+            "params": {
+                "llm_config": {
+                    "type": "LLMConfig",
+                    "params": {
+                        "provider": "openai/gpt-4o-mini",
+                        "api_token": "env:OPENAI_API_KEY"
+                    }
+                },
+                "instruction": "Extract main content"
+            }
+        }
+    }
+}
+
+# Pattern 3: Dictionary values (must use type: dict wrapper)
+dict_config = {
+    "type": "CrawlerRunConfig",
+    "params": {
+        "extraction_strategy": {
+            "type": "JsonCssExtractionStrategy", 
+            "params": {
+                "schema": {
+                    "type": "dict",  # Required wrapper
+                    "value": {       # Actual dictionary content
+                        "name": "Products",
+                        "baseSelector": ".product",
+                        "fields": [
+                            {"name": "title", "selector": "h2", "type": "text"}
+                        ]
+                    }
+                }
+            }
+        }
+    }
+}
+
+# Pattern 4: Lists and arrays
+list_config = {
+    "type": "CrawlerRunConfig",
+    "params": {
+        "js_code": [  # Lists are handled directly
+            "window.scrollTo(0, document.body.scrollHeight);",
+            "document.querySelector('.load-more')?.click();"
+        ],
+        "excluded_tags": ["script", "style", "nav"]
+    }
+}
+```
+
+### Troubleshooting Common JSON Errors
+
+```python
+def diagnose_json_errors():
+    """Common JSON structure errors and fixes"""
+    
+    # ❌ WRONG: Missing type wrapper for objects
+    wrong_config = {
+        "browser_config": {
+            "headless": True  # Missing type wrapper
+        }
+    }
+    
+    # ✅ CORRECT: Proper type wrapper
+    correct_config = {
+        "browser_config": {
+            "type": "BrowserConfig",
+            "params": {
+                "headless": True
+            }
+        }
+    }
+    
+    # ❌ WRONG: Dictionary without type: dict wrapper  
+    wrong_dict = {
+        "schema": {
+            "name": "Products"  # Raw dict, should be wrapped
+        }
+    }
+    
+    # ✅ CORRECT: Dictionary with proper wrapper
+    correct_dict = {
+        "schema": {
+            "type": "dict",
+            "value": {
+                "name": "Products"
+            }
+        }
+    }
+    
+    # ❌ WRONG: Invalid enum string
+    wrong_enum = {
+        "cache_mode": "DISABLED"  # Wrong case/value
+    }
+    
+    # ✅ CORRECT: Valid enum string  
+    correct_enum = {
+        "cache_mode": "bypass"  # or "enabled", "disabled", etc.
+    }
+    
+    print("Common error patterns documented above")
+
+# Validate your JSON structure before sending
+def pre_flight_check(payload):
+    """Run checks before sending to server"""
+    required_keys = ["urls", "browser_config", "crawler_config"]
+    
+    for key in required_keys:
+        if key not in payload:
+            print(f"❌ Missing required key: {key}")
+            return False
+    
+    # Check type wrappers
+    for config_key in ["browser_config", "crawler_config"]:
+        config = payload[config_key]
+        if not isinstance(config, dict) or "type" not in config:
+            print(f"❌ {config_key} missing type wrapper")
+            return False
+        if "params" not in config:
+            print(f"❌ {config_key} missing params")
+            return False
+    
+    print("✅ Pre-flight check passed")
+    return True
+
+# Example usage
+payload = {
+    "urls": ["https://example.com"],
+    "browser_config": {"type": "BrowserConfig", "params": {"headless": True}},
+    "crawler_config": {"type": "CrawlerRunConfig", "params": {"cache_mode": "bypass"}}
+}
+
+if pre_flight_check(payload):
+    # Safe to send to server
+    pass
+```
+
+**📖 Learn more:** [Complete Docker Guide](https://docs.crawl4ai.com/core/docker-deployment/), [API Reference](https://docs.crawl4ai.com/api/), [MCP Integration](https://docs.crawl4ai.com/core/docker-deployment/#mcp-model-context-protocol-support), [Configuration Options](https://docs.crawl4ai.com/core/docker-deployment/#server-configuration)