crawl4ai/docs/md_v2/assets/llm.txt/diagrams/docker.txt

## Docker Deployment Architecture and Workflows

Visual representations of Crawl4AI Docker deployment, API architecture, configuration management, and service interactions.

### Docker Deployment Decision Flow

```mermaid
flowchart TD
    A[Start Docker Deployment] --> B{Deployment Type?}

    B -->|Quick Start| C[Pre-built Image]
    B -->|Development| D[Docker Compose]
    B -->|Custom Build| E[Manual Build]
    B -->|Production| F[Production Setup]

    C --> C1[docker pull unclecode/crawl4ai]
    C1 --> C2{Need LLM Support?}
    C2 -->|Yes| C3[Setup .llm.env]
    C2 -->|No| C4[Basic run]
    C3 --> C5[docker run with --env-file]
    C4 --> C6[docker run basic]

    D --> D1[git clone repository]
    D1 --> D2[cp .llm.env.example .llm.env]
    D2 --> D3{Build Type?}
    D3 -->|Pre-built| D4[IMAGE=latest docker compose up]
    D3 -->|Local Build| D5[docker compose up --build]
    D3 -->|All Features| D6[INSTALL_TYPE=all docker compose up]

    E --> E1[docker buildx build]
    E1 --> E2{Architecture?}
    E2 -->|Single| E3[--platform linux/amd64]
    E2 -->|Multi| E4[--platform linux/amd64,linux/arm64]
    E3 --> E5[Build complete]
    E4 --> E5

    F --> F1[Production configuration]
    F1 --> F2[Custom config.yml]
    F2 --> F3[Resource limits]
    F3 --> F4[Health monitoring]
    F4 --> F5[Production ready]

    C5 --> G[Service running on :11235]
    C6 --> G
    D4 --> G
    D5 --> G
    D6 --> G
    E5 --> H[docker run custom image]
    H --> G
    F5 --> I[Production deployment]

    G --> J[Access playground at /playground]
    G --> K[Health check at /health]
    I --> L[Production monitoring]

    style A fill:#e1f5fe
    style G fill:#c8e6c9
    style I fill:#c8e6c9
    style J fill:#fff3e0
    style K fill:#fff3e0
    style L fill:#e8f5e8
```

### Docker Container Architecture

```mermaid
graph TB
    subgraph "Host Environment"
        A[Docker Engine] --> B[Crawl4AI Container]
        C[.llm.env] --> B
        D[Custom config.yml] --> B
        E[Port 11235] --> B
        F[Shared Memory 1GB+] --> B
    end

    subgraph "Container Services"
        B --> G[FastAPI Server :8020]
        B --> H[Gunicorn WSGI]
        B --> I[Supervisord Process Manager]
        B --> J[Redis Cache :6379]

        G --> K[REST API Endpoints]
        G --> L[WebSocket Connections]
        G --> M[MCP Protocol]

        H --> N[Worker Processes]
        I --> O[Service Monitoring]
        J --> P[Request Caching]
    end

    subgraph "Browser Management"
        B --> Q[Playwright Framework]
        Q --> R[Chromium Browser]
        Q --> S[Firefox Browser]
        Q --> T[WebKit Browser]

        R --> U[Browser Pool]
        S --> U
        T --> U

        U --> V[Page Sessions]
        U --> W[Context Management]
    end

    subgraph "External Services"
        X[OpenAI API] -.-> K
        Y[Anthropic Claude] -.-> K
        Z[Local Ollama] -.-> K
        AA[Groq API] -.-> K
        BB[Google Gemini] -.-> K
    end

    subgraph "Client Interactions"
        CC[Python SDK] --> K
        DD[REST API Calls] --> K
        EE[MCP Clients] --> M
        FF[Web Browser] --> G
        GG[Monitoring Tools] --> K
    end

    style B fill:#e3f2fd
    style G fill:#f3e5f5
    style Q fill:#e8f5e8
    style K fill:#fff3e0
```

### API Endpoints Architecture

```mermaid
graph LR
    subgraph "Core Endpoints"
        A[/crawl] --> A1[Single URL crawl]
        A2[/crawl/stream] --> A3[Streaming multi-URL]
        A4[/crawl/job] --> A5[Async job submission]
        A6[/crawl/job/{id}] --> A7[Job status check]
    end

    subgraph "Specialized Endpoints"
        B[/html] --> B1[Preprocessed HTML]
        B2[/screenshot] --> B3[PNG capture]
        B4[/pdf] --> B5[PDF generation]
        B6[/execute_js] --> B7[JavaScript execution]
        B8[/md] --> B9[Markdown extraction]
    end

    subgraph "Utility Endpoints"
        C[/health] --> C1[Service status]
        C2[/metrics] --> C3[Prometheus metrics]
        C4[/schema] --> C5[API documentation]
        C6[/playground] --> C7[Interactive testing]
    end

    subgraph "LLM Integration"
        D[/llm/{url}] --> D1[Q&A over URL]
        D2[/ask] --> D3[Library context search]
        D4[/config/dump] --> D5[Config validation]
    end

    subgraph "MCP Protocol"
        E[/mcp/sse] --> E1[Server-Sent Events]
        E2[/mcp/ws] --> E3[WebSocket connection]
        E4[/mcp/schema] --> E5[MCP tool definitions]
    end

    style A fill:#e3f2fd
    style B fill:#f3e5f5
    style C fill:#e8f5e8
    style D fill:#fff3e0
    style E fill:#fce4ec
```

### Request Processing Flow

```mermaid
sequenceDiagram
    participant Client
    participant FastAPI
    participant RequestValidator
    participant BrowserPool
    participant Playwright
    participant ExtractionEngine
    participant LLMProvider

    Client->>FastAPI: POST /crawl with config
    FastAPI->>RequestValidator: Validate JSON structure

    alt Valid Request
        RequestValidator-->>FastAPI: ✓ Validated
        FastAPI->>BrowserPool: Request browser instance
        BrowserPool->>Playwright: Launch browser/reuse session
        Playwright-->>BrowserPool: Browser ready
        BrowserPool-->>FastAPI: Browser allocated

        FastAPI->>Playwright: Navigate to URL
        Playwright->>Playwright: Execute JS, wait conditions
        Playwright-->>FastAPI: Page content ready

        FastAPI->>ExtractionEngine: Process content

        alt LLM Extraction
            ExtractionEngine->>LLMProvider: Send content + schema
            LLMProvider-->>ExtractionEngine: Structured data
        else CSS Extraction
            ExtractionEngine->>ExtractionEngine: Apply CSS selectors
        end

        ExtractionEngine-->>FastAPI: Extraction complete
        FastAPI->>BrowserPool: Release browser
        FastAPI-->>Client: CrawlResult response

    else Invalid Request
        RequestValidator-->>FastAPI: ✗ Validation error
        FastAPI-->>Client: 400 Bad Request
    end
```

### Configuration Management Flow

```mermaid
stateDiagram-v2
    [*] --> ConfigLoading

    ConfigLoading --> DefaultConfig: Load default config.yml
    ConfigLoading --> CustomConfig: Custom config mounted
    ConfigLoading --> EnvOverrides: Environment variables

    DefaultConfig --> ConfigMerging
    CustomConfig --> ConfigMerging
    EnvOverrides --> ConfigMerging

    ConfigMerging --> ConfigValidation

    ConfigValidation --> Valid: Schema validation passes
    ConfigValidation --> Invalid: Validation errors

    Invalid --> ConfigError: Log errors and exit
    ConfigError --> [*]

    Valid --> ServiceInitialization
    ServiceInitialization --> FastAPISetup
    ServiceInitialization --> BrowserPoolInit
    ServiceInitialization --> CacheSetup

    FastAPISetup --> Running
    BrowserPoolInit --> Running
    CacheSetup --> Running

    Running --> ConfigReload: Config change detected
    ConfigReload --> ConfigValidation

    Running --> [*]: Service shutdown

    note right of ConfigMerging : Priority: ENV > Custom > Default
    note right of ServiceInitialization : All services must initialize successfully
```

### Multi-Architecture Build Process

```mermaid
flowchart TD
    A[Developer Push] --> B[GitHub Repository]

    B --> C[Docker Buildx]
    C --> D{Build Strategy}

    D -->|Multi-arch| E[Parallel Builds]
    D -->|Single-arch| F[Platform-specific Build]

    E --> G[AMD64 Build]
    E --> H[ARM64 Build]

    F --> I[Target Platform Build]

    subgraph "AMD64 Build Process"
        G --> G1[Ubuntu base image]
        G1 --> G2[Python 3.11 install]
        G2 --> G3[System dependencies]
        G3 --> G4[Crawl4AI installation]
        G4 --> G5[Playwright setup]
        G5 --> G6[FastAPI configuration]
        G6 --> G7[AMD64 image ready]
    end

    subgraph "ARM64 Build Process"
        H --> H1[Ubuntu ARM64 base]
        H1 --> H2[Python 3.11 install]
        H2 --> H3[ARM-specific deps]
        H3 --> H4[Crawl4AI installation]
        H4 --> H5[Playwright setup]
        H5 --> H6[FastAPI configuration]
        H6 --> H7[ARM64 image ready]
    end

    subgraph "Single Architecture"
        I --> I1[Base image selection]
        I1 --> I2[Platform dependencies]
        I2 --> I3[Application setup]
        I3 --> I4[Platform image ready]
    end

    G7 --> J[Multi-arch Manifest]
    H7 --> J
    I4 --> K[Platform Image]

    J --> L[Docker Hub Registry]
    K --> L

    L --> M[Pull Request Auto-selects Architecture]

    style A fill:#e1f5fe
    style J fill:#c8e6c9
    style K fill:#c8e6c9
    style L fill:#f3e5f5
    style M fill:#e8f5e8
```

### MCP Integration Architecture

```mermaid
graph TB
    subgraph "MCP Client Applications"
        A[Claude Code] --> B[MCP Protocol]
        C[Cursor IDE] --> B
        D[Windsurf] --> B
        E[Custom MCP Client] --> B
    end

    subgraph "Crawl4AI MCP Server"
        B --> F[MCP Endpoint Router]
        F --> G[SSE Transport /mcp/sse]
        F --> H[WebSocket Transport /mcp/ws]
        F --> I[Schema Endpoint /mcp/schema]

        G --> J[MCP Tool Handler]
        H --> J

        J --> K[Tool: md]
        J --> L[Tool: html]
        J --> M[Tool: screenshot]
        J --> N[Tool: pdf]
        J --> O[Tool: execute_js]
        J --> P[Tool: crawl]
        J --> Q[Tool: ask]
    end

    subgraph "Crawl4AI Core Services"
        K --> R[Markdown Generator]
        L --> S[HTML Preprocessor]
        M --> T[Screenshot Service]
        N --> U[PDF Generator]
        O --> V[JavaScript Executor]
        P --> W[Batch Crawler]
        Q --> X[Context Search]

        R --> Y[Browser Pool]
        S --> Y
        T --> Y
        U --> Y
        V --> Y
        W --> Y
        X --> Z[Knowledge Base]
    end

    subgraph "External Resources"
        Y --> AA[Playwright Browsers]
        Z --> BB[Library Documentation]
        Z --> CC[Code Examples]
        AA --> DD[Web Pages]
    end

    style B fill:#e3f2fd
    style J fill:#f3e5f5
    style Y fill:#e8f5e8
    style Z fill:#fff3e0
```

### API Request/Response Flow Patterns

```mermaid
sequenceDiagram
    participant Client
    participant LoadBalancer
    participant FastAPI
    participant ConfigValidator
    participant BrowserManager
    participant CrawlEngine
    participant ResponseBuilder

    Note over Client,ResponseBuilder: Basic Crawl Request

    Client->>LoadBalancer: POST /crawl
    LoadBalancer->>FastAPI: Route request

    FastAPI->>ConfigValidator: Validate browser_config
    ConfigValidator-->>FastAPI: ✓ Valid BrowserConfig

    FastAPI->>ConfigValidator: Validate crawler_config
    ConfigValidator-->>FastAPI: ✓ Valid CrawlerRunConfig

    FastAPI->>BrowserManager: Allocate browser
    BrowserManager-->>FastAPI: Browser instance

    FastAPI->>CrawlEngine: Execute crawl

    Note over CrawlEngine: Page processing
    CrawlEngine->>CrawlEngine: Navigate & wait
    CrawlEngine->>CrawlEngine: Extract content
    CrawlEngine->>CrawlEngine: Apply strategies

    CrawlEngine-->>FastAPI: CrawlResult

    FastAPI->>ResponseBuilder: Format response
    ResponseBuilder-->>FastAPI: JSON response

    FastAPI->>BrowserManager: Release browser
    FastAPI-->>LoadBalancer: Response ready
    LoadBalancer-->>Client: 200 OK + CrawlResult

    Note over Client,ResponseBuilder: Streaming Request

    Client->>FastAPI: POST /crawl/stream
    FastAPI-->>Client: 200 OK (stream start)

    loop For each URL
        FastAPI->>CrawlEngine: Process URL
        CrawlEngine-->>FastAPI: Result ready
        FastAPI-->>Client: NDJSON line
    end

    FastAPI-->>Client: Stream completed
```

### Configuration Validation Workflow

```mermaid
flowchart TD
    A[Client Request] --> B[JSON Payload]
    B --> C{Pre-validation}

    C -->|✓ Valid JSON| D[Extract Configurations]
    C -->|✗ Invalid JSON| E[Return 400 Bad Request]

    D --> F[BrowserConfig Validation]
    D --> G[CrawlerRunConfig Validation]

    F --> H{BrowserConfig Valid?}
    G --> I{CrawlerRunConfig Valid?}

    H -->|✓ Valid| J[Browser Setup]
    H -->|✗ Invalid| K[Log Browser Config Errors]

    I -->|✓ Valid| L[Crawler Setup]
    I -->|✗ Invalid| M[Log Crawler Config Errors]

    K --> N[Collect All Errors]
    M --> N
    N --> O[Return 422 Validation Error]

    J --> P{Both Configs Valid?}
    L --> P

    P -->|✓ Yes| Q[Proceed to Crawling]
    P -->|✗ No| O

    Q --> R[Execute Crawl Pipeline]
    R --> S[Return CrawlResult]

    E --> T[Client Error Response]
    O --> T
    S --> U[Client Success Response]

    style A fill:#e1f5fe
    style Q fill:#c8e6c9
    style S fill:#c8e6c9
    style U fill:#c8e6c9
    style E fill:#ffcdd2
    style O fill:#ffcdd2
    style T fill:#ffcdd2
```

### Production Deployment Architecture

```mermaid
graph TB
    subgraph "Load Balancer Layer"
        A[NGINX/HAProxy] --> B[Health Check]
        A --> C[Request Routing]
        A --> D[SSL Termination]
    end

    subgraph "Application Layer"
        C --> E[Crawl4AI Instance 1]
        C --> F[Crawl4AI Instance 2]
        C --> G[Crawl4AI Instance N]

        E --> H[FastAPI Server]
        F --> I[FastAPI Server]
        G --> J[FastAPI Server]

        H --> K[Browser Pool 1]
        I --> L[Browser Pool 2]
        J --> M[Browser Pool N]
    end

    subgraph "Shared Services"
        N[Redis Cluster] --> E
        N --> F
        N --> G

        O[Monitoring Stack] --> P[Prometheus]
        O --> Q[Grafana]
        O --> R[AlertManager]

        P --> E
        P --> F
        P --> G
    end

    subgraph "External Dependencies"
        S[OpenAI API] -.-> H
        T[Anthropic API] -.-> I
        U[Local LLM Cluster] -.-> J
    end

    subgraph "Persistent Storage"
        V[Configuration Volume] --> E
        V --> F
        V --> G

        W[Cache Volume] --> N
        X[Logs Volume] --> O
    end

    style A fill:#e3f2fd
    style E fill:#f3e5f5
    style F fill:#f3e5f5
    style G fill:#f3e5f5
    style N fill:#e8f5e8
    style O fill:#fff3e0
```

### Docker Resource Management

```mermaid
graph TD
    subgraph "Resource Allocation"
        A[Host Resources] --> B[CPU Cores]
        A --> C[Memory GB]
        A --> D[Disk Space]
        A --> E[Network Bandwidth]

        B --> F[Container Limits]
        C --> F
        D --> F
        E --> F
    end

    subgraph "Container Configuration"
        F --> G[--cpus=4]
        F --> H[--memory=8g]
        F --> I[--shm-size=2g]
        F --> J[Volume Mounts]

        G --> K[Browser Processes]
        H --> L[Browser Memory]
        I --> M[Shared Memory for Browsers]
        J --> N[Config & Cache Storage]
    end

    subgraph "Monitoring & Scaling"
        O[Resource Monitor] --> P[CPU Usage %]
        O --> Q[Memory Usage %]
        O --> R[Request Queue Length]

        P --> S{CPU > 80%?}
        Q --> T{Memory > 90%?}
        R --> U{Queue > 100?}

        S -->|Yes| V[Scale Up]
        T -->|Yes| V
        U -->|Yes| V

        V --> W[Add Container Instance]
        W --> X[Update Load Balancer]
    end

    subgraph "Performance Optimization"
        Y[Browser Pool Tuning] --> Z[Max Pages: 40]
        Y --> AA[Idle TTL: 30min]
        Y --> BB[Concurrency Limits]

        Z --> CC[Memory Efficiency]
        AA --> DD[Resource Cleanup]
        BB --> EE[Throughput Control]
    end

    style A fill:#e1f5fe
    style F fill:#f3e5f5
    style O fill:#e8f5e8
    style Y fill:#fff3e0
```

**📖 Learn more:** [Docker Deployment Guide](https://docs.crawl4ai.com/core/docker-deployment/), [API Reference](https://docs.crawl4ai.com/api/), [MCP Integration](https://docs.crawl4ai.com/core/docker-deployment/#mcp-model-context-protocol-support), [Production Configuration](https://docs.crawl4ai.com/core/docker-deployment/#production-deployment)