## Multi-URL Crawling Workflows and Architecture

Visual representations of concurrent crawling patterns, resource management, and monitoring systems for handling multiple URLs efficiently.

### Multi-URL Processing Modes
```mermaid
flowchart TD
    A[Multi-URL Crawling Request] --> B{Processing Mode?}

    B -->|Batch Mode| C[Collect All URLs]
    B -->|Streaming Mode| D[Process URLs Individually]

    C --> C1[Queue All URLs]
    C1 --> C2[Execute Concurrently]
    C2 --> C3[Wait for All Completion]
    C3 --> C4[Return Complete Results Array]

    D --> D1[Queue URLs]
    D1 --> D2[Start First Batch]
    D2 --> D3[Yield Results as Available]
    D3 --> D4{More URLs?}
    D4 -->|Yes| D5[Start Next URLs]
    D4 -->|No| D6[Stream Complete]
    D5 --> D3

    C4 --> E[Process Results]
    D6 --> E

    E --> F[Success/Failure Analysis]
    F --> G[End]

    style C fill:#e3f2fd
    style D fill:#f3e5f5
    style C4 fill:#c8e6c9
    style D6 fill:#c8e6c9
```

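In code, both modes go through the same `arun_many()` call; setting `stream=True` on `CrawlerRunConfig` switches the return value from a complete results list to an async generator. A minimal sketch (the URLs are placeholders):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig

async def main():
    urls = [f"https://example.com/page/{i}" for i in range(1, 6)]  # placeholder URLs

    async with AsyncWebCrawler() as crawler:
        # Batch mode: arun_many() waits for everything, then returns one list
        batch_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
        results = await crawler.arun_many(urls, config=batch_config)
        print(f"Batch: {sum(r.success for r in results)}/{len(results)} succeeded")

        # Streaming mode: stream=True yields each result as soon as it finishes
        stream_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS, stream=True)
        async for result in await crawler.arun_many(urls, config=stream_config):
            status = "OK" if result.success else f"FAIL: {result.error_message}"
            print(f"{result.url} -> {status}")

asyncio.run(main())
```
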
### Memory-Adaptive Dispatcher Flow
```mermaid
stateDiagram-v2
    [*] --> Initializing

    Initializing --> MonitoringMemory: Start dispatcher

    MonitoringMemory --> CheckingMemory: Every check_interval
    CheckingMemory --> MemoryOK: Memory < threshold
    CheckingMemory --> MemoryHigh: Memory >= threshold

    MemoryOK --> DispatchingTasks: Start new crawls
    MemoryHigh --> WaitingForMemory: Pause dispatching

    DispatchingTasks --> TaskRunning: Launch crawler
    TaskRunning --> TaskCompleted: Crawl finished
    TaskRunning --> TaskFailed: Crawl error

    TaskCompleted --> MonitoringMemory: Update stats
    TaskFailed --> MonitoringMemory: Update stats

    WaitingForMemory --> CheckingMemory: Wait timeout
    WaitingForMemory --> MonitoringMemory: Memory freed

    note right of MemoryHigh: Prevents OOM crashes
    note right of DispatchingTasks: Respects max_session_permit
    note right of WaitingForMemory: Configurable timeout
```

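Each state above maps to a `MemoryAdaptiveDispatcher` knob: `memory_threshold_percent` sets the pause point, `check_interval` the monitoring cadence, and `max_session_permit` the dispatch ceiling. A minimal sketch (values are illustrative; import path per the Crawl4AI docs):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher

async def main():
    dispatcher = MemoryAdaptiveDispatcher(
        memory_threshold_percent=70.0,  # pause new tasks above this system memory usage
        check_interval=1.0,             # seconds between memory checks
        max_session_permit=10,          # hard ceiling on concurrent crawl sessions
    )
    urls = [f"https://example.com/page/{i}" for i in range(100)]  # placeholders
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(
            urls, config=CrawlerRunConfig(), dispatcher=dispatcher
        )
        print(f"{sum(r.success for r in results)} of {len(results)} succeeded")

asyncio.run(main())
```
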
### Concurrent Crawling Architecture
```mermaid
graph TB
    subgraph "URL Queue Management"
        A[URL Input List] --> B[URL Queue]
        B --> C[Priority Scheduler]
        C --> D[Batch Assignment]
    end

    subgraph "Dispatcher Layer"
        E[Memory Adaptive Dispatcher]
        F[Semaphore Dispatcher]
        G[Rate Limiter]
        H[Resource Monitor]

        E --> I[Memory Checker]
        F --> J[Concurrency Controller]
        G --> K[Delay Calculator]
        H --> L[System Stats]
    end

    subgraph "Crawler Pool"
        M[Crawler Instance 1]
        N[Crawler Instance 2]
        O[Crawler Instance 3]
        P[Crawler Instance N]

        M --> Q[Browser Session 1]
        N --> R[Browser Session 2]
        O --> S[Browser Session 3]
        P --> T[Browser Session N]
    end

    subgraph "Result Processing"
        U[Result Collector]
        V[Success Handler]
        W[Error Handler]
        X[Retry Queue]
        Y[Final Results]
    end

    D --> E
    D --> F
    E --> M
    F --> N
    G --> O
    H --> P

    Q --> U
    R --> U
    S --> U
    T --> U

    U --> V
    U --> W
    W --> X
    X --> B
    V --> Y

    style E fill:#e3f2fd
    style F fill:#f3e5f5
    style G fill:#e8f5e8
    style H fill:#fff3e0
```

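The dispatcher layer is pluggable: when resources are fixed and predictable, the `SemaphoreDispatcher` in the diagram replaces memory tracking with a constant concurrency cap. A sketch (this constructor parameter has also appeared as `semaphore_count` in some releases, so verify against your installed version):

```python
from crawl4ai.async_dispatcher import RateLimiter, SemaphoreDispatcher

# Fixed-concurrency alternative: at most 5 crawl sessions in flight at once,
# with a light per-request delay to stay polite
dispatcher = SemaphoreDispatcher(
    max_session_permit=5,
    rate_limiter=RateLimiter(base_delay=(0.5, 1.0), max_delay=10.0),
)
# Passed to arun_many() exactly like MemoryAdaptiveDispatcher:
# results = await crawler.arun_many(urls, config=config, dispatcher=dispatcher)
```
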
### Rate Limiting and Backoff Strategy
```mermaid
sequenceDiagram
    participant C as Crawler
    participant RL as Rate Limiter
    participant S as Server
    participant D as Dispatcher

    C->>RL: Request to crawl URL
    RL->>RL: Calculate delay
    RL->>RL: Apply base delay (1-3s)
    RL->>C: Delay applied

    C->>S: HTTP Request

    alt Success Response
        S-->>C: 200 OK + Content
        C->>RL: Report success
        RL->>RL: Reset failure count
        C->>D: Return successful result
    else Rate Limited
        S-->>C: 429 Too Many Requests
        C->>RL: Report rate limit
        RL->>RL: Exponential backoff
        RL->>RL: Increase delay (up to max_delay)
        RL->>C: Apply longer delay
        C->>S: Retry request after delay
    else Server Error
        S-->>C: 503 Service Unavailable
        C->>RL: Report server error
        RL->>RL: Moderate backoff
        RL->>C: Retry with backoff
    else Max Retries Exceeded
        RL->>C: Stop retrying
        C->>D: Return failed result
    end
```

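This handshake corresponds to the `RateLimiter` settings: a random base delay before each request, exponential backoff capped at `max_delay` for the listed status codes, and a hard stop after `max_retries`. A sketch of wiring it into a dispatcher:

```python
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher, RateLimiter

# Backoff policy matching the sequence above
rate_limiter = RateLimiter(
    base_delay=(1.0, 3.0),       # random pre-request delay, in seconds
    max_delay=60.0,              # exponential backoff never exceeds this ceiling
    max_retries=3,               # then the URL is returned as a failed result
    rate_limit_codes=[429, 503], # status codes that trigger backoff
)
dispatcher = MemoryAdaptiveDispatcher(rate_limiter=rate_limiter)
```
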
### Large-Scale Crawling Workflow
```mermaid
flowchart TD
    A[Load URL List 10k+ URLs] --> B[Initialize Dispatcher]

    B --> C{Select Dispatcher Type}
    C -->|Memory Constrained| D[Memory Adaptive]
    C -->|Fixed Resources| E[Semaphore Based]

    D --> F[Set Memory Threshold 70%]
    E --> G[Set Concurrency Limit]

    F --> H[Configure Monitoring]
    G --> H

    H --> I[Start Crawling Process]
    I --> J[Monitor System Resources]

    J --> K{Memory Usage?}
    K -->|< Threshold| L[Continue Dispatching]
    K -->|>= Threshold| M[Pause New Tasks]

    L --> N[Process Results Stream]
    M --> O[Wait for Memory]
    O --> K

    N --> P{Result Type?}
    P -->|Success| Q[Save to Database]
    P -->|Failure| R[Log Error]

    Q --> S[Update Progress Counter]
    R --> S

    S --> T{More URLs?}
    T -->|Yes| U[Get Next Batch]
    T -->|No| V[Generate Final Report]

    U --> L
    V --> W[Analysis Complete]

    style A fill:#e1f5fe
    style D fill:#e8f5e8
    style E fill:#f3e5f5
    style V fill:#c8e6c9
    style W fill:#a5d6a7
```

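Putting the pieces together for a large list: stream results so they never accumulate in memory, and let the memory-adaptive dispatcher throttle dispatching. A sketch, where `save_to_database` is a hypothetical persistence hook, not a Crawl4AI API:

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher

def save_to_database(result):
    """Hypothetical persistence hook; swap in your real storage layer."""
    pass

async def crawl_large_list(urls):
    dispatcher = MemoryAdaptiveDispatcher(
        memory_threshold_percent=70.0,
        max_session_permit=15,
    )
    config = CrawlerRunConfig(stream=True)  # stream so results never pile up in RAM
    succeeded = failed = 0

    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun_many(
            urls, config=config, dispatcher=dispatcher
        ):
            if result.success:
                succeeded += 1
                save_to_database(result)
            else:
                failed += 1
                print(f"FAILED {result.url}: {result.error_message}")
    print(f"Done: {succeeded} ok, {failed} failed of {len(urls)} URLs")
```
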
### Real-Time Monitoring Dashboard Flow
```mermaid
graph LR
    subgraph "Data Collection"
        A[Crawler Tasks] --> B[Performance Metrics]
        A --> C[Memory Usage]
        A --> D[Success/Failure Rates]
        A --> E[Response Times]
    end

    subgraph "Monitor Processing"
        F[CrawlerMonitor] --> G[Aggregate Statistics]
        F --> H[Display Formatter]
        F --> I[Update Scheduler]
    end

    subgraph "Display Modes"
        J[DETAILED Mode]
        K[AGGREGATED Mode]

        J --> L[Individual Task Status]
        J --> M[Task-Level Metrics]
        K --> N[Summary Statistics]
        K --> O[Overall Progress]
    end

    subgraph "Output Interface"
        P[Console Display]
        Q[Progress Bars]
        R[Status Tables]
        S[Real-time Updates]
    end

    B --> F
    C --> F
    D --> F
    E --> F

    G --> J
    G --> K
    H --> J
    H --> K
    I --> J
    I --> K

    L --> P
    M --> Q
    N --> R
    O --> S

    style F fill:#e3f2fd
    style J fill:#f3e5f5
    style K fill:#e8f5e8
```

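`CrawlerMonitor` attaches to a dispatcher and renders the console dashboard; `DisplayMode.DETAILED` shows per-task rows, while `DisplayMode.AGGREGATED` shows only summary statistics. A sketch (the monitor's constructor has varied across releases, so treat the parameters as illustrative):

```python
from crawl4ai import CrawlerMonitor, DisplayMode
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher

# Live console dashboard: DETAILED = one row per task, AGGREGATED = totals only
monitor = CrawlerMonitor(
    max_visible_rows=15,
    display_mode=DisplayMode.DETAILED,
)
dispatcher = MemoryAdaptiveDispatcher(monitor=monitor)
# The dispatcher is then passed to arun_many() as in the earlier sketches.
```
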
### Error Handling and Recovery Pattern
```mermaid
stateDiagram-v2
    [*] --> ProcessingURL

    ProcessingURL --> CrawlAttempt: Start crawl

    CrawlAttempt --> Success: HTTP 200
    CrawlAttempt --> NetworkError: Connection failed
    CrawlAttempt --> RateLimit: HTTP 429
    CrawlAttempt --> ServerError: HTTP 5xx
    CrawlAttempt --> Timeout: Request timeout

    Success --> [*]: Return result

    NetworkError --> RetryCheck: Check retry count
    RateLimit --> BackoffWait: Apply exponential backoff
    ServerError --> RetryCheck: Check retry count
    Timeout --> RetryCheck: Check retry count

    BackoffWait --> RetryCheck: After delay

    RetryCheck --> CrawlAttempt: retries < max_retries
    RetryCheck --> Failed: retries >= max_retries

    Failed --> ErrorLog: Log failure details
    ErrorLog --> [*]: Return failed result

    note right of BackoffWait: Exponential backoff for rate limits
    note right of RetryCheck: Configurable max_retries
    note right of ErrorLog: Detailed error tracking
```

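Failures surface on the returned `CrawlResult` objects rather than as exceptions, so post-crawl triage is a plain loop over `success`, `status_code`, and `error_message`. A sketch of sorting outcomes for retry:

```python
from collections import Counter

def triage(results):
    """Sort finished crawl results by outcome for follow-up handling."""
    outcomes = Counter()
    retry_candidates = []
    for r in results:
        if r.success:
            outcomes["ok"] += 1
        elif r.status_code in (429, 503):
            outcomes["throttled"] += 1
            retry_candidates.append(r.url)  # worth re-queueing after a cool-down
        else:
            outcomes["failed"] += 1
            print(f"{r.url}: {r.error_message}")
    return outcomes, retry_candidates
```
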
### Resource Management Timeline
```mermaid
gantt
    title Multi-URL Crawling Resource Management
    dateFormat X
    axisFormat %s

    section Memory Usage
    Initialize Dispatcher :0, 1
    Memory Monitoring :1, 10
    Peak Usage Period :3, 7
    Memory Cleanup :7, 9

    section Task Execution
    URL Queue Setup :0, 2
    Batch 1 Processing :2, 5
    Batch 2 Processing :4, 7
    Batch 3 Processing :6, 9
    Final Results :9, 10

    section Rate Limiting
    Normal Delays :2, 4
    Backoff Period :4, 6
    Recovery Period :6, 8

    section Monitoring
    System Health Check :0, 10
    Progress Updates :1, 9
    Performance Metrics :2, 8
```

### Concurrent Processing Performance Matrix
```mermaid
graph TD
    subgraph "Input Factors"
        A[Number of URLs]
        B[Concurrency Level]
        C[Memory Threshold]
        D[Rate Limiting]
    end

    subgraph "Processing Characteristics"
        A --> E[Low 1-100 URLs]
        A --> F[Medium 100-1k URLs]
        A --> G[High 1k-10k URLs]
        A --> H[Very High 10k+ URLs]

        B --> I[Conservative 1-5]
        B --> J[Moderate 5-15]
        B --> K[Aggressive 15-30]

        C --> L[Strict 60-70%]
        C --> M[Balanced 70-80%]
        C --> N[Relaxed 80-90%]
    end

    subgraph "Recommended Configurations"
        E --> O[Simple Semaphore]
        F --> P[Memory Adaptive Basic]
        G --> Q[Memory Adaptive Advanced]
        H --> R[Memory Adaptive + Monitoring]

        I --> O
        J --> P
        K --> Q
        K --> R

        L --> Q
        M --> P
        N --> O
    end

    style O fill:#c8e6c9
    style P fill:#fff3e0
    style Q fill:#ffecb3
    style R fill:#ffcdd2
```

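One way to apply the matrix is a small selection helper. The sketch below is a hypothetical `choose_dispatcher` illustrating the recommended tiers, not a Crawl4AI API; thresholds and limits are the illustrative values from the diagram:

```python
from crawl4ai import CrawlerMonitor, DisplayMode
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher, SemaphoreDispatcher

def choose_dispatcher(url_count):
    """Hypothetical helper mapping the matrix above onto a dispatcher setup."""
    if url_count <= 100:
        # Low volume: a fixed semaphore with conservative concurrency is enough
        return SemaphoreDispatcher(max_session_permit=5)
    if url_count <= 10_000:
        # Medium/high volume: let memory pressure throttle dispatching
        return MemoryAdaptiveDispatcher(
            memory_threshold_percent=70.0,
            max_session_permit=15,
        )
    # Very high volume: add live monitoring on top of memory adaptation
    return MemoryAdaptiveDispatcher(
        memory_threshold_percent=70.0,
        max_session_permit=20,
        monitor=CrawlerMonitor(display_mode=DisplayMode.AGGREGATED),
    )
```
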
**📖 Learn more:** [Multi-URL Crawling Guide](https://docs.crawl4ai.com/advanced/multi-url-crawling/), [Dispatcher Configuration](https://docs.crawl4ai.com/advanced/crawl-dispatcher/), [Performance Optimization](https://docs.crawl4ai.com/advanced/multi-url-crawling/#performance-optimization)