## Multi-URL Crawling Workflows and Architecture

Visual representations of concurrent crawling patterns, resource management, and monitoring systems for handling multiple URLs efficiently.

### Multi-URL Processing Modes

```mermaid
flowchart TD
    A[Multi-URL Crawling Request] --> B{Processing Mode?}

    B -->|Batch Mode| C[Collect All URLs]
    B -->|Streaming Mode| D[Process URLs Individually]

    C --> C1[Queue All URLs]
    C1 --> C2[Execute Concurrently]
    C2 --> C3[Wait for All Completion]
    C3 --> C4[Return Complete Results Array]

    D --> D1[Queue URLs]
    D1 --> D2[Start First Batch]
    D2 --> D3[Yield Results as Available]
    D3 --> D4{More URLs?}
    D4 -->|Yes| D5[Start Next URLs]
    D4 -->|No| D6[Stream Complete]
    D5 --> D3

    C4 --> E[Process Results]
    D6 --> E
    E --> F[Success/Failure Analysis]
    F --> G[End]

    style C fill:#e3f2fd
    style D fill:#f3e5f5
    style C4 fill:#c8e6c9
    style D6 fill:#c8e6c9
```

### Memory-Adaptive Dispatcher Flow

```mermaid
stateDiagram-v2
    [*] --> Initializing
    Initializing --> MonitoringMemory: Start dispatcher

    MonitoringMemory --> CheckingMemory: Every check_interval
    CheckingMemory --> MemoryOK: Memory < threshold
    CheckingMemory --> MemoryHigh: Memory >= threshold

    MemoryOK --> DispatchingTasks: Start new crawls
    MemoryHigh --> WaitingForMemory: Pause dispatching

    DispatchingTasks --> TaskRunning: Launch crawler
    TaskRunning --> TaskCompleted: Crawl finished
    TaskRunning --> TaskFailed: Crawl error

    TaskCompleted --> MonitoringMemory: Update stats
    TaskFailed --> MonitoringMemory: Update stats

    WaitingForMemory --> CheckingMemory: Wait timeout
    WaitingForMemory --> MonitoringMemory: Memory freed

    note right of MemoryHigh: Prevents OOM crashes
    note right of DispatchingTasks: Respects max_session_permit
    note right of WaitingForMemory: Configurable timeout
```

### Concurrent Crawling Architecture

```mermaid
graph TB
    subgraph "URL Queue Management"
        A[URL Input List] --> B[URL Queue]
        B --> C[Priority Scheduler]
        C --> D[Batch Assignment]
    end

    subgraph "Dispatcher Layer"
        E[Memory Adaptive Dispatcher]
        F[Semaphore Dispatcher]
        G[Rate Limiter]
        H[Resource Monitor]

        E --> I[Memory Checker]
        F --> J[Concurrency Controller]
        G --> K[Delay Calculator]
        H --> L[System Stats]
    end

    subgraph "Crawler Pool"
        M[Crawler Instance 1]
        N[Crawler Instance 2]
        O[Crawler Instance 3]
        P[Crawler Instance N]

        M --> Q[Browser Session 1]
        N --> R[Browser Session 2]
        O --> S[Browser Session 3]
        P --> T[Browser Session N]
    end

    subgraph "Result Processing"
        U[Result Collector]
        V[Success Handler]
        W[Error Handler]
        X[Retry Queue]
        Y[Final Results]
    end

    D --> E
    D --> F
    E --> M
    F --> N
    G --> O
    H --> P

    Q --> U
    R --> U
    S --> U
    T --> U

    U --> V
    U --> W
    W --> X
    X --> B
    V --> Y

    style E fill:#e3f2fd
    style F fill:#f3e5f5
    style G fill:#e8f5e8
    style H fill:#fff3e0
```

### Rate Limiting and Backoff Strategy

```mermaid
sequenceDiagram
    participant C as Crawler
    participant RL as Rate Limiter
    participant S as Server
    participant D as Dispatcher

    C->>RL: Request to crawl URL
    RL->>RL: Calculate delay
    RL->>RL: Apply base delay (1-3s)
    RL->>C: Delay applied

    C->>S: HTTP Request

    alt Success Response
        S-->>C: 200 OK + Content
        C->>RL: Report success
        RL->>RL: Reset failure count
        C->>D: Return successful result
    else Rate Limited
        S-->>C: 429 Too Many Requests
        C->>RL: Report rate limit
        RL->>RL: Exponential backoff
        RL->>RL: Increase delay (up to max_delay)
        RL->>C: Apply longer delay
        C->>S: Retry request after delay
    else Server Error
        S-->>C: 503 Service Unavailable
        C->>RL: Report server error
        RL->>RL: Moderate backoff
        RL->>C: Retry with backoff
    else Max Retries Exceeded
        RL->>C: Stop retrying
        C->>D: Return failed result
    end
```
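The two processing modes in the first flowchart map directly onto `arun_many()`: batch mode (the default) returns one list after every URL finishes, while `stream=True` on `CrawlerRunConfig` yields each result as it completes. A minimal sketch of both modes, assuming the `arun_many()` API from the docs linked at the end of this section (the URLs are placeholders):

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig

# Placeholder URLs; substitute your own list
urls = ["https://example.com/page1", "https://example.com/page2"]

async def batch_mode():
    # Batch mode (default): wait for all URLs, receive one list of results
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(urls, config=config)
        for result in results:
            print(result.url, "OK" if result.success else result.error_message)

async def streaming_mode():
    # Streaming mode: stream=True yields each result as soon as it completes
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS, stream=True)
    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun_many(urls, config=config):
            print(result.url, "OK" if result.success else result.error_message)

if __name__ == "__main__":
    asyncio.run(batch_mode())
    asyncio.run(streaming_mode())
```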
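The knobs in the dispatcher state machine (`memory_threshold_percent`, `check_interval`, `max_session_permit`) and the base-delay/backoff behavior in the rate-limiting sequence correspond to constructor parameters on `MemoryAdaptiveDispatcher` and `RateLimiter`. A configuration sketch along those lines; the values are illustrative and exact parameter defaults may vary between crawl4ai versions:

```python
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher, RateLimiter

rate_limiter = RateLimiter(
    base_delay=(1.0, 3.0),        # random 1-3s delay between requests (base delay above)
    max_delay=60.0,               # ceiling for exponential backoff
    max_retries=3,                # stop retrying and return a failed result after this
    rate_limit_codes=[429, 503],  # responses that trigger backoff
)

dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=70.0,  # pause dispatching when system memory exceeds this
    check_interval=1.0,             # seconds between memory checks
    max_session_permit=10,          # hard cap on concurrent crawl sessions
    rate_limiter=rate_limiter,
)

# Used as: await crawler.arun_many(urls, config=run_config, dispatcher=dispatcher)
```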
### Large-Scale Crawling Workflow

```mermaid
flowchart TD
    A[Load URL List 10k+ URLs] --> B[Initialize Dispatcher]
    B --> C{Select Dispatcher Type}

    C -->|Memory Constrained| D[Memory Adaptive]
    C -->|Fixed Resources| E[Semaphore Based]

    D --> F[Set Memory Threshold 70%]
    E --> G[Set Concurrency Limit]

    F --> H[Configure Monitoring]
    G --> H

    H --> I[Start Crawling Process]
    I --> J[Monitor System Resources]

    J --> K{Memory Usage?}
    K -->|< Threshold| L[Continue Dispatching]
    K -->|>= Threshold| M[Pause New Tasks]

    L --> N[Process Results Stream]
    M --> O[Wait for Memory]
    O --> K

    N --> P{Result Type?}
    P -->|Success| Q[Save to Database]
    P -->|Failure| R[Log Error]

    Q --> S[Update Progress Counter]
    R --> S

    S --> T{More URLs?}
    T -->|Yes| U[Get Next Batch]
    T -->|No| V[Generate Final Report]

    U --> L
    V --> W[Analysis Complete]

    style A fill:#e1f5fe
    style D fill:#e8f5e8
    style E fill:#f3e5f5
    style V fill:#c8e6c9
    style W fill:#a5d6a7
```

### Real-Time Monitoring Dashboard Flow

```mermaid
graph LR
    subgraph "Data Collection"
        A[Crawler Tasks] --> B[Performance Metrics]
        A --> C[Memory Usage]
        A --> D[Success/Failure Rates]
        A --> E[Response Times]
    end

    subgraph "Monitor Processing"
        F[CrawlerMonitor] --> G[Aggregate Statistics]
        F --> H[Display Formatter]
        F --> I[Update Scheduler]
    end

    subgraph "Display Modes"
        J[DETAILED Mode]
        K[AGGREGATED Mode]

        J --> L[Individual Task Status]
        J --> M[Task-Level Metrics]
        K --> N[Summary Statistics]
        K --> O[Overall Progress]
    end

    subgraph "Output Interface"
        P[Console Display]
        Q[Progress Bars]
        R[Status Tables]
        S[Real-time Updates]
    end

    B --> F
    C --> F
    D --> F
    E --> F

    G --> J
    G --> K
    H --> J
    H --> K
    I --> J
    I --> K

    L --> P
    M --> Q
    N --> R
    O --> S

    style F fill:#e3f2fd
    style J fill:#f3e5f5
    style K fill:#e8f5e8
```

### Error Handling and Recovery Pattern

```mermaid
stateDiagram-v2
    [*] --> ProcessingURL

    ProcessingURL --> CrawlAttempt: Start crawl
    CrawlAttempt --> Success: HTTP 200
    CrawlAttempt --> NetworkError: Connection failed
    CrawlAttempt --> RateLimit: HTTP 429
    CrawlAttempt --> ServerError: HTTP 5xx
    CrawlAttempt --> Timeout: Request timeout

    Success --> [*]: Return result

    NetworkError --> RetryCheck: Check retry count
    RateLimit --> BackoffWait: Apply exponential backoff
    ServerError --> RetryCheck: Check retry count
    Timeout --> RetryCheck: Check retry count

    BackoffWait --> RetryCheck: After delay

    RetryCheck --> CrawlAttempt: retries < max_retries
    RetryCheck --> Failed: retries >= max_retries

    Failed --> ErrorLog: Log failure details
    ErrorLog --> [*]: Return failed result

    note right of BackoffWait: Exponential backoff for rate limits
    note right of RetryCheck: Configurable max_retries
    note right of ErrorLog: Detailed error tracking
```

### Resource Management Timeline

```mermaid
gantt
    title Multi-URL Crawling Resource Management
    dateFormat X
    axisFormat %s

    section Memory Usage
    Initialize Dispatcher    :0, 1
    Memory Monitoring        :1, 10
    Peak Usage Period        :3, 7
    Memory Cleanup           :7, 9

    section Task Execution
    URL Queue Setup          :0, 2
    Batch 1 Processing       :2, 5
    Batch 2 Processing       :4, 7
    Batch 3 Processing       :6, 9
    Final Results            :9, 10

    section Rate Limiting
    Normal Delays            :2, 4
    Backoff Period           :4, 6
    Recovery Period          :6, 8

    section Monitoring
    System Health Check      :0, 10
    Progress Updates         :1, 9
    Performance Metrics      :2, 8
```
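Putting the large-scale workflow and the monitoring dashboard together: a sketch of a streaming crawl over a big URL list, assuming the documented `CrawlerMonitor`/`DisplayMode` interface (`crawl_large_list` and `save_to_database` are illustrative names, not library API):

```python
import asyncio

from crawl4ai import (
    AsyncWebCrawler,
    CacheMode,
    CrawlerMonitor,
    CrawlerRunConfig,
    DisplayMode,
)
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher

def save_to_database(result):
    # Stand-in for a real storage layer
    pass

async def crawl_large_list(urls):
    dispatcher = MemoryAdaptiveDispatcher(
        memory_threshold_percent=70.0,
        max_session_permit=15,
        monitor=CrawlerMonitor(display_mode=DisplayMode.AGGREGATED),  # live console summary
    )
    # Streaming keeps memory flat: each result is handled and released immediately
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS, stream=True)
    succeeded = failed = 0
    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun_many(urls, config=config, dispatcher=dispatcher):
            if result.success:
                succeeded += 1
                save_to_database(result)
            else:
                failed += 1
                print(f"[FAIL] {result.url}: {result.error_message}")
    print(f"Done: {succeeded} succeeded, {failed} failed")
```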
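The retry loop in the error-handling state diagram is handled inside `RateLimiter` for 429/503 responses; for connection errors and timeouts, one option is a coarser outer pass over the failed results. The sketch below shows that pattern (`crawl_with_retry_pass` and `max_passes` are illustrative, not part of the library; it assumes a non-streaming config so `arun_many()` returns a list):

```python
async def crawl_with_retry_pass(crawler, urls, config, max_passes=2):
    # URLs that fail one pass are requeued for the next, up to max_passes.
    # RateLimiter already retries rate-limited requests internally with
    # backoff; this outer loop is a coarser net for network errors/timeouts.
    pending, completed = list(urls), []
    for _ in range(max_passes):
        if not pending:
            break
        results = await crawler.arun_many(pending, config=config)
        pending = []
        for result in results:
            if result.success:
                completed.append(result)
            else:
                pending.append(result.url)  # retry in the next pass
    return completed, pending  # 'pending' now holds permanently failed URLs
```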
### Concurrent Processing Performance Matrix

```mermaid
graph TD
    subgraph "Input Factors"
        A[Number of URLs]
        B[Concurrency Level]
        C[Memory Threshold]
        D[Rate Limiting]
    end

    subgraph "Processing Characteristics"
        A --> E[Low 1-100 URLs]
        A --> F[Medium 100-1k URLs]
        A --> G[High 1k-10k URLs]
        A --> H[Very High 10k+ URLs]

        B --> I[Conservative 1-5]
        B --> J[Moderate 5-15]
        B --> K[Aggressive 15-30]

        C --> L[Strict 60-70%]
        C --> M[Balanced 70-80%]
        C --> N[Relaxed 80-90%]
    end

    subgraph "Recommended Configurations"
        E --> O[Simple Semaphore]
        F --> P[Memory Adaptive Basic]
        G --> Q[Memory Adaptive Advanced]
        H --> R[Memory Adaptive + Monitoring]

        I --> O
        J --> P
        K --> Q
        K --> R

        L --> Q
        M --> P
        N --> O
    end

    style O fill:#c8e6c9
    style P fill:#fff3e0
    style Q fill:#ffecb3
    style R fill:#ffcdd2
```

**📖 Learn more:** [Multi-URL Crawling Guide](https://docs.crawl4ai.com/advanced/multi-url-crawling/), [Dispatcher Configuration](https://docs.crawl4ai.com/advanced/crawl-dispatcher/), [Performance Optimization](https://docs.crawl4ai.com/advanced/multi-url-crawling/#performance-optimization)
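To close, the recommendations in the performance matrix can be encoded as a small selection helper. This is a sketch of the matrix's tiers only (`pick_dispatcher` is an illustrative name, and `SemaphoreDispatcher` parameter names may differ between crawl4ai versions):

```python
from crawl4ai import CrawlerMonitor, DisplayMode
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher, SemaphoreDispatcher

def pick_dispatcher(url_count):
    if url_count <= 100:      # Low: a fixed semaphore is enough
        return SemaphoreDispatcher(max_session_permit=5)
    if url_count <= 1_000:    # Medium: basic memory-adaptive dispatch
        return MemoryAdaptiveDispatcher(memory_threshold_percent=80.0, max_session_permit=10)
    if url_count <= 10_000:   # High: tighter threshold, more concurrency
        return MemoryAdaptiveDispatcher(memory_threshold_percent=70.0, max_session_permit=20)
    # Very high (10k+): add live monitoring on top
    return MemoryAdaptiveDispatcher(
        memory_threshold_percent=70.0,
        max_session_permit=25,
        monitor=CrawlerMonitor(display_mode=DisplayMode.AGGREGATED),
    )
```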