# Multi-Container Architecture - Technical Documentation

## Table of Contents

1. [Overview](#overview)
2. [Architecture Diagram](#architecture-diagram)
3. [Components](#components)
4. [Data Flow](#data-flow)
5. [Redis Aggregation Strategy](#redis-aggregation-strategy)
6. [Container Discovery](#container-discovery)
7. [Load Balancing & Routing](#load-balancing--routing)
8. [Monitoring Dashboard](#monitoring-dashboard)
9. [CLI Commands](#cli-commands)
10. [Configuration](#configuration)
11. [Deployment Modes](#deployment-modes)
12. [Troubleshooting](#troubleshooting)

---

## Overview

Crawl4AI's multi-container deployment architecture enables horizontal scaling with intelligent load balancing, centralized monitoring, and real-time data aggregation, using Redis as the coordination layer.

### Key Features

- **Horizontal Scaling**: Deploy 1 to N containers
- **Load Balancing**: Nginx with round-robin for the API, sticky sessions for monitoring
- **Centralized Monitoring**: Redis-backed data aggregation across all containers
- **Real-time Dashboard**: WebSocket-powered monitoring with per-container filtering
- **Zero-downtime Scaling**: Add/remove containers without service interruption
- **Container Discovery**: Automatic heartbeat-based registration

---

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                         Client Requests                         │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
                     ┌───────────────┐
                     │     Nginx     │  Port 11235
                     │ Load Balancer │
                     └───────┬───────┘
                             │
           ┌─────────────────┼─────────────────┐
           │                 │                 │
           ▼                 ▼                 ▼
    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
    │  Crawl4AI-1  │  │  Crawl4AI-2  │  │  Crawl4AI-3  │
    │  Container   │  │  Container   │  │  Container   │
    │              │  │              │  │              │
    │ ┌──────────┐ │  │ ┌──────────┐ │  │ ┌──────────┐ │
    │ │ Monitor  │ │  │ │ Monitor  │ │  │ │ Monitor  │ │
    │ │ Stats    │ │  │ │ Stats    │ │  │ │ Stats    │ │
    │ └────┬─────┘ │  │ └────┬─────┘ │  │ └────┬─────┘ │
    │      │       │  │      │       │  │      │       │
    │      │ Write │  │      │ Write │  │      │ Write │
    │      ▼       │  │      ▼       │  │      ▼       │
    └──────┼───────┘  └──────┼───────┘  └──────┼───────┘
           │                 │                 │
           └─────────────────┼─────────────────┘
                             ▼
                      ┌─────────────┐
                      │    Redis    │
                      │  Datastore  │
                      └─────────────┘
                             │
                             │ Aggregate Read
                             ▼
                      ┌─────────────┐
                      │  Dashboard  │
                      │  /monitor   │
                      └─────────────┘
```

---

## Components

### 1. Nginx Load Balancer

**Purpose**: Entry point for all requests, distributes load across containers

**Configuration**: `crawl4ai/templates/nginx.conf.template`

**Upstreams**:

```nginx
# Backend API (round-robin load balancing)
upstream crawl4ai_backend {
    server crawl4ai:11235;
}

# Monitor/Dashboard (sticky sessions using ip_hash)
upstream crawl4ai_monitor {
    ip_hash;  # Same client always goes to same container
    server crawl4ai:11235;
}
```

**Routing Rules**:

- `/crawl`, `/health`, `/batch` → `crawl4ai_backend` (round-robin)
- `/monitor/*`, `/dashboard` → `crawl4ai_monitor` (sticky sessions)
- `/monitor/ws` → WebSocket proxy with upgrade headers

**Port Mapping**:
- Host: `11235` → Nginx: `80` → Containers: `11235`

---

### 2. Crawl4AI Containers

**Base Image**: `unclecode/crawl4ai:latest`

**Scaling**: Configured via Docker Compose `deploy.replicas` or the `--scale` flag

**Environment Variables**:
```bash
REDIS_HOST=redis
REDIS_PORT=6379
OPENAI_API_KEY=${OPENAI_API_KEY}
# ... other LLM provider keys
```

**Internal Services**:
- **API Server**: FastAPI/Gunicorn on port 11235
- **Monitor Stats**: Background worker tracking metrics
- **Heartbeat Worker**: Registers the container in Redis every 30s
- **Browser Pool**: Permanent/Hot/Cold browser management

**Container ID**: Extracted from `/proc/self/cgroup` or hostname
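
A minimal sketch of how the container ID can be derived, in the spirit of the `get_container_id()` helper listed in `deploy/docker/utils.py` (the actual implementation may differ):

```python
import re
import socket

def get_container_id() -> str:
    """Best-effort container ID: parse /proc/self/cgroup, fall back to hostname."""
    try:
        with open("/proc/self/cgroup") as f:
            for line in f:
                # cgroup paths often end with the full 64-char container ID
                match = re.search(r"[0-9a-f]{64}", line)
                if match:
                    return match.group(0)[:12]  # short ID, like `docker ps`
    except OSError:
        pass
    # Docker sets the hostname to the short container ID by default
    return socket.gethostname()
```
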
---

### 3. Redis Datastore

**Purpose**: Centralized coordination and data aggregation

**Image**: `redis:alpine`

**Persistence**: `appendonly yes` with a volume mount

**Data Structure**:

```
# Container Discovery
monitor:active_containers               # SET of container IDs
monitor:heartbeat:{container_id}        # JSON heartbeat data (60s TTL)

# Per-Container Data
monitor:{container_id}:active_requests  # JSON list (5min TTL)
monitor:{container_id}:completed        # JSON list (1h TTL)
monitor:{container_id}:janitor          # JSON list (1h TTL)
monitor:{container_id}:errors           # JSON list (1h TTL)

# Shared Aggregate Data
monitor:endpoint_stats                  # JSON aggregate stats (24h TTL)
```

**Volume**: `redis_data:/data` for persistence

---

## Data Flow

### Request Lifecycle

```
1. Client → Nginx (port 11235)
2. Nginx → Crawl4AI Container (round-robin)
3. Container:
   a. Track request start → monitor.track_request_start()
   b. Persist to Redis: monitor:{container_id}:active_requests
   c. Process crawl request
   d. Track request end → monitor.track_request_end()
   e. Persist to Redis: monitor:{container_id}:completed
4. Response → Client
```

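
The tracking hooks above can be wired in as ordinary FastAPI middleware. The sketch below is illustrative only: the `monitor` import and the argument names passed to `track_request_start()` / `track_request_end()` are assumptions, not the actual server code.

```python
import time
import uuid

from fastapi import FastAPI, Request

from monitor import monitor  # hypothetical: the shared stats tracker instance

app = FastAPI()

@app.middleware("http")
async def track_requests(request: Request, call_next):
    req_id = f"req_{uuid.uuid4().hex[:8]}"
    start = time.time()
    # Register the request as active; the tracker persists it under
    # monitor:{container_id}:active_requests in Redis.
    await monitor.track_request_start(req_id, endpoint=request.url.path)
    success, status = False, 500
    try:
        response = await call_next(request)
        success, status = response.status_code < 400, response.status_code
        return response
    finally:
        # Move it to monitor:{container_id}:completed with timing info.
        await monitor.track_request_end(
            req_id, elapsed=time.time() - start, success=success, status_code=status
        )
```
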
### Monitoring Data Flow

```
1. All Containers:
   - Write stats to Redis with a container_id prefix
   - Send heartbeat every 30s
   - Track: requests, browsers, errors, janitor events

2. Redis:
   - Stores per-container data
   - TTL-based expiration
   - Active container set maintained

3. Monitor API (/monitor/*):
   - Reads from Redis
   - Aggregates data from ALL containers
   - Sorts by timestamp
   - Returns unified view

4. Dashboard:
   - Fetches aggregated data
   - Maps container IDs to labels (C-1, C-2, C-3)
   - Client-side filtering
   - WebSocket for real-time updates
```

---

## Redis Aggregation Strategy

### Why Redis?

1. **No Direct Communication**: Containers don't need to discover or talk to each other
2. **Decoupled**: Adding/removing containers doesn't affect the others
3. **Atomic Operations**: Redis handles concurrent writes
4. **TTL Support**: Automatic cleanup of stale data
5. **Fast Reads**: In-memory aggregation queries

### Write Strategy

**Container-Side** (`monitor.py`):

```python
# Each container writes its own data
await redis.set(
    f"monitor:{self.container_id}:completed",
    json.dumps(list(self.completed_requests)),
    ex=3600  # 1 hour TTL
)

# Add to the active containers set
await redis.sadd("monitor:active_containers", self.container_id)

# Heartbeat with metadata
await redis.setex(
    f"monitor:heartbeat:{self.container_id}",
    60,  # 60s TTL
    json.dumps({"id": self.container_id, "hostname": hostname})
)
```

### Read Strategy

**API-Side** (`monitor_routes.py`):

```python
async def _aggregate_completed_requests(limit=100):
    # 1. Get all active containers
    container_ids = await redis.smembers("monitor:active_containers")

    # 2. Fetch from each container
    all_requests = []
    for container_id in container_ids:
        data = await redis.get(f"monitor:{container_id}:completed")
        if data:
            all_requests.extend(json.loads(data))

    # 3. Sort and limit
    all_requests.sort(key=lambda x: x.get("end_time", 0), reverse=True)
    return all_requests[:limit]
```

---

## Container Discovery

### Heartbeat Mechanism

**Frequency**: Every 30 seconds

**Worker**: `monitor.py` - `_heartbeat_worker()`

**Data Sent**:
```json
{
  "id": "b790d0b6c9d4",
  "hostname": "b790d0b6c9d4",
  "last_seen": 1760785944.18,
  "mode": "compose"
}
```

**TTL**: 60 seconds (2x heartbeat interval for fault tolerance)

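
The worker itself is essentially a loop around the heartbeat write shown in the Write Strategy section. A minimal sketch, assuming an async Redis client is passed in (the real `_heartbeat_worker()` in `monitor.py` may differ in signature and error handling):

```python
import asyncio
import json
import socket
import time

HEARTBEAT_INTERVAL = 30  # seconds
HEARTBEAT_TTL = 60       # 2x the interval, so one missed beat doesn't evict the container

async def _heartbeat_worker(redis, container_id: str, mode: str):
    """Re-register this container in Redis until the task is cancelled."""
    while True:
        payload = {
            "id": container_id,
            "hostname": socket.gethostname(),
            "last_seen": time.time(),
            "mode": mode,
        }
        await redis.sadd("monitor:active_containers", container_id)
        await redis.setex(
            f"monitor:heartbeat:{container_id}", HEARTBEAT_TTL, json.dumps(payload)
        )
        await asyncio.sleep(HEARTBEAT_INTERVAL)
```
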

**Discovery API**: `/monitor/containers`

```python
async def get_containers():
    # Read from Redis heartbeats
    container_ids = await redis.smembers("monitor:active_containers")

    containers = []
    for cid in container_ids:
        heartbeat = await redis.get(f"monitor:heartbeat:{cid}")
        if heartbeat:
            info = json.loads(heartbeat)
            containers.append({
                "id": info["id"],
                "hostname": info["hostname"],
                "healthy": True  # If a heartbeat exists, the container is alive
            })

    return {"containers": containers, "count": len(containers)}
```

### Container Failure Handling

1. Container stops → heartbeat stops
2. After 60s → Redis TTL expires → heartbeat key deleted
3. Next `/monitor/containers` call → container no longer listed (stale set entries can be pruned, see the sketch below)
4. Dashboard auto-updates → shows only healthy containers

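
Because `monitor:active_containers` is a plain SET with no TTL of its own, IDs of dead containers can linger there; the discovery endpoint above simply skips entries that have no heartbeat. A small housekeeping sketch (an assumption, the project may or may not prune the set this way):

```python
async def prune_stale_containers(redis) -> int:
    """Remove set members whose heartbeat key has already expired."""
    removed = 0
    for cid in await redis.smembers("monitor:active_containers"):
        if not await redis.exists(f"monitor:heartbeat:{cid}"):
            await redis.srem("monitor:active_containers", cid)
            removed += 1
    return removed
```
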
---
## Load Balancing & Routing

### API Endpoints (Round-Robin)

**Nginx Config**:
```nginx
location / {
    proxy_pass http://crawl4ai_backend;  # No ip_hash
}
```

**Behavior**:
- Sequential distribution: Req1→C1, Req2→C2, Req3→C3, Req4→C1...
- Maximizes throughput
- Balanced load across containers

**Use Cases**:
- `/crawl` - Crawl requests
- `/batch` - Batch operations
- `/health` - Health checks

---

### Monitor/Dashboard (Sticky Sessions)

**Nginx Config**:
```nginx
upstream crawl4ai_monitor {
    ip_hash;  # Client IP-based routing
    server crawl4ai:11235;
}

location ~ ^/(monitor|dashboard) {
    proxy_pass http://crawl4ai_monitor;
}
```

**Behavior**:
- Client IP is hashed → the same client always reaches the same container
- Dashboard consistency
- WebSocket connection persistence

**Why Sticky Sessions?**
- WebSocket requires a persistent connection
- Dashboard state consistency
- Simpler debugging (same container per user)

---

### WebSocket Routing

**Nginx Config**:
```nginx
location = /monitor/ws {
    proxy_pass http://crawl4ai_monitor;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_connect_timeout 7d;
    proxy_send_timeout 7d;
    proxy_read_timeout 7d;
}
```

**Key Features**:
- **Exact match** (`location =`) - highest priority
- **Upgrade headers** - HTTP → WebSocket protocol switch
- **Long timeouts** - 7 days for persistent connections
- **Sticky upstream** - uses `crawl4ai_monitor` with `ip_hash`

---

## Monitoring Dashboard

### Architecture

**Frontend**: Single-page HTML/CSS/JavaScript
- **Path**: `/app/static/monitor/index.html`
- **URL**: `http://localhost:11235/dashboard/`

**Backend**:
- REST API: `/monitor/*` endpoints
- WebSocket: `/monitor/ws` for real-time updates

### Data Sources

**API Endpoints**:

```
GET  /monitor/containers        # Container discovery
GET  /monitor/requests          # All requests (aggregated)
GET  /monitor/browsers          # All browsers (aggregated)
GET  /monitor/logs/janitor      # Janitor events (aggregated)
GET  /monitor/logs/errors       # Errors (aggregated)
GET  /monitor/health            # System health
GET  /monitor/endpoints/stats   # Endpoint analytics
GET  /monitor/timeline          # Metrics timeline
WS   /monitor/ws                # Real-time updates
```

**Aggregation**:
- API reads from **all containers** via Redis
- Sorts by timestamp across containers
- Returns a unified dataset with `container_id` on each item (see the example below)

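
For instance, a client can consume the aggregated feed and group results per container. This is a usage sketch against the endpoints listed above; it only assumes the `requests` package is installed and the default port 11235:

```python
from collections import defaultdict

import requests

BASE = "http://localhost:11235"

# Fetch the aggregated request history (all containers combined)
completed = requests.get(f"{BASE}/monitor/requests", timeout=10).json()["completed"]

# Group by the container that served each request
per_container = defaultdict(list)
for item in completed:
    per_container[item.get("container_id", "unknown")].append(item)

for cid, items in per_container.items():
    ok = sum(1 for i in items if i.get("success"))
    print(f"{cid}: {len(items)} requests, {ok} successful")
```
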
### Container Filtering

**UI Components**:

1. **Infrastructure Card**:
   ```
   [All] [C-1] [C-2] [C-3]
   ```

2. **Container Mapping**:
   ```javascript
   containerMapping = {
       "b790d0b6c9d4": "C-1",  // container_id → label
       "f899b55bd5f5": "C-2",
       "076a35479dd9": "C-3"
   }
   ```

3. **Filter Logic**:
   ```javascript
   // Filter active requests
   const filteredActive = currentContainerFilter === 'all'
       ? requests.active
       : requests.active.filter(r => r.container_id === currentContainerFilter);
   ```

**All Data Shows Container Labels**:
- Requests: `C-1 req_abc123 /crawl ...`
- Browsers: `Type: permanent, Container: C-1`
- Janitor: `C-1 19:27:42 close_hot ...`
- Errors: `C-2 Error: ...`

### Real-Time Updates (WebSocket)

**Connection**:
```javascript
const wsUrl = `${protocol}//${window.location.host}/monitor/ws`;
ws = new WebSocket(wsUrl);
```

**Update Frequency**: Every 2 seconds

**Data Payload**:
```json
{
  "timestamp": 1760785944.18,
  "container_id": "b790d0b6c9d4",
  "health": { ... },
  "requests": {
    "active": [ ... ],
    "completed": [ ... ]
  },
  "browsers": [ ... ],
  "timeline": { ... },
  "janitor": [ ... ],
  "errors": [ ... ]
}
```

**Note**: The WebSocket currently streams from **one container** (the one the sticky session pins the client to), while the REST endpoints aggregate across all containers via Redis.

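
On the server side, the `/monitor/ws` endpoint is essentially a push loop on the 2-second cadence described above. A minimal FastAPI sketch, illustrative only; the `build_snapshot()` helper is hypothetical and the real handler in `monitor_routes.py` may differ:

```python
import asyncio

from fastapi import APIRouter, WebSocket, WebSocketDisconnect

from monitor import build_snapshot  # hypothetical helper assembling the payload dict

router = APIRouter()

@router.websocket("/monitor/ws")
async def monitor_ws(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            # Payload fields as in the example above: timestamp, container_id,
            # health, requests, browsers, timeline, janitor, errors.
            snapshot = await build_snapshot()
            await websocket.send_json(snapshot)
            await asyncio.sleep(2)  # matches the dashboard's 2s update frequency
    except WebSocketDisconnect:
        # Client went away; leave the loop so the connection is cleaned up
        pass
```
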
---
## CLI Commands

### Start Multi-Container Deployment

```bash
# Default: 3 replicas
docker compose up -d

# Custom scale
docker compose up -d --scale crawl4ai=5

# With build
docker compose up -d --build --scale crawl4ai=3
```

### Scale Running Deployment

```bash
# Scale up
docker compose up -d --scale crawl4ai=5 --no-recreate

# Scale down
docker compose up -d --scale crawl4ai=2 --no-recreate
```

### View Container Status

```bash
# List all containers
docker compose ps

# Check health
docker ps --format "table {{.Names}}\t{{.Status}}"

# View specific container logs
docker logs fix-docker-crawl4ai-1 -f

# View nginx logs
docker logs fix-docker-nginx-1 -f
```

### Redis Inspection

```bash
# Enter Redis CLI
docker exec -it fix-docker-redis-1 redis-cli

# Inside Redis CLI:
KEYS monitor:*                       # List all monitor keys
SMEMBERS monitor:active_containers   # Show active containers
GET monitor:b790d0b6c9d4:completed   # Get completed requests
TTL monitor:heartbeat:b790d0b6c9d4   # Check heartbeat TTL
```

### Debugging

```bash
# Check container IDs
docker ps --filter "name=crawl4ai" --format "{{.ID}} {{.Names}}"

# Inspect Redis data
docker exec fix-docker-redis-1 redis-cli KEYS "monitor:*:completed"

# Test API directly
curl http://localhost:11235/monitor/containers | jq

# Test WebSocket (requires websocat or wscat)
websocat ws://localhost:11235/monitor/ws

# View nginx upstream routing
docker exec fix-docker-nginx-1 cat /etc/nginx/nginx.conf | grep -A 5 "upstream"
```

---

## Configuration

### Docker Compose (`docker-compose.yml`)

```yaml
version: '3.8'

services:
  redis:
    image: redis:alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    networks:
      - crawl4ai_net
    restart: unless-stopped

  crawl4ai:
    image: unclecode/crawl4ai:latest
    build:
      context: .
      dockerfile: Dockerfile
    env_file:
      - .llm.env
    environment:
      - REDIS_HOST=redis
      - REDIS_PORT=6379
    volumes:
      - /dev/shm:/dev/shm
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 4G
    depends_on:
      - redis
    networks:
      - crawl4ai_net
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11235/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  nginx:
    image: nginx:alpine
    ports:
      - "11235:80"
    volumes:
      - ./crawl4ai/templates/nginx.conf.template:/etc/nginx/nginx.conf:ro
    depends_on:
      - crawl4ai
    networks:
      - crawl4ai_net
    restart: unless-stopped

networks:
  crawl4ai_net:
    driver: bridge

volumes:
  redis_data:
```

### Environment Variables (`.llm.env`)

```bash
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
DEEPSEEK_API_KEY=...
GROQ_API_KEY=...
TOGETHER_API_KEY=...
MISTRAL_API_KEY=...
GEMINI_API_TOKEN=...
LLM_PROVIDER=openai/gpt-4   # Optional default provider
```

### Nginx Configuration

**Template**: `crawl4ai/templates/nginx.conf.template`

**Key Settings**:
```nginx
worker_processes auto;

upstream crawl4ai_backend {
    # Round-robin for API
    server crawl4ai:11235;
}

upstream crawl4ai_monitor {
    # Sticky sessions for monitoring
    ip_hash;
    server crawl4ai:11235;
}

server {
    listen 80;
    client_max_body_size 10M;

    # WebSocket (exact match, highest priority)
    location = /monitor/ws { ... }

    # Monitor/Dashboard (sticky)
    location ~ ^/(monitor|dashboard) {
        proxy_pass http://crawl4ai_monitor;
    }

    # API (round-robin)
    location / {
        proxy_pass http://crawl4ai_backend;
    }
}
```

---

## Deployment Modes

### Single Container

**Use Case**: Development, testing, low traffic

**Command**:
```bash
docker compose up -d --scale crawl4ai=1
```

**Characteristics**:
- No load-balancing overhead
- Direct port access possible
- Simpler debugging
- Dashboard shows `mode: "single"`

---

### Compose (Multi-Container)

**Use Case**: Production, high availability, horizontal scaling

**Command**:
```bash
docker compose up -d --scale crawl4ai=3
```

**Characteristics**:
- Nginx load balancing
- Redis aggregation
- Horizontal scaling (1-N containers)
- Dashboard shows `mode: "compose"`
- Zero-downtime scaling

**Scaling Limits**:
- **Minimum**: 1 container
- **Maximum**: Limited by host resources
- **Recommended**: 3-10 containers per host

---

### Docker Swarm (Future)

**Use Case**: Multi-host orchestration, auto-scaling

**Command**:
```bash
docker stack deploy -c docker-compose.yml crawl4ai
```

**Characteristics**:
- Multi-host deployment
- Built-in service discovery
- Auto-healing
- Dashboard shows `mode: "swarm"` (see the mode-detection sketch below)
- Requires a shared Redis (external or a global service)

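
The `mode` value reported by the dashboard comes from `detect_deployment_mode()` in `deploy/docker/utils.py` (see the Appendix). The sketch below is one plausible way such detection could work, an explicit override first, then the live container count; it is an assumption, not the actual implementation:

```python
import os

def detect_deployment_mode(active_container_count: int = 1) -> str:
    """Best-effort guess at the deployment mode shown in the dashboard."""
    # Explicit override via a hypothetical env var (not necessarily supported upstream)
    override = os.environ.get("DEPLOYMENT_MODE")
    if override in ("single", "compose", "swarm"):
        return override
    # More than one live heartbeat implies a multi-container (Compose-style) deployment
    if active_container_count > 1:
        return "compose"
    return "single"
```
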
---
## Troubleshooting

### Container Discovery Issues

**Symptom**: Dashboard shows fewer containers than expected

**Diagnosis**:
```bash
# Check active containers
docker exec fix-docker-redis-1 redis-cli SMEMBERS monitor:active_containers

# Check heartbeats
docker exec fix-docker-redis-1 redis-cli KEYS "monitor:heartbeat:*"

# Check container logs for heartbeat errors
docker logs fix-docker-crawl4ai-1 | grep -i heartbeat
```

**Solutions**:
- Wait 30s for the heartbeat to register
- Check Redis connectivity from the containers
- Verify the containers are healthy: `docker ps`

---

### No Data in Dashboard

**Symptom**: Dashboard shows "No data" or empty sections

**Diagnosis**:
```bash
# Check if containers are writing to Redis
docker exec fix-docker-redis-1 redis-cli KEYS "monitor:*:completed"

# Test the aggregation endpoint
curl http://localhost:11235/monitor/requests | jq

# Check for errors in container logs
docker logs fix-docker-crawl4ai-1 | grep -i "error\|redis"
```

**Solutions**:
- Make some API requests to generate data
- Check the Redis connection (REDIS_HOST, REDIS_PORT)
- Verify the containers can write to Redis

---

### WebSocket Connection Failed

**Symptom**: Dashboard shows "Disconnected" or WebSocket errors

**Diagnosis**:
```bash
# Test WebSocket upgrade
curl -i -H "Connection: Upgrade" -H "Upgrade: websocket" \
     -H "Sec-WebSocket-Version: 13" \
     -H "Sec-WebSocket-Key: test" \
     http://localhost:11235/monitor/ws

# Check nginx config
docker exec fix-docker-nginx-1 cat /etc/nginx/nginx.conf | grep -A 10 "/monitor/ws"

# Check nginx error logs
docker logs fix-docker-nginx-1 | grep -i "websocket\|upgrade"
```

**Solutions**:
- Verify nginx has the WebSocket proxy config
- Check that `location = /monitor/ws` takes precedence over the regex locations
- Ensure the upgrade headers are set correctly

---

### Filtering Not Working

**Symptom**: Clicking the container filter buttons doesn't filter data

**Diagnosis**:
```bash
# Check if container_id is in the data
curl http://localhost:11235/monitor/requests | jq '.completed[0].container_id'

# Verify the container mapping in the browser:
# open the JavaScript console and inspect `containerMapping`
```

**Solutions**:
- Ensure all data has the `container_id` field
- Check the JavaScript console for errors
- Rebuild the image if backend changes weren't applied

---

### Load Balancing Issues

**Symptom**: All requests going to one container

**Diagnosis**:
```bash
# Check nginx upstream config
docker exec fix-docker-nginx-1 cat /etc/nginx/nginx.conf | grep -A 5 "upstream crawl4ai"

# Monitor which container handles requests
docker logs fix-docker-crawl4ai-1 | grep "/crawl"
docker logs fix-docker-crawl4ai-2 | grep "/crawl"
docker logs fix-docker-crawl4ai-3 | grep "/crawl"
```

**Solutions**:
- Verify the nginx upstream has no `ip_hash` for API endpoints
- Check that all containers are healthy
- Restart nginx: `docker restart fix-docker-nginx-1`

---

## Performance Considerations

### Redis Memory Usage

**Per container** (approximate):
- Active requests: ~1KB × 10 = 10KB
- Completed requests: ~500B × 100 = 50KB
- Janitor events: ~200B × 100 = 20KB
- Errors: ~300B × 100 = 30KB
- Heartbeat: ~100B

**Total per container**: ~110KB

**For 10 containers**: ~1.1MB

**Recommendation**: Redis with 256MB is more than sufficient

---

### Container Resource Limits

**Recommended per container**:
```yaml
resources:
  limits:
    memory: 4G
    cpus: '2'
  reservations:
    memory: 1G
    cpus: '1'
```

**Considerations**:
- Each container runs a permanent browser (~270MB)
- Hot-pool browsers (~180MB each)
- Peak memory during crawls
- Adjust based on workload

---

### Scaling Guidelines

| Containers | Use Case          | Expected Throughput |
|------------|-------------------|---------------------|
| 1          | Development       | ~10 req/min         |
| 3          | Small production  | ~30 req/min         |
| 5          | Medium production | ~50 req/min         |
| 10         | Large production  | ~100 req/min        |

**Bottlenecks**:
1. Redis throughput (unlikely to matter below ~1000 req/min)
2. Nginx connection limits (adjust `worker_connections`)
3. Host CPU/memory
4. Browser pool limits (adjust pool sizes)

---

## Security Considerations

### Redis Security

**Current Setup**: No authentication (internal network only)

**Production Recommendations**:
```yaml
redis:
  command: redis-server --requirepass ${REDIS_PASSWORD}
  environment:
    - REDIS_PASSWORD=strong_password_here
```

Update the containers accordingly:
```yaml
environment:
  - REDIS_HOST=redis
  - REDIS_PASSWORD=${REDIS_PASSWORD}
```

---

### Nginx Security

**Recommendations**:
- Enable rate limiting
- Add authentication for sensitive endpoints
- Use HTTPS with TLS certificates
- Restrict `/monitor` to internal IPs

**Example Rate Limiting**:
```nginx
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

location /crawl {
    limit_req zone=api burst=20 nodelay;
    proxy_pass http://crawl4ai_backend;
}
```

---

## Maintenance

### Backup Redis Data

```bash
# Create a backup
docker exec fix-docker-redis-1 redis-cli BGSAVE

# Copy the dump file
docker cp fix-docker-redis-1:/data/dump.rdb ./backup-$(date +%Y%m%d).rdb
```

### Cleanup Old Data

```bash
# Redis TTLs handle cleanup automatically.
# Manual cleanup if needed (run the whole pipeline inside the Redis container,
# so the second redis-cli exists and points at the same instance):
docker exec fix-docker-redis-1 sh -c 'redis-cli KEYS "monitor:*:completed" | xargs redis-cli DEL'
```

### Rolling Updates

```bash
# Update one container at a time
docker compose up -d --no-deps --scale crawl4ai=3 crawl4ai

# Or rebuild and do a rolling restart
docker compose build crawl4ai
docker compose up -d --no-deps --scale crawl4ai=3 crawl4ai
```

---

## Appendix

### File Locations

```
deploy/docker/
├── server.py                    # Main FastAPI server
├── monitor.py                   # Monitoring stats with Redis
├── monitor_routes.py            # Monitor API endpoints
├── utils.py                     # get_container_id(), detect_deployment_mode()
├── static/monitor/index.html    # Dashboard UI
├── supervisord.conf             # Process manager config
└── requirements.txt             # Python dependencies

crawl4ai/templates/
├── docker-compose.template.yml  # Docker Compose template
└── nginx.conf.template          # Nginx configuration

docker-compose.yml               # Active compose file
Dockerfile                       # Container image definition
```

### API Response Examples

**GET /monitor/containers**:
```json
{
  "mode": "compose",
  "container_id": "b790d0b6c9d4",
  "containers": [
    {"id": "b790d0b6c9d4", "hostname": "b790d0b6c9d4", "healthy": true},
    {"id": "f899b55bd5f5", "hostname": "f899b55bd5f5", "healthy": true},
    {"id": "076a35479dd9", "hostname": "076a35479dd9", "healthy": true}
  ],
  "count": 3
}
```

**GET /monitor/requests**:
```json
{
  "active": [],
  "completed": [
    {
      "id": "req_26d1cbf8",
      "endpoint": "/crawl",
      "url": "https://httpbin.org/html",
      "container_id": "b790d0b6c9d4",
      "elapsed": 2.66,
      "success": true,
      "status_code": 200
    }
  ]
}
```

---

## Changelog

### Version 0.7.4

- Added Redis aggregation for multi-container support
- Implemented container heartbeat discovery
- Added per-container filtering in the dashboard
- Updated the nginx config for the WebSocket proxy
- Added the infrastructure monitoring card

---

**Document Version**: 1.0
**Last Updated**: 2025-01-18
**Author**: Crawl4AI Team