# Crawl4AI DevOps Agent Context

## Service Overview

**Crawl4AI**: Browser-based web crawling service with AI extraction. Docker deployment with horizontal scaling (1-N containers), Redis coordination, Nginx load balancing.

## Architecture Quick Reference

```
Client → Nginx:11235 → [crawl4ai-1, crawl4ai-2, ... crawl4ai-N] ← Redis
              ↓
      Monitor Dashboard
```

**Components:**
- **Nginx**: Load balancer (round-robin API, sticky monitoring)
- **Crawl4AI containers**: FastAPI + Playwright browsers
- **Redis**: Container discovery (heartbeats every 30s), monitoring data aggregation
- **Monitor**: Real-time dashboard at `/dashboard`

## CLI Commands

### Start/Stop
```bash
crwl server start [-r N] [--port P] [--mode auto|single|swarm|compose] [--env-file F] [--image I]
crwl server stop [--remove-volumes]
crwl server restart [-r N]
```

### Management
```bash
crwl server status               # Show mode, replicas, port, uptime
crwl server scale N              # Live scaling (Swarm/Compose only)
crwl server logs [-f] [--tail N]
```

**Defaults**: replicas=1, port=11235, mode=auto, image=unclecode/crawl4ai:latest

## Deployment Modes

| Replicas | Mode | Load Balancer | Use Case |
|----------|------|---------------|----------|
| N=1 | single | None | Dev/testing |
| N>1 | swarm | Built-in | Production (if `docker swarm init` done) |
| N>1 | compose | Nginx | Production (fallback) |

**Mode Detection** (when mode=auto):
1. If N=1 → single
2. If N>1 & Swarm active → swarm
3. If N>1 & Swarm inactive → compose

## File Locations

```
~/.crawl4ai/server/
├── state.json           # Current deployment state
├── docker-compose.yml   # Generated compose file
└── nginx.conf           # Generated nginx config

/app/                    # Inside container
├── deploy/docker/server.py
├── deploy/docker/monitor.py
├── deploy/docker/static/monitor/index.html
└── crawler_pool.py      # Browser pool (PERMANENT, HOT_POOL, COLD_POOL)
```

## Monitoring & Troubleshooting

### Health Checks
```bash
curl http://localhost:11235/health              # Service health
curl http://localhost:11235/monitor/containers  # Container discovery
curl http://localhost:11235/monitor/requests    # Aggregated requests
```

### Dashboard
- URL: `http://localhost:11235/dashboard/`
- Features: Container filtering (All/C-1/C-2/C-3), real-time WebSocket, timeline charts
- WebSocket: `/monitor/ws` (sticky sessions)

### Common Issues

**No containers showing in dashboard:**
```bash
docker exec <redis_container> redis-cli SMEMBERS monitor:active_containers
docker exec <redis_container> redis-cli KEYS "monitor:heartbeat:*"
```
Wait 30s for heartbeat registration.

**Load balancing not working:**
```bash
docker exec <nginx_container> cat /etc/nginx/nginx.conf | grep upstream
docker logs <nginx_container> | grep error
```
Check that the Nginx upstream block has no `ip_hash` directive for API endpoints.

**Redis connection errors:**
```bash
docker logs <crawl4ai_container> | grep -i redis
docker exec <crawl4ai_container> ping redis
```
Verify REDIS_HOST=redis, REDIS_PORT=6379.

**Containers not scaling:**
```bash
# Swarm
docker service ls
docker service ps crawl4ai

# Compose
docker compose -f ~/.crawl4ai/server/docker-compose.yml ps
docker compose -f ~/.crawl4ai/server/docker-compose.yml up -d --scale crawl4ai=N
```

### Redis Data Structure
```
monitor:active_containers      # SET: {container_ids}
monitor:heartbeat:{cid}        # STRING: {id, hostname, last_seen}, TTL=60s
monitor:{cid}:active_requests  # STRING: JSON list, TTL=5min
monitor:{cid}:completed        # STRING: JSON list, TTL=1h
monitor:{cid}:janitor          # STRING: JSON list, TTL=1h
monitor:{cid}:errors           # STRING: JSON list, TTL=1h
monitor:endpoint_stats         # STRING: JSON aggregate, TTL=24h
```
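These keys can also be read programmatically instead of via `redis-cli`. Below is a minimal sketch assuming the `redis` Python package and the key layout documented above; the helper functions are illustrative and not part of the Crawl4AI codebase.

```python
import json
import redis

# Assumes Redis is reachable as documented (REDIS_HOST=redis, REDIS_PORT=6379);
# adjust host/port when running outside the Docker network.
r = redis.Redis(host="redis", port=6379, decode_responses=True)

def list_active_containers() -> list[str]:
    """Return container IDs registered via heartbeat (SET monitor:active_containers)."""
    return sorted(r.smembers("monitor:active_containers"))

def container_snapshot(cid: str) -> dict:
    """Collect per-container monitoring blobs; a missing key means its TTL expired."""
    heartbeat = r.get(f"monitor:heartbeat:{cid}")
    completed = r.get(f"monitor:{cid}:completed")
    return {
        "heartbeat": json.loads(heartbeat) if heartbeat else None,
        "completed": json.loads(completed) if completed else [],
    }

if __name__ == "__main__":
    for cid in list_active_containers():
        snap = container_snapshot(cid)
        print(cid, "last_seen:", (snap["heartbeat"] or {}).get("last_seen"))
```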
## Environment Variables

### Required for Multi-LLM
```bash
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
DEEPSEEK_API_KEY=...
GROQ_API_KEY=...
TOGETHER_API_KEY=...
MISTRAL_API_KEY=...
GEMINI_API_TOKEN=...
```

### Redis Configuration (Optional)
```bash
REDIS_HOST=redis                    # Default: redis
REDIS_PORT=6379                     # Default: 6379
REDIS_TTL_ACTIVE_REQUESTS=300       # Default: 5min
REDIS_TTL_COMPLETED_REQUESTS=3600   # Default: 1h
REDIS_TTL_JANITOR_EVENTS=3600       # Default: 1h
REDIS_TTL_ERRORS=3600               # Default: 1h
REDIS_TTL_ENDPOINT_STATS=86400      # Default: 24h
REDIS_TTL_HEARTBEAT=60              # Default: 1min
```

## API Endpoints

### Core API
- `POST /crawl` - Crawl URL (load-balanced)
- `POST /batch` - Batch crawl (load-balanced)
- `GET /health` - Health check (load-balanced)

### Monitor API (Aggregated from all containers)
- `GET /monitor/health` - Local container health
- `GET /monitor/containers` - All active containers
- `GET /monitor/requests` - All requests (active + completed)
- `GET /monitor/browsers` - Browser pool status (local only)
- `GET /monitor/logs/janitor` - Janitor cleanup events
- `GET /monitor/logs/errors` - Error logs
- `GET /monitor/endpoints/stats` - Endpoint analytics
- `WS /monitor/ws` - Real-time updates (aggregated)

### Control Actions
- `POST /monitor/actions/cleanup` - Force browser cleanup
- `POST /monitor/actions/kill_browser` - Kill specific browser
- `POST /monitor/actions/restart_browser` - Restart browser
- `POST /monitor/stats/reset` - Reset endpoint counters
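For scripted checks against the load-balanced Core API, any HTTP client works. Below is a minimal sketch using Python `requests`; the `{"urls": [...]}` payload shape is an assumption for illustration only — confirm the exact request schema of `POST /crawl` against the deployed server's API docs.

```python
import requests

BASE = "http://localhost:11235"  # Nginx entry point from the architecture above

def health() -> dict:
    """Hit the load-balanced health endpoint."""
    resp = requests.get(f"{BASE}/health", timeout=10)
    resp.raise_for_status()
    return resp.json()

def crawl(url: str) -> dict:
    """Submit a single crawl job.

    The payload shape is illustrative; check the server's own API docs
    for the exact schema expected by POST /crawl.
    """
    resp = requests.post(f"{BASE}/crawl", json={"urls": [url]}, timeout=120)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(health())
    print(crawl("https://example.com"))
```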
## Docker Commands Reference

### Inspection
```bash
# List containers
docker ps --filter "name=crawl4ai"

# Container logs
docker logs -f --tail 100 <container_id>

# Redis CLI
docker exec -it <redis_container> redis-cli
KEYS monitor:*
SMEMBERS monitor:active_containers
GET monitor:<cid>:completed
TTL monitor:heartbeat:<cid>

# Nginx config
docker exec <nginx_container> cat /etc/nginx/nginx.conf

# Container stats
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
```

### Compose Operations
```bash
# Scale
docker compose -f ~/.crawl4ai/server/docker-compose.yml up -d --scale crawl4ai=5

# Restart service
docker compose -f ~/.crawl4ai/server/docker-compose.yml restart crawl4ai

# View services
docker compose -f ~/.crawl4ai/server/docker-compose.yml ps
```

### Swarm Operations
```bash
# Initialize Swarm
docker swarm init

# Scale service
docker service scale crawl4ai=5

# Service info
docker service ls
docker service ps crawl4ai --no-trunc

# Service logs
docker service logs crawl4ai --tail 100 -f
```

## Performance & Scaling

### Resource Recommendations

| Containers | Memory/Container | Total Memory | Use Case |
|------------|------------------|--------------|----------|
| 1 | 4GB | 4GB | Development |
| 3 | 4GB | 12GB | Small prod |
| 5 | 4GB | 20GB | Medium prod |
| 10 | 4GB | 40GB | Large prod |

**Expected Throughput**: ~10 req/min per container (depends on crawl complexity)

### Scaling Guidelines
- **Horizontal**: Add replicas (`crwl server scale N`)
- **Vertical**: Adjust `--memory 8G --cpus 4` in kwargs
- **Browser Pool**: Permanent (1) + hot pool (adaptive) + cold pool (cleaned up by the janitor)

### Redis Memory Usage
- **Per container**: ~110KB (requests + events + errors + heartbeat)
- **10 containers**: ~1.1MB
- **Recommendation**: 256MB of Redis is sufficient for <100 containers

## Security Notes

### Input Validation
All CLI inputs are validated:
- Image name: alphanumeric + `.-/:_@` only, max 256 chars
- Port: 1-65535
- Replicas: 1-100
- Env file: must exist and be readable
- Container IDs: alphanumeric + `-_` only (prevents Redis injection)

### Network Security
- Nginx forwards to the internal `crawl4ai` service (Docker network)
- Monitor endpoints have NO authentication (add a MONITOR_TOKEN env var for security)
- Redis is internal-only (no external port)

### Recommended Production Setup
```bash
# Add authentication
export MONITOR_TOKEN="your-secret-token"

# Use a Redis password (docker-compose.yml)
redis:
  command: redis-server --requirepass ${REDIS_PASSWORD}

# Enable rate limiting in Nginx (nginx.conf)
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
```

## Common User Scenarios

### Scenario 1: Fresh Deployment
```bash
crwl server start --replicas 3 --env-file .env
# Wait for the health check, then access http://localhost:11235/health
```

### Scenario 2: Scaling Under Load
```bash
crwl server scale 10   # Live scaling, no downtime
```

### Scenario 3: Debugging Slow Requests
```bash
# Check dashboard
open http://localhost:11235/dashboard/

# Check container logs
docker logs --tail 100 <container_id>

# Check browser pool
curl http://localhost:11235/monitor/browsers | jq
```

### Scenario 4: Redis Connection Issues
```bash
# Check Redis connectivity
docker exec <crawl4ai_container> nc -zv redis 6379

# Check Redis logs
docker logs <redis_container>

# Restart containers (triggers reconnect with retry logic)
crwl server restart
```

### Scenario 5: Container Not Appearing in Dashboard
```bash
# Wait 30s for heartbeat
sleep 30

# Check Redis
docker exec <redis_container> redis-cli SMEMBERS monitor:active_containers

# Check container logs for heartbeat errors
docker logs <container_id> | grep -i heartbeat
```

## Code Context for Advanced Debugging

### Key Classes
- `MonitorStats` (monitor.py): Tracks stats, Redis persistence, heartbeat worker
- `ServerManager` (server_manager.py): CLI orchestration, mode detection
- Browser pool globals: `PERMANENT`, `HOT_POOL`, `COLD_POOL`, `LOCK` (crawler_pool.py)

### Critical Timeouts
- Browser pool lock: 2s timeout (prevents deadlock)
- WebSocket connection: 5s timeout
- Health check: 30-60s timeout
- Heartbeat interval: 30s, TTL: 60s
- Redis retry: 3 attempts, backoff: 0.5s/1s/2s
- Circuit breaker: 5 failures → 5min backoff

### State Transitions
```
NOT_RUNNING → STARTING → HEALTHY → RUNNING
                 ↓                    ↓
               FAILED            UNHEALTHY → STOPPED
```

State file: `~/.crawl4ai/server/state.json` (atomic writes, fcntl locking)
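The "atomic writes, fcntl locking" behavior noted above follows a standard pattern: write to a temp file, fsync, rename over the target, all while holding an exclusive lock. Below is a generic sketch of that pattern, not the project's actual `ServerManager` code; the example state fields are placeholders.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path

STATE_FILE = Path.home() / ".crawl4ai" / "server" / "state.json"

def write_state(state: dict) -> None:
    """Write state.json atomically while holding an exclusive fcntl lock."""
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    with open(STATE_FILE, "a+") as lock_fh:
        fcntl.flock(lock_fh, fcntl.LOCK_EX)        # block until we own the lock
        fd, tmp_path = tempfile.mkstemp(dir=STATE_FILE.parent)
        try:
            with os.fdopen(fd, "w") as tmp:
                json.dump(state, tmp, indent=2)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_path, STATE_FILE)       # atomic rename on POSIX filesystems
        finally:
            fcntl.flock(lock_fh, fcntl.LOCK_UN)

if __name__ == "__main__":
    # Illustrative fields only; the real schema lives in the Crawl4AI codebase.
    write_state({"mode": "compose", "replicas": 3, "port": 11235})
```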
## Quick Diagnostic Commands

```bash
# Full system check
crwl server status
docker ps
curl http://localhost:11235/health
curl http://localhost:11235/monitor/containers | jq

# Redis check
docker exec <redis_container> redis-cli PING
docker exec <redis_container> redis-cli INFO stats

# Network check
docker network ls
docker network inspect <network_name>

# Logs check
docker logs --tail 50 <crawl4ai_container>
docker logs --tail 50 <nginx_container>
docker compose -f ~/.crawl4ai/server/docker-compose.yml logs --tail 100
```

## Agent Decision Tree

**User reports slow crawling:**
1. Check the dashboard for stuck active requests → kill the browser if stuck >5min
2. Check browser pool status → clean up if hot/cold pool >10
3. Check container CPU/memory → scale up if >80%
4. Check Redis latency → restart Redis if >100ms

**User reports missing containers:**
1. Wait 30s for heartbeat
2. Compare `docker ps` with the dashboard count
3. Check Redis: `SMEMBERS monitor:active_containers`
4. Check container logs for Redis connection errors
5. Verify REDIS_HOST/REDIS_PORT env vars

**User reports 502/503 errors:**
1. Check Nginx logs for upstream errors
2. Check container health: `curl http://localhost:11235/health`
3. Check that all containers are healthy: `docker ps`
4. Restart Nginx: `docker restart <nginx_container>`

**User wants to update the image:**
1. `crwl server stop`
2. `docker pull unclecode/crawl4ai:latest`
3. `crwl server start --replicas <N>`

---

**Version**: Crawl4AI v0.7.4+
**Last Updated**: 2025-01-20
**AI Agent Note**: All commands, file paths, and Redis keys verified against codebase. Use exact syntax shown. For user-facing responses, translate technical details to plain language.