Crawl4AI Node Manager (cnode) - User Guide 🚀
Self-host your own Crawl4AI server cluster with one command. Scale from development to production effortlessly.
Table of Contents
- What is cnode?
- Quick Start
- Installation
- Basic Usage
- Scaling & Production
- Monitoring Dashboard
- Using the API
- Management Commands
- Troubleshooting
- Advanced Topics
What is cnode?
cnode (Crawl4AI Node Manager) is a CLI tool that manages Crawl4AI Docker server instances with automatic scaling and load balancing.
Key Features
✅ One-Command Deployment - Start a server or cluster instantly
✅ Automatic Scaling - Single container or multi-replica cluster
✅ Built-in Load Balancing - Docker Swarm or Nginx (auto-detected)
✅ Real-time Monitoring - Beautiful web dashboard
✅ Zero Configuration - Works out of the box
✅ Production Ready - Auto-scaling, health checks, rolling updates
Architecture Modes
| Replicas | Mode | Load Balancer | Use Case |
|---|---|---|---|
| 1 | Single Container | None | Development, testing |
| 2+ | Docker Swarm | Built-in | Production (if Swarm available) |
| 2+ | Docker Compose | Nginx | Production (fallback) |
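As a mental model, the mode selection can be read straight off this table. A purely illustrative Python sketch (not cnode's actual source), using the mode names from the --mode option listed under Management Commands:
def pick_mode(replicas: int, swarm_available: bool) -> str:
    """Illustrative restatement of the table above; not cnode's actual source."""
    if replicas == 1:
        return "single"   # development/testing: one container, no load balancer
    if swarm_available:
        return "swarm"    # production: Docker Swarm's built-in load balancing
    return "compose"      # fallback: Docker Compose behind Nginx

print(pick_mode(5, swarm_available=True))  # -> "swarm"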
Quick Start
1. Install cnode
# One-line installation
curl -sSL https://crawl4ai.com/install-cnode.sh | bash
Requirements:
- Python 3.8+
- Docker
- Git
2. Start Your First Server
# Start single development server
cnode start
# Or start a production cluster with 5 replicas
cnode start --replicas 5
That's it! Your server is running at http://localhost:11235 🎉
Installation
Method 1: Quick Install (Recommended)
# From crawl4ai.com (when hosted)
curl -sSL https://crawl4ai.com/install-cnode.sh | bash
# Or directly from GitHub
curl -sSL https://raw.githubusercontent.com/unclecode/crawl4ai/main/deploy/installer/install-cnode.sh | bash
Method 2: Clone Repository (For Development)
# Clone the repository
git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai/deploy/installer
# Run installer
./install-cnode.sh
Method 3: Custom Location
# Install to custom directory (using GitHub raw URL)
curl -sSL https://raw.githubusercontent.com/unclecode/crawl4ai/main/deploy/installer/install-cnode.sh | INSTALL_DIR=$HOME/.local/bin bash
# Add to PATH
export PATH="$HOME/.local/bin:$PATH"
Verify Installation
cnode --help
Basic Usage
Start Server
# Development server (1 replica)
cnode start
# Production cluster (5 replicas with auto-scaling)
cnode start --replicas 5
# Custom port
cnode start --port 8080
# Specific Docker image
cnode start --image unclecode/crawl4ai:0.7.0
Check Status
cnode status
Example Output:
╭─────────────────── Crawl4AI Server Status ───────────────────╮
│ Status │ 🟢 Running │
│ Mode │ swarm │
│ Replicas │ 5 │
│ Port │ 11235 │
│ Image │ unclecode/crawl4ai:latest │
│ Uptime │ 2 hours 34 minutes │
│ Started │ 2025-10-21 14:30:00 │
╰─────────────────────────────────────────────────────────────╯
✓ Server is healthy
Access: http://localhost:11235
View Logs
# Show last 100 lines
cnode logs
# Follow logs in real-time
cnode logs -f
# Show last 500 lines
cnode logs --tail 500
Stop Server
# Stop server (keeps data)
cnode stop
# Stop and remove all data
cnode stop --remove-volumes
Scaling & Production
Scale Your Cluster
# Scale to 10 replicas (live, no downtime)
cnode scale 10
# Scale down to 2 replicas
cnode scale 2
Note: Scaling is live for Swarm and Compose modes; single-container mode requires a restart.
Production Deployment
# Start production cluster
cnode start --replicas 5 --port 11235
# Verify health
curl http://localhost:11235/health
# Monitor performance
cnode logs -f
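In scripted deployments you may want to block until the cluster actually reports healthy before routing traffic to it. A minimal Python sketch, assuming only the /health endpoint shown above; the timeout and poll interval are arbitrary choices:
import time
import requests

def wait_until_healthy(base_url="http://localhost:11235", timeout=120, interval=2):
    """Poll /health until the server reports healthy or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            data = requests.get(f"{base_url}/health", timeout=5).json()
            if data.get("status") == "healthy":
                return data
        except requests.RequestException:
            pass  # server may still be starting; keep polling
        time.sleep(interval)
    raise TimeoutError(f"{base_url} not healthy after {timeout}s")

print(wait_until_healthy())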
Restart Server
# Restart with same configuration
cnode restart
# Restart with new replica count
cnode restart --replicas 10
Monitoring Dashboard
Access the Dashboard
Once your server is running, access the real-time monitoring dashboard:
# Dashboard URL
http://localhost:11235/monitor
Dashboard Features
📊 Real-time Metrics
- Requests per second
- Active connections
- Response times
- Error rates
📈 Performance Graphs
- CPU usage
- Memory consumption
- Request latency
- Throughput
🔍 System Health
- Container status
- Replica health
- Load distribution
- Resource utilization
API Health Endpoint
# Quick health check
curl http://localhost:11235/health
# Response
{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 9876,
  "containers": 5
}
Using the API
Interactive Playground
Test the API interactively:
http://localhost:11235/playground
Basic Crawl Example
Python:
import requests

# Simple crawl
response = requests.post(
    "http://localhost:11235/crawl",
    json={
        "urls": ["https://example.com"],
        "browser_config": {
            "type": "BrowserConfig",
            "params": {"headless": True}
        },
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {"cache_mode": "bypass"}
        }
    }
)

result = response.json()
print(f"Title: {result['result']['metadata']['title']}")
print(f"Content: {result['result']['markdown'][:200]}...")
cURL:
curl -X POST http://localhost:11235/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "browser_config": {
      "type": "BrowserConfig",
      "params": {"headless": true}
    },
    "crawler_config": {
      "type": "CrawlerRunConfig",
      "params": {"cache_mode": "bypass"}
    }
  }'
JavaScript (Node.js):
const axios = require('axios');

async function crawl() {
  const response = await axios.post('http://localhost:11235/crawl', {
    urls: ['https://example.com'],
    browser_config: {
      type: 'BrowserConfig',
      params: { headless: true }
    },
    crawler_config: {
      type: 'CrawlerRunConfig',
      params: { cache_mode: 'bypass' }
    }
  });

  console.log('Title:', response.data.result.metadata.title);
  console.log('Content:', response.data.result.markdown.substring(0, 200));
}

crawl();
Advanced Examples
Extract with CSS Selectors:
import json
import requests

response = requests.post(
    "http://localhost:11235/crawl",
    json={
        "urls": ["https://news.ycombinator.com"],
        "browser_config": {
            "type": "BrowserConfig",
            "params": {"headless": True}
        },
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {
                "extraction_strategy": {
                    "type": "JsonCssExtractionStrategy",
                    "params": {
                        "schema": {
                            "type": "dict",
                            "value": {
                                "baseSelector": ".athing",
                                "fields": [
                                    {"name": "title", "selector": ".titleline > a", "type": "text"},
                                    {"name": "url", "selector": ".titleline > a", "type": "attribute", "attribute": "href"},
                                    {"name": "points", "selector": ".score", "type": "text"}
                                ]
                            }
                        }
                    }
                }
            }
        }
    }
)

# extracted_content is typically returned as a JSON-encoded string; decode it before iterating
articles = json.loads(response.json()['result']['extracted_content'])
for article in articles:
    print(f"{article['title']} - {article['points']}")
Streaming Multiple URLs:
import requests
import json

response = requests.post(
    "http://localhost:11235/crawl/stream",
    json={
        "urls": [
            "https://example.com",
            "https://httpbin.org/html",
            "https://python.org"
        ],
        "browser_config": {
            "type": "BrowserConfig",
            "params": {"headless": True}
        },
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {"stream": True}
        }
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        data = json.loads(line)
        if data.get("status") == "completed":
            break
        print(f"Crawled: {data['url']} - Success: {data['success']}")
Additional Endpoints
Screenshot:
curl -X POST http://localhost:11235/screenshot \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}' \
--output screenshot.png
PDF Export:
curl -X POST http://localhost:11235/pdf \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}' \
--output page.pdf
HTML Extraction:
curl -X POST http://localhost:11235/html \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
Management Commands
All Available Commands
cnode --help # Show help
cnode start [OPTIONS] # Start server
cnode stop [OPTIONS] # Stop server
cnode status # Show status
cnode scale N # Scale to N replicas
cnode logs [OPTIONS] # View logs
cnode restart [OPTIONS] # Restart server
cnode cleanup [--force] # Clean up resources
Command Options
start:
--replicas, -r N # Number of replicas (default: 1)
--mode MODE # Deployment mode: auto, single, swarm, compose
--port, -p PORT # External port (default: 11235)
--env-file FILE # Environment file path
--image IMAGE # Docker image (default: unclecode/crawl4ai:latest)
stop:
--remove-volumes # Remove persistent data (WARNING: deletes data)
logs:
--follow, -f # Follow log output (like tail -f)
--tail N # Number of lines to show (default: 100)
scale:
N # Target replica count (minimum: 1)
Troubleshooting
Server Won't Start
# Check Docker is running
docker ps
# Check port availability
lsof -i :11235
# Check logs for errors
cnode logs
High Memory Usage
# Check current status
cnode status
# Restart to clear memory
cnode restart
# Scale down if needed
cnode scale 2
Slow Response Times
# Scale up for better performance
cnode scale 10
# Check system resources
docker stats
Cannot Connect to API
# Verify server is running
cnode status
# Check firewall
sudo ufw status
# Test locally
curl http://localhost:11235/health
Clean Slate
# Complete cleanup and restart
cnode cleanup --force
cnode start --replicas 5
Advanced Topics
Environment Variables
Create a .env file for API keys:
# .env file
OPENAI_API_KEY=sk-your-key
ANTHROPIC_API_KEY=your-key
Use with cnode:
cnode start --env-file .env --replicas 3
Custom Docker Image
# Use specific version
cnode start --image unclecode/crawl4ai:0.7.0-r1
# Use custom registry
cnode start --image myregistry.com/crawl4ai:custom
Production Best Practices
1. Use Multiple Replicas
cnode start --replicas 5
2. Monitor Regularly (see the Python sketch after this list)
# Check status every 5 minutes via cron
*/5 * * * * cnode status | mail -s "Crawl4AI Status" admin@example.com
3. Regular Log Rotation
cnode logs --tail 1000 > crawl4ai.log
cnode restart
4. Resource Limits
- Ensure adequate RAM (2GB per replica minimum)
- Monitor disk space for cached data
- Use SSD for better performance
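If you prefer a script to the mail one-liner above, here is a minimal monitoring sketch in Python, assuming only the /health endpoint described earlier; the expected-container threshold is an illustrative choice:
import sys
import requests

# Illustrative threshold: alert if fewer containers than expected are reported
EXPECTED_CONTAINERS = 5

try:
    health = requests.get("http://localhost:11235/health", timeout=5).json()
except requests.RequestException as exc:
    sys.exit(f"ALERT: server unreachable: {exc}")

if health.get("status") != "healthy":
    sys.exit(f"ALERT: status is {health.get('status')}")
if health.get("containers", 0) < EXPECTED_CONTAINERS:
    sys.exit(f"ALERT: only {health.get('containers')} containers running")

print("OK: cluster healthy")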
Integration Examples
With Python App:
import asyncio
from crawl4ai.docker_client import Crawl4aiDockerClient

async def main():
    async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
        results = await client.crawl(["https://example.com"])
        print(results[0].markdown)

asyncio.run(main())
With Node.js:
const Crawl4AI = require('crawl4ai-client');
const client = new Crawl4AI('http://localhost:11235');
client.crawl('https://example.com')
.then(result => console.log(result.markdown));
With REST API: Any language with HTTP client support can use the API!
Getting Help
Common Questions
Q: How many replicas should I use?
A: Start with 1 for development. Use 3-5 for production. Scale based on load.
Q: What's the difference between Swarm and Compose mode?
A: Swarm has built-in load balancing (faster). Compose uses Nginx (fallback when Swarm is unavailable).
Q: Can I run multiple cnode instances?
A: Yes! Use different ports: cnode start --port 8080
Q: How do I update to the latest version?
A: Pull new image: cnode stop && docker pull unclecode/crawl4ai:latest && cnode start
Summary
You now know how to:
- ✅ Install cnode with one command
- ✅ Start and manage Crawl4AI servers
- ✅ Scale from 1 to 100+ replicas
- ✅ Monitor performance in real-time
- ✅ Use the API from any language
- ✅ Troubleshoot common issues
Ready to crawl at scale! 🚀
For detailed Docker configuration and advanced deployment options, see the Docker Guide.
Happy Crawling! 🕷️
Made with ❤️ by the Crawl4AI team
