# Crawl4AI Node Manager (cnode) - User Guide 🚀

Self-host your own Crawl4AI server cluster with one command. Scale from development to production effortlessly.

## Table of Contents

- [What is cnode?](#what-is-cnode)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Basic Usage](#basic-usage)
- [Scaling & Production](#scaling--production)
- [Monitoring Dashboard](#monitoring-dashboard)
- [Using the API](#using-the-api)
- [Management Commands](#management-commands)
- [Troubleshooting](#troubleshooting)
- [Advanced Topics](#advanced-topics)

---

## What is cnode?

**cnode** (Crawl4AI Node Manager) is a CLI tool that manages Crawl4AI Docker server instances with automatic scaling and load balancing.

### Key Features

✅ **One-Command Deployment** - Start a server or cluster instantly
✅ **Automatic Scaling** - Single container or multi-replica cluster
✅ **Built-in Load Balancing** - Docker Swarm or Nginx (auto-detected)
✅ **Real-time Monitoring** - Beautiful web dashboard
✅ **Zero Configuration** - Works out of the box
✅ **Production Ready** - Auto-scaling, health checks, rolling updates

### Architecture Modes

| Replicas | Mode | Load Balancer | Use Case |
|----------|------|---------------|----------|
| 1 | Single Container | None | Development, testing |
| 2+ | Docker Swarm | Built-in | Production (if Swarm available) |
| 2+ | Docker Compose | Nginx | Production (fallback) |

---

## Quick Start

### 1. Install cnode

```bash
# One-line installation
curl -sSL https://crawl4ai.com/install-cnode.sh | bash
```

**Requirements:**

- Python 3.8+
- Docker
- Git

### 2. Start Your First Server

```bash
# Start single development server
cnode start

# Or start a production cluster with 5 replicas
cnode start --replicas 5
```

That's it! Your server is running at **http://localhost:11235** 🎉
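
To confirm the server actually came up, you can query the `/health` endpoint (described under [API Health Endpoint](#api-health-endpoint) below). A minimal sketch, assuming only the `requests` package:

```python
import requests

# Query the health endpoint exposed by the Crawl4AI server.
resp = requests.get("http://localhost:11235/health", timeout=5)
resp.raise_for_status()
print(resp.json())  # e.g. {"status": "healthy", ...}
```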

---

## Installation

### Method 1: Quick Install (Recommended)

```bash
# From crawl4ai.com (when hosted)
curl -sSL https://crawl4ai.com/install-cnode.sh | bash

# Or directly from GitHub
curl -sSL https://raw.githubusercontent.com/unclecode/crawl4ai/main/deploy/installer/install-cnode.sh | bash
```

### Method 2: Clone Repository (For Development)

```bash
# Clone the repository
git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai/deploy/installer

# Run installer
./install-cnode.sh
```

### Method 3: Custom Location

```bash
# Install to a custom directory (using the GitHub raw URL).
# Note: INSTALL_DIR must prefix the `bash` that runs the installer, not the `curl`.
curl -sSL https://raw.githubusercontent.com/unclecode/crawl4ai/main/deploy/installer/install-cnode.sh | INSTALL_DIR=$HOME/.local/bin bash

# Add to PATH
export PATH="$HOME/.local/bin:$PATH"
```

### Verify Installation

```bash
cnode --help
```

---

## Basic Usage

### Start Server

```bash
# Development server (1 replica)
cnode start

# Production cluster (5 replicas with auto-scaling)
cnode start --replicas 5

# Custom port
cnode start --port 8080

# Specific Docker image
cnode start --image unclecode/crawl4ai:0.7.0
```

### Check Status

```bash
cnode status
```

**Example Output:**

```
╭─────────────────── Crawl4AI Server Status ───────────────────╮
│ Status   │ 🟢 Running                │
│ Mode     │ swarm                     │
│ Replicas │ 5                         │
│ Port     │ 11235                     │
│ Image    │ unclecode/crawl4ai:latest │
│ Uptime   │ 2 hours 34 minutes        │
│ Started  │ 2025-10-21 14:30:00       │
╰───────────────────────────────────────────────────────────────╯

✓ Server is healthy
Access: http://localhost:11235
```

### View Logs

```bash
# Show last 100 lines
cnode logs

# Follow logs in real-time
cnode logs -f

# Show last 500 lines
cnode logs --tail 500
```

### Stop Server

```bash
# Stop server (keeps data)
cnode stop

# Stop and remove all data
cnode stop --remove-volumes
```

---

## Scaling & Production

### Scale Your Cluster

```bash
# Scale to 10 replicas (live, no downtime)
cnode scale 10

# Scale down to 2 replicas
cnode scale 2
```

**Note:** Scaling is live for Swarm/Compose modes; single container mode requires a restart. You can also confirm a scale operation programmatically, as sketched below.
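
One way to confirm the new replica count is to poll `/health` and watch the `containers` field from its response (see [API Health Endpoint](#api-health-endpoint)). A minimal sketch, assuming the `requests` package and the response shape shown in that section:

```python
import time

import requests

def wait_for_replicas(expected: int, timeout: int = 120) -> None:
    """Poll /health until the reported container count matches `expected`."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        health = requests.get("http://localhost:11235/health", timeout=5).json()
        if health.get("containers") == expected:
            print(f"Cluster is at {expected} replicas")
            return
        time.sleep(2)
    raise TimeoutError(f"Cluster did not reach {expected} replicas in {timeout}s")

wait_for_replicas(10)  # e.g. after `cnode scale 10`
```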

### Production Deployment

```bash
# Start production cluster
cnode start --replicas 5 --port 11235

# Verify health
curl http://localhost:11235/health

# Monitor performance
cnode logs -f
```
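
Beyond the health check, it is worth running one real crawl through the load balancer before sending production traffic. A minimal smoke test, reusing the request shape documented under [Using the API](#using-the-api):

```python
import requests

# Fire a single crawl through the cluster to confirm end-to-end operation.
payload = {
    "urls": ["https://example.com"],
    "browser_config": {"type": "BrowserConfig", "params": {"headless": True}},
    "crawler_config": {"type": "CrawlerRunConfig", "params": {"cache_mode": "bypass"}},
}
resp = requests.post("http://localhost:11235/crawl", json=payload, timeout=60)
resp.raise_for_status()
print("Smoke test passed:", resp.status_code)
```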

### Restart Server

```bash
# Restart with same configuration
cnode restart

# Restart with new replica count
cnode restart --replicas 10
```

---

## Monitoring Dashboard

### Access the Dashboard

Once your server is running, access the real-time monitoring dashboard at:

```
http://localhost:11235/monitor
```

### Dashboard Features

📊 **Real-time Metrics**

- Requests per second
- Active connections
- Response times
- Error rates

📈 **Performance Graphs**

- CPU usage
- Memory consumption
- Request latency
- Throughput

🔍 **System Health**

- Container status
- Replica health
- Load distribution
- Resource utilization

![Dashboard Screenshot](dashboard.png)

### API Health Endpoint

```bash
# Quick health check
curl http://localhost:11235/health

# Response
{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 9876,
  "containers": 5
}
```
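
For unattended setups, the same endpoint can back a simple watchdog script. A sketch, assuming the `requests` package and the response fields shown above:

```python
import time

import requests

URL = "http://localhost:11235/health"

# Print a warning whenever the server stops reporting healthy.
while True:
    try:
        health = requests.get(URL, timeout=5).json()
        if health.get("status") != "healthy":
            print(f"WARNING: server reported status={health.get('status')!r}")
        else:
            print(f"ok: {health.get('containers')} containers, uptime {health.get('uptime')}s")
    except requests.RequestException as exc:
        print(f"WARNING: health check failed: {exc}")
    time.sleep(30)
```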

---

## Using the API

### Interactive Playground

Test the API interactively at:

```
http://localhost:11235/playground
```

### Basic Crawl Example

**Python:**

```python
import requests

# Simple crawl
response = requests.post(
    "http://localhost:11235/crawl",
    json={
        "urls": ["https://example.com"],
        "browser_config": {
            "type": "BrowserConfig",
            "params": {"headless": True}
        },
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {"cache_mode": "bypass"}
        }
    }
)

result = response.json()
print(f"Title: {result['result']['metadata']['title']}")
print(f"Content: {result['result']['markdown'][:200]}...")
```

**cURL:**

```bash
curl -X POST http://localhost:11235/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "browser_config": {
      "type": "BrowserConfig",
      "params": {"headless": true}
    },
    "crawler_config": {
      "type": "CrawlerRunConfig",
      "params": {"cache_mode": "bypass"}
    }
  }'
```

**JavaScript (Node.js):**

```javascript
const axios = require('axios');

async function crawl() {
  const response = await axios.post('http://localhost:11235/crawl', {
    urls: ['https://example.com'],
    browser_config: {
      type: 'BrowserConfig',
      params: { headless: true }
    },
    crawler_config: {
      type: 'CrawlerRunConfig',
      params: { cache_mode: 'bypass' }
    }
  });

  console.log('Title:', response.data.result.metadata.title);
  console.log('Content:', response.data.result.markdown.substring(0, 200));
}

crawl();
```
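
Whichever client you use, check both the HTTP status and the returned payload before consuming the result. A more defensive variant of the Python example above (a sketch; the response shape follows the examples in this guide):

```python
import requests

payload = {
    "urls": ["https://example.com"],
    "browser_config": {"type": "BrowserConfig", "params": {"headless": True}},
    "crawler_config": {"type": "CrawlerRunConfig", "params": {"cache_mode": "bypass"}},
}

try:
    response = requests.post("http://localhost:11235/crawl", json=payload, timeout=60)
    response.raise_for_status()  # turn non-2xx responses into exceptions
except requests.RequestException as exc:
    raise SystemExit(f"Crawl request failed: {exc}")

body = response.json()
crawl_result = body.get("result")
if not crawl_result:
    raise SystemExit(f"No result in response: {body}")
print(crawl_result.get("metadata", {}).get("title", "<no title>"))
```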

### Advanced Examples

**Extract with CSS Selectors:**

```python
import json

import requests

response = requests.post(
    "http://localhost:11235/crawl",
    json={
        "urls": ["https://news.ycombinator.com"],
        "browser_config": {
            "type": "BrowserConfig",
            "params": {"headless": True}
        },
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {
                "extraction_strategy": {
                    "type": "JsonCssExtractionStrategy",
                    "params": {
                        "schema": {
                            "type": "dict",
                            "value": {
                                "baseSelector": ".athing",
                                "fields": [
                                    {"name": "title", "selector": ".titleline > a", "type": "text"},
                                    {"name": "url", "selector": ".titleline > a", "type": "attribute", "attribute": "href"},
                                    {"name": "points", "selector": ".score", "type": "text"}
                                ]
                            }
                        }
                    }
                }
            }
        }
    }
)

# extracted_content is a JSON-encoded string, so decode it before iterating
articles = json.loads(response.json()['result']['extracted_content'])
for article in articles:
    print(f"{article['title']} - {article['points']}")
```

**Streaming Multiple URLs:**

```python
import json

import requests

response = requests.post(
    "http://localhost:11235/crawl/stream",
    json={
        "urls": [
            "https://example.com",
            "https://httpbin.org/html",
            "https://python.org"
        ],
        "browser_config": {
            "type": "BrowserConfig",
            "params": {"headless": True}
        },
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {"stream": True}
        }
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        data = json.loads(line)
        if data.get("status") == "completed":
            break
        print(f"Crawled: {data['url']} - Success: {data['success']}")
```

### Additional Endpoints

**Screenshot:**

```bash
curl -X POST http://localhost:11235/screenshot \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output screenshot.png
```

**PDF Export:**

```bash
curl -X POST http://localhost:11235/pdf \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output page.pdf
```

**HTML Extraction:**

```bash
curl -X POST http://localhost:11235/html \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```
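
The same endpoints can be called from Python. A sketch mirroring the cURL examples above (it assumes, as those examples do, that `/screenshot` and `/pdf` return the binary file in the response body):

```python
import requests

BASE = "http://localhost:11235"

# Save a screenshot, mirroring the cURL example above.
resp = requests.post(f"{BASE}/screenshot", json={"url": "https://example.com"}, timeout=60)
resp.raise_for_status()
with open("screenshot.png", "wb") as f:
    f.write(resp.content)

# Export the page as a PDF.
resp = requests.post(f"{BASE}/pdf", json={"url": "https://example.com"}, timeout=60)
resp.raise_for_status()
with open("page.pdf", "wb") as f:
    f.write(resp.content)
```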

---

## Management Commands

### All Available Commands

```bash
cnode --help              # Show help
cnode start [OPTIONS]     # Start server
cnode stop [OPTIONS]      # Stop server
cnode status              # Show status
cnode scale N             # Scale to N replicas
cnode logs [OPTIONS]      # View logs
cnode restart [OPTIONS]   # Restart server
cnode cleanup [--force]   # Clean up resources
```
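
If you drive cnode from another tool or a CI job, a thin wrapper keeps the shell calls in one place. A sketch (the subcommands come from the list above; the wrapper itself is illustrative):

```python
import subprocess

def cnode(*args: str) -> str:
    """Run a cnode subcommand and return its stdout, raising on failure."""
    result = subprocess.run(
        ["cnode", *args], capture_output=True, text=True, check=True
    )
    return result.stdout

print(cnode("status"))  # same output as running `cnode status`
cnode("scale", "5")     # scale the cluster to 5 replicas
```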

### Command Options

**start:**

```bash
--replicas, -r N   # Number of replicas (default: 1)
--mode MODE        # Deployment mode: auto, single, swarm, compose
--port, -p PORT    # External port (default: 11235)
--env-file FILE    # Environment file path
--image IMAGE      # Docker image (default: unclecode/crawl4ai:latest)
```

**stop:**

```bash
--remove-volumes   # Remove persistent data (WARNING: deletes data)
```

**logs:**

```bash
--follow, -f       # Follow log output (like tail -f)
--tail N           # Number of lines to show (default: 100)
```

**scale:**

```bash
N                  # Target replica count (minimum: 1)
```

---

## Troubleshooting

### Server Won't Start

```bash
# Check Docker is running
docker ps

# Check port availability
lsof -i :11235

# Check logs for errors
cnode logs
```

### High Memory Usage

```bash
# Check current status
cnode status

# Restart to clear memory
cnode restart

# Scale down if needed
cnode scale 2
```

### Slow Response Times

```bash
# Scale up for better performance
cnode scale 10

# Check system resources
docker stats
```

### Cannot Connect to API

```bash
# Verify server is running
cnode status

# Check firewall
sudo ufw status

# Test locally
curl http://localhost:11235/health
```
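
When connectivity problems are intermittent, it helps to distinguish "cannot connect" from "connected but slow". A small diagnostic sketch using only the `requests` package:

```python
import requests

URL = "http://localhost:11235/health"

try:
    resp = requests.get(URL, timeout=(3, 10))  # (connect timeout, read timeout)
    print(f"Reached server: HTTP {resp.status_code} in {resp.elapsed.total_seconds():.2f}s")
except requests.ConnectTimeout:
    print("Could not connect: check the firewall, or the server may be down")
except requests.ReadTimeout:
    print("Connected, but the server is slow to respond: consider scaling up")
except requests.ConnectionError as exc:
    print(f"Connection refused or reset: {exc}")
```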

### Clean Slate

```bash
# Complete cleanup and restart
cnode cleanup --force
cnode start --replicas 5
```

---

## Advanced Topics

### Environment Variables

Create a `.env` file for API keys:

```bash
# .env file
OPENAI_API_KEY=sk-your-key
ANTHROPIC_API_KEY=your-key
```

Use it with cnode:

```bash
cnode start --env-file .env --replicas 3
```

### Custom Docker Image

```bash
# Use a specific version
cnode start --image unclecode/crawl4ai:0.7.0-r1

# Use a custom registry
cnode start --image myregistry.com/crawl4ai:custom
```

### Production Best Practices

1. **Use Multiple Replicas**
   ```bash
   cnode start --replicas 5
   ```

2. **Monitor Regularly**
   ```bash
   # Set up a monitoring cron job
   */5 * * * * cnode status | mail -s "Crawl4AI Status" admin@example.com
   ```

3. **Rotate Logs Regularly**
   ```bash
   cnode logs --tail 1000 > crawl4ai.log
   cnode restart
   ```

4. **Set Resource Limits**
   - Ensure adequate RAM (2 GB per replica minimum)
   - Monitor disk space for cached data
   - Use SSDs for better performance

### Integration Examples

**With a Python App:**

```python
import asyncio

from crawl4ai.docker_client import Crawl4aiDockerClient

async def main():
    async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
        results = await client.crawl(["https://example.com"])
        print(results[0].markdown)

asyncio.run(main())
```

**With Node.js:**

```javascript
const Crawl4AI = require('crawl4ai-client');
const client = new Crawl4AI('http://localhost:11235');

client.crawl('https://example.com')
  .then(result => console.log(result.markdown));
```

**With the REST API:**

Any language with an HTTP client can use the API; a dependency-free example follows.
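
For instance, here is the basic crawl using nothing but the Python standard library (a sketch of the raw HTTP call; no client package required):

```python
import json
import urllib.request

payload = json.dumps({
    "urls": ["https://example.com"],
    "browser_config": {"type": "BrowserConfig", "params": {"headless": True}},
    "crawler_config": {"type": "CrawlerRunConfig", "params": {"cache_mode": "bypass"}},
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11235/crawl",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=60) as resp:
    result = json.loads(resp.read())

print(result["result"]["metadata"]["title"])
```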

---

## Getting Help

### Resources

- 📖 [Full Documentation](https://docs.crawl4ai.com)
- 🐛 [Report Issues](https://github.com/unclecode/crawl4ai/issues)
- 💬 [Discord Community](https://discord.gg/crawl4ai)
- 📺 [Video Tutorials](https://youtube.com/@crawl4ai)

### Common Questions

**Q: How many replicas should I use?**
A: Start with 1 for development. Use 3-5 for production, then scale based on load.

**Q: What's the difference between Swarm and Compose mode?**
A: Swarm has built-in load balancing (faster). Compose uses Nginx as a fallback when Swarm is unavailable.

**Q: Can I run multiple cnode instances?**
A: Yes! Use different ports: `cnode start --port 8080`

**Q: How do I update to the latest version?**
A: Pull the new image: `cnode stop && docker pull unclecode/crawl4ai:latest && cnode start`

---

## Summary

You now know how to:

- ✅ Install cnode with one command
- ✅ Start and manage Crawl4AI servers
- ✅ Scale from 1 to 100+ replicas
- ✅ Monitor performance in real-time
- ✅ Use the API from any language
- ✅ Troubleshoot common issues

**Ready to crawl at scale!** 🚀

For detailed Docker configuration and advanced deployment options, see the [Docker Guide](../docker/README.md).

---

**Happy Crawling!** 🕷️

*Made with ❤️ by the Crawl4AI team*