Files
crawl4ai/deploy/installer/USER_GUIDE.md
unclecode 418dd60a80 docs(cnode): add direct GitHub raw URL install option
- Users can install directly from GitHub without hosting
- Added both crawl4ai.com and GitHub raw URL options
- Clarified Method 2 is for development/contributors
2025-10-21 11:03:51 +08:00

14 KiB

Crawl4AI Node Manager (cnode) - User Guide 🚀

Self-host your own Crawl4AI server cluster with one command. Scale from development to production effortlessly.

Table of Contents


What is cnode?

cnode (Crawl4AI Node Manager) is a CLI tool that manages Crawl4AI Docker server instances with automatic scaling and load balancing.

Key Features

One-Command Deployment - Start a server or cluster instantly Automatic Scaling - Single container or multi-replica cluster Built-in Load Balancing - Docker Swarm or Nginx (auto-detected) Real-time Monitoring - Beautiful web dashboard Zero Configuration - Works out of the box Production Ready - Auto-scaling, health checks, rolling updates

Architecture Modes

Replicas Mode Load Balancer Use Case
1 Single Container None Development, testing
2+ Docker Swarm Built-in Production (if Swarm available)
2+ Docker Compose Nginx Production (fallback)

Quick Start

1. Install cnode

# One-line installation
curl -sSL https://crawl4ai.com/install-cnode.sh | bash

Requirements:

  • Python 3.8+
  • Docker
  • Git

2. Start Your First Server

# Start single development server
cnode start

# Or start a production cluster with 5 replicas
cnode start --replicas 5

That's it! Your server is running at http://localhost:11235 🎉


Installation

# From crawl4ai.com (when hosted)
curl -sSL https://crawl4ai.com/install-cnode.sh | bash

# Or directly from GitHub
curl -sSL https://raw.githubusercontent.com/unclecode/crawl4ai/main/deploy/installer/install-cnode.sh | bash

Method 2: Clone Repository (For Development)

# Clone the repository
git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai/deploy/installer

# Run installer
./install-cnode.sh

Method 3: Custom Location

# Install to custom directory (using GitHub raw URL)
INSTALL_DIR=$HOME/.local/bin curl -sSL https://raw.githubusercontent.com/unclecode/crawl4ai/main/deploy/installer/install-cnode.sh | bash

# Add to PATH
export PATH="$HOME/.local/bin:$PATH"

Verify Installation

cnode --help

Basic Usage

Start Server

# Development server (1 replica)
cnode start

# Production cluster (5 replicas with auto-scaling)
cnode start --replicas 5

# Custom port
cnode start --port 8080

# Specific Docker image
cnode start --image unclecode/crawl4ai:0.7.0

Check Status

cnode status

Example Output:

╭─────────────────── Crawl4AI Server Status ───────────────────╮
│ Status     │ 🟢 Running                                      │
│ Mode       │ swarm                                           │
│ Replicas   │ 5                                               │
│ Port       │ 11235                                           │
│ Image      │ unclecode/crawl4ai:latest                       │
│ Uptime     │ 2 hours 34 minutes                              │
│ Started    │ 2025-10-21 14:30:00                            │
╰─────────────────────────────────────────────────────────────╯

✓ Server is healthy
Access: http://localhost:11235

View Logs

# Show last 100 lines
cnode logs

# Follow logs in real-time
cnode logs -f

# Show last 500 lines
cnode logs --tail 500

Stop Server

# Stop server (keeps data)
cnode stop

# Stop and remove all data
cnode stop --remove-volumes

Scaling & Production

Scale Your Cluster

# Scale to 10 replicas (live, no downtime)
cnode scale 10

# Scale down to 2 replicas
cnode scale 2

Note: Scaling is live for Swarm/Compose modes. Single container mode requires restart.

Production Deployment

# Start production cluster
cnode start --replicas 5 --port 11235

# Verify health
curl http://localhost:11235/health

# Monitor performance
cnode logs -f

Restart Server

# Restart with same configuration
cnode restart

# Restart with new replica count
cnode restart --replicas 10

Monitoring Dashboard

Access the Dashboard

Once your server is running, access the real-time monitoring dashboard:

# Dashboard URL
http://localhost:11235/monitor

Dashboard Features

📊 Real-time Metrics

  • Requests per second
  • Active connections
  • Response times
  • Error rates

📈 Performance Graphs

  • CPU usage
  • Memory consumption
  • Request latency
  • Throughput

🔍 System Health

  • Container status
  • Replica health
  • Load distribution
  • Resource utilization

Monitor Dashboard

API Health Endpoint

# Quick health check
curl http://localhost:11235/health

# Response
{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 9876,
  "containers": 5
}

Using the API

Interactive Playground

Test the API interactively:

http://localhost:11235/playground

Basic Crawl Example

Python:

import requests

# Simple crawl
response = requests.post(
    "http://localhost:11235/crawl",
    json={
        "urls": ["https://example.com"],
        "browser_config": {
            "type": "BrowserConfig",
            "params": {"headless": True}
        },
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {"cache_mode": "bypass"}
        }
    }
)

result = response.json()
print(f"Title: {result['result']['metadata']['title']}")
print(f"Content: {result['result']['markdown'][:200]}...")

cURL:

curl -X POST http://localhost:11235/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "browser_config": {
      "type": "BrowserConfig",
      "params": {"headless": true}
    },
    "crawler_config": {
      "type": "CrawlerRunConfig",
      "params": {"cache_mode": "bypass"}
    }
  }'

JavaScript (Node.js):

const axios = require('axios');

async function crawl() {
  const response = await axios.post('http://localhost:11235/crawl', {
    urls: ['https://example.com'],
    browser_config: {
      type: 'BrowserConfig',
      params: { headless: true }
    },
    crawler_config: {
      type: 'CrawlerRunConfig',
      params: { cache_mode: 'bypass' }
    }
  });

  console.log('Title:', response.data.result.metadata.title);
  console.log('Content:', response.data.result.markdown.substring(0, 200));
}

crawl();

Advanced Examples

Extract with CSS Selectors:

import requests

response = requests.post(
    "http://localhost:11235/crawl",
    json={
        "urls": ["https://news.ycombinator.com"],
        "browser_config": {
            "type": "BrowserConfig",
            "params": {"headless": True}
        },
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {
                "extraction_strategy": {
                    "type": "JsonCssExtractionStrategy",
                    "params": {
                        "schema": {
                            "type": "dict",
                            "value": {
                                "baseSelector": ".athing",
                                "fields": [
                                    {"name": "title", "selector": ".titleline > a", "type": "text"},
                                    {"name": "url", "selector": ".titleline > a", "type": "attribute", "attribute": "href"},
                                    {"name": "points", "selector": ".score", "type": "text"}
                                ]
                            }
                        }
                    }
                }
            }
        }
    }
)

articles = response.json()['result']['extracted_content']
for article in articles:
    print(f"{article['title']} - {article['points']}")

Streaming Multiple URLs:

import requests
import json

response = requests.post(
    "http://localhost:11235/crawl/stream",
    json={
        "urls": [
            "https://example.com",
            "https://httpbin.org/html",
            "https://python.org"
        ],
        "browser_config": {
            "type": "BrowserConfig",
            "params": {"headless": True}
        },
        "crawler_config": {
            "type": "CrawlerRunConfig",
            "params": {"stream": True}
        }
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        data = json.loads(line)
        if data.get("status") == "completed":
            break
        print(f"Crawled: {data['url']} - Success: {data['success']}")

Additional Endpoints

Screenshot:

curl -X POST http://localhost:11235/screenshot \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output screenshot.png

PDF Export:

curl -X POST http://localhost:11235/pdf \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output page.pdf

HTML Extraction:

curl -X POST http://localhost:11235/html \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Management Commands

All Available Commands

cnode --help              # Show help
cnode start [OPTIONS]     # Start server
cnode stop [OPTIONS]      # Stop server
cnode status              # Show status
cnode scale N             # Scale to N replicas
cnode logs [OPTIONS]      # View logs
cnode restart [OPTIONS]   # Restart server
cnode cleanup [--force]   # Clean up resources

Command Options

start:

--replicas, -r N      # Number of replicas (default: 1)
--mode MODE           # Deployment mode: auto, single, swarm, compose
--port, -p PORT       # External port (default: 11235)
--env-file FILE       # Environment file path
--image IMAGE         # Docker image (default: unclecode/crawl4ai:latest)

stop:

--remove-volumes      # Remove persistent data (WARNING: deletes data)

logs:

--follow, -f          # Follow log output (like tail -f)
--tail N              # Number of lines to show (default: 100)

scale:

N                     # Target replica count (minimum: 1)

Troubleshooting

Server Won't Start

# Check Docker is running
docker ps

# Check port availability
lsof -i :11235

# Check logs for errors
cnode logs

High Memory Usage

# Check current status
cnode status

# Restart to clear memory
cnode restart

# Scale down if needed
cnode scale 2

Slow Response Times

# Scale up for better performance
cnode scale 10

# Check system resources
docker stats

Cannot Connect to API

# Verify server is running
cnode status

# Check firewall
sudo ufw status

# Test locally
curl http://localhost:11235/health

Clean Slate

# Complete cleanup and restart
cnode cleanup --force
cnode start --replicas 5

Advanced Topics

Environment Variables

Create .env file for API keys:

# .env file
OPENAI_API_KEY=sk-your-key
ANTHROPIC_API_KEY=your-key

Use with cnode:

cnode start --env-file .env --replicas 3

Custom Docker Image

# Use specific version
cnode start --image unclecode/crawl4ai:0.7.0-r1

# Use custom registry
cnode start --image myregistry.com/crawl4ai:custom

Production Best Practices

  1. Use Multiple Replicas

    cnode start --replicas 5
    
  2. Monitor Regularly

    # Set up monitoring cron
    */5 * * * * cnode status | mail -s "Crawl4AI Status" admin@example.com
    
  3. Regular Log Rotation

    cnode logs --tail 1000 > crawl4ai.log
    cnode restart
    
  4. Resource Limits

    • Ensure adequate RAM (2GB per replica minimum)
    • Monitor disk space for cached data
    • Use SSD for better performance

Integration Examples

With Python App:

from crawl4ai.docker_client import Crawl4aiDockerClient

async def main():
    async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
        results = await client.crawl(["https://example.com"])
        print(results[0].markdown)

With Node.js:

const Crawl4AI = require('crawl4ai-client');
const client = new Crawl4AI('http://localhost:11235');

client.crawl('https://example.com')
  .then(result => console.log(result.markdown));

With REST API: Any language with HTTP client support can use the API!


Getting Help

Resources

Common Questions

Q: How many replicas should I use? A: Start with 1 for development. Use 3-5 for production. Scale based on load.

Q: What's the difference between Swarm and Compose mode? A: Swarm has built-in load balancing (faster). Compose uses Nginx (fallback if Swarm unavailable).

Q: Can I run multiple cnode instances? A: Yes! Use different ports: cnode start --port 8080

Q: How do I update to the latest version? A: Pull new image: cnode stop && docker pull unclecode/crawl4ai:latest && cnode start


Summary

You now know how to:

  • Install cnode with one command
  • Start and manage Crawl4AI servers
  • Scale from 1 to 100+ replicas
  • Monitor performance in real-time
  • Use the API from any language
  • Troubleshoot common issues

Ready to crawl at scale! 🚀

For detailed Docker configuration and advanced deployment options, see the Docker Guide.


Happy Crawling! 🕷️

Made with ❤️ by the Crawl4AI team