docs: rename Docker deployment to self-hosting guide with comprehensive monitoring documentation

Major documentation restructuring to emphasize self-hosting capabilities and fully document the real-time monitoring system.

Changes:
- Renamed docker-deployment.md → self-hosting.md to better reflect the value proposition
- Updated mkdocs.yml navigation to "Self-Hosting Guide"
- Completely rewrote the introduction emphasizing self-hosting benefits:
  * Data privacy and ownership
  * Cost control and transparency
  * Performance and security advantages
  * Full customization capabilities
- Expanded "Metrics & Monitoring" → "Real-time Monitoring & Operations" with:
  * Monitoring Dashboard section documenting the /monitor UI
  * Complete feature breakdown (system health, requests, browsers, janitor, errors)
  * Monitor API Endpoints with all REST endpoints and examples
  * WebSocket Streaming integration guide with Python examples
  * Control Actions for manual browser management
  * Production Integration patterns (Prometheus, custom dashboards, alerting)
  * Key production metrics to track
- Enhanced summary section:
  * "What you've learned" checklist
  * Why self-hosting matters
  * Clear next steps
  * Key resources, including the monitoring dashboard URL

The monitoring dashboard built 2-3 weeks ago is now fully documented and discoverable. Users will understand they have complete operational visibility at http://localhost:11235/monitor with real-time updates, browser pool management, and programmatic control via REST/WebSocket APIs.

This positions Crawl4AI as an enterprise-grade self-hosting solution with DevOps-level monitoring capabilities, not just a Docker deployment.
@@ -1,4 +1,20 @@

# Self-Hosting Crawl4AI 🚀

**Take Control of Your Web Crawling Infrastructure**

Self-hosting Crawl4AI gives you complete control over your web crawling and data extraction pipeline. Unlike cloud-based solutions, you own your data, infrastructure, and destiny.

## Why Self-Host?

- **🔒 Data Privacy**: Your crawled data never leaves your infrastructure
- **💰 Cost Control**: No per-request pricing - scale within your own resources
- **🎯 Customization**: Full control over browser configurations, extraction strategies, and performance tuning
- **📊 Transparency**: Real-time monitoring dashboard shows exactly what's happening
- **⚡ Performance**: Direct access without API rate limits or geographic restrictions
- **🛡️ Security**: Keep sensitive data extraction workflows behind your firewall
- **🔧 Flexibility**: Customize, extend, and integrate with your existing infrastructure

When you self-host, you can scale from a single container to a full browser infrastructure, all while maintaining complete control and visibility.

## Table of Contents

- [Prerequisites](#prerequisites)
@@ -25,7 +41,12 @@
- [Available MCP Tools](#available-mcp-tools)
- [Testing MCP Connections](#testing-mcp-connections)
- [MCP Schemas](#mcp-schemas)
- [Real-time Monitoring & Operations](#real-time-monitoring--operations)
  - [Monitoring Dashboard](#monitoring-dashboard)
  - [Monitor API Endpoints](#monitor-api-endpoints)
  - [WebSocket Streaming](#websocket-streaming)
  - [Control Actions](#control-actions)
  - [Production Integration](#production-integration)
- [Deployment Scenarios](#deployment-scenarios)
- [Complete Examples](#complete-examples)
- [Server Configuration](#server-configuration)
@@ -1175,22 +1196,469 @@ async def test_stream_crawl(token: str = None): # Made token optional

---

## Real-time Monitoring & Operations

One of the key advantages of self-hosting is complete visibility into your infrastructure. Crawl4AI includes a comprehensive real-time monitoring system that gives you full transparency and control.

### Monitoring Dashboard

Access the **built-in real-time monitoring dashboard** for complete operational visibility:

```
http://localhost:11235/monitor
```



**Dashboard Features:**

#### 1. System Health Overview

- **CPU & Memory**: Live usage with progress bars and percentage indicators
- **Network I/O**: Total bytes sent/received since startup
- **Server Uptime**: How long your server has been running
- **Browser Pool Status**:
  - 🔥 Permanent browser (always-on default config, ~270MB)
  - ♨️ Hot pool (frequently used configs, ~180MB each)
  - ❄️ Cold pool (idle browsers awaiting cleanup, ~180MB each)
- **Memory Pressure**: LOW/MEDIUM/HIGH indicator for janitor behavior
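The per-browser figures above combine into a rough capacity estimate for the whole pool. A minimal sketch (the ~270MB/~180MB numbers are the approximations quoted above, not exact runtime measurements):

```python
# Rough pool-memory estimate from the approximate per-browser
# footprints quoted above (not exact runtime measurements).
PERMANENT_MB = 270  # always-on default-config browser
POOLED_MB = 180     # each hot or cold browser

def estimate_pool_memory_mb(hot_count: int, cold_count: int) -> int:
    """Estimate total browser-pool memory in MB."""
    return PERMANENT_MB + POOLED_MB * (hot_count + cold_count)

# Example: 3 hot + 1 cold browsers alongside the permanent one
print(estimate_pool_memory_mb(hot_count=3, cold_count=1))  # → 990
```

This kind of back-of-the-envelope math helps when sizing container memory limits before the dashboard gives you real numbers.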

#### 2. Live Request Tracking

- **Active Requests**: Currently running crawls with:
  - Request ID for tracking
  - Target URL (truncated for display)
  - Endpoint being used
  - Elapsed time (updates in real time)
  - Memory usage since start
- **Completed Requests**: The last 10 finished requests, showing:
  - Success/failure status (color-coded)
  - Total execution time
  - Memory delta (how much memory changed)
  - Pool hit (was the browser reused?)
  - HTTP status code
- **Filtering**: View all requests, successes only, or errors only
#### 3. Browser Pool Management

Interactive table showing all active browsers:

| Type | Signature | Age | Last Used | Hits | Actions |
|------|-----------|-----|-----------|------|---------|
| permanent | abc12345 | 2h | 5s ago | 1,247 | Restart |
| hot | def67890 | 45m | 2m ago | 89 | Kill / Restart |
| cold | ghi11213 | 30m | 15m ago | 3 | Kill / Restart |

- **Reuse Rate**: Percentage of requests that reused an existing browser
- **Memory Estimates**: Total memory used by the browser pool
- **Manual Control**: Kill or restart individual browsers

#### 4. Janitor Events Log

Real-time log of browser-pool cleanup events:

- When cold browsers are closed due to memory pressure
- When browsers are promoted from the cold to the hot pool
- Forced cleanups triggered manually
- Detailed cleanup reasons and browser signatures

#### 5. Error Monitoring

Recent errors with full context:

- Timestamp
- Endpoint where the error occurred
- Target URL
- Error message
- Request ID for correlation

**Live Updates:**

The dashboard connects via WebSocket and refreshes every **2 seconds** with the latest data. A connection status indicator shows when you're connected or disconnected.

---

### Monitor API Endpoints

For programmatic monitoring, automation, and integration with your existing infrastructure:

#### Health & Statistics

**Get System Health**

```bash
GET /monitor/health
```

Returns the current system snapshot:

```json
{
  "container": {
    "memory_percent": 45.2,
    "cpu_percent": 23.1,
    "network_sent_mb": 1250.45,
    "network_recv_mb": 3421.12,
    "uptime_seconds": 7234
  },
  "pool": {
    "permanent": {"active": true, "memory_mb": 270},
    "hot": {"count": 3, "memory_mb": 540},
    "cold": {"count": 1, "memory_mb": 180},
    "total_memory_mb": 990
  },
  "janitor": {
    "next_cleanup_estimate": "adaptive",
    "memory_pressure": "MEDIUM"
  }
}
```

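Once parsed, a snapshot like this condenses neatly into a one-line status string. A small illustrative helper (the field names follow the sample payload above; `summarize_health` itself is just a sketch, not part of the API):

```python
def summarize_health(health: dict) -> str:
    """Condense a /monitor/health snapshot into a one-line status."""
    c = health["container"]
    pool = health["pool"]
    pressure = health["janitor"]["memory_pressure"]
    return (f"mem {c['memory_percent']:.1f}% | cpu {c['cpu_percent']:.1f}% | "
            f"pool {pool['total_memory_mb']}MB | pressure {pressure}")

# Using the sample snapshot fields shown above
sample = {
    "container": {"memory_percent": 45.2, "cpu_percent": 23.1},
    "pool": {"total_memory_mb": 990},
    "janitor": {"memory_pressure": "MEDIUM"},
}
print(summarize_health(sample))
# → mem 45.2% | cpu 23.1% | pool 990MB | pressure MEDIUM
```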
**Get Request Statistics**

```bash
GET /monitor/requests?status=all&limit=50
```

Query parameters:

- `status`: Filter by `all`, `active`, `completed`, `success`, or `error`
- `limit`: Number of completed requests to return (1-1000)

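Rather than hand-assembling query strings, the parameters can be URL-encoded with the standard library. A quick sketch (`requests_url` is an illustrative helper, not part of the SDK):

```python
from urllib.parse import urlencode

BASE = "http://localhost:11235"

def requests_url(status: str = "all", limit: int = 50) -> str:
    """Build a /monitor/requests URL with encoded query parameters."""
    return f"{BASE}/monitor/requests?" + urlencode({"status": status, "limit": limit})

# Fetch only the 20 most recent failures
print(requests_url(status="error", limit=20))
# → http://localhost:11235/monitor/requests?status=error&limit=20
```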
**Get Browser Pool Details**

```bash
GET /monitor/browsers
```

Returns detailed information about all active browsers:

```json
{
  "browsers": [
    {
      "type": "permanent",
      "sig": "abc12345",
      "age_seconds": 7234,
      "last_used_seconds": 5,
      "memory_mb": 270,
      "hits": 1247,
      "killable": false
    },
    {
      "type": "hot",
      "sig": "def67890",
      "age_seconds": 2701,
      "last_used_seconds": 120,
      "memory_mb": 180,
      "hits": 89,
      "killable": true
    }
  ],
  "summary": {
    "total_count": 5,
    "total_memory_mb": 990,
    "reuse_rate_percent": 87.3
  }
}
```

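The `browsers` array folds easily into per-pool totals on the client side. A sketch over the two example entries shown above (the values are the sample data, not live output):

```python
from collections import Counter

# The two example entries from the sample response above
browsers = [
    {"type": "permanent", "memory_mb": 270, "hits": 1247, "killable": False},
    {"type": "hot", "memory_mb": 180, "hits": 89, "killable": True},
]

# Count browsers per pool type, total their estimated memory,
# and find which ones can be killed manually
by_type = Counter(b["type"] for b in browsers)
total_mb = sum(b["memory_mb"] for b in browsers)
killable = [b for b in browsers if b["killable"]]

print(dict(by_type))   # → {'permanent': 1, 'hot': 1}
print(total_mb)        # → 450
print(len(killable))   # → 1
```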
**Get Endpoint Performance Statistics**

```bash
GET /monitor/endpoints/stats
```

Returns aggregated metrics per endpoint:

```json
{
  "/crawl": {
    "count": 1523,
    "avg_latency_ms": 2341.5,
    "success_rate_percent": 98.2,
    "pool_hit_rate_percent": 89.1,
    "errors": 27
  },
  "/md": {
    "count": 891,
    "avg_latency_ms": 1823.7,
    "success_rate_percent": 99.4,
    "pool_hit_rate_percent": 92.3,
    "errors": 5
  }
}
```

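These per-endpoint aggregates lend themselves to simple threshold checks. A sketch that flags endpoints below a success-rate target, using the sample stats above (the 99% threshold is an arbitrary example, not a server default):

```python
# Sample data mirroring the /monitor/endpoints/stats response above
stats = {
    "/crawl": {"count": 1523, "avg_latency_ms": 2341.5,
               "success_rate_percent": 98.2, "errors": 27},
    "/md": {"count": 891, "avg_latency_ms": 1823.7,
            "success_rate_percent": 99.4, "errors": 5},
}

def underperforming(stats: dict, min_success: float = 99.0) -> list:
    """Return endpoints whose success rate is below the target."""
    return [ep for ep, m in stats.items()
            if m["success_rate_percent"] < min_success]

print(underperforming(stats))  # → ['/crawl']
```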
**Get Timeline Data**

```bash
GET /monitor/timeline?metric=memory&window=5m
```

Parameters:

- `metric`: `memory`, `requests`, or `browsers`
- `window`: Currently only `5m` (5-minute window, 5-second resolution)

Returns time-series data for charts:

```json
{
  "timestamps": [1699564800, 1699564805, 1699564810, ...],
  "values": [42.1, 43.5, 41.8, ...]
}
```

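The parallel `timestamps`/`values` arrays zip directly into (time, value) points, for example to find the peak in the window. A small sketch with made-up sample values in the payload shape above:

```python
# Made-up sample mirroring the timeline payload shape above
timeline = {
    "timestamps": [1699564800, 1699564805, 1699564810],
    "values": [42.1, 43.5, 41.8],
}

# Pair each timestamp with its value, then find the peak reading
points = list(zip(timeline["timestamps"], timeline["values"]))
peak_ts, peak_val = max(points, key=lambda p: p[1])

print(peak_ts, peak_val)  # → 1699564805 43.5
```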
#### Logs

**Get Janitor Events**

```bash
GET /monitor/logs/janitor?limit=100
```

**Get Error Log**

```bash
GET /monitor/logs/errors?limit=100
```

---

### WebSocket Streaming

For real-time monitoring in your own dashboards or applications:

```bash
WS /monitor/ws
```

**Connection Example (Python):**

```python
import asyncio
import websockets
import json

async def monitor_server():
    uri = "ws://localhost:11235/monitor/ws"

    async with websockets.connect(uri) as websocket:
        print("Connected to Crawl4AI monitor")

        while True:
            # Receive an update every 2 seconds
            data = await websocket.recv()
            update = json.loads(data)

            # Extract key metrics
            health = update['health']
            active_requests = len(update['requests']['active'])
            browsers = len(update['browsers'])

            print(f"Memory: {health['container']['memory_percent']:.1f}% | "
                  f"Active: {active_requests} | "
                  f"Browsers: {browsers}")

            # Check for high memory pressure
            if health['janitor']['memory_pressure'] == 'HIGH':
                print("⚠️ HIGH MEMORY PRESSURE - Consider cleanup")

asyncio.run(monitor_server())
```

**Update Payload Structure:**

```json
{
  "timestamp": 1699564823.456,
  "health": { /* System health snapshot */ },
  "requests": {
    "active": [ /* Currently running */ ],
    "completed": [ /* Last 10 completed */ ]
  },
  "browsers": [ /* All active browsers */ ],
  "timeline": {
    "memory": { /* Last 5 minutes */ },
    "requests": { /* Request rate */ },
    "browsers": { /* Pool composition */ }
  },
  "janitor": [ /* Last 10 cleanup events */ ],
  "errors": [ /* Last 10 errors */ ]
}
```

---

### Control Actions

Take manual control when needed:

**Force Immediate Cleanup**

```bash
POST /monitor/actions/cleanup
```

Kills all cold-pool browsers immediately (useful when memory is tight):

```json
{
  "success": true,
  "killed_browsers": 3
}
```

**Kill Specific Browser**

```bash
POST /monitor/actions/kill_browser
Content-Type: application/json

{
  "sig": "abc12345"  // First 8 chars of the browser signature
}
```

Response:

```json
{
  "success": true,
  "killed_sig": "abc12345",
  "pool_type": "hot"
}
```

**Restart Browser**

```bash
POST /monitor/actions/restart_browser
Content-Type: application/json

{
  "sig": "permanent"  // Or the first 8 chars of a signature
}
```

For the permanent browser, this closes and reinitializes it. For hot/cold browsers, it kills them and lets new requests create fresh ones.

**Reset Statistics**

```bash
POST /monitor/stats/reset
```

Clears endpoint counters (useful for starting fresh after testing).

---

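The cleanup action pairs naturally with the health snapshot: poll, decide, then POST to `/monitor/actions/cleanup`. A sketch of the decision step only (the 85% threshold is an arbitrary example, not a server default, and `should_force_cleanup` is an illustrative helper):

```python
def should_force_cleanup(health: dict, memory_limit: float = 85.0) -> bool:
    """Decide whether to POST /monitor/actions/cleanup for this snapshot."""
    high_pressure = health["janitor"]["memory_pressure"] == "HIGH"
    over_limit = health["container"]["memory_percent"] > memory_limit
    return high_pressure or over_limit

# Memory over the example limit → cleanup recommended
print(should_force_cleanup(
    {"container": {"memory_percent": 91.0},
     "janitor": {"memory_pressure": "MEDIUM"}}))  # → True
```

In a real guard loop you would fetch the snapshot from `/monitor/health` and issue the POST only when this returns `True`.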

### Production Integration

#### Integration with Existing Monitoring Systems

**Prometheus Integration:**

```bash
# Scrape the Prometheus metrics endpoint
curl http://localhost:11235/metrics
```

**Custom Dashboard Integration:**

```python
# Example: Push metrics to your monitoring system
import asyncio
import websockets
import json
from your_monitoring import push_metric  # placeholder for your metrics client

async def integrate_monitoring():
    async with websockets.connect("ws://localhost:11235/monitor/ws") as ws:
        while True:
            data = json.loads(await ws.recv())

            # Push to your monitoring system
            push_metric("crawl4ai.memory.percent",
                        data['health']['container']['memory_percent'])
            push_metric("crawl4ai.active_requests",
                        len(data['requests']['active']))
            push_metric("crawl4ai.browser_count",
                        len(data['browsers']))
```

**Alerting Example:**

```python
import requests
import time

def send_alert(message):
    # Placeholder: wire this up to email, Slack, PagerDuty, etc.
    print(f"ALERT: {message}")

def check_health():
    """Poll the health endpoint and alert on issues"""
    response = requests.get("http://localhost:11235/monitor/health")
    health = response.json()

    # Alert on high memory
    if health['container']['memory_percent'] > 85:
        send_alert(f"High memory: {health['container']['memory_percent']}%")

    # Alert on low success rates
    stats = requests.get("http://localhost:11235/monitor/endpoints/stats").json()
    for endpoint, metrics in stats.items():
        if metrics['success_rate_percent'] < 95:
            send_alert(f"{endpoint} success rate: {metrics['success_rate_percent']}%")

# Run every minute
while True:
    check_health()
    time.sleep(60)
```

**Log Aggregation:**

```python
import requests
from datetime import datetime

def aggregate_errors():
    """Fetch and aggregate errors for your logging system"""
    response = requests.get("http://localhost:11235/monitor/logs/errors?limit=100")
    errors = response.json()['errors']

    for error in errors:
        log_to_system({  # log_to_system: your log-shipping function
            'timestamp': datetime.fromtimestamp(error['timestamp']),
            'service': 'crawl4ai',
            'endpoint': error['endpoint'],
            'url': error['url'],
            'message': error['error'],
            'request_id': error['request_id']
        })
```

#### Key Metrics to Track

For production self-hosted deployments, monitor these metrics:

1. **Memory Usage Trends**
   - Track `container.memory_percent` over time
   - Alert when consistently above 80%
   - Prevents OOM kills

2. **Request Success Rates**
   - Monitor per-endpoint success rates
   - Alert when below 95%
   - Indicates crawling issues

3. **Average Latency**
   - Track `avg_latency_ms` per endpoint
   - Detect performance degradation
   - Optimize slow endpoints

4. **Browser Pool Efficiency**
   - Monitor `reuse_rate_percent`
   - Should be >80% for good efficiency
   - Low rates indicate pool churn

5. **Error Frequency**
   - Count errors per time window
   - Alert on sudden spikes
   - Track error patterns

6. **Janitor Activity**
   - Monitor cleanup frequency
   - Excessive cleanup indicates memory pressure
   - Adjust pool settings if needed

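The alert thresholds in this list can live in one small check that runs against the monitor endpoints' output. A sketch applying the suggested levels (80% memory, 95% success, 80% reuse) to a single snapshot; `metric_warnings` is an illustrative helper, not part of the API:

```python
def metric_warnings(memory_percent: float,
                    success_rate_percent: float,
                    reuse_rate_percent: float) -> list:
    """Apply the alert thresholds suggested above to one snapshot."""
    warnings = []
    if memory_percent > 80:
        warnings.append("memory above 80%")
    if success_rate_percent < 95:
        warnings.append("success rate below 95%")
    if reuse_rate_percent < 80:
        warnings.append("browser reuse below 80% (pool churn)")
    return warnings

# High memory plus a churning pool trips two of the three checks
print(metric_warnings(86.0, 99.1, 62.5))
# → ['memory above 80%', 'browser reuse below 80% (pool churn)']
```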
---

### Quick Health Check

For simple uptime monitoring:

```bash
curl http://localhost:11235/health
```

Returns:

```json
{
  "status": "healthy",
  "version": "0.7.4"
}
```

Other useful endpoints:

- `/metrics` - Prometheus metrics
- `/schema` - Full API schema

---

@@ -1350,22 +1818,46 @@ We're here to help you succeed with Crawl4AI! Here's how to get support:

## Summary

Congratulations! You now have everything you need to self-host your own Crawl4AI infrastructure with complete control and visibility.

**What You've Learned:**

- ✅ Multiple deployment options (Docker Hub, Docker Compose, manual builds)
- ✅ Environment configuration and LLM integration
- ✅ Using the interactive playground for testing
- ✅ Making API requests with proper typing (SDK and REST)
- ✅ Specialized endpoints (screenshots, PDFs, JavaScript execution)
- ✅ MCP integration for AI-assisted development
- ✅ **Real-time monitoring dashboard** for operational transparency
- ✅ **Monitor API** for programmatic control and integration
- ✅ Production deployment best practices

**Why This Matters:**

By self-hosting Crawl4AI, you:

- 🔒 **Own Your Data**: Everything stays in your infrastructure
- 📊 **See Everything**: The real-time dashboard shows exactly what's happening
- 💰 **Control Costs**: Scale within your resources, with no per-request fees
- ⚡ **Maximize Performance**: Direct access with smart browser pooling (10x memory efficiency)
- 🛡️ **Stay Secure**: Keep sensitive workflows behind your firewall
- 🔧 **Customize Freely**: Full control over configs, strategies, and optimizations

**Next Steps:**

1. **Start Simple**: Deploy with the Docker Hub image and test with the playground
2. **Monitor Everything**: Open `http://localhost:11235/monitor` to watch your server
3. **Integrate**: Connect your applications using the Python SDK or REST API
4. **Scale Smart**: Use the monitoring data to optimize your deployment
5. **Go Production**: Set up alerting, log aggregation, and automated cleanup

**Key Resources:**

- 🎮 **Playground**: `http://localhost:11235/playground` - Interactive testing
- 📊 **Monitor Dashboard**: `http://localhost:11235/monitor` - Real-time visibility
- 📖 **Architecture Docs**: `deploy/docker/ARCHITECTURE.md` - Deep technical dive
- 💬 **Discord Community**: Get help and share experiences
- ⭐ **GitHub**: Report issues, contribute, show support

Remember: the monitoring dashboard is your window into your infrastructure. Use it to understand performance, troubleshoot issues, and optimize your deployment. The examples in the `examples` folder show real-world usage patterns you can adapt.

**You're now in control of your web crawling destiny!** 🚀

Happy crawling! 🕷️
@@ -18,7 +18,7 @@ nav:
- "Marketplace Admin": "marketplace/admin/index.html"
- Setup & Installation:
  - "Installation": "core/installation.md"
  - "Self-Hosting Guide": "core/self-hosting.md"
- "Blog & Changelog":
  - "Blog Home": "blog/index.md"
  - "Changelog": "https://github.com/unclecode/crawl4ai/blob/main/CHANGELOG.md"