feat(docker): update Dockerfile for improved installation process and enhance deployment documentation with Docker Compose setup and API token security

Author: UncleCode
Date: 2024-11-16 18:19:44 +08:00
Parent: 6360d0545a
Commit: 9139ef3125
2 changed files with 86 additions and 46 deletions

Dockerfile

@@ -79,7 +79,6 @@ COPY . .
 RUN pip install --no-cache-dir -r requirements.txt
 # Install required library for FastAPI
-RUN pip install .
 RUN pip install fastapi uvicorn psutil
 # Install ML dependencies first for better layer caching
@@ -97,15 +96,15 @@ RUN if [ "$INSTALL_TYPE" = "all" ] ; then \
 # Install the package
 RUN if [ "$INSTALL_TYPE" = "all" ] ; then \
-    pip install -e ".[all]" && \
+    pip install ".[all]" && \
     python -m crawl4ai.model_loader ; \
 elif [ "$INSTALL_TYPE" = "torch" ] ; then \
-    pip install -e ".[torch]" ; \
+    pip install ".[torch]" ; \
 elif [ "$INSTALL_TYPE" = "transformer" ] ; then \
-    pip install -e ".[transformer]" && \
+    pip install ".[transformer]" && \
     python -m crawl4ai.model_loader ; \
 else \
-    pip install -e "." ; \
+    pip install "." ; \
 fi
 # Install MkDocs and required plugins

Docker deployment documentation

@@ -1,71 +1,112 @@
-# Docker Deployment
+# Docker Deployment 🐳
 Crawl4AI provides official Docker images for easy deployment and scalability. This guide covers installation, configuration, and usage of Crawl4AI in Docker environments.
-## Quick Start 🚀
-Pull and run the basic version:
-```bash
-docker pull unclecode/crawl4ai:basic
-docker run -p 11235:11235 unclecode/crawl4ai:basic
+## Docker Compose Setup 🐳
+### Basic Usage
+Create a `docker-compose.yml`:
+```yaml
+version: '3.8'
+services:
+  crawl4ai:
+    image: unclecode/crawl4ai:all
+    ports:
+      - "11235:11235"
+    volumes:
+      - /dev/shm:/dev/shm
+    deploy:
+      resources:
+        limits:
+          memory: 4G
+    restart: unless-stopped
 ```
-Test the deployment:
+Run with:
+```bash
+docker-compose up -d
+```
+### Secure Mode with API Token
+To enable API authentication, set `CRAWL4AI_API_TOKEN`:
+```bash
+CRAWL4AI_API_TOKEN=your-secret-token docker-compose up -d
+```
+### Using Environment Variables
+Create a `.env` file for your API tokens:
+```env
+# Crawl4AI API Security (optional)
+CRAWL4AI_API_TOKEN=your-secret-token
+# LLM Provider API Keys
+OPENAI_API_KEY=sk-...
+ANTHROPIC_API_KEY=sk-ant-...
+GOOGLE_API_KEY=...
+GEMINI_API_KEY=...
+OLLAMA_API_KEY=...
+# Additional Configuration
+MAX_CONCURRENT_TASKS=5
+```
+Docker Compose automatically loads variables from the `.env` file; no additional configuration is needed.
+### Testing with API Token
 ```python
+import os
 import requests
-# Test health endpoint
-health = requests.get("http://localhost:11235/health")
-print("Health check:", health.json())
+# Initialize headers with token if using secure mode
+headers = {}
+if api_token := os.getenv('CRAWL4AI_API_TOKEN'):
+    headers['Authorization'] = f'Bearer {api_token}'
-# Test basic crawl
+# Test crawl with authentication
 response = requests.post(
     "http://localhost:11235/crawl",
+    headers=headers,
     json={
         "urls": "https://www.nbcnews.com/business",
         "priority": 10
     }
 )
 task_id = response.json()["task_id"]
-print("Task ID:", task_id)
 ```
-## Available Images 🏷️
-- `unclecode/crawl4ai:basic` - Basic web crawling capabilities
-- `unclecode/crawl4ai:all` - Full installation with all features
-- `unclecode/crawl4ai:gpu` - GPU-enabled version for ML features
+### Security Best Practices 🔒
+- Add `.env` to your `.gitignore`
+- Use different API tokens for development and production
+- Rotate API tokens periodically
+- Use secure methods to pass tokens in production environments
-## Configuration Options 🔧
-### Environment Variables
-```bash
-docker run -p 11235:11235 \
-  -e MAX_CONCURRENT_TASKS=5 \
-  -e OPENAI_API_KEY=your_key \
-  unclecode/crawl4ai:all
-```
-### Volume Mounting
-Mount a directory for persistent data:
-```bash
-docker run -p 11235:11235 \
-  -v $(pwd)/data:/app/data \
-  unclecode/crawl4ai:all
-```
-### Resource Limits
-Control container resources:
-```bash
-docker run -p 11235:11235 \
-  --memory=4g \
-  --cpus=2 \
-  unclecode/crawl4ai:all
-```
+This addition to your documentation:
+1. Shows how to use Docker Compose
+2. Explains both secure and non-secure modes
+3. Demonstrates environment variable configuration
+4. Provides example code for authenticated requests
+5. Includes security best practices
 ## Usage Examples 📝
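The testing snippet in the documentation diff builds its `Authorization` header inline. A small self-contained sketch of the same pattern as a reusable helper; the `auth_headers` function name is hypothetical, while the `CRAWL4AI_API_TOKEN` variable and the Bearer scheme come from the docs above:

```python
import os

def auth_headers(env_var: str = "CRAWL4AI_API_TOKEN") -> dict:
    """Return request headers, adding a Bearer token only when secure mode is on."""
    headers = {}
    token = os.getenv(env_var)
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers

# Without the env var set, no Authorization header is sent (open mode)
os.environ.pop("CRAWL4AI_API_TOKEN", None)
print(auth_headers())  # -> {}

# With the env var set, requests carry the Bearer token (secure mode)
os.environ["CRAWL4AI_API_TOKEN"] = "your-secret-token"
print(auth_headers())  # -> {'Authorization': 'Bearer your-secret-token'}
```

The same dictionary can then be passed as `headers=` to `requests.post`, as in the documented `/crawl` example, so the identical client code works against both open and token-protected deployments.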