Release v0.7.6: The 0.7.6 Update

- Updated version to 0.7.6 - Added comprehensive demo and release notes - Updated all documentation - Update the veriosn in Dockerfile to 0.7.6
2025-10-22 13:46:54 +02:00
parent 7e8fb3a8f3
commit ca100c6518
9 changed files with 1021 additions and 19 deletions
--- a/docs/md_v2/blog/index.md
+++ b/docs/md_v2/blog/index.md
@@ -20,6 +20,23 @@ Ever wondered why your AI coding assistant struggles with your library despite c

 ## Latest Release

+### [Crawl4AI v0.7.6 – The Webhook Infrastructure Update](../blog/release-v0.7.6.md)
+*October 22, 2025*
+
+Crawl4AI v0.7.6 introduces comprehensive webhook support for the Docker job queue API, bringing real-time notifications to both crawling and LLM extraction workflows. No more polling!
+
+Key highlights:
+- **🪝 Complete Webhook Support**: Real-time notifications for both `/crawl/job` and `/llm/job` endpoints
+- **🔄 Reliable Delivery**: Exponential backoff retry mechanism (5 attempts: 1s → 2s → 4s → 8s → 16s)
+- **🔐 Custom Authentication**: Add custom headers for webhook authentication
+- **📊 Flexible Delivery**: Choose notification-only or include full data in payload
+- **⚙️ Global Configuration**: Set default webhook URL in config.yml for all jobs
+- **🎯 Zero Breaking Changes**: Fully backward compatible, webhooks are opt-in
+
+[Read full release notes →](../blog/release-v0.7.6.md)
+
+## Recent Releases
+
 ### [Crawl4AI v0.7.5 – The Docker Hooks & Security Update](../blog/release-v0.7.5.md)
 *September 29, 2025*

--- a/docs/md_v2/blog/releases/0.7.6.md
+++ b/docs/md_v2/blog/releases/0.7.6.md
@@ -0,0 +1,314 @@
+# Crawl4AI v0.7.6 Release Notes
+
+*Release Date: October 22, 2025*
+
+I'm excited to announce Crawl4AI v0.7.6, featuring a complete webhook infrastructure for the Docker job queue API! This release eliminates polling and brings real-time notifications to both crawling and LLM extraction workflows.
+
+## 🎯 What's New
+
+### Webhook Support for Docker Job Queue API
+
+The headline feature of v0.7.6 is comprehensive webhook support for asynchronous job processing. No more constant polling to check if your jobs are done - get instant notifications when they complete!
+
+**Key Capabilities:**
+
+- ✅ **Universal Webhook Support**: Both `/crawl/job` and `/llm/job` endpoints now support webhooks
+- ✅ **Flexible Delivery Modes**: Choose notification-only or include full data in the webhook payload
+- ✅ **Reliable Delivery**: Exponential backoff retry mechanism (5 attempts: 1s → 2s → 4s → 8s → 16s)
+- ✅ **Custom Authentication**: Add custom headers for webhook authentication
+- ✅ **Global Configuration**: Set default webhook URL in `config.yml` for all jobs
+- ✅ **Task Type Identification**: Distinguish between `crawl` and `llm_extraction` tasks
+
+### How It Works
+
+Instead of constantly checking job status:
+
+**OLD WAY (Polling):**
+```python
+# Submit job
+response = requests.post("http://localhost:11235/crawl/job", json=payload)
+task_id = response.json()['task_id']
+
+# Poll until complete
+while True:
+    status = requests.get(f"http://localhost:11235/crawl/job/{task_id}")
+    if status.json()['status'] == 'completed':
+        break
+    time.sleep(5)  # Wait and try again
+```
+
+**NEW WAY (Webhooks):**
+```python
+# Submit job with webhook
+payload = {
+    "urls": ["https://example.com"],
+    "webhook_config": {
+        "webhook_url": "https://myapp.com/webhook",
+        "webhook_data_in_payload": True
+    }
+}
+response = requests.post("http://localhost:11235/crawl/job", json=payload)
+
+# Done! Webhook will notify you when complete
+# Your webhook handler receives the results automatically
+```
+
+### Crawl Job Webhooks
+
+```bash
+curl -X POST http://localhost:11235/crawl/job \
+  -H "Content-Type: application/json" \
+  -d '{
+    "urls": ["https://example.com"],
+    "browser_config": {"headless": true},
+    "crawler_config": {"cache_mode": "bypass"},
+    "webhook_config": {
+      "webhook_url": "https://myapp.com/webhooks/crawl-complete",
+      "webhook_data_in_payload": false,
+      "webhook_headers": {
+        "X-Webhook-Secret": "your-secret-token"
+      }
+    }
+  }'
+```
+
+### LLM Extraction Job Webhooks (NEW!)
+
+```bash
+curl -X POST http://localhost:11235/llm/job \
+  -H "Content-Type: application/json" \
+  -d '{
+    "url": "https://example.com/article",
+    "q": "Extract the article title, author, and publication date",
+    "schema": "{\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\"}}}",
+    "provider": "openai/gpt-4o-mini",
+    "webhook_config": {
+      "webhook_url": "https://myapp.com/webhooks/llm-complete",
+      "webhook_data_in_payload": true
+    }
+  }'
+```
+
+### Webhook Payload Structure
+
+**Success (with data):**
+```json
+{
+  "task_id": "llm_1698765432",
+  "task_type": "llm_extraction",
+  "status": "completed",
+  "timestamp": "2025-10-22T10:30:00.000000+00:00",
+  "urls": ["https://example.com/article"],
+  "data": {
+    "extracted_content": {
+      "title": "Understanding Web Scraping",
+      "author": "John Doe",
+      "date": "2025-10-22"
+    }
+  }
+}
+```
+
+**Failure:**
+```json
+{
+  "task_id": "crawl_abc123",
+  "task_type": "crawl",
+  "status": "failed",
+  "timestamp": "2025-10-22T10:30:00.000000+00:00",
+  "urls": ["https://example.com"],
+  "error": "Connection timeout after 30s"
+}
+```
+
+### Simple Webhook Handler Example
+
+```python
+from flask import Flask, request, jsonify
+
+app = Flask(__name__)
+
+@app.route('/webhook', methods=['POST'])
+def handle_webhook():
+    payload = request.json
+
+    task_id = payload['task_id']
+    task_type = payload['task_type']
+    status = payload['status']
+
+    if status == 'completed':
+        if 'data' in payload:
+            # Process data directly
+            data = payload['data']
+        else:
+            # Fetch from API
+            endpoint = 'crawl' if task_type == 'crawl' else 'llm'
+            response = requests.get(f'http://localhost:11235/{endpoint}/job/{task_id}')
+            data = response.json()
+
+        # Your business logic here
+        print(f"Job {task_id} completed!")
+
+    elif status == 'failed':
+        error = payload.get('error', 'Unknown error')
+        print(f"Job {task_id} failed: {error}")
+
+    return jsonify({"status": "received"}), 200
+
+app.run(port=8080)
+```
+
+## 📊 Performance Improvements
+
+- **Reduced Server Load**: Eliminates constant polling requests
+- **Lower Latency**: Instant notification vs. polling interval delay
+- **Better Resource Usage**: Frees up client connections while jobs run in background
+- **Scalable Architecture**: Handles high-volume crawling workflows efficiently
+
+## 🐛 Bug Fixes
+
+- Fixed webhook configuration serialization for Pydantic HttpUrl fields
+- Improved error handling in webhook delivery service
+- Enhanced Redis task storage for webhook config persistence
+
+## 🌍 Expected Real-World Impact
+
+### For Web Scraping Workflows
+- **Reduced Costs**: Less API calls = lower bandwidth and server costs
+- **Better UX**: Instant notifications improve user experience
+- **Scalability**: Handle 100s of concurrent jobs without polling overhead
+
+### For LLM Extraction Pipelines
+- **Async Processing**: Submit LLM extraction jobs and move on
+- **Batch Processing**: Queue multiple extractions, get notified as they complete
+- **Integration**: Easy integration with workflow automation tools (Zapier, n8n, etc.)
+
+### For Microservices
+- **Event-Driven**: Perfect for event-driven microservice architectures
+- **Decoupling**: Decouple job submission from result processing
+- **Reliability**: Automatic retries ensure webhooks are delivered
+
+## 🔄 Breaking Changes
+
+**None!** This release is fully backward compatible.
+
+- Webhook configuration is optional
+- Existing code continues to work without modification
+- Polling is still supported for jobs without webhook config
+
+## 📚 Documentation
+
+### New Documentation
+- **[WEBHOOK_EXAMPLES.md](../deploy/docker/WEBHOOK_EXAMPLES.md)** - Comprehensive webhook usage guide
+- **[docker_webhook_example.py](../docs/examples/docker_webhook_example.py)** - Working code examples
+
+### Updated Documentation
+- **[Docker README](../deploy/docker/README.md)** - Added webhook sections
+- API documentation with webhook examples
+
+## 🛠️ Migration Guide
+
+No migration needed! Webhooks are opt-in:
+
+1. **To use webhooks**: Add `webhook_config` to your job payload
+2. **To keep polling**: Continue using your existing code
+
+### Quick Start
+
+```python
+# Just add webhook_config to your existing payload
+payload = {
+    # Your existing configuration
+    "urls": ["https://example.com"],
+    "browser_config": {...},
+    "crawler_config": {...},
+
+    # NEW: Add webhook configuration
+    "webhook_config": {
+        "webhook_url": "https://myapp.com/webhook",
+        "webhook_data_in_payload": True
+    }
+}
+```
+
+## 🔧 Configuration
+
+### Global Webhook Configuration (config.yml)
+
+```yaml
+webhooks:
+  enabled: true
+  default_url: "https://myapp.com/webhooks/default"  # Optional
+  data_in_payload: false
+  retry:
+    max_attempts: 5
+    initial_delay_ms: 1000
+    max_delay_ms: 32000
+    timeout_ms: 30000
+  headers:
+    User-Agent: "Crawl4AI-Webhook/1.0"
+```
+
+## 🚀 Upgrade Instructions
+
+### Docker
+
+```bash
+# Pull the latest image
+docker pull unclecode/crawl4ai:0.7.6
+
+# Or use latest tag
+docker pull unclecode/crawl4ai:latest
+
+# Run with webhook support
+docker run -d \
+  -p 11235:11235 \
+  --env-file .llm.env \
+  --name crawl4ai \
+  unclecode/crawl4ai:0.7.6
+```
+
+### Python Package
+
+```bash
+pip install --upgrade crawl4ai
+```
+
+## 💡 Pro Tips
+
+1. **Use notification-only mode** for large results - fetch data separately to avoid large webhook payloads
+2. **Set custom headers** for webhook authentication and request tracking
+3. **Configure global default webhook** for consistent handling across all jobs
+4. **Implement idempotent webhook handlers** - same webhook may be delivered multiple times on retry
+5. **Use structured schemas** with LLM extraction for predictable webhook data
+
+## 🎬 Demo
+
+Try the release demo:
+
+```bash
+python docs/releases_review/demo_v0.7.6.py
+```
+
+This comprehensive demo showcases:
+- Crawl job webhooks (notification-only and with data)
+- LLM extraction webhooks (with JSON schema support)
+- Custom headers for authentication
+- Webhook retry mechanism
+- Real-time webhook receiver
+
+## 🙏 Acknowledgments
+
+Thank you to the community for the feedback that shaped this feature! Special thanks to everyone who requested webhook support for asynchronous job processing.
+
+## 📞 Support
+
+- **Documentation**: https://docs.crawl4ai.com
+- **GitHub Issues**: https://github.com/unclecode/crawl4ai/issues
+- **Discord**: https://discord.gg/crawl4ai
+
+---
+
+**Happy crawling with webhooks!** 🕷️🪝
+
+*- unclecode*
--- a/docs/md_v2/core/docker-deployment.md
+++ b/docs/md_v2/core/docker-deployment.md
@@ -65,13 +65,13 @@ Pull and run images directly from Docker Hub without building locally.

 #### 1. Pull the Image

-Our latest release is `0.7.3`. Images are built with multi-arch manifests, so Docker automatically pulls the correct version for your system.
+Our latest release is `0.7.6`. Images are built with multi-arch manifests, so Docker automatically pulls the correct version for your system.

-> 💡 **Note**: The `latest` tag points to the stable `0.7.3` version.
+> 💡 **Note**: The `latest` tag points to the stable `0.7.6` version.

 ```bash
 # Pull the latest version
-docker pull unclecode/crawl4ai:0.7.3
+docker pull unclecode/crawl4ai:0.7.6

 # Or pull using the latest tag
 docker pull unclecode/crawl4ai:latest
@@ -143,7 +143,7 @@ docker stop crawl4ai && docker rm crawl4ai
 #### Docker Hub Versioning Explained

 *   **Image Name:** `unclecode/crawl4ai`
-*   **Tag Format:** `LIBRARY_VERSION[-SUFFIX]` (e.g., `0.7.3`)
+*   **Tag Format:** `LIBRARY_VERSION[-SUFFIX]` (e.g., `0.7.6`)
    *   `LIBRARY_VERSION`: The semantic version of the core `crawl4ai` Python library
    *   `SUFFIX`: Optional tag for release candidates (``) and revisions (`r1`)
 *   **`latest` Tag:** Points to the most recent stable version