Merge branch 'develop' into feature/docker-cluster
This commit is contained in:
251
tests/WEBHOOK_TEST_README.md
Normal file
251
tests/WEBHOOK_TEST_README.md
Normal file
@@ -0,0 +1,251 @@
|
||||
# Webhook Feature Test Script
|
||||
|
||||
This directory contains a comprehensive test script for the webhook feature implementation.
|
||||
|
||||
## Overview
|
||||
|
||||
The `test_webhook_feature.sh` script automates the entire process of testing the webhook feature:
|
||||
|
||||
1. ✅ Fetches and switches to the webhook feature branch
|
||||
2. ✅ Activates the virtual environment
|
||||
3. ✅ Installs all required dependencies
|
||||
4. ✅ Starts Redis server in background
|
||||
5. ✅ Starts Crawl4AI server in background
|
||||
6. ✅ Runs webhook integration test
|
||||
7. ✅ Verifies job completion via webhook
|
||||
8. ✅ Cleans up and returns to original branch
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.10+
|
||||
- Virtual environment already created (`venv/` in project root)
|
||||
- Git repository with the webhook feature branch
|
||||
- `redis-server` (script will attempt to install if missing)
|
||||
- `curl` and `lsof` commands available
|
||||
|
||||
## Usage
|
||||
|
||||
### Quick Start
|
||||
|
||||
From the project root:
|
||||
|
||||
```bash
|
||||
./tests/test_webhook_feature.sh
|
||||
```
|
||||
|
||||
Or from the tests directory:
|
||||
|
||||
```bash
|
||||
cd tests
|
||||
./test_webhook_feature.sh
|
||||
```
|
||||
|
||||
### What the Script Does
|
||||
|
||||
#### Step 1: Branch Management
|
||||
- Saves your current branch
|
||||
- Fetches the webhook feature branch from remote
|
||||
- Switches to the webhook feature branch
|
||||
|
||||
#### Step 2: Environment Setup
|
||||
- Activates your existing virtual environment
|
||||
- Installs dependencies from `deploy/docker/requirements.txt`
|
||||
- Installs Flask for the webhook receiver
|
||||
|
||||
#### Step 3: Service Startup
|
||||
- Starts Redis server on port 6379
|
||||
- Starts Crawl4AI server on port 11235
|
||||
- Waits for server health check to pass
|
||||
|
||||
#### Step 4: Webhook Test
|
||||
- Creates a webhook receiver on port 8080
|
||||
- Submits a crawl job for `https://example.com` with webhook config
|
||||
- Waits for webhook notification (60s timeout)
|
||||
- Verifies webhook payload contains expected data
|
||||
|
||||
#### Step 5: Cleanup
|
||||
- Stops webhook receiver
|
||||
- Stops Crawl4AI server
|
||||
- Stops Redis server
|
||||
- Returns to your original branch
|
||||
|
||||
## Expected Output
|
||||
|
||||
```
|
||||
[INFO] Starting webhook feature test script
|
||||
[INFO] Project root: /path/to/crawl4ai
|
||||
[INFO] Step 1: Fetching PR branch...
|
||||
[INFO] Current branch: develop
|
||||
[SUCCESS] Branch fetched
|
||||
[INFO] Step 2: Switching to branch: claude/implement-webhook-crawl-feature-011CULZY1Jy8N5MUkZqXkRVp
|
||||
[SUCCESS] Switched to webhook feature branch
|
||||
[INFO] Step 3: Activating virtual environment...
|
||||
[SUCCESS] Virtual environment activated
|
||||
[INFO] Step 4: Installing server dependencies...
|
||||
[SUCCESS] Dependencies installed
|
||||
[INFO] Step 5a: Starting Redis...
|
||||
[SUCCESS] Redis started (PID: 12345)
|
||||
[INFO] Step 5b: Starting server on port 11235...
|
||||
[INFO] Server started (PID: 12346)
|
||||
[INFO] Waiting for server to be ready...
|
||||
[SUCCESS] Server is ready!
|
||||
[INFO] Step 6: Creating webhook test script...
|
||||
[INFO] Running webhook test...
|
||||
|
||||
🚀 Submitting crawl job with webhook...
|
||||
✅ Job submitted successfully, task_id: crawl_abc123
|
||||
⏳ Waiting for webhook notification...
|
||||
|
||||
✅ Webhook received: {
|
||||
"task_id": "crawl_abc123",
|
||||
"task_type": "crawl",
|
||||
"status": "completed",
|
||||
"timestamp": "2025-10-22T00:00:00.000000+00:00",
|
||||
"urls": ["https://example.com"],
|
||||
"data": { ... }
|
||||
}
|
||||
|
||||
✅ Webhook received!
|
||||
Task ID: crawl_abc123
|
||||
Status: completed
|
||||
URLs: ['https://example.com']
|
||||
✅ Data included in webhook payload
|
||||
📄 Crawled 1 URL(s)
|
||||
- https://example.com: 1234 chars
|
||||
|
||||
🎉 Webhook test PASSED!
|
||||
|
||||
[INFO] Step 7: Verifying test results...
|
||||
[SUCCESS] ✅ Webhook test PASSED!
|
||||
[SUCCESS] All tests completed successfully! 🎉
|
||||
[INFO] Cleanup will happen automatically...
|
||||
[INFO] Starting cleanup...
|
||||
[INFO] Stopping webhook receiver...
|
||||
[INFO] Stopping server...
|
||||
[INFO] Stopping Redis...
|
||||
[INFO] Switching back to branch: develop
|
||||
[SUCCESS] Cleanup complete
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Server Failed to Start
|
||||
|
||||
If the server fails to start, check the logs:
|
||||
|
||||
```bash
|
||||
tail -100 /tmp/crawl4ai_server.log
|
||||
```
|
||||
|
||||
Common issues:
|
||||
- Port 11235 already in use: `lsof -ti:11235 | xargs kill -9`
|
||||
- Missing dependencies: Check that all packages are installed
|
||||
|
||||
### Redis Connection Failed
|
||||
|
||||
Check if Redis is running:
|
||||
|
||||
```bash
|
||||
redis-cli ping
|
||||
# Should return: PONG
|
||||
```
|
||||
|
||||
If not running:
|
||||
|
||||
```bash
|
||||
redis-server --port 6379 --daemonize yes
|
||||
```
|
||||
|
||||
### Webhook Not Received
|
||||
|
||||
The script has a 60-second timeout for webhook delivery. If the webhook isn't received:
|
||||
|
||||
1. Check server logs: `/tmp/crawl4ai_server.log`
|
||||
2. Verify webhook receiver is running on port 8080
|
||||
3. Check network connectivity between components
|
||||
|
||||
### Script Interruption
|
||||
|
||||
If the script is interrupted (Ctrl+C), cleanup happens automatically via trap. The script will:
|
||||
- Kill all background processes
|
||||
- Stop Redis
|
||||
- Return to your original branch
|
||||
|
||||
To manually cleanup if needed:
|
||||
|
||||
```bash
|
||||
# Kill processes by port
|
||||
lsof -ti:11235 | xargs kill -9 # Server
|
||||
lsof -ti:8080 | xargs kill -9 # Webhook receiver
|
||||
lsof -ti:6379 | xargs kill -9 # Redis
|
||||
|
||||
# Return to your branch
|
||||
git checkout develop # or your branch name
|
||||
```
|
||||
|
||||
## Testing Different URLs
|
||||
|
||||
To test with a different URL, modify the script or create a custom test:
|
||||
|
||||
```python
|
||||
payload = {
|
||||
"urls": ["https://your-url-here.com"],
|
||||
"browser_config": {"headless": True},
|
||||
"crawler_config": {"cache_mode": "bypass"},
|
||||
"webhook_config": {
|
||||
"webhook_url": "http://localhost:8080/webhook",
|
||||
"webhook_data_in_payload": True
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Files Generated
|
||||
|
||||
The script creates temporary files:
|
||||
|
||||
- `/tmp/crawl4ai_server.log` - Server output logs
|
||||
- `/tmp/test_webhook.py` - Webhook test Python script
|
||||
|
||||
These are not cleaned up automatically so you can review them after the test.
|
||||
|
||||
## Exit Codes
|
||||
|
||||
- `0` - All tests passed successfully
|
||||
- `1` - Test failed (check output for details)
|
||||
|
||||
## Safety Features
|
||||
|
||||
- ✅ Automatic cleanup on exit, interrupt, or error
|
||||
- ✅ Returns to original branch on completion
|
||||
- ✅ Kills all background processes
|
||||
- ✅ Comprehensive error handling
|
||||
- ✅ Colored output for easy reading
|
||||
- ✅ Detailed logging at each step
|
||||
|
||||
## Notes
|
||||
|
||||
- The script uses `set -e` to exit on any command failure
|
||||
- All background processes are tracked and cleaned up
|
||||
- The virtual environment must exist before running
|
||||
- Redis must be available (installed or installable via apt-get/brew)
|
||||
|
||||
## Integration with CI/CD
|
||||
|
||||
This script can be integrated into CI/CD pipelines:
|
||||
|
||||
```yaml
|
||||
# Example GitHub Actions
|
||||
- name: Test Webhook Feature
|
||||
run: |
|
||||
chmod +x tests/test_webhook_feature.sh
|
||||
./tests/test_webhook_feature.sh
|
||||
```
|
||||
|
||||
## Support
|
||||
|
||||
If you encounter issues:
|
||||
|
||||
1. Check the troubleshooting section above
|
||||
2. Review server logs at `/tmp/crawl4ai_server.log`
|
||||
3. Ensure all prerequisites are met
|
||||
4. Open an issue with the full output of the script
|
||||
193
tests/docker/test_hooks_utility.py
Normal file
193
tests/docker/test_hooks_utility.py
Normal file
@@ -0,0 +1,193 @@
|
||||
"""
|
||||
Test script demonstrating the hooks_to_string utility and Docker client integration.
|
||||
"""
|
||||
import asyncio
|
||||
from crawl4ai import Crawl4aiDockerClient, hooks_to_string
|
||||
|
||||
|
||||
# Define hook functions as regular Python functions
|
||||
async def auth_hook(page, context, **kwargs):
|
||||
"""Add authentication cookies."""
|
||||
await context.add_cookies([{
|
||||
'name': 'test_cookie',
|
||||
'value': 'test_value',
|
||||
'domain': '.httpbin.org',
|
||||
'path': '/'
|
||||
}])
|
||||
return page
|
||||
|
||||
|
||||
async def scroll_hook(page, context, **kwargs):
|
||||
"""Scroll to load lazy content."""
|
||||
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
|
||||
await page.wait_for_timeout(1000)
|
||||
return page
|
||||
|
||||
|
||||
async def viewport_hook(page, context, **kwargs):
|
||||
"""Set custom viewport."""
|
||||
await page.set_viewport_size({"width": 1920, "height": 1080})
|
||||
return page
|
||||
|
||||
|
||||
async def test_hooks_utility():
|
||||
"""Test the hooks_to_string utility function."""
|
||||
print("=" * 60)
|
||||
print("Testing hooks_to_string utility")
|
||||
print("=" * 60)
|
||||
|
||||
# Create hooks dictionary with function objects
|
||||
hooks_dict = {
|
||||
"on_page_context_created": auth_hook,
|
||||
"before_retrieve_html": scroll_hook
|
||||
}
|
||||
|
||||
# Convert to string format
|
||||
hooks_string = hooks_to_string(hooks_dict)
|
||||
|
||||
print("\n✓ Successfully converted function objects to strings")
|
||||
print(f"\n✓ Converted {len(hooks_string)} hooks:")
|
||||
for hook_name in hooks_string.keys():
|
||||
print(f" - {hook_name}")
|
||||
|
||||
print("\n✓ Preview of converted hook:")
|
||||
print("-" * 60)
|
||||
print(hooks_string["on_page_context_created"][:200] + "...")
|
||||
print("-" * 60)
|
||||
|
||||
return hooks_string
|
||||
|
||||
|
||||
async def test_docker_client_with_functions():
|
||||
"""Test Docker client with function objects (automatic conversion)."""
|
||||
print("\n" + "=" * 60)
|
||||
print("Testing Docker Client with Function Objects")
|
||||
print("=" * 60)
|
||||
|
||||
# Note: This requires a running Crawl4AI Docker server
|
||||
# Uncomment the following to test with actual server:
|
||||
|
||||
async with Crawl4aiDockerClient(base_url="http://localhost:11234", verbose=True) as client:
|
||||
# Pass function objects directly - they'll be converted automatically
|
||||
result = await client.crawl(
|
||||
["https://httpbin.org/html"],
|
||||
hooks={
|
||||
"on_page_context_created": auth_hook,
|
||||
"before_retrieve_html": scroll_hook
|
||||
},
|
||||
hooks_timeout=30
|
||||
)
|
||||
print(f"\n✓ Crawl successful: {result.success}")
|
||||
print(f"✓ URL: {result.url}")
|
||||
|
||||
print("\n✓ Docker client accepts function objects directly")
|
||||
print("✓ Automatic conversion happens internally")
|
||||
print("✓ No manual string formatting needed!")
|
||||
|
||||
|
||||
async def test_docker_client_with_strings():
|
||||
"""Test Docker client with pre-converted strings."""
|
||||
print("\n" + "=" * 60)
|
||||
print("Testing Docker Client with String Hooks")
|
||||
print("=" * 60)
|
||||
|
||||
# Convert hooks to strings first
|
||||
hooks_dict = {
|
||||
"on_page_context_created": viewport_hook,
|
||||
"before_retrieve_html": scroll_hook
|
||||
}
|
||||
hooks_string = hooks_to_string(hooks_dict)
|
||||
|
||||
# Note: This requires a running Crawl4AI Docker server
|
||||
# Uncomment the following to test with actual server:
|
||||
|
||||
async with Crawl4aiDockerClient(base_url="http://localhost:11234", verbose=True) as client:
|
||||
# Pass string hooks - they'll be used as-is
|
||||
result = await client.crawl(
|
||||
["https://httpbin.org/html"],
|
||||
hooks=hooks_string,
|
||||
hooks_timeout=30
|
||||
)
|
||||
print(f"\n✓ Crawl successful: {result.success}")
|
||||
|
||||
print("\n✓ Docker client also accepts pre-converted strings")
|
||||
print("✓ Backward compatible with existing code")
|
||||
|
||||
|
||||
async def show_usage_patterns():
|
||||
"""Show different usage patterns."""
|
||||
print("\n" + "=" * 60)
|
||||
print("Usage Patterns")
|
||||
print("=" * 60)
|
||||
|
||||
print("\n1. Direct function usage (simplest):")
|
||||
print("-" * 60)
|
||||
print("""
|
||||
async def my_hook(page, context, **kwargs):
|
||||
await page.set_viewport_size({"width": 1920, "height": 1080})
|
||||
return page
|
||||
|
||||
result = await client.crawl(
|
||||
["https://example.com"],
|
||||
hooks={"on_page_context_created": my_hook}
|
||||
)
|
||||
""")
|
||||
|
||||
print("\n2. Convert then use:")
|
||||
print("-" * 60)
|
||||
print("""
|
||||
hooks_dict = {"on_page_context_created": my_hook}
|
||||
hooks_string = hooks_to_string(hooks_dict)
|
||||
|
||||
result = await client.crawl(
|
||||
["https://example.com"],
|
||||
hooks=hooks_string
|
||||
)
|
||||
""")
|
||||
|
||||
print("\n3. Manual string (backward compatible):")
|
||||
print("-" * 60)
|
||||
print("""
|
||||
hooks_string = {
|
||||
"on_page_context_created": '''
|
||||
async def hook(page, context, **kwargs):
|
||||
await page.set_viewport_size({"width": 1920, "height": 1080})
|
||||
return page
|
||||
'''
|
||||
}
|
||||
|
||||
result = await client.crawl(
|
||||
["https://example.com"],
|
||||
hooks=hooks_string
|
||||
)
|
||||
""")
|
||||
|
||||
|
||||
async def main():
|
||||
"""Run all tests."""
|
||||
print("\n🚀 Crawl4AI Hooks Utility Test Suite\n")
|
||||
|
||||
# Test the utility function
|
||||
# await test_hooks_utility()
|
||||
|
||||
# Show usage with Docker client
|
||||
# await test_docker_client_with_functions()
|
||||
await test_docker_client_with_strings()
|
||||
|
||||
# Show different patterns
|
||||
# await show_usage_patterns()
|
||||
|
||||
# print("\n" + "=" * 60)
|
||||
# print("✓ All tests completed successfully!")
|
||||
# print("=" * 60)
|
||||
# print("\nKey Benefits:")
|
||||
# print(" • Write hooks as regular Python functions")
|
||||
# print(" • IDE support with autocomplete and type checking")
|
||||
# print(" • Automatic conversion to API format")
|
||||
# print(" • Backward compatible with string hooks")
|
||||
# print(" • Same utility used everywhere")
|
||||
# print("\n")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
305
tests/test_webhook_feature.sh
Executable file
305
tests/test_webhook_feature.sh
Executable file
@@ -0,0 +1,305 @@
|
||||
#!/bin/bash
|
||||
|
||||
#############################################################################
|
||||
# Webhook Feature Test Script
|
||||
#
|
||||
# This script tests the webhook feature implementation by:
|
||||
# 1. Switching to the webhook feature branch
|
||||
# 2. Installing dependencies
|
||||
# 3. Starting the server
|
||||
# 4. Running webhook tests
|
||||
# 5. Cleaning up and returning to original branch
|
||||
#
|
||||
# Usage: ./test_webhook_feature.sh
|
||||
#############################################################################
|
||||
|
||||
set -e # Exit on error
|
||||
|
||||
# Colors for output
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
# Configuration
|
||||
BRANCH_NAME="claude/implement-webhook-crawl-feature-011CULZY1Jy8N5MUkZqXkRVp"
|
||||
VENV_PATH="venv"
|
||||
SERVER_PORT=11235
|
||||
WEBHOOK_PORT=8080
|
||||
PROJECT_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
|
||||
|
||||
# PID files for cleanup
|
||||
REDIS_PID=""
|
||||
SERVER_PID=""
|
||||
WEBHOOK_PID=""
|
||||
|
||||
#############################################################################
|
||||
# Utility Functions
|
||||
#############################################################################
|
||||
|
||||
log_info() {
|
||||
echo -e "${BLUE}[INFO]${NC} $1"
|
||||
}
|
||||
|
||||
log_success() {
|
||||
echo -e "${GREEN}[SUCCESS]${NC} $1"
|
||||
}
|
||||
|
||||
log_warning() {
|
||||
echo -e "${YELLOW}[WARNING]${NC} $1"
|
||||
}
|
||||
|
||||
log_error() {
|
||||
echo -e "${RED}[ERROR]${NC} $1"
|
||||
}
|
||||
|
||||
cleanup() {
|
||||
log_info "Starting cleanup..."
|
||||
|
||||
# Kill webhook receiver if running
|
||||
if [ ! -z "$WEBHOOK_PID" ] && kill -0 $WEBHOOK_PID 2>/dev/null; then
|
||||
log_info "Stopping webhook receiver (PID: $WEBHOOK_PID)..."
|
||||
kill $WEBHOOK_PID 2>/dev/null || true
|
||||
fi
|
||||
|
||||
# Kill server if running
|
||||
if [ ! -z "$SERVER_PID" ] && kill -0 $SERVER_PID 2>/dev/null; then
|
||||
log_info "Stopping server (PID: $SERVER_PID)..."
|
||||
kill $SERVER_PID 2>/dev/null || true
|
||||
fi
|
||||
|
||||
# Kill Redis if running
|
||||
if [ ! -z "$REDIS_PID" ] && kill -0 $REDIS_PID 2>/dev/null; then
|
||||
log_info "Stopping Redis (PID: $REDIS_PID)..."
|
||||
kill $REDIS_PID 2>/dev/null || true
|
||||
fi
|
||||
|
||||
# Also kill by port if PIDs didn't work
|
||||
lsof -ti:$SERVER_PORT | xargs kill -9 2>/dev/null || true
|
||||
lsof -ti:$WEBHOOK_PORT | xargs kill -9 2>/dev/null || true
|
||||
lsof -ti:6379 | xargs kill -9 2>/dev/null || true
|
||||
|
||||
# Return to original branch
|
||||
if [ ! -z "$ORIGINAL_BRANCH" ]; then
|
||||
log_info "Switching back to branch: $ORIGINAL_BRANCH"
|
||||
git checkout $ORIGINAL_BRANCH 2>/dev/null || true
|
||||
fi
|
||||
|
||||
log_success "Cleanup complete"
|
||||
}
|
||||
|
||||
# Set trap to cleanup on exit
|
||||
trap cleanup EXIT INT TERM
|
||||
|
||||
#############################################################################
|
||||
# Main Script
|
||||
#############################################################################
|
||||
|
||||
log_info "Starting webhook feature test script"
|
||||
log_info "Project root: $PROJECT_ROOT"
|
||||
|
||||
cd "$PROJECT_ROOT"
|
||||
|
||||
# Step 1: Save current branch and fetch PR
|
||||
log_info "Step 1: Fetching PR branch..."
|
||||
ORIGINAL_BRANCH=$(git rev-parse --abbrev-ref HEAD)
|
||||
log_info "Current branch: $ORIGINAL_BRANCH"
|
||||
|
||||
git fetch origin $BRANCH_NAME
|
||||
log_success "Branch fetched"
|
||||
|
||||
# Step 2: Switch to new branch
|
||||
log_info "Step 2: Switching to branch: $BRANCH_NAME"
|
||||
git checkout $BRANCH_NAME
|
||||
log_success "Switched to webhook feature branch"
|
||||
|
||||
# Step 3: Activate virtual environment
|
||||
log_info "Step 3: Activating virtual environment..."
|
||||
if [ ! -d "$VENV_PATH" ]; then
|
||||
log_error "Virtual environment not found at $VENV_PATH"
|
||||
log_info "Creating virtual environment..."
|
||||
python3 -m venv $VENV_PATH
|
||||
fi
|
||||
|
||||
source $VENV_PATH/bin/activate
|
||||
log_success "Virtual environment activated: $(which python)"
|
||||
|
||||
# Step 4: Install server dependencies
|
||||
log_info "Step 4: Installing server dependencies..."
|
||||
pip install -q -r deploy/docker/requirements.txt
|
||||
log_success "Dependencies installed"
|
||||
|
||||
# Check if Redis is available
|
||||
log_info "Checking Redis availability..."
|
||||
if ! command -v redis-server &> /dev/null; then
|
||||
log_warning "Redis not found, attempting to install..."
|
||||
if command -v apt-get &> /dev/null; then
|
||||
sudo apt-get update && sudo apt-get install -y redis-server
|
||||
elif command -v brew &> /dev/null; then
|
||||
brew install redis
|
||||
else
|
||||
log_error "Cannot install Redis automatically. Please install Redis manually."
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
# Step 5: Start Redis in background
|
||||
log_info "Step 5a: Starting Redis..."
|
||||
redis-server --port 6379 --daemonize yes
|
||||
sleep 2
|
||||
REDIS_PID=$(pgrep redis-server)
|
||||
log_success "Redis started (PID: $REDIS_PID)"
|
||||
|
||||
# Step 5b: Start server in background
|
||||
log_info "Step 5b: Starting server on port $SERVER_PORT..."
|
||||
cd deploy/docker
|
||||
|
||||
# Start server in background
|
||||
python3 -m uvicorn server:app --host 0.0.0.0 --port $SERVER_PORT > /tmp/crawl4ai_server.log 2>&1 &
|
||||
SERVER_PID=$!
|
||||
cd "$PROJECT_ROOT"
|
||||
|
||||
log_info "Server started (PID: $SERVER_PID)"
|
||||
|
||||
# Wait for server to be ready
|
||||
log_info "Waiting for server to be ready..."
|
||||
for i in {1..30}; do
|
||||
if curl -s http://localhost:$SERVER_PORT/health > /dev/null 2>&1; then
|
||||
log_success "Server is ready!"
|
||||
break
|
||||
fi
|
||||
if [ $i -eq 30 ]; then
|
||||
log_error "Server failed to start within 30 seconds"
|
||||
log_info "Server logs:"
|
||||
tail -50 /tmp/crawl4ai_server.log
|
||||
exit 1
|
||||
fi
|
||||
echo -n "."
|
||||
sleep 1
|
||||
done
|
||||
echo ""
|
||||
|
||||
# Step 6: Create and run webhook test
|
||||
log_info "Step 6: Creating webhook test script..."
|
||||
|
||||
cat > /tmp/test_webhook.py << 'PYTHON_SCRIPT'
|
||||
import requests
|
||||
import json
|
||||
import time
|
||||
from flask import Flask, request, jsonify
|
||||
from threading import Thread, Event
|
||||
|
||||
# Configuration
|
||||
CRAWL4AI_BASE_URL = "http://localhost:11235"
|
||||
WEBHOOK_BASE_URL = "http://localhost:8080"
|
||||
|
||||
# Flask app for webhook receiver
|
||||
app = Flask(__name__)
|
||||
webhook_received = Event()
|
||||
webhook_data = {}
|
||||
|
||||
@app.route('/webhook', methods=['POST'])
|
||||
def handle_webhook():
|
||||
global webhook_data
|
||||
webhook_data = request.json
|
||||
webhook_received.set()
|
||||
print(f"\n✅ Webhook received: {json.dumps(webhook_data, indent=2)}")
|
||||
return jsonify({"status": "received"}), 200
|
||||
|
||||
def start_webhook_server():
|
||||
app.run(host='0.0.0.0', port=8080, debug=False, use_reloader=False)
|
||||
|
||||
# Start webhook server in background
|
||||
webhook_thread = Thread(target=start_webhook_server, daemon=True)
|
||||
webhook_thread.start()
|
||||
time.sleep(2)
|
||||
|
||||
print("🚀 Submitting crawl job with webhook...")
|
||||
|
||||
# Submit job with webhook
|
||||
payload = {
|
||||
"urls": ["https://example.com"],
|
||||
"browser_config": {"headless": True},
|
||||
"crawler_config": {"cache_mode": "bypass"},
|
||||
"webhook_config": {
|
||||
"webhook_url": f"{WEBHOOK_BASE_URL}/webhook",
|
||||
"webhook_data_in_payload": True
|
||||
}
|
||||
}
|
||||
|
||||
response = requests.post(
|
||||
f"{CRAWL4AI_BASE_URL}/crawl/job",
|
||||
json=payload,
|
||||
headers={"Content-Type": "application/json"}
|
||||
)
|
||||
|
||||
if not response.ok:
|
||||
print(f"❌ Failed to submit job: {response.text}")
|
||||
exit(1)
|
||||
|
||||
task_id = response.json()['task_id']
|
||||
print(f"✅ Job submitted successfully, task_id: {task_id}")
|
||||
|
||||
# Wait for webhook (with timeout)
|
||||
print("⏳ Waiting for webhook notification...")
|
||||
if webhook_received.wait(timeout=60):
|
||||
print(f"✅ Webhook received!")
|
||||
print(f" Task ID: {webhook_data.get('task_id')}")
|
||||
print(f" Status: {webhook_data.get('status')}")
|
||||
print(f" URLs: {webhook_data.get('urls')}")
|
||||
|
||||
if webhook_data.get('status') == 'completed':
|
||||
if 'data' in webhook_data:
|
||||
print(f" ✅ Data included in webhook payload")
|
||||
results = webhook_data['data'].get('results', [])
|
||||
if results:
|
||||
print(f" 📄 Crawled {len(results)} URL(s)")
|
||||
for result in results:
|
||||
print(f" - {result.get('url')}: {len(result.get('markdown', ''))} chars")
|
||||
print("\n🎉 Webhook test PASSED!")
|
||||
exit(0)
|
||||
else:
|
||||
print(f" ❌ Job failed: {webhook_data.get('error')}")
|
||||
exit(1)
|
||||
else:
|
||||
print("❌ Webhook not received within 60 seconds")
|
||||
# Try polling as fallback
|
||||
print("⏳ Trying to poll job status...")
|
||||
for i in range(10):
|
||||
status_response = requests.get(f"{CRAWL4AI_BASE_URL}/crawl/job/{task_id}")
|
||||
if status_response.ok:
|
||||
status = status_response.json()
|
||||
print(f" Status: {status.get('status')}")
|
||||
if status.get('status') in ['completed', 'failed']:
|
||||
break
|
||||
time.sleep(2)
|
||||
exit(1)
|
||||
PYTHON_SCRIPT
|
||||
|
||||
# Install Flask for webhook receiver
|
||||
pip install -q flask
|
||||
|
||||
# Run the webhook test
|
||||
log_info "Running webhook test..."
|
||||
python3 /tmp/test_webhook.py &
|
||||
WEBHOOK_PID=$!
|
||||
|
||||
# Wait for test to complete
|
||||
wait $WEBHOOK_PID
|
||||
TEST_EXIT_CODE=$?
|
||||
|
||||
# Step 7: Verify results
|
||||
log_info "Step 7: Verifying test results..."
|
||||
if [ $TEST_EXIT_CODE -eq 0 ]; then
|
||||
log_success "✅ Webhook test PASSED!"
|
||||
else
|
||||
log_error "❌ Webhook test FAILED (exit code: $TEST_EXIT_CODE)"
|
||||
log_info "Server logs:"
|
||||
tail -100 /tmp/crawl4ai_server.log
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Step 8: Cleanup happens automatically via trap
|
||||
log_success "All tests completed successfully! 🎉"
|
||||
log_info "Cleanup will happen automatically..."
|
||||
Reference in New Issue
Block a user