crawl4ai/tests/WEBHOOK_TEST_README.md

# Webhook Feature Test Script

This directory contains a comprehensive test script for the webhook feature implementation.

## Overview

The `test_webhook_feature.sh` script automates the entire process of testing the webhook feature:

1. ✅ Fetches and switches to the webhook feature branch
2. ✅ Activates the virtual environment
3. ✅ Installs all required dependencies
4. ✅ Starts Redis server in background
5. ✅ Starts Crawl4AI server in background
6. ✅ Runs webhook integration test
7. ✅ Verifies job completion via webhook
8. ✅ Cleans up and returns to original branch

## Prerequisites

- Python 3.10+
- Virtual environment already created (`venv/` in project root)
- Git repository with the webhook feature branch
- `redis-server` (script will attempt to install if missing)
- `curl` and `lsof` commands available

## Usage

### Quick Start

From the project root:

```bash
./tests/test_webhook_feature.sh
```

Or from the tests directory:

```bash
cd tests
./test_webhook_feature.sh
```

### What the Script Does

#### Step 1: Branch Management
- Saves your current branch
- Fetches the webhook feature branch from remote
- Switches to the webhook feature branch

#### Step 2: Environment Setup
- Activates your existing virtual environment
- Installs dependencies from `deploy/docker/requirements.txt`
- Installs Flask for the webhook receiver

#### Step 3: Service Startup
- Starts Redis server on port 6379
- Starts Crawl4AI server on port 11235
- Waits for server health check to pass

#### Step 4: Webhook Test
- Creates a webhook receiver on port 8080
- Submits a crawl job for `https://example.com` with webhook config
- Waits for webhook notification (60s timeout)
- Verifies webhook payload contains expected data

#### Step 5: Cleanup
- Stops webhook receiver
- Stops Crawl4AI server
- Stops Redis server
- Returns to your original branch

## Expected Output

```
[INFO] Starting webhook feature test script
[INFO] Project root: /path/to/crawl4ai
[INFO] Step 1: Fetching PR branch...
[INFO] Current branch: develop
[SUCCESS] Branch fetched
[INFO] Step 2: Switching to branch: claude/implement-webhook-crawl-feature-011CULZY1Jy8N5MUkZqXkRVp
[SUCCESS] Switched to webhook feature branch
[INFO] Step 3: Activating virtual environment...
[SUCCESS] Virtual environment activated
[INFO] Step 4: Installing server dependencies...
[SUCCESS] Dependencies installed
[INFO] Step 5a: Starting Redis...
[SUCCESS] Redis started (PID: 12345)
[INFO] Step 5b: Starting server on port 11235...
[INFO] Server started (PID: 12346)
[INFO] Waiting for server to be ready...
[SUCCESS] Server is ready!
[INFO] Step 6: Creating webhook test script...
[INFO] Running webhook test...

🚀 Submitting crawl job with webhook...
✅ Job submitted successfully, task_id: crawl_abc123
⏳ Waiting for webhook notification...

✅ Webhook received: {
  "task_id": "crawl_abc123",
  "task_type": "crawl",
  "status": "completed",
  "timestamp": "2025-10-22T00:00:00.000000+00:00",
  "urls": ["https://example.com"],
  "data": { ... }
}

✅ Webhook received!
   Task ID: crawl_abc123
   Status: completed
   URLs: ['https://example.com']
   ✅ Data included in webhook payload
   📄 Crawled 1 URL(s)
      - https://example.com: 1234 chars

🎉 Webhook test PASSED!

[INFO] Step 7: Verifying test results...
[SUCCESS] ✅ Webhook test PASSED!
[SUCCESS] All tests completed successfully! 🎉
[INFO] Cleanup will happen automatically...
[INFO] Starting cleanup...
[INFO] Stopping webhook receiver...
[INFO] Stopping server...
[INFO] Stopping Redis...
[INFO] Switching back to branch: develop
[SUCCESS] Cleanup complete
```

## Troubleshooting

### Server Failed to Start

If the server fails to start, check the logs:

```bash
tail -100 /tmp/crawl4ai_server.log
```

Common issues:
- Port 11235 already in use: `lsof -ti:11235 | xargs kill -9`
- Missing dependencies: Check that all packages are installed

### Redis Connection Failed

Check if Redis is running:

```bash
redis-cli ping
# Should return: PONG
```

If not running:

```bash
redis-server --port 6379 --daemonize yes
```

### Webhook Not Received

The script has a 60-second timeout for webhook delivery. If the webhook isn't received:

1. Check server logs: `/tmp/crawl4ai_server.log`
2. Verify webhook receiver is running on port 8080
3. Check network connectivity between components

### Script Interruption

If the script is interrupted (Ctrl+C), cleanup happens automatically via trap. The script will:
- Kill all background processes
- Stop Redis
- Return to your original branch

To manually cleanup if needed:

```bash
# Kill processes by port
lsof -ti:11235 | xargs kill -9  # Server
lsof -ti:8080 | xargs kill -9   # Webhook receiver
lsof -ti:6379 | xargs kill -9   # Redis

# Return to your branch
git checkout develop  # or your branch name
```

## Testing Different URLs

To test with a different URL, modify the script or create a custom test:

```python
payload = {
    "urls": ["https://your-url-here.com"],
    "browser_config": {"headless": True},
    "crawler_config": {"cache_mode": "bypass"},
    "webhook_config": {
        "webhook_url": "http://localhost:8080/webhook",
        "webhook_data_in_payload": True
    }
}
```

## Files Generated

The script creates temporary files:

- `/tmp/crawl4ai_server.log` - Server output logs
- `/tmp/test_webhook.py` - Webhook test Python script

These are not cleaned up automatically so you can review them after the test.

## Exit Codes

- `0` - All tests passed successfully
- `1` - Test failed (check output for details)

## Safety Features

- ✅ Automatic cleanup on exit, interrupt, or error
- ✅ Returns to original branch on completion
- ✅ Kills all background processes
- ✅ Comprehensive error handling
- ✅ Colored output for easy reading
- ✅ Detailed logging at each step

## Notes

- The script uses `set -e` to exit on any command failure
- All background processes are tracked and cleaned up
- The virtual environment must exist before running
- Redis must be available (installed or installable via apt-get/brew)

## Integration with CI/CD

This script can be integrated into CI/CD pipelines:

```yaml
# Example GitHub Actions
- name: Test Webhook Feature
  run: |
    chmod +x tests/test_webhook_feature.sh
    ./tests/test_webhook_feature.sh
```

## Support

If you encounter issues:

1. Check the troubleshooting section above
2. Review server logs at `/tmp/crawl4ai_server.log`
3. Ensure all prerequisites are met
4. Open an issue with the full output of the script