# Crawl4AI Stress Testing and Benchmarking

This directory contains tools for stress testing Crawl4AI's `arun_many` method and dispatcher system with high volumes of URLs, evaluating performance and concurrency handling and helping detect potential memory issues. It also includes a benchmarking system to track performance over time.

## Quick Start

```bash
# Run a default stress test (small config) and generate a report
# (Assumes run_all.sh is updated to call run_benchmark.py)
./run_all.sh
```

*Note: `run_all.sh` might need to be updated if it directly called the old script.*

## Overview

The stress testing system works by:

1. Generating a local test site with heavy HTML pages (regenerated by default for each test).
2. Starting a local HTTP server to serve these pages.
3. Running Crawl4AI's `arun_many` method against this local site using the `MemoryAdaptiveDispatcher` with configurable concurrency (`max_sessions`).
4. Monitoring performance metrics via the `CrawlerMonitor` and optionally logging memory usage.
5. Optionally generating detailed benchmark reports with visualizations using `benchmark_report.py`.

## Available Tools

- `test_stress_sdk.py` - Main stress testing script utilizing `arun_many` and dispatchers.
- `benchmark_report.py` - Report generator for comparing test results (assumes compatibility with `test_stress_sdk.py` outputs).
- `run_benchmark.py` - Orchestrates tests via `test_stress_sdk.py` using predefined configurations.
- `run_all.sh` - Simple wrapper script (may need updating).

## Usage Guide

### Using Predefined Configurations (Recommended)

The `run_benchmark.py` script offers the easiest way to run standardized tests:

```bash
# Quick test (50 URLs, 4 max sessions)
python run_benchmark.py quick

# Medium test (500 URLs, 16 max sessions)
python run_benchmark.py medium

# Large test (1000 URLs, 32 max sessions)
python run_benchmark.py large

# Extreme test (2000 URLs, 64 max sessions)
python run_benchmark.py extreme

# Custom configuration
python run_benchmark.py custom --urls 300 --max-sessions 24 --chunk-size 50

# Run 'small' test in streaming mode
python run_benchmark.py small --stream

# Override max_sessions for the 'medium' config
python run_benchmark.py medium --max-sessions 20

# Skip benchmark report generation after the test
python run_benchmark.py small --no-report

# Clean up reports and site files before running
python run_benchmark.py medium --clean
```

#### `run_benchmark.py` Parameters

| Parameter | Default | Description |
| --------- | ------- | ----------- |
| `config` | *required* | Test configuration: `quick`, `small`, `medium`, `large`, `extreme`, `custom` |
| `--urls` | config-specific | Number of URLs (required for `custom`) |
| `--max-sessions` | config-specific | Max concurrent sessions managed by the dispatcher (required for `custom`) |
| `--chunk-size` | config-specific | URLs per batch for non-stream logging (required for `custom`) |
| `--stream` | False | Enable streaming results (disables batch logging) |
| `--monitor-mode` | DETAILED | `DETAILED` or `AGGREGATED` display for the live monitor |
| `--use-rate-limiter` | False | Enable basic rate limiter in the dispatcher |
| `--port` | 8000 | HTTP server port |
| `--no-report` | False | Skip generating the comparison report via `benchmark_report.py` |
| `--clean` | False | Clean up reports and site files before running |
| `--keep-server-alive` | False | Keep the local HTTP server running after the test |
| `--use-existing-site` | False | Use an existing site on the specified port (no local server start or site generation) |
| `--skip-generation` | False | Use existing site files but still start the local server |
| `--keep-site` | False | Keep generated site files after the test |

#### Predefined Configurations

| Configuration | URLs | Max Sessions | Chunk Size | Description |
| ------------- | ---- | ------------ | ---------- | ----------- |
| `quick` | 50 | 4 | 10 | Quick test for basic validation |
| `small` | 100 | 8 | 20 | Small test for routine checks |
| `medium` | 500 | 16 | 50 | Medium test for thorough checks |
| `large` | 1000 | 32 | 100 | Large test for stress testing |
| `extreme` | 2000 | 64 | 200 | Extreme test for limit testing |

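Internally, `run_benchmark.py` presumably keeps a mapping much like this table. A minimal sketch of that lookup, with key names assumed rather than taken from the actual script:

```python
# Predefined configurations mirroring the table above.
CONFIGS = {
    "quick":   {"urls": 50,   "max_sessions": 4,  "chunk_size": 10},
    "small":   {"urls": 100,  "max_sessions": 8,  "chunk_size": 20},
    "medium":  {"urls": 500,  "max_sessions": 16, "chunk_size": 50},
    "large":   {"urls": 1000, "max_sessions": 32, "chunk_size": 100},
    "extreme": {"urls": 2000, "max_sessions": 64, "chunk_size": 200},
}

def resolve_config(name, **overrides):
    """Look up a named config; CLI flags such as --max-sessions override its values."""
    if name == "custom":
        missing = {"urls", "max_sessions", "chunk_size"} - overrides.keys()
        if missing:
            raise ValueError(f"custom config requires: {sorted(missing)}")
        return dict(overrides)
    config = dict(CONFIGS[name])
    # Only apply overrides that were actually supplied on the command line.
    config.update({k: v for k, v in overrides.items() if v is not None})
    return config
```

This keeps the `custom` path explicit about its required parameters while letting any predefined config be partially overridden, matching the `medium --max-sessions 20` example above.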
### Direct Usage of `test_stress_sdk.py`

For fine-grained control or debugging, you can run the stress test script directly:

```bash
# Test with 200 URLs and 32 max concurrent sessions
python test_stress_sdk.py --urls 200 --max-sessions 32 --chunk-size 40

# Clean up previous test data first
python test_stress_sdk.py --clean-reports --clean-site --urls 100 --max-sessions 16 --chunk-size 20

# Change the HTTP server port and use the aggregated monitor
python test_stress_sdk.py --port 8088 --urls 100 --max-sessions 16 --monitor-mode AGGREGATED

# Enable streaming mode and rate limiting
python test_stress_sdk.py --urls 50 --max-sessions 8 --stream --use-rate-limiter

# Change the report output location
python test_stress_sdk.py --report-path custom_reports --urls 100 --max-sessions 16
```

#### `test_stress_sdk.py` Parameters

| Parameter | Default | Description |
| --------- | ------- | ----------- |
| `--urls` | 100 | Number of URLs to test |
| `--max-sessions` | 16 | Maximum concurrent crawling sessions managed by the dispatcher |
| `--chunk-size` | 10 | Number of URLs per batch (relevant for non-stream logging) |
| `--stream` | False | Enable streaming results (disables batch logging) |
| `--monitor-mode` | DETAILED | `DETAILED` or `AGGREGATED` display for the live `CrawlerMonitor` |
| `--use-rate-limiter` | False | Enable a basic `RateLimiter` within the dispatcher |
| `--site-path` | "test_site" | Path to store/use the generated test site |
| `--port` | 8000 | Port for the local HTTP server |
| `--report-path` | "reports" | Path to save the test result summary (JSON) and memory samples (CSV) |
| `--skip-generation` | False | Use existing test site files but still start the local server |
| `--use-existing-site` | False | Use an existing site on the specified port (no local server or site generation) |
| `--keep-server-alive` | False | Keep the local HTTP server running after test completion |
| `--keep-site` | False | Keep the generated test site files after test completion |
| `--clean-reports` | False | Clean up the report directory before running |
| `--clean-site` | False | Clean up the site directory before/after running (see script logic) |

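The defaults in the table correspond to an argument parser roughly like the following. This is a sketch inferred from the table, not the script's actual code:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flags and defaults mirror the parameter table above.
    p = argparse.ArgumentParser(description="Stress test arun_many against a local site")
    p.add_argument("--urls", type=int, default=100)
    p.add_argument("--max-sessions", type=int, default=16)
    p.add_argument("--chunk-size", type=int, default=10)
    p.add_argument("--stream", action="store_true")
    p.add_argument("--monitor-mode", choices=["DETAILED", "AGGREGATED"], default="DETAILED")
    p.add_argument("--use-rate-limiter", action="store_true")
    p.add_argument("--site-path", default="test_site")
    p.add_argument("--port", type=int, default=8000)
    p.add_argument("--report-path", default="reports")
    return p
```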
### Generating Reports Only

If you only want to generate a benchmark report from existing test results (assuming `benchmark_report.py` is compatible):

```bash
# Generate a report from existing test results in ./reports/
python benchmark_report.py

# Limit to the 5 most recent test results
python benchmark_report.py --limit 5

# Specify a custom source directory for test results
python benchmark_report.py --reports-dir alternate_results
```

#### `benchmark_report.py` Parameters (Assumed)

| Parameter | Default | Description |
| --------- | ------- | ----------- |
| `--reports-dir` | "reports" | Directory containing `test_stress_sdk.py` result files |
| `--output-dir` | "benchmark_reports" | Directory to save generated HTML reports and charts |
| `--limit` | None (all results) | Limit the comparison to the N most recent test results |
| `--output-file` | Auto-generated | Custom output filename for the HTML report |

## Understanding the Test Output

### Real-time Progress Display (`CrawlerMonitor`)

When running `test_stress_sdk.py`, the `CrawlerMonitor` provides a live view of the crawling process managed by the dispatcher.

- **DETAILED Mode (Default):** Shows individual task status (Queued, Active, Completed, Failed), timings, per-task memory usage, overall queue statistics, and memory pressure status (memory figures require `psutil`).
- **AGGREGATED Mode:** Shows summary counts (Queued, Active, Completed, Failed), overall progress percentage, estimated time remaining, average URLs/sec, and memory pressure status.

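The AGGREGATED numbers reduce to simple arithmetic over the dispatcher's counters. A sketch of that arithmetic (not the monitor's actual code; function and key names are illustrative):

```python
import time

def aggregate_stats(completed, failed, total, start_time, now=None):
    """Derive overall progress, average throughput, and a naive ETA."""
    now = time.monotonic() if now is None else now
    done = completed + failed
    elapsed = max(now - start_time, 1e-9)  # guard against division by zero
    urls_per_sec = done / elapsed
    remaining = total - done
    return {
        "progress_pct": 100.0 * done / total,
        "urls_per_sec": urls_per_sec,
        "eta_seconds": remaining / urls_per_sec if urls_per_sec > 0 else float("inf"),
    }
```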
### Batch Log Output (Non-Streaming Mode Only)

If you run `test_stress_sdk.py` **without** the `--stream` flag, you will *also* see per-batch summary lines printed to the console *after* the monitor display, once each chunk of URLs finishes processing:

```
Batch | Progress | Start Mem | End Mem | URLs/sec | Success/Fail | Time (s) | Status
───────────────────────────────────────────────────────────────────────────────────
  1   |  10.0%   |  50.1 MB  | 55.3 MB |   23.8   |     10/0     |   0.42   | Success
  2   |  20.0%   |  55.3 MB  | 60.1 MB |   24.1   |     10/0     |   0.41   | Success
...
```

This display provides chunk-specific metrics:

- **Batch**: The batch number being reported.
- **Progress**: Overall percentage of total URLs processed *after* this batch.
- **Start Mem / End Mem**: Memory usage before and after processing this batch (if tracked).
- **URLs/sec**: Processing speed *for this specific batch*.
- **Success/Fail**: Number of successful and failed URLs *in this batch*.
- **Time (s)**: Wall-clock time taken to process *this batch*.
- **Status**: Color-coded status for the batch outcome.

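Each row reduces to simple arithmetic over the batch counters. A sketch of how such a line could be computed (the column layout follows the table above; the status labels beyond "Success" are assumed):

```python
def batch_row(batch_num, batch_size, total_urls, done_so_far, succeeded, elapsed_s,
              start_mem_mb=None, end_mem_mb=None):
    """Format one per-batch summary line like the example output above."""
    failed = batch_size - succeeded
    progress = 100.0 * done_so_far / total_urls
    rate = batch_size / elapsed_s if elapsed_s > 0 else 0.0
    fmt_mem = lambda v: f"{v:.1f} MB" if v is not None else "n/a"
    status = "Success" if failed == 0 else ("Partial" if succeeded else "Failed")
    return (f"{batch_num:>5} | {progress:5.1f}% | {fmt_mem(start_mem_mb):>9} | "
            f"{fmt_mem(end_mem_mb):>9} | {rate:8.1f} | {succeeded}/{failed} | "
            f"{elapsed_s:8.2f} | {status}")
```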
### Summary Output

After test completion, a final summary is displayed:

```
================================================================================
Test Completed
================================================================================
Test ID: 20250418_103015
Configuration: 100 URLs, 16 max sessions, Chunk: 10, Stream: False, Monitor: DETAILED
Results: 100 successful, 0 failed (100 processed, 100.0% success)
Performance: 5.85 seconds total, 17.09 URLs/second avg
Memory Usage: Start: 50.1 MB, End: 75.3 MB, Max: 78.1 MB, Growth: 25.2 MB
Results summary saved to reports/test_summary_20250418_103015.json
```

### HTML Report Structure (Generated by `benchmark_report.py`)

The benchmark report contains several sections:

1. **Summary**: Overview of the latest test results and trends
2. **Performance Comparison**: Charts showing throughput across tests
3. **Memory Usage**: Detailed memory usage graphs for each test
4. **Detailed Results**: Tabular data of all test metrics
5. **Conclusion**: Automated analysis of performance and memory patterns

### Memory Metrics

Memory growth is the key metric for detecting leaks...

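Given the time-series samples from `memory_samples_*.csv`, growth and a crude drift check can be computed directly. A sketch (not the report generator's actual analysis):

```python
def memory_growth_stats(samples_mb):
    """Summarize a time-series of memory samples (MB), as in memory_samples_*.csv."""
    valid = [s for s in samples_mb if s is not None]
    if len(valid) < 2:
        return None
    half = len(valid) // 2
    # Crude leak heuristic: does the second half sit notably above the first half?
    drift = sum(valid[half:]) / len(valid[half:]) - sum(valid[:half]) / len(valid[:half])
    return {
        "start_mb": valid[0],
        "end_mb": valid[-1],
        "max_mb": max(valid),
        "growth_mb": valid[-1] - valid[0],
        "half_to_half_drift_mb": drift,
    }
```

Steady upward drift across a long run is a stronger leak signal than a one-off jump in end-minus-start growth.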
### Performance Metrics

Key performance indicators include:

- **URLs per Second**: Higher is better (throughput); focus on overall URLs/sec rather than per-worker figures
- **Success Rate**: Should be 100% under normal conditions
- **Total Processing Time**: Lower is better
- **Dispatcher Efficiency**: Observe queue lengths and wait times in the monitor (DETAILED mode)

### Raw Data Files

Raw data is saved in the `--report-path` directory (default `./reports/`):

- **JSON files** (`test_summary_*.json`): Contains the final summary for each test run.
- **CSV files** (`memory_samples_*.csv`): Contains time-series memory samples taken during the test run.

Example of reading raw data:

```python
import json

import pandas as pd

# Load test summary
test_id = "20250418_103015"  # Example ID
with open(f"reports/test_summary_{test_id}.json", "r") as f:
    results = json.load(f)

# Load memory samples
memory_df = pd.read_csv(f"reports/memory_samples_{test_id}.csv")

# Analyze memory_df (e.g., calculate growth, plot)
if not memory_df["memory_info_mb"].isnull().all():
    growth = memory_df["memory_info_mb"].iloc[-1] - memory_df["memory_info_mb"].iloc[0]
    print(f"Total Memory Growth: {growth:.1f} MB")
else:
    print("No valid memory samples found.")

print(f"Avg URLs/sec: {results['urls_processed'] / results['total_time_seconds']:.2f}")
```

## Visualization Dependencies

For full visualization capabilities in the HTML reports generated by `benchmark_report.py`, install the additional dependencies...

## Directory Structure

```
benchmarking/            # Or your top-level directory name
├── benchmark_reports/   # Generated HTML reports (by benchmark_report.py)
├── reports/             # Raw test result data (from test_stress_sdk.py)
├── test_site/           # Generated test content (temporary)
├── benchmark_report.py  # Report generator
├── run_benchmark.py     # Test runner with predefined configs
├── test_stress_sdk.py   # Main stress test implementation using arun_many
├── run_all.sh           # Simple wrapper script (may need updates)
└── requirements.txt     # Optional: visualization dependencies for benchmark_report.py
```

## Cleanup

To clean up after testing:

```bash
# Remove the test site content (if not using --keep-site)
rm -rf test_site

# Remove all raw reports and generated benchmark reports
rm -rf reports benchmark_reports

# Or use the --clean flag with run_benchmark.py
python run_benchmark.py medium --clean
```

## Use in CI/CD

These tests can be integrated into CI/CD pipelines:

```bash
# Example CI script
python run_benchmark.py medium --no-report  # Run test without interactive report generation

# Check the exit code
if [ $? -ne 0 ]; then echo "Stress test failed!"; exit 1; fi

# Optionally, run the report generator and check its output/metrics
# python benchmark_report.py
# check_report_metrics.py reports/test_summary_*.json || exit 1

exit 0
```

## Troubleshooting

- **HTTP Server Port Conflict**: Use `--port` with `run_benchmark.py` or `test_stress_sdk.py` to choose a free port.
- **Memory Tracking Issues**: The `SimpleMemoryTracker` relies on platform commands (`ps`, `/proc`, `tasklist`). Ensure these are available and that the script has permission to run them; if they consistently fail, memory reporting will be limited.
- **Visualization Missing**: Install the optional dependencies required by `benchmark_report.py` (see "Visualization Dependencies" above).
- **Site Generation Issues**: Check permissions for creating `./test_site/`. Use `--skip-generation` if you want to manage the site manually.
- **Testing Against External Site**: Ensure the external site is running and use `--use-existing-site --port <correct_port>`.
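In the spirit of the `SimpleMemoryTracker` mentioned above, a minimal cross-platform RSS reader built on those same platform commands might look like this (illustrative, not the actual class; the Windows `tasklist` parsing in particular is locale-sensitive):

```python
import os
import subprocess
import sys

def current_rss_mb(pid=None):
    """Resident set size of a process in MB, or None if it cannot be determined."""
    pid = os.getpid() if pid is None else pid
    try:
        if sys.platform.startswith("linux"):
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        return int(line.split()[1]) / 1024.0  # /proc reports kB
        elif sys.platform == "win32":
            # CSV row ends with a field like "12,345 K"; parsing assumes English locale
            out = subprocess.check_output(
                ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"], text=True)
            kb = out.strip().rsplit('"', 2)[1].rstrip(" K").replace(",", "")
            return float(kb) / 1024.0
        else:
            # macOS and other Unixes: ps reports RSS in kB
            out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)], text=True)
            return int(out.strip()) / 1024.0
    except (OSError, ValueError, subprocess.SubprocessError):
        return None
    return None
```

Returning `None` on failure mirrors the "memory reporting will be limited" behavior described above rather than aborting the test.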