feat(tests): implement high volume stress testing framework

Add comprehensive stress testing solution for SDK using arun_many and dispatcher system:

- Create test_stress_sdk.py for running high volume crawl tests
- Add run_benchmark.py for orchestrating tests with predefined configs
- Implement benchmark_report.py for generating performance reports
- Add memory tracking and local test site generation
- Support both streaming and batch processing modes
- Add detailed documentation in README.md

The framework enables testing SDK performance, concurrency handling, and memory behavior under high-volume scenarios.
315
tests/memory/README.md
Normal file
@@ -0,0 +1,315 @@
# Crawl4AI Stress Testing and Benchmarking

This directory contains tools for stress testing Crawl4AI's `arun_many` method and dispatcher system with high volumes of URLs. The tests evaluate performance and concurrency handling and can help detect memory issues. A benchmarking system is also included to track performance over time.

## Quick Start

```bash
# Run a default stress test (small config) and generate a report
# (Assumes run_all.sh is updated to call run_benchmark.py)
./run_all.sh
```
*Note: `run_all.sh` may need updating if it still calls the old script directly.*

## Overview

The stress testing system works by:

1. Generating a local test site with heavy HTML pages (regenerated by default for each test).
2. Starting a local HTTP server to serve these pages.
3. Running Crawl4AI's `arun_many` method against this local site using the `MemoryAdaptiveDispatcher` with configurable concurrency (`max_sessions`).
4. Monitoring performance metrics via the `CrawlerMonitor` and optionally logging memory usage.
5. Optionally generating detailed benchmark reports with visualizations using `benchmark_report.py`.
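
Steps 1 and 2 can be illustrated with a minimal, standard-library-only sketch. This is not the code from `test_stress_sdk.py`; the function names `generate_site` and `start_server`, the page layout, and the `page_<n>.html` naming are assumptions made for illustration.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def generate_site(site_path: str, num_pages: int, paragraphs: int = 50) -> list[str]:
    """Write num_pages HTML files under site_path and return their file names."""
    root = Path(site_path)
    root.mkdir(parents=True, exist_ok=True)
    # Repeat filler text so that each page has a reasonably heavy body.
    body = "\n".join(
        f"<p>Paragraph {i}: " + "lorem ipsum " * 20 + "</p>" for i in range(paragraphs)
    )
    names = []
    for n in range(num_pages):
        name = f"page_{n}.html"  # hypothetical naming scheme
        html = f"<html><head><title>Page {n}</title></head><body>{body}</body></html>"
        (root / name).write_text(html, encoding="utf-8")
        names.append(name)
    return names


def start_server(site_path: str, port: int = 8000) -> ThreadingHTTPServer:
    """Serve site_path on 127.0.0.1:port from a background daemon thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=site_path)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A caller would build the URL list as `f"http://127.0.0.1:{port}/{name}"` for each returned name, pass it to the crawler, and call `server.shutdown()` followed by `server.server_close()` when the run ends.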

## Available Tools

- `test_stress_sdk.py` - Main stress testing script utilizing `arun_many` and dispatchers.
- `benchmark_report.py` - Report generator for comparing test results (assumes compatibility with `test_stress_sdk.py` outputs).
- `run_benchmark.py` - Python script with predefined test configurations that orchestrates tests using `test_stress_sdk.py`.
- `run_all.sh` - Simple wrapper script (may need updating).

## Usage Guide

### Using Predefined Configurations (Recommended)

The `run_benchmark.py` script offers the easiest way to run standardized tests:

```bash
# Quick test (50 URLs, 4 max sessions)
python run_benchmark.py quick

# Medium test (500 URLs, 16 max sessions)
python run_benchmark.py medium

# Large test (1000 URLs, 32 max sessions)
python run_benchmark.py large

# Extreme test (2000 URLs, 64 max sessions)
python run_benchmark.py extreme

# Custom configuration
python run_benchmark.py custom --urls 300 --max-sessions 24 --chunk-size 50

# Run 'small' test in streaming mode
python run_benchmark.py small --stream

# Override max_sessions for the 'medium' config
python run_benchmark.py medium --max-sessions 20

# Skip benchmark report generation after the test
python run_benchmark.py small --no-report

# Clean up reports and site files before running
python run_benchmark.py medium --clean
```

#### `run_benchmark.py` Parameters

| Parameter             | Default         | Description                                                                  |
| --------------------- | --------------- | ---------------------------------------------------------------------------- |
| `config`              | *required*      | Test configuration: `quick`, `small`, `medium`, `large`, `extreme`, `custom` |
| `--urls`              | config-specific | Number of URLs (required for `custom`)                                       |
| `--max-sessions`      | config-specific | Max concurrent sessions managed by dispatcher (required for `custom`)        |
| `--chunk-size`        | config-specific | URLs per batch for non-stream logging (required for `custom`)                |
| `--stream`            | False           | Enable streaming results (disables batch logging)                            |
| `--monitor-mode`      | DETAILED        | `DETAILED` or `AGGREGATED` display for the live monitor                      |
| `--use-rate-limiter`  | False           | Enable basic rate limiter in the dispatcher                                  |
| `--port`              | 8000            | HTTP server port                                                             |
| `--no-report`         | False           | Skip generating comparison report via `benchmark_report.py`                  |
| `--clean`             | False           | Clean up reports and site files before running                               |
| `--keep-server-alive` | False           | Keep local HTTP server running after test                                    |
| `--use-existing-site` | False           | Use existing site on specified port (no local server start/site gen)         |
| `--skip-generation`   | False           | Use existing site files but start local server                               |
| `--keep-site`         | False           | Keep generated site files after test                                         |

#### Predefined Configurations

| Configuration | URLs | Max Sessions | Chunk Size | Description                     |
| ------------- | ---- | ------------ | ---------- | ------------------------------- |
| `quick`       | 50   | 4            | 10         | Quick test for basic validation |
| `small`       | 100  | 8            | 20         | Small test for routine checks   |
| `medium`      | 500  | 16           | 50         | Medium test for thorough checks |
| `large`       | 1000 | 32           | 100        | Large test for stress testing   |
| `extreme`     | 2000 | 64           | 200        | Extreme test for limit testing  |
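
To make the relationship between URLs and chunk size concrete, the table can be mirrored as a plain dictionary. This is only a sketch: `PRESETS` and `batch_count` are hypothetical names, not identifiers taken from `run_benchmark.py`, and the batch count assumes one log line per chunk in non-streaming mode.

```python
import math

# Values copied from the table above; the dictionary layout is an assumption.
PRESETS = {
    "quick": {"urls": 50, "max_sessions": 4, "chunk_size": 10},
    "small": {"urls": 100, "max_sessions": 8, "chunk_size": 20},
    "medium": {"urls": 500, "max_sessions": 16, "chunk_size": 50},
    "large": {"urls": 1000, "max_sessions": 32, "chunk_size": 100},
    "extreme": {"urls": 2000, "max_sessions": 64, "chunk_size": 200},
}


def batch_count(config_name: str) -> int:
    """Number of batches a preset is split into (last batch may be partial)."""
    cfg = PRESETS[config_name]
    return math.ceil(cfg["urls"] / cfg["chunk_size"])
```

Under these assumptions, `quick` and `small` each produce 5 batches, while `medium`, `large`, and `extreme` each produce 10.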

### Direct Usage of `test_stress_sdk.py`

For fine-grained control or debugging, you can run the stress test script directly:

```bash
# Test with 200 URLs and 32 max concurrent sessions
python test_stress_sdk.py --urls 200 --max-sessions 32 --chunk-size 40

# Clean up previous test data first
python test_stress_sdk.py --clean-reports --clean-site --urls 100 --max-sessions 16 --chunk-size 20

# Change the HTTP server port and use aggregated monitor
python test_stress_sdk.py --port 8088 --urls 100 --max-sessions 16 --monitor-mode AGGREGATED

# Enable streaming mode and use rate limiting
python test_stress_sdk.py --urls 50 --max-sessions 8 --stream --use-rate-limiter

# Change report output location
python test_stress_sdk.py --report-path custom_reports --urls 100 --max-sessions 16
```

#### `test_stress_sdk.py` Parameters

| Parameter             | Default       | Description                                                        |
| --------------------- | ------------- | ------------------------------------------------------------------ |
| `--urls`              | 100           | Number of URLs to test                                             |
| `--max-sessions`      | 16            | Maximum concurrent crawling sessions managed by the dispatcher     |
| `--chunk-size`        | 10            | Number of URLs per batch (relevant for non-stream logging)         |
| `--stream`            | False         | Enable streaming results (disables batch logging)                  |
| `--monitor-mode`      | DETAILED      | `DETAILED` or `AGGREGATED` display for the live `CrawlerMonitor`   |
| `--use-rate-limiter`  | False         | Enable a basic `RateLimiter` within the dispatcher                 |
| `--site-path`         | "test_site"   | Path to store/use the generated test site                          |
| `--port`              | 8000          | Port for the local HTTP server                                     |
| `--report-path`       | "reports"     | Path to save test result summary (JSON) and memory samples (CSV)   |
| `--skip-generation`   | False         | Use existing test site files but still start local server          |
| `--use-existing-site` | False         | Use existing site on specified port (no local server/site gen)     |
| `--keep-server-alive` | False         | Keep local HTTP server running after test completion               |
| `--keep-site`         | False         | Keep the generated test site files after test completion           |
| `--clean-reports`     | False         | Clean up report directory before running                           |
| `--clean-site`        | False         | Clean up site directory before/after running (see script logic)    |

### Generating Reports Only

If you only want to generate a benchmark report from existing test results (assuming `benchmark_report.py` is compatible):

```bash
# Generate a report from existing test results in ./reports/
python benchmark_report.py

# Limit to the most recent 5 test results
python benchmark_report.py --limit 5

# Specify a custom source directory for test results
python benchmark_report.py --reports-dir alternate_results
```

#### `benchmark_report.py` Parameters (Assumed)

| Parameter       | Default             | Description                                            |
| --------------- | ------------------- | ------------------------------------------------------ |
| `--reports-dir` | "reports"           | Directory containing `test_stress_sdk.py` result files |
| `--output-dir`  | "benchmark_reports" | Directory to save generated HTML reports and charts    |
| `--limit`       | None (all results)  | Limit comparison to N most recent test results         |
| `--output-file` | Auto-generated      | Custom output filename for the HTML report             |

## Understanding the Test Output

### Real-time Progress Display (`CrawlerMonitor`)

When running `test_stress_sdk.py`, the `CrawlerMonitor` provides a live view of the crawling process managed by the dispatcher.

- **DETAILED Mode (Default):** Shows individual task status (Queued, Active, Completed, Failed), timings, overall queue statistics, and, if `psutil` is available, per-task memory usage and memory pressure status.
- **AGGREGATED Mode:** Shows summary counts (Queued, Active, Completed, Failed), overall progress percentage, estimated time remaining, average URLs/sec, and memory pressure status.

### Batch Log Output (Non-Streaming Mode Only)

If you run `test_stress_sdk.py` **without** the `--stream` flag, you will *also* see a per-batch summary line printed to the console *after* the monitor display each time a chunk of URLs finishes processing:

```
Batch | Progress | Start Mem | End Mem | URLs/sec | Success/Fail | Time (s) | Status
───────────────────────────────────────────────────────────────────────────────────────────
1 | 10.0% | 50.1 MB | 55.3 MB | 23.8 | 10/0 | 0.42 | Success
2 | 20.0% | 55.3 MB | 60.1 MB | 24.1 | 10/0 | 0.41 | Success
...
```

This display provides chunk-specific metrics:

- **Batch**: The batch number being reported.
- **Progress**: Overall percentage of total URLs processed *after* this batch.
- **Start Mem / End Mem**: Memory usage before and after processing this batch (if tracked).
- **URLs/sec**: Processing speed *for this specific batch*.
- **Success/Fail**: Number of successful and failed URLs *in this batch*.
- **Time (s)**: Wall-clock time taken to process *this batch*.
- **Status**: Color-coded status for the batch outcome.
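
The Progress and URLs/sec columns follow from simple arithmetic. The sketch below shows one way they could be derived; `BatchStats` and `summarize_batch` are hypothetical names, and the actual script may compute or round these values differently.

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    progress_pct: float  # share of all URLs processed once this batch is done
    urls_per_sec: float  # throughput for this batch only


def summarize_batch(
    processed_before: int,
    succeeded: int,
    failed: int,
    total_urls: int,
    elapsed_s: float,
) -> BatchStats:
    """Derive the per-batch progress and throughput figures."""
    done = succeeded + failed
    progress = 100.0 * (processed_before + done) / total_urls
    # Guard against a zero-length interval on very fast batches.
    rate = done / elapsed_s if elapsed_s > 0 else 0.0
    return BatchStats(progress_pct=progress, urls_per_sec=rate)
```

For the first row of the sample output (10 URLs out of 100, 0.42 s), this gives 10.0% progress and roughly 23.8 URLs/sec.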

### Summary Output

After test completion, a final summary is displayed:

```
================================================================================
Test Completed
================================================================================
Test ID: 20250418_103015
Configuration: 100 URLs, 16 max sessions, Chunk: 10, Stream: False, Monitor: DETAILED
Results: 100 successful, 0 failed (100 processed, 100.0% success)
Performance: 5.85 seconds total, 17.09 URLs/second avg
Memory Usage: Start: 50.1 MB, End: 75.3 MB, Max: 78.1 MB, Growth: 25.2 MB
Results summary saved to reports/test_summary_20250418_103015.json
```

### HTML Report Structure (Generated by `benchmark_report.py`)

The benchmark report is expected to contain the following sections:

1. **Summary**: Overview of the latest test results and trends
2. **Performance Comparison**: Charts showing throughput across tests
3. **Memory Usage**: Detailed memory usage graphs for each test
4. **Detailed Results**: Tabular data of all test metrics
5. **Conclusion**: Automated analysis of performance and memory patterns

### Memory Metrics

Memory growth is the key metric for detecting leaks...
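
As an illustration of how growth might be quantified from a series of memory samples, the sketch below computes both the end-to-start difference and a least-squares slope. The helper names and the choice of a linear fit are assumptions, not something the test scripts are known to do.

```python
def memory_growth(samples_mb: list[float]) -> float:
    """Difference between the last and first sample, in MB."""
    return samples_mb[-1] - samples_mb[0]


def growth_slope(times_s: list[float], samples_mb: list[float]) -> float:
    """Least-squares slope of memory over time, in MB per second."""
    n = len(samples_mb)
    mean_t = sum(times_s) / n
    mean_m = sum(samples_mb) / n
    var_t = sum((t - mean_t) ** 2 for t in times_s)
    if var_t == 0:
        # All samples share one timestamp, so no trend can be estimated.
        return 0.0
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(times_s, samples_mb))
    return cov / var_t
```

As a rough heuristic, a slope that stays clearly positive for the whole run is more suggestive of a leak than growth that levels off after warm-up, but neither figure is conclusive on its own.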

### Performance Metrics

Key performance indicators are listed below. "URLs per Worker" is less relevant here; focus on overall URLs/sec.

- **URLs per Second**: Higher is better (throughput)
- **Success Rate**: Should be 100% in normal conditions
- **Total Processing Time**: Lower is better
- **Dispatcher Efficiency**: Observe queue lengths and wait times in the monitor (Detailed mode)

### Raw Data Files

Raw data is saved in the `--report-path` directory (default `./reports/`):

- **JSON files** (`test_summary_*.json`): Contain the final summary for each test run.
- **CSV files** (`memory_samples_*.csv`): Contain time-series memory samples taken during the test run.

Example of reading raw data:

```python
import json

import pandas as pd

# Load test summary
test_id = "20250418_103015"  # Example ID
with open(f'reports/test_summary_{test_id}.json', 'r') as f:
    results = json.load(f)

# Load memory samples
memory_df = pd.read_csv(f'reports/memory_samples_{test_id}.csv')

# Analyze memory_df (e.g., calculate growth, plot).
# Drop missing samples first so a NaN at either end does not poison the result.
memory_mb = memory_df['memory_info_mb'].dropna()
if not memory_mb.empty:
    growth = memory_mb.iloc[-1] - memory_mb.iloc[0]
    print(f"Total Memory Growth: {growth:.1f} MB")
else:
    print("No valid memory samples found.")

print(f"Avg URLs/sec: {results['urls_processed'] / results['total_time_seconds']:.2f}")
```

## Visualization Dependencies

For full visualization capabilities in the HTML reports generated by `benchmark_report.py`, install additional dependencies...

## Directory Structure

```
benchmarking/            # Or your top-level directory name
├── benchmark_reports/   # Generated HTML reports (by benchmark_report.py)
├── reports/             # Raw test result data (from test_stress_sdk.py)
├── test_site/           # Generated test content (temporary)
├── benchmark_report.py  # Report generator
├── run_benchmark.py     # Test runner with predefined configs
├── test_stress_sdk.py   # Main stress test implementation using arun_many
└── run_all.sh           # Simple wrapper script (may need updates)
#└── requirements.txt    # Optional: Visualization dependencies for benchmark_report.py
```

## Cleanup

To clean up after testing:

```bash
# Remove the test site content (if not using --keep-site)
rm -rf test_site

# Remove all raw reports and generated benchmark reports
rm -rf reports benchmark_reports

# Or use the --clean flag with run_benchmark.py
python run_benchmark.py medium --clean
```

## Use in CI/CD

These tests can be integrated into CI/CD pipelines:

```bash
# Example CI script
python run_benchmark.py medium --no-report # Run test without interactive report gen
# Check exit code
if [ $? -ne 0 ]; then echo "Stress test failed!"; exit 1; fi
# Optionally, run report generator and check its output/metrics
# python benchmark_report.py
# check_report_metrics.py reports/test_summary_*.json || exit 1
exit 0
```
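
The `check_report_metrics.py` step is only hinted at in the CI script's comments, and no such script is described in this README. A minimal sketch of what it could look like follows. It relies only on the `urls_processed` and `total_time_seconds` keys used in the raw-data example; the function names and the throughput threshold are assumptions.

```python
import json


def throughput(summary: dict) -> float:
    """Average URLs/sec for one test summary."""
    total_time = summary["total_time_seconds"]
    return summary["urls_processed"] / total_time if total_time > 0 else 0.0


def check_summaries(paths: list[str], min_urls_per_sec: float) -> list[str]:
    """Return one message per summary file that falls below the threshold."""
    failures = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            summary = json.load(f)
        rate = throughput(summary)
        if rate < min_urls_per_sec:
            failures.append(f"{path}: {rate:.2f} URLs/sec < {min_urls_per_sec}")
    return failures


def main(paths: list[str], min_urls_per_sec: float = 5.0) -> int:
    """Print any failures and return a process exit code (0 means OK)."""
    problems = check_summaries(paths, min_urls_per_sec)
    for line in problems:
        print(line)
    return 1 if problems else 0
```

A wrapper script would end with `sys.exit(main(sys.argv[1:]))`, so that the shell's `|| exit 1` check reacts to the result. The default threshold of 5.0 URLs/sec is arbitrary and should be tuned to the CI hardware.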

## Troubleshooting

- **HTTP Server Port Conflict**: Use `--port` with `run_benchmark.py` or `test_stress_sdk.py`.
- **Memory Tracking Issues**: The `SimpleMemoryTracker` uses platform commands (`ps`, `/proc`, `tasklist`). Ensure these are available and the script has permission. If it consistently fails, memory reporting will be limited.
- **Visualization Missing**: Related to `benchmark_report.py` and its dependencies.
- **Site Generation Issues**: Check permissions for creating `./test_site/`. Use `--skip-generation` if you want to manage the site manually.
- **Testing Against External Site**: Ensure the external site is running and use `--use-existing-site --port <correct_port>`.
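
For the port conflict case, a quick way to find a usable value for `--port` is to try binding candidate ports before launching a test. The helper names below are hypothetical and the scripts themselves may not perform such a check.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def find_free_port(start: int = 8000, attempts: int = 50, host: str = "127.0.0.1") -> int:
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"No free port found in {start}..{start + attempts - 1}")
```

The result can be passed straight to `--port`. Note that another process could still claim the port between the check and the server start, so this narrows the problem rather than eliminating it.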