# Crawl4AI Stress Testing and Benchmarking
This directory contains tools for stress testing Crawl4AI's `arun_many` method and dispatcher system with high volumes of URLs, in order to evaluate performance and concurrency handling and to detect potential memory issues. It also includes a benchmarking system for tracking performance over time.
## Quick Start

```bash
# Run a default stress test (small config) and generate a report
# (Assumes run_all.sh is updated to call run_benchmark.py)
./run_all.sh
```

Note: `run_all.sh` may need to be updated if it directly called the old script.
## Overview

The stress testing system works by:

- Generating a local test site with heavy HTML pages (regenerated by default for each test).
- Starting a local HTTP server to serve these pages.
- Running Crawl4AI's `arun_many` method against this local site using the `MemoryAdaptiveDispatcher` with configurable concurrency (`max_sessions`), sketched below.
- Monitoring performance metrics via the `CrawlerMonitor` and optionally logging memory usage.
- Optionally generating detailed benchmark reports with visualizations using `benchmark_report.py`.
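At its core, `test_stress_sdk.py` exercises the pattern below. This is a minimal sketch of that pattern, not the script itself; it assumes the dispatcher API from Crawl4AI's multi-URL crawling docs, and class names, import paths, and constructor arguments (e.g. `max_session_permit`) may differ slightly between versions. The URL pattern is hypothetical.

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher


async def crawl_local_site(num_urls: int = 100, max_sessions: int = 16, port: int = 8000) -> None:
    # Hypothetical URL pattern; the generated test site's filenames may differ.
    urls = [f"http://localhost:{port}/page_{i}.html" for i in range(num_urls)]

    # The dispatcher caps concurrency and backs off under memory pressure.
    dispatcher = MemoryAdaptiveDispatcher(
        memory_threshold_percent=90.0,    # pause new tasks above this system memory usage
        max_session_permit=max_sessions,  # upper bound on concurrent crawling sessions
    )

    run_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)

    async with AsyncWebCrawler() as crawler:
        # Batch mode: returns the full list of results once every URL has finished.
        results = await crawler.arun_many(urls=urls, config=run_config, dispatcher=dispatcher)
        ok = sum(1 for r in results if r.success)
        print(f"{ok}/{len(results)} URLs crawled successfully")


if __name__ == "__main__":
    asyncio.run(crawl_local_site())
```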
## Available Tools

- `test_stress_sdk.py` - Main stress testing script utilizing `arun_many` and dispatchers.
- `benchmark_report.py` - Report generator for comparing test results (assumes compatibility with `test_stress_sdk.py` outputs).
- `run_benchmark.py` - Python script with predefined test configurations that orchestrates tests using `test_stress_sdk.py`.
- `run_all.sh` - Simple wrapper script (may need updating).
## Usage Guide

### Using Predefined Configurations (Recommended)

The `run_benchmark.py` script offers the easiest way to run standardized tests:
# Quick test (50 URLs, 4 max sessions)
python run_benchmark.py quick
# Medium test (500 URLs, 16 max sessions)
python run_benchmark.py medium
# Large test (1000 URLs, 32 max sessions)
python run_benchmark.py large
# Extreme test (2000 URLs, 64 max sessions)
python run_benchmark.py extreme
# Custom configuration
python run_benchmark.py custom --urls 300 --max-sessions 24 --chunk-size 50
# Run 'small' test in streaming mode
python run_benchmark.py small --stream
# Override max_sessions for the 'medium' config
python run_benchmark.py medium --max-sessions 20
# Skip benchmark report generation after the test
python run_benchmark.py small --no-report
# Clean up reports and site files before running
python run_benchmark.py medium --clean
### run_benchmark.py Parameters

| Parameter | Default | Description |
|---|---|---|
| `config` | required | Test configuration: quick, small, medium, large, extreme, custom |
| `--urls` | config-specific | Number of URLs (required for custom) |
| `--max-sessions` | config-specific | Max concurrent sessions managed by dispatcher (required for custom) |
| `--chunk-size` | config-specific | URLs per batch for non-stream logging (required for custom) |
| `--stream` | False | Enable streaming results (disables batch logging) |
| `--monitor-mode` | DETAILED | DETAILED or AGGREGATED display for the live monitor |
| `--use-rate-limiter` | False | Enable basic rate limiter in the dispatcher |
| `--port` | 8000 | HTTP server port |
| `--no-report` | False | Skip generating comparison report via benchmark_report.py |
| `--clean` | False | Clean up reports and site files before running |
| `--keep-server-alive` | False | Keep local HTTP server running after test |
| `--use-existing-site` | False | Use existing site on specified port (no local server start/site gen) |
| `--skip-generation` | False | Use existing site files but start local server |
| `--keep-site` | False | Keep generated site files after test |
### Predefined Configurations

The table below lists the built-in configurations; a sketch of how they map onto `test_stress_sdk.py` flags follows the table.

| Configuration | URLs | Max Sessions | Chunk Size | Description |
|---|---|---|---|---|
| `quick` | 50 | 4 | 10 | Quick test for basic validation |
| `small` | 100 | 8 | 20 | Small test for routine checks |
| `medium` | 500 | 16 | 50 | Medium test for thorough checks |
| `large` | 1000 | 32 | 100 | Large test for stress testing |
| `extreme` | 2000 | 64 | 200 | Extreme test for limit testing |
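Conceptually, `run_benchmark.py` maps one of these named configurations onto a `test_stress_sdk.py` invocation. The sketch below is an illustration of that idea only, using flags documented in this README; the real script's option handling and internal config keys may differ.

```python
import subprocess
import sys

# Mirror of the table above; the real run_benchmark.py may store these differently.
CONFIGS = {
    "quick":   {"urls": 50,   "max_sessions": 4,  "chunk_size": 10},
    "small":   {"urls": 100,  "max_sessions": 8,  "chunk_size": 20},
    "medium":  {"urls": 500,  "max_sessions": 16, "chunk_size": 50},
    "large":   {"urls": 1000, "max_sessions": 32, "chunk_size": 100},
    "extreme": {"urls": 2000, "max_sessions": 64, "chunk_size": 200},
}


def run(config_name: str, port: int = 8000) -> int:
    cfg = CONFIGS[config_name]
    cmd = [
        sys.executable, "test_stress_sdk.py",
        "--urls", str(cfg["urls"]),
        "--max-sessions", str(cfg["max_sessions"]),
        "--chunk-size", str(cfg["chunk_size"]),
        "--port", str(port),
    ]
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    raise SystemExit(run(sys.argv[1] if len(sys.argv) > 1 else "quick"))
```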
### Direct Usage of test_stress_sdk.py

For fine-grained control or debugging, you can run the stress test script directly:

```bash
# Test with 200 URLs and 32 max concurrent sessions
python test_stress_sdk.py --urls 200 --max-sessions 32 --chunk-size 40

# Clean up previous test data first
python test_stress_sdk.py --clean-reports --clean-site --urls 100 --max-sessions 16 --chunk-size 20

# Change the HTTP server port and use aggregated monitor
python test_stress_sdk.py --port 8088 --urls 100 --max-sessions 16 --monitor-mode AGGREGATED

# Enable streaming mode and use rate limiting
python test_stress_sdk.py --urls 50 --max-sessions 8 --stream --use-rate-limiter

# Change report output location
python test_stress_sdk.py --report-path custom_reports --urls 100 --max-sessions 16
```
### test_stress_sdk.py Parameters

| Parameter | Default | Description |
|---|---|---|
| `--urls` | 100 | Number of URLs to test |
| `--max-sessions` | 16 | Maximum concurrent crawling sessions managed by the dispatcher |
| `--chunk-size` | 10 | Number of URLs per batch (relevant for non-stream logging) |
| `--stream` | False | Enable streaming results (disables batch logging; see the streaming sketch after this table) |
| `--monitor-mode` | DETAILED | DETAILED or AGGREGATED display for the live CrawlerMonitor |
| `--use-rate-limiter` | False | Enable a basic RateLimiter within the dispatcher |
| `--site-path` | "test_site" | Path to store/use the generated test site |
| `--port` | 8000 | Port for the local HTTP server |
| `--report-path` | "reports" | Path to save test result summary (JSON) and memory samples (CSV) |
| `--skip-generation` | False | Use existing test site files but still start local server |
| `--use-existing-site` | False | Use existing site on specified port (no local server/site gen) |
| `--keep-server-alive` | False | Keep local HTTP server running after test completion |
| `--keep-site` | False | Keep the generated test site files after test completion |
| `--clean-reports` | False | Clean up report directory before running |
| `--clean-site` | False | Clean up site directory before/after running (see script logic) |
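The `--stream` flag corresponds to `arun_many`'s streaming mode. Below is a minimal sketch of the two consumption patterns, assuming the `stream` option on `CrawlerRunConfig` as described in Crawl4AI's docs; the exact API may vary by version.

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig


async def batch_mode(urls, dispatcher):
    # Default behaviour: wait for every URL, then receive the complete result list.
    config = CrawlerRunConfig(stream=False)
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(urls=urls, config=config, dispatcher=dispatcher)
        for result in results:
            print(result.url, result.success)


async def streaming_mode(urls, dispatcher):
    # --stream: each result is yielded as soon as its URL completes, so the
    # per-batch console logging is skipped and results are not accumulated.
    config = CrawlerRunConfig(stream=True)
    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun_many(urls=urls, config=config, dispatcher=dispatcher):
            print(result.url, result.success)
```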
### Generating Reports Only

If you only want to generate a benchmark report from existing test results (assuming `benchmark_report.py` is compatible):

```bash
# Generate a report from existing test results in ./reports/
python benchmark_report.py

# Limit to the most recent 5 test results
python benchmark_report.py --limit 5

# Specify a custom source directory for test results
python benchmark_report.py --reports-dir alternate_results
```
### benchmark_report.py Parameters (Assumed)

| Parameter | Default | Description |
|---|---|---|
| `--reports-dir` | "reports" | Directory containing test_stress_sdk.py result files |
| `--output-dir` | "benchmark_reports" | Directory to save generated HTML reports and charts |
| `--limit` | None (all results) | Limit comparison to N most recent test results |
| `--output-file` | Auto-generated | Custom output filename for the HTML report |
## Understanding the Test Output

### Real-time Progress Display (CrawlerMonitor)

When running `test_stress_sdk.py`, the `CrawlerMonitor` provides a live view of the crawling process managed by the dispatcher; a configuration sketch follows this list.

- DETAILED Mode (Default): Shows individual task status (Queued, Active, Completed, Failed), timings, memory usage per task (if `psutil` is available), overall queue statistics, and memory pressure status (if `psutil` is available).
- AGGREGATED Mode: Shows summary counts (Queued, Active, Completed, Failed), overall progress percentage, estimated time remaining, average URLs/sec, and memory pressure status.
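The display mode is chosen when constructing the `CrawlerMonitor` passed to the dispatcher. A hedged sketch, assuming the `DisplayMode` enum from Crawl4AI's dispatcher docs; import paths and constructor arguments may differ between versions.

```python
from crawl4ai import CrawlerMonitor, DisplayMode
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher

# DETAILED shows a per-task table; AGGREGATED shows summary counters and progress only.
monitor = CrawlerMonitor(
    max_visible_rows=15,                # limit the task table in DETAILED mode
    display_mode=DisplayMode.DETAILED,  # or DisplayMode.AGGREGATED
)

dispatcher = MemoryAdaptiveDispatcher(
    max_session_permit=16,
    monitor=monitor,  # the dispatcher drives this live display during arun_many
)
```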
### Batch Log Output (Non-Streaming Mode Only)

If running `test_stress_sdk.py` without the `--stream` flag, you will also see per-batch summary lines printed to the console after the monitor display, once each chunk of URLs finishes processing:

```
Batch | Progress | Start Mem | End Mem  | URLs/sec | Success/Fail | Time (s) | Status
──────────────────────────────────────────────────────────────────────────────────────
1     | 10.0%    | 50.1 MB   | 55.3 MB  | 23.8     | 10/0         | 0.42     | Success
2     | 20.0%    | 55.3 MB   | 60.1 MB  | 24.1     | 10/0         | 0.41     | Success
...
```
This display provides chunk-specific metrics:
- Batch: The batch number being reported.
- Progress: Overall percentage of total URLs processed after this batch.
- Start Mem / End Mem: Memory usage before and after processing this batch (if tracked).
- URLs/sec: Processing speed for this specific batch.
- Success/Fail: Number of successful and failed URLs in this batch.
- Time (s): Wall-clock time taken to process this batch.
- Status: Color-coded status for the batch outcome.
### Summary Output

After test completion, a final summary is displayed:

```
================================================================================
Test Completed
================================================================================
Test ID: 20250418_103015
Configuration: 100 URLs, 16 max sessions, Chunk: 10, Stream: False, Monitor: DETAILED
Results: 100 successful, 0 failed (100 processed, 100.0% success)
Performance: 5.85 seconds total, 17.09 URLs/second avg
Memory Usage: Start: 50.1 MB, End: 75.3 MB, Max: 78.1 MB, Growth: 25.2 MB

Results summary saved to reports/test_summary_20250418_103015.json
```
### HTML Report Structure (Generated by benchmark_report.py)

(This section remains the same, assuming `benchmark_report.py` generates these.)
The benchmark report contains several sections:
- Summary: Overview of the latest test results and trends
- Performance Comparison: Charts showing throughput across tests
- Memory Usage: Detailed memory usage graphs for each test
- Detailed Results: Tabular data of all test metrics
- Conclusion: Automated analysis of performance and memory patterns
### Memory Metrics

(This section remains conceptually the same.) Memory growth is the key metric for detecting leaks...
### Performance Metrics

(This section remains conceptually the same, though "URLs per Worker" is less relevant; focus on overall URLs/sec.) Key performance indicators include:
- URLs per Second: Higher is better (throughput)
- Success Rate: Should be 100% in normal conditions
- Total Processing Time: Lower is better
- Dispatcher Efficiency: Observe queue lengths and wait times in the monitor (Detailed mode)
### Raw Data Files

Raw data is saved in the `--report-path` directory (default `./reports/`):

- JSON files (`test_summary_*.json`): Contain the final summary for each test run.
- CSV files (`memory_samples_*.csv`): Contain time-series memory samples taken during the test run.
Example of reading raw data:

```python
import json
import pandas as pd

# Load test summary
test_id = "20250418_103015"  # Example ID
with open(f'reports/test_summary_{test_id}.json', 'r') as f:
    results = json.load(f)

# Load memory samples
memory_df = pd.read_csv(f'reports/memory_samples_{test_id}.csv')

# Analyze memory_df (e.g., calculate growth, plot)
if not memory_df['memory_info_mb'].isnull().all():
    growth = memory_df['memory_info_mb'].iloc[-1] - memory_df['memory_info_mb'].iloc[0]
    print(f"Total Memory Growth: {growth:.1f} MB")
else:
    print("No valid memory samples found.")

print(f"Avg URLs/sec: {results['urls_processed'] / results['total_time_seconds']:.2f}")
```
## Visualization Dependencies

(This section remains the same.)

For full visualization capabilities in the HTML reports generated by `benchmark_report.py`, install additional dependencies...
## Directory Structure

```
benchmarking/           # Or your top-level directory name
├── benchmark_reports/  # Generated HTML reports (by benchmark_report.py)
├── reports/            # Raw test result data (from test_stress_sdk.py)
├── test_site/          # Generated test content (temporary)
├── benchmark_report.py # Report generator
├── run_benchmark.py    # Test runner with predefined configs
├── test_stress_sdk.py  # Main stress test implementation using arun_many
└── run_all.sh          # Simple wrapper script (may need updates)
# └── requirements.txt  # Optional: Visualization dependencies for benchmark_report.py
```
## Cleanup

To clean up after testing:

```bash
# Remove the test site content (if not using --keep-site)
rm -rf test_site

# Remove all raw reports and generated benchmark reports
rm -rf reports benchmark_reports

# Or use the --clean flag with run_benchmark.py
python run_benchmark.py medium --clean
```
## Use in CI/CD

(This section remains conceptually the same; just update script names.) These tests can be integrated into CI/CD pipelines:

```bash
# Example CI script
python run_benchmark.py medium --no-report  # Run test without interactive report gen

# Check exit code
if [ $? -ne 0 ]; then echo "Stress test failed!"; exit 1; fi

# Optionally, run report generator and check its output/metrics
# python benchmark_report.py
# check_report_metrics.py reports/test_summary_*.json || exit 1

exit 0
```
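The `check_report_metrics.py` step above is only sketched in a comment; nothing with that name ships with the framework. If you want such a gate, a hypothetical version could look like the following. The only summary fields confirmed by this README are `urls_processed` and `total_time_seconds`; the failure and memory-growth keys below are assumptions about `test_summary_*.json` and should be checked against an actual file.

```python
"""Hypothetical check_report_metrics.py: fail the pipeline if the newest summary
shows failed URLs or excessive memory growth. Only urls_processed and
total_time_seconds are confirmed by this README; the other keys are assumptions."""
import glob
import json
import sys

GROWTH_LIMIT_MB = 100.0  # assumed acceptable memory growth per run

paths = sorted(glob.glob("reports/test_summary_*.json"))
if not paths:
    sys.exit("No test summaries found in ./reports/")

with open(paths[-1]) as f:
    summary = json.load(f)

failed = summary.get("urls_failed", 0)         # assumed key name
growth = summary.get("memory_growth_mb", 0.0)  # assumed key name

if failed > 0 or growth > GROWTH_LIMIT_MB:
    print(f"FAIL: {failed} failed URLs, {growth:.1f} MB memory growth")
    sys.exit(1)

rate = summary["urls_processed"] / summary["total_time_seconds"]
print(f"OK: {summary['urls_processed']} URLs at {rate:.2f} URLs/sec")
```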
## Troubleshooting

- HTTP Server Port Conflict: Use `--port` with `run_benchmark.py` or `test_stress_sdk.py`.
- Memory Tracking Issues: The `SimpleMemoryTracker` uses platform commands (`ps`, `/proc`, `tasklist`). Ensure these are available and the script has permission to run them. If it consistently fails, memory reporting will be limited. A rough illustration of this approach follows this list.
- Visualization Missing: Related to `benchmark_report.py` and its dependencies.
- Site Generation Issues: Check permissions for creating `./test_site/`. Use `--skip-generation` if you want to manage the site manually.
- Testing Against External Site: Ensure the external site is running and use `--use-existing-site --port <correct_port>`.
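For reference, command-based memory sampling looks roughly like the sketch below. It is an illustration only, not the actual `SimpleMemoryTracker`, and the parsing of `tasklist` output in particular is an assumption about its CSV format.

```python
"""Illustration only, not the actual SimpleMemoryTracker: sample the current
process's RSS via /proc, ps, or tasklist, returning None when unavailable."""
import os
import subprocess
import sys
from typing import Optional


def rss_mb() -> Optional[float]:
    pid = os.getpid()
    try:
        if sys.platform.startswith("linux"):
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        return int(line.split()[1]) / 1024  # kB -> MB
        elif sys.platform == "darwin":
            out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)], text=True)
            return int(out.strip()) / 1024  # kB -> MB
        elif sys.platform == "win32":
            out = subprocess.check_output(
                ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"], text=True
            )
            # Assumed format: last CSV field looks like "12,345 K"
            mem = out.rsplit('","', 1)[-1].strip().strip('"')
            return float(mem.replace(" K", "").replace(",", "")) / 1024
    except (OSError, ValueError, subprocess.SubprocessError):
        pass
    return None  # memory reporting unavailable on this platform


if __name__ == "__main__":
    print(f"Current RSS: {rss_mb()} MB")
```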