# Crawl4AI Stress Testing and Benchmarking
This directory contains tools for stress testing Crawl4AI's `arun_many` method and dispatcher system with high volumes of URLs, in order to evaluate performance and concurrency handling and to detect potential memory issues. It also includes a benchmarking system for tracking performance over time.
## Quick Start

```bash
# Run a default stress test (small config) and generate a report
# (Assumes run_all.sh is updated to call run_benchmark.py)
./run_all.sh
```

Note: `run_all.sh` may need to be updated if it directly called the old script.
## Overview

The stress testing system works by:

- Generating a local test site with heavy HTML pages (regenerated by default for each test).
- Starting a local HTTP server to serve these pages.
- Running Crawl4AI's `arun_many` method against this local site using the `MemoryAdaptiveDispatcher` with configurable concurrency (`max_sessions`), sketched below.
- Monitoring performance metrics via the `CrawlerMonitor` and optionally logging memory usage.
- Optionally generating detailed benchmark reports with visualizations using `benchmark_report.py`.
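At its core, `test_stress_sdk.py` exercises the pattern below. This is a minimal sketch of that pattern, not the script itself; it assumes the dispatcher API from Crawl4AI's multi-URL crawling docs, and class names, import paths, and constructor arguments (e.g. `max_session_permit`) may differ slightly between versions. The URL pattern is hypothetical.

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher


async def crawl_local_site(num_urls: int = 100, max_sessions: int = 16, port: int = 8000) -> None:
    # Hypothetical URL pattern; the generated test site's filenames may differ.
    urls = [f"http://localhost:{port}/page_{i}.html" for i in range(num_urls)]

    # The dispatcher caps concurrency and backs off under memory pressure.
    dispatcher = MemoryAdaptiveDispatcher(
        memory_threshold_percent=90.0,    # pause new tasks above this system memory usage
        max_session_permit=max_sessions,  # upper bound on concurrent crawling sessions
    )

    run_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)

    async with AsyncWebCrawler() as crawler:
        # Batch mode: returns the full list of results once every URL has finished.
        results = await crawler.arun_many(urls=urls, config=run_config, dispatcher=dispatcher)
        ok = sum(1 for r in results if r.success)
        print(f"{ok}/{len(results)} URLs crawled successfully")


if __name__ == "__main__":
    asyncio.run(crawl_local_site())
```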
## Available Tools

- `test_stress_sdk.py` - Main stress testing script utilizing `arun_many` and dispatchers.
- `benchmark_report.py` - Report generator for comparing test results (assumes compatibility with `test_stress_sdk.py` outputs).
- `run_benchmark.py` - Python script with predefined test configurations that orchestrates tests using `test_stress_sdk.py`.
- `run_all.sh` - Simple wrapper script (may need updating).
## Usage Guide

### Using Predefined Configurations (Recommended)

The `run_benchmark.py` script offers the easiest way to run standardized tests:
# Quick test (50 URLs, 4 max sessions)
python run_benchmark.py quick
# Medium test (500 URLs, 16 max sessions)
python run_benchmark.py medium
# Large test (1000 URLs, 32 max sessions)
python run_benchmark.py large
# Extreme test (2000 URLs, 64 max sessions)
python run_benchmark.py extreme
# Custom configuration
python run_benchmark.py custom --urls 300 --max-sessions 24 --chunk-size 50
# Run 'small' test in streaming mode
python run_benchmark.py small --stream
# Override max_sessions for the 'medium' config
python run_benchmark.py medium --max-sessions 20
# Skip benchmark report generation after the test
python run_benchmark.py small --no-report
# Clean up reports and site files before running
python run_benchmark.py medium --clean
### run_benchmark.py Parameters

| Parameter | Default | Description |
|---|---|---|
| `config` | required | Test configuration: quick, small, medium, large, extreme, custom |
| `--urls` | config-specific | Number of URLs (required for custom) |
| `--max-sessions` | config-specific | Max concurrent sessions managed by dispatcher (required for custom) |
| `--chunk-size` | config-specific | URLs per batch for non-stream logging (required for custom) |
| `--stream` | False | Enable streaming results (disables batch logging) |
| `--monitor-mode` | DETAILED | DETAILED or AGGREGATED display for the live monitor |
| `--use-rate-limiter` | False | Enable basic rate limiter in the dispatcher |
| `--port` | 8000 | HTTP server port |
| `--no-report` | False | Skip generating comparison report via benchmark_report.py |
| `--clean` | False | Clean up reports and site files before running |
| `--keep-server-alive` | False | Keep local HTTP server running after test |
| `--use-existing-site` | False | Use existing site on specified port (no local server start/site gen) |
| `--skip-generation` | False | Use existing site files but start local server |
| `--keep-site` | False | Keep generated site files after test |
### Predefined Configurations

The table below lists the built-in configurations; a sketch of how they map onto `test_stress_sdk.py` flags follows the table.

| Configuration | URLs | Max Sessions | Chunk Size | Description |
|---|---|---|---|---|
| `quick` | 50 | 4 | 10 | Quick test for basic validation |
| `small` | 100 | 8 | 20 | Small test for routine checks |
| `medium` | 500 | 16 | 50 | Medium test for thorough checks |
| `large` | 1000 | 32 | 100 | Large test for stress testing |
| `extreme` | 2000 | 64 | 200 | Extreme test for limit testing |
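Conceptually, `run_benchmark.py` maps one of these named configurations onto a `test_stress_sdk.py` invocation. The sketch below is an illustration of that idea only, using flags documented in this README; the real script's option handling and internal config keys may differ.

```python
import subprocess
import sys

# Mirror of the table above; the real run_benchmark.py may store these differently.
CONFIGS = {
    "quick":   {"urls": 50,   "max_sessions": 4,  "chunk_size": 10},
    "small":   {"urls": 100,  "max_sessions": 8,  "chunk_size": 20},
    "medium":  {"urls": 500,  "max_sessions": 16, "chunk_size": 50},
    "large":   {"urls": 1000, "max_sessions": 32, "chunk_size": 100},
    "extreme": {"urls": 2000, "max_sessions": 64, "chunk_size": 200},
}


def run(config_name: str, port: int = 8000) -> int:
    cfg = CONFIGS[config_name]
    cmd = [
        sys.executable, "test_stress_sdk.py",
        "--urls", str(cfg["urls"]),
        "--max-sessions", str(cfg["max_sessions"]),
        "--chunk-size", str(cfg["chunk_size"]),
        "--port", str(port),
    ]
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    raise SystemExit(run(sys.argv[1] if len(sys.argv) > 1 else "quick"))
```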
### Direct Usage of test_stress_sdk.py

For fine-grained control or debugging, you can run the stress test script directly:

```bash
# Test with 200 URLs and 32 max concurrent sessions
python test_stress_sdk.py --urls 200 --max-sessions 32 --chunk-size 40

# Clean up previous test data first
python test_stress_sdk.py --clean-reports --clean-site --urls 100 --max-sessions 16 --chunk-size 20

# Change the HTTP server port and use aggregated monitor
python test_stress_sdk.py --port 8088 --urls 100 --max-sessions 16 --monitor-mode AGGREGATED

# Enable streaming mode and use rate limiting
python test_stress_sdk.py --urls 50 --max-sessions 8 --stream --use-rate-limiter

# Change report output location
python test_stress_sdk.py --report-path custom_reports --urls 100 --max-sessions 16
```
### test_stress_sdk.py Parameters

| Parameter | Default | Description |
|---|---|---|
| `--urls` | 100 | Number of URLs to test |
| `--max-sessions` | 16 | Maximum concurrent crawling sessions managed by the dispatcher |
| `--chunk-size` | 10 | Number of URLs per batch (relevant for non-stream logging) |
| `--stream` | False | Enable streaming results (disables batch logging; see the streaming sketch after this table) |
| `--monitor-mode` | DETAILED | DETAILED or AGGREGATED display for the live CrawlerMonitor |
| `--use-rate-limiter` | False | Enable a basic RateLimiter within the dispatcher |
| `--site-path` | "test_site" | Path to store/use the generated test site |
| `--port` | 8000 | Port for the local HTTP server |
| `--report-path` | "reports" | Path to save test result summary (JSON) and memory samples (CSV) |
| `--skip-generation` | False | Use existing test site files but still start local server |
| `--use-existing-site` | False | Use existing site on specified port (no local server/site gen) |
| `--keep-server-alive` | False | Keep local HTTP server running after test completion |
| `--keep-site` | False | Keep the generated test site files after test completion |
| `--clean-reports` | False | Clean up report directory before running |
| `--clean-site` | False | Clean up site directory before/after running (see script logic) |
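The `--stream` flag corresponds to `arun_many`'s streaming mode. Below is a minimal sketch of the two consumption patterns, assuming the `stream` option on `CrawlerRunConfig` as described in Crawl4AI's docs; the exact API may vary by version.

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig


async def batch_mode(urls, dispatcher):
    # Default behaviour: wait for every URL, then receive the complete result list.
    config = CrawlerRunConfig(stream=False)
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(urls=urls, config=config, dispatcher=dispatcher)
        for result in results:
            print(result.url, result.success)


async def streaming_mode(urls, dispatcher):
    # --stream: each result is yielded as soon as its URL completes, so the
    # per-batch console logging is skipped and results are not accumulated.
    config = CrawlerRunConfig(stream=True)
    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun_many(urls=urls, config=config, dispatcher=dispatcher):
            print(result.url, result.success)
```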
### Generating Reports Only

If you only want to generate a benchmark report from existing test results (assuming `benchmark_report.py` is compatible):

```bash
# Generate a report from existing test results in ./reports/
python benchmark_report.py

# Limit to the most recent 5 test results
python benchmark_report.py --limit 5

# Specify a custom source directory for test results
python benchmark_report.py --reports-dir alternate_results
```
### benchmark_report.py Parameters (Assumed)

| Parameter | Default | Description |
|---|---|---|
| `--reports-dir` | "reports" | Directory containing test_stress_sdk.py result files |
| `--output-dir` | "benchmark_reports" | Directory to save generated HTML reports and charts |
| `--limit` | None (all results) | Limit comparison to N most recent test results |
| `--output-file` | Auto-generated | Custom output filename for the HTML report |
## Understanding the Test Output

### Real-time Progress Display (CrawlerMonitor)

When running `test_stress_sdk.py`, the `CrawlerMonitor` provides a live view of the crawling process managed by the dispatcher; a configuration sketch follows this list.

- DETAILED Mode (Default): Shows individual task status (Queued, Active, Completed, Failed), timings, memory usage per task (if `psutil` is available), overall queue statistics, and memory pressure status (if `psutil` is available).
- AGGREGATED Mode: Shows summary counts (Queued, Active, Completed, Failed), overall progress percentage, estimated time remaining, average URLs/sec, and memory pressure status.
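The display mode is chosen when constructing the `CrawlerMonitor` passed to the dispatcher. A hedged sketch, assuming the `DisplayMode` enum from Crawl4AI's dispatcher docs; import paths and constructor arguments may differ between versions.

```python
from crawl4ai import CrawlerMonitor, DisplayMode
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher

# DETAILED shows a per-task table; AGGREGATED shows summary counters and progress only.
monitor = CrawlerMonitor(
    max_visible_rows=15,                # limit the task table in DETAILED mode
    display_mode=DisplayMode.DETAILED,  # or DisplayMode.AGGREGATED
)

dispatcher = MemoryAdaptiveDispatcher(
    max_session_permit=16,
    monitor=monitor,  # the dispatcher drives this live display during arun_many
)
```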
### Batch Log Output (Non-Streaming Mode Only)

If running `test_stress_sdk.py` without the `--stream` flag, you will also see per-batch summary lines printed to the console after the monitor display, once each chunk of URLs finishes processing:

```
Batch | Progress | Start Mem | End Mem  | URLs/sec | Success/Fail | Time (s) | Status
──────────────────────────────────────────────────────────────────────────────────────
1     | 10.0%    | 50.1 MB   | 55.3 MB  | 23.8     | 10/0         | 0.42     | Success
2     | 20.0%    | 55.3 MB   | 60.1 MB  | 24.1     | 10/0         | 0.41     | Success
...
```
This display provides chunk-specific metrics:
- Batch: The batch number being reported.
- Progress: Overall percentage of total URLs processed after this batch.
- Start Mem / End Mem: Memory usage before and after processing this batch (if tracked).
- URLs/sec: Processing speed for this specific batch.
- Success/Fail: Number of successful and failed URLs in this batch.
- Time (s): Wall-clock time taken to process this batch.
- Status: Color-coded status for the batch outcome.
### Summary Output

After test completion, a final summary is displayed:

```
================================================================================
Test Completed
================================================================================
Test ID: 20250418_103015
Configuration: 100 URLs, 16 max sessions, Chunk: 10, Stream: False, Monitor: DETAILED
Results: 100 successful, 0 failed (100 processed, 100.0% success)
Performance: 5.85 seconds total, 17.09 URLs/second avg
Memory Usage: Start: 50.1 MB, End: 75.3 MB, Max: 78.1 MB, Growth: 25.2 MB

Results summary saved to reports/test_summary_20250418_103015.json
```
### HTML Report Structure (Generated by benchmark_report.py)

(This section remains the same, assuming `benchmark_report.py` generates these.)
The benchmark report contains several sections:
- Summary: Overview of the latest test results and trends
- Performance Comparison: Charts showing throughput across tests
- Memory Usage: Detailed memory usage graphs for each test
- Detailed Results: Tabular data of all test metrics
- Conclusion: Automated analysis of performance and memory patterns
### Memory Metrics

(This section remains conceptually the same.) Memory growth is the key metric for detecting leaks...
### Performance Metrics

(This section remains conceptually the same, though "URLs per Worker" is less relevant; focus on overall URLs/sec.) Key performance indicators include:
- URLs per Second: Higher is better (throughput)
- Success Rate: Should be 100% in normal conditions
- Total Processing Time: Lower is better
- Dispatcher Efficiency: Observe queue lengths and wait times in the monitor (Detailed mode)
### Raw Data Files

Raw data is saved in the `--report-path` directory (default `./reports/`):

- JSON files (`test_summary_*.json`): Contain the final summary for each test run.
- CSV files (`memory_samples_*.csv`): Contain time-series memory samples taken during the test run.
Example of reading raw data:

```python
import json
import pandas as pd

# Load test summary
test_id = "20250418_103015"  # Example ID
with open(f'reports/test_summary_{test_id}.json', 'r') as f:
    results = json.load(f)

# Load memory samples
memory_df = pd.read_csv(f'reports/memory_samples_{test_id}.csv')

# Analyze memory_df (e.g., calculate growth, plot)
if not memory_df['memory_info_mb'].isnull().all():
    growth = memory_df['memory_info_mb'].iloc[-1] - memory_df['memory_info_mb'].iloc[0]
    print(f"Total Memory Growth: {growth:.1f} MB")
else:
    print("No valid memory samples found.")

print(f"Avg URLs/sec: {results['urls_processed'] / results['total_time_seconds']:.2f}")
```
## Visualization Dependencies

(This section remains the same.)

For full visualization capabilities in the HTML reports generated by `benchmark_report.py`, install additional dependencies...
## Directory Structure

```
benchmarking/           # Or your top-level directory name
├── benchmark_reports/  # Generated HTML reports (by benchmark_report.py)
├── reports/            # Raw test result data (from test_stress_sdk.py)
├── test_site/          # Generated test content (temporary)
├── benchmark_report.py # Report generator
├── run_benchmark.py    # Test runner with predefined configs
├── test_stress_sdk.py  # Main stress test implementation using arun_many
└── run_all.sh          # Simple wrapper script (may need updates)
# └── requirements.txt  # Optional: Visualization dependencies for benchmark_report.py
```
## Cleanup

To clean up after testing:

```bash
# Remove the test site content (if not using --keep-site)
rm -rf test_site

# Remove all raw reports and generated benchmark reports
rm -rf reports benchmark_reports

# Or use the --clean flag with run_benchmark.py
python run_benchmark.py medium --clean
```
## Use in CI/CD

(This section remains conceptually the same; just update script names.) These tests can be integrated into CI/CD pipelines:

```bash
# Example CI script
python run_benchmark.py medium --no-report  # Run test without interactive report gen

# Check exit code
if [ $? -ne 0 ]; then echo "Stress test failed!"; exit 1; fi

# Optionally, run report generator and check its output/metrics
# python benchmark_report.py
# check_report_metrics.py reports/test_summary_*.json || exit 1

exit 0
```
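The `check_report_metrics.py` step above is only sketched in a comment; nothing with that name ships with the framework. If you want such a gate, a hypothetical version could look like the following. The only summary fields confirmed by this README are `urls_processed` and `total_time_seconds`; the failure and memory-growth keys below are assumptions about `test_summary_*.json` and should be checked against an actual file.

```python
"""Hypothetical check_report_metrics.py: fail the pipeline if the newest summary
shows failed URLs or excessive memory growth. Only urls_processed and
total_time_seconds are confirmed by this README; the other keys are assumptions."""
import glob
import json
import sys

GROWTH_LIMIT_MB = 100.0  # assumed acceptable memory growth per run

paths = sorted(glob.glob("reports/test_summary_*.json"))
if not paths:
    sys.exit("No test summaries found in ./reports/")

with open(paths[-1]) as f:
    summary = json.load(f)

failed = summary.get("urls_failed", 0)         # assumed key name
growth = summary.get("memory_growth_mb", 0.0)  # assumed key name

if failed > 0 or growth > GROWTH_LIMIT_MB:
    print(f"FAIL: {failed} failed URLs, {growth:.1f} MB memory growth")
    sys.exit(1)

rate = summary["urls_processed"] / summary["total_time_seconds"]
print(f"OK: {summary['urls_processed']} URLs at {rate:.2f} URLs/sec")
```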
## Troubleshooting

- HTTP Server Port Conflict: Use `--port` with `run_benchmark.py` or `test_stress_sdk.py`.
- Memory Tracking Issues: The `SimpleMemoryTracker` uses platform commands (`ps`, `/proc`, `tasklist`). Ensure these are available and the script has permission to run them. If it consistently fails, memory reporting will be limited. A rough illustration of this approach follows this list.
- Visualization Missing: Related to `benchmark_report.py` and its dependencies.
- Site Generation Issues: Check permissions for creating `./test_site/`. Use `--skip-generation` if you want to manage the site manually.
- Testing Against External Site: Ensure the external site is running and use `--use-existing-site --port <correct_port>`.
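For reference, command-based memory sampling looks roughly like the sketch below. It is an illustration only, not the actual `SimpleMemoryTracker`, and the parsing of `tasklist` output in particular is an assumption about its CSV format.

```python
"""Illustration only, not the actual SimpleMemoryTracker: sample the current
process's RSS via /proc, ps, or tasklist, returning None when unavailable."""
import os
import subprocess
import sys
from typing import Optional


def rss_mb() -> Optional[float]:
    pid = os.getpid()
    try:
        if sys.platform.startswith("linux"):
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        return int(line.split()[1]) / 1024  # kB -> MB
        elif sys.platform == "darwin":
            out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)], text=True)
            return int(out.strip()) / 1024  # kB -> MB
        elif sys.platform == "win32":
            out = subprocess.check_output(
                ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"], text=True
            )
            # Assumed format: last CSV field looks like "12,345 K"
            mem = out.rsplit('","', 1)[-1].strip().strip('"')
            return float(mem.replace(" K", "").replace(",", "")) / 1024
    except (OSError, ValueError, subprocess.SubprocessError):
        pass
    return None  # memory reporting unavailable on this platform


if __name__ == "__main__":
    print(f"Current RSS: {rss_mb()} MB")
```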