# Crawl4AI Stress Testing and Benchmarking

This directory contains tools for stress testing Crawl4AI's `arun_many` method and dispatcher system with high volumes of URLs to evaluate performance, concurrency handling, and potentially detect memory issues. It also includes a benchmarking system to track performance over time.

## Quick Start

```bash
# Run a default stress test (small config) and generate a report
# (Assumes run_all.sh is updated to call run_benchmark.py)
./run_all.sh
```

*Note: `run_all.sh` might need to be updated if it directly called the old script.*

## Overview

The stress testing system works by:

1. Generating a local test site with heavy HTML pages (regenerated by default for each test).
2. Starting a local HTTP server to serve these pages.
3. Running Crawl4AI's `arun_many` method against this local site using the `MemoryAdaptiveDispatcher` with configurable concurrency (`max_sessions`).
4. Monitoring performance metrics via the `CrawlerMonitor` and optionally logging memory usage.
5. Optionally generating detailed benchmark reports with visualizations using `benchmark_report.py`.

## Available Tools

- `test_stress_sdk.py` - Main stress testing script utilizing `arun_many` and dispatchers.
- `benchmark_report.py` - Report generator for comparing test results (assumes compatibility with `test_stress_sdk.py` outputs).
- `run_benchmark.py` - Python script with predefined test configurations that orchestrates tests using `test_stress_sdk.py`.
- `run_all.sh` - Simple wrapper script (may need updating).

## Usage Guide

### Using Predefined Configurations (Recommended)

The `run_benchmark.py` script offers the easiest way to run standardized tests:

```bash
# Quick test (50 URLs, 4 max sessions)
python run_benchmark.py quick

# Medium test (500 URLs, 16 max sessions)
python run_benchmark.py medium

# Large test (1000 URLs, 32 max sessions)
python run_benchmark.py large

# Extreme test (2000 URLs, 64 max sessions)
python run_benchmark.py extreme

# Custom configuration
python run_benchmark.py custom --urls 300 --max-sessions 24 --chunk-size 50

# Run 'small' test in streaming mode
python run_benchmark.py small --stream

# Override max_sessions for the 'medium' config
python run_benchmark.py medium --max-sessions 20

# Skip benchmark report generation after the test
python run_benchmark.py small --no-report

# Clean up reports and site files before running
python run_benchmark.py medium --clean
```
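Conceptually, each predefined configuration is just a bundle of `test_stress_sdk.py` arguments. The sketch below shows how such a mapping can be driven from Python; it is an assumption about the approach, not the actual `run_benchmark.py` implementation, and it only uses the CLI flags and configuration values documented in the tables below.

```python
# Hypothetical orchestration sketch; the real run_benchmark.py logic may differ.
import subprocess
import sys

# (urls, max_sessions, chunk_size) per configuration, matching the table below.
CONFIGS = {
    "quick":   (50, 4, 10),
    "small":   (100, 8, 20),
    "medium":  (500, 16, 50),
    "large":   (1000, 32, 100),
    "extreme": (2000, 64, 200),
}


def run_config(name, extra_args=()):
    """Invoke test_stress_sdk.py with the arguments implied by a config name."""
    urls, max_sessions, chunk_size = CONFIGS[name]
    cmd = [
        sys.executable, "test_stress_sdk.py",
        "--urls", str(urls),
        "--max-sessions", str(max_sessions),
        "--chunk-size", str(chunk_size),
        *extra_args,
    ]
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Roughly the equivalent of `python run_benchmark.py quick --stream`
    raise SystemExit(run_config("quick", ["--stream"]))
```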
#### `run_benchmark.py` Parameters

| Parameter             | Default         | Description                                                                   |
| --------------------- | --------------- | ----------------------------------------------------------------------------- |
| `config`              | *required*      | Test configuration: `quick`, `small`, `medium`, `large`, `extreme`, `custom`  |
| `--urls`              | config-specific | Number of URLs (required for `custom`)                                        |
| `--max-sessions`      | config-specific | Max concurrent sessions managed by dispatcher (required for `custom`)         |
| `--chunk-size`        | config-specific | URLs per batch for non-stream logging (required for `custom`)                 |
| `--stream`            | False           | Enable streaming results (disables batch logging)                             |
| `--monitor-mode`      | DETAILED        | `DETAILED` or `AGGREGATED` display for the live monitor                       |
| `--use-rate-limiter`  | False           | Enable basic rate limiter in the dispatcher                                   |
| `--port`              | 8000            | HTTP server port                                                              |
| `--no-report`         | False           | Skip generating comparison report via `benchmark_report.py`                   |
| `--clean`             | False           | Clean up reports and site files before running                                |
| `--keep-server-alive` | False           | Keep local HTTP server running after test                                     |
| `--use-existing-site` | False           | Use existing site on specified port (no local server start/site gen)          |
| `--skip-generation`   | False           | Use existing site files but start local server                                |
| `--keep-site`         | False           | Keep generated site files after test                                          |

#### Predefined Configurations

| Configuration | URLs | Max Sessions | Chunk Size | Description                      |
| ------------- | ---- | ------------ | ---------- | -------------------------------- |
| `quick`       | 50   | 4            | 10         | Quick test for basic validation  |
| `small`       | 100  | 8            | 20         | Small test for routine checks    |
| `medium`      | 500  | 16           | 50         | Medium test for thorough checks  |
| `large`       | 1000 | 32           | 100        | Large test for stress testing    |
| `extreme`     | 2000 | 64           | 200        | Extreme test for limit testing   |

### Direct Usage of `test_stress_sdk.py`

For fine-grained control or debugging, you can run the stress test script directly:

```bash
# Test with 200 URLs and 32 max concurrent sessions
python test_stress_sdk.py --urls 200 --max-sessions 32 --chunk-size 40

# Clean up previous test data first
python test_stress_sdk.py --clean-reports --clean-site --urls 100 --max-sessions 16 --chunk-size 20

# Change the HTTP server port and use aggregated monitor
python test_stress_sdk.py --port 8088 --urls 100 --max-sessions 16 --monitor-mode AGGREGATED

# Enable streaming mode and use rate limiting
python test_stress_sdk.py --urls 50 --max-sessions 8 --stream --use-rate-limiter

# Change report output location
python test_stress_sdk.py --report-path custom_reports --urls 100 --max-sessions 16
```

#### `test_stress_sdk.py` Parameters

| Parameter             | Default     | Description                                                           |
| --------------------- | ----------- | --------------------------------------------------------------------- |
| `--urls`              | 100         | Number of URLs to test                                                 |
| `--max-sessions`      | 16          | Maximum concurrent crawling sessions managed by the dispatcher        |
| `--chunk-size`        | 10          | Number of URLs per batch (relevant for non-stream logging)            |
| `--stream`            | False       | Enable streaming results (disables batch logging)                     |
| `--monitor-mode`      | DETAILED    | `DETAILED` or `AGGREGATED` display for the live `CrawlerMonitor`      |
| `--use-rate-limiter`  | False       | Enable a basic `RateLimiter` within the dispatcher                    |
| `--site-path`         | "test_site" | Path to store/use the generated test site                             |
| `--port`              | 8000        | Port for the local HTTP server                                        |
| `--report-path`       | "reports"   | Path to save test result summary (JSON) and memory samples (CSV)      |
| `--skip-generation`   | False       | Use existing test site files but still start local server             |
| `--use-existing-site` | False       | Use existing site on specified port (no local server/site gen)        |
| `--keep-server-alive` | False       | Keep local HTTP server running after test completion                  |
| `--keep-site`         | False       | Keep the generated test site files after test completion              |
| `--clean-reports`     | False       | Clean up report directory before running                              |
| `--clean-site`        | False       | Clean up site directory before/after running (see script logic)       |
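Internally, the stress test exercises `arun_many` through a `MemoryAdaptiveDispatcher`. The snippet below is a stripped-down sketch of that pattern, not the script itself; the crawl4ai import paths, constructor arguments, and the `page_N.html` URL layout are assumptions, so check `test_stress_sdk.py` for the exact usage.

```python
# Minimal sketch of the arun_many + dispatcher pattern (assumed API details).
import asyncio

from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher


async def crawl_local_site(urls: list[str], max_sessions: int = 16) -> None:
    run_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS, stream=False)
    dispatcher = MemoryAdaptiveDispatcher(
        memory_threshold_percent=90.0,    # throttle new sessions under memory pressure
        max_session_permit=max_sessions,  # the "max sessions" concurrency cap
    )
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(urls=urls, config=run_config, dispatcher=dispatcher)
    ok = sum(1 for r in results if r.success)
    print(f"{ok}/{len(urls)} pages crawled successfully")


if __name__ == "__main__":
    # Assumed URL pattern for the generated local test site.
    test_urls = [f"http://localhost:8000/page_{i}.html" for i in range(100)]
    asyncio.run(crawl_local_site(test_urls))
```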
### Generating Reports Only

If you only want to generate a benchmark report from existing test results (assuming `benchmark_report.py` is compatible):

```bash
# Generate a report from existing test results in ./reports/
python benchmark_report.py

# Limit to the most recent 5 test results
python benchmark_report.py --limit 5

# Specify a custom source directory for test results
python benchmark_report.py --reports-dir alternate_results
```

#### `benchmark_report.py` Parameters (Assumed)

| Parameter       | Default             | Description                                              |
| --------------- | ------------------- | -------------------------------------------------------- |
| `--reports-dir` | "reports"           | Directory containing `test_stress_sdk.py` result files   |
| `--output-dir`  | "benchmark_reports" | Directory to save generated HTML reports and charts      |
| `--limit`       | None (all results)  | Limit comparison to N most recent test results           |
| `--output-file` | Auto-generated      | Custom output filename for the HTML report               |

## Understanding the Test Output

### Real-time Progress Display (`CrawlerMonitor`)

When running `test_stress_sdk.py`, the `CrawlerMonitor` provides a live view of the crawling process managed by the dispatcher.

- **DETAILED Mode (Default):** Shows individual task status (Queued, Active, Completed, Failed), timings, memory usage per task (if `psutil` is available), overall queue statistics, and memory pressure status (if `psutil` available).
- **AGGREGATED Mode:** Shows summary counts (Queued, Active, Completed, Failed), overall progress percentage, estimated time remaining, average URLs/sec, and memory pressure status.

### Batch Log Output (Non-Streaming Mode Only)

If running `test_stress_sdk.py` **without** the `--stream` flag, you will *also* see per-batch summary lines printed to the console *after* the monitor display, once each chunk of URLs finishes processing:

```
Batch | Progress | Start Mem | End Mem | URLs/sec | Success/Fail | Time (s) | Status
───────────────────────────────────────────────────────────────────────────────────────────
    1 |    10.0% |   50.1 MB | 55.3 MB |     23.8 |         10/0 |     0.42 | Success
    2 |    20.0% |   55.3 MB | 60.1 MB |     24.1 |         10/0 |     0.41 | Success
...
```

This display provides chunk-specific metrics:

- **Batch**: The batch number being reported.
- **Progress**: Overall percentage of total URLs processed *after* this batch.
- **Start Mem / End Mem**: Memory usage before and after processing this batch (if tracked).
- **URLs/sec**: Processing speed *for this specific batch*.
- **Success/Fail**: Number of successful and failed URLs *in this batch*.
- **Time (s)**: Wall-clock time taken to process *this batch*.
- **Status**: Color-coded status for the batch outcome.

### Summary Output

After test completion, a final summary is displayed:

```
================================================================================
Test Completed
================================================================================
Test ID: 20250418_103015
Configuration: 100 URLs, 16 max sessions, Chunk: 10, Stream: False, Monitor: DETAILED
Results: 100 successful, 0 failed (100 processed, 100.0% success)
Performance: 5.85 seconds total, 17.09 URLs/second avg
Memory Usage: Start: 50.1 MB, End: 75.3 MB, Max: 78.1 MB, Growth: 25.2 MB

Results summary saved to reports/test_summary_20250418_103015.json
```

### HTML Report Structure (Generated by `benchmark_report.py`)

(This section remains the same, assuming `benchmark_report.py` generates these.)

The benchmark report contains several sections:

1. **Summary**: Overview of the latest test results and trends
2. **Performance Comparison**: Charts showing throughput across tests
3. **Memory Usage**: Detailed memory usage graphs for each test
4. **Detailed Results**: Tabular data of all test metrics
5. **Conclusion**: Automated analysis of performance and memory patterns

### Memory Metrics

(This section remains conceptually the same.)

Memory growth is the key metric for detecting leaks...
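One-off start/end comparisons can be noisy; a steadier signal is whether the sampled memory keeps trending upward across the whole run. The sketch below is one minimal check along those lines; the `memory_info_mb` column name follows the raw CSV described in the next section, while the filename and threshold are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative file name; use one of the memory_samples_*.csv files in ./reports/.
samples = pd.read_csv("reports/memory_samples_20250418_103015.csv")
mem = samples["memory_info_mb"].dropna()

# Fit a line to memory vs. sample index: a clearly positive slope across the
# whole run suggests steady growth (a possible leak) rather than a one-off spike.
slope_mb_per_sample = np.polyfit(np.arange(len(mem)), mem, 1)[0]
print(f"Average growth per sample: {slope_mb_per_sample:.3f} MB")
if slope_mb_per_sample > 0.5:  # arbitrary threshold, for illustration only
    print("Warning: memory appears to grow steadily over the run")
```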
### Performance Metrics

(This section remains conceptually the same, though "URLs per Worker" is less relevant - focus on overall URLs/sec.)

Key performance indicators include:

- **URLs per Second**: Higher is better (throughput)
- **Success Rate**: Should be 100% in normal conditions
- **Total Processing Time**: Lower is better
- **Dispatcher Efficiency**: Observe queue lengths and wait times in the monitor (Detailed mode)

### Raw Data Files

Raw data is saved in the `--report-path` directory (default `./reports/`):

- **JSON files** (`test_summary_*.json`): Contains the final summary for each test run.
- **CSV files** (`memory_samples_*.csv`): Contains time-series memory samples taken during the test run.

Example of reading raw data:

```python
import json
import pandas as pd

# Load test summary
test_id = "20250418_103015"  # Example ID
with open(f'reports/test_summary_{test_id}.json', 'r') as f:
    results = json.load(f)

# Load memory samples
memory_df = pd.read_csv(f'reports/memory_samples_{test_id}.csv')

# Analyze memory_df (e.g., calculate growth, plot)
if not memory_df['memory_info_mb'].isnull().all():
    growth = memory_df['memory_info_mb'].iloc[-1] - memory_df['memory_info_mb'].iloc[0]
    print(f"Total Memory Growth: {growth:.1f} MB")
else:
    print("No valid memory samples found.")

print(f"Avg URLs/sec: {results['urls_processed'] / results['total_time_seconds']:.2f}")
```

## Visualization Dependencies

(This section remains the same.)

For full visualization capabilities in the HTML reports generated by `benchmark_report.py`, install additional dependencies...

## Directory Structure

```
benchmarking/            # Or your top-level directory name
├── benchmark_reports/   # Generated HTML reports (by benchmark_report.py)
├── reports/             # Raw test result data (from test_stress_sdk.py)
├── test_site/           # Generated test content (temporary)
├── benchmark_report.py  # Report generator
├── run_benchmark.py     # Test runner with predefined configs
├── test_stress_sdk.py   # Main stress test implementation using arun_many
└── run_all.sh           # Simple wrapper script (may need updates)
#└── requirements.txt    # Optional: Visualization dependencies for benchmark_report.py
```

## Cleanup

To clean up after testing:

```bash
# Remove the test site content (if not using --keep-site)
rm -rf test_site

# Remove all raw reports and generated benchmark reports
rm -rf reports benchmark_reports

# Or use the --clean flag with run_benchmark.py
python run_benchmark.py medium --clean
```

## Use in CI/CD

(This section remains conceptually the same, just update script names.)

These tests can be integrated into CI/CD pipelines:

```bash
# Example CI script
python run_benchmark.py medium --no-report  # Run test without interactive report gen

# Check exit code
if [ $? -ne 0 ]; then echo "Stress test failed!"; exit 1; fi

# Optionally, run report generator and check its output/metrics
# python benchmark_report.py
# check_report_metrics.py reports/test_summary_*.json || exit 1

exit 0
```
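The `check_report_metrics.py` referenced above is not included in this directory; a minimal version of such a gate might look like the sketch below. The `urls_processed` and `total_time_seconds` fields match the raw-data example earlier, while the throughput threshold is illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical CI gate: fail the build if the latest stress-test summary looks unhealthy."""
import glob
import json
import sys

MIN_URLS_PER_SEC = 5.0  # illustrative threshold; tune for your hardware

paths = sorted(glob.glob("reports/test_summary_*.json"))
if not paths:
    sys.exit("No test summaries found in ./reports/")

# Timestamped filenames sort chronologically, so the last path is the latest run.
with open(paths[-1]) as f:
    summary = json.load(f)

throughput = summary["urls_processed"] / summary["total_time_seconds"]
print(f"{paths[-1]}: {throughput:.2f} URLs/sec")
if throughput < MIN_URLS_PER_SEC:
    sys.exit(f"Throughput below {MIN_URLS_PER_SEC} URLs/sec -- failing the build")
```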
## Troubleshooting

- **HTTP Server Port Conflict**: Use `--port` with `run_benchmark.py` or `test_stress_sdk.py`.
- **Memory Tracking Issues**: The `SimpleMemoryTracker` uses platform commands (`ps`, `/proc`, `tasklist`). Ensure these are available and the script has permission. If it consistently fails, memory reporting will be limited. (A `psutil`-based alternative is sketched after this list.)
- **Visualization Missing**: Related to `benchmark_report.py` and its dependencies.
- **Site Generation Issues**: Check permissions for creating `./test_site/`. Use `--skip-generation` if you want to manage the site manually.
- **Testing Against External Site**: Ensure the external site is running and use `--use-existing-site --port <port>`.
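If the platform-command approach is unreliable in your environment, `psutil` (the same optional dependency the `CrawlerMonitor` can use) offers a more portable way to sample process memory. This is a standalone sketch, not the `SimpleMemoryTracker` implementation; the output path and column names are illustrative.

```python
# Portable memory-sampling sketch using psutil (optional dependency).
import csv
import time

import psutil


def rss_mb() -> float:
    """Resident set size of the current process, in MB."""
    return psutil.Process().memory_info().rss / (1024 * 1024)


def log_memory(path="manual_memory_samples.csv", interval_s=1.0, duration_s=30.0):
    """Write periodic RSS samples to a CSV; column names chosen for illustration."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "memory_info_mb"])
        deadline = time.time() + duration_s
        while time.time() < deadline:
            writer.writerow([f"{time.time():.3f}", f"{rss_mb():.1f}"])
            f.flush()
            time.sleep(interval_s)


if __name__ == "__main__":
    log_memory()
```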