
Crawl4AI Stress Testing and Benchmarking

This directory contains tools for stress testing Crawl4AI's arun_many method and dispatcher system with high volumes of URLs, to evaluate performance and concurrency handling and to detect potential memory issues. It also includes a benchmarking system to track performance over time.

Quick Start

# Run a default stress test (small config) and generate a report
# (Assumes run_all.sh is updated to call run_benchmark.py)
./run_all.sh

Note: run_all.sh may need updating if it still calls the old script directly.

Overview

The stress testing system works by:

  1. Generating a local test site with heavy HTML pages (regenerated by default for each test).
  2. Starting a local HTTP server to serve these pages.
  3. Running Crawl4AI's arun_many method against this local site using the MemoryAdaptiveDispatcher with configurable concurrency (max_sessions).
  4. Monitoring performance metrics via the CrawlerMonitor and optionally logging memory usage.
  5. Optionally generating detailed benchmark reports with visualizations using benchmark_report.py.
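Steps 1 and 2 can be sketched with the standard library alone (an illustrative sketch; the function names generate_site and serve_site are hypothetical, and the real test_stress_sdk.py may structure this differently):

```python
import http.server
import threading
from functools import partial
from pathlib import Path

def generate_site(site_path: str, num_pages: int, paragraphs: int = 200) -> list:
    """Write num_pages heavy HTML files and return their relative URLs."""
    root = Path(site_path)
    root.mkdir(parents=True, exist_ok=True)
    filler = "<p>" + "stress " * 100 + "</p>\n"
    urls = []
    for i in range(num_pages):
        (root / f"page_{i}.html").write_text(
            f"<html><body><h1>Page {i}</h1>\n{filler * paragraphs}</body></html>"
        )
        urls.append(f"/page_{i}.html")
    return urls

def serve_site(site_path: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve the generated site from a background thread; caller must shutdown()."""
    handler = partial(http.server.SimpleHTTPRequestHandler, directory=site_path)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Passing port 0 lets the OS pick a free port, which is convenient when the default 8000 is taken.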

Available Tools

  • test_stress_sdk.py - Main stress testing script utilizing arun_many and dispatchers.
  • benchmark_report.py - Report generator for comparing test results (assumes compatibility with test_stress_sdk.py outputs).
  • run_benchmark.py - Python script with predefined test configurations that orchestrates tests using test_stress_sdk.py.
  • run_all.sh - Simple wrapper script (may need updating).

Usage Guide

The run_benchmark.py script offers the easiest way to run standardized tests:

# Quick test (50 URLs, 4 max sessions)
python run_benchmark.py quick

# Medium test (500 URLs, 16 max sessions)
python run_benchmark.py medium

# Large test (1000 URLs, 32 max sessions)
python run_benchmark.py large

# Extreme test (2000 URLs, 64 max sessions)
python run_benchmark.py extreme

# Custom configuration
python run_benchmark.py custom --urls 300 --max-sessions 24 --chunk-size 50

# Run 'small' test in streaming mode
python run_benchmark.py small --stream

# Override max_sessions for the 'medium' config
python run_benchmark.py medium --max-sessions 20

# Skip benchmark report generation after the test
python run_benchmark.py small --no-report

# Clean up reports and site files before running
python run_benchmark.py medium --clean

run_benchmark.py Parameters

 Parameter           | Default         | Description
──────────────────────────────────────────────────────────────────────────────────────
 config              | (required)      | Test configuration: quick, small, medium, large, extreme, custom
 --urls              | config-specific | Number of URLs (required for custom)
 --max-sessions      | config-specific | Max concurrent sessions managed by the dispatcher (required for custom)
 --chunk-size        | config-specific | URLs per batch for non-stream logging (required for custom)
 --stream            | False           | Enable streaming results (disables batch logging)
 --monitor-mode      | DETAILED        | DETAILED or AGGREGATED display for the live monitor
 --use-rate-limiter  | False           | Enable a basic rate limiter in the dispatcher
 --port              | 8000            | HTTP server port
 --no-report         | False           | Skip generating the comparison report via benchmark_report.py
 --clean             | False           | Clean up reports and site files before running
 --keep-server-alive | False           | Keep the local HTTP server running after the test
 --use-existing-site | False           | Use an existing site on the specified port (no local server start or site generation)
 --skip-generation   | False           | Use existing site files but still start the local server
 --keep-site         | False           | Keep generated site files after the test

Predefined Configurations

 Configuration | URLs | Max Sessions | Chunk Size | Description
──────────────────────────────────────────────────────────────────────────
 quick         |   50 |            4 |         10 | Quick test for basic validation
 small         |  100 |            8 |         20 | Small test for routine checks
 medium        |  500 |           16 |         50 | Medium test for thorough checks
 large         | 1000 |           32 |        100 | Large test for stress testing
 extreme       | 2000 |           64 |        200 | Extreme test for limit testing
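These configurations might be encoded in run_benchmark.py roughly as follows (a sketch; the names CONFIGS and resolve_config are hypothetical, though the values come from the table above):

```python
# Predefined configurations, keyed by the name passed on the command line.
CONFIGS = {
    "quick":   {"urls": 50,   "max_sessions": 4,  "chunk_size": 10},
    "small":   {"urls": 100,  "max_sessions": 8,  "chunk_size": 20},
    "medium":  {"urls": 500,  "max_sessions": 16, "chunk_size": 50},
    "large":   {"urls": 1000, "max_sessions": 32, "chunk_size": 100},
    "extreme": {"urls": 2000, "max_sessions": 64, "chunk_size": 200},
}

def resolve_config(name: str, **overrides) -> dict:
    """Merge CLI overrides (e.g. --max-sessions 20) onto a named config."""
    base = dict(CONFIGS[name])
    base.update({k: v for k, v in overrides.items() if v is not None})
    return base
```

This mirrors how `python run_benchmark.py medium --max-sessions 20` keeps the medium URL count while overriding concurrency.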

Direct Usage of test_stress_sdk.py

For fine-grained control or debugging, you can run the stress test script directly:

# Test with 200 URLs and 32 max concurrent sessions
python test_stress_sdk.py --urls 200 --max-sessions 32 --chunk-size 40

# Clean up previous test data first
python test_stress_sdk.py --clean-reports --clean-site --urls 100 --max-sessions 16 --chunk-size 20

# Change the HTTP server port and use aggregated monitor
python test_stress_sdk.py --port 8088 --urls 100 --max-sessions 16 --monitor-mode AGGREGATED

# Enable streaming mode and use rate limiting
python test_stress_sdk.py --urls 50 --max-sessions 8 --stream --use-rate-limiter

# Change report output location
python test_stress_sdk.py --report-path custom_reports --urls 100 --max-sessions 16

test_stress_sdk.py Parameters

 Parameter           | Default     | Description
──────────────────────────────────────────────────────────────────────────────────────
 --urls              | 100         | Number of URLs to test
 --max-sessions      | 16          | Maximum concurrent crawling sessions managed by the dispatcher
 --chunk-size        | 10          | Number of URLs per batch (relevant for non-stream logging)
 --stream            | False       | Enable streaming results (disables batch logging)
 --monitor-mode      | DETAILED    | DETAILED or AGGREGATED display for the live CrawlerMonitor
 --use-rate-limiter  | False       | Enable a basic RateLimiter within the dispatcher
 --site-path         | "test_site" | Path to store/use the generated test site
 --port              | 8000        | Port for the local HTTP server
 --report-path       | "reports"   | Path to save the test summary (JSON) and memory samples (CSV)
 --skip-generation   | False       | Use existing test site files but still start the local server
 --use-existing-site | False       | Use an existing site on the specified port (no local server or site generation)
 --keep-server-alive | False       | Keep the local HTTP server running after test completion
 --keep-site         | False       | Keep the generated test site files after test completion
 --clean-reports     | False       | Clean up the report directory before running
 --clean-site        | False       | Clean up the site directory before/after running (see script logic)
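The flags above map naturally onto an argparse parser. A minimal sketch, with defaults copied from the table (the real script may define more options, help text, or different internals):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of the test_stress_sdk.py CLI; defaults mirror the parameter table."""
    p = argparse.ArgumentParser(description="Crawl4AI stress test (sketch)")
    p.add_argument("--urls", type=int, default=100)
    p.add_argument("--max-sessions", type=int, default=16)
    p.add_argument("--chunk-size", type=int, default=10)
    p.add_argument("--stream", action="store_true")
    p.add_argument("--monitor-mode", choices=["DETAILED", "AGGREGATED"], default="DETAILED")
    p.add_argument("--use-rate-limiter", action="store_true")
    p.add_argument("--site-path", default="test_site")
    p.add_argument("--port", type=int, default=8000)
    p.add_argument("--report-path", default="reports")
    p.add_argument("--skip-generation", action="store_true")
    p.add_argument("--use-existing-site", action="store_true")
    p.add_argument("--keep-server-alive", action="store_true")
    p.add_argument("--keep-site", action="store_true")
    p.add_argument("--clean-reports", action="store_true")
    p.add_argument("--clean-site", action="store_true")
    return p
```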

Generating Reports Only

If you only want to generate a benchmark report from existing test results (assuming benchmark_report.py is compatible):

# Generate a report from existing test results in ./reports/
python benchmark_report.py

# Limit to the most recent 5 test results
python benchmark_report.py --limit 5

# Specify a custom source directory for test results
python benchmark_report.py --reports-dir alternate_results

benchmark_report.py Parameters (Assumed)

 Parameter     | Default             | Description
─────────────────────────────────────────────────────────────────────────
 --reports-dir | "reports"           | Directory containing test_stress_sdk.py result files
 --output-dir  | "benchmark_reports" | Directory to save generated HTML reports and charts
 --limit       | None (all results)  | Limit comparison to the N most recent test results
 --output-file | auto-generated      | Custom output filename for the HTML report

Understanding the Test Output

Real-time Progress Display (CrawlerMonitor)

When running test_stress_sdk.py, the CrawlerMonitor provides a live view of the crawling process managed by the dispatcher.

  • DETAILED Mode (Default): Shows individual task status (Queued, Active, Completed, Failed), timings, per-task memory usage, overall queue statistics, and memory pressure status (per-task memory and memory pressure require psutil).
  • AGGREGATED Mode: Shows summary counts (Queued, Active, Completed, Failed), overall progress percentage, estimated time remaining, average URLs/sec, and memory pressure status.

Batch Log Output (Non-Streaming Mode Only)

If running test_stress_sdk.py without the --stream flag, you will also see per-batch summary lines printed to the console after the monitor display, once each chunk of URLs finishes processing:

 Batch | Progress | Start Mem | End Mem  | URLs/sec | Success/Fail | Time (s) | Status
──────────────────────────────────────────────────────────────────────────────────────
 1     |  10.0%   |  50.1 MB  | 55.3 MB  |   23.8   |     10/0     |   0.42   | Success
 2     |  20.0%   |  55.3 MB  | 60.1 MB  |   24.1   |     10/0     |   0.41   | Success
 ...

This display provides chunk-specific metrics:

  • Batch: The batch number being reported.
  • Progress: Overall percentage of total URLs processed after this batch.
  • Start Mem / End Mem: Memory usage before and after processing this batch (if tracked).
  • URLs/sec: Processing speed for this specific batch.
  • Success/Fail: Number of successful and failed URLs in this batch.
  • Time (s): Wall-clock time taken to process this batch.
  • Status: Color-coded status for the batch outcome.
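The fields above derive straightforwardly from raw per-batch measurements. A sketch of the computation (illustrative, not the script's actual code; field names are assumed):

```python
def batch_summary(batch_num, total_urls, done_urls, start_mem_mb, end_mem_mb,
                  success, fail, elapsed_s):
    """Compute one row of the batch log from raw measurements."""
    return {
        "batch": batch_num,
        # Percentage of all URLs processed after this batch completes
        "progress_pct": 100.0 * done_urls / total_urls,
        "start_mem_mb": start_mem_mb,
        "end_mem_mb": end_mem_mb,
        # Throughput for this batch only, not a running average
        "urls_per_sec": (success + fail) / elapsed_s if elapsed_s else 0.0,
        "success_fail": f"{success}/{fail}",
        "status": "Success" if fail == 0 else ("Partial" if success else "Failed"),
    }
```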

Summary Output

After test completion, a final summary is displayed:

================================================================================
Test Completed
================================================================================
Test ID: 20250418_103015
Configuration: 100 URLs, 16 max sessions, Chunk: 10, Stream: False, Monitor: DETAILED
Results: 100 successful, 0 failed (100 processed, 100.0% success)
Performance: 5.85 seconds total, 17.09 URLs/second avg
Memory Usage: Start: 50.1 MB, End: 75.3 MB, Max: 78.1 MB, Growth: 25.2 MB
Results summary saved to reports/test_summary_20250418_103015.json

HTML Report Structure (Generated by benchmark_report.py)

The benchmark report contains several sections:

  1. Summary: Overview of the latest test results and trends
  2. Performance Comparison: Charts showing throughput across tests
  3. Memory Usage: Detailed memory usage graphs for each test
  4. Detailed Results: Tabular data of all test metrics
  5. Conclusion: Automated analysis of performance and memory patterns

Memory Metrics

Memory growth is the key metric for detecting leaks...

Performance Metrics

Key performance indicators include:

  • URLs per Second: Higher is better (throughput)
  • Success Rate: Should be 100% in normal conditions
  • Total Processing Time: Lower is better
  • Dispatcher Efficiency: Observe queue lengths and wait times in the monitor (Detailed mode)

Raw Data Files

Raw data is saved in the --report-path directory (default ./reports/):

  • JSON files (test_summary_*.json): Contains the final summary for each test run.
  • CSV files (memory_samples_*.csv): Contains time-series memory samples taken during the test run.

Example of reading raw data:

import json
import pandas as pd

# Load test summary
test_id = "20250418_103015" # Example ID
with open(f'reports/test_summary_{test_id}.json', 'r') as f:
    results = json.load(f)

# Load memory samples
memory_df = pd.read_csv(f'reports/memory_samples_{test_id}.csv')

# Analyze memory_df (e.g., calculate growth, plot)
if not memory_df['memory_info_mb'].isnull().all():
    growth = memory_df['memory_info_mb'].iloc[-1] - memory_df['memory_info_mb'].iloc[0]
    print(f"Total Memory Growth: {growth:.1f} MB")
else:
    print("No valid memory samples found.")

print(f"Avg URLs/sec: {results['urls_processed'] / results['total_time_seconds']:.2f}")

Visualization Dependencies

For full visualization capabilities in the HTML reports generated by benchmark_report.py, install additional dependencies...

Directory Structure

benchmarking/          # Or your top-level directory name
├── benchmark_reports/ # Generated HTML reports (by benchmark_report.py)
├── reports/           # Raw test result data (from test_stress_sdk.py)
├── test_site/         # Generated test content (temporary)
├── benchmark_report.py# Report generator
├── run_benchmark.py   # Test runner with predefined configs
├── test_stress_sdk.py # Main stress test implementation using arun_many
└── run_all.sh         # Simple wrapper script (may need updates)
#└── requirements.txt   # Optional: Visualization dependencies for benchmark_report.py

Cleanup

To clean up after testing:

# Remove the test site content (if not using --keep-site)
rm -rf test_site

# Remove all raw reports and generated benchmark reports
rm -rf reports benchmark_reports

# Or use the --clean flag with run_benchmark.py
python run_benchmark.py medium --clean

Use in CI/CD

These tests can be integrated into CI/CD pipelines:

# Example CI script
python run_benchmark.py medium --no-report # Run test without interactive report gen
# Check exit code
if [ $? -ne 0 ]; then echo "Stress test failed!"; exit 1; fi
# Optionally, run report generator and check its output/metrics
# python benchmark_report.py
# check_report_metrics.py reports/test_summary_*.json || exit 1
exit 0
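The check_report_metrics.py referenced above is hypothetical; a minimal version might scan the saved summaries and fail the build on any degraded run (the key names urls_processed and urls_failed are assumed):

```python
import json
from glob import glob

def check_summaries(pattern: str, min_success_rate: float = 100.0) -> bool:
    """Return False if any matching test summary falls below the success threshold."""
    ok = True
    for path in sorted(glob(pattern)):
        with open(path) as f:
            s = json.load(f)
        processed = s.get("urls_processed", 0)
        failed = s.get("urls_failed", 0)
        rate = 100.0 * (processed - failed) / processed if processed else 0.0
        if rate < min_success_rate:
            print(f"{path}: success rate {rate:.1f}% below {min_success_rate:.1f}%")
            ok = False
    return ok
```

Wired into the CI script above, the wrapper would simply exit nonzero when check_summaries returns False.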

Troubleshooting

  • HTTP Server Port Conflict: Use --port with run_benchmark.py or test_stress_sdk.py.
  • Memory Tracking Issues: The SimpleMemoryTracker uses platform commands (ps, /proc, tasklist). Ensure these are available and the script has permission. If it consistently fails, memory reporting will be limited.
  • Visualization Missing: Related to benchmark_report.py and its dependencies.
  • Site Generation Issues: Check permissions for creating ./test_site/. Use --skip-generation if you want to manage the site manually.
  • Testing Against External Site: Ensure the external site is running and use --use-existing-site --port <correct_port>.
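As a fallback when psutil is unavailable, resident memory can be read with the same platform-command approach described above. An illustrative sketch for Linux and macOS (the Windows tasklist branch is omitted, and the real SimpleMemoryTracker may differ):

```python
import os
import platform
import subprocess

def rss_mb():
    """Best-effort resident memory of this process in MB, without psutil.
    Returns None when no supported source is available."""
    pid = os.getpid()
    try:
        if platform.system() == "Linux":
            # /proc/<pid>/status reports VmRSS in kB
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        return int(line.split()[1]) / 1024.0
        elif platform.system() == "Darwin":
            # On macOS, ps reports RSS in kB
            out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)], text=True)
            return int(out.strip()) / 1024.0
    except (OSError, ValueError, subprocess.SubprocessError):
        pass
    return None
```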