# Crawl4AI Stress Testing and Benchmarking

This directory contains tools for stress testing Crawl4AI's `arun_many` method and dispatcher system with high volumes of URLs, evaluating performance and concurrency handling and helping detect potential memory issues. It also includes a benchmarking system to track performance over time.

## Quick Start

```bash
# Run a default stress test (small config) and generate a report
# (Assumes run_all.sh is updated to call run_benchmark.py)
./run_all.sh
```

*Note: `run_all.sh` might need to be updated if it directly called the old script.*

## Overview

The stress testing system works by:

1. Generating a local test site with heavy HTML pages (regenerated by default for each test).
2. Starting a local HTTP server to serve these pages.
3. Running Crawl4AI's `arun_many` method against this local site using the `MemoryAdaptiveDispatcher` with configurable concurrency (`max_sessions`).
4. Monitoring performance metrics via the `CrawlerMonitor` and optionally logging memory usage.
5. Optionally generating detailed benchmark reports with visualizations using `benchmark_report.py`.

## Available Tools

- `test_stress_sdk.py` - Main stress testing script utilizing `arun_many` and dispatchers.
- `benchmark_report.py` - Report generator for comparing test results (assumes compatibility with `test_stress_sdk.py` outputs).
- `run_benchmark.py` - Orchestrates tests via `test_stress_sdk.py` using predefined configurations.
- `run_all.sh` - Simple wrapper script (may need updating).

## Usage Guide

### Using Predefined Configurations (Recommended)

The `run_benchmark.py` script offers the easiest way to run standardized tests:

```bash
# Quick test (50 URLs, 4 max sessions)
python run_benchmark.py quick

# Medium test (500 URLs, 16 max sessions)
python run_benchmark.py medium

# Large test (1000 URLs, 32 max sessions)
python run_benchmark.py large

# Extreme test (2000 URLs, 64 max sessions)
python run_benchmark.py extreme

# Custom configuration
python run_benchmark.py custom --urls 300 --max-sessions 24 --chunk-size 50

# Run 'small' test in streaming mode
python run_benchmark.py small --stream

# Override max_sessions for the 'medium' config
python run_benchmark.py medium --max-sessions 20

# Skip benchmark report generation after the test
python run_benchmark.py small --no-report

# Clean up reports and site files before running
python run_benchmark.py medium --clean
```

#### `run_benchmark.py` Parameters

| Parameter | Default | Description |
| --------- | ------- | ----------- |
| `config` | *required* | Test configuration: `quick`, `small`, `medium`, `large`, `extreme`, `custom` |
| `--urls` | config-specific | Number of URLs (required for `custom`) |
| `--max-sessions` | config-specific | Max concurrent sessions managed by the dispatcher (required for `custom`) |
| `--chunk-size` | config-specific | URLs per batch for non-stream logging (required for `custom`) |
| `--stream` | False | Enable streaming results (disables batch logging) |
| `--monitor-mode` | DETAILED | `DETAILED` or `AGGREGATED` display for the live monitor |
| `--use-rate-limiter` | False | Enable basic rate limiter in the dispatcher |
| `--port` | 8000 | HTTP server port |
| `--no-report` | False | Skip generating the comparison report via `benchmark_report.py` |
| `--clean` | False | Clean up reports and site files before running |
| `--keep-server-alive` | False | Keep the local HTTP server running after the test |
| `--use-existing-site` | False | Use an existing site on the specified port (no local server start or site generation) |
| `--skip-generation` | False | Use existing site files but still start the local server |
| `--keep-site` | False | Keep generated site files after the test |

#### Predefined Configurations

| Configuration | URLs | Max Sessions | Chunk Size | Description |
| ------------- | ---- | ------------ | ---------- | ----------- |
| `quick` | 50 | 4 | 10 | Quick test for basic validation |
| `small` | 100 | 8 | 20 | Small test for routine checks |
| `medium` | 500 | 16 | 50 | Medium test for thorough checks |
| `large` | 1000 | 32 | 100 | Large test for stress testing |
| `extreme` | 2000 | 64 | 200 | Extreme test for limit testing |

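Internally, `run_benchmark.py` presumably keeps a mapping much like this table. A minimal sketch of that lookup, with key names assumed rather than taken from the actual script:

```python
# Predefined configurations mirroring the table above.
CONFIGS = {
    "quick":   {"urls": 50,   "max_sessions": 4,  "chunk_size": 10},
    "small":   {"urls": 100,  "max_sessions": 8,  "chunk_size": 20},
    "medium":  {"urls": 500,  "max_sessions": 16, "chunk_size": 50},
    "large":   {"urls": 1000, "max_sessions": 32, "chunk_size": 100},
    "extreme": {"urls": 2000, "max_sessions": 64, "chunk_size": 200},
}

def resolve_config(name, **overrides):
    """Look up a named config; CLI flags such as --max-sessions override its values."""
    if name == "custom":
        missing = {"urls", "max_sessions", "chunk_size"} - overrides.keys()
        if missing:
            raise ValueError(f"custom config requires: {sorted(missing)}")
        return dict(overrides)
    config = dict(CONFIGS[name])
    # Only apply overrides that were actually supplied on the command line.
    config.update({k: v for k, v in overrides.items() if v is not None})
    return config
```

This keeps the `custom` path explicit about its required parameters while letting any predefined config be partially overridden, matching the `medium --max-sessions 20` example above.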
### Direct Usage of `test_stress_sdk.py`

For fine-grained control or debugging, you can run the stress test script directly:

```bash
# Test with 200 URLs and 32 max concurrent sessions
python test_stress_sdk.py --urls 200 --max-sessions 32 --chunk-size 40

# Clean up previous test data first
python test_stress_sdk.py --clean-reports --clean-site --urls 100 --max-sessions 16 --chunk-size 20

# Change the HTTP server port and use the aggregated monitor
python test_stress_sdk.py --port 8088 --urls 100 --max-sessions 16 --monitor-mode AGGREGATED

# Enable streaming mode and rate limiting
python test_stress_sdk.py --urls 50 --max-sessions 8 --stream --use-rate-limiter

# Change the report output location
python test_stress_sdk.py --report-path custom_reports --urls 100 --max-sessions 16
```

#### `test_stress_sdk.py` Parameters

| Parameter | Default | Description |
| --------- | ------- | ----------- |
| `--urls` | 100 | Number of URLs to test |
| `--max-sessions` | 16 | Maximum concurrent crawling sessions managed by the dispatcher |
| `--chunk-size` | 10 | Number of URLs per batch (relevant for non-stream logging) |
| `--stream` | False | Enable streaming results (disables batch logging) |
| `--monitor-mode` | DETAILED | `DETAILED` or `AGGREGATED` display for the live `CrawlerMonitor` |
| `--use-rate-limiter` | False | Enable a basic `RateLimiter` within the dispatcher |
| `--site-path` | "test_site" | Path to store/use the generated test site |
| `--port` | 8000 | Port for the local HTTP server |
| `--report-path` | "reports" | Path to save the test result summary (JSON) and memory samples (CSV) |
| `--skip-generation` | False | Use existing test site files but still start the local server |
| `--use-existing-site` | False | Use an existing site on the specified port (no local server or site generation) |
| `--keep-server-alive` | False | Keep the local HTTP server running after test completion |
| `--keep-site` | False | Keep the generated test site files after test completion |
| `--clean-reports` | False | Clean up the report directory before running |
| `--clean-site` | False | Clean up the site directory before/after running (see script logic) |

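The defaults in the table correspond to an argument parser roughly like the following. This is a sketch inferred from the table, not the script's actual code:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flags and defaults mirror the parameter table above.
    p = argparse.ArgumentParser(description="Stress test arun_many against a local site")
    p.add_argument("--urls", type=int, default=100)
    p.add_argument("--max-sessions", type=int, default=16)
    p.add_argument("--chunk-size", type=int, default=10)
    p.add_argument("--stream", action="store_true")
    p.add_argument("--monitor-mode", choices=["DETAILED", "AGGREGATED"], default="DETAILED")
    p.add_argument("--use-rate-limiter", action="store_true")
    p.add_argument("--site-path", default="test_site")
    p.add_argument("--port", type=int, default=8000)
    p.add_argument("--report-path", default="reports")
    return p
```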
### Generating Reports Only

If you only want to generate a benchmark report from existing test results (assuming `benchmark_report.py` is compatible):

```bash
# Generate a report from existing test results in ./reports/
python benchmark_report.py

# Limit to the 5 most recent test results
python benchmark_report.py --limit 5

# Specify a custom source directory for test results
python benchmark_report.py --reports-dir alternate_results
```

#### `benchmark_report.py` Parameters (Assumed)

| Parameter | Default | Description |
| --------- | ------- | ----------- |
| `--reports-dir` | "reports" | Directory containing `test_stress_sdk.py` result files |
| `--output-dir` | "benchmark_reports" | Directory to save generated HTML reports and charts |
| `--limit` | None (all results) | Limit the comparison to the N most recent test results |
| `--output-file` | Auto-generated | Custom output filename for the HTML report |

## Understanding the Test Output

### Real-time Progress Display (`CrawlerMonitor`)

When running `test_stress_sdk.py`, the `CrawlerMonitor` provides a live view of the crawling process managed by the dispatcher.

- **DETAILED Mode (Default):** Shows individual task status (Queued, Active, Completed, Failed), timings, per-task memory usage, overall queue statistics, and memory pressure status (memory figures require `psutil`).
- **AGGREGATED Mode:** Shows summary counts (Queued, Active, Completed, Failed), overall progress percentage, estimated time remaining, average URLs/sec, and memory pressure status.

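The AGGREGATED numbers reduce to simple arithmetic over the dispatcher's counters. A sketch of that arithmetic (not the monitor's actual code; function and key names are illustrative):

```python
import time

def aggregate_stats(completed, failed, total, start_time, now=None):
    """Derive overall progress, average throughput, and a naive ETA."""
    now = time.monotonic() if now is None else now
    done = completed + failed
    elapsed = max(now - start_time, 1e-9)  # guard against division by zero
    urls_per_sec = done / elapsed
    remaining = total - done
    return {
        "progress_pct": 100.0 * done / total,
        "urls_per_sec": urls_per_sec,
        "eta_seconds": remaining / urls_per_sec if urls_per_sec > 0 else float("inf"),
    }
```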
### Batch Log Output (Non-Streaming Mode Only)

If you run `test_stress_sdk.py` **without** the `--stream` flag, you will *also* see per-batch summary lines printed to the console *after* the monitor display, once each chunk of URLs finishes processing:

```
Batch | Progress | Start Mem | End Mem | URLs/sec | Success/Fail | Time (s) | Status
───────────────────────────────────────────────────────────────────────────────────
  1   |  10.0%   |  50.1 MB  | 55.3 MB |   23.8   |     10/0     |   0.42   | Success
  2   |  20.0%   |  55.3 MB  | 60.1 MB |   24.1   |     10/0     |   0.41   | Success
...
```

This display provides chunk-specific metrics:

- **Batch**: The batch number being reported.
- **Progress**: Overall percentage of total URLs processed *after* this batch.
- **Start Mem / End Mem**: Memory usage before and after processing this batch (if tracked).
- **URLs/sec**: Processing speed *for this specific batch*.
- **Success/Fail**: Number of successful and failed URLs *in this batch*.
- **Time (s)**: Wall-clock time taken to process *this batch*.
- **Status**: Color-coded status for the batch outcome.

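Each row reduces to simple arithmetic over the batch counters. A sketch of how such a line could be computed (the column layout follows the table above; the status labels beyond "Success" are assumed):

```python
def batch_row(batch_num, batch_size, total_urls, done_so_far, succeeded, elapsed_s,
              start_mem_mb=None, end_mem_mb=None):
    """Format one per-batch summary line like the example output above."""
    failed = batch_size - succeeded
    progress = 100.0 * done_so_far / total_urls
    rate = batch_size / elapsed_s if elapsed_s > 0 else 0.0
    fmt_mem = lambda v: f"{v:.1f} MB" if v is not None else "n/a"
    status = "Success" if failed == 0 else ("Partial" if succeeded else "Failed")
    return (f"{batch_num:>5} | {progress:5.1f}% | {fmt_mem(start_mem_mb):>9} | "
            f"{fmt_mem(end_mem_mb):>9} | {rate:8.1f} | {succeeded}/{failed} | "
            f"{elapsed_s:8.2f} | {status}")
```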
### Summary Output

After test completion, a final summary is displayed:

```
================================================================================
Test Completed
================================================================================
Test ID: 20250418_103015
Configuration: 100 URLs, 16 max sessions, Chunk: 10, Stream: False, Monitor: DETAILED
Results: 100 successful, 0 failed (100 processed, 100.0% success)
Performance: 5.85 seconds total, 17.09 URLs/second avg
Memory Usage: Start: 50.1 MB, End: 75.3 MB, Max: 78.1 MB, Growth: 25.2 MB
Results summary saved to reports/test_summary_20250418_103015.json
```

### HTML Report Structure (Generated by `benchmark_report.py`)

The benchmark report contains several sections:

1. **Summary**: Overview of the latest test results and trends
2. **Performance Comparison**: Charts showing throughput across tests
3. **Memory Usage**: Detailed memory usage graphs for each test
4. **Detailed Results**: Tabular data of all test metrics
5. **Conclusion**: Automated analysis of performance and memory patterns

### Memory Metrics

Memory growth is the key metric for detecting leaks...

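Given the time-series samples from `memory_samples_*.csv`, growth and a crude drift check can be computed directly. A sketch (not the report generator's actual analysis):

```python
def memory_growth_stats(samples_mb):
    """Summarize a time-series of memory samples (MB), as in memory_samples_*.csv."""
    valid = [s for s in samples_mb if s is not None]
    if len(valid) < 2:
        return None
    half = len(valid) // 2
    # Crude leak heuristic: does the second half sit notably above the first half?
    drift = sum(valid[half:]) / len(valid[half:]) - sum(valid[:half]) / len(valid[:half])
    return {
        "start_mb": valid[0],
        "end_mb": valid[-1],
        "max_mb": max(valid),
        "growth_mb": valid[-1] - valid[0],
        "half_to_half_drift_mb": drift,
    }
```

Steady upward drift across a long run is a stronger leak signal than a one-off jump in end-minus-start growth.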
### Performance Metrics

Key performance indicators include:

- **URLs per Second**: Higher is better (throughput); focus on overall URLs/sec rather than per-worker figures
- **Success Rate**: Should be 100% under normal conditions
- **Total Processing Time**: Lower is better
- **Dispatcher Efficiency**: Observe queue lengths and wait times in the monitor (DETAILED mode)

### Raw Data Files

Raw data is saved in the `--report-path` directory (default `./reports/`):

- **JSON files** (`test_summary_*.json`): Contains the final summary for each test run.
- **CSV files** (`memory_samples_*.csv`): Contains time-series memory samples taken during the test run.

Example of reading raw data:

```python
import json

import pandas as pd

# Load test summary
test_id = "20250418_103015"  # Example ID
with open(f"reports/test_summary_{test_id}.json", "r") as f:
    results = json.load(f)

# Load memory samples
memory_df = pd.read_csv(f"reports/memory_samples_{test_id}.csv")

# Analyze memory_df (e.g., calculate growth, plot)
if not memory_df["memory_info_mb"].isnull().all():
    growth = memory_df["memory_info_mb"].iloc[-1] - memory_df["memory_info_mb"].iloc[0]
    print(f"Total Memory Growth: {growth:.1f} MB")
else:
    print("No valid memory samples found.")

print(f"Avg URLs/sec: {results['urls_processed'] / results['total_time_seconds']:.2f}")
```

## Visualization Dependencies

For full visualization capabilities in the HTML reports generated by `benchmark_report.py`, install the additional dependencies...

## Directory Structure

```
benchmarking/            # Or your top-level directory name
├── benchmark_reports/   # Generated HTML reports (by benchmark_report.py)
├── reports/             # Raw test result data (from test_stress_sdk.py)
├── test_site/           # Generated test content (temporary)
├── benchmark_report.py  # Report generator
├── run_benchmark.py     # Test runner with predefined configs
├── test_stress_sdk.py   # Main stress test implementation using arun_many
├── run_all.sh           # Simple wrapper script (may need updates)
└── requirements.txt     # Optional: visualization dependencies for benchmark_report.py
```

## Cleanup

To clean up after testing:

```bash
# Remove the test site content (if not using --keep-site)
rm -rf test_site

# Remove all raw reports and generated benchmark reports
rm -rf reports benchmark_reports

# Or use the --clean flag with run_benchmark.py
python run_benchmark.py medium --clean
```

## Use in CI/CD

These tests can be integrated into CI/CD pipelines:

```bash
# Example CI script
python run_benchmark.py medium --no-report  # Run test without interactive report generation

# Check the exit code
if [ $? -ne 0 ]; then echo "Stress test failed!"; exit 1; fi

# Optionally, run the report generator and check its output/metrics
# python benchmark_report.py
# check_report_metrics.py reports/test_summary_*.json || exit 1

exit 0
```

## Troubleshooting

- **HTTP Server Port Conflict**: Use `--port` with `run_benchmark.py` or `test_stress_sdk.py` to choose a free port.
- **Memory Tracking Issues**: The `SimpleMemoryTracker` relies on platform commands (`ps`, `/proc`, `tasklist`). Ensure these are available and that the script has permission to run them; if they consistently fail, memory reporting will be limited.
- **Visualization Missing**: Install the optional dependencies required by `benchmark_report.py` (see "Visualization Dependencies" above).
- **Site Generation Issues**: Check permissions for creating `./test_site/`. Use `--skip-generation` if you want to manage the site manually.
- **Testing Against External Site**: Ensure the external site is running and use `--use-existing-site --port <correct_port>`.
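In the spirit of the `SimpleMemoryTracker` mentioned above, a minimal cross-platform RSS reader built on those same platform commands might look like this (illustrative, not the actual class; the Windows `tasklist` parsing in particular is locale-sensitive):

```python
import os
import subprocess
import sys

def current_rss_mb(pid=None):
    """Resident set size of a process in MB, or None if it cannot be determined."""
    pid = os.getpid() if pid is None else pid
    try:
        if sys.platform.startswith("linux"):
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        return int(line.split()[1]) / 1024.0  # /proc reports kB
        elif sys.platform == "win32":
            # CSV row ends with a field like "12,345 K"; parsing assumes English locale
            out = subprocess.check_output(
                ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"], text=True)
            kb = out.strip().rsplit('"', 2)[1].rstrip(" K").replace(",", "")
            return float(kb) / 1024.0
        else:
            # macOS and other Unixes: ps reports RSS in kB
            out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)], text=True)
            return int(out.strip()) / 1024.0
    except (OSError, ValueError, subprocess.SubprocessError):
        return None
    return None
```

Returning `None` on failure mirrors the "memory reporting will be limited" behavior described above rather than aborting the test.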