diff --git a/.gitignore b/.gitignore index a290ab7d..1658a987 100644 --- a/.gitignore +++ b/.gitignore @@ -257,4 +257,8 @@ continue_config.json .private/ CLAUDE_MONITOR.md -CLAUDE.md \ No newline at end of file +CLAUDE.md + +tests/**/test_site +tests/**/reports +tests/**/benchmark_reports \ No newline at end of file diff --git a/JOURNAL.md b/JOURNAL.md index 0451b425..c2d21e3e 100644 --- a/JOURNAL.md +++ b/JOURNAL.md @@ -42,6 +42,196 @@ This feature provides greater flexibility in how users generate markdown, enabli - Capture more detailed content from the original HTML when needed - Use schema-optimized HTML when working with structured data - Choose the approach that best suits their specific use case +## [2025-04-17] Implemented High Volume Stress Testing Solution for SDK + +**Feature:** Comprehensive stress testing framework using `arun_many` and the dispatcher system to evaluate performance, concurrency handling, and identify potential issues under high-volume crawling scenarios. + +**Changes Made:** +1. Created a dedicated stress testing framework in the `benchmarking/` (or similar) directory. +2. Implemented local test site generation (`SiteGenerator`) with configurable heavy HTML pages. +3. Added basic memory usage tracking (`SimpleMemoryTracker`) using platform-specific commands (avoiding `psutil` dependency for this specific test). +4. Utilized `CrawlerMonitor` from `crawl4ai` for rich terminal UI and real-time monitoring of test progress and dispatcher activity. +5. Implemented detailed result summary saving (JSON) and memory sample logging (CSV). +6. Developed `run_benchmark.py` to orchestrate tests with predefined configurations. +7. Created `run_all.sh` as a simple wrapper for `run_benchmark.py`. + +**Implementation Details:** +- Generates a local test site with configurable pages containing heavy text and image content. +- Uses Python's built-in `http.server` for local serving, minimizing network variance. 
+- Leverages `crawl4ai`'s `arun_many` method for processing URLs. +- Utilizes `MemoryAdaptiveDispatcher` to manage concurrency via the `max_sessions` parameter (note: memory adaptation features require `psutil`, not used by `SimpleMemoryTracker`). +- Tracks memory usage via `SimpleMemoryTracker`, recording samples throughout test execution to a CSV file. +- Uses `CrawlerMonitor` (which uses the `rich` library) for clear terminal visualization and progress reporting directly from the dispatcher. +- Stores detailed final metrics in a JSON summary file. + +**Files Created/Updated:** +- `stress_test_sdk.py`: Main stress testing implementation using `arun_many`. +- `benchmark_report.py`: (Assumed) Report generator for comparing test results. +- `run_benchmark.py`: Test runner script with predefined configurations. +- `run_all.sh`: Simple bash script wrapper for `run_benchmark.py`. +- `USAGE.md`: Comprehensive documentation on usage and interpretation (updated). + +**Testing Approach:** +- Creates a controlled, reproducible test environment with a local HTTP server. +- Processes URLs using `arun_many`, allowing the dispatcher to manage concurrency up to `max_sessions`. +- Optionally logs per-batch summaries (when not in streaming mode) after processing chunks. +- Supports different test sizes via `run_benchmark.py` configurations. +- Records memory samples via platform commands for basic trend analysis. +- Includes cleanup functionality for the test environment. + +**Challenges:** +- Ensuring proper cleanup of HTTP server processes. +- Getting reliable memory tracking across platforms without adding heavy dependencies (`psutil`) to this specific test script. +- Designing `run_benchmark.py` to correctly pass arguments to `stress_test_sdk.py`. + +**Why This Feature:** +The high volume stress testing solution addresses critical needs for ensuring Crawl4AI's `arun_many` reliability: +1. Provides a reproducible way to evaluate performance under concurrent load. +2. 
Allows testing the dispatcher's concurrency control (`max_session_permit`) and queue management. +3. Enables performance tuning by observing throughput (`URLs/sec`) under different `max_sessions` settings. +4. Creates a controlled environment for testing `arun_many` behavior. +5. Supports continuous integration by providing deterministic test conditions for `arun_many`. + +**Design Decisions:** +- Chose local site generation for reproducibility and isolation from network issues. +- Utilized the built-in `CrawlerMonitor` for real-time feedback, leveraging its `rich` integration. +- Implemented optional per-batch logging in `stress_test_sdk.py` (when not streaming) to provide chunk-level summaries alongside the continuous monitor. +- Adopted `arun_many` with a `MemoryAdaptiveDispatcher` as the core mechanism for parallel execution, reflecting the intended SDK usage. +- Created `run_benchmark.py` to simplify running standard test configurations. +- Used `SimpleMemoryTracker` to provide basic memory insights without requiring `psutil` for this particular test runner. + +**Future Enhancements to Consider:** +- Create a separate test variant that *does* use `psutil` to specifically stress the memory-adaptive features of the dispatcher. +- Add support for generated JavaScript content. +- Add support for Docker-based testing with explicit memory limits. +- Enhance `benchmark_report.py` to provide more sophisticated analysis of performance and memory trends from the generated JSON/CSV files. + +--- + +## [2025-04-17] Refined Stress Testing System Parameters and Execution + +**Changes Made:** +1. Corrected `run_benchmark.py` and `stress_test_sdk.py` to use `--max-sessions` instead of the incorrect `--workers` parameter, accurately reflecting dispatcher configuration. +2. Updated `run_benchmark.py` argument handling to correctly pass all relevant custom parameters (including `--stream`, `--monitor-mode`, etc.) to `stress_test_sdk.py`. +3. 
(Assuming changes in `benchmark_report.py`) Applied dark theme to benchmark reports for better readability. +4. (Assuming changes in `benchmark_report.py`) Improved visualization code to eliminate matplotlib warnings. +5. Updated `run_benchmark.py` to provide clickable `file://` links to generated reports in the terminal output. +6. Updated `USAGE.md` with comprehensive parameter descriptions reflecting the final script arguments. +7. Updated `run_all.sh` wrapper to correctly invoke `run_benchmark.py` with flexible arguments. + +**Details of Changes:** + +1. **Parameter Correction (`--max-sessions`)**: + * Identified the fundamental misunderstanding where `--workers` was used incorrectly. + * Refactored `stress_test_sdk.py` to accept `--max-sessions` and configure the `MemoryAdaptiveDispatcher`'s `max_session_permit` accordingly. + * Updated `run_benchmark.py` argument parsing and command construction to use `--max-sessions`. + * Updated `TEST_CONFIGS` in `run_benchmark.py` to use `max_sessions`. + +2. **Argument Handling (`run_benchmark.py`)**: + * Improved logic to collect all command-line arguments provided to `run_benchmark.py`. + * Ensured all relevant arguments (like `--stream`, `--monitor-mode`, `--port`, `--use-rate-limiter`, etc.) are correctly forwarded when calling `stress_test_sdk.py` as a subprocess. + +3. **Dark Theme & Visualization Fixes (Assumed in `benchmark_report.py`)**: + * (Describes changes assumed to be made in the separate reporting script). + +4. **Clickable Links (`run_benchmark.py`)**: + * Added logic to find the latest HTML report and PNG chart in the `benchmark_reports` directory after `benchmark_report.py` runs. + * Used `pathlib` to generate correct `file://` URLs for terminal output. + +5. **Documentation Improvements (`USAGE.md`)**: + * Rewrote sections to explain `arun_many`, dispatchers, and `--max-sessions`. + * Updated parameter tables for all scripts (`stress_test_sdk.py`, `run_benchmark.py`). 
+ * Clarified the difference between batch and streaming modes and their effect on logging. + * Updated examples to use correct arguments. + +**Files Modified:** +- `stress_test_sdk.py`: Changed `--workers` to `--max-sessions`, added new arguments, used `arun_many`. +- `run_benchmark.py`: Changed argument handling, updated configs, calls `stress_test_sdk.py`. +- `run_all.sh`: Updated to call `run_benchmark.py` correctly. +- `USAGE.md`: Updated documentation extensively. +- `benchmark_report.py`: (Assumed modifications for dark theme and viz fixes). + +**Testing:** +- Verified that `--max-sessions` correctly limits concurrency via the `CrawlerMonitor` output. +- Confirmed that custom arguments passed to `run_benchmark.py` are forwarded to `stress_test_sdk.py`. +- Validated clickable links work in supporting terminals. +- Ensured documentation matches the final script parameters and behavior. + +**Why These Changes:** +These refinements correct the fundamental approach of the stress test to align with `crawl4ai`'s actual architecture and intended usage: +1. Ensures the test evaluates the correct components (`arun_many`, `MemoryAdaptiveDispatcher`). +2. Makes test configurations more accurate and flexible. +3. Improves the usability of the testing framework through better argument handling and documentation. + + +**Future Enhancements to Consider:** +- Add support for generated JavaScript content to test JS rendering performance +- Implement more sophisticated memory analysis like generational garbage collection tracking +- Add support for Docker-based testing with memory limits to force OOM conditions +- Create visualization tools for analyzing memory usage patterns across test runs +- Add benchmark comparisons between different crawler versions or configurations + +## [2025-04-17] Fixed Issues in Stress Testing System + +**Changes Made:** +1. Fixed custom parameter handling in run_benchmark.py +2. Applied dark theme to benchmark reports for better readability +3. 
Improved visualization code to eliminate matplotlib warnings +4. Added clickable links to generated reports in terminal output +5. Enhanced documentation with comprehensive parameter descriptions + +**Details of Changes:** + +1. **Custom Parameter Handling Fix** + - Identified bug where custom URL count was being ignored in run_benchmark.py + - Rewrote argument handling to use a custom args dictionary + - Properly passed parameters to the test_simple_stress.py command + - Added better UI indication of custom parameters in use + +2. **Dark Theme Implementation** + - Added complete dark theme to HTML benchmark reports + - Applied dark styling to all visualization components + - Used Nord-inspired color palette for charts and graphs + - Improved contrast and readability for data visualization + - Updated text colors and backgrounds for better eye comfort + +3. **Matplotlib Warning Fixes** + - Resolved warnings related to improper use of set_xticklabels() + - Implemented correct x-axis positioning for bar charts + - Ensured proper alignment of bar labels and data points + - Updated plotting code to use modern matplotlib practices + +4. 
**Documentation Improvements** + - Created comprehensive USAGE.md with detailed instructions + - Added parameter documentation for all scripts + - Included examples for all common use cases + - Provided detailed explanations for interpreting results + - Added troubleshooting guide for common issues + +**Files Modified:** +- `tests/memory/run_benchmark.py`: Fixed custom parameter handling +- `tests/memory/benchmark_report.py`: Added dark theme and fixed visualization warnings +- `tests/memory/run_all.sh`: Added clickable links to reports +- `tests/memory/USAGE.md`: Created comprehensive documentation + +**Testing:** +- Verified that custom URL counts are now correctly used +- Confirmed dark theme is properly applied to all report elements +- Checked that matplotlib warnings are no longer appearing +- Validated clickable links to reports work in terminals that support them + +**Why These Changes:** +These improvements address several usability issues with the stress testing system: +1. Better parameter handling ensures test configurations work as expected +2. Dark theme reduces eye strain during extended test review sessions +3. Fixing visualization warnings improves code quality and output clarity +4. 
Enhanced documentation makes the system more accessible for future use + +**Future Enhancements:** +- Add additional visualization options for different types of analysis +- Implement theme toggle to support both light and dark preferences +- Add export options for embedding reports in other documentation +- Create dedicated CI/CD integration templates for automated testing ## [2025-04-09] Added MHTML Capture Feature diff --git a/tests/memory/README.md b/tests/memory/README.md new file mode 100644 index 00000000..164ef095 --- /dev/null +++ b/tests/memory/README.md @@ -0,0 +1,315 @@ +# Crawl4AI Stress Testing and Benchmarking + +This directory contains tools for stress testing Crawl4AI's `arun_many` method and dispatcher system with high volumes of URLs to evaluate performance, concurrency handling, and potentially detect memory issues. It also includes a benchmarking system to track performance over time. + +## Quick Start + +```bash +# Run a default stress test (small config) and generate a report +# (Assumes run_all.sh is updated to call run_benchmark.py) +./run_all.sh +``` +*Note: `run_all.sh` might need to be updated if it directly called the old script.* + +## Overview + +The stress testing system works by: + +1. Generating a local test site with heavy HTML pages (regenerated by default for each test). +2. Starting a local HTTP server to serve these pages. +3. Running Crawl4AI's `arun_many` method against this local site using the `MemoryAdaptiveDispatcher` with configurable concurrency (`max_sessions`). +4. Monitoring performance metrics via the `CrawlerMonitor` and optionally logging memory usage. +5. Optionally generating detailed benchmark reports with visualizations using `benchmark_report.py`. + +## Available Tools + +- `test_stress_sdk.py` - Main stress testing script utilizing `arun_many` and dispatchers. +- `benchmark_report.py` - Report generator for comparing test results (assumes compatibility with `test_stress_sdk.py` outputs). 
+- `run_benchmark.py` - Python script with predefined test configurations that orchestrates tests using `test_stress_sdk.py`. +- `run_all.sh` - Simple wrapper script (may need updating). + +## Usage Guide + +### Using Predefined Configurations (Recommended) + +The `run_benchmark.py` script offers the easiest way to run standardized tests: + +```bash +# Quick test (50 URLs, 4 max sessions) +python run_benchmark.py quick + +# Medium test (500 URLs, 16 max sessions) +python run_benchmark.py medium + +# Large test (1000 URLs, 32 max sessions) +python run_benchmark.py large + +# Extreme test (2000 URLs, 64 max sessions) +python run_benchmark.py extreme + +# Custom configuration +python run_benchmark.py custom --urls 300 --max-sessions 24 --chunk-size 50 + +# Run 'small' test in streaming mode +python run_benchmark.py small --stream + +# Override max_sessions for the 'medium' config +python run_benchmark.py medium --max-sessions 20 + +# Skip benchmark report generation after the test +python run_benchmark.py small --no-report + +# Clean up reports and site files before running +python run_benchmark.py medium --clean +``` + +#### `run_benchmark.py` Parameters + +| Parameter | Default | Description | +| -------------------- | --------------- | --------------------------------------------------------------------------- | +| `config` | *required* | Test configuration: `quick`, `small`, `medium`, `large`, `extreme`, `custom`| +| `--urls` | config-specific | Number of URLs (required for `custom`) | +| `--max-sessions` | config-specific | Max concurrent sessions managed by dispatcher (required for `custom`) | +| `--chunk-size` | config-specific | URLs per batch for non-stream logging (required for `custom`) | +| `--stream` | False | Enable streaming results (disables batch logging) | +| `--monitor-mode` | DETAILED | `DETAILED` or `AGGREGATED` display for the live monitor | +| `--use-rate-limiter` | False | Enable basic rate limiter in the dispatcher | +| `--port` | 8000 | HTTP 
server port | +| `--no-report` | False | Skip generating comparison report via `benchmark_report.py` | +| `--clean` | False | Clean up reports and site files before running | +| `--keep-server-alive`| False | Keep local HTTP server running after test | +| `--use-existing-site`| False | Use existing site on specified port (no local server start/site gen) | +| `--skip-generation` | False | Use existing site files but start local server | +| `--keep-site` | False | Keep generated site files after test | + +#### Predefined Configurations + +| Configuration | URLs | Max Sessions | Chunk Size | Description | +| ------------- | ------ | ------------ | ---------- | -------------------------------- | +| `quick` | 50 | 4 | 10 | Quick test for basic validation | +| `small` | 100 | 8 | 20 | Small test for routine checks | +| `medium` | 500 | 16 | 50 | Medium test for thorough checks | +| `large` | 1000 | 32 | 100 | Large test for stress testing | +| `extreme` | 2000 | 64 | 200 | Extreme test for limit testing | + +### Direct Usage of `test_stress_sdk.py` + +For fine-grained control or debugging, you can run the stress test script directly: + +```bash +# Test with 200 URLs and 32 max concurrent sessions +python test_stress_sdk.py --urls 200 --max-sessions 32 --chunk-size 40 + +# Clean up previous test data first +python test_stress_sdk.py --clean-reports --clean-site --urls 100 --max-sessions 16 --chunk-size 20 + +# Change the HTTP server port and use aggregated monitor +python test_stress_sdk.py --port 8088 --urls 100 --max-sessions 16 --monitor-mode AGGREGATED + +# Enable streaming mode and use rate limiting +python test_stress_sdk.py --urls 50 --max-sessions 8 --stream --use-rate-limiter + +# Change report output location +python test_stress_sdk.py --report-path custom_reports --urls 100 --max-sessions 16 +``` + +#### `test_stress_sdk.py` Parameters + +| Parameter | Default | Description | +| -------------------- | ---------- | 
-------------------------------------------------------------------- | +| `--urls` | 100 | Number of URLs to test | +| `--max-sessions` | 16 | Maximum concurrent crawling sessions managed by the dispatcher | +| `--chunk-size` | 10 | Number of URLs per batch (relevant for non-stream logging) | +| `--stream` | False | Enable streaming results (disables batch logging) | +| `--monitor-mode` | DETAILED | `DETAILED` or `AGGREGATED` display for the live `CrawlerMonitor` | +| `--use-rate-limiter` | False | Enable a basic `RateLimiter` within the dispatcher | +| `--site-path` | "test_site"| Path to store/use the generated test site | +| `--port` | 8000 | Port for the local HTTP server | +| `--report-path` | "reports" | Path to save test result summary (JSON) and memory samples (CSV) | +| `--skip-generation` | False | Use existing test site files but still start local server | +| `--use-existing-site`| False | Use existing site on specified port (no local server/site gen) | +| `--keep-server-alive`| False | Keep local HTTP server running after test completion | +| `--keep-site` | False | Keep the generated test site files after test completion | +| `--clean-reports` | False | Clean up report directory before running | +| `--clean-site` | False | Clean up site directory before/after running (see script logic) | + +### Generating Reports Only + +If you only want to generate a benchmark report from existing test results (assuming `benchmark_report.py` is compatible): + +```bash +# Generate a report from existing test results in ./reports/ +python benchmark_report.py + +# Limit to the most recent 5 test results +python benchmark_report.py --limit 5 + +# Specify a custom source directory for test results +python benchmark_report.py --reports-dir alternate_results +``` + +#### `benchmark_report.py` Parameters (Assumed) + +| Parameter | Default | Description | +| --------------- | -------------------- | ----------------------------------------------------------- | +| 
`--reports-dir` | "reports" | Directory containing `test_stress_sdk.py` result files | +| `--output-dir` | "benchmark_reports" | Directory to save generated HTML reports and charts | +| `--limit` | None (all results) | Limit comparison to N most recent test results | +| `--output-file` | Auto-generated | Custom output filename for the HTML report | + +## Understanding the Test Output + +### Real-time Progress Display (`CrawlerMonitor`) + +When running `test_stress_sdk.py`, the `CrawlerMonitor` provides a live view of the crawling process managed by the dispatcher. + +- **DETAILED Mode (Default):** Shows individual task status (Queued, Active, Completed, Failed), timings, memory usage per task (if `psutil` is available), overall queue statistics, and memory pressure status (if `psutil` available). +- **AGGREGATED Mode:** Shows summary counts (Queued, Active, Completed, Failed), overall progress percentage, estimated time remaining, average URLs/sec, and memory pressure status. + +### Batch Log Output (Non-Streaming Mode Only) + +If running `test_stress_sdk.py` **without** the `--stream` flag, you will *also* see per-batch summary lines printed to the console *after* the monitor display, once each chunk of URLs finishes processing: + +``` + Batch | Progress | Start Mem | End Mem | URLs/sec | Success/Fail | Time (s) | Status +─────────────────────────────────────────────────────────────────────────────────────────── + 1 | 10.0% | 50.1 MB | 55.3 MB | 23.8 | 10/0 | 0.42 | Success + 2 | 20.0% | 55.3 MB | 60.1 MB | 24.1 | 10/0 | 0.41 | Success + ... +``` + +This display provides chunk-specific metrics: +- **Batch**: The batch number being reported. +- **Progress**: Overall percentage of total URLs processed *after* this batch. +- **Start Mem / End Mem**: Memory usage before and after processing this batch (if tracked). +- **URLs/sec**: Processing speed *for this specific batch*. +- **Success/Fail**: Number of successful and failed URLs *in this batch*. 
+- **Time (s)**: Wall-clock time taken to process *this batch*. +- **Status**: Color-coded status for the batch outcome. + +### Summary Output + +After test completion, a final summary is displayed: + +``` +================================================================================ +Test Completed +================================================================================ +Test ID: 20250418_103015 +Configuration: 100 URLs, 16 max sessions, Chunk: 10, Stream: False, Monitor: DETAILED +Results: 100 successful, 0 failed (100 processed, 100.0% success) +Performance: 5.85 seconds total, 17.09 URLs/second avg +Memory Usage: Start: 50.1 MB, End: 75.3 MB, Max: 78.1 MB, Growth: 25.2 MB +Results summary saved to reports/test_summary_20250418_103015.json +``` + +### HTML Report Structure (Generated by `benchmark_report.py`) + +(This section remains the same, assuming `benchmark_report.py` generates these) +The benchmark report contains several sections: +1. **Summary**: Overview of the latest test results and trends +2. **Performance Comparison**: Charts showing throughput across tests +3. **Memory Usage**: Detailed memory usage graphs for each test +4. **Detailed Results**: Tabular data of all test metrics +5. **Conclusion**: Automated analysis of performance and memory patterns + +### Memory Metrics + +(This section remains conceptually the same) +Memory growth is the key metric for detecting leaks... 
+ +### Performance Metrics + +(This section remains conceptually the same, though "URLs per Worker" is less relevant - focus on overall URLs/sec) +Key performance indicators include: +- **URLs per Second**: Higher is better (throughput) +- **Success Rate**: Should be 100% in normal conditions +- **Total Processing Time**: Lower is better +- **Dispatcher Efficiency**: Observe queue lengths and wait times in the monitor (Detailed mode) + +### Raw Data Files + +Raw data is saved in the `--report-path` directory (default `./reports/`): + +- **JSON files** (`test_summary_*.json`): Contains the final summary for each test run. +- **CSV files** (`memory_samples_*.csv`): Contains time-series memory samples taken during the test run. + +Example of reading raw data: +```python +import json +import pandas as pd + +# Load test summary +test_id = "20250418_103015" # Example ID +with open(f'reports/test_summary_{test_id}.json', 'r') as f: + results = json.load(f) + +# Load memory samples +memory_df = pd.read_csv(f'reports/memory_samples_{test_id}.csv') + +# Analyze memory_df (e.g., calculate growth, plot) +if not memory_df['memory_info_mb'].isnull().all(): + growth = memory_df['memory_info_mb'].iloc[-1] - memory_df['memory_info_mb'].iloc[0] + print(f"Total Memory Growth: {growth:.1f} MB") +else: + print("No valid memory samples found.") + +print(f"Avg URLs/sec: {results['urls_processed'] / results['total_time_seconds']:.2f}") +``` + +## Visualization Dependencies + +(This section remains the same) +For full visualization capabilities in the HTML reports generated by `benchmark_report.py`, install additional dependencies... 
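Per the import guard in `benchmark_report.py`, the optional packages are `pandas`, `matplotlib`, and `seaborn`. A quick availability check mirrors the script's own logic; when the packages are missing, only text-based reports are produced:

```python
# Mirrors benchmark_report.py's optional-dependency guard.
VISUALIZATION_AVAILABLE = True
try:
    import pandas
    import matplotlib
    import seaborn
except ImportError:
    VISUALIZATION_AVAILABLE = False

if not VISUALIZATION_AVAILABLE:
    # Charts are skipped; text-only summaries still work.
    print("Install with: pip install pandas matplotlib seaborn")
```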
+ +## Directory Structure + +``` +benchmarking/ # Or your top-level directory name +├── benchmark_reports/ # Generated HTML reports (by benchmark_report.py) +├── reports/ # Raw test result data (from test_stress_sdk.py) +├── test_site/ # Generated test content (temporary) +├── benchmark_report.py# Report generator +├── run_benchmark.py # Test runner with predefined configs +├── test_stress_sdk.py # Main stress test implementation using arun_many +└── run_all.sh # Simple wrapper script (may need updates) +#└── requirements.txt # Optional: Visualization dependencies for benchmark_report.py +``` + +## Cleanup + +To clean up after testing: + +```bash +# Remove the test site content (if not using --keep-site) +rm -rf test_site + +# Remove all raw reports and generated benchmark reports +rm -rf reports benchmark_reports + +# Or use the --clean flag with run_benchmark.py +python run_benchmark.py medium --clean +``` + +## Use in CI/CD + +(This section remains conceptually the same, just update script names) +These tests can be integrated into CI/CD pipelines: +```bash +# Example CI script +python run_benchmark.py medium --no-report # Run test without interactive report gen +# Check exit code +if [ $? -ne 0 ]; then echo "Stress test failed!"; exit 1; fi +# Optionally, run report generator and check its output/metrics +# python benchmark_report.py +# check_report_metrics.py reports/test_summary_*.json || exit 1 +exit 0 +``` + +## Troubleshooting + +- **HTTP Server Port Conflict**: Use `--port` with `run_benchmark.py` or `test_stress_sdk.py`. +- **Memory Tracking Issues**: The `SimpleMemoryTracker` uses platform commands (`ps`, `/proc`, `tasklist`). Ensure these are available and the script has permission. If it consistently fails, memory reporting will be limited. +- **Visualization Missing**: Related to `benchmark_report.py` and its dependencies. +- **Site Generation Issues**: Check permissions for creating `./test_site/`. 
Use `--skip-generation` if you want to manage the site manually. +- **Testing Against External Site**: Ensure the external site is running and use `--use-existing-site --port `. diff --git a/tests/memory/benchmark_report.py b/tests/memory/benchmark_report.py new file mode 100755 index 00000000..a634f997 --- /dev/null +++ b/tests/memory/benchmark_report.py @@ -0,0 +1,887 @@ +#!/usr/bin/env python3 +""" +Benchmark reporting tool for Crawl4AI stress tests. +Generates visual reports and comparisons between test runs. +""" + +import os +import json +import glob +import argparse +import sys +from datetime import datetime +from pathlib import Path +from rich.console import Console +from rich.table import Table +from rich.panel import Panel + +# Initialize rich console +console = Console() + +# Try to import optional visualization dependencies +VISUALIZATION_AVAILABLE = True +try: + import pandas as pd + import matplotlib.pyplot as plt + import matplotlib as mpl + import numpy as np + import seaborn as sns +except ImportError: + VISUALIZATION_AVAILABLE = False + console.print("[yellow]Warning: Visualization dependencies not found. Install with:[/yellow]") + console.print("[yellow]pip install pandas matplotlib seaborn[/yellow]") + console.print("[yellow]Only text-based reports will be generated.[/yellow]") + +# Configure plotting if available +if VISUALIZATION_AVAILABLE: + # Set plot style for dark theme + plt.style.use('dark_background') + sns.set_theme(style="darkgrid") + + # Custom color palette based on Nord theme + nord_palette = ["#88c0d0", "#81a1c1", "#a3be8c", "#ebcb8b", "#bf616a", "#b48ead", "#5e81ac"] + sns.set_palette(nord_palette) + +class BenchmarkReporter: + """Generates visual reports and comparisons for Crawl4AI stress tests.""" + + def __init__(self, reports_dir="reports", output_dir="benchmark_reports"): + """Initialize the benchmark reporter. 
+ + Args: + reports_dir: Directory containing test result files + output_dir: Directory to save generated reports + """ + self.reports_dir = Path(reports_dir) + self.output_dir = Path(output_dir) + self.output_dir.mkdir(parents=True, exist_ok=True) + + # Configure matplotlib if available + if VISUALIZATION_AVAILABLE: + # Ensure the matplotlib backend works in headless environments + mpl.use('Agg') + + # Set up styling for plots with dark theme + mpl.rcParams['figure.figsize'] = (12, 8) + mpl.rcParams['font.size'] = 12 + mpl.rcParams['axes.labelsize'] = 14 + mpl.rcParams['axes.titlesize'] = 16 + mpl.rcParams['xtick.labelsize'] = 12 + mpl.rcParams['ytick.labelsize'] = 12 + mpl.rcParams['legend.fontsize'] = 12 + mpl.rcParams['figure.facecolor'] = '#1e1e1e' + mpl.rcParams['axes.facecolor'] = '#2e3440' + mpl.rcParams['savefig.facecolor'] = '#1e1e1e' + mpl.rcParams['text.color'] = '#e0e0e0' + mpl.rcParams['axes.labelcolor'] = '#e0e0e0' + mpl.rcParams['xtick.color'] = '#e0e0e0' + mpl.rcParams['ytick.color'] = '#e0e0e0' + mpl.rcParams['grid.color'] = '#444444' + mpl.rcParams['figure.edgecolor'] = '#444444' + + def load_test_results(self, limit=None): + """Load all test results from the reports directory. 
+ + Args: + limit: Optional limit on number of most recent tests to load + + Returns: + Dictionary mapping test IDs to result data + """ + result_files = glob.glob(str(self.reports_dir / "test_results_*.json")) + + # Sort files by modification time (newest first) + result_files.sort(key=os.path.getmtime, reverse=True) + + if limit: + result_files = result_files[:limit] + + results = {} + for file_path in result_files: + try: + with open(file_path, 'r') as f: + data = json.load(f) + test_id = data.get('test_id') + if test_id: + results[test_id] = data + + # Try to load the corresponding memory samples + csv_path = self.reports_dir / f"memory_samples_{test_id}.csv" + if csv_path.exists(): + try: + memory_df = pd.read_csv(csv_path) + results[test_id]['memory_samples'] = memory_df + except Exception as e: + console.print(f"[yellow]Warning: Could not load memory samples for {test_id}: {e}[/yellow]") + except Exception as e: + console.print(f"[red]Error loading {file_path}: {e}[/red]") + + console.print(f"Loaded {len(results)} test results") + return results + + def generate_summary_table(self, results): + """Generate a summary table of test results. 
+ + Args: + results: Dictionary mapping test IDs to result data + + Returns: + Rich Table object + """ + table = Table(title="Crawl4AI Stress Test Summary", show_header=True) + + # Define columns + table.add_column("Test ID", style="cyan") + table.add_column("Date", style="bright_green") + table.add_column("URLs", justify="right") + table.add_column("Workers", justify="right") + table.add_column("Success %", justify="right") + table.add_column("Time (s)", justify="right") + table.add_column("Mem Growth", justify="right") + table.add_column("URLs/sec", justify="right") + + # Add rows + for test_id, data in sorted(results.items(), key=lambda x: x[0], reverse=True): + # Parse timestamp from test_id + try: + date_str = datetime.strptime(test_id, "%Y%m%d_%H%M%S").strftime("%Y-%m-%d %H:%M") + except: + date_str = "Unknown" + + # Calculate success percentage + total_urls = data.get('url_count', 0) + successful = data.get('successful_urls', 0) + success_pct = (successful / total_urls * 100) if total_urls > 0 else 0 + + # Calculate memory growth if available + mem_growth = "N/A" + if 'memory_samples' in data: + samples = data['memory_samples'] + if len(samples) >= 2: + # Try to extract numeric values from memory_info strings + try: + first_mem = float(samples.iloc[0]['memory_info'].split()[0]) + last_mem = float(samples.iloc[-1]['memory_info'].split()[0]) + mem_growth = f"{last_mem - first_mem:.1f} MB" + except: + pass + + # Calculate URLs per second + time_taken = data.get('total_time_seconds', 0) + urls_per_sec = total_urls / time_taken if time_taken > 0 else 0 + + table.add_row( + test_id, + date_str, + str(total_urls), + str(data.get('workers', 'N/A')), + f"{success_pct:.1f}%", + f"{data.get('total_time_seconds', 0):.2f}", + mem_growth, + f"{urls_per_sec:.1f}" + ) + + return table + + def generate_performance_chart(self, results, output_file=None): + """Generate a performance comparison chart. 
+ + Args: + results: Dictionary mapping test IDs to result data + output_file: File path to save the chart + + Returns: + Path to the saved chart file or None if visualization is not available + """ + if not VISUALIZATION_AVAILABLE: + console.print("[yellow]Skipping performance chart - visualization dependencies not available[/yellow]") + return None + + # Extract relevant data + data = [] + for test_id, result in results.items(): + urls = result.get('url_count', 0) + workers = result.get('workers', 0) + time_taken = result.get('total_time_seconds', 0) + urls_per_sec = urls / time_taken if time_taken > 0 else 0 + + # Parse timestamp from test_id for sorting + try: + timestamp = datetime.strptime(test_id, "%Y%m%d_%H%M%S") + data.append({ + 'test_id': test_id, + 'timestamp': timestamp, + 'urls': urls, + 'workers': workers, + 'time_seconds': time_taken, + 'urls_per_sec': urls_per_sec + }) + except: + console.print(f"[yellow]Warning: Could not parse timestamp from {test_id}[/yellow]") + + if not data: + console.print("[yellow]No valid data for performance chart[/yellow]") + return None + + # Convert to DataFrame and sort by timestamp + df = pd.DataFrame(data) + df = df.sort_values('timestamp') + + # Create the plot + fig, ax1 = plt.subplots(figsize=(12, 6)) + + # Plot URLs per second as bars with properly set x-axis + x_pos = range(len(df['test_id'])) + bars = ax1.bar(x_pos, df['urls_per_sec'], color='#88c0d0', alpha=0.8) + ax1.set_ylabel('URLs per Second', color='#88c0d0') + ax1.tick_params(axis='y', labelcolor='#88c0d0') + + # Properly set x-axis labels + ax1.set_xticks(x_pos) + ax1.set_xticklabels(df['test_id'].tolist(), rotation=45, ha='right') + + # Add worker count as text on each bar + for i, bar in enumerate(bars): + height = bar.get_height() + workers = df.iloc[i]['workers'] + ax1.text(i, height + 0.1, + f'W: {workers}', ha='center', va='bottom', fontsize=9, color='#e0e0e0') + + # Add a second y-axis for total URLs + ax2 = ax1.twinx() + ax2.plot(x_pos, 
df['urls'], '-', color='#bf616a', alpha=0.8, markersize=6, marker='o') + ax2.set_ylabel('Total URLs', color='#bf616a') + ax2.tick_params(axis='y', labelcolor='#bf616a') + + # Set title and layout + plt.title('Crawl4AI Performance Benchmarks') + plt.tight_layout() + + # Save the figure + if output_file is None: + output_file = self.output_dir / "performance_comparison.png" + plt.savefig(output_file, dpi=100, bbox_inches='tight') + plt.close() + + return output_file + + def generate_memory_charts(self, results, output_prefix=None): + """Generate memory usage charts for each test. + + Args: + results: Dictionary mapping test IDs to result data + output_prefix: Prefix for output file names + + Returns: + List of paths to the saved chart files + """ + if not VISUALIZATION_AVAILABLE: + console.print("[yellow]Skipping memory charts - visualization dependencies not available[/yellow]") + return [] + + output_files = [] + + for test_id, result in results.items(): + if 'memory_samples' not in result: + continue + + memory_df = result['memory_samples'] + + # Check if we have enough data points + if len(memory_df) < 2: + continue + + # Try to extract numeric values from memory_info strings + try: + memory_values = [] + for mem_str in memory_df['memory_info']: + # Extract the number from strings like "142.8 MB" + value = float(mem_str.split()[0]) + memory_values.append(value) + + memory_df['memory_mb'] = memory_values + except Exception as e: + console.print(f"[yellow]Could not parse memory values for {test_id}: {e}[/yellow]") + continue + + # Create the plot + plt.figure(figsize=(10, 6)) + + # Plot memory usage over time + plt.plot(memory_df['elapsed_seconds'], memory_df['memory_mb'], + color='#88c0d0', marker='o', linewidth=2, markersize=4) + + # Add annotations for chunk processing + chunk_size = result.get('chunk_size', 0) + url_count = result.get('url_count', 0) + if chunk_size > 0 and url_count > 0: + # Estimate chunk processing times + num_chunks = (url_count + 
chunk_size - 1) // chunk_size # Ceiling division + total_time = result.get('total_time_seconds', memory_df['elapsed_seconds'].max()) + chunk_times = np.linspace(0, total_time, num_chunks + 1)[1:] + + for i, time_point in enumerate(chunk_times): + if time_point <= memory_df['elapsed_seconds'].max(): + plt.axvline(x=time_point, color='#4c566a', linestyle='--', alpha=0.6) + plt.text(time_point, memory_df['memory_mb'].min(), f'Chunk {i+1}', + rotation=90, verticalalignment='bottom', fontsize=8, color='#e0e0e0') + + # Set labels and title + plt.xlabel('Elapsed Time (seconds)', color='#e0e0e0') + plt.ylabel('Memory Usage (MB)', color='#e0e0e0') + plt.title(f'Memory Usage During Test {test_id}\n({url_count} URLs, {result.get("workers", "?")} Workers)', + color='#e0e0e0') + + # Add grid and set y-axis to start from zero + plt.grid(True, alpha=0.3, color='#4c566a') + + # Add test metadata as text + info_text = ( + f"URLs: {url_count}\n" + f"Workers: {result.get('workers', 'N/A')}\n" + f"Chunk Size: {result.get('chunk_size', 'N/A')}\n" + f"Total Time: {result.get('total_time_seconds', 0):.2f}s\n" + ) + + # Calculate memory growth + if len(memory_df) >= 2: + first_mem = memory_df.iloc[0]['memory_mb'] + last_mem = memory_df.iloc[-1]['memory_mb'] + growth = last_mem - first_mem + growth_rate = growth / result.get('total_time_seconds', 1) + + info_text += f"Memory Growth: {growth:.1f} MB\n" + info_text += f"Growth Rate: {growth_rate:.2f} MB/s" + + plt.figtext(0.02, 0.02, info_text, fontsize=9, color='#e0e0e0', + bbox=dict(facecolor='#3b4252', alpha=0.8, edgecolor='#4c566a')) + + # Save the figure + if output_prefix is None: + output_file = self.output_dir / f"memory_chart_{test_id}.png" + else: + output_file = Path(f"{output_prefix}_memory_{test_id}.png") + + plt.tight_layout() + plt.savefig(output_file, dpi=100, bbox_inches='tight') + plt.close() + + output_files.append(output_file) + + return output_files + + def generate_comparison_report(self, results, title=None, 
output_file=None): + """Generate a comprehensive comparison report of multiple test runs. + + Args: + results: Dictionary mapping test IDs to result data + title: Optional title for the report + output_file: File path to save the report + + Returns: + Path to the saved report file + """ + if not results: + console.print("[yellow]No results to generate comparison report[/yellow]") + return None + + if output_file is None: + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + output_file = self.output_dir / f"comparison_report_{timestamp}.html" + + # Create data for the report + rows = [] + for test_id, data in results.items(): + # Calculate metrics + urls = data.get('url_count', 0) + workers = data.get('workers', 0) + successful = data.get('successful_urls', 0) + failed = data.get('failed_urls', 0) + time_seconds = data.get('total_time_seconds', 0) + + # Calculate additional metrics + success_rate = (successful / urls) * 100 if urls > 0 else 0 + urls_per_second = urls / time_seconds if time_seconds > 0 else 0 + urls_per_worker = urls / workers if workers > 0 else 0 + + # Calculate memory growth if available + mem_start = None + mem_end = None + mem_growth = None + if 'memory_samples' in data: + samples = data['memory_samples'] + if len(samples) >= 2: + try: + first_mem = float(samples.iloc[0]['memory_info'].split()[0]) + last_mem = float(samples.iloc[-1]['memory_info'].split()[0]) + mem_start = first_mem + mem_end = last_mem + mem_growth = last_mem - first_mem + except: + pass + + # Parse timestamp from test_id + try: + timestamp = datetime.strptime(test_id, "%Y%m%d_%H%M%S") + except: + timestamp = None + + rows.append({ + 'test_id': test_id, + 'timestamp': timestamp, + 'date': timestamp.strftime("%Y-%m-%d %H:%M:%S") if timestamp else "Unknown", + 'urls': urls, + 'workers': workers, + 'chunk_size': data.get('chunk_size', 0), + 'successful': successful, + 'failed': failed, + 'success_rate': success_rate, + 'time_seconds': time_seconds, + 'urls_per_second': 
urls_per_second, + 'urls_per_worker': urls_per_worker, + 'memory_start': mem_start, + 'memory_end': mem_end, + 'memory_growth': mem_growth + }) + + # Sort data by timestamp if possible + if VISUALIZATION_AVAILABLE: + # Convert to DataFrame and sort by timestamp + df = pd.DataFrame(rows) + if 'timestamp' in df.columns and not df['timestamp'].isna().all(): + df = df.sort_values('timestamp', ascending=False) + else: + # Simple sorting without pandas + rows.sort(key=lambda x: x.get('timestamp', datetime.now()), reverse=True) + df = None + + # Generate HTML report + html = [] + html.append('') + html.append('') + html.append('') + html.append('') + html.append('') + html.append(f'{title or "Crawl4AI Benchmark Comparison"}') + html.append('') + html.append('') + html.append('') + + # Header + html.append(f'

{title or "Crawl4AI Benchmark Comparison"}

') + html.append(f'

Report generated on {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}

') + + # Summary section + html.append('
') + html.append('

Summary

') + html.append('

This report compares the performance of Crawl4AI across multiple test runs.

') + + # Summary metrics + data_available = (VISUALIZATION_AVAILABLE and df is not None and not df.empty) or (not VISUALIZATION_AVAILABLE and len(rows) > 0) + if data_available: + # Get the latest test data + if VISUALIZATION_AVAILABLE and df is not None and not df.empty: + latest_test = df.iloc[0] + latest_id = latest_test['test_id'] + else: + latest_test = rows[0] # First row (already sorted by timestamp) + latest_id = latest_test['test_id'] + + html.append('

Latest Test Results

') + html.append('') + + # If we have more than one test, show trend + if (VISUALIZATION_AVAILABLE and df is not None and len(df) > 1) or (not VISUALIZATION_AVAILABLE and len(rows) > 1): + if VISUALIZATION_AVAILABLE and df is not None: + prev_test = df.iloc[1] + else: + prev_test = rows[1] + + # Calculate performance change + perf_change = ((latest_test["urls_per_second"] / prev_test["urls_per_second"]) - 1) * 100 if prev_test["urls_per_second"] > 0 else 0 + + status_class = "" + if perf_change > 5: + status_class = "status-good" + elif perf_change < -5: + status_class = "status-bad" + + html.append('

Performance Trend

') + html.append('') + + html.append('
') + + # Generate performance chart if visualization is available + if VISUALIZATION_AVAILABLE: + perf_chart = self.generate_performance_chart(results) + if perf_chart: + html.append('
') + html.append('

Performance Comparison

') + html.append(f'Performance Comparison Chart') + html.append('
') + else: + html.append('
') + html.append('

Performance Comparison

') + html.append('

Charts not available - install visualization dependencies (pandas, matplotlib, seaborn) to enable.

') + html.append('
') + + # Generate memory charts if visualization is available + if VISUALIZATION_AVAILABLE: + memory_charts = self.generate_memory_charts(results) + if memory_charts: + html.append('
') + html.append('

Memory Usage

') + + for chart in memory_charts: + test_id = chart.stem.split('_')[-1] + html.append(f'

Test {test_id}

') + html.append(f'Memory Chart for {test_id}') + + html.append('
') + else: + html.append('
') + html.append('

Memory Usage

') + html.append('

Charts not available - install visualization dependencies (pandas, matplotlib, seaborn) to enable.

') + html.append('
') + + # Detailed results table + html.append('

Detailed Results

') + + # Add the results as an HTML table + html.append('') + + # Table headers + html.append('') + for col in ['Test ID', 'Date', 'URLs', 'Workers', 'Success %', 'Time (s)', 'URLs/sec', 'Mem Growth (MB)']: + html.append(f'') + html.append('') + + # Table rows - handle both pandas DataFrame and list of dicts + if VISUALIZATION_AVAILABLE and df is not None: + # Using pandas DataFrame + for _, row in df.iterrows(): + html.append('') + html.append(f'') + html.append(f'') + html.append(f'') + html.append(f'') + html.append(f'') + html.append(f'') + html.append(f'') + + # Memory growth cell + if pd.notna(row["memory_growth"]): + html.append(f'') + else: + html.append('') + + html.append('') + else: + # Using list of dicts (when pandas is not available) + for row in rows: + html.append('') + html.append(f'') + html.append(f'') + html.append(f'') + html.append(f'') + html.append(f'') + html.append(f'') + html.append(f'') + + # Memory growth cell + if row["memory_growth"] is not None: + html.append(f'') + else: + html.append('') + + html.append('') + + html.append('
{col}
{row["test_id"]}{row["date"]}{row["urls"]}{row["workers"]}{row["success_rate"]:.1f}%{row["time_seconds"]:.2f}{row["urls_per_second"]:.1f}{row["memory_growth"]:.1f}N/A
{row["test_id"]}{row["date"]}{row["urls"]}{row["workers"]}{row["success_rate"]:.1f}%{row["time_seconds"]:.2f}{row["urls_per_second"]:.1f}{row["memory_growth"]:.1f}N/A
') + + # Conclusion section + html.append('
') + html.append('

Conclusion

') + + if VISUALIZATION_AVAILABLE and df is not None and not df.empty: + # Using pandas for statistics (when available) + # Calculate some overall statistics + avg_urls_per_sec = df['urls_per_second'].mean() + max_urls_per_sec = df['urls_per_second'].max() + + # Determine if we have a trend + if len(df) > 1: + trend_data = df.sort_values('timestamp') + first_perf = trend_data.iloc[0]['urls_per_second'] + last_perf = trend_data.iloc[-1]['urls_per_second'] + + perf_change = ((last_perf / first_perf) - 1) * 100 if first_perf > 0 else 0 + + if perf_change > 10: + trend_desc = "significantly improved" + trend_class = "status-good" + elif perf_change > 5: + trend_desc = "improved" + trend_class = "status-good" + elif perf_change < -10: + trend_desc = "significantly decreased" + trend_class = "status-bad" + elif perf_change < -5: + trend_desc = "decreased" + trend_class = "status-bad" + else: + trend_desc = "remained stable" + trend_class = "" + + html.append(f'

Overall performance has {trend_desc} over the test period.

') + + html.append(f'

Average throughput: {avg_urls_per_sec:.1f} URLs/second

') + html.append(f'

Maximum throughput: {max_urls_per_sec:.1f} URLs/second

') + + # Memory leak assessment + if 'memory_growth' in df.columns and not df['memory_growth'].isna().all(): + avg_growth = df['memory_growth'].mean() + max_growth = df['memory_growth'].max() + + if avg_growth < 5: + leak_assessment = "No significant memory leaks detected" + leak_class = "status-good" + elif avg_growth < 10: + leak_assessment = "Minor memory growth observed" + leak_class = "status-warning" + else: + leak_assessment = "Potential memory leak detected" + leak_class = "status-bad" + + html.append(f'

{leak_assessment}. Average memory growth: {avg_growth:.1f} MB per test.

') + else: + # Manual calculations without pandas + if rows: + # Calculate average and max throughput + total_urls_per_sec = sum(row['urls_per_second'] for row in rows) + avg_urls_per_sec = total_urls_per_sec / len(rows) + max_urls_per_sec = max(row['urls_per_second'] for row in rows) + + html.append(f'

Average throughput: {avg_urls_per_sec:.1f} URLs/second

') + html.append(f'

Maximum throughput: {max_urls_per_sec:.1f} URLs/second

') + + # Memory assessment (simplified without pandas) + growth_values = [row['memory_growth'] for row in rows if row['memory_growth'] is not None] + if growth_values: + avg_growth = sum(growth_values) / len(growth_values) + + if avg_growth < 5: + leak_assessment = "No significant memory leaks detected" + leak_class = "status-good" + elif avg_growth < 10: + leak_assessment = "Minor memory growth observed" + leak_class = "status-warning" + else: + leak_assessment = "Potential memory leak detected" + leak_class = "status-bad" + + html.append(f'

{leak_assessment}. Average memory growth: {avg_growth:.1f} MB per test.

') + else: + html.append('

No test data available for analysis.

') + + html.append('
') + + # Footer + html.append('
') + html.append('

Generated by Crawl4AI Benchmark Reporter

') + html.append('
') + + html.append('') + html.append('') + + # Write the HTML file + with open(output_file, 'w') as f: + f.write('\n'.join(html)) + + # Print a clickable link for terminals that support it (iTerm, VS Code, etc.) + file_url = f"file://{os.path.abspath(output_file)}" + console.print(f"[green]Comparison report saved to: {output_file}[/green]") + console.print(f"[blue underline]Click to open report: {file_url}[/blue underline]") + return output_file + + def run(self, limit=None, output_file=None): + """Generate a full benchmark report. + + Args: + limit: Optional limit on number of most recent tests to include + output_file: Optional output file path + + Returns: + Path to the generated report file + """ + # Load test results + results = self.load_test_results(limit=limit) + + if not results: + console.print("[yellow]No test results found. Run some tests first.[/yellow]") + return None + + # Generate and display summary table + summary_table = self.generate_summary_table(results) + console.print(summary_table) + + # Generate comparison report + title = f"Crawl4AI Benchmark Report ({len(results)} test runs)" + report_file = self.generate_comparison_report(results, title=title, output_file=output_file) + + if report_file: + console.print(f"[bold green]Report generated successfully: {report_file}[/bold green]") + return report_file + else: + console.print("[bold red]Failed to generate report[/bold red]") + return None + + +def main(): + """Main entry point for the benchmark reporter.""" + parser = argparse.ArgumentParser(description="Generate benchmark reports for Crawl4AI stress tests") + + parser.add_argument("--reports-dir", type=str, default="reports", + help="Directory containing test result files") + parser.add_argument("--output-dir", type=str, default="benchmark_reports", + help="Directory to save generated reports") + parser.add_argument("--limit", type=int, default=None, + help="Limit to most recent N test results") + parser.add_argument("--output-file", 
type=str, default=None, + help="Custom output file path for the report") + + args = parser.parse_args() + + # Create the benchmark reporter + reporter = BenchmarkReporter(reports_dir=args.reports_dir, output_dir=args.output_dir) + + # Generate the report + report_file = reporter.run(limit=args.limit, output_file=args.output_file) + + if report_file: + print(f"Report generated at: {report_file}") + return 0 + else: + print("Failed to generate report") + return 1 + + +if __name__ == "__main__": + import sys + sys.exit(main()) \ No newline at end of file diff --git a/tests/memory/requirements.txt b/tests/memory/requirements.txt new file mode 100644 index 00000000..230e0e1f --- /dev/null +++ b/tests/memory/requirements.txt @@ -0,0 +1,4 @@ +pandas>=1.5.0 +matplotlib>=3.5.0 +seaborn>=0.12.0 +rich>=12.0.0 \ No newline at end of file diff --git a/tests/memory/run_benchmark.py b/tests/memory/run_benchmark.py new file mode 100755 index 00000000..1e110ddf --- /dev/null +++ b/tests/memory/run_benchmark.py @@ -0,0 +1,259 @@ +#!/usr/bin/env python3 +""" +Run a complete Crawl4AI benchmark test using test_stress_sdk.py and generate a report. 
+""" + +import sys +import os +import glob +import argparse +import subprocess +import time +from datetime import datetime + +from rich.console import Console +from rich.text import Text + +console = Console() + +# Updated TEST_CONFIGS to use max_sessions +TEST_CONFIGS = { + "quick": {"urls": 50, "max_sessions": 4, "chunk_size": 10, "description": "Quick test (50 URLs, 4 sessions)"}, + "small": {"urls": 100, "max_sessions": 8, "chunk_size": 20, "description": "Small test (100 URLs, 8 sessions)"}, + "medium": {"urls": 500, "max_sessions": 16, "chunk_size": 50, "description": "Medium test (500 URLs, 16 sessions)"}, + "large": {"urls": 1000, "max_sessions": 32, "chunk_size": 100,"description": "Large test (1000 URLs, 32 sessions)"}, + "extreme": {"urls": 2000, "max_sessions": 64, "chunk_size": 200,"description": "Extreme test (2000 URLs, 64 sessions)"}, +} + +# Arguments to forward directly if present in custom_args +FORWARD_ARGS = { + "urls": "--urls", + "max_sessions": "--max-sessions", + "chunk_size": "--chunk-size", + "port": "--port", + "monitor_mode": "--monitor-mode", +} +# Boolean flags to forward if True +FORWARD_FLAGS = { + "stream": "--stream", + "use_rate_limiter": "--use-rate-limiter", + "keep_server_alive": "--keep-server-alive", + "use_existing_site": "--use-existing-site", + "skip_generation": "--skip-generation", + "keep_site": "--keep-site", + "clean_reports": "--clean-reports", # Note: clean behavior is handled here, but pass flag if needed + "clean_site": "--clean-site", # Note: clean behavior is handled here, but pass flag if needed +} + +def run_benchmark(config_name, custom_args=None, compare=True, clean=False): + """Runs the stress test and optionally the report generator.""" + if config_name not in TEST_CONFIGS and config_name != "custom": + console.print(f"[bold red]Unknown configuration: {config_name}[/bold red]") + return False + + # Print header + title = "Crawl4AI SDK Benchmark Test" + if config_name != "custom": + title += f" - 
{TEST_CONFIGS[config_name]['description']}" + else: + # Safely get custom args for title + urls = custom_args.get('urls', '?') if custom_args else '?' + sessions = custom_args.get('max_sessions', '?') if custom_args else '?' + title += f" - Custom ({urls} URLs, {sessions} sessions)" + + console.print(f"\n[bold blue]{title}[/bold blue]") + console.print("=" * (len(title) + 4)) # Adjust underline length + + console.print("\n[bold white]Preparing test...[/bold white]") + + # --- Command Construction --- + # Use the new script name + cmd = ["python", "test_stress_sdk.py"] + + # Apply config or custom args + args_to_use = {} + if config_name != "custom": + args_to_use = TEST_CONFIGS[config_name].copy() + # If custom args are provided (e.g., boolean flags), overlay them + if custom_args: + args_to_use.update(custom_args) + elif custom_args: # Custom config + args_to_use = custom_args.copy() + + # Add arguments with values + for key, arg_name in FORWARD_ARGS.items(): + if key in args_to_use: + cmd.extend([arg_name, str(args_to_use[key])]) + + # Add boolean flags + for key, flag_name in FORWARD_FLAGS.items(): + if args_to_use.get(key, False): # Check if key exists and is True + # Special handling for clean flags - apply locally, don't forward? + # Decide if test_stress_sdk.py also needs --clean flags or if run_benchmark handles it. + # For now, let's assume run_benchmark handles cleaning based on its own --clean flag. + # We'll forward other flags. 
+ if key not in ["clean_reports", "clean_site"]: + cmd.append(flag_name) + + # Handle the top-level --clean flag for run_benchmark + if clean: + # Pass clean flags to the stress test script as well, if needed + # This assumes test_stress_sdk.py also uses --clean-reports and --clean-site + cmd.append("--clean-reports") + cmd.append("--clean-site") + console.print("[yellow]Applying --clean: Cleaning reports and site before test.[/yellow]") + # Actual cleaning logic might reside here or be delegated entirely + + console.print(f"\n[bold white]Running stress test:[/bold white] {' '.join(cmd)}") + start = time.time() + + # Execute the stress test script + # Use Popen to stream output + try: + proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, encoding='utf-8', errors='replace') + while True: + line = proc.stdout.readline() + if not line: + break + console.print(line.rstrip()) # Print line by line + proc.wait() # Wait for the process to complete + except FileNotFoundError: + console.print(f"[bold red]Error: Script 'test_stress_sdk.py' not found. 
Make sure it's in the correct directory.[/bold red]") + return False + except Exception as e: + console.print(f"[bold red]Error running stress test subprocess: {e}[/bold red]") + return False + + + if proc.returncode != 0: + console.print(f"[bold red]Stress test failed with exit code {proc.returncode}[/bold red]") + return False + + duration = time.time() - start + console.print(f"[bold green]Stress test completed in {duration:.1f} seconds[/bold green]") + + # --- Report Generation (Optional) --- + if compare: + # Assuming benchmark_report.py exists and works with the generated reports + report_script = "benchmark_report.py" # Keep configurable if needed + report_cmd = ["python", report_script] + console.print(f"\n[bold white]Generating benchmark report: {' '.join(report_cmd)}[/bold white]") + + # Run the report command and capture output + try: + report_proc = subprocess.run(report_cmd, capture_output=True, text=True, check=False, encoding='utf-8', errors='replace') # Use check=False to handle potential errors + + # Print the captured output from benchmark_report.py + if report_proc.stdout: + console.print("\n" + report_proc.stdout) + if report_proc.stderr: + console.print("[yellow]Report generator stderr:[/yellow]\n" + report_proc.stderr) + + if report_proc.returncode != 0: + console.print(f"[bold yellow]Benchmark report generation script '{report_script}' failed with exit code {report_proc.returncode}[/bold yellow]") + # Don't return False here, test itself succeeded + else: + console.print(f"[bold green]Benchmark report script '{report_script}' completed.[/bold green]") + + # Find and print clickable links to the reports + # Assuming reports are saved in 'benchmark_reports' by benchmark_report.py + report_dir = "benchmark_reports" + if os.path.isdir(report_dir): + report_files = glob.glob(os.path.join(report_dir, "comparison_report_*.html")) + if report_files: + try: + latest_report = max(report_files, key=os.path.getctime) + report_path = 
os.path.abspath(latest_report) + report_url = pathlib.Path(report_path).as_uri() # Better way to create file URI + console.print(f"[bold cyan]Click to open report: [link={report_url}]{report_url}[/link][/bold cyan]") + except Exception as e: + console.print(f"[yellow]Could not determine latest report: {e}[/yellow]") + + chart_files = glob.glob(os.path.join(report_dir, "memory_chart_*.png")) + if chart_files: + try: + latest_chart = max(chart_files, key=os.path.getctime) + chart_path = os.path.abspath(latest_chart) + chart_url = pathlib.Path(chart_path).as_uri() + console.print(f"[cyan]Memory chart: [link={chart_url}]{chart_url}[/link][/cyan]") + except Exception as e: + console.print(f"[yellow]Could not determine latest chart: {e}[/yellow]") + else: + console.print(f"[yellow]Benchmark report directory '{report_dir}' not found. Cannot link reports.[/yellow]") + + except FileNotFoundError: + console.print(f"[bold red]Error: Report script '{report_script}' not found.[/bold red]") + except Exception as e: + console.print(f"[bold red]Error running report generation subprocess: {e}[/bold red]") + + + # Prompt to exit + console.print("\n[bold green]Benchmark run finished. 
Press Enter to exit.[/bold green]") + try: + input() # Wait for user input + except EOFError: + pass # Handle case where input is piped or unavailable + + return True + +def main(): + parser = argparse.ArgumentParser(description="Run a Crawl4AI SDK benchmark test and generate a report") + + # --- Arguments --- + parser.add_argument("config", choices=list(TEST_CONFIGS) + ["custom"], + help="Test configuration: quick, small, medium, large, extreme, or custom") + + # Arguments for 'custom' config or to override presets + parser.add_argument("--urls", type=int, help="Number of URLs") + parser.add_argument("--max-sessions", type=int, help="Max concurrent sessions (replaces --workers)") + parser.add_argument("--chunk-size", type=int, help="URLs per batch (for non-stream logging)") + parser.add_argument("--port", type=int, help="HTTP server port") + parser.add_argument("--monitor-mode", type=str, choices=["DETAILED", "AGGREGATED"], help="Monitor display mode") + + # Boolean flags / options + parser.add_argument("--stream", action="store_true", help="Enable streaming results (disables batch logging)") + parser.add_argument("--use-rate-limiter", action="store_true", help="Enable basic rate limiter") + parser.add_argument("--no-report", action="store_true", help="Skip generating comparison report") + parser.add_argument("--clean", action="store_true", help="Clean up reports and site before running") + parser.add_argument("--keep-server-alive", action="store_true", help="Keep HTTP server running after test") + parser.add_argument("--use-existing-site", action="store_true", help="Use existing site on specified port") + parser.add_argument("--skip-generation", action="store_true", help="Use existing site files without regenerating") + parser.add_argument("--keep-site", action="store_true", help="Keep generated site files after test") + # Removed url_level_logging as it's implicitly handled by stream/batch mode now + + args = parser.parse_args() + + custom_args = {} + + # 
Populate custom_args from explicit command-line args
+    if args.urls is not None: custom_args["urls"] = args.urls
+    if args.max_sessions is not None: custom_args["max_sessions"] = args.max_sessions
+    if args.chunk_size is not None: custom_args["chunk_size"] = args.chunk_size
+    if args.port is not None: custom_args["port"] = args.port
+    if args.monitor_mode is not None: custom_args["monitor_mode"] = args.monitor_mode
+    if args.stream: custom_args["stream"] = True
+    if args.use_rate_limiter: custom_args["use_rate_limiter"] = True
+    if args.keep_server_alive: custom_args["keep_server_alive"] = True
+    if args.use_existing_site: custom_args["use_existing_site"] = True
+    if args.skip_generation: custom_args["skip_generation"] = True
+    if args.keep_site: custom_args["keep_site"] = True
+    # Clean flags are handled by the 'clean' argument passed to run_benchmark
+
+    # Validate custom config requirements
+    if args.config == "custom":
+        required_custom = ["urls", "max_sessions", "chunk_size"]
+        missing = [f"--{arg.replace('_', '-')}" for arg in required_custom if arg not in custom_args]
+        if missing:
+            console.print(f"[bold red]Error: 'custom' config requires: {', '.join(missing)}[/bold red]")
+            return 1
+
+    success = run_benchmark(
+        config_name=args.config,
+        custom_args=custom_args,  # Pass all collected custom args
+        compare=not args.no_report,
+        clean=args.clean
+    )
+    return 0 if success else 1
+
+if __name__ == "__main__":
+    sys.exit(main())
\ No newline at end of file
diff --git a/tests/memory/test_stress_sdk.py b/tests/memory/test_stress_sdk.py
new file mode 100644
index 00000000..8000690c
--- /dev/null
+++ b/tests/memory/test_stress_sdk.py
@@ -0,0 +1,500 @@
+#!/usr/bin/env python3
+"""
+Stress test for Crawl4AI's arun_many and dispatcher system.
+This version uses a local HTTP server and focuses on testing
+the SDK's ability to handle multiple URLs concurrently, with per-batch logging.
+""" + +import asyncio +import os +import time +import pathlib +import random +import secrets +import argparse +import json +import sys +import subprocess +import signal +from typing import List, Dict, Optional, Union, AsyncGenerator +import shutil +from rich.console import Console + +# Crawl4AI components +from crawl4ai import ( + AsyncWebCrawler, + CrawlerRunConfig, + BrowserConfig, + MemoryAdaptiveDispatcher, + CrawlerMonitor, + DisplayMode, + CrawlResult, + RateLimiter, + CacheMode, +) + +# Constants +DEFAULT_SITE_PATH = "test_site" +DEFAULT_PORT = 8000 +DEFAULT_MAX_SESSIONS = 16 +DEFAULT_URL_COUNT = 100 +DEFAULT_CHUNK_SIZE = 10 # Define chunk size for batch logging +DEFAULT_REPORT_PATH = "reports" +DEFAULT_STREAM_MODE = False +DEFAULT_MONITOR_MODE = "DETAILED" + +# Initialize Rich console +console = Console() + +# --- SiteGenerator Class (Unchanged) --- +class SiteGenerator: + """Generates a local test site with heavy pages for stress testing.""" + + def __init__(self, site_path: str = DEFAULT_SITE_PATH, page_count: int = DEFAULT_URL_COUNT): + self.site_path = pathlib.Path(site_path) + self.page_count = page_count + self.images_dir = self.site_path / "images" + self.lorem_words = " ".join("lorem ipsum dolor sit amet " * 100).split() + + self.html_template = """ + + + Test Page {page_num} + + + +

Test Page {page_num}

+ {paragraphs} + {images} + + +""" + + def generate_site(self) -> None: + self.site_path.mkdir(parents=True, exist_ok=True) + self.images_dir.mkdir(exist_ok=True) + console.print(f"Generating {self.page_count} test pages...") + for i in range(self.page_count): + paragraphs = "\n".join(f"

{' '.join(random.choices(self.lorem_words, k=200))}

" for _ in range(5)) + images = "\n".join(f'Random image {j}' for j in range(3)) + page_path = self.site_path / f"page_{i}.html" + page_path.write_text(self.html_template.format(page_num=i, paragraphs=paragraphs, images=images), encoding="utf-8") + if (i + 1) % (self.page_count // 10 or 1) == 0 or i == self.page_count - 1: + console.print(f"Generated {i+1}/{self.page_count} pages") + self._create_index_page() + console.print(f"[bold green]Successfully generated {self.page_count} test pages in [cyan]{self.site_path}[/cyan][/bold green]") + + def _create_index_page(self) -> None: + index_content = """Test Site Index

Test Site Index

This is an automatically generated site for testing Crawl4AI.

""" + (self.site_path / "index.html").write_text(index_content, encoding="utf-8") + +# --- LocalHttpServer Class (Unchanged) --- +class LocalHttpServer: + """Manages a local HTTP server for serving test pages.""" + def __init__(self, site_path: str = DEFAULT_SITE_PATH, port: int = DEFAULT_PORT): + self.site_path = pathlib.Path(site_path) + self.port = port + self.process = None + + def start(self) -> None: + if not self.site_path.exists(): raise FileNotFoundError(f"Site directory {self.site_path} does not exist") + console.print(f"Attempting to start HTTP server in [cyan]{self.site_path}[/cyan] on port {self.port}...") + try: + cmd = ["python", "-m", "http.server", str(self.port)] + creationflags = 0; preexec_fn = None + if sys.platform == 'win32': creationflags = subprocess.CREATE_NEW_PROCESS_GROUP + self.process = subprocess.Popen(cmd, cwd=str(self.site_path), stdout=subprocess.PIPE, stderr=subprocess.PIPE, creationflags=creationflags) + time.sleep(1.5) + if self.is_running(): console.print(f"[bold green]HTTP server started successfully (PID: {self.process.pid})[/bold green]") + else: + console.print("[bold red]Failed to start HTTP server. 
Checking logs...[/bold red]") + stdout, stderr = self.process.communicate(); print(stdout.decode(errors='ignore')); print(stderr.decode(errors='ignore')) + self.stop(); raise RuntimeError("HTTP server failed to start.") + except Exception as e: console.print(f"[bold red]Error starting HTTP server: {str(e)}[/bold red]"); self.stop(); raise + + def stop(self) -> None: + if self.process and self.is_running(): + console.print(f"Stopping HTTP server (PID: {self.process.pid})...") + try: + if sys.platform == 'win32': self.process.send_signal(signal.CTRL_BREAK_EVENT); time.sleep(0.5) + self.process.terminate() + try: stdout, stderr = self.process.communicate(timeout=5); console.print("[bold yellow]HTTP server stopped[/bold yellow]") + except subprocess.TimeoutExpired: console.print("[bold red]Server did not terminate gracefully, killing...[/bold red]"); self.process.kill(); stdout, stderr = self.process.communicate(); console.print("[bold yellow]HTTP server killed[/bold yellow]") + except Exception as e: console.print(f"[bold red]Error stopping HTTP server: {str(e)}[/bold red]"); self.process.kill() + finally: self.process = None + elif self.process: console.print("[dim]HTTP server process already stopped.[/dim]"); self.process = None + + def is_running(self) -> bool: + if not self.process: return False + return self.process.poll() is None + +# --- SimpleMemoryTracker Class (Unchanged) --- +class SimpleMemoryTracker: + """Basic memory tracker that doesn't rely on psutil.""" + def __init__(self, report_path: str = DEFAULT_REPORT_PATH, test_id: Optional[str] = None): + self.report_path = pathlib.Path(report_path); self.report_path.mkdir(parents=True, exist_ok=True) + self.test_id = test_id or time.strftime("%Y%m%d_%H%M%S") + self.start_time = time.time(); self.memory_samples = []; self.pid = os.getpid() + self.csv_path = self.report_path / f"memory_samples_{self.test_id}.csv" + with open(self.csv_path, 'w', encoding='utf-8') as f: 
f.write("timestamp,elapsed_seconds,memory_info_mb\n") + + def sample(self) -> Dict: + try: + memory_mb = self._get_memory_info_mb() + memory_str = f"{memory_mb:.1f} MB" if memory_mb is not None else "Unknown" + timestamp = time.time(); elapsed = timestamp - self.start_time + sample = {"timestamp": timestamp, "elapsed_seconds": elapsed, "memory_mb": memory_mb, "memory_str": memory_str} + self.memory_samples.append(sample) + with open(self.csv_path, 'a', encoding='utf-8') as f: f.write(f"{timestamp},{elapsed:.2f},{memory_mb if memory_mb is not None else ''}\n") + return sample + except Exception as e: return {"memory_mb": None, "memory_str": "Error"} + + def _get_memory_info_mb(self) -> Optional[float]: + pid_str = str(self.pid) + try: + if sys.platform == 'darwin': result = subprocess.run(["ps", "-o", "rss=", "-p", pid_str], capture_output=True, text=True, check=True, encoding='utf-8'); return int(result.stdout.strip()) / 1024.0 + elif sys.platform == 'linux': + with open(f"/proc/{pid_str}/status", encoding='utf-8') as f: + for line in f: + if line.startswith("VmRSS:"): return int(line.split()[1]) / 1024.0 + return None + elif sys.platform == 'win32': result = subprocess.run(["tasklist", "/fi", f"PID eq {pid_str}", "/fo", "csv", "/nh"], capture_output=True, text=True, check=True, encoding='cp850', errors='ignore'); parts = result.stdout.strip().split('","'); return int(parts[4].strip().replace('"', '').replace(' K', '').replace(',', '')) / 1024.0 if len(parts) >= 5 else None + else: return None + except: return None # Catch all exceptions for robustness + + def get_report(self) -> Dict: + if not self.memory_samples: return {"error": "No memory samples collected"} + total_time = time.time() - self.start_time; valid_samples = [s['memory_mb'] for s in self.memory_samples if s['memory_mb'] is not None] + start_mem = valid_samples[0] if valid_samples else None; end_mem = valid_samples[-1] if valid_samples else None + max_mem = max(valid_samples) if valid_samples else 
None; avg_mem = sum(valid_samples) / len(valid_samples) if valid_samples else None + growth = (end_mem - start_mem) if start_mem is not None and end_mem is not None else None + return {"test_id": self.test_id, "total_time_seconds": total_time, "sample_count": len(self.memory_samples), "valid_sample_count": len(valid_samples), "csv_path": str(self.csv_path), "platform": sys.platform, "start_memory_mb": start_mem, "end_memory_mb": end_mem, "max_memory_mb": max_mem, "average_memory_mb": avg_mem, "memory_growth_mb": growth} + + +# --- CrawlerStressTest Class (Refactored for Per-Batch Logging) --- +class CrawlerStressTest: + """Orchestrates the stress test using arun_many per chunk and a dispatcher.""" + + def __init__( + self, + url_count: int = DEFAULT_URL_COUNT, + port: int = DEFAULT_PORT, + max_sessions: int = DEFAULT_MAX_SESSIONS, + chunk_size: int = DEFAULT_CHUNK_SIZE, # Added chunk_size + report_path: str = DEFAULT_REPORT_PATH, + stream_mode: bool = DEFAULT_STREAM_MODE, + monitor_mode: str = DEFAULT_MONITOR_MODE, + use_rate_limiter: bool = False + ): + self.url_count = url_count + self.server_port = port + self.max_sessions = max_sessions + self.chunk_size = chunk_size # Store chunk size + self.report_path = pathlib.Path(report_path) + self.report_path.mkdir(parents=True, exist_ok=True) + self.stream_mode = stream_mode + self.monitor_mode = DisplayMode[monitor_mode.upper()] + self.use_rate_limiter = use_rate_limiter + + self.test_id = time.strftime("%Y%m%d_%H%M%S") + self.results_summary = { + "test_id": self.test_id, "url_count": url_count, "max_sessions": max_sessions, + "chunk_size": chunk_size, "stream_mode": stream_mode, "monitor_mode": monitor_mode, + "rate_limiter_used": use_rate_limiter, "start_time": "", "end_time": "", + "total_time_seconds": 0, "successful_urls": 0, "failed_urls": 0, + "urls_processed": 0, "chunks_processed": 0 + } + + async def run(self) -> Dict: + """Run the stress test and return results.""" + memory_tracker = 
SimpleMemoryTracker(report_path=self.report_path, test_id=self.test_id) + urls = [f"http://localhost:{self.server_port}/page_{i}.html" for i in range(self.url_count)] + # Split URLs into chunks based on self.chunk_size + url_chunks = [urls[i:i+self.chunk_size] for i in range(0, len(urls), self.chunk_size)] + + self.results_summary["start_time"] = time.strftime("%Y-%m-%d %H:%M:%S") + start_time = time.time() + + config = CrawlerRunConfig( + wait_for_images=False, verbose=False, + stream=self.stream_mode, # Still pass stream mode, affects arun_many return type + cache_mode=CacheMode.BYPASS + ) + + total_successful_urls = 0 + total_failed_urls = 0 + total_urls_processed = 0 + start_memory_sample = memory_tracker.sample() + start_memory_str = start_memory_sample.get("memory_str", "Unknown") + + # monitor = CrawlerMonitor(display_mode=self.monitor_mode, total_urls=self.url_count) + monitor = None + rate_limiter = RateLimiter(base_delay=(0.1, 0.3)) if self.use_rate_limiter else None + dispatcher = MemoryAdaptiveDispatcher(max_session_permit=self.max_sessions, monitor=monitor, rate_limiter=rate_limiter) + + console.print(f"\n[bold cyan]Crawl4AI Stress Test - {self.url_count} URLs, {self.max_sessions} max sessions[/bold cyan]") + console.print(f"[bold cyan]Mode:[/bold cyan] {'Streaming' if self.stream_mode else 'Batch'}, [bold cyan]Monitor:[/bold cyan] {self.monitor_mode.name}, [bold cyan]Chunk Size:[/bold cyan] {self.chunk_size}") + console.print(f"[bold cyan]Initial Memory:[/bold cyan] {start_memory_str}") + + # Print batch log header only if not streaming + if not self.stream_mode: + console.print("\n[bold]Batch Progress:[/bold] (Monitor below shows overall progress)") + console.print("[bold] Batch | Progress | Start Mem | End Mem | URLs/sec | Success/Fail | Time (s) | Status [/bold]") + console.print("─" * 90) + + monitor_task = asyncio.create_task(self._periodic_memory_sample(memory_tracker, 2.0)) + + try: + async with AsyncWebCrawler( + config=BrowserConfig( verbose 
= False) + ) as crawler: + # Process URLs chunk by chunk + for chunk_idx, url_chunk in enumerate(url_chunks): + batch_start_time = time.time() + chunk_success = 0 + chunk_failed = 0 + + # Sample memory before the chunk + start_mem_sample = memory_tracker.sample() + start_mem_str = start_mem_sample.get("memory_str", "Unknown") + + # --- Call arun_many for the current chunk --- + try: + # Note: dispatcher/monitor persist across calls + results_gen_or_list: Union[AsyncGenerator[CrawlResult, None], List[CrawlResult]] = \ + await crawler.arun_many( + urls=url_chunk, + config=config, + dispatcher=dispatcher # Reuse the same dispatcher + ) + + if self.stream_mode: + # Process stream results if needed, but batch logging is less relevant + async for result in results_gen_or_list: + total_urls_processed += 1 + if result.success: chunk_success += 1 + else: chunk_failed += 1 + # In stream mode, batch summary isn't as meaningful here + # We could potentially track completion per chunk async, but it's complex + + else: # Batch mode + # Process the list of results for this chunk + for result in results_gen_or_list: + total_urls_processed += 1 + if result.success: chunk_success += 1 + else: chunk_failed += 1 + + except Exception as e: + console.print(f"[bold red]Error processing chunk {chunk_idx+1}: {e}[/bold red]") + chunk_failed = len(url_chunk) # Assume all failed in the chunk on error + total_urls_processed += len(url_chunk) # Count them as processed (failed) + + # --- Log batch results (only if not streaming) --- + if not self.stream_mode: + batch_time = time.time() - batch_start_time + urls_per_sec = len(url_chunk) / batch_time if batch_time > 0 else 0 + end_mem_sample = memory_tracker.sample() + end_mem_str = end_mem_sample.get("memory_str", "Unknown") + + progress_pct = (total_urls_processed / self.url_count) * 100 + + if chunk_failed == 0: status_color, status = "green", "Success" + elif chunk_success == 0: status_color, status = "red", "Failed" + else: status_color, 
status = "yellow", "Partial" + + console.print( + f" {chunk_idx+1:<5} | {progress_pct:6.1f}% | {start_mem_str:>9} | {end_mem_str:>9} | {urls_per_sec:8.1f} | " + f"{chunk_success:^7}/{chunk_failed:<6} | {batch_time:8.2f} | [{status_color}]{status:<7}[/{status_color}]" + ) + + # Accumulate totals + total_successful_urls += chunk_success + total_failed_urls += chunk_failed + self.results_summary["chunks_processed"] += 1 + + # Optional small delay between starting chunks if needed + # await asyncio.sleep(0.1) + + except Exception as e: + console.print(f"[bold red]An error occurred during the main crawl loop: {e}[/bold red]") + finally: + if 'monitor_task' in locals() and not monitor_task.done(): + monitor_task.cancel() + try: await monitor_task + except asyncio.CancelledError: pass + + end_time = time.time() + self.results_summary.update({ + "end_time": time.strftime("%Y-%m-%d %H:%M:%S"), + "total_time_seconds": end_time - start_time, + "successful_urls": total_successful_urls, + "failed_urls": total_failed_urls, + "urls_processed": total_urls_processed, + "memory": memory_tracker.get_report() + }) + self._save_results() + return self.results_summary + + async def _periodic_memory_sample(self, tracker: SimpleMemoryTracker, interval: float): + """Background task to sample memory periodically.""" + while True: + tracker.sample() + try: + await asyncio.sleep(interval) + except asyncio.CancelledError: + break # Exit loop on cancellation + + def _save_results(self) -> None: + results_path = self.report_path / f"test_summary_{self.test_id}.json" + try: + with open(results_path, 'w', encoding='utf-8') as f: json.dump(self.results_summary, f, indent=2, default=str) + # console.print(f"\n[bold green]Results summary saved to {results_path}[/bold green]") # Moved summary print to run_full_test + except Exception as e: console.print(f"[bold red]Failed to save results summary: {e}[/bold red]") + + +# --- run_full_test Function (Adjusted) --- +async def run_full_test(args): + """Run 
the complete test process from site generation to crawling.""" + server = None + site_generated = False + + # --- Site Generation --- (Same as before) + if not args.use_existing_site and not args.skip_generation: + if os.path.exists(args.site_path): console.print(f"[yellow]Removing existing site directory: {args.site_path}[/yellow]"); shutil.rmtree(args.site_path) + site_generator = SiteGenerator(site_path=args.site_path, page_count=args.urls); site_generator.generate_site(); site_generated = True + elif args.use_existing_site: console.print(f"[cyan]Using existing site assumed to be running on port {args.port}[/cyan]") + elif args.skip_generation: + console.print(f"[cyan]Skipping site generation, using existing directory: {args.site_path}[/cyan]") + if not os.path.exists(args.site_path) or not os.path.isdir(args.site_path): console.print(f"[bold red]Error: Site path '{args.site_path}' does not exist or is not a directory.[/bold red]"); return + + # --- Start Local Server --- (Same as before) + server_started = False + if not args.use_existing_site: + server = LocalHttpServer(site_path=args.site_path, port=args.port) + try: server.start(); server_started = True + except Exception as e: + console.print(f"[bold red]Failed to start local server. 
Aborting test.[/bold red]") + if site_generated and not args.keep_site: console.print(f"[yellow]Cleaning up generated site: {args.site_path}[/yellow]"); shutil.rmtree(args.site_path) + return + + try: + # --- Run the Stress Test --- + test = CrawlerStressTest( + url_count=args.urls, + port=args.port, + max_sessions=args.max_sessions, + chunk_size=args.chunk_size, # Pass chunk_size + report_path=args.report_path, + stream_mode=args.stream, + monitor_mode=args.monitor_mode, + use_rate_limiter=args.use_rate_limiter + ) + results = await test.run() # Run the test which now handles chunks internally + + # --- Print Summary --- + console.print("\n" + "=" * 80) + console.print("[bold green]Test Completed[/bold green]") + console.print("=" * 80) + + # (Summary printing logic remains largely the same) + success_rate = results["successful_urls"] / results["url_count"] * 100 if results["url_count"] > 0 else 0 + urls_per_second = results["urls_processed"] / results["total_time_seconds"] if results["total_time_seconds"] > 0 else 0 + + console.print(f"[bold cyan]Test ID:[/bold cyan] {results['test_id']}") + console.print(f"[bold cyan]Configuration:[/bold cyan] {results['url_count']} URLs, {results['max_sessions']} sessions, Chunk: {results['chunk_size']}, Stream: {results['stream_mode']}, Monitor: {results['monitor_mode']}") + console.print(f"[bold cyan]Results:[/bold cyan] {results['successful_urls']} successful, {results['failed_urls']} failed ({results['urls_processed']} processed, {success_rate:.1f}% success)") + console.print(f"[bold cyan]Performance:[/bold cyan] {results['total_time_seconds']:.2f} seconds total, {urls_per_second:.2f} URLs/second avg") + + mem_report = results.get("memory", {}) + mem_info_str = "Memory tracking data unavailable." 
+ if mem_report and not mem_report.get("error"): + start_mb = mem_report.get('start_memory_mb'); end_mb = mem_report.get('end_memory_mb'); max_mb = mem_report.get('max_memory_mb'); growth_mb = mem_report.get('memory_growth_mb') + mem_parts = [] + if start_mb is not None: mem_parts.append(f"Start: {start_mb:.1f} MB") + if end_mb is not None: mem_parts.append(f"End: {end_mb:.1f} MB") + if max_mb is not None: mem_parts.append(f"Max: {max_mb:.1f} MB") + if growth_mb is not None: mem_parts.append(f"Growth: {growth_mb:.1f} MB") + if mem_parts: mem_info_str = ", ".join(mem_parts) + csv_path = mem_report.get('csv_path') + if csv_path: console.print(f"[dim]Memory samples saved to: {csv_path}[/dim]") + + console.print(f"[bold cyan]Memory Usage:[/bold cyan] {mem_info_str}") + console.print(f"[bold green]Results summary saved to {results['memory']['csv_path'].replace('memory_samples', 'test_summary').replace('.csv', '.json')}[/bold green]") # Infer summary path + + + if results["failed_urls"] > 0: console.print(f"\n[bold yellow]Warning: {results['failed_urls']} URLs failed to process ({100-success_rate:.1f}% failure rate)[/bold yellow]") + if results["urls_processed"] < results["url_count"]: console.print(f"\n[bold red]Error: Only {results['urls_processed']} out of {results['url_count']} URLs were processed![/bold red]") + + + finally: + # --- Stop Server / Cleanup --- (Same as before) + if server_started and server and not args.keep_server_alive: server.stop() + elif server_started and server and args.keep_server_alive: + console.print(f"[bold cyan]Server is kept running on port {args.port}. 
Press Ctrl+C to stop it.[/bold cyan]") + try: await asyncio.Future() # Keep running indefinitely + except KeyboardInterrupt: console.print("\n[bold yellow]Stopping server due to user interrupt...[/bold yellow]"); server.stop() + + if site_generated and not args.keep_site: console.print(f"[yellow]Cleaning up generated site: {args.site_path}[/yellow]"); shutil.rmtree(args.site_path) + elif args.clean_site and os.path.exists(args.site_path): console.print(f"[yellow]Cleaning up site directory as requested: {args.site_path}[/yellow]"); shutil.rmtree(args.site_path) + + +# --- main Function (Added chunk_size argument) --- +def main(): + """Main entry point for the script.""" + parser = argparse.ArgumentParser(description="Crawl4AI SDK High Volume Stress Test using arun_many") + + # Test parameters + parser.add_argument("--urls", type=int, default=DEFAULT_URL_COUNT, help=f"Number of URLs to test (default: {DEFAULT_URL_COUNT})") + parser.add_argument("--max-sessions", type=int, default=DEFAULT_MAX_SESSIONS, help=f"Maximum concurrent crawling sessions (default: {DEFAULT_MAX_SESSIONS})") + parser.add_argument("--chunk-size", type=int, default=DEFAULT_CHUNK_SIZE, help=f"Number of URLs per batch for logging (default: {DEFAULT_CHUNK_SIZE})") # Added + parser.add_argument("--stream", action="store_true", default=DEFAULT_STREAM_MODE, help=f"Enable streaming mode (disables batch logging) (default: {DEFAULT_STREAM_MODE})") + parser.add_argument("--monitor-mode", type=str, default=DEFAULT_MONITOR_MODE, choices=["DETAILED", "AGGREGATED"], help=f"Display mode for the live monitor (default: {DEFAULT_MONITOR_MODE})") + parser.add_argument("--use-rate-limiter", action="store_true", default=False, help="Enable a basic rate limiter (default: False)") + + # Environment parameters + parser.add_argument("--site-path", type=str, default=DEFAULT_SITE_PATH, help=f"Path to generate/use the test site (default: {DEFAULT_SITE_PATH})") + parser.add_argument("--port", type=int, default=DEFAULT_PORT, 
help=f"Port for the local HTTP server (default: {DEFAULT_PORT})") + parser.add_argument("--report-path", type=str, default=DEFAULT_REPORT_PATH, help=f"Path to save reports and logs (default: {DEFAULT_REPORT_PATH})") + + # Site/Server management + parser.add_argument("--skip-generation", action="store_true", help="Use existing test site folder without regenerating") + parser.add_argument("--use-existing-site", action="store_true", help="Do not generate site or start local server; assume site exists on --port") + parser.add_argument("--keep-server-alive", action="store_true", help="Keep the local HTTP server running after test") + parser.add_argument("--keep-site", action="store_true", help="Keep the generated test site files after test") + parser.add_argument("--clean-reports", action="store_true", help="Clean up report directory before running") + parser.add_argument("--clean-site", action="store_true", help="Clean up site directory before running (if generating) or after") + + args = parser.parse_args() + + # Display config + console.print("[bold underline]Crawl4AI SDK Stress Test Configuration[/bold underline]") + console.print(f"URLs: {args.urls}, Max Sessions: {args.max_sessions}, Chunk Size: {args.chunk_size}") # Added chunk size + console.print(f"Mode: {'Streaming' if args.stream else 'Batch'}, Monitor: {args.monitor_mode}, Rate Limit: {args.use_rate_limiter}") + console.print(f"Site Path: {args.site_path}, Port: {args.port}, Report Path: {args.report_path}") + console.print("-" * 40) + # (Rest of config display and cleanup logic is the same) + if args.use_existing_site: console.print("[cyan]Mode: Using existing external site/server[/cyan]") + elif args.skip_generation: console.print("[cyan]Mode: Using existing site files, starting local server[/cyan]") + else: console.print("[cyan]Mode: Generating site files, starting local server[/cyan]") + if args.keep_server_alive: console.print("[cyan]Option: Keep server alive after test[/cyan]") + if args.keep_site: 
console.print("[cyan]Option: Keep site files after test[/cyan]") + if args.clean_reports: console.print("[cyan]Option: Clean reports before test[/cyan]") + if args.clean_site: console.print("[cyan]Option: Clean site directory[/cyan]") + console.print("-" * 40) + + if args.clean_reports: + if os.path.exists(args.report_path): console.print(f"[yellow]Cleaning up reports directory: {args.report_path}[/yellow]"); shutil.rmtree(args.report_path) + os.makedirs(args.report_path, exist_ok=True) + if args.clean_site and not args.use_existing_site: + if os.path.exists(args.site_path): console.print(f"[yellow]Cleaning up site directory as requested: {args.site_path}[/yellow]"); shutil.rmtree(args.site_path) + + # Run + try: asyncio.run(run_full_test(args)) + except KeyboardInterrupt: console.print("\n[bold yellow]Test interrupted by user.[/bold yellow]") + except Exception as e: console.print(f"\n[bold red]An unexpected error occurred:[/bold red] {e}"); import traceback; traceback.print_exc() + +if __name__ == "__main__": + main() \ No newline at end of file