Merge branch 'next-stress' into next

2025-04-17 22:34:43 +08:00
parent 3bf78ff47a 921e0c46b6
commit 907cba194f
7 changed files with 2160 additions and 1 deletions
--- a/JOURNAL.md
+++ b/JOURNAL.md
@@ -42,6 +42,196 @@ This feature provides greater flexibility in how users generate markdown, enabli
 - Capture more detailed content from the original HTML when needed
 - Use schema-optimized HTML when working with structured data
 - Choose the approach that best suits their specific use case
+## [2025-04-17] Implemented High Volume Stress Testing Solution for SDK
+
+**Feature:** Comprehensive stress testing framework that uses `arun_many` and the dispatcher system to evaluate performance and concurrency handling, and to identify potential issues under high-volume crawling scenarios.
+
+**Changes Made:**
+1.  Created a dedicated stress testing framework in the `tests/memory/` directory.
+2.  Implemented local test site generation (`SiteGenerator`) with configurable heavy HTML pages.
+3.  Added basic memory usage tracking (`SimpleMemoryTracker`) using platform-specific commands, avoiding a `psutil` dependency for this specific test.
+4.  Utilized `CrawlerMonitor` from `crawl4ai` for rich terminal UI and real-time monitoring of test progress and dispatcher activity.
+5.  Implemented detailed result summary saving (JSON) and memory sample logging (CSV).
+6.  Developed `run_benchmark.py` to orchestrate tests with predefined configurations.
+7.  Created `run_all.sh` as a simple wrapper for `run_benchmark.py`.
+
+**Implementation Details:**
+-   Generates a local test site with configurable pages containing heavy text and image content.
+-   Uses Python's built-in `http.server` for local serving, minimizing network variance.
+-   Leverages `crawl4ai`'s `arun_many` method for processing URLs.
+-   Utilizes `MemoryAdaptiveDispatcher` to cap concurrency via its `max_session_permit` parameter, which is set from the `--max-sessions` flag (note: the dispatcher's memory adaptation features require `psutil`, which `SimpleMemoryTracker` does not use).
+-   Tracks memory usage via `SimpleMemoryTracker`, recording samples throughout test execution to a CSV file.
+-   Uses `CrawlerMonitor` (which uses the `rich` library) for clear terminal visualization and progress reporting directly from the dispatcher.
+-   Stores detailed final metrics in a JSON summary file.
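
The local-site setup described in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `stress_test_sdk.py`: the names `generate_site`, `serve_site`, and `QuietHandler`, the page layout, and the default paragraph count are all hypothetical.

```python
import functools
import http.server
import threading
from pathlib import Path


def generate_site(root: Path, pages: int, paragraphs: int = 50) -> list:
    """Write `pages` text-heavy HTML files under `root`; return their file names."""
    root.mkdir(parents=True, exist_ok=True)
    filler = "<p>" + "lorem ipsum dolor sit amet " * 20 + "</p>\n"
    names = []
    for i in range(pages):
        name = f"page_{i}.html"  # hypothetical naming scheme
        body = f"<html><body><h1>Page {i}</h1>\n{filler * paragraphs}</body></html>"
        (root / name).write_text(body, encoding="utf-8")
        names.append(name)
    return names


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that suppresses per-request log lines."""

    def log_message(self, format, *args):
        pass


def serve_site(root: Path) -> http.server.ThreadingHTTPServer:
    """Serve `root` on 127.0.0.1 using an OS-assigned port, in a daemon thread."""
    handler = functools.partial(QuietHandler, directory=str(root))
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The bound port is available as `server.server_address[1]`, so the URL list for a crawl can be built from the returned file names. Calling `server.shutdown()` followed by `server.server_close()` in a `finally` block is one way to address the server cleanup concern noted under Challenges.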
+
+**Files Created/Updated:**
+-   `stress_test_sdk.py`: Main stress testing implementation using `arun_many`.
+-   `benchmark_report.py`: (Assumed) Report generator for comparing test results.
+-   `run_benchmark.py`: Test runner script with predefined configurations.
+-   `run_all.sh`: Simple bash script wrapper for `run_benchmark.py`.
+-   `USAGE.md`: Comprehensive documentation on usage and interpretation (updated).
+
+**Testing Approach:**
+-   Creates a controlled, reproducible test environment with a local HTTP server.
+-   Processes URLs using `arun_many`, allowing the dispatcher to manage concurrency up to `max_sessions`.
+-   Optionally logs per-batch summaries (when not in streaming mode) after processing chunks.
+-   Supports different test sizes via `run_benchmark.py` configurations.
+-   Records memory samples via platform commands for basic trend analysis.
+-   Includes cleanup functionality for the test environment.
+
+**Challenges:**
+-   Ensuring proper cleanup of HTTP server processes.
+-   Getting reliable memory tracking across platforms without adding heavy dependencies (`psutil`) to this specific test script.
+-   Designing `run_benchmark.py` to correctly pass arguments to `stress_test_sdk.py`.
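
A minimal sketch of memory sampling through platform commands, assuming `ps` on Linux/macOS and `tasklist` on Windows. The functions `rss_mb` and `sample_to_csv` and the CSV column names are hypothetical and do not reproduce `SimpleMemoryTracker`; they only illustrate the approach of avoiding `psutil`.

```python
import csv
import os
import platform
import subprocess
import time
from typing import Optional


def rss_mb(pid: int) -> Optional[float]:
    """Return the resident memory of `pid` in MB via OS tools, or None on failure."""
    system = platform.system()
    try:
        if system in ("Linux", "Darwin"):
            # `ps -o rss=` prints the resident set size in kilobytes, no header.
            out = subprocess.run(
                ["ps", "-o", "rss=", "-p", str(pid)],
                capture_output=True, text=True, check=True,
            ).stdout
            return int(out.strip()) / 1024
        if system == "Windows":
            out = subprocess.run(
                ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"],
                capture_output=True, text=True, check=True,
            ).stdout
            # The last CSV field looks like "12,345 K"; keep only the digits.
            field = next(csv.reader([out.strip()]))[-1]
            return int("".join(ch for ch in field if ch.isdigit())) / 1024
    except (OSError, subprocess.SubprocessError, ValueError, StopIteration, IndexError):
        return None
    return None


def sample_to_csv(path: str, pid: int, samples: int, interval: float) -> int:
    """Write up to `samples` rows of (elapsed_s, rss_mb); return rows written."""
    start = time.monotonic()
    written = 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["elapsed_s", "rss_mb"])
        for _ in range(samples):
            value = rss_mb(pid)
            if value is not None:  # skip samples the platform tool could not provide
                writer.writerow([f"{time.monotonic() - start:.2f}", f"{value:.1f}"])
                written += 1
            time.sleep(interval)
    return written
```

Returning `None` instead of raising keeps a benchmark run alive on systems where the command is missing or prints an unexpected format, at the cost of gaps in the CSV.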
+
+**Why This Feature:**
+The high volume stress testing solution addresses critical needs for ensuring Crawl4AI's `arun_many` reliability:
+1.  Provides a reproducible way to evaluate performance under concurrent load.
+2.  Allows testing the dispatcher's concurrency control (`max_session_permit`) and queue management.
+3.  Enables performance tuning by observing throughput (`URLs/sec`) under different `max_sessions` settings.
+4.  Creates a controlled environment for testing `arun_many` behavior.
+5.  Supports continuous integration by providing deterministic test conditions for `arun_many`.
+
+**Design Decisions:**
+-   Chose local site generation for reproducibility and isolation from network issues.
+-   Utilized the built-in `CrawlerMonitor` for real-time feedback, leveraging its `rich` integration.
+-   Implemented optional per-batch logging in `stress_test_sdk.py` (when not streaming) to provide chunk-level summaries alongside the continuous monitor.
+-   Adopted `arun_many` with a `MemoryAdaptiveDispatcher` as the core mechanism for parallel execution, reflecting the intended SDK usage.
+-   Created `run_benchmark.py` to simplify running standard test configurations.
+-   Used `SimpleMemoryTracker` to provide basic memory insights without requiring `psutil` for this particular test runner.
+
+**Future Enhancements to Consider:**
+-   Create a separate test variant that *does* use `psutil` to specifically stress the memory-adaptive features of the dispatcher.
+-   Add support for generated JavaScript content.
+-   Add support for Docker-based testing with explicit memory limits.
+-   Enhance `benchmark_report.py` to provide more sophisticated analysis of performance and memory trends from the generated JSON/CSV files.
+
+---
+
+## [2025-04-17] Refined Stress Testing System Parameters and Execution
+
+**Changes Made:**
+1.  Corrected `run_benchmark.py` and `stress_test_sdk.py` to use `--max-sessions` instead of the incorrect `--workers` parameter, accurately reflecting dispatcher configuration.
+2.  Updated `run_benchmark.py` argument handling to correctly pass all relevant custom parameters (including `--stream`, `--monitor-mode`, etc.) to `stress_test_sdk.py`.
+3.  (Assuming changes in `benchmark_report.py`) Applied dark theme to benchmark reports for better readability.
+4.  (Assuming changes in `benchmark_report.py`) Improved visualization code to eliminate matplotlib warnings.
+5.  Updated `run_benchmark.py` to provide clickable `file://` links to generated reports in the terminal output.
+6.  Updated `USAGE.md` with comprehensive parameter descriptions reflecting the final script arguments.
+7.  Updated `run_all.sh` wrapper to correctly invoke `run_benchmark.py` with flexible arguments.
+
+**Details of Changes:**
+
+1.  **Parameter Correction (`--max-sessions`)**:
+    *   Identified that `--workers` did not correspond to how the dispatcher is configured, since concurrency is set through `max_session_permit`.
+    *   Refactored `stress_test_sdk.py` to accept `--max-sessions` and configure the `MemoryAdaptiveDispatcher`'s `max_session_permit` accordingly.
+    *   Updated `run_benchmark.py` argument parsing and command construction to use `--max-sessions`.
+    *   Updated `TEST_CONFIGS` in `run_benchmark.py` to use `max_sessions`.
+
+2.  **Argument Handling (`run_benchmark.py`)**:
+    *   Improved logic to collect all command-line arguments provided to `run_benchmark.py`.
+    *   Ensured all relevant arguments (like `--stream`, `--monitor-mode`, `--port`, `--use-rate-limiter`, etc.) are correctly forwarded when calling `stress_test_sdk.py` as a subprocess.
+
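+3.  **Dark Theme & Visualization Fixes (Assumed in `benchmark_report.py`)**:
+    *   Applied a dark theme to the benchmark reports and removed matplotlib warnings from the visualization code (both changes are assumed to live in the separate reporting script).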
+
+4.  **Clickable Links (`run_benchmark.py`)**:
+    *   Added logic to find the latest HTML report and PNG chart in the `benchmark_reports` directory after `benchmark_report.py` runs.
+    *   Used `pathlib` to generate correct `file://` URLs for terminal output.
+
+5.  **Documentation Improvements (`USAGE.md`)**:
+    *   Rewrote sections to explain `arun_many`, dispatchers, and `--max-sessions`.
+    *   Updated parameter tables for all scripts (`stress_test_sdk.py`, `run_benchmark.py`).
+    *   Clarified the difference between batch and streaming modes and their effect on logging.
+    *   Updated examples to use correct arguments.
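
The argument forwarding described in item 2 above could look roughly like the following. This is a sketch, not the actual `run_benchmark.py`: the helper names `build_parser` and `build_command` are hypothetical, and the parser covers only the flags named in this entry. Options default to `None` so that only values the user actually supplied are forwarded.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical runner; flags are those named in the journal entry."""
    parser = argparse.ArgumentParser(description="Hypothetical benchmark runner")
    parser.add_argument("--max-sessions", type=int, default=None)
    parser.add_argument("--port", type=int, default=None)
    parser.add_argument("--monitor-mode", default=None)
    parser.add_argument("--stream", action="store_true")
    parser.add_argument("--use-rate-limiter", action="store_true")
    return parser


def build_command(args: argparse.Namespace, script: str = "stress_test_sdk.py") -> list:
    """Turn parsed arguments back into a command line for the child script."""
    cmd = [sys.executable, script]
    for name, value in sorted(vars(args).items()):
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            if value:  # boolean switches are forwarded only when set
                cmd.append(flag)
        elif value is not None:  # valued options are forwarded only when given
            cmd.extend([flag, str(value)])
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems when values contain spaces.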
+
+**Files Modified:**
+-   `stress_test_sdk.py`: Changed `--workers` to `--max-sessions`, added new arguments, used `arun_many`.
+-   `run_benchmark.py`: Changed argument handling, updated configs, calls `stress_test_sdk.py`.
+-   `run_all.sh`: Updated to call `run_benchmark.py` correctly.
+-   `USAGE.md`: Updated documentation extensively.
+-   `benchmark_report.py`: (Assumed modifications for dark theme and viz fixes).
+
+**Testing:**
+-   Verified that `--max-sessions` correctly limits concurrency via the `CrawlerMonitor` output.
+-   Confirmed that custom arguments passed to `run_benchmark.py` are forwarded to `stress_test_sdk.py`.
+-   Validated clickable links work in supporting terminals.
+-   Ensured documentation matches the final script parameters and behavior.
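
For the clickable report links, a minimal sketch using `pathlib` is shown here. The helpers `latest_file` and `file_link` are hypothetical names, and the glob patterns in the usage note are assumptions about how reports are named. `Path.as_uri()` raises `ValueError` for relative paths, so the path is resolved first.

```python
from pathlib import Path
from typing import Optional


def latest_file(directory: Path, pattern: str) -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None."""
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def file_link(path: Path) -> str:
    """Build a file:// URL; resolve() makes the path absolute for as_uri()."""
    return path.resolve().as_uri()
```

A caller might run `latest_file(Path("benchmark_reports"), "*.html")` and print `file_link(...)` when a report exists. Whether the printed URL is clickable depends on the terminal emulator.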
+
+**Why These Changes:**
+These refinements correct the fundamental approach of the stress test to align with `crawl4ai`'s actual architecture and intended usage:
+1.  Ensures the test evaluates the correct components (`arun_many`, `MemoryAdaptiveDispatcher`).
+2.  Makes test configurations more accurate and flexible.
+3.  Improves the usability of the testing framework through better argument handling and documentation.
+
+
+**Future Enhancements to Consider:**
+- Add support for generated JavaScript content to test JS rendering performance
+- Implement more sophisticated memory analysis like generational garbage collection tracking
+- Add support for Docker-based testing with memory limits to force OOM conditions
+- Create visualization tools for analyzing memory usage patterns across test runs
+- Add benchmark comparisons between different crawler versions or configurations
+
+## [2025-04-17] Fixed Issues in Stress Testing System
+
+**Changes Made:**
+1. Fixed custom parameter handling in `run_benchmark.py`
+2. Applied dark theme to benchmark reports for better readability
+3. Improved visualization code to eliminate matplotlib warnings
+4. Added clickable links to generated reports in terminal output
+5. Enhanced documentation with comprehensive parameter descriptions
+
+**Details of Changes:**
+
+1. **Custom Parameter Handling Fix**
+   - Identified a bug where the custom URL count was being ignored in `run_benchmark.py`
+   - Rewrote argument handling to use a custom args dictionary
+   - Properly passed parameters to the `test_simple_stress.py` command
+   - Added better UI indication of custom parameters in use
+
+2. **Dark Theme Implementation**
+   - Added complete dark theme to HTML benchmark reports
+   - Applied dark styling to all visualization components
+   - Used Nord-inspired color palette for charts and graphs
+   - Improved contrast and readability for data visualization
+   - Updated text colors and backgrounds for better eye comfort
+
+3. **Matplotlib Warning Fixes**
+   - Resolved warnings related to improper use of `set_xticklabels()`
+   - Implemented correct x-axis positioning for bar charts
+   - Ensured proper alignment of bar labels and data points
+   - Updated plotting code to use modern matplotlib practices
+
+4. **Documentation Improvements**
+   - Created comprehensive `USAGE.md` with detailed instructions
+   - Added parameter documentation for all scripts
+   - Included examples for all common use cases
+   - Provided detailed explanations for interpreting results
+   - Added troubleshooting guide for common issues
+
+**Files Modified:**
+- `tests/memory/run_benchmark.py`: Fixed custom parameter handling
+- `tests/memory/benchmark_report.py`: Added dark theme and fixed visualization warnings
+- `tests/memory/run_all.sh`: Added clickable links to reports
+- `tests/memory/USAGE.md`: Created comprehensive documentation
+
+**Testing:**
+- Verified that custom URL counts are now correctly used
+- Confirmed dark theme is properly applied to all report elements
+- Checked that matplotlib warnings are no longer appearing
+- Validated clickable links to reports work in terminals that support them
+
+**Why These Changes:**
+These improvements address several usability issues with the stress testing system:
+1. Better parameter handling ensures test configurations work as expected
+2. Dark theme reduces eye strain during extended test review sessions
+3. Fixing visualization warnings improves code quality and output clarity
+4. Enhanced documentation makes the system more accessible for future use
+
+**Future Enhancements:**
+- Add additional visualization options for different types of analysis
+- Implement theme toggle to support both light and dark preferences
+- Add export options for embedding reports in other documentation
+- Create dedicated CI/CD integration templates for automated testing

 ## [2025-04-09] Added MHTML Capture Feature