Merge branch 'next-stress' into next
This commit is contained in:
190
JOURNAL.md
190
JOURNAL.md
@@ -42,6 +42,196 @@ This feature provides greater flexibility in how users generate markdown, enabli
|
||||
- Capture more detailed content from the original HTML when needed
|
||||
- Use schema-optimized HTML when working with structured data
|
||||
- Choose the approach that best suits their specific use case
|
||||
## [2025-04-17] Implemented High Volume Stress Testing Solution for SDK
|
||||
|
||||
**Feature:** Comprehensive stress testing framework using `arun_many` and the dispatcher system to evaluate performance, concurrency handling, and identify potential issues under high-volume crawling scenarios.
|
||||
|
||||
**Changes Made:**
|
||||
1. Created a dedicated stress testing framework in the `benchmarking/` (or similar) directory.
|
||||
2. Implemented local test site generation (`SiteGenerator`) with configurable heavy HTML pages.
|
||||
3. Added basic memory usage tracking (`SimpleMemoryTracker`) using platform-specific commands (avoiding `psutil` dependency for this specific test).
|
||||
4. Utilized `CrawlerMonitor` from `crawl4ai` for rich terminal UI and real-time monitoring of test progress and dispatcher activity.
|
||||
5. Implemented detailed result summary saving (JSON) and memory sample logging (CSV).
|
||||
6. Developed `run_benchmark.py` to orchestrate tests with predefined configurations.
|
||||
7. Created `run_all.sh` as a simple wrapper for `run_benchmark.py`.
|
||||
|
||||
**Implementation Details:**
|
||||
- Generates a local test site with configurable pages containing heavy text and image content.
|
||||
- Uses Python's built-in `http.server` for local serving, minimizing network variance.
|
||||
- Leverages `crawl4ai`'s `arun_many` method for processing URLs.
|
||||
- Utilizes `MemoryAdaptiveDispatcher` to manage concurrency via the `max_sessions` parameter (note: memory adaptation features require `psutil`, not used by `SimpleMemoryTracker`).
|
||||
- Tracks memory usage via `SimpleMemoryTracker`, recording samples throughout test execution to a CSV file.
|
||||
- Uses `CrawlerMonitor` (which uses the `rich` library) for clear terminal visualization and progress reporting directly from the dispatcher.
|
||||
- Stores detailed final metrics in a JSON summary file.
|
||||
|
||||
**Files Created/Updated:**
|
||||
- `stress_test_sdk.py`: Main stress testing implementation using `arun_many`.
|
||||
- `benchmark_report.py`: (Assumed) Report generator for comparing test results.
|
||||
- `run_benchmark.py`: Test runner script with predefined configurations.
|
||||
- `run_all.sh`: Simple bash script wrapper for `run_benchmark.py`.
|
||||
- `USAGE.md`: Comprehensive documentation on usage and interpretation (updated).
|
||||
|
||||
**Testing Approach:**
|
||||
- Creates a controlled, reproducible test environment with a local HTTP server.
|
||||
- Processes URLs using `arun_many`, allowing the dispatcher to manage concurrency up to `max_sessions`.
|
||||
- Optionally logs per-batch summaries (when not in streaming mode) after processing chunks.
|
||||
- Supports different test sizes via `run_benchmark.py` configurations.
|
||||
- Records memory samples via platform commands for basic trend analysis.
|
||||
- Includes cleanup functionality for the test environment.
|
||||
|
||||
**Challenges:**
|
||||
- Ensuring proper cleanup of HTTP server processes.
|
||||
- Getting reliable memory tracking across platforms without adding heavy dependencies (`psutil`) to this specific test script.
|
||||
- Designing `run_benchmark.py` to correctly pass arguments to `stress_test_sdk.py`.
|
||||
|
||||
**Why This Feature:**
|
||||
The high volume stress testing solution addresses critical needs for ensuring Crawl4AI's `arun_many` reliability:
|
||||
1. Provides a reproducible way to evaluate performance under concurrent load.
|
||||
2. Allows testing the dispatcher's concurrency control (`max_session_permit`) and queue management.
|
||||
3. Enables performance tuning by observing throughput (`URLs/sec`) under different `max_sessions` settings.
|
||||
4. Creates a controlled environment for testing `arun_many` behavior.
|
||||
5. Supports continuous integration by providing deterministic test conditions for `arun_many`.
|
||||
|
||||
**Design Decisions:**
|
||||
- Chose local site generation for reproducibility and isolation from network issues.
|
||||
- Utilized the built-in `CrawlerMonitor` for real-time feedback, leveraging its `rich` integration.
|
||||
- Implemented optional per-batch logging in `stress_test_sdk.py` (when not streaming) to provide chunk-level summaries alongside the continuous monitor.
|
||||
- Adopted `arun_many` with a `MemoryAdaptiveDispatcher` as the core mechanism for parallel execution, reflecting the intended SDK usage.
|
||||
- Created `run_benchmark.py` to simplify running standard test configurations.
|
||||
- Used `SimpleMemoryTracker` to provide basic memory insights without requiring `psutil` for this particular test runner.
|
||||
|
||||
**Future Enhancements to Consider:**
|
||||
- Create a separate test variant that *does* use `psutil` to specifically stress the memory-adaptive features of the dispatcher.
|
||||
- Add support for generated JavaScript content.
|
||||
- Add support for Docker-based testing with explicit memory limits.
|
||||
- Enhance `benchmark_report.py` to provide more sophisticated analysis of performance and memory trends from the generated JSON/CSV files.
|
||||
|
||||
---
|
||||
|
||||
## [2025-04-17] Refined Stress Testing System Parameters and Execution
|
||||
|
||||
**Changes Made:**
|
||||
1. Corrected `run_benchmark.py` and `stress_test_sdk.py` to use `--max-sessions` instead of the incorrect `--workers` parameter, accurately reflecting dispatcher configuration.
|
||||
2. Updated `run_benchmark.py` argument handling to correctly pass all relevant custom parameters (including `--stream`, `--monitor-mode`, etc.) to `stress_test_sdk.py`.
|
||||
3. (Assuming changes in `benchmark_report.py`) Applied dark theme to benchmark reports for better readability.
|
||||
4. (Assuming changes in `benchmark_report.py`) Improved visualization code to eliminate matplotlib warnings.
|
||||
5. Updated `run_benchmark.py` to provide clickable `file://` links to generated reports in the terminal output.
|
||||
6. Updated `USAGE.md` with comprehensive parameter descriptions reflecting the final script arguments.
|
||||
7. Updated `run_all.sh` wrapper to correctly invoke `run_benchmark.py` with flexible arguments.
|
||||
|
||||
**Details of Changes:**
|
||||
|
||||
1. **Parameter Correction (`--max-sessions`)**:
|
||||
* Identified the fundamental misunderstanding where `--workers` was used incorrectly.
|
||||
* Refactored `stress_test_sdk.py` to accept `--max-sessions` and configure the `MemoryAdaptiveDispatcher`'s `max_session_permit` accordingly.
|
||||
* Updated `run_benchmark.py` argument parsing and command construction to use `--max-sessions`.
|
||||
* Updated `TEST_CONFIGS` in `run_benchmark.py` to use `max_sessions`.
|
||||
|
||||
2. **Argument Handling (`run_benchmark.py`)**:
|
||||
* Improved logic to collect all command-line arguments provided to `run_benchmark.py`.
|
||||
* Ensured all relevant arguments (like `--stream`, `--monitor-mode`, `--port`, `--use-rate-limiter`, etc.) are correctly forwarded when calling `stress_test_sdk.py` as a subprocess.
|
||||
|
||||
3. **Dark Theme & Visualization Fixes (Assumed in `benchmark_report.py`)**:
|
||||
* (Describes changes assumed to be made in the separate reporting script).
|
||||
|
||||
4. **Clickable Links (`run_benchmark.py`)**:
|
||||
* Added logic to find the latest HTML report and PNG chart in the `benchmark_reports` directory after `benchmark_report.py` runs.
|
||||
* Used `pathlib` to generate correct `file://` URLs for terminal output.
|
||||
|
||||
5. **Documentation Improvements (`USAGE.md`)**:
|
||||
* Rewrote sections to explain `arun_many`, dispatchers, and `--max-sessions`.
|
||||
* Updated parameter tables for all scripts (`stress_test_sdk.py`, `run_benchmark.py`).
|
||||
* Clarified the difference between batch and streaming modes and their effect on logging.
|
||||
* Updated examples to use correct arguments.
|
||||
|
||||
**Files Modified:**
|
||||
- `stress_test_sdk.py`: Changed `--workers` to `--max-sessions`, added new arguments, used `arun_many`.
|
||||
- `run_benchmark.py`: Changed argument handling, updated configs, calls `stress_test_sdk.py`.
|
||||
- `run_all.sh`: Updated to call `run_benchmark.py` correctly.
|
||||
- `USAGE.md`: Updated documentation extensively.
|
||||
- `benchmark_report.py`: (Assumed modifications for dark theme and viz fixes).
|
||||
|
||||
**Testing:**
|
||||
- Verified that `--max-sessions` correctly limits concurrency via the `CrawlerMonitor` output.
|
||||
- Confirmed that custom arguments passed to `run_benchmark.py` are forwarded to `stress_test_sdk.py`.
|
||||
- Validated clickable links work in supporting terminals.
|
||||
- Ensured documentation matches the final script parameters and behavior.
|
||||
|
||||
**Why These Changes:**
|
||||
These refinements correct the fundamental approach of the stress test to align with `crawl4ai`'s actual architecture and intended usage:
|
||||
1. Ensures the test evaluates the correct components (`arun_many`, `MemoryAdaptiveDispatcher`).
|
||||
2. Makes test configurations more accurate and flexible.
|
||||
3. Improves the usability of the testing framework through better argument handling and documentation.
|
||||
|
||||
|
||||
**Future Enhancements to Consider:**
|
||||
- Add support for generated JavaScript content to test JS rendering performance
|
||||
- Implement more sophisticated memory analysis like generational garbage collection tracking
|
||||
- Add support for Docker-based testing with memory limits to force OOM conditions
|
||||
- Create visualization tools for analyzing memory usage patterns across test runs
|
||||
- Add benchmark comparisons between different crawler versions or configurations
|
||||
|
||||
## [2025-04-17] Fixed Issues in Stress Testing System
|
||||
|
||||
**Changes Made:**
|
||||
1. Fixed custom parameter handling in run_benchmark.py
|
||||
2. Applied dark theme to benchmark reports for better readability
|
||||
3. Improved visualization code to eliminate matplotlib warnings
|
||||
4. Added clickable links to generated reports in terminal output
|
||||
5. Enhanced documentation with comprehensive parameter descriptions
|
||||
|
||||
**Details of Changes:**
|
||||
|
||||
1. **Custom Parameter Handling Fix**
|
||||
- Identified bug where custom URL count was being ignored in run_benchmark.py
|
||||
- Rewrote argument handling to use a custom args dictionary
|
||||
- Properly passed parameters to the test_simple_stress.py command
|
||||
- Added better UI indication of custom parameters in use
|
||||
|
||||
2. **Dark Theme Implementation**
|
||||
- Added complete dark theme to HTML benchmark reports
|
||||
- Applied dark styling to all visualization components
|
||||
- Used Nord-inspired color palette for charts and graphs
|
||||
- Improved contrast and readability for data visualization
|
||||
- Updated text colors and backgrounds for better eye comfort
|
||||
|
||||
3. **Matplotlib Warning Fixes**
|
||||
- Resolved warnings related to improper use of set_xticklabels()
|
||||
- Implemented correct x-axis positioning for bar charts
|
||||
- Ensured proper alignment of bar labels and data points
|
||||
- Updated plotting code to use modern matplotlib practices
|
||||
|
||||
4. **Documentation Improvements**
|
||||
- Created comprehensive USAGE.md with detailed instructions
|
||||
- Added parameter documentation for all scripts
|
||||
- Included examples for all common use cases
|
||||
- Provided detailed explanations for interpreting results
|
||||
- Added troubleshooting guide for common issues
|
||||
|
||||
**Files Modified:**
|
||||
- `tests/memory/run_benchmark.py`: Fixed custom parameter handling
|
||||
- `tests/memory/benchmark_report.py`: Added dark theme and fixed visualization warnings
|
||||
- `tests/memory/run_all.sh`: Added clickable links to reports
|
||||
- `tests/memory/USAGE.md`: Created comprehensive documentation
|
||||
|
||||
**Testing:**
|
||||
- Verified that custom URL counts are now correctly used
|
||||
- Confirmed dark theme is properly applied to all report elements
|
||||
- Checked that matplotlib warnings are no longer appearing
|
||||
- Validated clickable links to reports work in terminals that support them
|
||||
|
||||
**Why These Changes:**
|
||||
These improvements address several usability issues with the stress testing system:
|
||||
1. Better parameter handling ensures test configurations work as expected
|
||||
2. Dark theme reduces eye strain during extended test review sessions
|
||||
3. Fixing visualization warnings improves code quality and output clarity
|
||||
4. Enhanced documentation makes the system more accessible for future use
|
||||
|
||||
**Future Enhancements:**
|
||||
- Add additional visualization options for different types of analysis
|
||||
- Implement theme toggle to support both light and dark preferences
|
||||
- Add export options for embedding reports in other documentation
|
||||
- Create dedicated CI/CD integration templates for automated testing
|
||||
|
||||
## [2025-04-09] Added MHTML Capture Feature
|
||||
|
||||
|
||||
Reference in New Issue
Block a user