Implement a comprehensive health monitoring system for crawler configurations
that automatically detects failures and applies resolution strategies.
Features
- **Continuous Health Monitoring**: Periodic health checks for multiple crawler
configurations with configurable check intervals
- **Automatic Failure Detection**: Detects failures based on HTTP status codes,
empty HTML responses, and logger errors
- **Resolution Strategies**: Built-in and custom resolution strategies that
automatically attempt to fix failing configurations
- **Resolution Chains**: Support for sequential resolution strategies that
validate each step before proceeding
- **Metrics Collection**: Comprehensive metrics tracking including success rates,
response times, resolution attempts, and uptime statistics
- **Graceful Shutdown**: Robust cleanup mechanism that waits for active health
checks to complete before shutting down
- **Error Tracking**: Integrated logger error tracking to detect non-critical
errors that don't fail HTTP requests but indicate issues
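
The failure signals and the resolution-chain behaviour listed above can be sketched roughly as follows. This is an illustrative sketch, not the module's actual code: `CheckOutcome`, `is_failure`, `run_chain`, and the plain-dict config are hypothetical names, and treating any status of 400 or above as a failure is an assumption. "Validating each step" is read here as re-running the health check after each strategy and stopping at the first candidate that passes.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class CheckOutcome:
    """Hypothetical result of one health check (not the module's real type)."""
    status_code: Optional[int]
    html: str
    logged_errors: int = 0


def is_failure(outcome: CheckOutcome) -> bool:
    """Combine the three signals: HTTP status, empty HTML, logger errors."""
    # Assumption: a missing status or any status >= 400 counts as a failure.
    bad_status = outcome.status_code is None or outcome.status_code >= 400
    empty_html = not outcome.html.strip()
    return bad_status or empty_html or outcome.logged_errors > 0


# A strategy receives a config and returns a changed config, or None
# when it has nothing to offer. A checker runs one health check.
Strategy = Callable[[dict], Awaitable[Optional[dict]]]
Checker = Callable[[dict], Awaitable[CheckOutcome]]


async def run_chain(
    config: dict, strategies: List[Strategy], check: Checker
) -> Optional[dict]:
    """Try strategies in order; return the first candidate that re-checks healthy."""
    for strategy in strategies:
        candidate = await strategy(dict(config))  # shallow copy per attempt
        if candidate is None:
            continue
        if not is_failure(await check(candidate)):
            return candidate
    return None  # every strategy failed validation
```

Returning `None` when the chain is exhausted leaves the caller free to keep the original configuration and record the failed attempts in its metrics.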
Implementation Details
- New module `crawl4ai/config_health_monitor.py` containing:
  - `ConfigHealthMonitor`: Main monitoring class
  - `ConfigHealthState`: Health state tracking dataclass
  - `ResolutionResult`: Resolution strategy result dataclass
  - `ResolutionStrategy`: Type alias for resolution callables
  - `_ErrorTrackingLogger`: Proxy logger for error event tracking
- Key capabilities:
  - Register/unregister configurations for monitoring
  - Manual and automatic health checks
  - Config-specific or global resolution strategies
  - Coroutine-safe state management with asyncio locks (asyncio locks do
    not protect against access from other OS threads)
  - Per-config and global metrics reporting
  - Context manager support for automatic cleanup
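
As a rough illustration of the proxy-logger idea, the sketch below wraps any logger-like object, counts calls to `error`, and forwards everything else unchanged. `ErrorCountingLogger` is a hypothetical stand-in, not the module's `_ErrorTrackingLogger`, and the standard-library `logging` logger is used only so the example runs on its own.

```python
import logging


class ErrorCountingLogger:
    """Hypothetical proxy: forwards to a wrapped logger and counts error calls."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self.error_count = 0

    def error(self, *args, **kwargs):
        # Record the event, then delegate so normal logging still happens.
        self.error_count += 1
        return self._wrapped.error(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes the proxy does not define itself,
        # so info(), warning(), etc. pass straight through.
        return getattr(self._wrapped, name)


# Usage: wrap a logger for the duration of one health check.
log = ErrorCountingLogger(logging.getLogger("health-check-demo"))
log.warning("slow response")
log.error("selector matched nothing")
had_logged_errors = log.error_count > 0
```

A health check that returns a successful HTTP status but leaves `error_count` above zero can then still be classified as a failure.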
Testing
- Comprehensive test suite in `tests/general/test_config_health_monitor.py`:
  - Basic functionality tests (initialization, registration)
  - Lifecycle management tests (start/stop, context manager)
  - Health checking tests (success/failure scenarios)
  - Resolution strategy tests
  - Metrics and status query tests
  - Property validation tests
Examples
- Example usage in `docs/examples/config_health_monitor_example.py`:
  - Demonstrates monitor initialization and configuration
  - Shows custom resolution strategies (incremental backoff, magic mode toggle)
  - Implements resolution chains with validation
  - Displays metrics reporting and status monitoring
  - Includes context manager usage pattern
Technical Notes
- Uses `copy.deepcopy()` for safe configuration mutation
- Implements `_ErrorTrackingLogger` to capture logger errors during health checks
- Tracks active health check tasks for graceful shutdown
- Uses `CacheMode.BYPASS` for health check configs to ensure fresh data
- Minimum check interval enforced at 10 seconds
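
Three of these notes can be illustrated with a small, self-contained sketch. All names here (`clamp_interval`, `with_override`, `TaskTracker`) are hypothetical, the config is a plain dict, and clamping short intervals up to the minimum is an assumption: the real module may reject them instead.

```python
import asyncio
import copy

MIN_CHECK_INTERVAL = 10.0  # seconds, matching the minimum stated above


def clamp_interval(requested: float) -> float:
    """Assumed policy: raise too-short intervals to the minimum."""
    return max(requested, MIN_CHECK_INTERVAL)


def with_override(config: dict, **changes) -> dict:
    """Return a modified deep copy; the caller's config is never mutated."""
    candidate = copy.deepcopy(config)
    candidate.update(changes)
    return candidate


class TaskTracker:
    """Hypothetical helper that remembers in-flight checks so shutdown can wait."""

    def __init__(self):
        self._active = set()

    def spawn(self, coro) -> asyncio.Task:
        # Must be called from inside a running event loop.
        task = asyncio.create_task(coro)
        self._active.add(task)
        # Drop the reference automatically once the task finishes.
        task.add_done_callback(self._active.discard)
        return task

    async def drain(self) -> None:
        """Wait for all in-flight tasks; their exceptions are collected, not raised."""
        if self._active:
            await asyncio.gather(*self._active, return_exceptions=True)
```

Because a deep copy also duplicates nested objects, a resolution strategy can change nested settings on the candidate without affecting the registered configuration if the attempt is later discarded.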
This feature adds continuous monitoring of crawler configurations, detecting
failures automatically and attempting to resolve them before they impact
crawling operations.