Add ability to capture web pages as MHTML format, which includes all page resources
in a single file. This enables complete page archival and offline viewing.
- Add capture_mhtml parameter to CrawlerRunConfig
- Implement MHTML capture using CDP in AsyncPlaywrightCrawlerStrategy
- Add mhtml field to CrawlResult and AsyncCrawlResponse models
- Add comprehensive tests for MHTML capture functionality
- Update documentation with MHTML capture details
- Add exclude_all_images option for better memory management
Breaking changes: None
Add comprehensive table detection and extraction functionality to the web scraping system:
- Implement intelligent table detection algorithm with scoring system
- Add table extraction with support for headers, rows, captions
- Update models to include tables in Media class
- Add table_score_threshold configuration option
- Add documentation and examples for table extraction
- Include crypto analysis example demonstrating table usage
This change enables users to extract structured data from HTML tables while intelligently filtering out layout tables.
Reorganize documentation into core/advanced/extraction sections for better navigation.
Update terminal theme styles and add rich library for better CLI output.
Remove redundant tutorial files and consolidate content into core sections.
Add personal story to index page for project context.
BREAKING CHANGE: Documentation structure has been significantly reorganized