feat: Add virtual scroll support for modern web scraping
Add comprehensive virtual scroll handling to capture all content from pages that use DOM recycling techniques (Twitter, Instagram, etc). Key features: - New VirtualScrollConfig class for configuring virtual scroll behavior - Automatic detection of three scrolling scenarios: no change, content appended, content replaced - Intelligent HTML chunk capture and merging with deduplication - 100% content capture from virtual scroll pages - Seamless integration with existing extraction strategies - JavaScript-based detection and capture for performance - Tree-based DOM merging with text-based deduplication Documentation: - Comprehensive guide at docs/md_v2/advanced/virtual-scroll.md - API reference updates in parameters.md and page-interaction.md - Blog article explaining the solution and techniques - Complete examples with local test server Testing: - Full test suite achieving 100% capture of 1000 items - Examples for Twitter timeline, Instagram grid scenarios - Local test server with different scrolling behaviors This enables scraping of modern websites that were previously impossible to fully capture with traditional scrolling techniques.
This commit is contained in:
@@ -37,6 +37,7 @@ nav:
|
||||
- "Link & Media": "core/link-media.md"
|
||||
- Advanced:
|
||||
- "Overview": "advanced/advanced-features.md"
|
||||
- "Virtual Scroll": "advanced/virtual-scroll.md"
|
||||
- "File Downloading": "advanced/file-downloading.md"
|
||||
- "Lazy Loading": "advanced/lazy-loading.md"
|
||||
- "Hooks & Auth": "advanced/hooks-auth.md"
|
||||
|
||||
Reference in New Issue
Block a user