Adds new target_elements parameter to CrawlerRunConfig that allows more flexible content selection than css_selector. This enables focusing markdown generation and data extraction on specific elements while still processing the entire page for links and media. Key changes: - Added target_elements list parameter to CrawlerRunConfig - Modified WebScrapingStrategy and LXMLWebScrapingStrategy to handle target_elements - Updated documentation with examples and comparison between css_selector and target_elements - Fixed table extraction in content_scraping_strategy.py BREAKING CHANGE: Table extraction logic has been modified to better handle thead/tbody structures
4.6 KiB
4.6 KiB