feat(scraping): add LXML-based scraping mode for improved performance
Adds a new ScrapingMode enum to allow switching between BeautifulSoup and LXML parsing. LXML mode offers 10-20x better performance for large HTML documents. Key changes: - Added ScrapingMode enum with BEAUTIFULSOUP and LXML options - Implemented LXMLWebScrapingStrategy class - Added LXML-based metadata extraction - Updated documentation with scraping mode usage and performance considerations - Added cssselect dependency BREAKING CHANGE: None
This commit is contained in:
@@ -19,4 +19,5 @@ pydantic>=2.10
|
||||
pyOpenSSL>=24.3.0
|
||||
psutil>=6.1.1
|
||||
nltk>=3.9.1
|
||||
rich>=13.9.4
|
||||
rich>=13.9.4
|
||||
cssselect>=1.2.0
|
||||
Reference in New Issue
Block a user