feat(scraping): add LXML-based scraping mode for improved performance
Adds a new ScrapingMode enum to allow switching between BeautifulSoup and LXML parsing. LXML mode offers 10-20x better performance for large HTML documents.

Key changes:
- Added ScrapingMode enum with BEAUTIFULSOUP and LXML options
- Implemented LXMLWebScrapingStrategy class
- Added LXML-based metadata extraction
- Updated documentation with scraping mode usage and performance considerations
- Added cssselect dependency

BREAKING CHANGE: None
16
scraper_equivalence_results.json
Normal file
@@ -0,0 +1,16 @@
{
  "tests": [
    {
      "case": "complicated_exclude_all_links",
      "lxml_mode": {
        "differences": {},
        "execution_time": 0.0019578933715820312
      },
      "original_time": 0.0059909820556640625
    }
  ],
  "summary": {
    "passed": 1,
    "failed": 0
  }
}
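The timings in this results file can be turned into a speedup ratio per test case. The following is a minimal sketch, not part of the commit: the helper name `speedups` is hypothetical, and it assumes that `original_time` is the timing of the existing BeautifulSoup path and that both timings share the same unit.

```python
import json

# Results copied verbatim from scraper_equivalence_results.json.
RESULTS = json.loads("""
{
  "tests": [
    {
      "case": "complicated_exclude_all_links",
      "lxml_mode": {
        "differences": {},
        "execution_time": 0.0019578933715820312
      },
      "original_time": 0.0059909820556640625
    }
  ],
  "summary": {"passed": 1, "failed": 0}
}
""")


def speedups(results):
    """Map each test case to original_time / lxml_mode.execution_time.

    Hypothetical helper: assumes "original_time" measures the
    pre-existing (BeautifulSoup) scraper on the same input.
    """
    return {
        test["case"]: test["original_time"] / test["lxml_mode"]["execution_time"]
        for test in results["tests"]
    }


if __name__ == "__main__":
    for case, ratio in speedups(RESULTS).items():
        print(f"{case}: {ratio:.2f}x faster in LXML mode")
```

For the single recorded case the ratio comes out at about 3x. That is below the 10-20x quoted in the commit message, which refers to large HTML documents.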