crawl4ai/scraper_equivalence_results.json at f3ae5a657c6271063c32c826a7cd724f881c61db

ayrisdev/crawl4ai

UncleCode f3ae5a657c feat(scraping): add LXML-based scraping mode for improved performance

Adds a new ScrapingMode enum to allow switching between BeautifulSoup and LXML parsing.
LXML mode offers 10-20x better performance for large HTML documents.

Key changes:
- Added ScrapingMode enum with BEAUTIFULSOUP and LXML options
- Implemented LXMLWebScrapingStrategy class
- Added LXML-based metadata extraction
- Updated documentation with scraping mode usage and performance considerations
- Added cssselect dependency

BREAKING CHANGE: None

2025-01-12 20:46:23 +08:00

16 lines

282 B

JSON

 {
   "tests": [
     {
       "case": "complicated_exclude_all_links",
       "lxml_mode": {
         "differences": {},
         "execution_time": 0.0019578933715820312
       },
       "original_time": 0.0059909820556640625
     }
   ],
   "summary": {
     "passed": 1,
     "failed": 0
   }
 }
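
The results above can be read programmatically to check equivalence and compute the speedup per case. The following is a minimal sketch, not part of the repository: the `summarize` helper and the `RESULTS_JSON` constant are hypothetical names, and the JSON is inlined here rather than read from `scraper_equivalence_results.json`. It assumes an empty `differences` object means the LXML output matched the original scraper's output.

```python
import json

# Inlined copy of the results file shown above (hypothetical constant name).
RESULTS_JSON = """
{
  "tests": [
    {
      "case": "complicated_exclude_all_links",
      "lxml_mode": {
        "differences": {},
        "execution_time": 0.0019578933715820312
      },
      "original_time": 0.0059909820556640625
    }
  ],
  "summary": {"passed": 1, "failed": 0}
}
"""


def summarize(results: dict) -> list[dict]:
    """Return one row per test case with an equivalence flag and a speedup ratio."""
    rows = []
    for test in results["tests"]:
        lxml_time = test["lxml_mode"]["execution_time"]
        original_time = test["original_time"]
        rows.append(
            {
                "case": test["case"],
                # Assumption: an empty "differences" object means the outputs matched.
                "equivalent": not test["lxml_mode"]["differences"],
                # Ratio of original time to LXML time; values above 1 mean LXML was faster.
                "speedup": original_time / lxml_time,
            }
        )
    return rows


rows = summarize(json.loads(RESULTS_JSON))
for row in rows:
    print(f"{row['case']}: equivalent={row['equivalent']}, speedup={row['speedup']:.2f}x")
```

For the single case recorded here, the ratio works out to roughly 3x. Both timings are a few milliseconds on one small input, so this figure is not directly comparable to the 10-20x the commit message cites for large HTML documents.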