feat(scraper): add optimized URL scoring system

Implements a new high-performance URL scoring system with multiple scoring strategies:

- FastKeywordRelevanceScorer for keyword matching
- FastPathDepthScorer for URL depth analysis
- FastContentTypeScorer for file type scoring
- FastFreshnessScorer for date-based scoring
- FastDomainAuthorityScorer for domain reputation
- FastCompositeScorer for combining multiple scorers

Key improvements:

- Memory optimization using __slots__
- LRU caching for expensive operations
- Optimized string operations
- Pre-computed scoring tables
- Fast path optimizations for common cases
- Reduced object allocation

Includes comprehensive benchmarking and testing utilities.
@@ -761,7 +761,6 @@ def run_performance_test():
    print(f"Original Domain Filter: {sys.getsizeof(domain_filter):,} bytes")
    print(f"Optimized Domain Filter: {sys.getsizeof(fast_domain_filter):,} bytes")


def test_pattern_filter():
    import time
    from itertools import chain
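
One caveat about the memory comparison printed in this hunk: `sys.getsizeof` is shallow. It reports the object's own footprint and excludes anything the object refers to, including a regular instance's `__dict__`. A minimal sketch of a slightly fairer measurement follows. The `WithDict` and `WithSlots` classes and the `shallow_plus_dict` helper are hypothetical stand-ins, not the filter classes from this commit.

```python
import sys


class WithDict:
    """Hypothetical regular class: attributes live in a per-instance __dict__."""

    def __init__(self, domains):
        self.domains = domains


class WithSlots:
    """Hypothetical slotted class: attributes live in fixed slots, no __dict__."""

    __slots__ = ("domains",)

    def __init__(self, domains):
        self.domains = domains


def shallow_plus_dict(obj) -> int:
    """Return the object's own size plus its instance __dict__, if it has one.

    Still not a deep size: containers referenced by attributes are excluded.
    """
    size = sys.getsizeof(obj)
    instance_dict = getattr(obj, "__dict__", None)
    if instance_dict is not None:
        size += sys.getsizeof(instance_dict)
    return size
```

For a full picture, referenced containers such as sets of domains would also need to be measured, for example by walking the attributes recursively or by using `tracemalloc`.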
1208
crawl4ai/scraper/scorers_review.py
Normal file
File diff suppressed because it is too large