feat(scraper): add optimized URL scoring system

Implements a new high-performance URL scoring system with multiple scoring strategies:
- FastKeywordRelevanceScorer for keyword matching
- FastPathDepthScorer for URL depth analysis
- FastContentTypeScorer for file type scoring
- FastFreshnessScorer for date-based scoring
- FastDomainAuthorityScorer for domain reputation
- FastCompositeScorer for combining multiple scorers
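
As a rough illustration of how several scorers can be combined, here is a minimal sketch. It is not the code from this commit: the class names (`KeywordScorerSketch`, `PathDepthScorerSketch`, `CompositeScorerSketch`), the `score(url)` method, the `weight` and `optimal_depth` parameters, and the averaging rule are all assumptions made for the example.

```python
from urllib.parse import urlsplit


class KeywordScorerSketch:
    """Hypothetical scorer: fraction of keywords found in the URL."""

    def __init__(self, keywords, weight=1.0):
        self.keywords = [k.lower() for k in keywords]
        self.weight = weight

    def score(self, url):
        if not self.keywords:
            return 0.0
        lowered = url.lower()
        hits = sum(1 for k in self.keywords if k in lowered)
        return self.weight * hits / len(self.keywords)


class PathDepthScorerSketch:
    """Hypothetical scorer: 1.0 at the optimal depth, lower further away."""

    def __init__(self, optimal_depth=2, weight=1.0):
        self.optimal_depth = optimal_depth
        self.weight = weight

    def score(self, url):
        # Depth is the number of non-empty path segments.
        segments = [s for s in urlsplit(url).path.split("/") if s]
        distance = abs(len(segments) - self.optimal_depth)
        return self.weight / (1 + distance)


class CompositeScorerSketch:
    """Hypothetical combiner: plain average of the child scores."""

    def __init__(self, scorers):
        self.scorers = list(scorers)

    def score(self, url):
        if not self.scorers:
            return 0.0
        return sum(s.score(url) for s in self.scorers) / len(self.scorers)


composite = CompositeScorerSketch([
    KeywordScorerSketch(["python", "rust"]),
    PathDepthScorerSketch(optimal_depth=2),
])
```

Calling `composite.score(url)` returns one number per URL, so a crawler could sort or threshold candidate links on it. Averaging is only one possible combination rule; a weighted sum would work with the same interface.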

Key improvements:
- Memory optimization using __slots__
- LRU caching for expensive operations
- Optimized string operations
- Pre-computed scoring tables
- Fast path optimizations for common cases
- Reduced object allocation

Includes comprehensive benchmarking and testing utilities.

Author: UncleCode
Date: 2025-01-23 20:46:33 +08:00
Parent: e6ef8d91ba
Commit: cf3e1e748d
2 changed files with 1208 additions and 1 deletions


@@ -761,7 +761,6 @@ def run_performance_test():
    print(f"Original Domain Filter: {sys.getsizeof(domain_filter):,} bytes")
    print(f"Optimized Domain Filter: {sys.getsizeof(fast_domain_filter):,} bytes")
def test_pattern_filter():
    import time
    from itertools import chain

File diff suppressed because it is too large