Commit Graph

1 Commits

Author SHA1 Message Date
UncleCode
bae4665949 feat(scraper): Enhance URL filtering and scoring systems
Implement comprehensive URL filtering and scoring capabilities:

Filters:
- Add URLPatternFilter with glob/regex support
- Implement ContentTypeFilter with MIME type checking
- Add DomainFilter for domain control
- Create FilterChain with stats tracking

Scorers:
- Complete KeywordRelevanceScorer implementation
- Add PathDepthScorer for URL structure scoring
- Implement ContentTypeScorer for file type priorities
- Add FreshnessScorer for date-based scoring
- Add DomainAuthorityScorer for domain weighting
- Create CompositeScorer for combined strategies

Features:
- Add statistics tracking for both filters and scorers
- Implement logging support throughout
- Add resource cleanup methods
- Create comprehensive documentation
- Include performance optimizations

Tests and docs included.
Note: Review URL normalization overlap with recent crawler changes.

- Quick Start is created and added
2024-11-08 18:45:12 +08:00