feat(deep-crawling): improve URL normalization and domain filtering

Enhance URL handling in deep crawling with:
- New URL normalization functions for consistent URL formats
- Improved domain filtering with subdomain support
- URLPatternFilter added to the public API
- Better URL deduplication in BFS strategy
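
As a rough illustration of the first two items, the sketch below shows one way URL normalization and subdomain-aware domain matching can work. It is not the code from this commit: `normalize_url`, `domain_matches`, and the specific rules chosen (lowercasing the host, dropping default ports, fragments, and trailing slashes) are assumptions made for the example, built only on the standard library.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the target.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Hypothetical helper: canonicalize a URL so equivalent forms compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: any userinfo is discarded
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # Treat an empty path as "/" and drop trailing slashes elsewhere.
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def domain_matches(url: str, allowed: str) -> bool:
    """Hypothetical helper: True if the URL's host is `allowed` or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    allowed = allowed.lower().lstrip(".")
    # The leading dot in the suffix check prevents "notexample.com"
    # from matching "example.com".
    return host == allowed or host.endswith("." + allowed)
```

Matching on the dotted suffix rather than a plain substring is the design choice that makes subdomain support safe: `docs.example.com` passes for `example.com`, while `notexample.com` does not.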

These changes improve crawling accuracy and reduce duplicate visits.
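
To show how deduplication reduces duplicate visits in a breadth-first crawl, here is a minimal sketch under stated assumptions. It is not the BFS strategy from this commit: `bfs_crawl_order` is a hypothetical function, the `links` dictionary stands in for fetching pages, and the dedup key is a deliberately simple one (fragment removed, trailing slash stripped).

```python
from collections import deque
from urllib.parse import urldefrag


def bfs_crawl_order(start, links, max_depth=2):
    """Return the visit order of a BFS over `links`, skipping duplicate URLs.

    `links` maps a normalized URL to its outgoing URLs; it is a stand-in
    for real page fetching in this illustration.
    """
    def key(url):
        # Simplified dedup key: drop the fragment and any trailing slash.
        return urldefrag(url).url.rstrip("/")

    visited = {key(start)}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(key(url))
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        for nxt in links.get(key(url), []):
            k = key(nxt)
            if k not in visited:
                # Mark at enqueue time so the same URL is never queued twice.
                visited.add(k)
                queue.append((nxt, depth + 1))
    return order


links = {
    "https://example.com": [
        "https://example.com/a",
        "https://example.com/a/",      # same page, trailing slash
        "https://example.com/a#top",   # same page, fragment
        "https://example.com/b",
    ],
    "https://example.com/a": [
        "https://example.com/",        # back-link to the start page
        "https://example.com/b#x",
    ],
}
order = bfs_crawl_order("https://example.com/", links)
```

Marking URLs as visited when they are enqueued, rather than when they are dequeued, keeps variants of the same page discovered at the same depth from entering the queue more than once.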
UncleCode
2025-03-06 22:45:57 +08:00
parent 1b72880007
commit f78c46446b
6 changed files with 186 additions and 14 deletions

@@ -1,2 +1,2 @@
 # crawl4ai/_version.py
-__version__ = "0.5.0.post3"
+__version__ = "0.5.0.post4"