Fix can_process_url() to receive normalized URL in deep crawl strategies
Pass the normalized absolute URL instead of the raw href to can_process_url() in the BFS, BFF, and DFS deep crawl strategies. This ensures that URL validation and filter chain evaluation operate on consistent, fully qualified URLs.

Fixes #1743
@@ -300,7 +300,7 @@ class DFSDeepCrawlStrategy(BFSDeepCrawlStrategy):
             if not normalized_url or normalized_url in seen:
                 continue
 
-            if not await self.can_process_url(raw_url, next_depth):
+            if not await self.can_process_url(normalized_url, next_depth):
                 self.stats.urls_skipped += 1
                 continue
 
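The effect of the change can be illustrated with a minimal, self-contained sketch. It is not the project's code: `normalize` and `can_process` are hypothetical stand-ins for the crawler's normalization step and for can_process_url(), and the example assumes a validator that requires an http(s) scheme and a matching host. Under those assumptions, a relative href is rejected when passed raw but accepted once resolved against the page URL.

```python
from urllib.parse import urljoin, urlparse


def normalize(href: str, base_url: str) -> str:
    """Hypothetical stand-in for normalization: resolve href against the page URL."""
    return urljoin(base_url, href)


def can_process(url: str, allowed_domain: str) -> bool:
    """Hypothetical validator: require an http(s) scheme and a matching host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.netloc == allowed_domain


base = "https://example.com/docs/"
raw_href = "guide/intro.html"  # relative link as it appears in the page

normalized = normalize(raw_href, base)

# The raw href has no scheme or host, so the validator rejects it.
raw_ok = can_process(raw_href, "example.com")

# The normalized URL is fully qualified, so the same validator accepts it.
normalized_ok = can_process(normalized, "example.com")

print(normalized, raw_ok, normalized_ok)
```

In this sketch the link would be skipped before the fix, and counted in urls_skipped, even though it points to an allowed page.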