UncleCode
d11c004fbb
Enhanced BFS Strategy: Improved monitoring, resource management & configuration
- Added CrawlStats for comprehensive crawl monitoring
- Implemented proper resource cleanup with shutdown mechanism
- Enhanced URL processing with better validation and politeness controls
- Added configuration options (max_concurrent, timeout, external_links)
- Improved error handling with retry logic
- Added domain-specific queues for better performance
- Created comprehensive documentation
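A minimal sketch of how the pieces listed above could fit together, not the code from this commit. The option names `max_concurrent`, `timeout`, and `external_links` come from the commit message; the `CrawlStats` fields, `BFSConfig`, `max_retries`, `max_depth`, `fetch_with_retry`, and `bfs_crawl` are hypothetical names chosen for illustration, and the `fetch` callable stands in for whatever the real crawler uses to load a page and return its links.

```python
import asyncio
from collections import defaultdict, deque
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class CrawlStats:
    # Hypothetical counters; the real CrawlStats fields may differ.
    pages_crawled: int = 0
    pages_failed: int = 0
    retries: int = 0


@dataclass
class BFSConfig:
    # First three names follow the commit message; defaults are assumptions.
    max_concurrent: int = 5
    timeout: float = 30.0
    external_links: bool = False
    max_retries: int = 2  # assumed option
    max_depth: int = 2    # assumed option


async def fetch_with_retry(fetch, url, cfg, stats):
    """Call fetch(url) with a timeout, retrying up to cfg.max_retries times."""
    for attempt in range(cfg.max_retries + 1):
        try:
            return await asyncio.wait_for(fetch(url), timeout=cfg.timeout)
        except Exception:
            if attempt == cfg.max_retries:
                stats.pages_failed += 1
                return None
            stats.retries += 1


async def bfs_crawl(start_url, fetch, cfg=None):
    """Breadth-first crawl; fetch(url) must return a list of linked URLs."""
    cfg = cfg or BFSConfig()
    stats = CrawlStats()
    home = urlparse(start_url).netloc
    seen = {start_url}
    frontier = [start_url]
    sem = asyncio.Semaphore(cfg.max_concurrent)  # caps in-flight requests

    async def visit(url):
        async with sem:
            return await fetch_with_retry(fetch, url, cfg, stats)

    for _depth in range(cfg.max_depth + 1):
        if not frontier:
            break
        # One queue per domain, drained round-robin so that requests to
        # the same host are spread out within the batch.
        queues = defaultdict(deque)
        for url in frontier:
            queues[urlparse(url).netloc].append(url)
        batch = []
        while queues:
            for domain in list(queues):
                batch.append(queues[domain].popleft())
                if not queues[domain]:
                    del queues[domain]
        results = await asyncio.gather(*(visit(u) for u in batch))
        frontier = []
        for links in results:
            if links is None:  # all attempts failed
                continue
            stats.pages_crawled += 1
            for link in links:
                external = urlparse(link).netloc != home
                if link in seen or (external and not cfg.external_links):
                    continue
                seen.add(link)
                frontier.append(link)
    return stats, seen
```

The sketch leaves out the shutdown mechanism and politeness delays mentioned above; it only shows how a concurrency cap, per-request timeout, retries, and per-domain queues can share one BFS loop.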
Note: URL normalization needs review. It may duplicate processing that
the core crawler already performs for internal links, so it is currently
commented out pending further investigation of edge cases.
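One way to make a second normalization pass harmless is to keep the function idempotent, so a URL normalized by both the strategy and the core crawler ends up identical either way. The sketch below is an assumption about what such a function might look like, not the commented-out code from this commit; `normalize_url` and its specific rules (lowercasing, dropping fragments and default ports, trimming trailing slashes) are illustrative choices.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical form of url; applying it twice changes nothing.

    Simplifications (assumptions): userinfo is dropped and IPv6 hosts
    are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the scheme's default.
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{port}"
    # Treat an empty path as "/" and trim trailing slashes elsewhere.
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    # Drop the fragment; it never reaches the server.
    return urlunsplit((scheme, host, path, parts.query, ""))
```

Trimming trailing slashes is itself one of the edge cases worth reviewing, since some servers treat `/docs` and `/docs/` as different resources.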
2024-11-08 15:57:23 +08:00