ayrisdev / crawl4ai
crawl4ai / crawl4ai / scraper (at commit be472c624c625b5f240705112036fd5ef6f1eb8f)
Latest commit by UncleCode (be472c624c, 2024-11-06 21:09:47 +08:00):
Refactored AsyncWebScraper to include comprehensive error handling and progress tracking capabilities. Introduced a ScrapingProgress data class to monitor processed and failed URLs. Enhanced scraping methods to log errors and track stats throughout the scraping process.
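The commit above introduces a ScrapingProgress data class for monitoring processed and failed URLs. The repository's actual class isn't shown on this page; a minimal sketch of what such a tracker might look like (field and method names are assumptions, not the repo's API):

```python
from dataclasses import dataclass, field


@dataclass
class ScrapingProgress:
    """Hypothetical progress tracker: which URLs succeeded, which failed and why."""
    processed_urls: set = field(default_factory=set)
    failed_urls: dict = field(default_factory=dict)  # url -> error message

    def record_success(self, url: str) -> None:
        self.processed_urls.add(url)

    def record_failure(self, url: str, error: Exception) -> None:
        # Store only the message so the tracker stays easily serializable
        self.failed_urls[url] = str(error)

    @property
    def total(self) -> int:
        return len(self.processed_urls) + len(self.failed_urls)
```

A dataclass keeps the stats object cheap to construct and easy to log at the end of a crawl.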
filters/
  Created scaffolding for Scraper as per the plan. Implemented the ascrape method in bfs_scraper_strategy. (2024-09-09 13:13:34 +05:30)
scorers/
  Created scaffolding for Scraper as per the plan. Implemented the ascrape method in bfs_scraper_strategy. (2024-09-09 13:13:34 +05:30)
__init__.py
  Parallel processing with retry on failure using exponential backoff; simplified URL validation and normalisation; respects robots.txt. (2024-09-19 12:34:12 +05:30)
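The commit messages in this directory mention retry on failure with exponential backoff, with tenacity handling it in the actual code. A stdlib-only sketch of the same idea, so the mechanism is visible without the library (the function name and parameters are illustrative, not the repo's API):

```python
import asyncio


async def fetch_with_retry(fetch, url, max_attempts=3, base_delay=0.5):
    """Retry an async fetch with exponential backoff: base, 2x base, 4x base, ..."""
    for attempt in range(max_attempts):
        try:
            return await fetch(url)
        except Exception:
            # Out of attempts: let the final error propagate to the caller
            if attempt == max_attempts - 1:
                raise
            # Double the wait after each failure to back off from a struggling host
            await asyncio.sleep(base_delay * (2 ** attempt))
```

In the repository this is delegated to tenacity's decorators, which add jitter, stop conditions, and logging hooks on top of the same core loop.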
async_web_scraper.py
  Refactored AsyncWebScraper to include comprehensive error handling and progress tracking capabilities. Introduced a ScrapingProgress data class to monitor processed and failed URLs. Enhanced scraping methods to log errors and track stats throughout the scraping process. (2024-11-06 21:09:47 +08:00)
bfs_scraper_strategy.py
  Removed the remove_from_future_crawls stub, since the visited set is updated as soon as a URL is queued; removed add_to_retry_queue(url), since retry with exponential backoff (via tenacity) covers it. (2024-10-17 15:42:43 +05:30)
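The bfs_scraper_strategy.py commit hinges on a BFS invariant: if a URL is marked visited at enqueue time rather than at processing time, it can never enter the frontier twice, so no separate "remove from future crawls" step is needed. A small sketch of that invariant over an in-memory link graph (names are illustrative; the real strategy crawls live pages):

```python
from collections import deque


def bfs_crawl_order(start, links):
    """Return URLs in BFS order over a {url: [linked urls]} graph.

    A URL enters `visited` the moment it is queued, not when it is
    dequeued, so duplicates are rejected before they reach the frontier.
    """
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in visited:
                visited.add(nxt)   # mark at enqueue time
                queue.append(nxt)
    return order
```

Marking at dequeue time instead would let the same URL be queued by several parents, which is exactly the duplicate work the removed stub was meant to clean up.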
models.py
  Parallel processing with retry on failure using exponential backoff; simplified URL validation and normalisation; respects robots.txt. (2024-09-19 12:34:12 +05:30)
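Two of the commits above mention respecting robots.txt. Python's standard library already provides the parsing side of this; a small sketch of how a scraper might gate URLs with it (the helper name is an assumption, not code from this repo):

```python
from urllib import robotparser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())  # feed rules directly instead of fetching
    return rp.can_fetch(agent, url)
```

In a real crawler the robots.txt would be fetched once per host (RobotFileParser.set_url plus read does this) and the parsed rules cached, so the check costs nothing per URL.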
scraper_strategy.py
  Moved to asyncio.wait instead of gather so that results can be yielded as soon as they are ready, rather than in batches. (2024-10-17 12:25:17 +05:30)
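The scraper_strategy.py commit swaps asyncio.gather for asyncio.wait so results stream out as they complete. A minimal sketch of that pattern (fetch is simulated with sleeps; the real strategy yields crawl results):

```python
import asyncio


async def fetch(url, delay):
    # Stand-in for an HTTP request; delay simulates network latency
    await asyncio.sleep(delay)
    return url


async def scrape_as_ready(jobs):
    """Collect results in completion order using wait(FIRST_COMPLETED).

    gather() would block until every task finished and return one batch;
    this loop wakes each time any task completes, so results can be
    handled (or yielded) immediately.
    """
    pending = {asyncio.create_task(fetch(u, d)) for u, d in jobs}
    results = []
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            results.append(task.result())
    return results
```

Turning `results.append(...)` into a `yield` inside an async generator gives the streaming behaviour the commit describes.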