ayrisdev/crawl4ai
a677c2b61d1e451e4b6a80e7c7cca993ed1863c6
crawl4ai/crawl4ai/scraper
Aravind Karnam 7a5f83b76f fix: Added browser config and crawler run config from 0.4.22
2024-12-18 10:33:09 +05:30
__init__.py
Fixed a few bugs and import errors, and switched to asyncio.wait_for instead of timeout to support Python versions < 3.11
2024-11-23 12:39:25 +05:30
async_web_scraper.py
Refactored AsyncWebScraper to include comprehensive error handling and progress tracking capabilities. Introduced a ScrapingProgress data class to monitor processed and failed URLs. Enhanced scraping methods to log errors and track stats throughout the scraping process.
2024-11-06 21:09:47 +08:00
bfs_scraper_strategy.py
fix: Added browser config and crawler run config from 0.4.22
2024-12-18 10:33:09 +05:30
filters.py
feat(scraper): Enhance URL filtering and scoring systems
2024-11-08 19:02:28 +08:00
models.py
Parallel processing with retry on failure with exponential backoff; simplified URL validation and normalisation; respecting robots.txt
2024-09-19 12:34:12 +05:30
scorers.py
feat(scraper): Enhance URL filtering and scoring systems
2024-11-08 19:02:28 +08:00
scraper_strategy.py
Updated definition of can_process_url to include depth as an argument, as it's needed to skip filters for start_url
2024-11-26 18:26:57 +05:30
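
The __init__.py commit note mentions using asyncio wait_for instead of timeout so the code runs on Python versions before 3.11. The "timeout" it replaces is presumably the asyncio.timeout context manager, which only exists from Python 3.11 onward. The sketch below illustrates the general pattern under that assumption; fetch_with_timeout, slow_task and run_demo are hypothetical names made up for illustration and are not taken from the crawl4ai source.

```python
import asyncio


async def fetch_with_timeout(coro_factory, timeout_s: float):
    """Await a coroutine with a deadline using asyncio.wait_for.

    asyncio.wait_for is available on Python versions older than 3.11,
    unlike the asyncio.timeout() context manager.
    """
    try:
        return await asyncio.wait_for(coro_factory(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Deadline exceeded: wait_for cancels the inner task for us.
        return None


async def slow_task(delay: float) -> str:
    # Hypothetical stand-in for a page fetch that takes `delay` seconds.
    await asyncio.sleep(delay)
    return "done"


def run_demo():
    # Finishes well inside its deadline.
    fast = asyncio.run(fetch_with_timeout(lambda: slow_task(0.01), 1.0))
    # Exceeds its deadline, so the helper returns None.
    slow = asyncio.run(fetch_with_timeout(lambda: slow_task(1.0), 0.05))
    return fast, slow
```

Returning None on timeout is only one possible policy; a scraper that tracks failed URLs might instead record the failure and re-raise.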