UncleCode
e6ef8d91ba
refactor(scraper): optimize URL validation and filter performance
- Replace validators library with built-in urlparse for URL validation
- Optimize filter statistics update logic for better performance
- Add performance benchmarking suite for filters
- Add execution time tracking to scraper examples
- Update gitignore with windsurfrules
BREAKING CHANGE: Removed dependency on validators library for URL validation
2025-01-22 19:45:56 +08:00
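The switch from the validators library to urlparse described in the commit above could look roughly like the following. This is a minimal sketch, not the code from the commit: the function name `is_valid_url` and the restriction to http/https schemes are assumptions made for illustration.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Hypothetical helper: accept only http(s) URLs that have a host part."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # for example an unterminated IPv6 bracket in the host.
        return False
    # Assumption: a crawlable URL needs an http(s) scheme and a non-empty netloc.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A check like this is looser than a dedicated validation library (it does not verify that the host is a well-formed domain name), which is the usual trade-off for dropping the extra dependency and the per-URL validation cost.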
Aravind Karnam
6e78c56dda
Refactor: Removed all scheduling logic from the scraper. From now on, the scraper expects arun_many to handle all scheduling. The scraper only does traversal, validation, compliance checks, URL filtering, scoring, etc. Reformatted some of the scraper files with the Black code formatter.
2025-01-21 18:44:43 +05:30
Aravind Karnam
67fa06c09b
Refactor: Removed all scheduling logic from the scraper. From now on, the scraper expects arun_many to handle all scheduling. The scraper only does traversal, validation, compliance checks, URL filtering, scoring, etc. Reformatted some of the scraper files with the Black code formatter.
2025-01-21 17:49:51 +05:30
Aravind Karnam
7a5f83b76f
fix: Added browser config and crawler run config from 0.4.22
2024-12-18 10:33:09 +05:30
Aravind Karnam
ff731e4ea1
fixed the final scraper_quickstart.py example
2024-11-26 17:08:32 +05:30
Aravind Karnam
9530ded83a
fixed the final scraper_quickstart.py example
2024-11-26 17:05:54 +05:30
Aravind Karnam
f8e85b1499
Fixed a bug in _process_links, handled the case where url_scorer is passed as None, and renamed the scrapper folder to scraper.
2024-11-23 13:52:34 +05:30