crawl4ai

Author	SHA1	Message	Date
Aravind Karnam	9ef43bc5f0	Refactor: Move adeep_crawl as method of crawler itself. Create attributes in CrawlResult to reconstruct the tree once deep crawling is completed	2025-01-29 15:58:21 +05:30
Aravind Karnam	84ffdaab9a	Refactor: Move adeep_crawl as method of crawler itself. Create attributes in CrawlResult to reconstruct the tree once deep crawling is completed	2025-01-29 13:06:09 +05:30
Aravind Karnam	78223bc847	feat: create ScraperPageResult model to attach score and depth attributes to yielded/returned crawl results	2025-01-28 16:47:30 +05:30
Aravind Karnam	85847ff13f	feat: 1. Make active_crawls into a dict instead of set and remove jobs array. Effective lookup and storage of active crawls and crawl control. 2. Put a lock on active_crawls, so simultaneous push and pop by coroutines don't cause a race condition. 3. Move the depth check logic outside the child link for loop, as source_url doesn't change in the loop.	2025-01-28 12:39:45 +05:30
Aravind Karnam	f34b4878cf	fix: code formatting	2025-01-28 10:00:01 +05:30
Aravind Karnam	0ff95c83bc	feat: change input params to scraper, Add asynchronous context manager to AsyncWebScraper, Optimise filter application	2025-01-27 18:13:33 +05:30
UncleCode	e6ef8d91ba	refactor(scraper): optimize URL validation and filter performance - Replace validators library with built-in urlparse for URL validation - Optimize filter statistics update logic for better performance - Add performance benchmarking suite for filters - Add execution time tracking to scraper examples - Update gitignore with windsurfrules BREAKING CHANGE: Removed dependency on validators library for URL validation	2025-01-22 19:45:56 +08:00
Aravind Karnam	6e78c56dda	Refactor: Removed all scheduling logic from scraper. From now on, the scraper expects arun_many to handle all scheduling. The scraper will only do traversal, validations, compliance checks, URL filtering, scoring, etc. Reformatted some of the scraper files with the Black code formatter	2025-01-21 18:44:43 +05:30
Aravind Karnam	67fa06c09b	Refactor: Removed all scheduling logic from scraper. From now on, the scraper expects arun_many to handle all scheduling. The scraper will only do traversal, validations, compliance checks, URL filtering, scoring, etc. Reformatted some of the scraper files with the Black code formatter	2025-01-21 17:49:51 +05:30
Aravind Karnam	7a5f83b76f	fix: Added browser config and crawler run config from 0.4.22	2024-12-18 10:33:09 +05:30
Aravind Karnam	ff731e4ea1	fixed the final scraper_quickstart.py example	2024-11-26 17:08:32 +05:30
Aravind Karnam	9530ded83a	fixed the final scraper_quickstart.py example	2024-11-26 17:05:54 +05:30
Aravind Karnam	f8e85b1499	Fixed a bug in _process_links, handled condition for when url_scorer is passed as None, renamed the scrapper folder to scraper.	2024-11-23 13:52:34 +05:30

13 Commits