crawl4ai

Author	SHA1	Message	Date
UncleCode	bae4665949	feat(scraper): Enhance URL filtering and scoring systems Implement comprehensive URL filtering and scoring capabilities: Filters: - Add URLPatternFilter with glob/regex support - Implement ContentTypeFilter with MIME type checking - Add DomainFilter for domain control - Create FilterChain with stats tracking Scorers: - Complete KeywordRelevanceScorer implementation - Add PathDepthScorer for URL structure scoring - Implement ContentTypeScorer for file type priorities - Add FreshnessScorer for date-based scoring - Add DomainAuthorityScorer for domain weighting - Create CompositeScorer for combined strategies Features: - Add statistics tracking for both filters and scorers - Implement logging support throughout - Add resource cleanup methods - Create comprehensive documentation - Include performance optimizations Tests and docs included. Note: Review URL normalization overlap with recent crawler changes. - Quick Start is created and added	2024-11-08 18:45:12 +08:00
unclecode	4750810a67	Enhance AsyncWebCrawler with smart waiting and screenshot capabilities - Implement smart_wait function in AsyncPlaywrightCrawlerStrategy - Add screenshot support to AsyncCrawlResponse and AsyncWebCrawler - Improve error handling and timeout management in crawling process - Fix typo in CrawlResult model (responser_headers -> response_headers) - Update .gitignore to exclude additional files - Adjust import path in test_basic_crawling.py	2024-10-02 17:34:56 +08:00
unclecode	8b6e88c85c	Update .gitignore to ignore temporary and test directories	2024-09-26 15:09:49 +08:00
unclecode	c37614cbc8	Add Async Version, JsonCss Extractor	2024-09-03 01:27:00 +08:00
unclecode	f6e59157bf	- Test all methods - Update index.html - Update Readme - Resolve some bugs	2024-05-14 21:27:41 +08:00

5 Commits