crawl4ai/.gitignore at d09c611d152717bc0801b65cc45efeebff2e4399


UncleCode d09c611d15 feat(robots): add robots.txt compliance support

Add support for checking and respecting robots.txt rules before crawling websites:
- Implement RobotsParser class with SQLite caching
- Add check_robots_txt parameter to CrawlerRunConfig
- Integrate robots.txt checking in AsyncWebCrawler
- Update documentation with robots.txt compliance examples
- Add tests for robot parser functionality

The cache uses WAL mode for better concurrency and has a default TTL of 7 days.
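
To illustrate the caching behaviour described above, here is a minimal sketch using only the Python standard library. It is not the `RobotsParser` class from this commit: the class name `RobotsCacheSketch`, the `robots_cache` table, and the method names are hypothetical, and fetching robots.txt over the network is left to the caller. It assumes one cached rule set per domain, WAL journaling, and the 7-day default TTL mentioned in the commit message.

```python
import sqlite3
import time
from urllib.robotparser import RobotFileParser

# 7 days, matching the default TTL stated in the commit message.
DEFAULT_TTL_SECONDS = 7 * 24 * 60 * 60


class RobotsCacheSketch:
    """Hypothetical illustration only; not crawl4ai's RobotsParser."""

    def __init__(self, db_path, ttl=DEFAULT_TTL_SECONDS):
        self.ttl = ttl
        self.conn = sqlite3.connect(db_path)
        # WAL lets readers proceed while a writer is active.
        # It only takes effect for file-backed databases.
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS robots_cache ("
            "domain TEXT PRIMARY KEY, "
            "rules TEXT NOT NULL, "
            "fetched_at REAL NOT NULL)"
        )
        self.conn.commit()

    def store(self, domain, rules, now=None):
        """Save the raw robots.txt text for a domain with a timestamp."""
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO robots_cache VALUES (?, ?, ?)",
            (domain, rules, now),
        )
        self.conn.commit()

    def load(self, domain, now=None):
        """Return cached rules, or None if missing or older than the TTL."""
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT rules, fetched_at FROM robots_cache WHERE domain = ?",
            (domain,),
        ).fetchone()
        if row is None or now - row[1] > self.ttl:
            return None
        return row[0]

    def can_fetch(self, domain, url, user_agent="*", now=None):
        """Return True/False from cached rules, or None on a cache miss."""
        rules = self.load(domain, now)
        if rules is None:
            return None  # caller should fetch robots.txt, then call store()
        parser = RobotFileParser()
        parser.parse(rules.splitlines())
        return parser.can_fetch(user_agent, url)
```

A caller would check `can_fetch` before each request and, on a `None` result, download the site's robots.txt and pass its text to `store`. In crawl4ai itself, the commit message indicates that this check is switched on through the `check_robots_txt` parameter of `CrawlerRunConfig`.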

2025-01-21 17:54:13 +08:00

3.8 KiB
