refactor(deep-crawl): add max_pages limit and improve crawl control

Add max_pages parameter to all deep crawling strategies to limit total pages crawled.
Add score_threshold parameter to BFS/DFS strategies for quality control.
Remove legacy parameter handling in AsyncWebCrawler.
Improve error handling and logging in crawl strategies.

BREAKING CHANGE: Removed support for legacy parameters in AsyncWebCrawler.run_many()
This commit is contained in:
UncleCode
2025-03-03 21:51:11 +08:00
parent c612f9a852
commit d024749633
7 changed files with 372 additions and 91 deletions

View File

@@ -5,6 +5,27 @@ All notable changes to Crawl4AI will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## Version 0.5.0 (2025-03-02)
### Added
- *(profiles)* Add BrowserProfiler class for dedicated browser profile management
- *(cli)* Add interactive profile management to CLI with rich UI
- *(profiles)* Add ability to crawl directly from profile management interface
- *(browser)* Support identity-based browsing with persistent profiles
### Changed
- *(browser)* Refactor profile management from ManagedBrowser to BrowserProfiler class
- *(cli)* Enhance CLI with profile selection and status display for crawling
- *(examples)* Update identity-based browsing example to use BrowserProfiler class
- *(docs)* Update identity-based crawling documentation
### Fixed
- *(browser)* Fix profile detection and management on different platforms
- *(cli)* Fix CLI command structure for better user experience
## Version 0.5.0 (2025-02-21)