Resolves merge conflict in README.md by removing outdated version 0.4.24x information and keeping current version 0.4.3bx details. Updates release notes description to reflect current features including Memory Dispatcher System, Streaming Support, and other improvements.
No breaking changes.
Update version numbers to v0.4.3bx throughout README.md
Fix contributing guidelines link to point to CONTRIBUTORS.md
Update Aravind's role in CONTRIBUTORS.md to Head of Community and Product
Add pre-release installation instructions
Fix minor formatting in personal story section
No breaking changes
Renames the final_url field to redirected_url across all components to maintain
consistent terminology throughout the codebase. This change affects:
- AsyncCrawlResponse model
- AsyncPlaywrightCrawlerStrategy
- Documentation and examples
No functional changes, purely naming consistency improvement.
Implements dynamic proxy rotation functionality with authentication support and IP verification. Updates include:
- Added proxy rotation demo in features example
- Updated proxy configuration handling in BrowserManager
- Added proxy rotation documentation
- Updated README with new proxy rotation feature
- Bumped version to 0.4.3b2
This change enables users to dynamically switch between proxies and verify IP addresses for each request.
Update README.md to announce version 0.4.3b1 release with new features including:
- Memory Dispatcher System
- Streaming Support
- LLM-Powered Markdown Generation
- Schema Generation
- Robots.txt Compliance
Add detailed version numbering explanation section to help users understand pre-release versions.
Thank you for pointing this out. I am creating a contributing guide, which is why I changed the name to the contributors, but I forgot to update some other places. Thanks again.
Update all documentation URLs from crawl4ai.com/mkdocs to docs.crawl4ai.com across README, examples, and documentation files. This change reflects the new documentation hosting domain.
Also add todo/ directory to .gitignore.
Update all documentation URLs from crawl4ai.com/mkdocs to docs.crawl4ai.com
Improve badges styling and layout in documentation
Increase code font size in documentation CSS
BREAKING CHANGE: Documentation URLs have changed from crawl4ai.com/mkdocs to docs.crawl4ai.com
Revise the README's personal story section to better reflect the project's
origins, motivation, and vision for open-source data accessibility. Add more
detail about the creator's background and the project's mission to
democratize AI through open data access.
Also includes a minor TODO comment addition in async crawler strategy.
Reorganize documentation into core/advanced/extraction sections for better navigation.
Update terminal theme styles and add rich library for better CLI output.
Remove redundant tutorial files and consolidate content into core sections.
Add personal story to index page for project context.
BREAKING CHANGE: Documentation structure has been significantly reorganized
Enhance Async Crawler with storage state handling
- Updated Async Crawler to support storage state management.
- Added error handling for URL validation in Async Web Crawler.
- Modified README logo and improved .gitignore entries.
- Fixed issues in multiple files for better code robustness.
### New Features:
- **Text-Only Mode**: Added support for text-only crawling by disabling images, JavaScript, GPU, and other non-essential features.
- **Light Mode**: Optimized browser settings to reduce resource usage and improve efficiency during crawling.
- **Dynamic Viewport Adjustment**: Automatically adjusts viewport dimensions based on content size, ensuring accurate rendering and scaling.
- **Full Page Scanning**: Introduced a feature to scroll and capture dynamic content for pages with infinite scroll or lazy-loading elements.
- **Session Management**: Added `create_session` method for creating and managing browser sessions with unique IDs.
### Improvements:
- Unified viewport handling across contexts by dynamically setting dimensions using `self.viewport_width` and `self.viewport_height`.
- Enhanced logging and error handling for viewport adjustments, page scanning, and content evaluation.
- Reduced resource usage with additional browser flags for both `light_mode` and `text_only` configurations.
- Improved handling of cookies, headers, and proxies in session creation.
### Refactoring:
- Removed hardcoded viewport dimensions and replaced them with dynamic configurations.
- Cleaned up unused and commented-out code for better readability and maintainability.
- Introduced defaults for frequently used parameters like `delay_before_return_html`.
### Fixes:
- Resolved potential inconsistencies in viewport handling.
- Improved robustness of content loading and dynamic adjustments to avoid failures and timeouts.
### Docs Update:
- Updated schema usage in `quickstart_async.py` example:
- Changed `OpenAIModelFee.schema()` to `OpenAIModelFee.model_json_schema()` for compatibility.
- Enhanced LLM extraction instruction documentation.
This commit introduces significant enhancements to improve efficiency, flexibility, and reliability of the crawler strategy.
- Introduced the PruningContentFilter for better content relevance.
- Implemented comprehensive unit tests for verification of functionality.
- Enhanced existing BM25ContentFilter tests for edge case coverage.
- Updated documentation to include usage examples for new filter.
- Added new Docker commands for platform-specific builds.
- Updated README with comprehensive installation and setup instructions.
- Introduced `post_install` method in setup script for automation.
- Refined migration processes with enhanced error logging.
- Bump version to 0.3.746 and updated dependencies.
- Added a post-installation setup script for initialization.
- Updated README with installation notes for Playwright setup.
- Enhanced migration logging for better error visibility.
- Added 'pydantic' to requirements.
- Bumped version to 0.3.746.