docs(readme): update version and feature announcements for v0.4.3b1
Update README.md to announce version 0.4.3b1 release with new features including: - Memory Dispatcher System - Streaming Support - LLM-Powered Markdown Generation - Schema Generation - Robots.txt Compliance Add detailed version numbering explanation section to help users understand pre-release versions.
This commit is contained in:
70
README.md
70
README.md
@@ -20,9 +20,9 @@
|
||||
|
||||
Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant community. It delivers blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. Open source, flexible, and built for real-time performance, Crawl4AI empowers developers with unmatched speed, precision, and deployment ease.
|
||||
|
||||
[✨ Check out latest update v0.4.24x](#-recent-updates)
|
||||
[✨ Check out latest update v0.4.3b1x](#-recent-updates)
|
||||
|
||||
🎉 **Version 0.4.24x is out!** Major improvements in extraction strategies with enhanced JSON handling, SSL security, and Amazon product extraction. Plus, a completely revamped content filtering system! [Read the release notes →](https://docs.crawl4ai.com/blog)
|
||||
🎉 **Version 0.4.3b1 is out!** This release brings exciting new features like a Memory Dispatcher System, Streaming Support, LLM-Powered Markdown Generation, Schema Generation, and Robots.txt Compliance! [Read the release notes →](https://docs.crawl4ai.com/blog)
|
||||
|
||||
<details>
|
||||
<summary>🤓 <strong>My Personal Story</strong></summary>
|
||||
@@ -481,18 +481,66 @@ async def test_news_crawl():
|
||||
|
||||
</details>
|
||||
|
||||
## ✨ Recent Updates
|
||||
|
||||
## ✨ Recent Updates
|
||||
- **🚀 New Dispatcher System**: Scale to thousands of URLs with intelligent **memory monitoring**, **concurrency control**, and optional **rate limiting**. (See `MemoryAdaptiveDispatcher`, `SemaphoreDispatcher`, `RateLimiter`, `CrawlerMonitor`)
|
||||
- **⚡ Streaming Mode**: Process results **as they arrive** instead of waiting for an entire batch to complete. (Set `stream=True` in `CrawlerRunConfig`)
|
||||
- **🤖 Enhanced LLM Integration**:
|
||||
- **Automatic schema generation**: Create extraction rules from HTML using OpenAI or Ollama, no manual CSS/XPath needed.
|
||||
- **LLM-powered Markdown filtering**: Refine your markdown output with a new `LLMContentFilter` that understands content relevance.
|
||||
- **Ollama Support**: Use open-source or self-hosted models for private or cost-effective extraction.
|
||||
- **🏎️ Faster Scraping Option**: New `LXMLWebScrapingStrategy` offers **10-20x speedup** for large, complex pages (experimental).
|
||||
- **🤖 robots.txt Compliance**: Respect website rules with `check_robots_txt=True` and efficient local caching.
|
||||
- **➡️ URL Redirection Tracking**: The `final_url` field now captures the final destination after any redirects.
|
||||
- **🪞 Improved Mirroring**: The `LXMLWebScrapingStrategy` now has much greater fidelity, allowing for almost pixel-perfect mirroring of websites.
|
||||
- **📈 Enhanced Monitoring**: Track memory, CPU, and individual crawler status with `CrawlerMonitor`.
|
||||
- **📝 Improved Documentation**: More examples, clearer explanations, and updated tutorials.
|
||||
|
||||
- 🔒 **Enhanced SSL & Security**: New SSL certificate handling with custom paths and validation options for secure crawling
|
||||
- 🔍 **Smart Content Filtering**: Advanced filtering system with regex support and efficient chunking strategies
|
||||
- 📦 **Improved JSON Extraction**: Support for complex JSONPath, JSON-CSS, and Microdata extraction
|
||||
- 🏗️ **New Field Types**: Added `computed`, `conditional`, `aggregate`, and `template` field types
|
||||
- ⚡ **Performance Boost**: Optimized caching, parallel processing, and memory management
|
||||
- 🐛 **Better Error Handling**: Enhanced debugging capabilities with detailed error tracking
|
||||
- 🔐 **Security Features**: Improved input validation and safe expression evaluation
|
||||
Read the full details in our [0.4.248 Release Notes](https://github.com/unclecode/crawl4ai/blob/main/CHANGELOG.md).
|
||||
|
||||
Read the full details of this release in our [0.4.24 Release Notes](https://github.com/unclecode/crawl4ai/blob/main/CHANGELOG.md).
|
||||
Here's a clear markdown explanation for your users about version numbering:
|
||||
|
||||
|
||||
## Version Numbering in Crawl4AI
|
||||
|
||||
Crawl4AI follows standard Python version numbering conventions (PEP 440) to help users understand the stability and features of each release.
|
||||
|
||||
### Version Numbers Explained
|
||||
|
||||
Our version numbers follow this pattern: `MAJOR.MINOR.PATCH` (e.g., 0.4.3)
|
||||
|
||||
#### Pre-release Versions
|
||||
We use different suffixes to indicate development stages:
|
||||
|
||||
- `dev` (0.4.3dev1): Development versions, unstable
|
||||
- `a` (0.4.3a1): Alpha releases, experimental features
|
||||
- `b` (0.4.3b1): Beta releases, feature complete but needs testing
|
||||
- `rc` (0.4.3rc1): Release candidates, potential final version
|
||||
|
||||
#### Installation
|
||||
- Regular installation (stable version):
|
||||
```bash
|
||||
pip install -U crawl4ai
|
||||
```
|
||||
|
||||
- Install pre-release versions:
|
||||
```bash
|
||||
pip install crawl4ai --pre
|
||||
```
|
||||
|
||||
- Install specific version:
|
||||
```bash
|
||||
pip install crawl4ai==0.4.3b1
|
||||
```
|
||||
|
||||
#### Why Pre-releases?
|
||||
We use pre-releases to:
|
||||
- Test new features in real-world scenarios
|
||||
- Gather feedback before final releases
|
||||
- Ensure stability for production users
|
||||
- Allow early adopters to try new features
|
||||
|
||||
For production environments, we recommend using the stable version. For testing new features, you can opt-in to pre-releases using the `--pre` flag.
|
||||
|
||||
## 📖 Documentation & Roadmap
|
||||
|
||||
|
||||
Reference in New Issue
Block a user