docs(readme): update version and feature announcements for v0.4.3b1

Update README.md to announce version 0.4.3b1 release with new features including: - Memory Dispatcher System - Streaming Support - LLM-Powered Markdown Generation - Schema Generation - Robots.txt Compliance Add detailed version numbering explanation section to help users understand pre-release versions.
2025-01-21 21:20:04 +08:00
parent 16b8d4945b
commit 88697c4630
1 changed files with 59 additions and 11 deletions
--- a/README.md
+++ b/README.md
@@ -20,9 +20,9 @@

 Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant community. It delivers blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. Open source, flexible, and built for real-time performance, Crawl4AI empowers developers with unmatched speed, precision, and deployment ease.  

-[✨ Check out latest update v0.4.24x](#-recent-updates)
+[✨ Check out latest update v0.4.3b1x](#-recent-updates)

-🎉 **Version 0.4.24x is out!** Major improvements in extraction strategies with enhanced JSON handling, SSL security, and Amazon product extraction. Plus, a completely revamped content filtering system! [Read the release notes →](https://docs.crawl4ai.com/blog)
+🎉 **Version 0.4.3b1 is out!** This release brings exciting new features like a Memory Dispatcher System, Streaming Support, LLM-Powered Markdown Generation, Schema Generation, and Robots.txt Compliance! [Read the release notes →](https://docs.crawl4ai.com/blog)

 <details>
 <summary>🤓 <strong>My Personal Story</strong></summary>
@@ -481,18 +481,66 @@ async def test_news_crawl():

 </details>

-
 ## ✨ Recent Updates

- 🔒 **Enhanced SSL & Security**: New SSL certificate handling with custom paths and validation options for secure crawling
- 🔍 **Smart Content Filtering**: Advanced filtering system with regex support and efficient chunking strategies
- 📦 **Improved JSON Extraction**: Support for complex JSONPath, JSON-CSS, and Microdata extraction
- 🏗️ **New Field Types**: Added `computed`, `conditional`, `aggregate`, and `template` field types
- ⚡ **Performance Boost**: Optimized caching, parallel processing, and memory management
- 🐛 **Better Error Handling**: Enhanced debugging capabilities with detailed error tracking
- 🔐 **Security Features**: Improved input validation and safe expression evaluation
+-   **🚀 New Dispatcher System**: Scale to thousands of URLs with intelligent **memory monitoring**, **concurrency control**, and optional **rate limiting**. (See `MemoryAdaptiveDispatcher`, `SemaphoreDispatcher`, `RateLimiter`, `CrawlerMonitor`)
+-   **⚡ Streaming Mode**: Process results **as they arrive** instead of waiting for an entire batch to complete. (Set `stream=True` in `CrawlerRunConfig`)
+-   **🤖 Enhanced LLM Integration**:
+    -   **Automatic schema generation**: Create extraction rules from HTML using OpenAI or Ollama, no manual CSS/XPath needed.
+    -   **LLM-powered Markdown filtering**: Refine your markdown output with a new `LLMContentFilter` that understands content relevance.
+    -   **Ollama Support**: Use open-source or self-hosted models for private or cost-effective extraction.
+-   **🏎️ Faster Scraping Option**: New `LXMLWebScrapingStrategy` offers **10-20x speedup** for large, complex pages (experimental).
+-   **🤖 robots.txt Compliance**: Respect website rules with `check_robots_txt=True` and efficient local caching.
+-   **➡️ URL Redirection Tracking**: The `final_url` field now captures the final destination after any redirects.
+-   **🪞 Improved Mirroring**: The `LXMLWebScrapingStrategy` now has much greater fidelity, allowing for almost pixel-perfect mirroring of websites.
+-   **📈 Enhanced Monitoring**: Track memory, CPU, and individual crawler status with `CrawlerMonitor`.
+-   **📝 Improved Documentation**: More examples, clearer explanations, and updated tutorials.

-Read the full details of this release in our [0.4.24 Release Notes](https://github.com/unclecode/crawl4ai/blob/main/CHANGELOG.md).
+Read the full details in our [0.4.248 Release Notes](https://github.com/unclecode/crawl4ai/blob/main/CHANGELOG.md).
+
+Here's a clear markdown explanation for your users about version numbering:
+
+
+## Version Numbering in Crawl4AI
+
+Crawl4AI follows standard Python version numbering conventions (PEP 440) to help users understand the stability and features of each release.
+
+### Version Numbers Explained
+
+Our version numbers follow this pattern: `MAJOR.MINOR.PATCH` (e.g., 0.4.3)
+
+#### Pre-release Versions
+We use different suffixes to indicate development stages:
+
+- `dev` (0.4.3dev1): Development versions, unstable
+- `a` (0.4.3a1): Alpha releases, experimental features
+- `b` (0.4.3b1): Beta releases, feature complete but needs testing
+- `rc` (0.4.3rc1): Release candidates, potential final version
+
+#### Installation
+- Regular installation (stable version):
+  ```bash
+  pip install -U crawl4ai
+  ```
+
+- Install pre-release versions:
+  ```bash
+  pip install crawl4ai --pre
+  ```
+
+- Install specific version:
+  ```bash
+  pip install crawl4ai==0.4.3b1
+  ```
+
+#### Why Pre-releases?
+We use pre-releases to:
+- Test new features in real-world scenarios
+- Gather feedback before final releases
+- Ensure stability for production users
+- Allow early adopters to try new features
+
+For production environments, we recommend using the stable version. For testing new features, you can opt-in to pre-releases using the `--pre` flag.

 ## 📖 Documentation & Roadmap