# Crawl4AI 0.4.3: Major Performance Boost & LLM Integration We're excited to announce Crawl4AI 0.4.3, focusing on three key areas: Speed & Efficiency, LLM Integration, and Core Platform Improvements. This release significantly improves crawling performance while adding powerful new LLM-powered features. ## ⚡ Speed & Efficiency Improvements ### 1. Memory-Adaptive Dispatcher System The new dispatcher system provides intelligent resource management and real-time monitoring: ```python from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, DisplayMode from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher, CrawlerMonitor async def main(): urls = ["https://example1.com", "https://example2.com"] * 50 # Configure memory-aware dispatch dispatcher = MemoryAdaptiveDispatcher( memory_threshold_percent=80.0, # Auto-throttle at 80% memory check_interval=0.5, # Check every 0.5 seconds max_session_permit=20, # Max concurrent sessions monitor=CrawlerMonitor( # Real-time monitoring display_mode=DisplayMode.DETAILED ) ) async with AsyncWebCrawler() as crawler: results = await dispatcher.run_urls( urls=urls, crawler=crawler, config=CrawlerRunConfig() ) ``` ### 2. Streaming Support Process crawled URLs in real-time instead of waiting for all results: ```python config = CrawlerRunConfig(stream=True) async with AsyncWebCrawler() as crawler: async for result in await crawler.arun_many(urls, config=config): print(f"Got result for {result.url}") # Process each result immediately ``` ### 3. LXML-Based Scraping New LXML scraping strategy offering up to 20x faster parsing: ```python config = CrawlerRunConfig( scraping_strategy=LXMLWebScrapingStrategy(), cache_mode=CacheMode.ENABLED ) ``` ## 🤖 LLM Integration ### 1. LLM-Powered Markdown Generation Smart content filtering and organization using LLMs: ```python config = CrawlerRunConfig( markdown_generator=DefaultMarkdownGenerator( content_filter=LLMContentFilter( provider="openai/gpt-4o", instruction="Extract technical documentation and code examples" ) ) ) ``` ### 2. Automatic Schema Generation Generate extraction schemas instantly using LLMs instead of manual CSS/XPath writing: ```python schema = JsonCssExtractionStrategy.generate_schema( html_content, schema_type="CSS", query="Extract product name, price, and description" ) ``` ## 🔧 Core Improvements ### 1. Proxy Support & Rotation Integrated proxy support with automatic rotation and verification: ```python config = CrawlerRunConfig( proxy_config={ "server": "http://proxy:8080", "username": "user", "password": "pass" } ) ``` ### 2. Robots.txt Compliance Built-in robots.txt support with SQLite caching: ```python config = CrawlerRunConfig(check_robots_txt=True) result = await crawler.arun(url, config=config) if result.status_code == 403: print("Access blocked by robots.txt") ``` ### 3. URL Redirection Tracking Track final URLs after redirects: ```python result = await crawler.arun(url) print(f"Initial URL: {url}") print(f"Final URL: {result.redirected_url}") ``` ## Performance Impact - Memory usage reduced by up to 40% with adaptive dispatcher - Parsing speed increased up to 20x with LXML strategy - Streaming reduces memory footprint for large crawls by ~60% ## Getting Started ```bash pip install -U crawl4ai ``` For complete examples, check our [demo repository](https://github.com/unclecode/crawl4ai/examples). ## Stay Connected - Star us on [GitHub](https://github.com/unclecode/crawl4ai) - Follow [@unclecode](https://twitter.com/unclecode) - Join our [Discord](https://discord.gg/crawl4ai) Happy crawling! 🕷️