Release/v0.7.8 (#1662)

* Fix: Use correct URL variable for raw HTML extraction (#1116)

- Prevents full HTML content from being passed as URL to extraction strategies
- Added unit tests to verify raw HTML and regular URL processing

Fix: Wrong URL variable used for extraction of raw HTML

* Fix #1181: Preserve whitespace in code blocks during HTML scraping

  The remove_empty_elements_fast() method was removing whitespace-only
  span elements inside <pre> and <code> tags, causing import statements
  like "import torch" to become "importtorch". Now skips elements inside
  code blocks where whitespace is significant.
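The idea behind this fix can be sketched with the standard library alone. This is not the project's actual `remove_empty_elements_fast()` implementation: the function name `remove_empty_elements`, the `CODE_TAGS` set, and the use of `xml.etree.ElementTree` are assumptions made for illustration. The sketch removes childless, whitespace-only elements unless an ancestor is `<pre>` or `<code>`.

```python
import xml.etree.ElementTree as ET

# Tags whose descendants carry significant whitespace (assumed set for this sketch).
CODE_TAGS = {"pre", "code"}


def remove_empty_elements(root):
    """Drop childless, whitespace-only elements, except inside <pre>/<code>."""

    def visit(node, in_code):
        # Once we enter a code block, every descendant is protected.
        in_code = in_code or node.tag in CODE_TAGS
        for child in list(node):
            visit(child, in_code)
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if is_empty and not in_code:
                # Preserve the text that followed the removed element.
                tail = child.tail or ""
                index = list(node).index(child)
                if index == 0:
                    node.text = (node.text or "") + tail
                else:
                    previous = node[index - 1]
                    previous.tail = (previous.tail or "") + tail
                node.remove(child)

    visit(root, False)
    return root


html = (
    "<div><p> </p>"
    "<pre><span>import</span><span> </span><span>torch</span></pre>"
    "</div>"
)
root = remove_empty_elements(ET.fromstring(html))
code_text = "".join(root.find("pre").itertext())
```

Here the whitespace-only `<p>` is removed, while the whitespace-only `<span>` inside `<pre>` survives, so the extracted text keeps the space between `import` and `torch`. A production version would also need to exempt void elements such as `<br>` and `<img>`, which this simplified sketch would treat as empty.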

* Refactor Pydantic model configuration to use ConfigDict for arbitrary types

* Fix EmbeddingStrategy: Uncomment response handling for the variations and clean up mock data. ref #1621

* Fix: permission issues with .cache/url_seeder and other runtime cache dirs. ref #1638

* fix: ensure BrowserConfig.to_dict serializes proxy_config

* feat: make LLM backoff configurable end-to-end

- extend LLMConfig with backoff delay/attempt/factor fields and thread them
  through LLMExtractionStrategy, LLMContentFilter, table extraction, and
  Docker API handlers
- expose the backoff parameter knobs on perform_completion_with_backoff/aperform_completion_with_backoff
  and document them in the md_v2 guides
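As a rough illustration of what delay, attempt, and factor knobs typically control, here is a generic exponential backoff helper. It is a sketch under stated assumptions, not the signature of `perform_completion_with_backoff`: the names `call_with_backoff`, `base_delay`, `max_attempts`, and `factor` are hypothetical, and the real `LLMConfig` field names may differ.

```python
import time


def call_with_backoff(fn, *, base_delay=2.0, max_attempts=3, factor=2.0,
                      sleep=time.sleep):
    """Call fn(), retrying on any exception with exponentially growing delays.

    All parameter names are hypothetical and chosen for this sketch only.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            sleep(delay)
            delay *= factor  # Grow the wait before the next attempt.


# Usage: a fake call that fails twice, with sleep replaced by a recorder.
calls = {"count": 0}
recorded_delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


result = call_with_backoff(
    flaky, base_delay=1.0, max_attempts=3, factor=2.0,
    sleep=recorded_delays.append,
)
```

Injecting `sleep` keeps the example fast and makes the delay schedule observable; a real caller would leave the default `time.sleep` in place and would likely retry only on rate-limit errors rather than on every exception.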

* reproduced AttributeError from #1642

* pass timeout parameter to docker client request

* added missing deep crawling objects to init

* generalized query in ContentRelevanceFilter to be a str or list

* import modules from enhanceable deserialization

* parameterized tests

* Fix: capture current page URL to reflect JavaScript navigation and add test for delayed redirects. ref #1268
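To show why the current page URL matters, the sketch below resolves relative links against the URL the browser ended up on rather than the one originally requested. The helper `resolve_links` and the example URLs are hypothetical; in a Playwright-based crawler the final URL would presumably come from `page.url` after navigation settles.

```python
from urllib.parse import urljoin


def resolve_links(hrefs, requested_url, final_url=None):
    """Resolve hrefs against the final URL if known, else the requested URL."""
    base = final_url or requested_url
    return [urljoin(base, href) for href in hrefs]


requested = "https://example.com/start"
# Assumed outcome of a JavaScript redirect, for illustration only.
final = "https://example.com/docs/v2/index.html"

before_fix = resolve_links(["guide.html", "/about"], requested)
after_fix = resolve_links(["guide.html", "/about"], requested, final)
```

Root-relative links such as `/about` resolve identically in both cases, but `guide.html` resolves to `https://example.com/guide.html` against the requested URL and to `https://example.com/docs/v2/guide.html` against the final one, which is the kind of discrepancy a delayed redirect can introduce.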

* refactor: replace PyPDF2 with pypdf across the codebase. ref #1412

* announcement: add application form for cloud API closed beta

* Release v0.7.8: Stability & Bug Fix Release

- Updated version to 0.7.8
- Introduced focused stability release addressing 11 community-reported bugs.
- Key fixes include Docker API improvements, LLM extraction enhancements, URL handling corrections, and dependency updates.
- Added detailed release notes for v0.7.8 in the blog and created a dedicated verification script to ensure all fixes are functioning as intended.
- Updated documentation to reflect recent changes and improvements.

* docs: add section for Crawl4AI Cloud API closed beta with application link

* fix: add disk cleanup step to Docker workflow

---------

Co-authored-by: rbushria <rbushri@gmail.com>
Co-authored-by: AHMET YILMAZ <tawfik@kidocode.com>
Co-authored-by: Soham Kukreti <kukretisoham@gmail.com>
Co-authored-by: Chris Murphy <chris.murphy@klaviyo.com>
Co-authored-by: Aravind Karnam <aravind.karanam@gmail.com>
Nasrin
2025-12-11 18:04:52 +08:00
committed by GitHub
parent 835e3c56fe
commit a87e8c1c9e
32 changed files with 2123 additions and 135 deletions


@@ -20,25 +20,35 @@ Ever wondered why your AI coding assistant struggles with your library despite c
## Latest Release
### [Crawl4AI v0.7.8: Stability & Bug Fix Release](../blog/release-v0.7.8.md)
*December 2025*
Crawl4AI v0.7.8 is a focused stability release addressing 11 bugs reported by the community. While there are no new features, these fixes resolve important issues affecting Docker deployments, LLM extraction, URL handling, and dependency compatibility.
Key highlights:
- **🐳 Docker API Fixes**: ContentRelevanceFilter deserialization, ProxyConfig serialization, cache folder permissions
- **🤖 LLM Improvements**: Configurable rate limiter backoff, HTML input format support, raw HTML URL handling
- **🔗 URL Handling**: Correct relative URL resolution after JavaScript redirects
- **📦 Dependencies**: Replaced deprecated PyPDF2 with pypdf, Pydantic v2 ConfigDict compatibility
- **🧠 AdaptiveCrawler**: Fixed query expansion to actually use LLM instead of mock data
[Read full release notes →](../blog/release-v0.7.8.md)
## Recent Releases
### [Crawl4AI v0.7.7: The Self-Hosting & Monitoring Update](../blog/release-v0.7.7.md)
*November 14, 2025*
Crawl4AI v0.7.7 transforms Docker into a complete self-hosting platform with enterprise-grade real-time monitoring, comprehensive observability, and full operational control. Experience complete visibility into your crawling infrastructure!
Key highlights:
- **📊 Real-time Monitoring Dashboard**: Interactive web UI with live system metrics and browser pool visibility
- **🔌 Comprehensive Monitor API**: Complete REST API for programmatic access to all monitoring data
- **⚡ WebSocket Streaming**: Real-time updates every 2 seconds for custom dashboards
- **🔥 Smart Browser Pool**: 3-tier architecture (permanent/hot/cold) with automatic promotion and cleanup
- **🧹 Janitor System**: Automatic resource management with event logging
- **🎮 Control Actions**: Manual browser management (kill, restart, cleanup) via API
- **📈 Production Ready**: Prometheus integration, alerting patterns, and 6 critical metrics for ops excellence
- **🐛 Critical Fixes**: Async LLM extraction (#1055), DFS crawling (#1607), viewport config, and security updates
[Read full release notes →](../blog/release-v0.7.7.md)
### [Crawl4AI v0.7.6: The Webhook Infrastructure Update](../blog/release-v0.7.6.md)
*October 22, 2025*
@@ -66,15 +76,17 @@ Key highlights:
[Read full release notes →](../blog/release-v0.7.5.md)
### [Crawl4AI v0.7.4: The Intelligent Table Extraction & Performance Update](../blog/release-v0.7.4.md)
*August 17, 2025*
LLM-powered table extraction with intelligent chunking, performance improvements for concurrent crawling, enhanced browser management, and critical stability fixes.
[Read full release notes →](../blog/release-v0.7.4.md)
---
## Older Releases
| Version | Date | Highlights |
|---------|------|------------|
| [v0.7.4](../blog/release-v0.7.4.md) | August 2025 | LLM-powered table extraction, performance improvements |
| [v0.7.3](../blog/release-v0.7.3.md) | July 2025 | Undetected browser, multi-URL config, memory monitoring |
| [v0.7.1](../blog/release-v0.7.1.md) | June 2025 | Bug fixes and stability improvements |
| [v0.7.0](../blog/release-v0.7.0.md) | May 2025 | Adaptive crawling, virtual scroll, link analysis |
## Project History
Curious about how Crawl4AI has evolved? Check out our [complete changelog](https://github.com/unclecode/crawl4ai/blob/main/CHANGELOG.md) for a detailed history of all versions and updates.