Add release notes for v0.8.0, detailing breaking changes, security fixes, new features, bug fixes, and documentation updates

Documentation for v0.8.0 release:

- SECURITY.md: Security policy and vulnerability reporting guidelines
- RELEASE_NOTES_v0.8.0.md: Comprehensive release notes
- migration/v0.8.0-upgrade-guide.md: Step-by-step migration guide
- security/GHSA-DRAFT-RCE-LFI.md: GitHub security advisory drafts
- CHANGELOG.md: Updated with v0.8.0 changes

Breaking changes documented:
- Docker API hooks disabled by default (CRAWL4AI_HOOKS_ENABLED)
- file:// URLs blocked on Docker API endpoints

Security fixes credited to Neo by ProjectDiscovery
unclecode
2026-01-12 13:45:42 +00:00
parent 122b4fe3f0
commit 530cde351f
6 changed files with 877 additions and 335 deletions

@@ -5,6 +5,46 @@ All notable changes to Crawl4AI will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.8.0] - 2026-01-12
### Security
- **🔒 CRITICAL: Remote Code Execution Fix**: Removed `__import__` from hook allowed builtins
  - Prevents arbitrary module imports in user-provided hook code
  - Hooks now disabled by default via `CRAWL4AI_HOOKS_ENABLED` environment variable
  - Credit: Neo by ProjectDiscovery
- **🔒 HIGH: Local File Inclusion Fix**: Added URL scheme validation to Docker API endpoints
  - Blocks `file://`, `javascript:`, `data:` URLs on `/execute_js`, `/screenshot`, `/pdf`, `/html`
  - Only allows `http://`, `https://`, and `raw:` URLs
  - Credit: Neo by ProjectDiscovery
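
The scheme allowlist described in the Local File Inclusion fix can be illustrated with a minimal sketch. This is not the project's actual validation code: the helper name `is_allowed_url` and the constant `ALLOWED_PREFIXES` are hypothetical, and only the allowed and blocked schemes are taken from the entry above.

```python
# Hypothetical allowlist mirroring the schemes named in the changelog entry.
ALLOWED_PREFIXES = ("http://", "https://", "raw:")


def is_allowed_url(url: str) -> bool:
    """Return True only for URLs that start with an allowlisted scheme."""
    # Normalize case and surrounding whitespace so inputs such as
    # "FILE:///etc/passwd" or " file:///etc/passwd" cannot slip through.
    candidate = url.strip().lower()
    # str.startswith accepts a tuple and matches any of its entries.
    return candidate.startswith(ALLOWED_PREFIXES)
```

An allowlist is used here rather than a blocklist, so any scheme not explicitly named (for example `ftp://`) is rejected as well.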
### Breaking Changes
- **Docker API: Hooks disabled by default**: Set `CRAWL4AI_HOOKS_ENABLED=true` to enable
- **Docker API: file:// URLs blocked**: Use Python library directly for local file processing
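
To make the hook-related change concrete, the following sketch shows one way an opt-in environment flag and a builtins table without `__import__` could fit together. It is an assumption-laden illustration, not the Docker server's implementation: `SAFE_BUILTINS`, `hooks_enabled`, and `run_hook` are hypothetical names, and the accepted truthy value is assumed to be the literal `true` shown above.

```python
import os

# Hypothetical allowlist of builtins; __import__ is deliberately absent,
# so an `import` statement inside hook code fails.
SAFE_BUILTINS = {"len": len, "range": range, "str": str, "int": int, "print": print}


def hooks_enabled(environ=os.environ) -> bool:
    """Hooks stay off unless the variable is explicitly set to 'true'."""
    return environ.get("CRAWL4AI_HOOKS_ENABLED", "false").strip().lower() == "true"


def run_hook(source: str, environ=os.environ) -> dict:
    """Execute hook source with restricted builtins and return its namespace."""
    if not hooks_enabled(environ):
        raise PermissionError("hooks are disabled; set CRAWL4AI_HOOKS_ENABLED=true")
    # exec() uses the "__builtins__" entry of the globals dict as the builtins table.
    namespace = {"__builtins__": dict(SAFE_BUILTINS)}
    exec(source, namespace)
    return namespace
```

With the flag set, `run_hook("import os", ...)` raises `ImportError` because the import machinery cannot find `__import__`. Restricting builtins alone is not a complete sandbox, which is consistent with also disabling hooks by default.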
### Added
- **🚀 init_scripts for BrowserConfig**: Pre-page-load JavaScript injection for stealth evasions
- **🔄 CDP Connection Improvements**: WebSocket URL support, proper cleanup, browser reuse
- **💾 Crash Recovery for Deep Crawl**: `resume_state` and `on_state_change` for BFS/DFS/Best-First strategies
- **📄 PDF/MHTML for raw:/file:// URLs**: Generate PDFs and MHTML from cached HTML content
- **📸 Screenshots for raw:/file:// URLs**: Render cached HTML and capture screenshots
- **🔗 base_url Parameter**: Proper URL resolution for raw: HTML processing
- **⚡ Prefetch Mode**: Two-phase deep crawling with fast link extraction
- **🔀 Enhanced Proxy Support**: Improved proxy rotation and sticky sessions
- **🌐 HTTP Strategy Proxy Support**: Non-browser crawler now supports proxies
- **🖥️ Browser Pipeline for raw:/file://**: New `process_in_browser` parameter
- **📋 Smart TTL Cache for Sitemap Seeder**: `cache_ttl_hours` and `validate_sitemap_lastmod` parameters
- **📚 Security Documentation**: Added SECURITY.md with vulnerability reporting guidelines
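
As a rough illustration of what a TTL check such as the one behind `cache_ttl_hours` might look like, here is a minimal sketch. The function `is_cache_fresh` is hypothetical and does not reflect the sitemap seeder's real logic; the `validate_sitemap_lastmod` behavior is not modeled because the entry above does not describe it.

```python
from datetime import datetime, timedelta, timezone


def is_cache_fresh(cached_at: datetime, cache_ttl_hours: float, now: datetime) -> bool:
    """Return True while the cached entry is younger than the TTL."""
    # Both timestamps are expected to be timezone-aware so subtraction is well defined.
    return now - cached_at < timedelta(hours=cache_ttl_hours)


# Example: an entry cached at midnight UTC with a 24-hour TTL.
cached_at = datetime(2026, 1, 12, 0, 0, tzinfo=timezone.utc)
fresh = is_cache_fresh(cached_at, 24, cached_at + timedelta(hours=23))
stale = is_cache_fresh(cached_at, 24, cached_at + timedelta(hours=25))
```

Passing `now` explicitly keeps the check deterministic and easy to test.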
### Fixed
- **raw: URL Parsing**: Fixed truncation of `raw:` content at the first `#` character (e.g. CSS color codes like `#eee`)
- **Caching System**: Various improvements to cache validation and persistence
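
The `raw:` truncation can be reproduced with a short sketch. One common way this kind of bug arises is passing the whole string through a generic URL parser, which treats `#` as the start of a fragment. Whether that was the actual cause in Crawl4AI is an assumption, and `extract_raw_html` is a hypothetical helper that handles only the `raw:` prefix named above.

```python
from urllib.parse import urlsplit

RAW_PREFIX = "raw:"


def extract_raw_html(url: str) -> str:
    """Return the HTML payload of a raw: URL without URL-parsing it."""
    if not url.startswith(RAW_PREFIX):
        raise ValueError("not a raw: URL")
    # Slice off the prefix only; '#' and '?' inside the HTML are left alone.
    return url[len(RAW_PREFIX):]


html = "<p style='color:#eee'>hi</p>"
url = RAW_PREFIX + html

# A generic URL parser treats everything after '#' as a fragment,
# so the path stops at "color:".
truncated = urlsplit(url).path

# Plain prefix stripping keeps the payload intact.
intact = extract_raw_html(url)
```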
### Documentation
- Multi-sample schema generation section
- URL seeder smart TTL cache parameters
- v0.8.0 migration guide
- Security policy and disclosure process
## [Unreleased]
### Added