feat(mcp): Implement MCP protocol and enhance server capabilities
This commit introduces several significant enhancements to the Crawl4AI Docker deployment:
1. Add MCP Protocol Support:
- Implement WebSocket and SSE transport layers for MCP server communication
- Create mcp_bridge.py to expose existing API endpoints via MCP protocol
- Add comprehensive tests for both the WebSocket and SSE transports
2. Enhance Docker Server Capabilities:
- Add PDF generation endpoint with file saving functionality
- Add screenshot capture endpoint with configurable wait time
- Implement JavaScript execution endpoint for dynamic page interaction
- Add intelligent file path handling for saving generated assets
3. Improve Search and Context Functionality:
- Implement syntax-aware code function chunking using AST parsing
- Add BM25-based intelligent document search with relevance scoring
- Create separate code and documentation context endpoints
- Enhance response format with structured results and scores
4. Rename and Fix File Organization:
- Fix typo in test_docker_config_gen.py filename
- Update import statements and dependencies
- Add FileResponse for context endpoints
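The syntax-aware function chunking mentioned in item 3 can be sketched roughly as follows. This is a minimal illustration using only the standard-library `ast` module; the function name `chunk_functions` is hypothetical and the actual implementation in this commit may differ.

```python
import ast


def chunk_functions(source: str) -> list[str]:
    """Return the source text of every function definition found in `source`.

    Hypothetical helper: each sync or async function becomes one chunk.
    Nested functions are returned as their own chunks in addition to
    appearing inside their parent's chunk.
    """
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment slices the original text using the node's
            # line/column offsets (decorators are not included).
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append(segment)
    return chunks
```

Chunking on function boundaries keeps each search result a complete, readable unit, which fixed-size text windows cannot guarantee.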
These enhancements significantly improve Crawl4AI's machine-to-machine communication
capabilities, making it better suited to integration with LLM agents
and other automated systems.
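For readers unfamiliar with BM25 relevance scoring, the idea behind the document search can be sketched as below. This is a plain standard-library version of the Okapi BM25 formula with whitespace tokenization; the name `bm25_scores` and the defaults `k1=1.5`, `b=0.75` are assumptions for illustration, not the values or code used in this commit.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (illustrative only)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting documents by these scores in descending order gives a ranked result list; documents sharing no terms with the query score exactly zero.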
The CHANGELOG has been updated to list the key features and improvements in this release.
24
CHANGELOG.md
@@ -5,6 +5,30 @@ All notable changes to Crawl4AI will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+### [Feature] 2025-04-21
+- Implemented MCP protocol for machine-to-machine communication
+- Added WebSocket and SSE transport for MCP server
+- Exposed server endpoints via MCP protocol
+- Created tests for MCP socket and SSE communication
+- Enhanced Docker server with file handling and intelligent search
+- Added PDF and screenshot endpoints with file saving capability
+- Added JavaScript execution endpoint for page interaction
+- Implemented advanced context search with BM25 and code chunking
+- Added file path output support for generated assets
+- Improved server endpoints and API surface
+- Added intelligent context search with query filtering
+- Added syntax-aware code function chunking
+- Implemented efficient HTML processing pipeline
+
+### [Refactor] 2025-04-20
+- Replaced crawler_manager.py with simpler crawler_pool.py implementation
+- Added global page semaphore for hard concurrency cap
+- Implemented browser pool with idle cleanup
+- Added playground UI for testing and stress testing
+- Updated API handlers to use pooled crawlers
+- Enhanced logging levels and symbols
+- Added memory tests and stress test utilities
+
 ### [Added] 2025-04-17
 - Added content source selection feature for markdown generation
 - New `content_source` parameter allows choosing between `cleaned_html`, `raw_html`, and `fit_html`