crawl4ai

Author	SHA1	Message	Date
UncleCode	7aaaaae461	feat(browser-farm): Add Docker browser support for remote crawling Implement initial MVP for Docker-based browser management in Crawl4ai, enabling remote browser execution in containerized environments. Key Changes: - Add browser_farm module with Docker support components: * BrowserFarmService: Manages browser endpoints * DockerBrowser: Handles Docker browser communication * Basic health check implementation * Dockerfile with optimized Chrome/Playwright setup: - Based on python:3.10-slim for minimal size - Includes all required system dependencies - Auto-installs crawl4ai and sets up Playwright - Configures Chrome with remote debugging - Uses socat for port forwarding (9223) - Update core components: * Rename use_managed_browser to use_remote_browser for clarity * Modify BrowserManager to support Docker mode * Add Docker configuration in BrowserConfig * Update context handling for remote browsers - Add example: * hello_world_docker.py demonstrating Docker browser usage Technical Details: - Docker container exposes port 9223 (mapped to host:9333) - Uses CDP (Chrome DevTools Protocol) for remote connection - Maintains compatibility with existing managed browser features - Simplified endpoint management for MVP phase - Optimized Docker setup: * Minimal dependencies installation * Proper Chrome flags for containerized environment * Headless mode with GPU disabled * Security considerations (no-sandbox mode) Testing: - Extensive Docker configuration testing and optimization - Verified with hello_world_docker.py example - Confirmed remote browser connection and crawling functionality - Tested basic health checks This is the first step towards a scalable browser farm solution, setting up the foundation for future enhancements like resource monitoring, multiple browser instances, and container lifecycle management.	2025-01-02 18:41:36 +08:00
UncleCode	fb33a24891	Commit Message: - Added examples for Amazon product data extraction methods - Updated configuration options and enhanced documentation - Minor refactoring for improved performance and readability - Cleaned up version control settings.	2024-12-29 20:05:18 +08:00
UncleCode	852729ff38	feat(docker): add Docker Compose configurations for local and hub deployment; enhance GPU support checks in Dockerfile feat(requirements): update requirements.txt to include snowballstemmer fix(version_manager): correct version parsing to use __version__.__version__ feat(main): introduce chunking strategy and content filter in CrawlRequest model feat(content_filter): enhance BM25 algorithm with priority tag scoring for improved content relevance feat(logger): implement new async logger engine replacing print statements throughout library fix(database): resolve version-related deadlock and circular lock issues in database operations docs(docker): expand Docker deployment documentation with usage instructions for Docker Compose	2024-11-18 21:00:06 +08:00
UncleCode	4b45b28f25	feat(docs): enhance deployment documentation with one-click setup, API security details, and Docker Compose examples	2024-11-16 18:44:47 +08:00
UncleCode	9139ef3125	feat(docker): update Dockerfile for improved installation process and enhance deployment documentation with Docker Compose setup and API token security	2024-11-16 18:19:44 +08:00
UncleCode	67a23c3182	feat(core): Release v0.3.73 with Browser Takeover and Docker Support Major changes: - Add browser takeover feature using CDP for authentic browsing - Implement Docker support with full API server documentation - Enhance Markdown with tag preservation system - Improve parallel crawling performance This release focuses on authenticity and scalability, introducing the ability to use users' own browsers while providing containerized deployment options. Breaking changes include modified browser handling and API response structure. See CHANGELOG.md for detailed migration guide.	2024-11-05 20:04:18 +08:00

6 Commits
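
Commit 7aaaaae461 describes a container that exposes port 9223 (mapped to host port 9333) and is reached over CDP, with a basic health check. The sketch below is one possible way to probe such an endpoint from the host. It is not the code from that commit and does not use any crawl4ai API. It assumes the browser is reachable on localhost and relies only on Chrome's standard DevTools HTTP discovery endpoint, /json/version, whose reply contains a webSocketDebuggerUrl field. The names version_url, parse_ws_endpoint, and probe are hypothetical helpers introduced for illustration.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import Optional

# Host port from the commit message (container 9223 -> host 9333).
HOST_PORT = 9333


def version_url(host: str = "localhost", port: int = HOST_PORT) -> str:
    """Build the URL of Chrome's DevTools HTTP discovery endpoint."""
    return f"http://{host}:{port}/json/version"


def parse_ws_endpoint(payload: str) -> Optional[str]:
    """Extract the browser-level WebSocket URL from a /json/version reply."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return data.get("webSocketDebuggerUrl")


def probe(host: str = "localhost", port: int = HOST_PORT,
          timeout: float = 2.0) -> Optional[str]:
    """Return the CDP WebSocket endpoint if the browser answers, else None."""
    try:
        with urllib.request.urlopen(version_url(host, port), timeout=timeout) as resp:
            return parse_ws_endpoint(resp.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, or a malformed reply: treat as unhealthy.
        return None


if __name__ == "__main__":
    endpoint = probe()
    print("healthy:" if endpoint else "unreachable", endpoint or version_url())
```

A non-None result from probe() suggests the remote browser is up. The returned WebSocket URL is the kind of endpoint a CDP client would connect to.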