crawl4ai

Author	SHA1	Message	Date
UncleCode	b11a91e1dd	Update gitignore	2025-01-04 16:07:18 +08:00
UncleCode	7aaaaae461	feat(browser-farm): Add Docker browser support for remote crawling Implement initial MVP for Docker-based browser management in Crawl4ai, enabling remote browser execution in containerized environments. Key Changes: - Add browser_farm module with Docker support components: * BrowserFarmService: Manages browser endpoints * DockerBrowser: Handles Docker browser communication * Basic health check implementation * Dockerfile with optimized Chrome/Playwright setup: - Based on python:3.10-slim for minimal size - Includes all required system dependencies - Auto-installs crawl4ai and sets up Playwright - Configures Chrome with remote debugging - Uses socat for port forwarding (9223) - Update core components: * Rename use_managed_browser to use_remote_browser for clarity * Modify BrowserManager to support Docker mode * Add Docker configuration in BrowserConfig * Update context handling for remote browsers - Add example: * hello_world_docker.py demonstrating Docker browser usage Technical Details: - Docker container exposes port 9223 (mapped to host:9333) - Uses CDP (Chrome DevTools Protocol) for remote connection - Maintains compatibility with existing managed browser features - Simplified endpoint management for MVP phase - Optimized Docker setup: * Minimal dependencies installation * Proper Chrome flags for containerized environment * Headless mode with GPU disabled * Security considerations (no-sandbox mode) Testing: - Extensive Docker configuration testing and optimization - Verified with hello_world_docker.py example - Confirmed remote browser connection and crawling functionality - Tested basic health checks This is the first step towards a scalable browser farm solution, setting up the foundation for future enhancements like resource monitoring, multiple browser instances, and container lifecycle management.	2025-01-02 18:41:36 +08:00
UncleCode	24b3da717a	refactor(): - Update hello world example	2025-01-02 17:53:30 +08:00
UncleCode	98acc4254d	refactor: - Update hello_world.py example	2025-01-01 19:47:22 +08:00
UncleCode	eac78c7993	Merge branch 'vr0.4.246'	2025-01-01 19:43:01 +08:00
UncleCode	da1bc0f7bf	Update version file	2025-01-01 19:42:35 +08:00
UncleCode	aa4f92f458	refactor(crawler): - Update hello_world example with proper content filtering	2025-01-01 19:39:42 +08:00
UncleCode	a96e05d4ae	refactor(crawler): optimize response handling and default settings - Set wait_for_images default to false for better performance - Simplify response attribute copying in AsyncWebCrawler - Update hello_world example with proper content filtering	2025-01-01 19:39:02 +08:00
UncleCode	5c95fd92b4	fix(browser): resolve merge conflicts in browser channel configuration	2025-01-01 19:05:47 +08:00
UncleCode	4cb2a62551	Update README	2025-01-01 18:59:55 +08:00
UncleCode	5b4fad9e25	- Bump version to 0.4.244	2025-01-01 18:58:43 +08:00
UncleCode	ea0ac25f38	refactor(browser): Update browser channel default to 'chromium' in BrowserConfig.from_args method	2025-01-01 18:58:15 +08:00
UncleCode	7688aca7d6	Update Version	2025-01-01 18:44:27 +08:00
UncleCode	a7215ad972	fix(browser): update default browser channel to chromium and simplify channel selection logic	2025-01-01 18:38:33 +08:00
Arno.Edwards	8e2403a7da	fix(browser)!: default to Chromium channel for new headless mode (#387) BREAKING CHANGE: Updated `chrome_channel` to "chromium" to fix compatibility with the new Chromium headless implementation. This resolves the error `playwright._impl._errors.Error: BrowserType.launch: Chromium distribution 'chrome' is not found`, caused by the removal of the old headless mode in Chromium. With this change, channels like "chrome" and "msedge" now default to the new headless mode, aligning with upstream updates in Playwright v1.49. The new headless mode uses the real Chrome browser, offering more authenticity, reliability, and feature parity with the full browser. Additionally, simplified fallback logic by directly assigning `chrome_channel` based on `browser_type` or defaulting to "chromium". Refer to: - https://playwright.dev/python/docs/browsers#chromium - https://github.com/microsoft/playwright/issues/33566	2025-01-01 18:37:50 +08:00
UncleCode	318554e6bf	Merge branch 'v0.4.243' v0.4.243	2025-01-01 18:11:15 +08:00
UncleCode	c64979b8dd	docs: update README	2025-01-01 18:10:38 +08:00
UncleCode	bfe21b29d4	build: streamline package discovery and bump to v0.4.243 - Replace explicit package listing with setuptools.find - Include all crawl4ai.* packages automatically - Use `packages = {find = {where = ["."], include = ["crawl4ai*"]}}` syntax - Bump version to 0.4.243 This change simplifies package maintenance by automatically discovering all subpackages under crawl4ai namespace instead of listing them manually.	2025-01-01 17:55:59 +08:00
UncleCode	e9d9a6ffe8	fix: ensure js_snippet files are included in package - Add js_snippet to packages list in pyproject.toml - Verified JS files are properly included in installed package - Bump version to 0.4.242	2025-01-01 17:38:59 +08:00
UncleCode	5313c71a0d	docs: update README browser installation command - Remove Chrome from manual installation command - Keep Chromium as the only default browser in docs	2025-01-01 17:24:44 +08:00
UncleCode	d36ef3d424	refactor(install): use chromium as default browser - Remove Chrome installation to reduce setup time - Keep Chromium as default browser for better cross-platform compatibility	2025-01-01 17:19:54 +08:00
UncleCode	4a4f613238	docs: simplify installation instructions - Add crawl4ai-doctor command to verify installation - Update browser installation instructions in README and docs - Move optional features to documentation - Add manual browser installation steps as fallback - Update getting-started guide with verification step	2025-01-01 16:54:03 +08:00
UncleCode	dc6a24618e	feat(install): add doctor command and force browser install - Add --force flag to Playwright browser installation - Add doctor command to test crawling functionality - Install Chrome and Chromium browsers explicitly - Add crawl4ai-doctor entry point in pyproject.toml - Implement simple health check focused on crawling test	2025-01-01 16:33:43 +08:00
UncleCode	74a7c6dbb6	feat(install): specify chrome and chromium for playwright - Install Chrome and Chromium browsers explicitly - Split browser installation into separate commands	2025-01-01 16:10:08 +08:00
UncleCode	67f65f958b	refactor(build): simplify setup.py configuration - Remove dependency management from setup.py - Remove entry points configuration (moved to pyproject.toml) - Keep minimal setup.py for backwards compatibility - Clean up package metadata structure	2025-01-01 15:52:01 +08:00
UncleCode	78b6ba5cef	build: modernize package configuration with pyproject.toml - Add pyproject.toml for PEP 517 build system support - Configure dependencies, scripts, and metadata in pyproject.toml - Set Python requirement to >=3.9 and add support up to 3.13 - Keep setup.py for backwards compatibility - Move package dependencies and entry points to pyproject.toml	2025-01-01 15:45:27 +08:00
UncleCode	3f019d34cc	docs: update project description emojis - Change project description emojis from 🔥🕷️ to 🚀🤖 - Update emojis consistently in both setup.py and pyproject.toml	2025-01-01 15:39:33 +08:00
UncleCode	304260e484	refactor(install): simplify Playwright installation error handling - Remove setup_docs() call from post_install() - Simplify error messages for Playwright installation failures - Use sys.executable for more accurate Python path in error messages - Add --with-deps flag to Playwright install command	2025-01-01 15:33:36 +08:00
UncleCode	704bd66b63	Upgrade Playwright installation command to install dependencies	2025-01-01 15:23:16 +08:00
UncleCode	1acc162c18	Bump version to v0.4.241	2025-01-01 15:16:06 +08:00
UncleCode	553c97a0c1	Fix bug reported in issue https://github.com/unclecode/crawl4ai/issues/396	2025-01-01 15:15:14 +08:00
UncleCode	bd66befcf0	Fix issue in 0.4.24 walkthrough	2024-12-31 21:07:58 +08:00
UncleCode	3e769a9c6c	Fix issue in 0.4.24 walkthrough	2024-12-31 21:07:33 +08:00
UncleCode	19b0a5ae82	Update 0.4.24 walkthrough	2024-12-31 21:01:46 +08:00
UncleCode	bd71f7f4ea	Add 0.4.24 walkthrough	2024-12-31 20:22:33 +08:00
UncleCode	171ce25ba6	Fix typo in CHANGELOG	2024-12-31 19:49:00 +08:00
UncleCode	6c5a44f774	chore: bump version to 0.4.25	2024-12-31 19:45:48 +08:00
UncleCode	5c3c05bf93	docs: update README badges and Docker section, reorganize documentation structure	2024-12-31 19:45:02 +08:00
UncleCode	67d0999bc3	chore: resolve merge conflicts for v0.4.24 v0.4.24	2024-12-31 19:24:03 +08:00
UncleCode	553a4622bf	chore: prepare for version 0.4.24	2024-12-31 19:18:36 +08:00
UncleCode	6f81ef006d	Remove .local folder from remote repository	2024-12-31 17:37:50 +08:00
UncleCode	a04870a662	Remove .do folder	2024-12-31 17:37:14 +08:00
UncleCode	f7d26390c5	Remove .do folder	2024-12-31 17:36:22 +08:00
UncleCode	141783fb2d	Remove .do folder from remote repository	2024-12-31 17:35:57 +08:00
UncleCode	2fedd4876e	Update gitignore	2024-12-31 17:35:34 +08:00
UncleCode	e187b0aaf0	update gitignore	2024-12-31 17:34:31 +08:00
UncleCode	e95374d7c6	Delete .do/deploy.template.yaml (#394)	2024-12-31 17:33:59 +08:00
UncleCode	8f2d0cda2f	Remove .do folder from remote	2024-12-31 17:32:55 +08:00
UncleCode	9d261d2b9c	Recreate .do folder with temporary file	2024-12-31 17:32:44 +08:00
UncleCode	7792fe0e4c	Recreate .do folder for removal	2024-12-31 17:31:51 +08:00

529 Commits