crawl4ai

Author	SHA1	Message	Date
UncleCode	88697c4630	docs(readme): update version and feature announcements for v0.4.3b1 Update README.md to announce version 0.4.3b1 release with new features including: - Memory Dispatcher System - Streaming Support - LLM-Powered Markdown Generation - Schema Generation - Robots.txt Compliance Add detailed version numbering explanation section to help users understand pre-release versions.	2025-01-21 21:20:04 +08:00
UncleCode	e8b4ac6046	docs(urls): update documentation URLs to new domain Update all documentation URLs from crawl4ai.com/mkdocs to docs.crawl4ai.com Improve badges styling and layout in documentation Increase code font size in documentation CSS BREAKING CHANGE: Documentation URLs have changed from crawl4ai.com/mkdocs to docs.crawl4ai.com	2025-01-09 16:22:41 +08:00
UncleCode	051a6cf974	docs(readme): update personal story and project vision Revise the README's personal story section to better reflect the project's origins, motivation, and vision for open-source data accessibility. Add more detail about the creator's background and the project's mission to democratize AI through open data access. Also includes a minor TODO comment addition in async crawler strategy.	2025-01-08 21:13:31 +08:00
UncleCode	56fa4e1e42	refactor(doc) Update README	2025-01-07 20:53:10 +08:00
UncleCode	ca3e33122e	refactor(docs): reorganize documentation structure and update styles Reorganize documentation into core/advanced/extraction sections for better navigation. Update terminal theme styles and add rich library for better CLI output. Remove redundant tutorial files and consolidate content into core sections. Add personal story to index page for project context. BREAKING CHANGE: Documentation structure has been significantly reorganized	2025-01-07 20:49:50 +08:00
UncleCode	4cb2a62551	Update README	2025-01-01 18:59:55 +08:00
UncleCode	c64979b8dd	docs: update README	2025-01-01 18:10:38 +08:00
UncleCode	5313c71a0d	docs: update README browser installation command - Remove Chrome from manual installation command - Keep Chromium as the only default browser in docs	2025-01-01 17:24:44 +08:00
UncleCode	4a4f613238	docs: simplify installation instructions - Add crawl4ai-doctor command to verify installation - Update browser installation instructions in README and docs - Move optional features to documentation - Add manual browser installation steps as fallback - Update getting-started guide with verification step	2025-01-01 16:54:03 +08:00
UncleCode	171ce25ba6	Fix typo in CHANGELOG	2024-12-31 19:49:00 +08:00
UncleCode	5c3c05bf93	docs: update README badges and Docker section, reorganize documentation structure	2024-12-31 19:45:02 +08:00
UncleCode	67d0999bc3	chore: resolve merge conflicts for v0.4.24	2024-12-31 19:24:03 +08:00
UncleCode	553a4622bf	chore: prepare for version 0.4.24	2024-12-31 19:18:36 +08:00
UncleCode	7391d6be73	Update README.md (#390)	2024-12-30 21:24:43 +08:00
UncleCode	e4e23065f1	Update README.md (#389)	2024-12-30 21:24:06 +08:00
UncleCode	7af1d32ef6	Update README for version 0.4.2: Reflect new features and enhancements	2024-12-12 20:18:44 +08:00
UncleCode	2d31915f0a	Commit Message: Enhance Async Crawler with storage state handling - Updated Async Crawler to support storage state management. - Added error handling for URL validation in Async Web Crawler. - Modified README logo and improved .gitignore entries. - Fixed issues in multiple files for better code robustness.	2024-12-09 20:04:59 +08:00
UncleCode	c51e901f68	feat: Enhance AsyncPlaywrightCrawlerStrategy with text-only and light modes, dynamic viewport adjustment, and session management ### New Features: - Text-Only Mode: Added support for text-only crawling by disabling images, JavaScript, GPU, and other non-essential features. - Light Mode: Optimized browser settings to reduce resource usage and improve efficiency during crawling. - Dynamic Viewport Adjustment: Automatically adjusts viewport dimensions based on content size, ensuring accurate rendering and scaling. - Full Page Scanning: Introduced a feature to scroll and capture dynamic content for pages with infinite scroll or lazy-loading elements. - Session Management: Added `create_session` method for creating and managing browser sessions with unique IDs. ### Improvements: - Unified viewport handling across contexts by dynamically setting dimensions using `self.viewport_width` and `self.viewport_height`. - Enhanced logging and error handling for viewport adjustments, page scanning, and content evaluation. - Reduced resource usage with additional browser flags for both `light_mode` and `text_only` configurations. - Improved handling of cookies, headers, and proxies in session creation. ### Refactoring: - Removed hardcoded viewport dimensions and replaced them with dynamic configurations. - Cleaned up unused and commented-out code for better readability and maintainability. - Introduced defaults for frequently used parameters like `delay_before_return_html`. ### Fixes: - Resolved potential inconsistencies in viewport handling. - Improved robustness of content loading and dynamic adjustments to avoid failures and timeouts. ### Docs Update: - Updated schema usage in `quickstart_async.py` example: - Changed `OpenAIModelFee.schema()` to `OpenAIModelFee.model_json_schema()` for compatibility. - Enhanced LLM extraction instruction documentation. 
This commit introduces significant enhancements to improve efficiency, flexibility, and reliability of the crawler strategy.	2024-12-08 20:04:44 +08:00
UncleCode	b02544bc0b	docs: update README and blog for version 0.4.0 release, highlighting new features and improvements	2024-12-03 21:28:52 +08:00
unclecode	293f299c08	Add PruningContentFilter with unit tests and update documentation - Introduced the PruningContentFilter for better content relevance. - Implemented comprehensive unit tests for verification of functionality. - Enhanced existing BM25ContentFilter tests for edge case coverage. - Updated documentation to include usage examples for new filter.	2024-12-01 19:17:33 +08:00
UncleCode	1def53b7fe	docs: update Raspberry Pi section to indicate upcoming support	2024-11-29 20:53:43 +08:00
UncleCode	f9c98a377d	Enhance Docker support and improve installation process - Added new Docker commands for platform-specific builds. - Updated README with comprehensive installation and setup instructions. - Introduced `post_install` method in setup script for automation. - Refined migration processes with enhanced error logging. - Bump version to 0.3.746 and updated dependencies.	2024-11-29 20:52:51 +08:00
UncleCode	d202f3539b	Enhance installation and migration processes - Added a post-installation setup script for initialization. - Updated README with installation notes for Playwright setup. - Enhanced migration logging for better error visibility. - Added 'pydantic' to requirements. - Bumped version to 0.3.746.	2024-11-29 18:48:44 +08:00
UncleCode	c8485776fe	docs: update README to reflect latest version v0.3.745	2024-11-28 20:04:16 +08:00
UncleCode	48d43c14b1	docs: fix link formatting for recent updates section in README	2024-11-28 19:33:02 +08:00
UncleCode	776efa74a4	docs: fix link formatting for recent updates section in README	2024-11-28 19:32:32 +08:00
UncleCode	b14e83f499	docs: fix link formatting for recent updates section in README	2024-11-28 19:31:09 +08:00
UncleCode	0cbd594512	Merge branch 'next' - Update README, and quickstart examples	2024-11-28 16:43:16 +08:00
UncleCode	efe93a5f57	docs: enhance README with development TODOs and refine mission statement for clarity	2024-11-28 16:41:11 +08:00
UncleCode	3fda66b85b	docs: refine README content for clarity and conciseness, improving descriptions and formatting	2024-11-28 16:36:24 +08:00
UncleCode	ddfb6707b4	docs: update README to reflect new branding and improve section headings for clarity	2024-11-28 16:34:08 +08:00
UncleCode	a69f7a9531	fix: correct typo in function documentation for clarity and accuracy	2024-11-28 16:31:41 +08:00
UncleCode	d583aa43ca	refactor: update cache handling in quickstart_async example to use CacheMode enum	2024-11-28 15:53:25 +08:00
UncleCode	3abb573142	docs: update README for version 0.3.743 with improved formatting and contributor acknowledgments	2024-11-28 13:07:59 +08:00
UncleCode	d556dada9f	docs: update README to keep details open for extraction capabilities, browser integration, input/output flexibility, utility & debugging, security & accessibility, community & documentation, and cutting-edge features	2024-11-28 13:07:33 +08:00
UncleCode	ce7d49484f	docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments	2024-11-28 13:06:46 +08:00
UncleCode	e4acd18429	docs: update README for version 0.3.743 with new features, enhancements, and contributor acknowledgments	2024-11-28 13:06:30 +08:00
zhounan	73661f7d1f	docs: enhance development installation instructions (#286) Thanks for your contribution. I'm merging your changes and I'll add your name to our contributor list. Thank you so much.	2024-11-27 15:04:20 +08:00
unclecode	d7c5b900b8	feat: add support for arm64 platform in Docker commands and update INSTALL_TYPE variable in docker-compose	2024-11-24 19:35:53 +08:00
UncleCode	8dea3f470f	chore: update README to include new features and improvements for version 0.3.74	2024-11-22 18:50:12 +08:00
UncleCode	e02935dc5b	chore: update README to reflect new features and improvements in version 0.3.74	2024-11-22 18:49:22 +08:00
UncleCode	571dda6549	Update README	2024-11-22 18:27:43 +08:00
UncleCode	006bee4a5a	feat: enhance image processing capabilities - Enhanced image processing with srcset support and validation checks for better image selection.	2024-11-22 16:00:17 +08:00
UncleCode	b6af94cbbb	Merge remote-tracking branch 'origin/main' into 0.3.74	2024-11-18 21:15:04 +08:00
UncleCode	152ac35bc2	feat(docs): update README for version 0.3.74 with new features and improvements fix(version): update version number to 0.3.74 refactor(async_webcrawler): enhance logging and add domain-based request delay	2024-11-17 21:09:26 +08:00
UncleCode	df63a40606	feat(docs): update examples and documentation to replace bypass_cache with cache_mode for improved clarity	2024-11-17 19:44:45 +08:00
UncleCode	3a524a3bdd	fix(docs): remove unnecessary blank line in README for improved readability	2024-11-17 16:00:39 +08:00
UncleCode	4b45b28f25	feat(docs): enhance deployment documentation with one-click setup, API security details, and Docker Compose examples	2024-11-16 18:44:47 +08:00
UncleCode	bf91adf3f8	fix: Resolve unexpected BrowserContext closure during crawl in Docker - Removed __del__ method in AsyncPlaywrightCrawlerStrategy to ensure reliable browser lifecycle management by using explicit context managers. - Added process monitoring in ManagedBrowser to detect and log unexpected terminations of the browser subprocess. - Updated Docker configuration to expose port 9222 for remote debugging and allocate extra shared memory to prevent browser crashes. - Improved error handling and resource cleanup for browser instances, particularly in Docker environments. Resolves Issue #256	2024-11-13 15:37:16 +08:00
UncleCode	8c22396d8b	Merge pull request #234 from devatnull/patch-1 Fix typo: scrapper → scraper	2024-11-12 08:37:14 +01:00


169 Commits