crawl4ai

Author	SHA1	Message	Date
UncleCode	d36ef3d424	refactor(install): use chromium as default browser - Remove Chrome installation to reduce setup time - Keep Chromium as default browser for better cross-platform compatibility	2025-01-01 17:19:54 +08:00
UncleCode	4a4f613238	docs: simplify installation instructions - Add crawl4ai-doctor command to verify installation - Update browser installation instructions in README and docs - Move optional features to documentation - Add manual browser installation steps as fallback - Update getting-started guide with verification step	2025-01-01 16:54:03 +08:00
UncleCode	dc6a24618e	feat(install): add doctor command and force browser install - Add --force flag to Playwright browser installation - Add doctor command to test crawling functionality - Install Chrome and Chromium browsers explicitly - Add crawl4ai-doctor entry point in pyproject.toml - Implement simple health check focused on crawling test	2025-01-01 16:33:43 +08:00
UncleCode	74a7c6dbb6	feat(install): specify chrome and chromium for playwright - Install Chrome and Chromium browsers explicitly - Split browser installation into separate commands	2025-01-01 16:10:08 +08:00
UncleCode	67f65f958b	refactor(build): simplify setup.py configuration - Remove dependency management from setup.py - Remove entry points configuration (moved to pyproject.toml) - Keep minimal setup.py for backwards compatibility - Clean up package metadata structure	2025-01-01 15:52:01 +08:00
UncleCode	78b6ba5cef	build: modernize package configuration with pyproject.toml - Add pyproject.toml for PEP 517 build system support - Configure dependencies, scripts, and metadata in pyproject.toml - Set Python requirement to >=3.9 and add support up to 3.13 - Keep setup.py for backwards compatibility - Move package dependencies and entry points to pyproject.toml	2025-01-01 15:45:27 +08:00
UncleCode	3f019d34cc	docs: update project description emojis - Change project description emojis from 🔥🕷️ to 🚀🤖 - Update emojis consistently in both setup.py and pyproject.toml	2025-01-01 15:39:33 +08:00
UncleCode	304260e484	refactor(install): simplify Playwright installation error handling - Remove setup_docs() call from post_install() - Simplify error messages for Playwright installation failures - Use sys.executable for more accurate Python path in error messages - Add --with-deps flag to Playwright install command	2025-01-01 15:33:36 +08:00
UncleCode	704bd66b63	Upgrade Playwright installation command to install dependencies	2025-01-01 15:23:16 +08:00
UncleCode	1acc162c18	Bump version v0.4.241	2025-01-01 15:16:06 +08:00
UncleCode	553c97a0c1	Fix bug reported in issue https://github.com/unclecode/crawl4ai/issues/396	2025-01-01 15:15:14 +08:00
UncleCode	bd66befcf0	Fix issue in 0.4.24 walkthrough	2024-12-31 21:07:58 +08:00
UncleCode	19b0a5ae82	Update 0.4.24 walkthrough	2024-12-31 21:01:46 +08:00
UncleCode	bd71f7f4ea	Add 0.4.24 walkthrough	2024-12-31 20:22:33 +08:00
UncleCode	171ce25ba6	Fix typo in CHANGELOG	2024-12-31 19:49:00 +08:00
UncleCode	6c5a44f774	chore: bump version to 0.4.25	2024-12-31 19:45:48 +08:00
UncleCode	5c3c05bf93	docs: update README badges and Docker section, reorganize documentation structure	2024-12-31 19:45:02 +08:00
UncleCode	67d0999bc3	chore: resolve merge conflicts for v0.4.24 v0.4.24	2024-12-31 19:24:03 +08:00
UncleCode	553a4622bf	chore: prepare for version 0.4.24	2024-12-31 19:18:36 +08:00
UncleCode	6f81ef006d	Remove .local folder from remote repository	2024-12-31 17:37:50 +08:00
UncleCode	a04870a662	Remove .do folder	2024-12-31 17:37:14 +08:00
UncleCode	f7d26390c5	Remove .do folder	2024-12-31 17:36:22 +08:00
UncleCode	141783fb2d	Remove .do folder from remote repository	2024-12-31 17:35:57 +08:00
UncleCode	2fedd4876e	Update gitignore	2024-12-31 17:35:34 +08:00
UncleCode	e187b0aaf0	update gitignore	2024-12-31 17:34:31 +08:00
UncleCode	e95374d7c6	Delete .do/deploy.template.yaml (#394)	2024-12-31 17:33:59 +08:00
UncleCode	8f2d0cda2f	Remove .do folder from remote	2024-12-31 17:32:55 +08:00
UncleCode	9d261d2b9c	Recreate .do folder with temporary file	2024-12-31 17:32:44 +08:00
UncleCode	7792fe0e4c	Recreate .do folder for removal	2024-12-31 17:31:51 +08:00
UncleCode	86259244e4	Add ".do" to gitignore	2024-12-31 17:30:09 +08:00
UncleCode	0ec593fa90	Update the Tutorial section for new document version	2024-12-31 17:27:31 +08:00
UncleCode	7391d6be73	Update README.md (#390)	2024-12-30 21:24:43 +08:00
UncleCode	e4e23065f1	Update README.md (#389)	2024-12-30 21:24:06 +08:00
UncleCode	fb33a24891	Commit Message: - Added examples for Amazon product data extraction methods - Updated configuration options and enhance documentation - Minor refactoring for improved performance and readability - Cleaned up version control settings.	2024-12-29 20:05:18 +08:00
Robin Singh	78768fd714	Update simple-crawling.md (#379) In the comprehensive example, AttributeError: type object 'CacheMode' has no attribute 'ENABLE'. Did you mean: 'ENABLED'?	2024-12-27 17:42:59 +08:00
UncleCode	f2d9912697	Renames browser_config param to config in AsyncWebCrawler Standardizes parameter naming convention across the codebase by renaming browser_config to the more concise config in AsyncWebCrawler constructor. Updates all documentation examples and internal usages to reflect the new parameter name for consistency. Also improves hook execution by adding url/response parameters to goto hooks and fixes parameter ordering in before_return_html hook.	2024-12-26 16:34:36 +08:00
UncleCode	9a4ed6bbd7	Commit Message: Enhance crawler capabilities and documentation - Added SSL certificate extraction in AsyncWebCrawler. - Introduced new content filters and chunking strategies for more robust data extraction. - Updated documentation management to streamline user experience.	2024-12-26 15:17:07 +08:00
UncleCode	d5ed451299	Enhance crawler capabilities and documentation - Add llm.txt generator - Added SSL certificate extraction in AsyncWebCrawler. - Introduced new content filters and chunking strategies for more robust data extraction. - Updated documentation.	2024-12-25 21:34:31 +08:00
Haopeng138	bacbeb3ed4	Fix #340 example llm_extraction (#358 ) @Haopeng138 Thank you so much. They are still part of the library. I forgot to update them since I moved the asynchronous versions years ago. I really appreciate it. I have to say that I feel weak in the documentation. That's why I spent a lot of time on it last week. Now, when you mention some of the things in the example folder, I realize I forgot about the example folder. I'll try to update it more. If you find anything else, please help and support. Thank you. I will add your name to contributor name as well.	2024-12-24 19:56:07 +08:00
UncleCode	84b311760f	Commit Message: Enhance Crawl4AI with CLI and documentation updates - Implemented Command-Line Interface (CLI) in `crawl4ai/cli.py` - Added chunking strategies and their documentation in `llm.txt`	2024-12-21 14:26:56 +08:00
UncleCode	8fbc2e0463	Refactor deployment configuration and enhance browser debugging options	2024-12-20 20:35:28 +08:00
UncleCode	849765712f	Enhance Crawl4AI with new features and documentation - Fix crawler text mode for improved performance; cover missing `srcset` and `data_srcset` attributes in image tags. - Introduced Managed Browsers for enhanced crawling experience. - Updated documentation for clearer navigation on configuration. - Changed 'text_only' to 'text_mode' in configuration and methods. - Improved performance and relevance in content filtering strategies.	2024-12-19 21:02:29 +08:00
UncleCode	393bb911c0	Enhance crawler strategies with new features - ReImplemented JsonXPathExtractionStrategy for enhanced JSON data extraction. - Updated existing extraction strategies for better performance. - Improved handling of response status codes during crawls.	2024-12-17 22:40:10 +08:00
UncleCode	4a5f1aebee	Bump version to 0.4.23	2024-12-16 18:53:11 +08:00
UncleCode	a11d9646e3	Enhance crawler features and improve documentation - Added detailed CrawlerRunConfig parameters documentation. - Introduced plans for real-time event-driven crawling. - Updated async logger default level to DEBUG for better insights. - Improved structure and readability in configuration file. - Enhanced documentation on future capabilities in new blog entries.	2024-12-16 18:52:51 +08:00
UncleCode	ed7bc1909c	Bump version to 0.4.22	2024-12-15 19:49:38 +08:00
UncleCode	e9e5b5642d	Fix js_snippet issue in 0.4.21, bump to 0.4.22	2024-12-15 19:49:30 +08:00
UncleCode	7524aa7b5e	Feature: Add Markdown generation to CrawlerRunConfig - Added markdown generator parameter to CrawlerRunConfig in `async_configs.py`. - Implemented logic for Markdown generation in content scraping in `async_webcrawler.py`. - Updated version number to 0.4.21 in `__version__.py`.	2024-12-13 21:51:38 +08:00
UncleCode	7af1d32ef6	Update README for version 0.4.2: Reflect new features and enhancements	2024-12-12 20:18:44 +08:00
UncleCode	399af801a1	Merge branch 'next'	2024-12-12 20:17:27 +08:00

508 Commits