Commit Graph

  • 141783fb2d Remove .do folder from remote repository UncleCode 2024-12-31 17:35:57 +08:00
  • 2fedd4876e Update gitignore UncleCode 2024-12-31 17:35:34 +08:00
  • e187b0aaf0 update gitignore UncleCode 2024-12-31 17:34:31 +08:00
  • e95374d7c6 Delete .do/deploy.template.yaml (#394) UncleCode 2024-12-31 10:33:59 +01:00
  • 406702a77f Delete .do/deploy.template.yaml [unclecode-patch-5] UncleCode 2024-12-31 17:33:39 +08:00
  • 8f2d0cda2f Remove .do folder from remote UncleCode 2024-12-31 17:32:55 +08:00
  • 9d261d2b9c Recreate .do folder with temporary file UncleCode 2024-12-31 17:32:44 +08:00
  • 7792fe0e4c Recreate .do folder for removal UncleCode 2024-12-31 17:31:51 +08:00
  • 86259244e4 Add ".do" to gitignore UncleCode 2024-12-31 17:30:09 +08:00
  • 0ec593fa90 Update the Tutorial section for new document version UncleCode 2024-12-31 17:27:31 +08:00
  • 7391d6be73 Update README.md (#390) UncleCode 2024-12-30 14:24:43 +01:00
  • 494ee32619 Update README.md [unclecode-patch-4] UncleCode 2024-12-30 21:24:30 +08:00
  • e4e23065f1 Update README.md (#389) UncleCode 2024-12-30 14:24:06 +01:00
  • 8a4952c128 Update README.md [unclecode-patch-3] UncleCode 2024-12-30 21:23:19 +08:00
  • fb33a24891 Commit Message: - Added examples for Amazon product data extraction methods - Updated configuration options and enhance documentation - Minor refactoring for improved performance and readability - Cleaned up version control settings. UncleCode 2024-12-29 20:05:18 +08:00
  • 78768fd714 Update simple-crawling.md (#379) Robin Singh 2024-12-27 09:42:59 +00:00
  • f2d9912697 Renames browser_config param to config in AsyncWebCrawler UncleCode 2024-12-26 16:34:36 +08:00
  • 9a4ed6bbd7 Commit Message: Enhance crawler capabilities and documentation UncleCode 2024-12-26 15:17:07 +08:00
  • d5ed451299 Enhance crawler capabilities and documentation - Add llm.txt generator - Added SSL certificate extraction in AsyncWebCrawler. - Introduced new content filters and chunking strategies for more robust data extraction. - Updated documentation. UncleCode 2024-12-25 21:34:31 +08:00
  • d97a075082 Delete a.md [unclecode-patch-2] UncleCode 2024-12-25 19:43:39 +08:00
  • bacbeb3ed4 Fix #340 example llm_extraction (#358) Haopeng138 2024-12-24 12:56:07 +01:00
  • 84b311760f Commit Message: Enhance Crawl4AI with CLI and documentation updates - Implemented Command-Line Interface (CLI) in crawl4ai/cli.py - Added chunking strategies and their documentation in llm.txt UncleCode 2024-12-21 14:26:56 +08:00
  • 8fbc2e0463 Refactor deployment configuration and enhance browser debugging options UncleCode 2024-12-20 20:35:28 +08:00
  • 849765712f Enhance Crawl4AI with new features and documentation UncleCode 2024-12-19 21:02:29 +08:00
  • 7a5f83b76f fix: Added browser config and crawler run config from 0.4.22 Aravind Karnam 2024-12-18 10:33:09 +05:30
  • 393bb911c0 Enhance crawler strategies with new features - ReImplemented JsonXPathExtractionStrategy for enhanced JSON data extraction. - Updated existing extraction strategies for better performance. - Improved handling of response status codes during crawls. UncleCode 2024-12-17 22:40:10 +08:00
  • 7c0fa269a6 Merge pull request #9 from aravindkarnam/main aravind 2024-12-17 18:43:36 +05:30
  • 4a5f1aebee Bump version to 0.4.23 UncleCode 2024-12-16 18:53:11 +08:00
  • a11d9646e3 Enhance crawler features and improve documentation UncleCode 2024-12-16 18:52:51 +08:00
  • ed7bc1909c Bump version to 0.4.22 UncleCode 2024-12-15 19:49:38 +08:00
  • e9e5b5642d Fix js_snipprt issue 0.4.21 bump to 0.4.22 UncleCode 2024-12-15 19:49:30 +08:00
  • 7524aa7b5e Feature: Add Markdown generation to CrawlerRunConfig UncleCode 2024-12-13 21:51:38 +08:00
  • b1ac4fe023 Merge branch 'main' into ssh-server [ssh-server] Unclecode 2024-12-12 12:25:26 +00:00
  • a3c92141a1 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-12-12 12:25:01 +00:00
  • 3fd777dd6f remove crawl endpoints Unclecode 2024-12-12 12:24:13 +00:00
  • 7af1d32ef6 Update README for version 0.4.2: Reflect new features and enhancements [0.4.2] UncleCode 2024-12-12 20:18:44 +08:00
  • 399af801a1 Merge branch 'next' UncleCode 2024-12-12 20:17:27 +08:00
  • 4a72c5ea6e Add release notes and documentation for version 0.4.2: Configurable Crawlers, Session Management, and Enhanced Screenshot/PDF features UncleCode 2024-12-12 20:15:50 +08:00
  • 20d6f5fdf4 Merge branch 'main' of https://github.com/unclecode/crawl4ai UncleCode 2024-12-12 19:58:01 +08:00
  • 3d69715dba chore: Update .gitignore to include new files and directories UncleCode 2024-12-12 19:57:59 +08:00
  • de1766d565 Bump version to 0.4.2 UncleCode 2024-12-12 19:35:30 +08:00
  • 0982c639ae Enhance AsyncWebCrawler and related configurations UncleCode 2024-12-12 19:35:09 +08:00
  • 5188b7a6a0 Add full-page screenshot and PDF export features - Introduced a new approach for capturing full-page screenshots by exporting them as PDFs first, enhancing reliability and performance. - Added documentation for the feature in docs/examples/full_page_screenshot_and_pdf_export.md. - Refactored perform_completion_with_backoff in crawl4ai/utils.py to include necessary extra parameters. - Updated quickstart_async.py to utilize LLM extraction with refined arguments. UncleCode 2024-12-10 20:59:31 +08:00
  • 759164831d Update async_webcrawler.py (#337) lvzhengri 2024-12-10 20:56:52 +08:00
  • 5431fa2d0c Add PDF & screenshot functionality, new tutorial UncleCode 2024-12-10 20:10:39 +08:00
  • e130fd8db9 Implement new async crawler features and stability updates UncleCode 2024-12-10 17:55:29 +08:00
  • ded554d334 Fixed typo (#324) Mohammed 2024-12-09 07:17:43 -05:00
  • aadbcb3481 fix: Improve image loading handling by adding timeout for wait_for_function in AsyncPlaywrightCrawlerStrategy [0.4.1] UncleCode 2024-12-09 20:06:29 +08:00
  • 2d31915f0a Commit Message: Enhance Async Crawler with storage state handling - Updated Async Crawler to support storage state management. - Added error handling for URL validation in Async Web Crawler. - Modified README logo and improved .gitignore entries. - Fixed issues in multiple files for better code robustness. UncleCode 2024-12-09 20:04:59 +08:00
  • ba3e808802 fix: The extract method logs output only when self.verbose is set to True. (#314) lu4nx 2024-12-09 17:19:26 +08:00
  • e3488da194 fixing Readmen tap (#313) Olavo Henrique Marques Peixoto 2024-12-09 03:34:52 -03:00
  • d7200138a0 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-12-08 12:06:53 +00:00
  • 740214e021 Merge branch 'next' UncleCode 2024-12-08 20:06:36 +08:00
  • c51e901f68 feat: Enhance AsyncPlaywrightCrawlerStrategy with text-only and light modes, dynamic viewport adjustment, and session management UncleCode 2024-12-08 20:04:44 +08:00
  • 8c611dcb4b Refactored web scraping components UncleCode 2024-12-05 22:33:47 +08:00
  • be37abe05a Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-12-04 12:31:45 +00:00
  • 90ba51b52f fix(mkdocs): correct typo in Docker Deployment navigation entry Unclecode 2024-12-04 12:31:41 +00:00
  • a45b8b1eb1 Merge issues with 0.4.0 is over UncleCode 2024-12-04 20:29:25 +08:00
  • 56f82f3e7f Merge branch 'next' UncleCode 2024-12-04 20:27:35 +08:00
  • 486db3a771 Updated to version 0.4.0 with new features - Enhanced error handling in async crawler. - Added flexible options in Markdown generation. - Updated user agent settings for improved reliability. - Reflected changes in documentation and examples. [0.4.0] UncleCode 2024-12-04 20:26:39 +08:00
  • b02544bc0b docs: update README and blog for version 0.4.0 release, highlighting new features and improvements UncleCode 2024-12-03 21:28:52 +08:00
  • e9639ad189 refactor: improve error handling in DataProcessor and optimize data parsing logic UncleCode 2024-12-03 19:44:38 +08:00
  • 95a4f74d2a fix: pass logger to WebScrapingStrategy and update score computation in PruningContentFilter UncleCode 2024-12-02 20:37:28 +08:00
  • 293f299c08 Add PruningContentFilter with unit tests and update documentation unclecode 2024-12-01 19:17:33 +08:00
  • 80d58ad24c bump version to 0.3.747 UncleCode 2024-11-30 22:00:15 +08:00
  • 3e83893b3f Enhance User-Agent Handling UncleCode 2024-11-30 18:13:12 +08:00
  • 8c76a8c7dc docs: add contributor entry for dvschuyl regarding AsyncPlaywrightCrawlerStrategy issue [v0.3.746] UncleCode 2024-11-29 21:14:49 +08:00
  • 0780db55e1 fix: handle errors during image dimension updates in AsyncPlaywrightCrawlerStrategy UncleCode 2024-11-29 21:12:19 +08:00
  • 1ed7c15118 🩹 Page-evaluate navigation destroyed error (#304) dvschuyl 2024-11-29 14:06:04 +01:00
  • 569bdb6073 Merge branch 'next' UncleCode 2024-11-29 20:54:28 +08:00
  • 1def53b7fe docs: update Raspberry Pi section to indicate upcoming support UncleCode 2024-11-29 20:53:43 +08:00
  • f9c98a377d Enhance Docker support and improve installation process - Added new Docker commands for platform-specific builds. - Updated README with comprehensive installation and setup instructions. - Introduced post_install method in setup script for automation. - Refined migration processes with enhanced error logging. - Bump version to 0.3.746 and updated dependencies. UncleCode 2024-11-29 20:52:51 +08:00
  • 93bf3e8a1f Refactor Dockerfile and clean up main.py - Enhanced Dockerfile for platform-specific installations - Added ARG for TARGETPLATFORM and BUILDPLATFORM - Improved GPU support conditional on TARGETPLATFORM - Removed static pages mounting in main.py - Streamlined code structure to improve maintainability UncleCode 2024-11-29 20:08:09 +08:00
  • d202f3539b Enhance installation and migration processes - Added a post-installation setup script for initialization. - Updated README with installation notes for Playwright setup. - Enhanced migration logging for better error visibility. - Added 'pydantic' to requirements. - Bumped version to 0.3.746. UncleCode 2024-11-29 18:48:44 +08:00
  • 12e73d4898 refactor: remove legacy build hooks and setup files, migrate to setup.cfg and pyproject.toml UncleCode 2024-11-29 16:01:19 +08:00
  • 449dd7cc0b Migrating from the classic setup.py to a using PyProject approach. unclecode 2024-11-29 14:45:04 +08:00
  • b0419edda6 Update README.md (#300) UncleCode 2024-11-29 02:31:17 +08:00
  • c0e87abaee fix: update package versions in requirements.txt for compatibility [0.3.745] UncleCode 2024-11-28 21:43:08 +08:00
  • c8485776fe docs: update README to reflect latest version v0.3.745 [v0.3.745] UncleCode 2024-11-28 20:04:16 +08:00
  • aa3e2d0fe6 Merge branch 'main' of https://github.com/unclecode/crawl4ai UncleCode 2024-11-28 20:03:43 +08:00
  • 98c64f9d5f Merge branch 'next' UncleCode 2024-11-28 20:03:11 +08:00
  • 7d81c17cca fix: improve handling of CRAWL4_AI_BASE_DIRECTORY environment variable in setup.py UncleCode 2024-11-28 20:02:39 +08:00
  • 652d396a81 chore: update version to 0.3.745 UncleCode 2024-11-28 20:00:29 +08:00
  • 1d83c493af Enhance setup process and update contributors list - Acknowledge contributor paulokuong for fixing RAWL4_AI_BASE_DIRECTORY issue - Refine base directory handling in setup.py - Clarify Playwright installation instructions and improve error handling UncleCode 2024-11-28 19:58:40 +08:00
  • cf35cbe59e CRAWL4_AI_BASE_DIRECTORY should be Path object instead of string (#298) Paulo Kuong 2024-11-28 06:46:36 -05:00
  • 9221c08418 docs: fix link formatting for recent updates section in README UncleCode 2024-11-28 19:33:36 +08:00
  • 48d43c14b1 docs: fix link formatting for recent updates section in README UncleCode 2024-11-28 19:33:02 +08:00
  • 776efa74a4 docs: fix link formatting for recent updates section in README UncleCode 2024-11-28 19:32:32 +08:00
  • b14e83f499 docs: fix link formatting for recent updates section in README UncleCode 2024-11-28 19:31:09 +08:00
  • a9b6b65238 chore: update version to 0.3.744 and add publish.sh to .gitignore [0.3.744] UncleCode 2024-11-28 19:26:50 +08:00
  • a036b7f122 feat: implement create_box_message utility for formatted error messages and enhance error logging in AsyncWebCrawler UncleCode 2024-11-28 19:24:07 +08:00
  • 0bccf23db3 docs: update quickstart_async.py to enable example function calls for better demonstration UncleCode 2024-11-28 18:19:42 +08:00
  • 0cbd594512 Merge branch 'next' - Update README, and quickstart examples UncleCode 2024-11-28 16:43:16 +08:00
  • efe93a5f57 docs: enhance README with development TODOs and refine mission statement for clarity UncleCode 2024-11-28 16:41:11 +08:00
  • 3fda66b85b docs: refine README content for clarity and conciseness, improving descriptions and formatting UncleCode 2024-11-28 16:36:24 +08:00
  • ddfb6707b4 docs: update README to reflect new branding and improve section headings for clarity UncleCode 2024-11-28 16:34:08 +08:00
  • a69f7a9531 fix: correct typo in function documentation for clarity and accuracy UncleCode 2024-11-28 16:31:41 +08:00
  • d583aa43ca refactor: update cache handling in quickstart_async example to use CacheMode enum UncleCode 2024-11-28 15:53:25 +08:00
  • 3abb573142 docs: update README for version 0.3.743 with improved formatting and contributor acknowledgments UncleCode 2024-11-28 13:07:59 +08:00
  • d556dada9f docs: update README to keep details open for extraction capabilities, browser integration, input/output flexibility, utility & debugging, security & accessibility, community & documentation, and cutting-edge features UncleCode 2024-11-28 13:07:33 +08:00