Commit Graph

  • a0dff192ae Update README for speed example unclecode 2024-06-24 23:06:12 +08:00
  • 1fffeeedd2 Update Readme: Showcase the speed unclecode 2024-06-24 23:02:08 +08:00
  • f51b078042 Update readme example. unclecode 2024-06-24 22:54:29 +08:00
  • b6023a51fb Add star chart unclecode 2024-06-24 22:47:46 +08:00
  • 7e95c38acb Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-24 14:40:48 +00:00
  • 78cfad8b2f chore: Update version to 0.2.7 and improve extraction function speed [v0.2.7] unclecode 2024-06-24 22:39:56 +08:00
  • c697bf23e4 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-22 16:37:27 +00:00
  • b951d34ed0 chore: Update fetch URL to use HTTPS Unclecode 2024-06-22 16:37:21 +00:00
  • 68b3dff74a Update CSS unclecode 2024-06-23 00:36:03 +08:00
  • bfc4abd6e8 Update documents unclecode 2024-06-22 20:57:03 +08:00
  • c8a10dc455 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-22 12:54:41 +00:00
  • 8c77a760fc Fixed: - Redirect "/" to mkdocs unclecode 2024-06-22 20:54:32 +08:00
  • 9e0ded8da0 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-22 12:41:52 +00:00
  • b9bf8ac9d7 Fix mounting the "/" to mkdocs site folder unclecode 2024-06-22 20:41:39 +08:00
  • 48c27899b7 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-22 12:38:14 +00:00
  • d6182bedd7 chore: - Add demo page to the new mkdocs - Set website home page to mkdocs unclecode 2024-06-22 20:36:01 +08:00
  • 2217904876 Update .gitignore unclecode 2024-06-22 18:12:12 +08:00
  • 2c2362b4d3 issue 19 is resolved - Update Dockerfile to install mkdocs and build documentation [v0.2.6] unclecode 2024-06-22 17:18:00 +08:00
  • 612ed3fef2 chore: Update print statement to use markdown format unclecode 2024-06-21 19:10:13 +08:00
  • fb2a6d0d04 chore: Update documentation link in README.md unclecode 2024-06-21 18:05:18 +08:00
  • 19d3d39115 Update Merge the DOCS branch unclecode 2024-06-21 18:04:13 +08:00
  • 3c32b0abed Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-21 09:58:17 +00:00
  • c1413e6916 chore: Update documentation link in README.md [docs] unclecode 2024-06-21 17:57:47 +08:00
  • e7705e661a ADD MKDocs unclecode 2024-06-21 17:56:54 +08:00
  • 21b110bfd7 Update LLMExtractionStrategy to disable chunking if specified, Add example of summarization for a web page. unclecode 2024-06-19 19:03:35 +08:00
  • 1fcb573909 chore: Update table of contents in README.md unclecode 2024-06-19 18:53:22 +08:00
  • a215ec08d6 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-19 10:51:31 +00:00
  • 0f6c5f5453 chore: Update configuration values, create new example, and update Dockerfile and README unclecode 2024-06-19 18:50:58 +08:00
  • 350ca1511b chore: Update configuration values, create new example, and update Dockerfile and README unclecode 2024-06-19 18:48:20 +08:00
  • 539263a8ba chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README unclecode 2024-06-19 18:32:20 +08:00
  • 3f0e265baf Merge branch 'format-inline-tags' unclecode 2024-06-19 00:48:38 +08:00
  • 21e2538e57 Update quickstart.py unclecode 2024-06-19 00:37:53 +08:00
  • 5d3fef45f7 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-18 12:02:29 +00:00
  • 480902bd66 Update README unclecode 2024-06-18 20:02:21 +08:00
  • 853b9d59d8 feat: Add hooks for enhanced control over Selenium drivers unclecode 2024-06-18 20:00:51 +08:00
  • 6d04284c44 Merge branch 'hooks' unclecode 2024-06-18 19:53:50 +08:00
  • 4d43880cde Playing with different Docker settings to find the best one [docker-test] unclecode 2024-06-18 19:08:46 +08:00
  • 4a50781453 chore: Remove local and .files folders from .gitignore unclecode 2024-06-17 15:57:34 +08:00
  • 18561c55ce Remove .files folder from repository unclecode 2024-06-17 15:56:56 +08:00
  • 77da48050d chore: Add custom headers to LocalSeleniumCrawlerStrategy hooks unclecode 2024-06-17 15:50:03 +08:00
  • 9a97aacd85 chore: Add hooks for customizing the LocalSeleniumCrawlerStrategy unclecode 2024-06-17 15:37:18 +08:00
  • 52daf3936a Fix typo in README unclecode 2024-06-17 15:15:37 +08:00
  • 2f246d19f4 Enhancement: Replaced inline HTML tags with textual format for better LLM context handling #45 [format-inline-tags] unclecode 2024-06-17 15:14:56 +08:00
  • 413595542a Enhancement: Replaced inline HTML tags with textual format for better LLM context handling #24 unclecode 2024-06-17 15:14:34 +08:00
  • 42a5da854d Update version and change log. [v0.2.4] unclecode 2024-06-17 14:47:58 +08:00
  • d1d83a6ef7 Fix issue #22: Use MD5 hash for caching HTML files to handle long URLs unclecode 2024-06-17 14:44:01 +08:00
  • 194050705d chore: Add pillow library to requirements.txt unclecode 2024-06-10 23:03:32 +08:00
  • 989f8c91c8 Update README unclecode 2024-06-08 18:50:35 +08:00
  • edba5fb5e9 Update README unclecode 2024-06-08 18:48:21 +08:00
  • faa1defa5c Update README unclecode 2024-06-08 18:47:23 +08:00
  • 77df6db453 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-08 10:38:10 +00:00
  • f7e0cee1b0 vital: Right now, only raw html is retrieved from database, therefore, css selector and other filter will be executed every time. unclecode 2024-06-08 18:37:40 +08:00
  • 2124652327 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-08 10:07:30 +00:00
  • b3a0edaa6d - User agent - Extract Links - Extract Metadata - Update Readme - Update REST API document unclecode 2024-06-08 17:59:42 +08:00
  • 255bde70c9 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-08 08:53:54 +00:00
  • 9c34b30723 Extract internal and external links. unclecode 2024-06-08 16:53:06 +08:00
  • 36a5847df5 Add css selector example unclecode 2024-06-07 20:47:20 +08:00
  • 04808b5dc9 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-07 12:44:41 +00:00
  • a19379aa58 Add recipe images, update README, and REST api example unclecode 2024-06-07 20:43:50 +08:00
  • 768d048e1c Update rest call how to use unclecode 2024-06-07 18:10:45 +08:00
  • 94c11a0262 Add image unclecode 2024-06-07 18:09:21 +08:00
  • b3a150f3d1 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-07 08:26:43 +00:00
  • 649b0bfd02 feat: Remove default checked state for bypass-cache-checkbox unclecode 2024-06-07 16:26:36 +08:00
  • de80a2da09 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-07 08:25:49 +00:00
  • 57a00ec677 Update Readme unclecode 2024-06-07 16:25:30 +08:00
  • df4cda8322 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-07 08:24:46 +00:00
  • aeb2114170 Add example of REST API call unclecode 2024-06-07 16:24:40 +08:00
  • 7717a3b948 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-07 08:19:37 +00:00
  • b8d405fddd Update version number in landing page header unclecode 2024-06-07 16:19:30 +08:00
  • a4a6b2075f Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-07 08:18:19 +00:00
  • b32013cb97 Fix README file hyperlink unclecode 2024-06-07 15:37:05 +08:00
  • 226a62a3c0 feat: Add screenshot functionality to crawl_urls [extract-media] unclecode 2024-06-07 15:33:15 +08:00
  • 8e73a482a2 feat: Add screenshot functionality to crawl_urls unclecode 2024-06-07 15:23:32 +08:00
  • 0533aeb814 v0.2.3: - Extract all media tags - Take screenshot of the page unclecode 2024-06-07 15:23:13 +08:00
  • aead6de888 Merge branch 'main' of https://github.com/unclecode/crawl4ai into extract-media unclecode 2024-06-07 13:41:48 +08:00
  • 8d82fd4cfe Merge pull request #14 from gkhngyk/main UncleCode 2024-06-07 13:30:10 +08:00
  • 8f44db6499 Update README.md Gökhan Geyik 2024-06-05 17:16:02 +03:00
  • c7553b1280 Update research assistant example with package installation instructions unclecode 2024-06-04 23:18:19 +08:00
  • 8b8683f22e Add research assistant example using Chainlit unclecode 2024-06-04 22:43:09 +08:00
  • 774ace6e3b Update html page for tutorial. unclecode 2024-06-02 18:00:53 +08:00
  • 4010558885 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-02 08:12:32 +00:00
  • 4a8f91a0fc Set bypass_cached to True unclecode 2024-06-02 16:12:25 +08:00
  • b0cf5076da Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-02 08:09:25 +00:00
  • 18c9784b61 Update index.html (hide extract block check box) unclecode 2024-06-02 16:09:20 +08:00
  • 0d6e9e37ca Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-02 08:06:56 +00:00
  • e5d401c67c Update generated code sample unclecode 2024-06-02 16:06:43 +08:00
  • 9b0f71ba88 Merge branch 'main' of https://github.com/unclecode/crawl4ai Unclecode 2024-06-02 07:56:00 +00:00
  • ae77589a98 Update Readme unclecode 2024-06-02 15:42:13 +08:00
  • ad373c0e19 Update Readme unclecode 2024-06-02 15:41:24 +08:00
  • 51f26d12fe Update for v0.2.2 - Support multiple JS scripts - Fixed some of bugs - Resolved a few issue relevant to Colab installation unclecode 2024-06-02 15:40:18 +08:00
  • f1b60b2016 chore: Update ONNX model loading process unclecode 2024-05-31 18:07:05 +08:00
  • 8c2dc2b1e4 Create Dockerfile UncleCode 2024-05-29 17:56:57 +08:00
  • dc9a44c12a Update and rename Dockerfile to Dockerfile-version-0 UncleCode 2024-05-29 17:56:34 +08:00
  • d9753b6349 Update requirements.txt UncleCode 2024-05-24 14:49:48 +08:00
  • a554c0b143 Update requirements.txt UncleCode 2024-05-23 12:52:31 +08:00
  • 7381fa95e6 Merge pull request #3 from QIN2DIM/main UncleCode 2024-05-23 09:29:28 +08:00
  • 6ddccc144c chore: Bump version to 0.2.2 in setup.py [v0.2.2] Unclecode 2024-05-19 16:19:40 +00:00
  • 53d1176d53 chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices Unclecode 2024-05-19 16:18:58 +00:00
  • 52c4be0696 Update setup.py version to 0.2.1 [v0.2.1] unclecode 2024-05-19 22:30:59 +08:00
  • 13a3b21d19 - Add ONNX embedding model for CPU devices, Update the similarity threshold, improve the embedding speed. unclecode 2024-05-19 22:30:10 +08:00