a0dff192ae
Update README for speed example
unclecode
2024-06-24 23:06:12 +08:00
1fffeeedd2
Update Readme: Showcase the speed
unclecode
2024-06-24 23:02:08 +08:00
f51b078042
Update README example.
unclecode
2024-06-24 22:54:29 +08:00
b6023a51fb
Add star chart
unclecode
2024-06-24 22:47:46 +08:00
7e95c38acb
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-24 14:40:48 +00:00
78cfad8b2f
chore: Update version to 0.2.7 and improve extraction function speed
v0.2.7
unclecode
2024-06-24 22:39:56 +08:00
c697bf23e4
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-22 16:37:27 +00:00
b951d34ed0
chore: Update fetch URL to use HTTPS
Unclecode
2024-06-22 16:37:21 +00:00
68b3dff74a
Update CSS
unclecode
2024-06-23 00:36:03 +08:00
bfc4abd6e8
Update documents
unclecode
2024-06-22 20:57:03 +08:00
c8a10dc455
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-22 12:54:41 +00:00
8c77a760fc
Fixed: - Redirect "/" to mkdocs
unclecode
2024-06-22 20:54:32 +08:00
9e0ded8da0
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-22 12:41:52 +00:00
b9bf8ac9d7
Fix mounting the "/" to mkdocs site folder
unclecode
2024-06-22 20:41:39 +08:00
48c27899b7
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-22 12:38:14 +00:00
d6182bedd7
chore: - Add demo page to the new mkdocs - Set website home page to mkdocs
unclecode
2024-06-22 20:36:01 +08:00
2217904876
Update .gitignore
unclecode
2024-06-22 18:12:12 +08:00
2c2362b4d3
Resolve issue 19: Update Dockerfile to install mkdocs and build documentation
v0.2.6
unclecode
2024-06-22 17:18:00 +08:00
612ed3fef2
chore: Update print statement to use markdown format
unclecode
2024-06-21 19:10:13 +08:00
fb2a6d0d04
chore: Update documentation link in README.md
unclecode
2024-06-21 18:05:18 +08:00
19d3d39115
Update: Merge the DOCS branch
unclecode
2024-06-21 18:04:13 +08:00
3c32b0abed
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-21 09:58:17 +00:00
c1413e6916
chore: Update documentation link in README.md
docs
unclecode
2024-06-21 17:57:47 +08:00
e7705e661a
ADD MKDocs
unclecode
2024-06-21 17:56:54 +08:00
21b110bfd7
Update LLMExtractionStrategy to disable chunking if specified; add example of summarization for a web page.
unclecode
2024-06-19 19:03:35 +08:00
1fcb573909
chore: Update table of contents in README.md
unclecode
2024-06-19 18:53:22 +08:00
a215ec08d6
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-19 10:51:31 +00:00
0f6c5f5453
chore: Update configuration values, create new example, and update Dockerfile and README
unclecode
2024-06-19 18:50:58 +08:00
350ca1511b
chore: Update configuration values, create new example, and update Dockerfile and README
unclecode
2024-06-19 18:48:20 +08:00
539263a8ba
chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README
unclecode
2024-06-19 18:32:20 +08:00
3f0e265baf
Merge branch 'format-inline-tags'
unclecode
2024-06-19 00:48:38 +08:00
21e2538e57
Update quickstart.py
unclecode
2024-06-19 00:37:53 +08:00
5d3fef45f7
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-18 12:02:29 +00:00
480902bd66
Update README
unclecode
2024-06-18 20:02:21 +08:00
853b9d59d8
feat: Add hooks for enhanced control over Selenium drivers
unclecode
2024-06-18 20:00:51 +08:00
6d04284c44
Merge branch 'hooks'
unclecode
2024-06-18 19:53:50 +08:00
4d43880cde
Playing with different Docker settings to find the best one
docker-test
unclecode
2024-06-18 19:08:46 +08:00
4a50781453
chore: Remove local and .files folders from .gitignore
unclecode
2024-06-17 15:57:34 +08:00
18561c55ce
Remove .files folder from repository
unclecode
2024-06-17 15:56:56 +08:00
77da48050d
chore: Add custom headers to LocalSeleniumCrawlerStrategy
hooks
unclecode
2024-06-17 15:50:03 +08:00
9a97aacd85
chore: Add hooks for customizing the LocalSeleniumCrawlerStrategy
unclecode
2024-06-17 15:37:18 +08:00
52daf3936a
Fix typo in README
unclecode
2024-06-17 15:15:37 +08:00
2f246d19f4
Enhancement: Replaced inline HTML tags with textual format for better LLM context handling #45
format-inline-tags
unclecode
2024-06-17 15:14:56 +08:00
413595542a
Enhancement: Replaced inline HTML tags with textual format for better LLM context handling #24
unclecode
2024-06-17 15:14:34 +08:00
42a5da854d
Update version and change log.
v0.2.4
unclecode
2024-06-17 14:47:58 +08:00
d1d83a6ef7
Fix issue #22: Use MD5 hash for caching HTML files to handle long URLs
unclecode
2024-06-17 14:44:01 +08:00
194050705d
chore: Add pillow library to requirements.txt
unclecode
2024-06-10 23:03:32 +08:00
989f8c91c8
Update README
unclecode
2024-06-08 18:50:35 +08:00
edba5fb5e9
Update README
unclecode
2024-06-08 18:48:21 +08:00
faa1defa5c
Update README
unclecode
2024-06-08 18:47:23 +08:00
77df6db453
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-08 10:38:10 +00:00
f7e0cee1b0
vital: Right now, only raw HTML is retrieved from the database; therefore, the CSS selector and other filters are executed every time.
unclecode
2024-06-08 18:37:40 +08:00
2124652327
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-08 10:07:30 +00:00
b3a0edaa6d
- User agent - Extract Links - Extract Metadata - Update Readme - Update REST API document
unclecode
2024-06-08 17:59:42 +08:00
255bde70c9
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-08 08:53:54 +00:00
9c34b30723
Extract internal and external links.
unclecode
2024-06-08 16:53:06 +08:00
36a5847df5
Add css selector example
unclecode
2024-06-07 20:47:20 +08:00
04808b5dc9
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-07 12:44:41 +00:00
a19379aa58
Add recipe images, update README, and REST api example
unclecode
2024-06-07 20:43:50 +08:00
768d048e1c
Update REST call usage instructions
unclecode
2024-06-07 18:10:45 +08:00
94c11a0262
Add image
unclecode
2024-06-07 18:09:21 +08:00
b3a150f3d1
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-07 08:26:43 +00:00
649b0bfd02
feat: Remove default checked state for bypass-cache-checkbox
unclecode
2024-06-07 16:26:36 +08:00
de80a2da09
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-07 08:25:49 +00:00
57a00ec677
Update Readme
unclecode
2024-06-07 16:25:30 +08:00
df4cda8322
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-07 08:24:46 +00:00
aeb2114170
Add example of REST API call
unclecode
2024-06-07 16:24:40 +08:00
7717a3b948
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-07 08:19:37 +00:00
b8d405fddd
Update version number in landing page header
unclecode
2024-06-07 16:19:30 +08:00
a4a6b2075f
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-07 08:18:19 +00:00
b32013cb97
Fix README file hyperlink
unclecode
2024-06-07 15:37:05 +08:00
226a62a3c0
feat: Add screenshot functionality to crawl_urls
extract-media
unclecode
2024-06-07 15:33:15 +08:00
8e73a482a2
feat: Add screenshot functionality to crawl_urls
unclecode
2024-06-07 15:23:32 +08:00
0533aeb814
v0.2.3: - Extract all media tags - Take screenshot of the page
unclecode
2024-06-07 15:23:13 +08:00
aead6de888
Merge branch 'main' of https://github.com/unclecode/crawl4ai into extract-media
unclecode
2024-06-07 13:41:48 +08:00
8d82fd4cfe
Merge pull request #14 from gkhngyk/main
UncleCode
2024-06-07 13:30:10 +08:00
8f44db6499
Update README.md
Gökhan Geyik
2024-06-05 17:16:02 +03:00
c7553b1280
Update research assistant example with package installation instructions
unclecode
2024-06-04 23:18:19 +08:00
8b8683f22e
Add research assistant example using Chainlit
unclecode
2024-06-04 22:43:09 +08:00
774ace6e3b
Update HTML page for tutorial.
unclecode
2024-06-02 18:00:53 +08:00
4010558885
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-02 08:12:32 +00:00
4a8f91a0fc
Set bypass_cached to True
unclecode
2024-06-02 16:12:25 +08:00
b0cf5076da
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-02 08:09:25 +00:00
18c9784b61
Update index.html (hide extract block check box)
unclecode
2024-06-02 16:09:20 +08:00
0d6e9e37ca
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-02 08:06:56 +00:00
e5d401c67c
Update generated code sample
unclecode
2024-06-02 16:06:43 +08:00
9b0f71ba88
Merge branch 'main' of https://github.com/unclecode/crawl4ai
Unclecode
2024-06-02 07:56:00 +00:00
ae77589a98
Update Readme
unclecode
2024-06-02 15:42:13 +08:00
ad373c0e19
Update Readme
unclecode
2024-06-02 15:41:24 +08:00
51f26d12fe
Update for v0.2.2 - Support multiple JS scripts - Fixed some bugs - Resolved a few issues related to Colab installation
unclecode
2024-06-02 15:40:18 +08:00
f1b60b2016
chore: Update ONNX model loading process
unclecode
2024-05-31 18:07:05 +08:00
8c2dc2b1e4
Create Dockerfile
UncleCode
2024-05-29 17:56:57 +08:00
dc9a44c12a
Update and rename Dockerfile to Dockerfile-version-0
UncleCode
2024-05-29 17:56:34 +08:00
d9753b6349
Update requirements.txt
UncleCode
2024-05-24 14:49:48 +08:00
a554c0b143
Update requirements.txt
UncleCode
2024-05-23 12:52:31 +08:00
7381fa95e6
Merge pull request #3 from QIN2DIM/main
UncleCode
2024-05-23 09:29:28 +08:00
6ddccc144c
chore: Bump version to 0.2.2 in setup.py
v0.2.2
Unclecode
2024-05-19 16:19:40 +00:00
53d1176d53
chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices
Unclecode
2024-05-19 16:18:58 +00:00
52c4be0696
Update setup.py version to 0.2.1
v0.2.1
unclecode
2024-05-19 22:30:59 +08:00
13a3b21d19
- Add ONNX embedding model for CPU devices, update the similarity threshold, improve the embedding speed.
unclecode
2024-05-19 22:30:10 +08:00