unclecode
57a00ec677
Update Readme
2024-06-07 16:25:30 +08:00
unclecode
aeb2114170
Add example of REST API call
2024-06-07 16:24:40 +08:00
unclecode
b8d405fddd
Update version number in landing page header
2024-06-07 16:19:30 +08:00
unclecode
b32013cb97
Fix README file hyperlink
2024-06-07 15:37:05 +08:00
unclecode
226a62a3c0
feat: Add screenshot functionality to crawl_urls
2024-06-07 15:33:15 +08:00
unclecode
8e73a482a2
feat: Add screenshot functionality to crawl_urls
...
The code changes in this commit add the `screenshot` parameter to the `crawl_urls` function in `main.py`. This allows users to specify whether they want to take a screenshot of the page during the crawling process. The default value is `False`.
This commit message follows the established convention of starting with a type (feat for feature) and providing a concise and descriptive summary of the changes made.
2024-06-07 15:23:32 +08:00
unclecode
0533aeb814
v0.2.3:
...
- Extract all media tags
- Take screenshot of the page
2024-06-07 15:23:13 +08:00
unclecode
aead6de888
Merge branch 'main' of https://github.com/unclecode/crawl4ai into extract-media
2024-06-07 13:41:48 +08:00
UncleCode
8d82fd4cfe
Merge pull request #14 from gkhngyk/main
...
Update README.md
2024-06-07 13:30:10 +08:00
Gökhan Geyik
8f44db6499
Update README.md
2024-06-05 17:16:02 +03:00
unclecode
c7553b1280
Update research assistant example with package installation instructions
2024-06-04 23:18:19 +08:00
unclecode
8b8683f22e
Add research assistant example using Chainlit
2024-06-04 22:43:09 +08:00
unclecode
774ace6e3b
Update html page for tutorial.
2024-06-02 18:00:53 +08:00
unclecode
4a8f91a0fc
Set bypass_cached to True
2024-06-02 16:12:25 +08:00
unclecode
18c9784b61
Update index.html (hide extract block check box)
2024-06-02 16:09:20 +08:00
unclecode
e5d401c67c
Update generated code sample
2024-06-02 16:06:43 +08:00
unclecode
ae77589a98
Update Readme
2024-06-02 15:42:13 +08:00
unclecode
ad373c0e19
Update Readme
2024-06-02 15:41:24 +08:00
unclecode
51f26d12fe
Update for v0.2.2
...
- Support multiple JS scripts
- Fixed some bugs
- Resolved a few issues related to Colab installation
2024-06-02 15:40:18 +08:00
unclecode
f1b60b2016
chore: Update ONNX model loading process
2024-05-31 18:07:05 +08:00
UncleCode
8c2dc2b1e4
Create Dockerfile
2024-05-29 17:56:57 +08:00
UncleCode
dc9a44c12a
Update and rename Dockerfile to Dockerfile-version-0
2024-05-29 17:56:34 +08:00
UncleCode
d9753b6349
Update requirements.txt
...
Remove tokenizer version from requirements.txt
2024-05-24 14:49:48 +08:00
UncleCode
a554c0b143
Update requirements.txt
2024-05-23 12:52:31 +08:00
UncleCode
7381fa95e6
Merge pull request #3 from QIN2DIM/main
...
fix(main): UnicodeDecodeError
2024-05-23 09:29:28 +08:00
Unclecode
53d1176d53
chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices
2024-05-19 16:18:58 +00:00
unclecode
52c4be0696
Update setup.py version to 0.2.1
v0.2.1
2024-05-19 22:30:59 +08:00
unclecode
13a3b21d19
- Add ONNX embedding model for CPU devices, update the similarity threshold, improve the embedding speed.
2024-05-19 22:30:10 +08:00
QIN2DIM
5cee084340
fix(main): UnicodeDecodeError
...
File "T:\_GitHubProjects\Forks\crawl4ai\main.py", line 70, in read_index
partials[filename[:-5]] = file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 149: illegal multibyte sequence
2024-05-18 23:31:11 +08:00
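The traceback in the entry above is the usual symptom of calling `open()` without an explicit encoding on a system whose locale codec is not UTF-8 (here `gbk`). The following is a minimal sketch of the common remedy, not the actual patch from that commit: `read_partials` and its parameters are hypothetical names, and only the `partials[filename[:-5]] = file.read()` line is taken from the traceback.

```python
import os


def read_partials(directory, encoding="utf-8"):
    """Read every .html file in `directory` into a dict.

    Hypothetical helper for illustration; keys are filenames with the
    5-character ".html" suffix removed, as in the traceback above.
    """
    partials = {}
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".html"):
            continue
        path = os.path.join(directory, filename)
        # Passing encoding explicitly stops Python from falling back to the
        # locale's preferred codec (e.g. gbk on a Chinese-locale Windows).
        with open(path, "r", encoding=encoding) as file:
            partials[filename[:-5]] = file.read()
    return partials
```

Calling `read_partials("pages")` on a directory of UTF-8 templates should then behave the same regardless of the machine's locale.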
Unclecode
bf00c26a83
chore: Update Dockerfile to install chromium-chromedriver and spacy library
2024-05-18 09:16:52 +00:00
unclecode
3846648c12
chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices
2024-05-18 15:42:19 +08:00
unclecode
eb6423875f
chore: Update Selenium options in crawler_strategy.py and add verbose logging in CosineStrategy
2024-05-18 14:13:06 +08:00
unclecode
e3524a10a7
chore: Update REST API base URL in README.md
2024-05-17 23:28:29 +08:00
unclecode
468dad6169
chore: Update Dockerfile to install chromium-chromedriver and spacy library
2024-05-17 23:15:39 +08:00
UncleCode
bc27982992
Update setup.py Handle Spacy installation
2024-05-17 22:11:00 +08:00
UncleCode
57e5decb55
Update requirements.txt
2024-05-17 22:02:08 +08:00
unclecode
b6319c6f6e
chore: Add support for GPU, MPS, and CPU
2024-05-17 21:56:13 +08:00
UncleCode
0a902f562f
Update requirements.txt Add Spacy
2024-05-17 21:41:35 +08:00
UncleCode
454135856e
Update extraction_strategy.py Support GPU, MPS, and CPU
2024-05-17 21:40:48 +08:00
UncleCode
33fddc27ad
Update model loader to support GPU, MPS, and CPU
2024-05-17 21:39:22 +08:00
unclecode
ce052a4eb5
Update README
2024-05-17 18:29:59 +08:00
unclecode
b43d77a56b
Update README
2024-05-17 18:28:39 +08:00
unclecode
1635a92218
chore: Update Crawl4AI quickstart script in README.md
2024-05-17 18:25:32 +08:00
unclecode
2a8a1b27e1
chore: Update Readme
2024-05-17 18:24:47 +08:00
unclecode
f5f3cce2c8
Merge new-release-0.0.2-no-spacy into main for v0.2.0 release
v0.2.0
2024-05-17 18:23:27 +08:00
unclecode
a085e6315b
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-05-17 18:21:02 +08:00
unclecode
a8d600a3b4
chore: Add test_pad.py, requirements0.txt, and a.txt to .gitignore
v0.1.0
2024-05-17 18:13:43 +08:00
unclecode
6f96dcd649
chore: Update README
2024-05-17 18:12:50 +08:00
unclecode
957a2458b1
chore: Update web crawler URLs to use NBC News business section
2024-05-17 18:11:13 +08:00
unclecode
36e46be23d
chore: Add verbose option to ExtractionStrategy classes
...
This commit adds a new `verbose` option to the `ExtractionStrategy` classes. The `verbose` option allows for logging of extraction details, such as the number of extracted blocks and the URL being processed. This improves the debugging and monitoring capabilities of the code.
2024-05-17 18:06:10 +08:00
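To illustrate the kind of `verbose` option described in the entry above, here is a small sketch under stated assumptions. `ParagraphStrategy`, its `extract` method, and the log format are all hypothetical stand-ins and are not the project's real `ExtractionStrategy` classes; only the idea of logging the block count and the URL comes from the commit message.

```python
class ParagraphStrategy:
    """Hypothetical extraction strategy that splits text on blank lines."""

    def __init__(self, verbose=False):
        # When verbose is True, extraction details are printed for debugging.
        self.verbose = verbose

    def extract(self, url, text):
        # Split on blank lines and drop empty or whitespace-only blocks.
        blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
        if self.verbose:
            print(f"[LOG] Extracted {len(blocks)} blocks from {url}")
        return blocks
```

With `verbose=False` the method stays silent; with `verbose=True` it prints one line per call, and the returned blocks are the same in both cases.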