unclecode
b8d405fddd
Update version number in landing page header
2024-06-07 16:19:30 +08:00
unclecode
b32013cb97
Fix README file hyperlink
2024-06-07 15:37:05 +08:00
unclecode
226a62a3c0
feat: Add screenshot functionality to crawl_urls
2024-06-07 15:33:15 +08:00
unclecode
8e73a482a2
feat: Add screenshot functionality to crawl_urls
The code changes in this commit add the `screenshot` parameter to the `crawl_urls` function in `main.py`. This allows users to specify whether they want to take a screenshot of the page during the crawling process. The default value is `False`.
This commit message follows the established convention of starting with a type (feat for feature) and providing a concise and descriptive summary of the changes made.
2024-06-07 15:23:32 +08:00
unclecode
0533aeb814
v0.2.3:
- Extract all media tags
- Take screenshot of the page
2024-06-07 15:23:13 +08:00
unclecode
aead6de888
Merge branch 'main' of https://github.com/unclecode/crawl4ai into extract-media
2024-06-07 13:41:48 +08:00
UncleCode
8d82fd4cfe
Merge pull request #14 from gkhngyk/main
Update README.md
2024-06-07 13:30:10 +08:00
Gökhan Geyik
8f44db6499
Update README.md
2024-06-05 17:16:02 +03:00
unclecode
c7553b1280
Update research assistant example with package installation instructions
2024-06-04 23:18:19 +08:00
unclecode
8b8683f22e
Add research assistant example using Chainlit
2024-06-04 22:43:09 +08:00
unclecode
774ace6e3b
Update HTML page for tutorial
2024-06-02 18:00:53 +08:00
unclecode
4a8f91a0fc
Set bypass_cached to True
2024-06-02 16:12:25 +08:00
unclecode
18c9784b61
Update index.html (hide extract block check box)
2024-06-02 16:09:20 +08:00
unclecode
e5d401c67c
Update generated code sample
2024-06-02 16:06:43 +08:00
unclecode
ae77589a98
Update Readme
2024-06-02 15:42:13 +08:00
unclecode
ad373c0e19
Update Readme
2024-06-02 15:41:24 +08:00
unclecode
51f26d12fe
Update for v0.2.2
- Support multiple JS scripts
- Fixed some bugs
- Resolved a few issues related to Colab installation
2024-06-02 15:40:18 +08:00
unclecode
f1b60b2016
chore: Update ONNX model loading process
2024-05-31 18:07:05 +08:00
UncleCode
8c2dc2b1e4
Create Dockerfile
2024-05-29 17:56:57 +08:00
UncleCode
dc9a44c12a
Update and rename Dockerfile to Dockerfile-version-0
2024-05-29 17:56:34 +08:00
UncleCode
d9753b6349
Update requirements.txt
Remove tokenizer version from requirements.txt
2024-05-24 14:49:48 +08:00
UncleCode
a554c0b143
Update requirements.txt
2024-05-23 12:52:31 +08:00
UncleCode
7381fa95e6
Merge pull request #3 from QIN2DIM/main
fix(main): UnicodeDecodeError
2024-05-23 09:29:28 +08:00
Unclecode
53d1176d53
chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices
2024-05-19 16:18:58 +00:00
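The device support described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `pick_device`, `batches`, and `plan_batches` and the CPU batch size of 8 are hypothetical. The sketch prefers CUDA, then Apple's MPS backend, then CPU, and only splits the input into small batches when it falls back to CPU.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring GPU, then Apple MPS, then CPU."""
    try:
        import torch  # optional: without PyTorch installed, fall back to CPU
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases, so probe for it
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def plan_batches(texts: Sequence[str], device: str, cpu_batch_size: int = 8) -> List[List[str]]:
    """Hypothetical policy: small batches on CPU, one batch on GPU/MPS."""
    if device == "cpu":
        return list(batches(texts, cpu_batch_size))
    return [list(texts)] if texts else []
```

Batching on CPU keeps peak memory bounded when embedding many text chunks, while accelerators usually handle a single larger batch well.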
unclecode
52c4be0696
Update setup.py version to 0.2.1
v0.2.1
2024-05-19 22:30:59 +08:00
unclecode
13a3b21d19
- Add ONNX embedding model for CPU devices, update the similarity threshold, improve the embedding speed.
2024-05-19 22:30:10 +08:00
QIN2DIM
5cee084340
fix(main): UnicodeDecodeError
...
File "T:\_GitHubProjects\Forks\crawl4ai\main.py", line 70, in read_index
partials[filename[:-5]] = file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 149: illegal multibyte sequence
2024-05-18 23:31:11 +08:00
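The traceback above occurs when `open()` is called without an explicit encoding on a Windows machine whose locale default is GBK, so UTF-8 files fail to decode. A minimal sketch of the usual fix, assuming the partials are UTF-8 `.html` files in one directory (inferred from `filename[:-5]`); the function name `read_partials` is hypothetical and this is not necessarily the exact change made in the commit:

```python
import os
from pathlib import Path
from typing import Dict


def read_partials(directory: str) -> Dict[str, str]:
    """Load every .html file in `directory` into a dict keyed by file stem.

    Passing encoding="utf-8" explicitly stops Python from falling back to
    the platform's locale encoding (e.g. GBK on a Chinese-locale Windows),
    which is what raised the UnicodeDecodeError.
    """
    partials: Dict[str, str] = {}
    for filename in os.listdir(directory):
        if filename.endswith(".html"):
            path = Path(directory) / filename
            with open(path, "r", encoding="utf-8") as file:
                # filename[:-5] drops the ".html" suffix, as in the traceback
                partials[filename[:-5]] = file.read()
    return partials
```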
Unclecode
bf00c26a83
chore: Update Dockerfile to install chromium-chromedriver and spacy library
2024-05-18 09:16:52 +00:00
unclecode
3846648c12
chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices
2024-05-18 15:42:19 +08:00
unclecode
eb6423875f
chore: Update Selenium options in crawler_strategy.py and add verbose logging in CosineStrategy
2024-05-18 14:13:06 +08:00
unclecode
e3524a10a7
chore: Update REST API base URL in README.md
2024-05-17 23:28:29 +08:00
unclecode
468dad6169
chore: Update Dockerfile to install chromium-chromedriver and spacy library
2024-05-17 23:15:39 +08:00
UncleCode
bc27982992
Update setup.py Handle Spacy installation
2024-05-17 22:11:00 +08:00
UncleCode
57e5decb55
Update requirements.txt
2024-05-17 22:02:08 +08:00
unclecode
b6319c6f6e
chore: Add support for GPU, MPS, and CPU
2024-05-17 21:56:13 +08:00
UncleCode
0a902f562f
Update requirements.txt Add Spacy
2024-05-17 21:41:35 +08:00
UncleCode
454135856e
Update extraction_strategy.py Support GPU, MPS, and CPU
2024-05-17 21:40:48 +08:00
UncleCode
33fddc27ad
Update model loader to support GPU, MPS, and CPU
2024-05-17 21:39:22 +08:00
unclecode
ce052a4eb5
Update README
2024-05-17 18:29:59 +08:00
unclecode
b43d77a56b
Update README
2024-05-17 18:28:39 +08:00
unclecode
1635a92218
chore: Update Crawl4AI quickstart script in README.md
2024-05-17 18:25:32 +08:00
unclecode
2a8a1b27e1
chore: Update Readme
2024-05-17 18:24:47 +08:00
unclecode
f5f3cce2c8
Merge new-release-0.0.2-no-spacy into main for v0.2.0 release
v0.2.0
2024-05-17 18:23:27 +08:00
unclecode
a085e6315b
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-05-17 18:21:02 +08:00
unclecode
a8d600a3b4
chore: Add test_pad.py, requirements0.txt, and a.txt to .gitignore
v0.1.0
2024-05-17 18:13:43 +08:00
unclecode
6f96dcd649
chore: Update README
2024-05-17 18:12:50 +08:00
unclecode
957a2458b1
chore: Update web crawler URLs to use NBC News business section
2024-05-17 18:11:13 +08:00
unclecode
36e46be23d
chore: Add verbose option to ExtractionStrategy classes
This commit adds a new `verbose` option to the `ExtractionStrategy` classes. The `verbose` option allows for logging of extraction details, such as the number of extracted blocks and the URL being processed. This improves the debugging and monitoring capabilities of the code.
2024-05-17 18:06:10 +08:00
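The `verbose` behaviour described in this commit amounts to guarding log output behind a constructor flag. A minimal sketch, not crawl4ai's actual `ExtractionStrategy` API: the class `VerboseExtractionSketch`, its `extract(url, html)` method, the blank-line "block" splitting, and the message wording are all hypothetical.

```python
from typing import List


class VerboseExtractionSketch:
    """Hypothetical stand-in for an extraction strategy with a `verbose` flag."""

    def __init__(self, verbose: bool = False) -> None:
        self.verbose = verbose

    def extract(self, url: str, html: str) -> List[str]:
        if self.verbose:
            print(f"Extracting blocks from {url}")
        # Toy extraction: treat blank-line-separated chunks as blocks
        blocks = [b.strip() for b in html.split("\n\n") if b.strip()]
        if self.verbose:
            # Report the block count and URL, the details the commit mentions
            print(f"Extracted {len(blocks)} blocks from {url}")
        return blocks
```

With `verbose=False` (the default) the method stays silent, so existing callers see no change in output.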
unclecode
32c87f0388
chore: Update NlpSentenceChunking constructor parameters to None
The NlpSentenceChunking constructor parameters now default to None to simplify use of the class. This change removes the need to specify a spaCy model for sentence detection, making the code more concise and easier to understand.
2024-05-17 17:00:43 +08:00
unclecode
647cfda225
chore: Update Crawl4AI quickstart script in README.md
This commit updates the Crawl4AI quickstart script in the README.md file. The script is now properly formatted and aligned, making it easier to read and understand. The unnecessary indentation has been removed, and the script is now more concise.
2024-05-17 16:55:34 +08:00