unclecode
774ace6e3b
Update html page for tutorial.
2024-06-02 18:00:53 +08:00
unclecode
4a8f91a0fc
Set bypass_cached to True
2024-06-02 16:12:25 +08:00
unclecode
18c9784b61
Update index.html (hide extract block check box)
2024-06-02 16:09:20 +08:00
unclecode
e5d401c67c
Update generated code sample
2024-06-02 16:06:43 +08:00
unclecode
ae77589a98
Update Readme
2024-06-02 15:42:13 +08:00
unclecode
ad373c0e19
Update Readme
2024-06-02 15:41:24 +08:00
unclecode
51f26d12fe
Update for v0.2.2
...
- Support multiple JS scripts
- Fixed some bugs
- Resolved a few issues related to Colab installation
2024-06-02 15:40:18 +08:00
unclecode
f1b60b2016
chore: Update ONNX model loading process
2024-05-31 18:07:05 +08:00
UncleCode
8c2dc2b1e4
Create Dockerfile
2024-05-29 17:56:57 +08:00
UncleCode
dc9a44c12a
Update and rename Dockerfile to Dockerfile-version-0
2024-05-29 17:56:34 +08:00
UncleCode
d9753b6349
Update requirements.txt
...
Remove tokenizer version from requirements.txt
2024-05-24 14:49:48 +08:00
UncleCode
a554c0b143
Update requirements.txt
2024-05-23 12:52:31 +08:00
UncleCode
7381fa95e6
Merge pull request #3 from QIN2DIM/main
...
fix(main): UnicodeDecodeError
2024-05-23 09:29:28 +08:00
Unclecode
53d1176d53
chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices
2024-05-19 16:18:58 +00:00
unclecode
52c4be0696
Update setup.py version to 0.2.1
v0.2.1
2024-05-19 22:30:59 +08:00
unclecode
13a3b21d19
Add ONNX embedding model for CPU devices, update the similarity threshold, and improve the embedding speed.
2024-05-19 22:30:10 +08:00
QIN2DIM
5cee084340
fix(main): UnicodeDecodeError
...
File "T:\_GitHubProjects\Forks\crawl4ai\main.py", line 70, in read_index
partials[filename[:-5]] = file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 149: illegal multibyte sequence
2024-05-18 23:31:11 +08:00
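Note on the traceback above: when `open()` is called without an explicit encoding, Python decodes with the platform's default codec, which on a GBK-locale Windows machine produces this error for UTF-8 files. The commit's diff is not shown in this log, so the following is only a minimal sketch of the common remedy (passing `encoding="utf-8"`), not necessarily the change that was merged. The helper name `read_partial` and the sample file are hypothetical.

```python
import os
import tempfile


def read_partial(path):
    """Read a text file as UTF-8, independent of the platform default encoding."""
    # Hypothetical helper: the explicit encoding avoids falling back to
    # a locale codec such as 'gbk'.
    with open(path, "r", encoding="utf-8") as file:
        return file.read()


# Write a small UTF-8 file containing non-ASCII characters, then read it back.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "index.html")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("<p>caf\u00e9 \u2713</p>")

content = read_partial(sample_path)
print(ascii(content))  # ascii() keeps the output printable on any console
```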
Unclecode
bf00c26a83
chore: Update Dockerfile to install chromium-chromedriver and spacy library
2024-05-18 09:16:52 +00:00
unclecode
3846648c12
chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices
2024-05-18 15:42:19 +08:00
unclecode
eb6423875f
chore: Update Selenium options in crawler_strategy.py and add verbose logging in CosineStrategy
2024-05-18 14:13:06 +08:00
unclecode
e3524a10a7
chore: Update REST API base URL in README.md
2024-05-17 23:28:29 +08:00
unclecode
468dad6169
chore: Update Dockerfile to install chromium-chromedriver and spacy library
2024-05-17 23:15:39 +08:00
UncleCode
bc27982992
Update setup.py: handle spaCy installation
2024-05-17 22:11:00 +08:00
UncleCode
57e5decb55
Update requirements.txt
2024-05-17 22:02:08 +08:00
unclecode
b6319c6f6e
chore: Add support for GPU, MPS, and CPU
2024-05-17 21:56:13 +08:00
UncleCode
0a902f562f
Update requirements.txt: add spaCy
2024-05-17 21:41:35 +08:00
UncleCode
454135856e
Update extraction_strategy.py: support GPU, MPS, and CPU
2024-05-17 21:40:48 +08:00
UncleCode
33fddc27ad
Update model loader to support GPU, MPS, and CPU
2024-05-17 21:39:22 +08:00
unclecode
ce052a4eb5
Update README
2024-05-17 18:29:59 +08:00
unclecode
b43d77a56b
Update README
2024-05-17 18:28:39 +08:00
unclecode
1635a92218
chore: Update Crawl4AI quickstart script in README.md
2024-05-17 18:25:32 +08:00
unclecode
2a8a1b27e1
chore: Update Readme
2024-05-17 18:24:47 +08:00
unclecode
f5f3cce2c8
Merge new-release-0.0.2-no-spacy into main for v0.2.0 release
v0.2.0
2024-05-17 18:23:27 +08:00
unclecode
a085e6315b
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-05-17 18:21:02 +08:00
unclecode
a8d600a3b4
chore: Add test_pad.py, requirements0.txt, and a.txt to .gitignore
v0.1.0
2024-05-17 18:13:43 +08:00
unclecode
6f96dcd649
chore: Update README
2024-05-17 18:12:50 +08:00
unclecode
957a2458b1
chore: Update web crawler URLs to use NBC News business section
2024-05-17 18:11:13 +08:00
unclecode
36e46be23d
chore: Add verbose option to ExtractionStrategy classes
...
This commit adds a new `verbose` option to the `ExtractionStrategy` classes. The `verbose` option allows for logging of extraction details, such as the number of extracted blocks and the URL being processed. This improves the debugging and monitoring capabilities of the code.
2024-05-17 18:06:10 +08:00
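Note on the `verbose` option described above: the idea can be illustrated roughly as follows. This is a sketch under assumptions, not the project's actual `ExtractionStrategy` code; the class name `DemoExtractionStrategy`, the `extract` signature, the blank-line splitting rule, and the log format are all hypothetical.

```python
class DemoExtractionStrategy:
    """Hypothetical stand-in showing how a verbose flag can gate logging."""

    def __init__(self, verbose=False):
        self.verbose = verbose

    def extract(self, url, text):
        # Assumed splitting rule: blocks are separated by blank lines.
        blocks = [part.strip() for part in text.split("\n\n") if part.strip()]
        if self.verbose:
            # Log the block count and the URL only when verbose is enabled.
            print(f"[LOG] Extracted {len(blocks)} blocks from URL: {url}")
        return blocks


strategy = DemoExtractionStrategy(verbose=True)
blocks = strategy.extract("https://example.com", "First block.\n\nSecond block.")
```

With `verbose=False` (the assumed default here) the same call returns the blocks without printing anything.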
unclecode
32c87f0388
chore: Update NlpSentenceChunking constructor parameters to None
...
The NlpSentenceChunking constructor parameters have been updated to None in order to simplify the usage of the class. This change removes the need for specifying the SpaCy model for sentence detection, making the code more concise and easier to understand.
2024-05-17 17:00:43 +08:00
unclecode
647cfda225
chore: Update Crawl4AI quickstart script in README.md
...
This commit updates the Crawl4AI quickstart script in the README.md file. The script is now properly formatted and aligned, making it easier to read and understand. The unnecessary indentation has been removed, and the script is now more concise and efficient.
2024-05-17 16:55:34 +08:00
unclecode
1cc67df301
chore: Update pip installation command and requirements, add new dependencies
2024-05-17 16:53:03 +08:00
unclecode
d7b37e849d
chore: Update CrawlRequest model to use NoExtractionStrategy as default
2024-05-17 16:50:38 +08:00
unclecode
f52f526002
chore: Update web_crawler.py to use NoExtractionStrategy as default
2024-05-17 16:03:35 +08:00
unclecode
3593f017d7
chore: Update setup.py to exclude torch, transformers, and nltk dependencies
...
This commit updates the setup.py file to exclude the torch, transformers, and nltk dependencies from the install_requires section. Instead, it creates separate extras_require sections for different environments, including all requirements, excluding torch for Colab, and excluding torch, transformers, and nltk for the crawl environment.
2024-05-17 16:01:04 +08:00
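Note on the `extras_require` layout described above: one way to derive such groups is to filter a single requirement list per environment. This is a sketch under assumptions, since the actual setup.py is not shown in this log; the extras names (`all`, `colab`, `crawl`), the helper functions, and the sample requirement list are hypothetical.

```python
import re

# Hypothetical dependency list; the project's real requirements are not shown here.
BASE_REQUIREMENTS = [
    "requests",
    "selenium",
    "torch==2.3.0",
    "transformers",
    "nltk",
]


def package_name(requirement):
    """Return the bare package name from a string such as 'torch==2.3.0'."""
    return re.split(r"[<>=!~\[;\s]", requirement, maxsplit=1)[0].lower()


def without(requirements, excluded):
    """Keep only the requirements whose package name is not in `excluded`."""
    return [r for r in requirements if package_name(r) not in excluded]


# Assumed extras names; these would be passed to setup(extras_require=...).
extras_require = {
    "all": list(BASE_REQUIREMENTS),
    "colab": without(BASE_REQUIREMENTS, {"torch"}),
    "crawl": without(BASE_REQUIREMENTS, {"torch", "transformers", "nltk"}),
}
```

Under this layout, a user would pick a group at install time with the standard pip extras syntax, for example `pip install "package[colab]"`, where `package` stands for the distribution name.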
unclecode
e7bb76f19b
chore: Update torch dependency to version 2.3.0
2024-05-17 15:52:39 +08:00
unclecode
593b928967
Update requirements.txt to include latest versions of dependencies
2024-05-17 15:48:14 +08:00
unclecode
bb3d37face
chore: Update requirements.txt to include latest versions of dependencies
2024-05-17 15:32:37 +08:00
unclecode
3f8576f870
chore: Update model_loader.py to use pretrained models without resume_download
2024-05-17 15:26:15 +08:00
unclecode
bf3b040f10
chore: Update pip installation command and requirements, add new dependencies
2024-05-17 15:21:45 +08:00
unclecode
a317dc5e1d
Load CosineStrategy in the function
2024-05-17 15:13:06 +08:00