unclecode
52c4be0696
Update setup.py version to 0.2.1
v0.2.1
2024-05-19 22:30:59 +08:00
unclecode
13a3b21d19
- Add ONNX embedding model for CPU devices, update the similarity threshold, and improve the embedding speed.
2024-05-19 22:30:10 +08:00
Unclecode
bf00c26a83
chore: Update Dockerfile to install chromium-chromedriver and spacy library
2024-05-18 09:16:52 +00:00
unclecode
3846648c12
chore: Update extraction strategy to support GPU, MPS, and CPU, and add batch processing for CPU devices
2024-05-18 15:42:19 +08:00
unclecode
eb6423875f
chore: Update Selenium options in crawler_strategy.py and add verbose logging in CosineStrategy
2024-05-18 14:13:06 +08:00
unclecode
e3524a10a7
chore: Update REST API base URL in README.md
2024-05-17 23:28:29 +08:00
unclecode
468dad6169
chore: Update Dockerfile to install chromium-chromedriver and spacy library
2024-05-17 23:15:39 +08:00
UncleCode
bc27982992
Update setup.py: Handle Spacy installation
2024-05-17 22:11:00 +08:00
UncleCode
57e5decb55
Update requirements.txt
2024-05-17 22:02:08 +08:00
unclecode
b6319c6f6e
chore: Add support for GPU, MPS, and CPU
2024-05-17 21:56:13 +08:00
UncleCode
0a902f562f
Update requirements.txt: Add Spacy
2024-05-17 21:41:35 +08:00
UncleCode
454135856e
Update extraction_strategy.py: Support GPU, MPS, and CPU
2024-05-17 21:40:48 +08:00
UncleCode
33fddc27ad
Update model loader to support GPU, MPS, and CPU
2024-05-17 21:39:22 +08:00
unclecode
ce052a4eb5
Update README
2024-05-17 18:29:59 +08:00
unclecode
b43d77a56b
Update README
2024-05-17 18:28:39 +08:00
unclecode
1635a92218
chore: Update Crawl4AI quickstart script in README.md
2024-05-17 18:25:32 +08:00
unclecode
2a8a1b27e1
chore: Update Readme
2024-05-17 18:24:47 +08:00
unclecode
f5f3cce2c8
Merge new-release-0.0.2-no-spacy into main for v0.2.0 release
v0.2.0
2024-05-17 18:23:27 +08:00
unclecode
a085e6315b
Merge branch 'main' of https://github.com/unclecode/crawl4ai
2024-05-17 18:21:02 +08:00
unclecode
a8d600a3b4
chore: Add test_pad.py, requirements0.txt, and a.txt to .gitignore
v0.1.0
2024-05-17 18:13:43 +08:00
unclecode
6f96dcd649
chore: Update README
2024-05-17 18:12:50 +08:00
unclecode
957a2458b1
chore: Update web crawler URLs to use NBC News business section
2024-05-17 18:11:13 +08:00
unclecode
36e46be23d
chore: Add verbose option to ExtractionStrategy classes
This commit adds a new `verbose` option to the `ExtractionStrategy` classes. When enabled, it logs extraction details such as the number of extracted blocks and the URL being processed, which makes the code easier to debug and monitor.
2024-05-17 18:06:10 +08:00
unclecode
32c87f0388
chore: Update NlpSentenceChunking constructor parameters to None
The NlpSentenceChunking constructor parameters now default to None to simplify use of the class. This removes the need to specify a SpaCy model for sentence detection, making the code more concise and easier to understand.
2024-05-17 17:00:43 +08:00
unclecode
647cfda225
chore: Update Crawl4AI quickstart script in README.md
This commit updates the Crawl4AI quickstart script in the README.md file. The script is now properly formatted and aligned, with unnecessary indentation removed, making it more concise and easier to read.
2024-05-17 16:55:34 +08:00
unclecode
1cc67df301
chore: Update pip installation command and requirements, add new dependencies
2024-05-17 16:53:03 +08:00
unclecode
d7b37e849d
chore: Update CrawlRequest model to use NoExtractionStrategy as default
2024-05-17 16:50:38 +08:00
unclecode
f52f526002
chore: Update web_crawler.py to use NoExtractionStrategy as default
2024-05-17 16:03:35 +08:00
unclecode
3593f017d7
chore: Update setup.py to exclude torch, transformers, and nltk dependencies
This commit removes the torch, transformers, and nltk dependencies from the install_requires section of setup.py. Instead, it defines separate extras_require groups for different environments: one with all requirements, one excluding torch (for Colab), and one excluding torch, transformers, and nltk (for the crawl environment).
2024-05-17 16:01:04 +08:00
unclecode
e7bb76f19b
chore: Update torch dependency to version 2.3.0
2024-05-17 15:52:39 +08:00
unclecode
593b928967
Update requirements.txt to include latest versions of dependencies
2024-05-17 15:48:14 +08:00
unclecode
bb3d37face
chore: Update requirements.txt to include latest versions of dependencies
2024-05-17 15:32:37 +08:00
unclecode
3f8576f870
chore: Update model_loader.py to use pretrained models without resume_download
2024-05-17 15:26:15 +08:00
unclecode
bf3b040f10
chore: Update pip installation command and requirements, add new dependencies
2024-05-17 15:21:45 +08:00
unclecode
a317dc5e1d
Load CosineStrategy in the function
2024-05-17 15:13:06 +08:00
unclecode
a5f9d07dbf
Remove dependency on Spacy model.
2024-05-17 15:08:03 +08:00
unclecode
f85df91ca6
chore: Update README.md with Colab badge
2024-05-17 00:21:16 +08:00
UncleCode
6fcaf26b4f
Update quickstart.py: Add counting items
2024-05-16 22:49:12 +08:00
UncleCode
5b4a586b2d
Update web_crawler.py
Set CosineExtraction as the default strategy
2024-05-16 22:28:24 +08:00
UncleCode
a856319499
Update web_crawler.py
Set NoExtractionStrategy for FetchPages
2024-05-16 22:06:33 +08:00
UncleCode
5ce1dc1622
Update web_crawler.py
Set all extraction strategies default to NoExtractionStrategy
2024-05-16 21:58:11 +08:00
unclecode
ea16dec587
Improve library loading
2024-05-16 21:19:02 +08:00
unclecode
d19488a821
chore: Update model_loader.py to create necessary folders in the home directory
2024-05-16 21:05:24 +08:00
unclecode
199c66114c
chore: Update pip installation command and requirements, add new dependencies
2024-05-16 20:58:36 +08:00
unclecode
45569d058d
chore: Update pip installation command and requirements for Crawl4AI
2024-05-16 20:42:53 +08:00
unclecode
5bb0b0b378
chore: Update pip installation command and requirements for Crawl4AI
2024-05-16 20:36:29 +08:00
unclecode
4006f5f4e2
chore: Update pip installation command to use sys.executable
2024-05-16 20:24:48 +08:00
unclecode
7e0682e0de
chore: Update dependencies and installation process
2024-05-16 20:22:50 +08:00
unclecode
8e28eb9efb
Add model loader, update requirements.txt
2024-05-16 20:08:21 +08:00
unclecode
c8589f8da3
Update:
- Fix Spacy model issue
- Update Readme and requirements.txt
2024-05-16 19:50:20 +08:00