UncleCode
8ec12d7d68
Apply Ruff Corrections
2025-01-13 19:19:58 +08:00
Mahesh
00026b5f8b
feat(config): Adding a configurable way of setting the cache directory for constrained environments
2024-11-12 14:52:51 -07:00
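A configurable cache directory usually means the base folder comes from an environment variable and falls back to the home directory. The sketch below shows that pattern. The variable name `CRAWL4_AI_BASE_DIRECTORY`, the `.crawl4ai` subfolder, and the helper `get_cache_dir` are assumptions made for illustration, not a confirmed description of what this commit changed.

```python
import os
from pathlib import Path

# Hypothetical variable name, used only for this sketch.
BASE_DIR_ENV_VAR = "CRAWL4_AI_BASE_DIRECTORY"


def get_cache_dir(env_var: str = BASE_DIR_ENV_VAR) -> Path:
    """Return a writable cache folder, creating it if needed.

    The base directory comes from `env_var` when it is set and non-empty,
    otherwise from the user's home directory (the usual default).
    """
    base = os.environ.get(env_var) or str(Path.home())
    cache_dir = Path(base) / ".crawl4ai"
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir
```

In a constrained environment (read-only home, container, serverless), pointing the variable at a writable path such as `/tmp` lets model files land somewhere the process can actually write.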
UncleCode
4e2852d5ff
[v0.3.71] Enhance chunking strategies and improve overall performance
- Add OverlappingWindowChunking and improve SlidingWindowChunking
- Update CHUNK_TOKEN_THRESHOLD to 2048 tokens
- Optimize AsyncPlaywrightCrawlerStrategy close method
- Enhance flexibility in CosineStrategy with generic embedding model loading
- Improve JSON-based extraction strategies
- Add knowledge graph generation example
2024-10-19 18:36:59 +08:00
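Overlapping window chunking splits text into fixed-size windows where each window repeats the last few words of the previous one, so context at chunk boundaries is not lost. The function below is a minimal word-based sketch of the idea; it is not the library's `OverlappingWindowChunking` class, and the name `overlapping_window_chunks` and its default sizes are hypothetical.

```python
from typing import List


def overlapping_window_chunks(text: str, window_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split `text` into word windows of `window_size`, each sharing `overlap` words with the previous one."""
    if window_size <= 0 or not 0 <= overlap < window_size:
        raise ValueError("need window_size > 0 and 0 <= overlap < window_size")
    words = text.split()
    step = window_size - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

For example, `overlapping_window_chunks("a b c d e f g", window_size=4, overlap=2)` yields `["a b c d", "c d e f", "e f g"]`.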
unclecode
4d48bd31ca
Push async version last changes for merge to main branch
2024-09-24 20:52:08 +08:00
unclecode
5c15837677
chore: Update README, generate new notebook for quickstart
2024-09-04 14:46:22 +08:00
unclecode
e5e6a34e80
## [v0.2.77] - 2024-08-04
Significant improvements in text processing and performance:
- 🚀 **Dependency reduction**: Removed dependency on spaCy model for text chunk labeling in cosine extraction strategy.
- 🤖 **Transformer upgrade**: Implemented text sequence classification using a transformer model for labeling text chunks.
- ⚡ **Performance enhancement**: Improved model loading speed due to removal of spaCy dependency.
- 🔧 **Future-proofing**: Laid groundwork for potential complete removal of spaCy dependency in future versions.
These changes address issue #68 and provide a foundation for faster, more efficient text processing in Crawl4AI.
2024-08-04 14:54:18 +08:00
Aravind Karnam
9d0cafcfa6
fixed import error in model_loader.py
2024-07-21 14:55:58 +05:30
unclecode
51f26d12fe
Update for v0.2.2
- Support multiple JS scripts
- Fixed some bugs
- Resolved a few issues related to Colab installation
2024-06-02 15:40:18 +08:00
unclecode
f1b60b2016
chore: Update ONNX model loading process
2024-05-31 18:07:05 +08:00
unclecode
13a3b21d19
- Add ONNX embedding model for CPU devices, update the similarity threshold, and improve embedding speed.
2024-05-19 22:30:10 +08:00
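A similarity threshold typically decides which embedded chunks count as related: the cosine similarity between two embedding vectors is computed and anything below the threshold is dropped. This pure-Python sketch illustrates that filtering step only; the helper names and the default threshold of `0.3` are hypothetical and do not reflect the value chosen in this commit.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(query: Sequence[float], vectors: List[Sequence[float]], threshold: float = 0.3) -> List[int]:
    """Return indices of vectors whose similarity to `query` meets the (hypothetical) threshold."""
    return [i for i, v in enumerate(vectors) if cosine_similarity(query, v) >= threshold]
```

Raising the threshold keeps fewer, more closely related chunks; lowering it keeps more, at the cost of precision.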
unclecode
3846648c12
chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices
2024-05-18 15:42:19 +08:00
unclecode
b6319c6f6e
chore: Add support for GPU, MPS, and CPU
2024-05-17 21:56:13 +08:00
UncleCode
33fddc27ad
Update model loader to support GPU, MPS, and CPU
2024-05-17 21:39:22 +08:00
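Supporting GPU, MPS, and CPU in a model loader usually comes down to picking the best available backend in a fixed order of preference. The sketch below assumes PyTorch is the backend (an assumption, not stated in these commits) and keeps the decision logic in a pure function so it can be reasoned about without a GPU. `pick_device` and `detect_device` are hypothetical names.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer a CUDA GPU, then Apple's MPS backend, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch if it is installed; otherwise assume CPU only."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps only exists in newer PyTorch releases, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)
```

The returned string can be passed to `torch.device(...)` or a model's `.to(...)` call; on a CPU-only machine the fallback is where batching matters most, since there is no accelerator to absorb per-item overhead.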
unclecode
36e46be23d
chore: Add verbose option to ExtractionStrategy classes
This commit adds a new `verbose` option to the `ExtractionStrategy` classes. The `verbose` option allows for logging of extraction details, such as the number of extracted blocks and the URL being processed. This improves the debugging and monitoring capabilities of the code.
2024-05-17 18:06:10 +08:00
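The `verbose` option described above gates extra logging inside an extraction strategy. The class below is a hypothetical stand-in, not the real `ExtractionStrategy`, showing the pattern: the flag is stored at construction time, and the strategy reports the URL and the number of extracted blocks only when it is enabled. The blank-line block split and the log format are illustrative assumptions.

```python
from typing import List


class SketchExtractionStrategy:
    """Hypothetical stand-in for an ExtractionStrategy with a `verbose` flag."""

    def __init__(self, verbose: bool = False) -> None:
        self.verbose = verbose

    def extract(self, url: str, text: str) -> List[str]:
        # Naive block split on blank lines, purely for illustration.
        blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
        if self.verbose:
            # Report the URL being processed and how many blocks were extracted.
            print(f"[LOG] Extracted {len(blocks)} blocks from URL: {url}")
        return blocks
```

Keeping the flag on the instance means callers opt in per strategy, so quiet production runs and noisy debugging runs can share the same code path.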
unclecode
3f8576f870
chore: Update model_loader.py to use pretrained models without resume_download
2024-05-17 15:26:15 +08:00
unclecode
a5f9d07dbf
Remove dependency on spaCy model.
2024-05-17 15:08:03 +08:00
unclecode
ea16dec587
Improve library loading
2024-05-16 21:19:02 +08:00
unclecode
d19488a821
chore: Update model_loader.py to create necessary folders in the home directory
2024-05-16 21:05:24 +08:00
unclecode
5bb0b0b378
chore: Update pip installation command and requirements for Crawl4AI
2024-05-16 20:36:29 +08:00
unclecode
8e28eb9efb
Add model loader, update requirements.txt
2024-05-16 20:08:21 +08:00
unclecode
c8589f8da3
Update:
- Fix spaCy model issue
- Update README and requirements.txt
2024-05-16 19:50:20 +08:00
unclecode
f6e59157bf
- Test all methods
- Update index.html
- Update README
- Resolve some bugs
2024-05-14 21:27:41 +08:00