crawl4ai

Author	SHA1	Message	Date
UncleCode	8ec12d7d68	Apply Ruff Corrections	2025-01-13 19:19:58 +08:00
Mahesh	00026b5f8b	feat(config): Adding a configurable way of setting the cache directory for constrained environments	2024-11-12 14:52:51 -07:00
UncleCode	4e2852d5ff	[v0.3.71] Enhance chunking strategies and improve overall performance - Add OverlappingWindowChunking and improve SlidingWindowChunking - Update CHUNK_TOKEN_THRESHOLD to 2048 tokens - Optimize AsyncPlaywrightCrawlerStrategy close method - Enhance flexibility in CosineStrategy with generic embedding model loading - Improve JSON-based extraction strategies - Add knowledge graph generation example	2024-10-19 18:36:59 +08:00
unclecode	4d48bd31ca	Push async version last changes for merge to main branch	2024-09-24 20:52:08 +08:00
unclecode	5c15837677	chore: Update README, generate new notebook for quickstart	2024-09-04 14:46:22 +08:00
unclecode	e5e6a34e80	## [v0.2.77] - 2024-08-04 Significant improvements in text processing and performance: - 🚀 Dependency reduction: Removed dependency on spaCy model for text chunk labeling in cosine extraction strategy. - 🤖 Transformer upgrade: Implemented text sequence classification using a transformer model for labeling text chunks. - ⚡ Performance enhancement: Improved model loading speed due to removal of spaCy dependency. - 🔧 Future-proofing: Laid groundwork for potential complete removal of spaCy dependency in future versions. These changes address issue #68 and provide a foundation for faster, more efficient text processing in Crawl4AI.	2024-08-04 14:54:18 +08:00
Aravind Karnam	9d0cafcfa6	fixed import error in model_loader.py	2024-07-21 14:55:58 +05:30
unclecode	51f26d12fe	Update for v0.2.2 - Support multiple JS scripts - Fixed some bugs - Resolved a few issues relevant to Colab installation	2024-06-02 15:40:18 +08:00
unclecode	f1b60b2016	chore: Update ONNX model loading process	2024-05-31 18:07:05 +08:00
unclecode	13a3b21d19	- Add ONNX embedding model for CPU devices, Update the similarity threshold, improve the embedding speed.	2024-05-19 22:30:10 +08:00
unclecode	3846648c12	chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices	2024-05-18 15:42:19 +08:00
unclecode	b6319c6f6e	chore: Add support for GPU, MPS, and CPU	2024-05-17 21:56:13 +08:00
UncleCode	33fddc27ad	Update model loader to support GPU, MPS, and CPU	2024-05-17 21:39:22 +08:00
unclecode	36e46be23d	chore: Add verbose option to ExtractionStrategy classes This commit adds a new `verbose` option to the `ExtractionStrategy` classes. The `verbose` option allows for logging of extraction details, such as the number of extracted blocks and the URL being processed. This improves the debugging and monitoring capabilities of the code.	2024-05-17 18:06:10 +08:00
unclecode	3f8576f870	chore: Update model_loader.py to use pretrained models without resume_download	2024-05-17 15:26:15 +08:00
unclecode	a5f9d07dbf	Remove dependency on Spacy model.	2024-05-17 15:08:03 +08:00
unclecode	ea16dec587	Improve library loading	2024-05-16 21:19:02 +08:00
unclecode	d19488a821	chore: Update model_loader.py to create necessary folders in the home directory	2024-05-16 21:05:24 +08:00
unclecode	5bb0b0b378	chore: Update pip installation command and requirements for Crawl4AI	2024-05-16 20:36:29 +08:00
unclecode	8e28eb9efb	Add model loader, update requirements.txt	2024-05-16 20:08:21 +08:00
unclecode	c8589f8da3	Update: - Fix Spacy model issue - Update Readme and requirements.txt	2024-05-16 19:50:20 +08:00
unclecode	f6e59157bf	- Test all methods - Update index.html - Update Readme - Resolve some bugs	2024-05-14 21:27:41 +08:00

22 Commits