unclecode
bccadec887
Remove dependencies on psutil and PyYAML, and extend the requests version range
2024-09-29 17:07:06 +08:00
unclecode
0759503e50
Extend numpy version range to support Python 3.9
2024-09-29 00:08:02 +08:00
unclecode
8b6e88c85c
Update .gitignore to ignore temporary and test directories
2024-09-26 15:09:49 +08:00
unclecode
4d48bd31ca
Push async version last changes for merge to main branch
2024-09-24 20:52:08 +08:00
unclecode
5c15837677
chore: Update README, generate new notebook for quickstart
2024-09-04 14:46:22 +08:00
unclecode
b6713870ef
refactor: Update Dockerfile to install Crawl4AI with specified options
This commit updates the Dockerfile to install Crawl4AI with the specified options. The `INSTALL_OPTION` build argument is used to determine which additional packages to install. If the option is set to "all", all models will be downloaded. If the option is set to "torch", only torch models will be downloaded. If the option is set to "transformer", only transformer models will be downloaded. If no option is specified, the default installation will be used. This change improves the flexibility and customization of the Crawl4AI installation process.
2024-08-01 17:56:19 +08:00
unclecode
9e43f7beda
refactor: Temporarily disable fetching image file size in get_content_of_website_optimized
Set the `image_size` variable to 0 in the `get_content_of_website_optimized` function to temporarily disable fetching the image file size. This change addresses performance issues and will be improved in a future update.
Update Dockerfile for Linux users
2024-07-31 13:29:23 +08:00
unclecode
144cfa0eda
Switch to ChromeDriverManager due to issues with downloading the Chrome driver
2024-06-26 13:00:17 +08:00
unclecode
539263a8ba
chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README
2024-06-19 18:32:20 +08:00
unclecode
194050705d
chore: Add pillow library to requirements.txt
2024-06-10 23:03:32 +08:00
unclecode
51f26d12fe
Update for v0.2.2
- Support multiple JS scripts
- Fixed some bugs
- Resolved a few issues related to Colab installation
2024-06-02 15:40:18 +08:00
UncleCode
d9753b6349
Update requirements.txt
Remove tokenizer version from requirements.txt
2024-05-24 14:49:48 +08:00
UncleCode
a554c0b143
Update requirements.txt
2024-05-23 12:52:31 +08:00
unclecode
13a3b21d19
- Add ONNX embedding model for CPU devices, update the similarity threshold, improve the embedding speed.
2024-05-19 22:30:10 +08:00
unclecode
468dad6169
chore: Update Dockerfile to install chromium-chromedriver and spacy library
2024-05-17 23:15:39 +08:00
UncleCode
57e5decb55
Update requirements.txt
2024-05-17 22:02:08 +08:00
UncleCode
0a902f562f
Update requirements.txt: add spaCy
2024-05-17 21:41:35 +08:00
unclecode
e7bb76f19b
chore: Update torch dependency to version 2.3.0
2024-05-17 15:52:39 +08:00
unclecode
593b928967
Update requirements.txt to include latest versions of dependencies
2024-05-17 15:48:14 +08:00
unclecode
bb3d37face
chore: Update requirements.txt to include latest versions of dependencies
2024-05-17 15:32:37 +08:00
unclecode
a5f9d07dbf
Remove dependency on Spacy model.
2024-05-17 15:08:03 +08:00
unclecode
199c66114c
chore: Update pip installation command and requirements, add new dependencies
2024-05-16 20:58:36 +08:00
unclecode
7e0682e0de
chore: Update dependencies and installation process
2024-05-16 20:22:50 +08:00
unclecode
8e28eb9efb
Add model loader, update requirements.txt
2024-05-16 20:08:21 +08:00
unclecode
c8589f8da3
Update:
- Fix Spacy model issue
- Update Readme and requirements.txt
2024-05-16 19:50:20 +08:00
unclecode
5b80be956d
Update:
- Debug
- Refactor code for new version
2024-05-16 17:31:44 +08:00
unclecode
5fea6c064b
Improve library imports
2024-05-13 02:46:35 +08:00
unclecode
b38bf64490
Exclude spaCy from requirements.txt
2024-05-12 22:59:26 +08:00
unclecode
82706129f5
Update:
- Text Categorization
- Crawler, Extraction, and Chunking strategies
- Clustering for semantic segmentation
2024-05-12 22:37:21 +08:00
unclecode
b8e743cd8d
Initial Commit
2024-05-09 19:10:25 +08:00