unclecode
d58286989c
UPDATE DOCUMENTS
2024-06-30 00:34:02 +08:00
unclecode
b58af3349c
chore: Update installation instructions with support for different modes
2024-06-30 00:22:17 +08:00
unclecode
685706e0aa
Update version and change log
2024-06-30 00:17:43 +08:00
unclecode
61ae2de841
1/ Update setup.py to support the following modes:
- default (most frequent mode)
- torch
- transformers
- all
2/ Update Docker file
3/ Update documentation as well.
2024-06-30 00:15:29 +08:00
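The four install modes listed in this commit map naturally onto setuptools "extras". The following is only a minimal sketch of how such a layout could look, not the project's actual setup.py: the package names in each group are hypothetical placeholders, and `build_setup_kwargs` is an illustrative helper that does not appear in the log.

```python
# Sketch of optional install modes expressed as setuptools extras.
# ASSUMPTION: the dependency lists below are placeholders; the real
# requirements for each mode are not given in the commit message.
DEFAULT_REQUIRES = ["requests", "beautifulsoup4"]  # "default" mode

EXTRAS = {
    "torch": ["torch"],
    "transformers": ["transformers"],
}
# "all" is the de-duplicated, sorted union of every optional group.
EXTRAS["all"] = sorted({pkg for group in EXTRAS.values() for pkg in group})


def build_setup_kwargs():
    """Return the keyword arguments one would pass to setuptools.setup()."""
    return {
        "install_requires": DEFAULT_REQUIRES,
        "extras_require": EXTRAS,
    }
```

With a layout like this, a plain `pip install <package>` would pull in only the default dependencies, while `pip install "<package>[all]"` would add every optional group.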
unclecode
f8a11779fe
Update change log
2024-06-26 16:48:36 +08:00
unclecode
d11a83c232
## [0.2.71] 2024-06-26
• Refactored `crawler_strategy.py` to handle exceptions and improve error messages
• Improved `get_content_of_website_optimized` function in `utils.py` for better performance
• Updated `utils.py` with latest changes
• Migrated to `ChromeDriverManager` for resolving Chrome driver download issues
2024-06-26 15:34:15 +08:00
unclecode
78cfad8b2f
chore: Update version to 0.2.7 and improve extraction function speed
2024-06-24 22:39:56 +08:00
unclecode
68b3dff74a
Update CSS
2024-06-23 00:36:03 +08:00
unclecode
bfc4abd6e8
Update documents
2024-06-22 20:57:03 +08:00
unclecode
d6182bedd7
chore:
- Add demo page to the new mkdocs
- Set website home page to mkdocs
2024-06-22 20:36:01 +08:00
unclecode
2c2362b4d3
issue 19 is resolved
- Update Dockerfile to install mkdocs and build documentation
2024-06-22 17:18:00 +08:00
unclecode
e7705e661a
ADD MKDocs
2024-06-21 17:56:54 +08:00
unclecode
21b110bfd7
Update LLMExtractionStrategy to disable chunking if specified; add an example of summarizing a web page.
2024-06-19 19:03:35 +08:00
unclecode
539263a8ba
chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtractionStrategy, and update Dockerfile and README
2024-06-19 18:32:20 +08:00
unclecode
3f0e265baf
Merge branch 'format-inline-tags'
2024-06-19 00:48:38 +08:00
unclecode
21e2538e57
Update quickstart.py
2024-06-19 00:37:53 +08:00
unclecode
77da48050d
chore: Add custom headers to LocalSeleniumCrawlerStrategy
2024-06-17 15:50:03 +08:00
unclecode
9a97aacd85
chore: Add hooks for customizing the LocalSeleniumCrawlerStrategy
2024-06-17 15:37:18 +08:00
unclecode
b3a0edaa6d
- User agent
- Extract Links
- Extract Metadata
- Update Readme
- Update REST API document
2024-06-08 17:59:42 +08:00
unclecode
36a5847df5
Add css selector example
2024-06-07 20:47:20 +08:00
unclecode
a19379aa58
Add recipe images, update README, and REST API example
2024-06-07 20:43:50 +08:00
unclecode
768d048e1c
Update REST call usage instructions
2024-06-07 18:10:45 +08:00
unclecode
94c11a0262
Add image
2024-06-07 18:09:21 +08:00
unclecode
aeb2114170
Add example of REST API call
2024-06-07 16:24:40 +08:00
unclecode
226a62a3c0
feat: Add screenshot functionality to crawl_urls
2024-06-07 15:33:15 +08:00
unclecode
8e73a482a2
feat: Add screenshot functionality to crawl_urls
The code changes in this commit add the `screenshot` parameter to the `crawl_urls` function in `main.py`. This allows users to specify whether they want to take a screenshot of the page during the crawling process. The default value is `False`.
This commit message follows the established convention of starting with a type (feat for feature) and providing a concise and descriptive summary of the changes made.
2024-06-07 15:23:32 +08:00
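The commit above describes an opt-in `screenshot` flag on `crawl_urls` that defaults to `False`. A minimal sketch of that pattern follows; it is not the code from `main.py`. The return shape, the `capture` callback, and the stand-in `fake_capture` function are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional


def crawl_urls(
    urls: List[str],
    screenshot: bool = False,
    capture: Optional[Callable[[str], bytes]] = None,
) -> List[Dict[str, object]]:
    """Hypothetical crawl loop with an opt-in screenshot step.

    ASSUMPTION: `capture` stands in for whatever browser call actually
    takes the screenshot; the real function's signature is not in the log.
    """
    results = []
    for url in urls:
        entry: Dict[str, object] = {"url": url, "screenshot": None}
        # Only pay the cost of a screenshot when the caller asks for one.
        if screenshot and capture is not None:
            entry["screenshot"] = capture(url)
        results.append(entry)
    return results


def fake_capture(url: str) -> bytes:
    """Stand-in for a real browser screenshot; returns dummy bytes."""
    return b"PNG:" + url.encode("utf-8")
```

Defaulting the flag to `False` keeps existing callers unaffected, since a screenshot is only taken when it is explicitly requested.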
unclecode
c7553b1280
Update research assistant example with package installation instructions
2024-06-04 23:18:19 +08:00
unclecode
8b8683f22e
Add research assistant example using Chainlit
2024-06-04 22:43:09 +08:00
unclecode
51f26d12fe
Update for v0.2.2
- Support multiple JS scripts
- Fixed some bugs
- Resolved a few issues related to Colab installation
2024-06-02 15:40:18 +08:00
unclecode
13a3b21d19
- Add ONNX embedding model for CPU devices, update the similarity threshold, and improve the embedding speed.
2024-05-19 22:30:10 +08:00
unclecode
eb6423875f
chore: Update Selenium options in crawler_strategy.py and add verbose logging in CosineStrategy
2024-05-18 14:13:06 +08:00
unclecode
b6319c6f6e
chore: Add support for GPU, MPS, and CPU
2024-05-17 21:56:13 +08:00
unclecode
957a2458b1
chore: Update web crawler URLs to use NBC News business section
2024-05-17 18:11:13 +08:00
unclecode
32c87f0388
chore: Update NlpSentenceChunking constructor parameters to None
The NlpSentenceChunking constructor parameters now default to None to simplify use of the class. This removes the need to specify a spaCy model for sentence detection, making the code more concise and easier to understand.
2024-05-17 17:00:43 +08:00
unclecode
1cc67df301
chore: Update pip installation command and requirements, add new dependencies
2024-05-17 16:53:03 +08:00
unclecode
a5f9d07dbf
Remove dependency on spaCy model.
2024-05-17 15:08:03 +08:00
UncleCode
6fcaf26b4f
Update quickstart.py: Add counting items
2024-05-16 22:49:12 +08:00
unclecode
c8589f8da3
Update:
- Fix spaCy model issue
- Update Readme and requirements.txt
2024-05-16 19:50:20 +08:00
unclecode
5b80be956d
Update:
- Debug
- Refactor code for new version
2024-05-16 17:31:44 +08:00
unclecode
f6e59157bf
- Test all methods
- Update index.html
- Update Readme
- Resolve some bugs
2024-05-14 21:27:41 +08:00