11 Commits

Author SHA1 Message Date
unclecode
4a50781453 chore: Remove local and .files folders from .gitignore 2024-06-17 15:57:34 +08:00
unclecode
8b8683f22e Add research assistant example using Chainlit 2024-06-04 22:43:09 +08:00
QIN2DIM
5cee084340 fix(main): UnicodeDecodeError
File "T:\_GitHubProjects\Forks\crawl4ai\main.py", line 70, in read_index
    partials[filename[:-5]] = file.read()

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 149: illegal multibyte sequence
2024-05-18 23:31:11 +08:00
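
The traceback above suggests the file was opened with the platform default codec (gbk on that Windows machine). A minimal sketch of one common remedy follows, assuming the partials are UTF-8 encoded .html files. The helper name `read_partials` and the folder layout are hypothetical and are not taken from the repository's main.py.

```python
from pathlib import Path


def read_partials(folder):
    """Read every .html file in `folder` into a dict keyed by file name without extension.

    Hypothetical helper: passing encoding="utf-8" explicitly keeps open() from
    falling back to the locale codec (e.g. gbk), which is what raises
    UnicodeDecodeError on bytes that are valid UTF-8 but invalid in that codec.
    """
    partials = {}
    for path in sorted(Path(folder).glob("*.html")):
        with open(path, encoding="utf-8") as file:
            # path.stem plays the role of filename[:-5] in the traceback
            partials[path.stem] = file.read()
    return partials
```

If the files may contain stray bytes that are not valid UTF-8, `open(path, encoding="utf-8", errors="replace")` is a possible fallback, at the cost of silently substituting characters.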
Unclecode
bf00c26a83 chore: Update Dockerfile to install chromium-chromedriver and spacy library 2024-05-18 09:16:52 +00:00
unclecode
199c66114c chore: Update pip installation command and requirements, add new dependencies 2024-05-16 20:58:36 +08:00
unclecode
f6e59157bf - Test all methods
- Update index.html
- Update Readme
- Resolve some bugs
2024-05-14 21:27:41 +08:00
unclecode
82706129f5 Update:
- Text Categorization
- Crawler, Extraction, and Chunking strategies
- Clustering for semantic segmentation
2024-05-12 22:37:21 +08:00
unclecode
7039e3c1ee - Issue Resolved: Every <pre> tag's HTML content is replaced with its inner text to address situations like syntax highlighters, where each character might be in a <span>. This avoids issues where the minimum word threshold might ignore them. 2024-05-12 14:08:22 +08:00
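
The idea in the commit message above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: `flatten_pre` is a hypothetical helper, it assumes `<pre>` blocks are not nested, and it leaves HTML entities such as `&lt;` untouched.

```python
import re

# Matches an opening <pre ...> tag, its content, and the closing </pre>.
PRE_RE = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.DOTALL | re.IGNORECASE)
# Matches any tag inside the <pre> content, e.g. the per-character <span>s
# that syntax highlighters emit.
TAG_RE = re.compile(r"<[^>]+>")


def flatten_pre(html_text):
    """Replace the HTML content of every <pre> block with its inner text."""
    def strip_inner_tags(match):
        opening, inner, closing = match.groups()
        return opening + TAG_RE.sub("", inner) + closing

    return PRE_RE.sub(strip_inner_tags, html_text)
```

For example, `flatten_pre('<pre><span>a</span><span>b</span></pre>')` yields `'<pre>ab</pre>'`, so a later word-count threshold sees the block as contiguous text rather than as many one-character fragments.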
unclecode
181250cb93 chore: Add function to clear the database 2024-05-09 19:42:43 +08:00
unclecode
c71adb29ce chore: Update .gitignore and README.md 2024-05-09 19:25:25 +08:00
unclecode
b8e743cd8d Initial Commit 2024-05-09 19:10:25 +08:00