Commit Graph

314 Commits

Author SHA1 Message Date
unclecode
5bb0b0b378 chore: Update pip installation command and requirements for Crawl4AI 2024-05-16 20:36:29 +08:00
unclecode
8e28eb9efb Add model loader, update requirements.txt 2024-05-16 20:08:21 +08:00
unclecode
c8589f8da3 Update:
- Fix spaCy model issue
- Update Readme and requirements.txt
2024-05-16 19:50:20 +08:00
unclecode
5b80be956d Update:
- Debug
- Refactor code for new version
2024-05-16 17:31:44 +08:00
unclecode
f6e59157bf - Test all methods
- Update index.html
- Update Readme
- Resolve some bugs
2024-05-14 21:27:41 +08:00
unclecode
5fea6c064b Improve libraries import 2024-05-13 02:46:35 +08:00
unclecode
7679064521 Add model parameter for clustering. 2024-05-13 00:06:16 +08:00
unclecode
cf087cfa58 Replace embedding model with smaller one 2024-05-12 23:55:57 +08:00
unclecode
5693e324a4 Add time measurements. 2024-05-12 23:35:27 +08:00
unclecode
82706129f5 Update:
- Text Categorization
- Crawler, Extraction, and Chunking strategies
- Clustering for semantic segmentation
2024-05-12 22:37:21 +08:00
unclecode
7039e3c1ee - Issue Resolved: Every <pre> tag's HTML content is replaced with its inner text to address situations like syntax highlighters, where each character might be in a <span>. This avoids issues where the minimum word threshold might ignore them. 2024-05-12 14:08:22 +08:00
unclecode
372c921429 Update: Fix bug when the user sets extract_blocks to False 2024-05-10 20:12:31 +08:00
unclecode
88643612e8 chore: Update environment variable usage in config files 2024-05-09 22:37:01 +08:00
unclecode
3ff1d15702 Change the project folder name from crawler to crawl4ai 2024-05-09 22:16:28 +08:00
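
Commit 7039e3c1ee above describes replacing each <pre> tag's HTML content with its inner text, so that syntax highlighters that wrap every character in a <span> do not cause the block to fall under the minimum word threshold. A minimal Python sketch of that idea follows, using only the standard library. The names `PreFlattener` and `flatten_pre` are hypothetical and are not taken from the Crawl4AI code; the project's actual implementation may differ.

```python
from html import escape
from html.parser import HTMLParser


class PreFlattener(HTMLParser):
    """Re-emit HTML, keeping only the text inside each <pre> element.

    Hypothetical helper for illustration. Comments and doctype
    declarations are dropped, and <script>/<style> bodies are not
    handled specially.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments, joined at the end
        self.pre_depth = 0   # > 0 while the parser is inside a <pre>

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1
            if self.pre_depth == 1:
                # Keep the outermost <pre> tag exactly as written.
                self.out.append(self.get_starttag_text())
            return
        if self.pre_depth == 0:
            self.out.append(self.get_starttag_text())
        # Tags nested inside <pre> (e.g. <span>) are skipped.

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1
            if self.pre_depth == 0:
                self.out.append("</pre>")
            return
        if self.pre_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept only outside <pre>.
        if self.pre_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        # Text is always kept; re-escape it so the output stays valid HTML.
        self.out.append(escape(data, quote=False))


def flatten_pre(html: str) -> str:
    """Return `html` with every <pre> reduced to its inner text."""
    parser = PreFlattener()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<pre><span>d</span><span>e</span><span>f</span> f(): pass</pre>'
    print(flatten_pre(sample))
```

After flattening, the characters that were split across separate <span> elements form ordinary words again, so a word-count filter sees the code block as a single run of text.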