unclecode
|
f52f526002
|
chore: Update web_crawler.py to use NoExtractionStrategy as default
|
2024-05-17 16:03:35 +08:00 |
|
unclecode
|
3f8576f870
|
chore: Update model_loader.py to use pretrained models without resume_download
|
2024-05-17 15:26:15 +08:00 |
|
unclecode
|
a317dc5e1d
|
Load CosineStrategy in the function
|
2024-05-17 15:13:06 +08:00 |
|
unclecode
|
a5f9d07dbf
|
Remove dependency on Spacy model.
|
2024-05-17 15:08:03 +08:00 |
|
UncleCode
|
5b4a586b2d
|
Update web_crawler.py
Set CosineExtraction as default strategy
|
2024-05-16 22:28:24 +08:00 |
|
UncleCode
|
a856319499
|
Update web_crawler.py
Set NoExtractionStrategy for FetchPages
|
2024-05-16 22:06:33 +08:00 |
|
UncleCode
|
5ce1dc1622
|
Update web_crawler.py
Set all extraction strategies default to NoExtractionStrategy
|
2024-05-16 21:58:11 +08:00 |
|
unclecode
|
ea16dec587
|
Improve library loading
|
2024-05-16 21:19:02 +08:00 |
|
unclecode
|
d19488a821
|
chore: Update model_loader.py to create necessary folders in the home directory
|
2024-05-16 21:05:24 +08:00 |
|
unclecode
|
5bb0b0b378
|
chore: Update pip installation command and requirements for Crawl4AI
|
2024-05-16 20:36:29 +08:00 |
|
unclecode
|
8e28eb9efb
|
Add model loader, update requirements.txt
|
2024-05-16 20:08:21 +08:00 |
|
unclecode
|
c8589f8da3
|
Update:
- Fix Spacy model issue
- Update Readme and requirements.txt
|
2024-05-16 19:50:20 +08:00 |
|
unclecode
|
5b80be956d
|
Update:
- Debug
- Refactor code for new version
|
2024-05-16 17:31:44 +08:00 |
|
unclecode
|
f6e59157bf
|
- Test all methods
- Update index.html
- Update Readme
- Resolve some bugs
|
2024-05-14 21:27:41 +08:00 |
|
unclecode
|
5fea6c064b
|
Improve libraries import
|
2024-05-13 02:46:35 +08:00 |
|
unclecode
|
7679064521
|
Add model parameter for clustering.
|
2024-05-13 00:06:16 +08:00 |
|
unclecode
|
cf087cfa58
|
Replace embedding model with smaller one
|
2024-05-12 23:55:57 +08:00 |
|
unclecode
|
5693e324a4
|
Add time measurements.
|
2024-05-12 23:35:27 +08:00 |
|
unclecode
|
82706129f5
|
Update:
- Text Categorization
- Crawler, Extraction, and Chunking strategies
- Clustering for semantic segmentation
|
2024-05-12 22:37:21 +08:00 |
|
unclecode
|
7039e3c1ee
|
- Issue Resolved: Every <pre> tag's HTML content is replaced with its inner text to address situations like syntax highlighters, where each character might be in a <span>. This avoids issues where the minimum word threshold might ignore them.
|
2024-05-12 14:08:22 +08:00 |
|
unclecode
|
372c921429
|
Update: Fix bug when user sets extract_blocks to False
|
2024-05-10 20:12:31 +08:00 |
|
unclecode
|
88643612e8
|
chore: Update environment variable usage in config files
|
2024-05-09 22:37:01 +08:00 |
|
unclecode
|
3ff1d15702
|
Change the project folder name from crawler to crawl4ai
|
2024-05-09 22:16:28 +08:00 |
|