ayrisdev
/
crawl4ai
c8a10dc4552ed6b71ca0ff2ac9ccb51a06afa3d1
crawl4ai
/
crawl4ai
unclecode
d6182bedd7
chore:
...
- Add demo page to the new mkdocs - Set website home page to mkdocs
2024-06-22 20:36:01 +08:00
..
models/onnx
- Add ONNX embedding model for CPU devices, Update the similarity threshold, improve the embedding speed.
2024-05-19 22:30:10 +08:00
__init__.py
Change the project folder name from crawler to crawl4ai
2024-05-09 22:16:28 +08:00
chunking_strategy.py
chore: Update extraction strategy to support GPU, MPS, and CPU, add batch processing for CPU devices
2024-05-19 16:18:58 +00:00
config.py
chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README
2024-06-19 18:32:20 +08:00
crawler_strategy.py
chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README
2024-06-19 18:32:20 +08:00
database.py
- User agent
2024-06-08 17:59:42 +08:00
extraction_strategy.py
Update LLMExtractionStrategy to disable chunking if specified, Add example of summarization for a web page.
2024-06-19 19:03:35 +08:00
model_loader.py
Update for v0.2.2
2024-06-02 15:40:18 +08:00
models.py
Extract internal and external links.
2024-06-08 16:53:06 +08:00
onnx_embedding.py
- Add ONNX embedding model for CPU devices, Update the similarity threshold, improve the embedding speed.
2024-05-19 22:30:10 +08:00
prompts.py
chore: Update configuration values for chunk token threshold, overlap rate, and minimum word threshold. Create a new example for LLMExtraction Strategy, update Dockerfile, and README
2024-06-19 18:32:20 +08:00
train.py
Remove dependency on Spacy model.
2024-05-17 15:08:03 +08:00
utils.py
chore:
2024-06-22 20:36:01 +08:00
web_crawler.back.py
vital: Right now, only raw HTML is retrieved from the database; therefore, the CSS selector and other filters will be executed every time.
2024-06-08 18:37:40 +08:00
web_crawler.py
chore:
2024-06-22 20:36:01 +08:00