Compare commits
4 Commits
new-releas ... v0.2.0
| Author | SHA1 | Date | |
|---|---|---|---|
| | f5f3cce2c8 | | |
| | a085e6315b | | |
| | a8d600a3b4 | | |
| | 4a2e17447b | | |
18
README.md
@@ -8,6 +8,23 @@
 
 Crawl4AI has one clear task: to simplify crawling and extract useful information from web pages, making it accessible for large language models (LLMs) and AI applications. 🆓🌐
 
+<<<<<<< HEAD
+## 🚀 New Changes Will be Released Soon
+
+- 🚀 10x faster!!
+- 📜 Execute custom JavaScript before crawling!
+- 🤝 Colab friendly!
+- 📚 Chunking strategies: topic-based, regex, sentence, and more!
+- 🧠 Extraction strategies: cosine clustering, LLM, and more!
+- 🎯 CSS selector support
+- 📝 Pass instructions/keywords to refine extraction
+
+## 🚧 Work in Progress 👷‍♂️
+
+- 📷 Image Captioning: Incorporating image captioning capabilities to extract descriptions from images.
+- 💾 Embedding Vector Data: Generate and store embedding data for each crawled website.
+- 🔍 Semantic Search Engine: Building a semantic search engine that fetches content, performs vector search similarity, and generates labeled chunk data based on user queries and URLs.
+=======
 [](https://colab.research.google.com/drive/1wz8u30rvbq6Scodye9AGCw8Qg_Z8QGsk)
 
 ## Recent Changes
@@ -103,6 +120,7 @@ With Crawl4AI, you can perform advanced web crawling and data extraction tasks w
 8. [Contributing](#contributing-)
 9. [License](#license-)
 10. [Contact](#contact-)
+>>>>>>> new-release-0.0.2-no-spacy
 
 
 ## Features ✨