Compare commits

4 Commits

Author | SHA1 | Message | Date
unclecode | f5f3cce2c8 | Merge new-release-0.0.2-no-spacy into main for v0.2.0 release | 2024-05-17 18:23:27 +08:00
unclecode | a085e6315b | Merge branch 'main' of https://github.com/unclecode/crawl4ai | 2024-05-17 18:21:02 +08:00
unclecode | a8d600a3b4 | chore: Add test_pad.py, requirements0.txt, and a.txt to .gitignore | 2024-05-17 18:13:43 +08:00
UncleCode | 4a2e17447b | Update README.md | 2024-05-16 08:57:58 +08:00

@@ -8,6 +8,23 @@
Crawl4AI has one clear task: to simplify crawling and extract useful information from web pages, making it accessible for large language models (LLMs) and AI applications. 🆓🌐
<<<<<<< HEAD
## 🚀 New Changes Will be Released Soon
- 🚀 10x faster!!
- 📜 Execute custom JavaScript before crawling!
- 🤝 Colab friendly!
- 📚 Chunking strategies: topic-based, regex, sentence, and more!
- 🧠 Extraction strategies: cosine clustering, LLM, and more!
- 🎯 CSS selector support
- 📝 Pass instructions/keywords to refine extraction
## 🚧 Work in Progress 👷‍♂️
- 📷 Image Captioning: Incorporating image captioning capabilities to extract descriptions from images.
- 💾 Embedding Vector Data: Generate and store embedding data for each crawled website.
- 🔍 Semantic Search Engine: Building a semantic search engine that fetches content, performs vector search similarity, and generates labeled chunk data based on user queries and URLs.
=======
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wz8u30rvbq6Scodye9AGCw8Qg_Z8QGsk)
## Recent Changes
@@ -103,6 +120,7 @@ With Crawl4AI, you can perform advanced web crawling and data extraction tasks w
8. [Contributing](#contributing-)
9. [License](#license-)
10. [Contact](#contact-)
>>>>>>> new-release-0.0.2-no-spacy
## Features ✨