Merge new-release-0.0.2-no-spacy into main for v0.2.0 release

Merge branch 'main' of https://github.com/unclecode/crawl4ai
chore: Add test_pad.py, requirements0.txt, and a.txt to .gitignore
1 changed file with 18 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -8,6 +8,23 @@

 Crawl4AI has one clear task: to simplify crawling and extract useful information from web pages, making it accessible for large language models (LLMs) and AI applications. 🆓🌐

+<<<<<<< HEAD
+## 🚀 New Changes Will be Released Soon
+
+- 🚀 10x faster!!
+- 📜 Execute custom JavaScript before crawling!
+- 🤝 Colab friendly!
+- 📚 Chunking strategies: topic-based, regex, sentence, and more!
+- 🧠 Extraction strategies: cosine clustering, LLM, and more!
+- 🎯 CSS selector support
+- 📝 Pass instructions/keywords to refine extraction
+
+## 🚧 Work in Progress 👷‍♂️
+
+- 📷 Image Captioning: Incorporating image captioning capabilities to extract descriptions from images.
+- 💾 Embedding Vector Data: Generate and store embedding data for each crawled website.
+- 🔍 Semantic Search Engine: Building a semantic search engine that fetches content, performs vector similarity search, and generates labeled chunk data based on user queries and URLs.
+=======
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wz8u30rvbq6Scodye9AGCw8Qg_Z8QGsk)

 ## Recent Changes
@@ -103,6 +120,7 @@ With Crawl4AI, you can perform advanced web crawling and data extraction tasks w
 8. [Contributing](#contributing-)
 9. [License](#license-)
 10. [Contact](#contact-)
+>>>>>>> new-release-0.0.2-no-spacy


 ## Features ✨
Author	SHA1	Message	Date
unclecode	f5f3cce2c8	Merge new-release-0.0.2-no-spacy into main for v0.2.0 release	2024-05-17 18:23:27 +08:00
unclecode	a085e6315b	Merge branch 'main' of https://github.com/unclecode/crawl4ai	2024-05-17 18:21:02 +08:00
unclecode	a8d600a3b4	chore: Add test_pad.py, requirements0.txt, and a.txt to .gitignore	2024-05-17 18:13:43 +08:00
UncleCode	4a2e17447b	Update README.md	2024-05-16 08:57:58 +08:00