From 4a2e17447bfbc49cb4745011869c7edcbcba20f0 Mon Sep 17 00:00:00 2001
From: UncleCode
Date: Thu, 16 May 2024 08:57:58 +0800
Subject: [PATCH] Update README.md

---
 README.md | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 8a49e1e4..d9297e5e 100644
--- a/README.md
+++ b/README.md
@@ -8,11 +8,18 @@
 
 Crawl4AI is a powerful, free web crawling service designed to extract useful information from web pages and make it accessible for large language models (LLMs) and AI applications. 🆓🌐
 
+## 🚀 New Changes Will be Released Soon
+
+- 🚀 10x faster!!
+- 📜 Execute custom JavaScript before crawling!
+- 🤝 Colab friendly!
+- 📚 Chunking strategies: topic-based, regex, sentence, and more!
+- 🧠 Extraction strategies: cosine clustering, LLM, and more!
+- 🎯 CSS selector support
+- 📝 Pass instructions/keywords to refine extraction
+
 ## 🚧 Work in Progress 👷‍♂️
 
-- 🔧 Separate Crawl and Extract Semantic Chunk: Enhancing efficiency in large-scale tasks.
-- 🔍 Colab Integration: Exploring integration with Google Colab for easy experimentation.
-- 🎯 XPath and CSS Selector Support: Adding support for selective retrieval of specific elements.
 - 📷 Image Captioning: Incorporating image captioning capabilities to extract descriptions from images.
 - 💾 Embedding Vector Data: Generate and store embedding data for each crawled website.
 - 🔍 Semantic Search Engine: Building a semantic search engine that fetches content, performs vector search similarity, and generates labeled chunk data based on user queries and URLs.