From aac4e073895e5a2a03af91427356bb09388df71b Mon Sep 17 00:00:00 2001
From: unclecode
Date: Sun, 12 May 2024 12:39:31 +0800
Subject: [PATCH] `chore: Update README.md and project structure`

---
 CHANGELOG.md | 31 +++++++++++++++++++++++++++++++
 README.md    | 11 +++++++++++
 2 files changed, 42 insertions(+)
 create mode 100644 CHANGELOG.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 00000000..9da5fd0e
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,31 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+## [Unreleased]
+
+### Added
+- 🔧 Separate Crawl and Extract JSON Semantic Chunk: Enhancing flexibility and efficiency in large-scale web crawling tasks.
+- 🔍 Colab Integration: Exploring integration with Google Colab for easy experimentation in a collaborative notebook environment.
+- 🎯 XPath and CSS Selector Support: Adding support for selective retrieval of specific elements from web pages.
+- 📷 Image Captioning: Incorporating image captioning capabilities to extract meaningful descriptions from images.
+- 💾 Embedding Data Generation and Storage: Developing functionalities to generate and store embedding data for each crawled website.
+- 🔍 Semantic Search Engine: Building a semantic search engine that fetches content, performs vector search similarity, and generates labeled chunk data based on user queries and URLs.
+
+### Changed
+- None
+
+### Deprecated
+- None
+
+### Removed
+- None
+
+### Fixed
+- None
+
+### Security
+- None
+
+## [1.0.0] - YYYY-MM-DD
+- Initial release
\ No newline at end of file
diff --git a/README.md b/README.md
index 7fc7a668..cf52e632 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,17 @@
 
 Crawl4AI is a powerful, free web crawling service designed to extract useful information from web pages and make it accessible for large language models (LLMs) and AI applications. 🆓🌐
 
+## 🚧 Work in Progress 👷‍♂️
+
+- 🔧 Separate Crawl and Extract JSON Semantic Chunk: Enhancing flexibility and efficiency in large-scale web crawling tasks.
+- 🔍 Colab Integration: Exploring integration with Google Colab for easy experimentation in a collaborative notebook environment.
+- 🎯 XPath and CSS Selector Support: Adding support for selective retrieval of specific elements from web pages.
+- 📷 Image Captioning: Incorporating image captioning capabilities to extract meaningful descriptions from images.
+- 💾 Embedding Data Generation and Storage: Developing functionalities to generate and store embedding data for each crawled website.
+- 🔍 Semantic Search Engine: Building a semantic search engine that fetches content, performs vector search similarity, and generates labeled chunk data based on user queries and URLs.
+
+For more details, refer to the [CHANGELOG.md](https://github.com/unclecode/crawl4ai/edit/main/CHANGELOG.md) file.
+
 ## Features ✨
 
 - 🕷️ Efficient web crawling to extract valuable data from websites