Crawl4AI
Welcome to the official documentation for Crawl4AI! 🕷️🤖 Crawl4AI is an open-source Python library designed to simplify web crawling and the extraction of useful information from web pages. This documentation will guide you through the features, usage, and customization of Crawl4AI.
Introduction
Crawl4AI has one clear task: to make crawling and data extraction from web pages easy and efficient, especially for large language models (LLMs) and AI applications. Whether you are using it as a REST API or a Python library, Crawl4AI offers a robust and flexible solution with full asynchronous support.
Quick Start
Here's a quick example to show you how easy it is to use Crawl4AI with its new asynchronous capabilities:
```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # Create an instance of AsyncWebCrawler
    async with AsyncWebCrawler(verbose=True) as crawler:
        # Run the crawler on a URL
        result = await crawler.arun(url="https://www.nbcnews.com/business")
        # Print the extracted content
        print(result.markdown)

# Run the async main function
asyncio.run(main())
```
Explanation
- Importing the Library: We start by importing the `AsyncWebCrawler` class from the `crawl4ai` library and the `asyncio` module.
- Creating an Async Context: We use an async context manager to create an instance of `AsyncWebCrawler`.
- Running the Crawler: The `arun()` method asynchronously crawls the specified URL and extracts meaningful content.
- Printing the Result: The extracted content is printed, showcasing the data extracted from the web page.
- Running the Async Function: We use `asyncio.run()` to execute our async `main` function.
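Because the crawler is fully asynchronous, the same pattern extends naturally to fetching several pages concurrently with `asyncio.gather`. The sketch below uses only the standard library, with a stand-in `fetch` coroutine where you would call `crawler.arun(url=...)` inside the same `async with` block shown above:

```python
import asyncio

# Stand-in for crawler.arun(url=...); in real use, call AsyncWebCrawler.arun
# inside an "async with AsyncWebCrawler() as crawler:" block instead.
async def fetch(url: str) -> str:
    await asyncio.sleep(0)  # simulate asynchronous I/O
    return f"markdown for {url}"

async def crawl_all(urls):
    # asyncio.gather runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(crawl_all([
    "https://example.com/a",
    "https://example.com/b",
]))
print(results[0])  # → markdown for https://example.com/a
```

Concurrency is the main payoff of the async API: a batch of pages completes in roughly the time of the slowest fetch rather than the sum of all of them.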
Documentation Structure
This documentation is organized into several sections to help you navigate and find the information you need quickly:
Home
An introduction to Crawl4AI, including a quick start guide and an overview of the documentation structure.
Installation
Instructions on how to install Crawl4AI and its dependencies.
Introduction
A detailed introduction to Crawl4AI, its features, and how it can be used for various web crawling and data extraction tasks.
Quick Start
A step-by-step guide to get you up and running with Crawl4AI, including installation instructions and basic usage examples.
Examples
This section contains practical examples demonstrating different use cases of Crawl4AI:
- Structured Data Extraction
- LLM Extraction
- JS Execution & CSS Filtering
- Hooks & Auth
- Summarization
- Research Assistant
Full Details of Using Crawler
Comprehensive details on using the crawler, including:
- Crawl Request Parameters
- Crawl Result Class
- Session Based Crawling
- Advanced Structured Data Extraction (JsonCssExtraction)
- Advanced Features
- Chunking Strategies
- Extraction Strategies
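As a taste of the structured-extraction topics listed above, CSS-based extraction is driven by a plain dictionary schema that maps CSS selectors to output fields. The sketch below shows the general shape such a schema takes (the keys follow the format documented for `JsonCssExtractionStrategy`; treat the selectors and field names as illustrative and consult the Advanced Structured Data Extraction section for authoritative details):

```python
# Illustrative schema shape for CSS-based structured extraction.
# Selectors and field names here are made up for the example.
schema = {
    "name": "News Articles",
    "baseSelector": "article.story",  # one match per extracted item
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
        {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}

# Each entry in "fields" becomes one key in every extracted item
field_names = [f["name"] for f in schema["fields"]]
print(field_names)  # → ['title', 'link']
```

Defining extraction as data rather than code is what lets the same crawler run against different sites by swapping schemas, without touching the crawling logic.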
Change Log
A log of all changes, updates, and improvements made to Crawl4AI.
Contact
Information on how to get in touch with the developers, report issues, and contribute to the project.
Get Started
To get started with Crawl4AI, follow the quick start guide above or explore the detailed sections of this documentation. Whether you are a beginner or an advanced user, Crawl4AI has something to offer to make your web crawling and data extraction tasks easier, more efficient, and now fully asynchronous.
Happy Crawling! 🕸️🚀