Push async version last changes for merge to main branch

This commit is contained in:
unclecode
2024-09-24 20:52:08 +08:00
parent d628bc4034
commit 4d48bd31ca
61 changed files with 6219 additions and 891 deletions

View File

@@ -1,193 +1,92 @@
# Installation 💻
There are three ways to use Crawl4AI:
Crawl4AI offers flexible installation options to suit various use cases. You can install it as a Python package, use it with Docker, or run it as a local server.
1. As a library (Recommended).
2. As a local server (Docker) or using the REST API.
3. As a local server (Docker) using the pre-built image from Docker Hub.
## Option 1: Python Package Installation (Recommended)
## Option 1: Library Installation
Crawl4AI is now available on PyPI, making installation easier than ever. Choose the option that best fits your needs:
You can try this Colab for a quick start: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sJPAmeLj5PMrg2VgOwMJ2ubGIcK0cJeX#scrollTo=g1RrmI4W_rPk)
### Basic Installation
Crawl4AI offers flexible installation options to suit various use cases. Choose the option that best fits your needs:
For basic web crawling and scraping tasks:
- **Default Installation** (Basic functionality):
```bash
virtualenv venv
source venv/bin/activate
pip install "crawl4ai @ git+https://github.com/unclecode/crawl4ai.git"
pip install crawl4ai
playwright install # Install Playwright dependencies
```
Use this for basic web crawling and scraping tasks.
- **Installation with PyTorch** (For advanced text clustering):
### Installation with PyTorch
For advanced text clustering (includes CosineSimilarity cluster strategy):
```bash
virtualenv venv
source venv/bin/activate
pip install "crawl4ai[torch] @ git+https://github.com/unclecode/crawl4ai.git"
pip install crawl4ai[torch]
```
Choose this if you need the CosineSimilarity cluster strategy.
- **Installation with Transformers** (For summarization and Hugging Face models):
### Installation with Transformers
For text summarization and Hugging Face models:
```bash
virtualenv venv
source venv/bin/activate
pip install "crawl4ai[transformer] @ git+https://github.com/unclecode/crawl4ai.git"
pip install crawl4ai[transformer]
```
Opt for this if you require text summarization or plan to use Hugging Face models.
- **Full Installation** (All features):
### Full Installation
For all features:
```bash
virtualenv venv
source venv/bin/activate
pip install "crawl4ai[all] @ git+https://github.com/unclecode/crawl4ai.git"
pip install crawl4ai[all]
```
This installs all dependencies for full functionality.
- **Development Installation** (For contributors):
### Development Installation
For contributors who plan to modify the source code:
```bash
virtualenv venv
source venv/bin/activate
git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai
pip install -e ".[all]"
playwright install # Install Playwright dependencies
```
Use this if you plan to modify the source code.
💡 After installation, if you have used "torch", "transformer" or "all", it's recommended to run the following CLI command to load the required models. This is optional but will boost the performance and speed of the crawler. You need to do this only once, this is only for when you install using []
💡 After installation with "torch", "transformer", or "all" options, it's recommended to run the following CLI command to load the required models:
```bash
crawl4ai-download-models
```
## Option 2: Using Docker for Local Server
This is optional but will boost the performance and speed of the crawler. You only need to do this once after installation.
Crawl4AI can be run as a local server using Docker. The Dockerfile supports different installation options to cater to various use cases. Here's how you can build and run the Docker image:
## Option 2: Using Docker (Coming Soon)
### Default Installation
Docker support for Crawl4AI is currently in progress and will be available soon. This will allow you to run Crawl4AI in a containerized environment, ensuring consistency across different systems.
The default installation includes the basic Crawl4AI package without additional dependencies or pre-downloaded models.
## Option 3: Local Server Installation
```bash
# For Mac users (M1/M2)
docker build --platform linux/amd64 -t crawl4ai .
For those who prefer to run Crawl4AI as a local server, instructions will be provided once the Docker implementation is complete.
# For other users
docker build -t crawl4ai .
## Verifying Your Installation
# Run the container
docker run -d -p 8000:80 crawl4ai
After installation, you can verify that Crawl4AI is working correctly by running a simple Python script:
```python
import asyncio
from crawl4ai import AsyncWebCrawler
async def main():
async with AsyncWebCrawler(verbose=True) as crawler:
result = await crawler.arun(url="https://www.example.com")
print(result.markdown[:500]) # Print first 500 characters
if __name__ == "__main__":
asyncio.run(main())
```
### Full Installation (All Dependencies and Models)
This script should successfully crawl the example website and print the first 500 characters of the extracted content.
This option installs all dependencies and downloads the models.
## Getting Help
```bash
# For Mac users (M1/M2)
docker build --platform linux/amd64 --build-arg INSTALL_OPTION=all -t crawl4ai:all .
# For other users
docker build --build-arg INSTALL_OPTION=all -t crawl4ai:all .
# Run the container
docker run -d -p 8000:80 crawl4ai:all
```
### Torch Installation
This option installs torch-related dependencies and downloads the models.
```bash
# For Mac users (M1/M2)
docker build --platform linux/amd64 --build-arg INSTALL_OPTION=torch -t crawl4ai:torch .
# For other users
docker build --build-arg INSTALL_OPTION=torch -t crawl4ai:torch .
# Run the container
docker run -d -p 8000:80 crawl4ai:torch
```
### Transformer Installation
This option installs transformer-related dependencies and downloads the models.
```bash
# For Mac users (M1/M2)
docker build --platform linux/amd64 --build-arg INSTALL_OPTION=transformer -t crawl4ai:transformer .
# For other users
docker build --build-arg INSTALL_OPTION=transformer -t crawl4ai:transformer .
# Run the container
docker run -d -p 8000:80 crawl4ai:transformer
```
### Notes
- The `--platform linux/amd64` flag is necessary for Mac users with M1/M2 chips to ensure compatibility.
- The `-t` flag tags the image with a name (and optionally a tag in the 'name:tag' format).
- The `-d` flag runs the container in detached mode.
- The `-p 8000:80` flag maps port 8000 on the host to port 80 in the container.
Choose the installation option that best suits your needs. The default installation is suitable for basic usage, while the other options provide additional capabilities for more advanced use cases.
## Option 3: Using the Pre-built Image from Docker Hub
You can use pre-built Crawl4AI images from Docker Hub, which are available for all platforms (Mac, Linux, Windows). We have official images as well as a community-contributed image (Thanks to https://github.com/FractalMind):
### Default Installation
```bash
# Pull the image
docker pull unclecode/crawl4ai:latest
# Run the container
docker run -d -p 8000:80 unclecode/crawl4ai:latest
```
### Community-Contributed Image
A stable version of Crawl4AI is also available, created and maintained by a community member:
```bash
# Pull the community-contributed image
docker pull ryser007/crawl4ai:stable
# Run the container
docker run -d -p 8000:80 ryser007/crawl4ai:stable
```
We'd like to express our gratitude to GitHub user [@FractalMind](https://github.com/FractalMind) for creating and maintaining this stable version of the Crawl4AI Docker image. Community contributions like this are invaluable to the project.
### Testing the Installation
After running the container, you can test if it's working correctly:
- On Mac and Linux:
```bash
curl http://localhost:8000
```
- On Windows (PowerShell):
```powershell
Invoke-WebRequest -Uri http://localhost:8000
```
Or open a web browser and navigate to http://localhost:8000
If you encounter any issues during installation or usage, please check the [documentation](https://crawl4ai.com/mkdocs/) or raise an issue on the [GitHub repository](https://github.com/unclecode/crawl4ai/issues).
Happy crawling! 🕷️🤖