Update README file

This commit is contained in:
unclecode
2024-05-09 22:48:42 +08:00
parent a8e7218769
commit f74f4e88c0

README.md

@@ -46,9 +46,53 @@ pip install -e .
```python
from crawl4ai.web_crawler import WebCrawler
from crawl4ai.models import UrlModel
import os
crawler = WebCrawler(db_path='crawler_data.db')
```
3. Use the Crawl4AI library in your project as needed. Refer to the [Usage with Python](#usage-with-python-) section for more details.
a. Fetch a single page:
```python
single_url = UrlModel(url='https://kidocode.com', forced=False)
result = crawler.fetch_page(
    single_url,
    provider="openai/gpt-3.5-turbo",
    api_token=os.getenv('OPENAI_API_KEY'),
    extract_blocks_flag=True,
    word_count_threshold=5  # Minimum word count for an HTML tag to be considered a worthy block
)
print(result.model_dump())
```
b. Fetch multiple pages:
```python
urls = [
    UrlModel(url='http://example.com', forced=False),
    UrlModel(url='http://example.org', forced=False)
]
results = crawler.fetch_pages(
    urls,
    provider="openai/gpt-3.5-turbo",
    api_token=os.getenv('OPENAI_API_KEY'),
    extract_blocks_flag=True,
    word_count_threshold=5
)
for res in results:
    print(res.model_dump())
```
The response model is a `CrawlResult` object that contains the following attributes:
```python
class CrawlResult(BaseModel):
url: str
html: str
success: bool
cleaned_html: str = None
markdown: str = None
parsed_json: str = None
error_message: str = None
```
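To illustrate how a result might be consumed, the sketch below mirrors the `CrawlResult` fields with a plain dataclass (a hypothetical stand-in, used only so the snippet runs without crawl4ai installed; real results come from `fetch_page`):

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in mirroring the CrawlResult fields, so this sketch runs
# standalone; in real use these objects come from crawler.fetch_page.
@dataclass
class FakeCrawlResult:
    url: str
    html: str
    success: bool
    cleaned_html: Optional[str] = None
    markdown: Optional[str] = None
    parsed_json: Optional[str] = None
    error_message: Optional[str] = None

def summarize(result: FakeCrawlResult) -> str:
    # Branch on success: report the extracted markdown size,
    # or fall back to the error message.
    if result.success:
        return f"{result.url}: {len(result.markdown or '')} chars of markdown"
    return f"{result.url}: failed ({result.error_message})"

print(summarize(FakeCrawlResult(url="https://example.com", html="<html></html>",
                                success=True, markdown="# Example")))
```

Checking `success` before touching `markdown` or `cleaned_html` matters because those fields default to `None` on failed crawls.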
### Running Crawl4AI as a Local Server 🚀
@@ -82,20 +126,13 @@ docker run -d -p 8000:80 crawl4ai
6. Access the application at `http://localhost:8000`.
For more detailed instructions and advanced configuration options, please refer to the [installation guide](https://github.com/unclecode/crawl4ai/blob/main/INSTALL.md).
Choose the approach that best suits your needs. If you want to integrate Crawl4AI into your existing Python projects, installing it as a library is the way to go. If you prefer to run Crawl4AI as a standalone service and interact with it via API endpoints, running it as a local server using Docker is the recommended approach.
## Usage with Python 🐍
Here's an example of how to use Crawl4AI with Python to crawl a webpage and retrieve the extracted data:
1. Make sure you have the `requests` library installed. You can install it using pip:
```sh
pip install requests
```
- cURL Example:
```sh
curl -X POST -H "Content-Type: application/json" -d '{"urls":["https://techcrunch.com/"],"provider_model":"openai/gpt-3.5-turbo","api_token":"your_api_token","include_raw_html":true,"forced":false,"extract_blocks":true,"word_count_threshold":10}' http://localhost:8000/crawl
```
**Set the `api_token` to your OpenAI API key, or to the key of whichever provider you are using.**
2. Use the following Python code to send a request to the Crawl4AI server and retrieve the crawled data:
- Python Example:
```python
import requests
import os
@@ -133,58 +170,9 @@ The response from the server includes the parsed JSON, cleaned HTML, and markdow
Make sure to replace `"http://localhost:8000/crawl"` with the appropriate server URL if your Crawl4AI server is running on a different host or port.
## Using Crawl4AI as a Python Library 📚
You can also use Crawl4AI as a Python library in your own projects. Here's an example of how to use the Crawl4AI library:
1. Install the required dependencies:
```sh
pip install -r requirements.txt
```
2. Import the necessary modules and initialize the `WebCrawler`:
```python
from crawl4ai.web_crawler import WebCrawler
from crawl4ai.models import UrlModel
import os
crawler = WebCrawler(db_path='crawler_data.db')
```
3. Fetch a single page:
```python
single_url = UrlModel(url='https://kidocode.com', forced=False)
result = crawler.fetch_page(
    single_url,
    provider="openai/gpt-3.5-turbo",
    api_token=os.getenv('OPENAI_API_KEY'),
    extract_blocks_flag=True,
    word_count_threshold=5  # Minimum word count for an HTML tag to be considered a worthy block
)
print(result.model_dump())
```
4. Fetch multiple pages:
```python
urls = [
    UrlModel(url='http://example.com', forced=False),
    UrlModel(url='http://example.org', forced=False)
]
results = crawler.fetch_pages(
    urls,
    provider="openai/gpt-3.5-turbo",
    api_token=os.getenv('OPENAI_API_KEY'),
    extract_blocks_flag=True,
    word_count_threshold=5
)
for res in results:
    print(res.model_dump())
```
This code demonstrates how to use the Crawl4AI library to fetch a single page or multiple pages. The `WebCrawler` is initialized with the path to the database, and the `fetch_page` and `fetch_pages` methods are used to crawl the specified URLs.
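Results from a batch crawl typically need to be split into successes and failures before further processing. The sketch below uses plain dicts as hypothetical stand-ins for the returned objects (real entries would come from the crawler, e.g. via `model_dump()`):

```python
# Hypothetical stand-ins for crawl results; in real use each dict
# would come from a result's model_dump() after fetch_pages.
results = [
    {"url": "http://example.com", "success": True, "markdown": "# Example Domain"},
    {"url": "http://example.org", "success": False, "error_message": "timeout"},
]

# Partition the batch so failures can be logged or retried separately.
succeeded = [r for r in results if r["success"]]
failed = [r for r in results if not r["success"]]

for r in succeeded:
    print(f"OK   {r['url']} ({len(r['markdown'])} chars of markdown)")
for r in failed:
    print(f"FAIL {r['url']}: {r['error_message']}")
```

A failed entry keeps its `url` and `error_message`, so the failed list can drive a retry pass (for example, re-submitting those URLs with `forced=True`).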
**Make sure to check `config.py` to set the required environment variables.**
That's it! You can now integrate Crawl4AI into your Python projects and leverage its web crawling capabilities. 🎉