docs(api): improve formatting and readability of API documentation
Enhanced markdown formatting, fixed list indentation, and improved readability across multiple API documentation files:

- arun.md
- arun_many.md
- async-webcrawler.md
- parameters.md

Changes include:

- Consistent list formatting and indentation
- Better spacing between sections
- Clearer separation of content blocks
- Fixed quotation marks and code block formatting

@@ -1,16 +1,20 @@

# AsyncWebCrawler

The **`AsyncWebCrawler`** is the core class for asynchronous web crawling in Crawl4AI. You typically create it **once**, optionally customize it with a **`BrowserConfig`** (e.g., headless, user agent), then **run** multiple **`arun()`** calls with different **`CrawlerRunConfig`** objects.

**Recommended usage**:

1. **Create** a `BrowserConfig` for global browser settings.
2. **Instantiate** `AsyncWebCrawler(config=browser_config)`.
3. **Use** the crawler in an async context manager (`async with`) or manage start/close manually.
4. **Call** `arun(url, config=crawler_run_config)` for each page you want.
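
A minimal sketch of those four steps (the URL is a placeholder, and the config values shown are illustrative defaults, not requirements):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    browser_cfg = BrowserConfig(headless=True)      # 1. global browser settings
    crawler = AsyncWebCrawler(config=browser_cfg)   # 2. one crawler instance
    async with crawler:                             # 3. context manager handles start/close
        run_cfg = CrawlerRunConfig()                # per-crawl settings (defaults here)
        result = await crawler.arun(url="https://example.com", config=run_cfg)  # 4. crawl a page
        print(result.success)

asyncio.run(main())
```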
---

## 1. Constructor Overview

```python
class AsyncWebCrawler:
@@ -37,7 +41,7 @@ class AsyncWebCrawler:
        base_directory:
            Folder for storing caches/logs (if relevant).
        thread_safe:
            If True, attempts some concurrency safeguards. Usually False.
        **kwargs:
            Additional legacy or debugging parameters.
        """
@@ -58,11 +62,12 @@ crawler = AsyncWebCrawler(config=browser_cfg)
```

**Notes**:

- **Legacy** parameters like `always_bypass_cache` remain for backward compatibility, but prefer to set **caching** in `CrawlerRunConfig`.
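
For example, a short sketch of the config-based approach, assuming the `CacheMode` enum that newer Crawl4AI versions export for this purpose:

```python
from crawl4ai import CrawlerRunConfig, CacheMode

# Instead of legacy constructor flags like always_bypass_cache=True,
# declare cache behavior per crawl:
run_cfg = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
```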
---

## 2. Lifecycle: Start/Close or Context Manager

### 2.1 Context Manager (Recommended)

@@ -90,7 +95,7 @@ Use this style if you have a **long-running** application or need full control o
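
The hunk elides the code here; a sketch of the manual lifecycle, assuming only the `start()`/`close()` methods this doc mentions in the Summary:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig

async def main():
    crawler = AsyncWebCrawler(config=BrowserConfig(headless=True))
    await crawler.start()        # launch the browser once, up front
    try:
        result_a = await crawler.arun(url="https://example.com/a")
        result_b = await crawler.arun(url="https://example.com/b")
        print(result_a.success, result_b.success)
    finally:
        await crawler.close()    # always release browser resources

asyncio.run(main())
```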

---

## 3. Primary Method: `arun()`

```python
async def arun(
```
@@ -130,7 +135,7 @@ For **backward** compatibility, `arun()` can still accept direct arguments like
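
To make the contrast concrete, a hedged sketch; `css_selector` is just one example of a direct argument, and the config style is the preferred form:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    async with AsyncWebCrawler() as crawler:
        # Legacy: direct keyword argument on arun() (still accepted).
        legacy = await crawler.arun(url="https://example.com", css_selector=".main-content")

        # Preferred: bundle per-crawl options into CrawlerRunConfig.
        run_cfg = CrawlerRunConfig(css_selector=".main-content")
        preferred = await crawler.arun(url="https://example.com", config=run_cfg)

asyncio.run(main())
```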

---

## 4. Batch Processing: `arun_many()`

```python
async def arun_many(
```
@@ -147,6 +152,7 @@ async def arun_many(

### 4.1 Resource-Aware Crawling

The `arun_many()` method now uses an intelligent dispatcher that:

- Monitors system memory usage
- Implements adaptive rate limiting
- Provides detailed progress monitoring
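
A minimal sketch of a batch call relying on that dispatcher's defaults (URLs are placeholders; the dispatcher works behind the scenes with no extra wiring):

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    async with AsyncWebCrawler() as crawler:
        # The intelligent dispatcher throttles and monitors these crawls.
        results = await crawler.arun_many(urls=urls, config=CrawlerRunConfig())
        for r in results:
            print(r.url, r.success)

asyncio.run(main())
```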
@@ -192,30 +198,34 @@ async with AsyncWebCrawler(config=browser_cfg) as crawler:

### 4.3 Key Features

1. **Rate Limiting**

   - Automatic delay between requests
   - Exponential backoff on rate limit detection
   - Domain-specific rate limiting
   - Configurable retry strategy

2. **Resource Monitoring**

   - Memory usage tracking
   - Adaptive concurrency based on system load
   - Automatic pausing when resources are constrained

3. **Progress Monitoring**

   - Detailed or aggregated progress display
   - Real-time status updates
   - Memory usage statistics

4. **Error Handling**

   - Graceful handling of rate limits
   - Automatic retries with backoff
   - Detailed error reporting
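
A hedged sketch of wiring these features up explicitly. The class names, import path, and parameters below are assumptions drawn from the feature list above — verify them against the arun_many.md reference:

```python
# Assumed API — check arun_many.md for the exact names and defaults.
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher, RateLimiter

dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=70.0,        # pause new tasks above this memory usage
    rate_limiter=RateLimiter(
        base_delay=(1.0, 3.0),            # random per-request delay range (seconds)
        max_delay=60.0,                   # ceiling for exponential backoff
        max_retries=3,                    # retries after rate-limit responses
        rate_limit_codes=[429, 503],      # status codes that trigger backoff
    ),
)
# Then pass it per batch:
# results = await crawler.arun_many(urls=urls, config=run_cfg, dispatcher=dispatcher)
```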
---

## 5. `CrawlResult` Output

Each `arun()` returns a **`CrawlResult`** containing:

@@ -232,7 +242,7 @@ For details, see [CrawlResult doc](./crawl-result.md).
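
A short sketch of reading the common fields (inside an `async with AsyncWebCrawler() as crawler:` block; `markdown` may be a plain string or a richer object depending on version):

```python
result = await crawler.arun(url="https://example.com")

if result.success:
    print(result.status_code)        # HTTP status of the fetch
    print(result.markdown)           # generated markdown
    print(result.extracted_content)  # JSON string when an extraction strategy ran
else:
    print(result.error_message)
```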

---

## 6. Quick Example

Below is an example hooking it all together:

@@ -243,14 +253,14 @@ from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
```python
import json

async def main():
    # 1. Browser config
    browser_cfg = BrowserConfig(
        browser_type="firefox",
        headless=False,
        verbose=True
    )

    # 2. Run config
    schema = {
        "name": "Articles",
        "baseSelector": "article.post",
@@ -295,17 +305,18 @@ asyncio.run(main())
```

**Explanation**:

- We define a **`BrowserConfig`** with Firefox, headless disabled, and `verbose=True`.
- We define a **`CrawlerRunConfig`** that **bypasses cache**, uses a **CSS** extraction schema, has a `word_count_threshold=15`, etc.
- We pass them to `AsyncWebCrawler(config=...)` and `arun(url=..., config=...)`.

---

## 7. Best Practices & Migration Notes

1. **Use** `BrowserConfig` for **global** settings about the browser’s environment.
2. **Use** `CrawlerRunConfig` for **per-crawl** logic (caching, content filtering, extraction strategies, wait conditions).
3. **Avoid** legacy parameters like `css_selector` or `word_count_threshold` directly in `arun()`. Instead:

```python
run_cfg = CrawlerRunConfig(css_selector=".main-content", word_count_threshold=20)
```
@@ -316,16 +327,17 @@ asyncio.run(main())

---

## 8. Summary

**AsyncWebCrawler** is your entry point to asynchronous crawling:

- **Constructor** accepts **`BrowserConfig`** (or defaults).
- **`arun(url, config=CrawlerRunConfig)`** is the main method for single-page crawls.
- **`arun_many(urls, config=CrawlerRunConfig)`** handles concurrency across multiple URLs.
- For advanced lifecycle control, use `start()` and `close()` explicitly.

**Migration**:
- If you used `AsyncWebCrawler(browser_type="chromium", css_selector="...")`, move browser settings to `BrowserConfig(...)` and content/crawl logic to `CrawlerRunConfig(...)`.
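
As a before/after sketch (the legacy call is illustrative):

```python
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

# Before (legacy, mixed concerns):
# crawler = AsyncWebCrawler(browser_type="chromium", css_selector=".main-content")

# After (split by responsibility):
browser_cfg = BrowserConfig(browser_type="chromium")        # browser environment
run_cfg = CrawlerRunConfig(css_selector=".main-content")    # per-crawl logic

crawler = AsyncWebCrawler(config=browser_cfg)
# ... await crawler.arun(url="https://example.com", config=run_cfg)
```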

This modular approach ensures your code is **clean**, **scalable**, and **easy to maintain**. For any advanced or rarely used parameters, see the [BrowserConfig docs](../api/parameters.md).