docs(api): improve formatting and readability of API documentation

Enhanced markdown formatting, fixed list indentation, and improved readability across multiple API documentation files:
- arun.md
- arun_many.md
- async-webcrawler.md
- parameters.md

Changes include:
- Consistent list formatting and indentation
- Better spacing between sections
- Clearer separation of content blocks
- Fixed quotation marks and code block formatting
Author: UncleCode
Date: 2025-01-25 22:06:11 +08:00
Parent: 09ac7ed008
Commit: 54c84079c4
4 changed files with 103 additions and 80 deletions


@@ -1,6 +1,6 @@
# `arun_many(...)` Reference
> **Note**: This function is very similar to [`arun()`](./arun.md) but focused on **concurrent** or **batch** crawling. If you're unfamiliar with `arun()` usage, please read that doc first, then review this for differences.

## Function Signature
@@ -16,7 +16,7 @@ async def arun_many(
:param urls: A list of URLs (or tasks) to crawl.
:param config: (Optional) A default `CrawlerRunConfig` applying to each crawl.
:param dispatcher: (Optional) A concurrency controller (e.g. MemoryAdaptiveDispatcher).
...
:return: Either a list of `CrawlResult` objects, or an async generator if streaming is enabled.
"""
@@ -24,22 +24,26 @@ async def arun_many(
## Differences from `arun()`

1. **Multiple URLs**:
   - Instead of crawling a single URL, you pass a list of them (strings or tasks).
   - The function returns either a **list** of `CrawlResult` or an **async generator** if streaming is enabled.
2. **Concurrency & Dispatchers**:
   - The **`dispatcher`** param allows advanced concurrency control.
   - If omitted, a default dispatcher (like `MemoryAdaptiveDispatcher`) is used internally.
   - Dispatchers handle concurrency, rate limiting, and memory-based adaptive throttling (see [Multi-URL Crawling](../advanced/multi-url-crawling.md)).
3. **Streaming Support**:
   - Enable streaming by setting `stream=True` in your `CrawlerRunConfig`.
   - When streaming, use `async for` to process results as they become available (see the streaming sketch under **Return Value** below).
   - Ideal for processing large numbers of URLs without waiting for all to complete.
4. **Parallel Execution**:
   - `arun_many()` can run multiple requests concurrently under the hood.
   - Each `CrawlResult` might also include a **`dispatch_result`** with concurrency details (like memory usage, start/end times).
### Basic Example (Batch Mode)
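The example code itself falls outside this hunk; as a rough sketch of the batch pattern it describes, assuming the standard `AsyncWebCrawler`/`CrawlerRunConfig` imports from the surrounding docs:

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    urls = ["https://example.com/page1", "https://example.com/page2"]
    config = CrawlerRunConfig()  # stream defaults to False -> returns a list

    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(urls=urls, config=config)
        for result in results:
            print(result.url, "ok" if result.success else result.error_message)

asyncio.run(main())
```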
@@ -93,19 +97,19 @@ results = await crawler.arun_many(
**Key Points**:
- Each URL is processed by the same or separate sessions, depending on the dispatcher's strategy.
- `dispatch_result` in each `CrawlResult` (if using concurrency) can hold memory and timing info (see the sketch after this list).
- If you need to handle authentication or session IDs, pass them in each individual task or within your run config.
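A sketch of reading those concurrency stats; the attribute names below are assumptions based on the description above (memory usage, start/end times), so verify them against your version's `DispatchResult`:

```python
for result in results:
    dr = result.dispatch_result  # set when a dispatcher handled the crawl
    if dr:
        # assumed field names for the memory/timing info mentioned above
        print(f"{result.url}: memory={dr.memory_usage:.1f} MB, "
              f"ran {dr.start_time} -> {dr.end_time}")
```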
### Return Value
Either a **list** of [`CrawlResult`](./crawl-result.md) objects, or an **async generator** if streaming is enabled. You can iterate to check `result.success` or read each item's `extracted_content`, `markdown`, or `dispatch_result`.
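For the streaming case, a minimal sketch (same assumed setup as the batch example above):

```python
config = CrawlerRunConfig(stream=True)

async with AsyncWebCrawler() as crawler:
    # arun_many() now yields each result as soon as its crawl finishes
    async for result in await crawler.arun_many(urls=urls, config=config):
        if result.success:
            print(result.url, len(result.markdown or ""))
```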
---
## Dispatcher Reference
- **`MemoryAdaptiveDispatcher`**: Dynamically manages concurrency based on system memory usage.
- **`SemaphoreDispatcher`**: Fixed concurrency limit, simpler but less adaptive.
For advanced usage or custom settings, see [Multi-URL Crawling with Dispatchers](../advanced/multi-url-crawling.md).
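A sketch of supplying a dispatcher explicitly; the import path and constructor arguments here are assumptions drawn from the linked advanced docs, so check them against your installed version:

```python
from crawl4ai.async_dispatcher import MemoryAdaptiveDispatcher, SemaphoreDispatcher

# Adaptive: backs off when system memory usage climbs too high (assumed args)
dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=90.0,  # pause dispatching above this usage
    max_session_permit=10,          # cap on concurrent crawl sessions
)

# Or a fixed concurrency ceiling instead:
# dispatcher = SemaphoreDispatcher(semaphore_count=5)

results = await crawler.arun_many(urls=urls, config=config, dispatcher=dispatcher)
```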
@@ -113,12 +117,14 @@ For advanced usage or custom settings, see [Multi-URL Crawling with Dispatchers]
## Common Pitfalls
1. **Large Lists**: If you pass thousands of URLs, be mindful of memory or rate limits. A dispatcher can help.
2. **Session Reuse**: If you need specialized logins or persistent contexts, ensure your dispatcher or tasks handle sessions accordingly.
3. **Error Handling**: Each `CrawlResult` might fail for different reasons—always check `result.success` or the `error_message` before proceeding (see the sketch after this list).
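A short sketch of the per-result check from pitfall 3:

```python
results = await crawler.arun_many(urls=urls, config=config)

failed = [r for r in results if not r.success]
for r in failed:
    # error_message explains why this particular URL failed
    print(f"FAILED {r.url}: {r.error_message}")

# continue with the successful results only
for r in (r for r in results if r.success):
    handle(r.extracted_content or r.markdown)  # handle() is a hypothetical downstream step
```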
---
## Conclusion
Use `arun_many()` when you want to **crawl multiple URLs** simultaneously or in controlled parallel tasks. If you need advanced concurrency features (like memory-based adaptive throttling or complex rate-limiting), provide a **dispatcher**. Each result is a standard `CrawlResult`, possibly augmented with concurrency stats (`dispatch_result`) for deeper inspection. For more details on concurrency logic and dispatchers, see the [Advanced Multi-URL Crawling](../advanced/multi-url-crawling.md) docs.