refactor(dispatcher): migrate to modular dispatcher system with enhanced monitoring

Reorganize dispatcher functionality into separate components:
- Create dedicated dispatcher classes (MemoryAdaptiveDispatcher, SemaphoreDispatcher)
- Add RateLimiter for smart request throttling
- Implement CrawlerMonitor for real-time progress tracking
- Move dispatcher config from CrawlerRunConfig to separate classes

BREAKING CHANGE: Dispatcher configuration moved from CrawlerRunConfig to dedicated dispatcher classes. Users need to update their configuration approach for multi-URL crawling.
UncleCode
2025-01-11 21:10:27 +08:00
parent 3865342c93
commit 825c78a048
19 changed files with 1742 additions and 484 deletions

View File

@@ -0,0 +1,264 @@
# Optimized Multi-URL Crawling
> **Note**: We're developing a new **executor module** that uses a sophisticated algorithm to dynamically manage multi-URL crawling, optimizing for speed and memory usage. The approaches in this document remain fully valid, but keep an eye on **Crawl4AI**'s upcoming releases for this powerful feature! Follow [@unclecode](https://twitter.com/unclecode) on X and check the changelogs to stay updated.

Crawl4AI's **AsyncWebCrawler** can handle multiple URLs in a single run, which can greatly reduce overhead and speed up crawling. This guide shows how to:
1. **Sequentially** crawl a list of URLs using the **same** session, avoiding repeated browser creation.
2. **Parallel**-crawl subsets of URLs in batches, again reusing the same browser.
When the entire process finishes, you close the browser once—**minimizing** memory and resource usage.
---
## 1. Why Avoid Simple Loops per URL?
If you naively do:
```python
for url in urls:
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url)
```
You end up:
1. Spinning up a **new** browser for each URL
2. Closing it immediately after the single crawl
3. Potentially using a lot of CPU/memory for short-lived browsers
4. Missing out on session reusability if you have login or ongoing states
**Better** approaches ensure you **create** the browser once, then crawl multiple URLs with minimal overhead.
---
## 2. Sequential Crawling with Session Reuse
### 2.1 Overview
1. **One** `AsyncWebCrawler` instance for **all** URLs.
2. **One** session (via `session_id`) so we can preserve local storage or cookies across URLs if needed.
3. The crawler is only closed at the **end**.
**This** is the simplest pattern if your workload is moderate (dozens to a few hundred URLs).
### 2.2 Example Code
```python
import asyncio
from typing import List
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
async def crawl_sequential(urls: List[str]):
    print("\n=== Sequential Crawling with Session Reuse ===")

    browser_config = BrowserConfig(
        headless=True,
        # For better performance in Docker or low-memory environments:
        extra_args=["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"],
    )

    crawl_config = CrawlerRunConfig(
        markdown_generator=DefaultMarkdownGenerator()
    )

    # Create the crawler (opens the browser)
    crawler = AsyncWebCrawler(config=browser_config)
    await crawler.start()

    try:
        session_id = "session1"  # Reuse the same session across all URLs
        for url in urls:
            result = await crawler.arun(
                url=url,
                config=crawl_config,
                session_id=session_id
            )
            if result.success:
                print(f"Successfully crawled: {url}")
                # E.g. check markdown length
                print(f"Markdown length: {len(result.markdown_v2.raw_markdown)}")
            else:
                print(f"Failed: {url} - Error: {result.error_message}")
    finally:
        # After all URLs are done, close the crawler (and the browser)
        await crawler.close()

async def main():
    urls = [
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3"
    ]
    await crawl_sequential(urls)

if __name__ == "__main__":
    asyncio.run(main())
```
**Why It's Good**:
- **One** browser launch.
- Minimal memory usage.
- If the site requires login, you can log in once in the `session_id` context and preserve auth across all URLs (see the sketch below).
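
For example, inside `crawl_sequential` (before the URL loop) you could perform the login once under the shared session. This is only a sketch: the login URL, selectors, and credentials are placeholders, and it assumes the target site's form can be driven via the `js_code` option of `CrawlerRunConfig`:

```python
# Hypothetical login step -- replace URL, selectors, and credentials with your own.
login_js = """
document.querySelector('#username').value = 'user';
document.querySelector('#password').value = 'pass';
document.querySelector('form').submit();
"""

await crawler.arun(
    url="https://example.com/login",            # placeholder login page
    config=CrawlerRunConfig(js_code=login_js),  # submit the form via injected JS
    session_id=session_id                       # same session the later crawls reuse
)
# Every later arun() with this session_id keeps the logged-in cookies/localStorage.
```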
---
## 3. Parallel Crawling with Browser Reuse
### 3.1 Overview
To speed up crawling further, you can crawl multiple URLs in **parallel** (batches or a concurrency limit). The crawler still uses **one** browser, but spawns different sessions (or the same, depending on your logic) for each task.
### 3.2 Example Code
For this example make sure to install the [psutil](https://pypi.org/project/psutil/) package.
```bash
pip install psutil
```
Then you can run the following code:
```python
import os
import sys
import psutil
import asyncio
__location__ = os.path.dirname(os.path.abspath(__file__))
__output__ = os.path.join(__location__, "output")
# Append parent directory to system path
parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(parent_dir)
from typing import List
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
async def crawl_parallel(urls: List[str], max_concurrent: int = 3):
    print("\n=== Parallel Crawling with Browser Reuse + Memory Check ===")

    # We'll keep track of peak memory usage across all tasks
    peak_memory = 0
    process = psutil.Process(os.getpid())

    def log_memory(prefix: str = ""):
        nonlocal peak_memory
        current_mem = process.memory_info().rss  # in bytes
        if current_mem > peak_memory:
            peak_memory = current_mem
        print(f"{prefix} Current Memory: {current_mem // (1024 * 1024)} MB, Peak: {peak_memory // (1024 * 1024)} MB")

    # Minimal browser config
    browser_config = BrowserConfig(
        headless=True,
        verbose=False,  # corrected from 'verbos=False'
        extra_args=["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"],
    )
    crawl_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)

    # Create the crawler instance
    crawler = AsyncWebCrawler(config=browser_config)
    await crawler.start()

    try:
        # We'll chunk the URLs in batches of 'max_concurrent'
        success_count = 0
        fail_count = 0
        for i in range(0, len(urls), max_concurrent):
            batch = urls[i : i + max_concurrent]
            tasks = []

            for j, url in enumerate(batch):
                # Unique session_id per concurrent sub-task
                session_id = f"parallel_session_{i + j}"
                task = crawler.arun(url=url, config=crawl_config, session_id=session_id)
                tasks.append(task)

            # Check memory usage prior to launching tasks
            log_memory(prefix=f"Before batch {i//max_concurrent + 1}: ")

            # Gather results
            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Check memory usage after tasks complete
            log_memory(prefix=f"After batch {i//max_concurrent + 1}: ")

            # Evaluate results
            for url, result in zip(batch, results):
                if isinstance(result, Exception):
                    print(f"Error crawling {url}: {result}")
                    fail_count += 1
                elif result.success:
                    success_count += 1
                else:
                    fail_count += 1

        print(f"\nSummary:")
        print(f" - Successfully crawled: {success_count}")
        print(f" - Failed: {fail_count}")
    finally:
        print("\nClosing crawler...")
        await crawler.close()

        # Final memory log
        log_memory(prefix="Final: ")
        print(f"\nPeak memory usage (MB): {peak_memory // (1024 * 1024)}")

async def main():
    urls = [
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3",
        "https://example.com/page4"
    ]
    await crawl_parallel(urls, max_concurrent=2)

if __name__ == "__main__":
    asyncio.run(main())
```
**Notes**:
- We **reuse** the same `AsyncWebCrawler` instance for all parallel tasks, launching **one** browser.
- Each parallel sub-task might get its own `session_id` so they don't share cookies/localStorage (unless that's desired).
- We limit concurrency to `max_concurrent=2` or 3 to avoid saturating CPU/memory.
---
## 4. Performance Tips
1. **Extra Browser Args**
   - `--disable-gpu`, `--no-sandbox` can help in Docker or restricted environments.
   - `--disable-dev-shm-usage` avoids using `/dev/shm`, which can be small on some systems.
2. **Session Reuse**
   - If your site requires a login or you want to maintain local data across URLs, share the **same** `session_id`.
   - If you want isolation (each URL fresh), create unique sessions.
3. **Batching**
   - If you have **many** URLs (thousands, say), do parallel crawling in chunks (e.g. `max_concurrent=5`).
   - Use `arun_many()` for a built-in approach if you prefer, but the example above is often more flexible (a short sketch follows this list).
4. **Cache**
   - If your pages share many resources or you're re-crawling the same domain repeatedly, consider setting `cache_mode=CacheMode.ENABLED` in `CrawlerRunConfig`.
   - If you need fresh data each time, keep `cache_mode=CacheMode.BYPASS`.
5. **Hooks**
   - You can set up global hooks for each crawler (e.g. to block images) or per-run if you want.
   - Keep them consistent if you're reusing sessions.
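
As a concrete illustration of tips 3 and 4, here is a minimal sketch that leans on the built-in `arun_many()` with caching enabled. The URLs are placeholders, and the default dispatch behavior of `arun_many()` is assumed:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

async def crawl_many_cached(urls):
    browser_config = BrowserConfig(headless=True)
    # ENABLED reuses cached results on repeat crawls; switch to BYPASS when you need fresh data.
    run_config = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
    async with AsyncWebCrawler(config=browser_config) as crawler:
        return await crawler.arun_many(urls, config=run_config)

# asyncio.run(crawl_many_cached(["https://example.com/page1", "https://example.com/page2"]))
```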
---
## 5. Summary
- **One** `AsyncWebCrawler` + multiple calls to `.arun()` is far more efficient than launching a new crawler per URL.
- **Sequential** approach with a shared session is simple and memory-friendly for moderate sets of URLs.
- **Parallel** approach can speed up large crawls by concurrency, but keep concurrency balanced to avoid overhead.
- Close the crawler once at the end, ensuring the browser is only opened/closed once.
For even more advanced memory optimizations or dynamic concurrency patterns, see future sections on hooking or distributed crawling. The patterns above suffice for the majority of multi-URL scenarios—**giving you speed, simplicity, and minimal resource usage**. Enjoy your optimized crawling!

View File

@@ -1,264 +1,205 @@
# Advanced Multi-URL Crawling with Dispatchers
> **Heads Up**: Crawl4AI supports advanced dispatchers for **parallel** or **throttled** crawling, providing dynamic rate limiting and memory usage checks. The built-in `arun_many()` function uses these dispatchers to handle concurrency efficiently.
## 1. Introduction
When crawling many URLs:
- **Basic**: Use `arun()` in a loop (simple but less efficient)
- **Better**: Use `arun_many()`, which efficiently handles multiple URLs with proper concurrency control (see the sketch below)
- **Best**: Customize dispatcher behavior for your specific needs (memory management, rate limits, etc.)
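
For contrast, a quick sketch of the basic loop versus the `arun_many()` route, assuming the same `crawler` and `run_config` in both cases:

```python
# Basic: call arun() once per URL in a simple loop
results = []
for url in urls:
    results.append(await crawler.arun(url, config=run_config))

# Better: let arun_many() handle concurrency and dispatching for you
results = await crawler.arun_many(urls, config=run_config)
```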
**Why Dispatchers?**
- **Adaptive**: Memory-based dispatchers can pause or slow down based on system resources
- **Rate-limiting**: Built-in rate limiting with exponential backoff for 429/503 responses
- **Real-time Monitoring**: Live dashboard of ongoing tasks, memory usage, and performance
- **Flexibility**: Choose between memory-adaptive or semaphore-based concurrency
## 2. Core Components
### 2.1 Rate Limiter
```python
class RateLimiter:
    def __init__(
        base_delay: Tuple[float, float] = (1.0, 3.0),  # Random delay range between requests
        max_delay: float = 60.0,                       # Maximum backoff delay
        max_retries: int = 3,                          # Retries before giving up
        rate_limit_codes: List[int] = [429, 503]       # Status codes triggering backoff
    )
```
The RateLimiter provides:
- Random delays between requests
- Exponential backoff on rate limit responses
- Domain-specific rate limiting
- Automatic retry handling
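
To make the backoff schedule concrete, here is a rough, self-contained sketch of what these settings imply. It is an illustration only, not the library's internal code, and `compute_backoff` is a hypothetical helper:

```python
import random

def compute_backoff(base_delay=(1.0, 3.0), max_delay=60.0, retry=0):
    # Illustration: start from a random base delay and double it on each
    # retry triggered by a 429/503 response, capped at max_delay.
    delay = random.uniform(*base_delay) * (2 ** retry)
    return min(delay, max_delay)

for retry in range(4):
    print(f"retry {retry}: wait ~{compute_backoff(retry=retry):.1f}s")
```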
### 2.2 Crawler Monitor
The CrawlerMonitor provides real-time visibility into crawling operations:
```python
monitor = CrawlerMonitor(
    max_visible_rows=15,               # Maximum rows in live display
    display_mode=DisplayMode.DETAILED  # DETAILED or AGGREGATED view
)
```
**Display Modes**:
1. **DETAILED**: Shows individual task status, memory usage, and timing
2. **AGGREGATED**: Displays summary statistics and overall progress
## 3. Available Dispatchers
### 3.1 MemoryAdaptiveDispatcher (Default)
Automatically manages concurrency based on system memory usage:
```python
dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=70.0,  # Pause if memory exceeds this
    check_interval=1.0,             # How often to check memory
    max_session_permit=10,          # Maximum concurrent tasks
    rate_limiter=RateLimiter(       # Optional rate limiting
        base_delay=(1.0, 2.0),
        max_delay=30.0,
        max_retries=2
    ),
    monitor=CrawlerMonitor(         # Optional monitoring
        max_visible_rows=15,
        display_mode=DisplayMode.DETAILED
    )
)
```
### 3.2 SemaphoreDispatcher
Provides simple concurrency control with a fixed limit:
```python
dispatcher = SemaphoreDispatcher(
    semaphore_count=5,         # Fixed concurrent tasks
    rate_limiter=RateLimiter(  # Optional rate limiting
        base_delay=(0.5, 1.0),
        max_delay=10.0
    ),
    monitor=CrawlerMonitor(    # Optional monitoring
        max_visible_rows=15,
        display_mode=DisplayMode.DETAILED
    )
)
```
## 4. Usage Examples
### 4.1 Simple Usage (Default MemoryAdaptiveDispatcher)
```python
async with AsyncWebCrawler(config=browser_config) as crawler:
    results = await crawler.arun_many(urls, config=run_config)
```
### 4.2 Memory Adaptive with Rate Limiting
```python
async def crawl_with_memory_adaptive(urls):
    browser_config = BrowserConfig(headless=True, verbose=False)
    run_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    dispatcher = MemoryAdaptiveDispatcher(
        memory_threshold_percent=70.0,
        max_session_permit=10,
        rate_limiter=RateLimiter(
            base_delay=(1.0, 2.0),
            max_delay=30.0,
            max_retries=2
        ),
        monitor=CrawlerMonitor(
            max_visible_rows=15,
            display_mode=DisplayMode.DETAILED
        )
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        results = await crawler.arun_many(
            urls,
            config=run_config,
            dispatcher=dispatcher
        )
        return results
```
### 4.3 Semaphore with Rate Limiting
```python
async def crawl_with_semaphore(urls):
    browser_config = BrowserConfig(headless=True, verbose=False)
    run_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    dispatcher = SemaphoreDispatcher(
        semaphore_count=5,
        rate_limiter=RateLimiter(
            base_delay=(0.5, 1.0),
            max_delay=10.0
        ),
        monitor=CrawlerMonitor(
            max_visible_rows=15,
            display_mode=DisplayMode.DETAILED
        )
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        results = await crawler.arun_many(
            urls,
            config=run_config,
            dispatcher=dispatcher
        )
        return results
```
## 5. Dispatch Results
Each crawl result includes dispatch information:
```python
@dataclass
class DispatchResult:
    task_id: str
    memory_usage: float
    peak_memory: float
    start_time: datetime
    end_time: datetime
    error_message: str = ""
```
Access via `result.dispatch_result`:
```python
for result in results:
    if result.success:
        dr = result.dispatch_result
        print(f"URL: {result.url}")
        print(f"Memory: {dr.memory_usage:.1f}MB")
        print(f"Duration: {dr.end_time - dr.start_time}")
```
## 6. Summary
1. **Two Dispatcher Types**:
   - MemoryAdaptiveDispatcher (default): Dynamic concurrency based on memory
   - SemaphoreDispatcher: Fixed concurrency limit
2. **Optional Components**:
   - RateLimiter: Smart request pacing and backoff
   - CrawlerMonitor: Real-time progress visualization
3. **Key Benefits**:
   - Automatic memory management
   - Built-in rate limiting
   - Live progress monitoring
   - Flexible concurrency control
Choose the dispatcher that best fits your needs:
- **MemoryAdaptiveDispatcher**: For large crawls or limited resources
- **SemaphoreDispatcher**: For simple, fixed-concurrency scenarios