From 529a79725e267e0abd119482bc498d74a414176d Mon Sep 17 00:00:00 2001
From: Aravind Karnam
Date: Tue, 18 Mar 2025 16:14:00 +0530
Subject: [PATCH] docs: remove hallucinations from docs for CrawlerRunConfig + Add chunking strategy docs in the table

---
 docs/md_v2/api/parameters.md              |  3 ++-
 docs/md_v2/core/browser-crawler-config.md | 26 -----------------------
 2 files changed, 2 insertions(+), 27 deletions(-)

diff --git a/docs/md_v2/api/parameters.md b/docs/md_v2/api/parameters.md
index b3e4349b..7e615a8c 100644
--- a/docs/md_v2/api/parameters.md
+++ b/docs/md_v2/api/parameters.md
@@ -69,7 +69,8 @@ We group them by category.
 | **Parameter** | **Type / Default** | **What It Does** |
 |------------------------------|--------------------------------------|-------------------------------------------------------------------------------------------------|
 | **`word_count_threshold`** | `int` (default: ~200) | Skips text blocks below X words. Helps ignore trivial sections. |
-| **`extraction_strategy`** | `ExtractionStrategy` (default: None) | If set, extracts structured data (CSS-based, LLM-based, etc.). |
+| **`extraction_strategy`** | `ExtractionStrategy` (default: None) | If set, extracts structured data (CSS-based, LLM-based, etc.).
+| **`chunking_strategy`** | `ChunkingStrategy` (default: RegexChunking) | If set, extracts structured data (CSS-based, LLM-based, etc.). |
 | **`markdown_generator`** | `MarkdownGenerationStrategy` (None) | If you want specialized markdown output (citations, filtering, chunking, etc.). |
 | **`css_selector`** | `str` (None) | Retains only the part of the page matching this selector. Affects the entire extraction process. |
 | **`target_elements`** | `List[str]` (None) | List of CSS selectors for elements to focus on for markdown generation and data extraction, while still processing the entire page for links, media, etc. Provides more flexibility than `css_selector`. |
diff --git a/docs/md_v2/core/browser-crawler-config.md b/docs/md_v2/core/browser-crawler-config.md
index 0d97e0fc..a080fca3 100644
--- a/docs/md_v2/core/browser-crawler-config.md
+++ b/docs/md_v2/core/browser-crawler-config.md
@@ -136,11 +136,6 @@ class CrawlerRunConfig:
         wait_for=None,
         screenshot=False,
         pdf=False,
-        enable_rate_limiting=False,
-        rate_limit_config=None,
-        memory_threshold_percent=70.0,
-        check_interval=1.0,
-        max_session_permit=20,
         display_mode=None,
         verbose=True,
         stream=False,  # Enable streaming for arun_many()
@@ -183,25 +178,7 @@ class CrawlerRunConfig:
    - Logs additional runtime details.
    - Overlaps with the browser’s verbosity if also set to `True` in `BrowserConfig`.

-9. **`enable_rate_limiting`**:
-   - If `True`, enables rate limiting for batch processing.
-   - Requires `rate_limit_config` to be set.
-10. **`memory_threshold_percent`**:
-   - The memory threshold (as a percentage) to monitor.
-   - If exceeded, the crawler will pause or slow down.
-
-11. **`check_interval`**:
-   - The interval (in seconds) to check system resources.
-   - Affects how often memory and CPU usage are monitored.
-
-12. **`max_session_permit`**:
-   - The maximum number of concurrent crawl sessions.
-   - Helps prevent overwhelming the system.
-
-13. **`display_mode`**:
-   - The display mode for progress information (`DETAILED`, `BRIEF`, etc.).
-   - Affects how much information is printed during the crawl.

 ### Helper Methods

@@ -236,9 +213,6 @@ The `clone()` method:

 ---

-
-
-
 ## 3. LLMConfig Essentials

 ### Key fields to note