Release prep (#749)
* fix: Update export of URLPatternFilter
* chore: Add dependency for cchardet in requirements
* docs: Update example for deep crawl in release note for v0.5
* docs: Update the example for memory dispatcher
* docs: Update example for crawl strategies
* refactor: Remove wrapping in `if __name__ == "__main__"` block since this is a markdown file
* chore: Remove cchardet from dependency list, since unclecode is planning to remove it
* docs: Update the example for proxy rotation to a working example
* feat: Introduce ProxyConfig param
* Add tutorial for deep crawl & update contributor list for bug fixes in Feb alpha-1
* chore: Update and test new dependencies
* feat: Make PyPDF2 a conditional dependency
* docs: Update tutorial and release note for v0.5
* docs: Update docs for deep crawl, and fix a typo in the docker-deployment markdown filename
* refactor: 1. Deprecate markdown_v2. 2. Make markdown backward compatible to behave as a string when needed. 3. Fix LlmConfig usage in CLI. 4. Deprecate markdown_v2 in CLI. 5. Update AsyncWebCrawler for changes in CrawlResult
* fix: Bug in serialisation of markdown in acache_url
* refactor: Add deprecation errors for accessing fit_html and fit_markdown directly on the result. Now access them via markdown
* fix: Remove deprecated markdown_v2 from docker
* refactor: Remove deprecated fit_markdown and fit_html from result
* refactor: Fix cache retrieval for markdown as a string
* chore: Update all docs, examples and tests with deprecation announcements for markdown_v2, fit_html, fit_markdown
@@ -27,7 +27,6 @@ class CrawlResult(BaseModel):
     screenshot: Optional[str] = None
     pdf : Optional[bytes] = None
     markdown: Optional[Union[str, MarkdownGenerationResult]] = None
-    markdown_v2: Optional[MarkdownGenerationResult] = None
     extracted_content: Optional[str] = None
     metadata: Optional[dict] = None
     error_message: Optional[str] = None
@@ -52,8 +51,7 @@ class CrawlResult(BaseModel):
 | **downloaded_files (`Optional[List[str]]`)** | If `accept_downloads=True` in `BrowserConfig`, this lists the filepaths of saved downloads. |
 | **screenshot (`Optional[str]`)** | Screenshot of the page (base64-encoded) if `screenshot=True`. |
 | **pdf (`Optional[bytes]`)** | PDF of the page if `pdf=True`. |
-| **markdown (`Optional[str or MarkdownGenerationResult]`)** | For now, `markdown_v2` holds a `MarkdownGenerationResult`. Over time, this will be consolidated into `markdown`. The generator can provide raw markdown, citations, references, and optionally `fit_markdown`. |
-| **markdown_v2 (`Optional[MarkdownGenerationResult]`)** | Legacy field for detailed markdown output. This will be replaced by `markdown` soon. |
+| **markdown (`Optional[str or MarkdownGenerationResult]`)** | Holds either a markdown string or a `MarkdownGenerationResult`. The generator can provide raw markdown, citations, references, and optionally `fit_markdown`. |
 | **extracted_content (`Optional[str]`)** | The output of a structured extraction (CSS/LLM-based) stored as a JSON string or other text. |
 | **metadata (`Optional[dict]`)** | Additional info about the crawl or extracted data. |
 | **error_message (`Optional[str]`)** | If `success=False`, contains a short description of what went wrong. |
@@ -90,10 +88,10 @@ print(result.cleaned_html)  # Freed of forms, header, footer, data-* attributes

 ## 3. Markdown Generation

-### 3.1 `markdown_v2` (Legacy) vs `markdown`
+### 3.1 `markdown`

-- **`markdown_v2`**: The current location for detailed markdown output, returning a **`MarkdownGenerationResult`** object.
-- **`markdown`**: Eventually, we’re merging these fields. For now, you might see `result.markdown_v2` used widely in code examples.
+- **`markdown`**: The current location for detailed markdown output, returning a **`MarkdownGenerationResult`** object.
+- **`markdown_v2`**: Deprecated since v0.5.

 **`MarkdownGenerationResult`** Fields:
@@ -118,7 +116,7 @@ config = CrawlerRunConfig(
 )
 result = await crawler.arun(url="https://example.com", config=config)

-md_res = result.markdown_v2  # or eventually 'result.markdown'
+md_res = result.markdown
 print(md_res.raw_markdown[:500])
 print(md_res.markdown_with_citations)
 print(md_res.references_markdown)
@@ -224,15 +222,17 @@ Check any field:

 if result.success:
     print(result.status_code, result.response_headers)
     print("Links found:", len(result.links.get("internal", [])))
-    if result.markdown_v2:
-        print("Markdown snippet:", result.markdown_v2.raw_markdown[:200])
+    if result.markdown:
+        print("Markdown snippet:", result.markdown.raw_markdown[:200])
     if result.extracted_content:
         print("Structured JSON:", result.extracted_content)
 else:
     print("Error:", result.error_message)
 ```

-**Remember**: Use `result.markdown_v2` for now. It will eventually become `result.markdown`.
+**Deprecation**: Since v0.5, `result.markdown_v2`, `result.fit_html`, and `result.fit_markdown` are deprecated. Use `result.markdown` instead; it holds a `MarkdownGenerationResult`, which includes `fit_html` and `fit_markdown` as its properties.

---
docs/md_v2/core/deep-crawling.md (436 lines, new file)
@@ -0,0 +1,436 @@
# Deep Crawling

One of Crawl4AI's most powerful features is its ability to perform **configurable deep crawling** that can explore websites beyond a single page. With fine-tuned control over crawl depth, domain boundaries, and content filtering, Crawl4AI gives you the tools to extract precisely the content you need.

In this tutorial, you'll learn:

1. How to set up a **Basic Deep Crawler** with BFS strategy
2. Understanding the difference between **streamed and non-streamed** output
3. Implementing **filters and scorers** to target specific content
4. Creating **advanced filtering chains** for sophisticated crawls
5. Using **BestFirstCrawling** for intelligent exploration prioritization

> **Prerequisites**
> - You’ve completed or read [AsyncWebCrawler Basics](../core/simple-crawling.md) to understand how to run a simple crawl.
> - You know how to configure `CrawlerRunConfig`.

---

## 1. Quick Example

Here's a minimal code snippet that implements a basic deep crawl using the **BFSDeepCrawlStrategy**:
```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy

async def main():
    # Configure a 2-level deep crawl
    config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=2,
            include_external=False
        ),
        scraping_strategy=LXMLWebScrapingStrategy(),
        verbose=True
    )

    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun("https://example.com", config=config)

        print(f"Crawled {len(results)} pages in total")

        # Access individual results
        for result in results[:3]:  # Show first 3 results
            print(f"URL: {result.url}")
            print(f"Depth: {result.metadata.get('depth', 0)}")

if __name__ == "__main__":
    asyncio.run(main())
```

**What's happening?**
- `BFSDeepCrawlStrategy(max_depth=2, include_external=False)` instructs Crawl4AI to:
  - Crawl the starting page (depth 0) plus 2 more levels
  - Stay within the same domain (don't follow external links)
- Each result contains metadata like the crawl depth
- Results are returned as a list after all crawling is complete

---

## 2. Understanding Deep Crawling Strategy Options

### 2.1 BFSDeepCrawlStrategy (Breadth-First Search)

The **BFSDeepCrawlStrategy** uses a breadth-first approach, exploring all links at one depth before moving deeper:

```python
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

# Basic configuration
strategy = BFSDeepCrawlStrategy(
    max_depth=2,             # Crawl initial page + 2 levels deep
    include_external=False,  # Stay within the same domain
)
```

**Key parameters:**
- **`max_depth`**: Number of levels to crawl beyond the starting page
- **`include_external`**: Whether to follow links to other domains
### 2.2 DFSDeepCrawlStrategy (Depth-First Search)

The **DFSDeepCrawlStrategy** uses a depth-first approach, exploring as far down a branch as possible before backtracking:

```python
from crawl4ai.deep_crawling import DFSDeepCrawlStrategy

# Basic configuration
strategy = DFSDeepCrawlStrategy(
    max_depth=2,             # Crawl initial page + 2 levels deep
    include_external=False,  # Stay within the same domain
)
```

**Key parameters:**
- **`max_depth`**: Number of levels to crawl beyond the starting page
- **`include_external`**: Whether to follow links to other domains
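The difference between the two traversal orders is easy to see on a toy link graph. This sketch is plain Python, not Crawl4AI internals: BFS visits pages level by level, while DFS follows each branch to the bottom before backtracking.

```python
from collections import deque

# Toy link graph: each page lists the pages it links to.
LINKS = {
    "home": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}

def bfs_order(start, max_depth):
    """Visit pages level by level, like a breadth-first strategy."""
    order, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth < max_depth:
            for link in LINKS[url]:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return order

def dfs_order(start, max_depth):
    """Follow each branch to the bottom before backtracking."""
    order, seen = [], set()
    def visit(url, depth):
        seen.add(url)
        order.append(url)
        if depth < max_depth:
            for link in LINKS[url]:
                if link not in seen:
                    visit(link, depth + 1)
    visit(start, 0)
    return order

print(bfs_order("home", 2))  # ['home', 'a', 'b', 'a1', 'a2', 'b1']
print(dfs_order("home", 2))  # ['home', 'a', 'a1', 'a2', 'b', 'b1']
```

Same pages, different order: with BFS all depth-1 pages come before any depth-2 page, while DFS reaches `a1` before it ever sees `b`.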
### 2.3 BestFirstCrawlingStrategy (⭐️ recommended deep crawl strategy)

For more intelligent crawling, use **BestFirstCrawlingStrategy** with scorers to prioritize the most relevant pages:

```python
from crawl4ai.deep_crawling import BestFirstCrawlingStrategy
from crawl4ai.deep_crawling.scorers import KeywordRelevanceScorer

# Create a scorer
scorer = KeywordRelevanceScorer(
    keywords=["crawl", "example", "async", "configuration"],
    weight=0.7
)

# Configure the strategy
strategy = BestFirstCrawlingStrategy(
    max_depth=2,
    include_external=False,
    url_scorer=scorer
)
```

This crawling approach:
- Evaluates each discovered URL based on scorer criteria
- Visits higher-scoring pages first
- Helps focus crawl resources on the most relevant content

---
## 3. Streaming vs. Non-Streaming Results

Crawl4AI can return results in two modes:

### 3.1 Non-Streaming Mode (Default)

```python
config = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(max_depth=1),
    stream=False  # Default behavior
)

async with AsyncWebCrawler() as crawler:
    # Wait for ALL results to be collected before returning
    results = await crawler.arun("https://example.com", config=config)

    for result in results:
        process_result(result)
```

**When to use non-streaming mode:**
- You need the complete dataset before processing
- You're performing batch operations on all results together
- Crawl time isn't a critical factor
### 3.2 Streaming Mode

```python
config = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(max_depth=1),
    stream=True  # Enable streaming
)

async with AsyncWebCrawler() as crawler:
    # Returns an async iterator
    async for result in await crawler.arun("https://example.com", config=config):
        # Process each result as it becomes available
        process_result(result)
```

**Benefits of streaming mode:**
- Process results immediately as they're discovered
- Start working with early results while crawling continues
- Better for real-time applications or progressive display
- Reduces memory pressure when handling many pages
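Under the hood, the streaming contract is simply an async iterator. Here is a library-independent sketch (`fake_crawl` is a stand-in generator, not a Crawl4AI API) showing why you can act on each result before the whole crawl finishes:

```python
import asyncio

async def fake_crawl(urls):
    """Yield one result at a time as each 'page' finishes."""
    for url in urls:
        await asyncio.sleep(0)  # stand-in for network latency
        yield {"url": url, "success": True}

async def main():
    handled = []
    # Consume results as they arrive instead of waiting for the full list
    async for result in fake_crawl(["https://a.example", "https://b.example"]):
        handled.append(result["url"])
    return handled

print(asyncio.run(main()))  # ['https://a.example', 'https://b.example']
```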
---

## 4. Filtering Content with Filter Chains

Filters help you narrow down which pages to crawl. Combine multiple filters using **FilterChain** for powerful targeting.

### 4.1 Basic URL Pattern Filter

```python
from crawl4ai.deep_crawling.filters import FilterChain, URLPatternFilter

# Only follow URLs containing "blog" or "docs"
url_filter = URLPatternFilter(patterns=["*blog*", "*docs*"])

config = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=1,
        filter_chain=FilterChain([url_filter])
    )
)
```

### 4.2 Combining Multiple Filters

```python
from crawl4ai.deep_crawling.filters import (
    FilterChain,
    URLPatternFilter,
    DomainFilter,
    ContentTypeFilter
)

# Create a chain of filters
filter_chain = FilterChain([
    # Only follow URLs with specific patterns
    URLPatternFilter(patterns=["*guide*", "*tutorial*"]),

    # Only crawl specific domains
    DomainFilter(
        allowed_domains=["docs.example.com"],
        blocked_domains=["old.docs.example.com"]
    ),

    # Only include specific content types
    ContentTypeFilter(allowed_types=["text/html"])
])

config = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=2,
        filter_chain=filter_chain
    )
)
```

### 4.3 Available Filter Types

Crawl4AI includes several specialized filters:

- **`URLPatternFilter`**: Matches URL patterns using wildcard syntax
- **`DomainFilter`**: Controls which domains to include or exclude
- **`ContentTypeFilter`**: Filters based on HTTP Content-Type
- **`ContentRelevanceFilter`**: Uses similarity to a text query
- **`SEOFilter`**: Evaluates SEO elements (meta tags, headers, etc.)
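If you're unsure what a wildcard pattern will match, you can sanity-check it with Python's built-in `fnmatch`, whose `*` semantics are conceptually similar to the patterns shown above (the library's exact matching rules may differ):

```python
from fnmatch import fnmatch

def matches_any(url: str, patterns: list[str]) -> bool:
    # A URL passes if it matches at least one wildcard pattern.
    return any(fnmatch(url, p) for p in patterns)

patterns = ["*guide*", "*tutorial*"]
print(matches_any("https://docs.example.com/guide/intro", patterns))  # True
print(matches_any("https://docs.example.com/pricing", patterns))      # False
```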

---

## 5. Using Scorers for Prioritized Crawling

Scorers assign priority values to discovered URLs, helping the crawler focus on the most relevant content first.

### 5.1 KeywordRelevanceScorer

```python
from crawl4ai.deep_crawling.scorers import KeywordRelevanceScorer
from crawl4ai.deep_crawling import BestFirstCrawlingStrategy

# Create a keyword relevance scorer
keyword_scorer = KeywordRelevanceScorer(
    keywords=["crawl", "example", "async", "configuration"],
    weight=0.7  # Importance of this scorer (0.0 to 1.0)
)

config = CrawlerRunConfig(
    deep_crawl_strategy=BestFirstCrawlingStrategy(
        max_depth=2,
        url_scorer=keyword_scorer
    ),
    stream=True  # Recommended with BestFirstCrawling
)

# Results will come in order of relevance score
async with AsyncWebCrawler() as crawler:
    async for result in await crawler.arun("https://example.com", config=config):
        score = result.metadata.get("score", 0)
        print(f"Score: {score:.2f} | {result.url}")
```

**How scorers work:**
- Evaluate each discovered URL before crawling
- Calculate relevance based on various signals
- Help the crawler make intelligent choices about traversal order
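Conceptually, a URL scorer is just a function from a URL to a number. The sketch below is a simplified illustration, not the library's actual `KeywordRelevanceScorer` formula: it scores a URL by the fraction of keywords it contains, scaled by `weight`.

```python
def keyword_score(url: str, keywords: list[str], weight: float = 1.0) -> float:
    """Fraction of keywords that appear in the URL, scaled by weight."""
    url_l = url.lower()
    hits = sum(1 for kw in keywords if kw in url_l)
    return weight * hits / len(keywords)

kws = ["crawl", "async", "config"]
# 2 of 3 keywords hit, so the score is 0.7 * 2/3
print(keyword_score("https://example.com/async-crawl-guide", kws, weight=0.7))
```

A best-first frontier then becomes a priority queue ordered by these scores, so high-scoring URLs are dequeued and crawled first.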

---

## 6. Advanced Filtering Techniques

### 6.1 SEO Filter for Quality Assessment

The **SEOFilter** helps you identify pages with strong SEO characteristics:

```python
from crawl4ai.deep_crawling.filters import FilterChain, SEOFilter

# Create an SEO filter that looks for specific keywords in page metadata
seo_filter = SEOFilter(
    threshold=0.5,  # Minimum score (0.0 to 1.0)
    keywords=["tutorial", "guide", "documentation"]
)

config = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=1,
        filter_chain=FilterChain([seo_filter])
    )
)
```

### 6.2 Content Relevance Filter

The **ContentRelevanceFilter** analyzes the actual content of pages:

```python
from crawl4ai.deep_crawling.filters import FilterChain, ContentRelevanceFilter

# Create a content relevance filter
relevance_filter = ContentRelevanceFilter(
    query="Web crawling and data extraction with Python",
    threshold=0.7  # Minimum similarity score (0.0 to 1.0)
)

config = CrawlerRunConfig(
    deep_crawl_strategy=BFSDeepCrawlStrategy(
        max_depth=1,
        filter_chain=FilterChain([relevance_filter])
    )
)
```

This filter:
- Measures relevance between the query and each page's content
- Uses BM25 scoring over the page's head-section text
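To build intuition for what BM25-based filtering does, here's a from-scratch BM25 scorer with the standard `k1`/`b` defaults. The library's implementation and preprocessing will differ, so treat this as a conceptual sketch:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with classic BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    q_terms = query.lower().split()
    # Document frequency per query term
    df = {t: sum(1 for d in tokenized if t in d) for t in q_terms}
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "web crawling and data extraction with python",
    "cooking recipes for busy weeknights",
]
scores = bm25_scores("python web crawling", docs)
print(scores[0] > scores[1])  # True: the crawling page outranks the recipes page
```

A relevance filter then keeps only pages whose score clears the configured threshold.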

---

## 7. Building a Complete Advanced Crawler

This example combines multiple techniques for a sophisticated crawl:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
from crawl4ai.deep_crawling import BestFirstCrawlingStrategy
from crawl4ai.deep_crawling.filters import (
    FilterChain,
    DomainFilter,
    URLPatternFilter,
    ContentTypeFilter
)
from crawl4ai.deep_crawling.scorers import KeywordRelevanceScorer

async def run_advanced_crawler():
    # Create a sophisticated filter chain
    filter_chain = FilterChain([
        # Domain boundaries
        DomainFilter(
            allowed_domains=["docs.example.com"],
            blocked_domains=["old.docs.example.com"]
        ),

        # URL patterns to include
        URLPatternFilter(patterns=["*guide*", "*tutorial*", "*blog*"]),

        # Content type filtering
        ContentTypeFilter(allowed_types=["text/html"])
    ])

    # Create a relevance scorer
    keyword_scorer = KeywordRelevanceScorer(
        keywords=["crawl", "example", "async", "configuration"],
        weight=0.7
    )

    # Set up the configuration
    config = CrawlerRunConfig(
        deep_crawl_strategy=BestFirstCrawlingStrategy(
            max_depth=2,
            include_external=False,
            filter_chain=filter_chain,
            url_scorer=keyword_scorer
        ),
        scraping_strategy=LXMLWebScrapingStrategy(),
        stream=True,
        verbose=True
    )

    # Execute the crawl
    results = []
    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun("https://docs.example.com", config=config):
            results.append(result)
            score = result.metadata.get("score", 0)
            depth = result.metadata.get("depth", 0)
            print(f"Depth: {depth} | Score: {score:.2f} | {result.url}")

    # Analyze the results (guard against an empty crawl to avoid ZeroDivisionError)
    if results:
        print(f"Crawled {len(results)} high-value pages")
        print(f"Average score: {sum(r.metadata.get('score', 0) for r in results) / len(results):.2f}")

        # Group by depth
        depth_counts = {}
        for result in results:
            depth = result.metadata.get("depth", 0)
            depth_counts[depth] = depth_counts.get(depth, 0) + 1

        print("Pages crawled by depth:")
        for depth, count in sorted(depth_counts.items()):
            print(f"  Depth {depth}: {count} pages")

if __name__ == "__main__":
    asyncio.run(run_advanced_crawler())
```

---

## 8. Common Pitfalls & Tips

1. **Set realistic depth limits.** Be cautious with `max_depth` values > 3, which can exponentially increase crawl size.
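To see how quickly depth compounds: if each page links to roughly `b` previously unseen pages, a crawl of depth `d` can touch on the order of `1 + b + b^2 + ... + b^d` pages. A quick back-of-the-envelope check (the branching factor of 20 is an assumption, not a measurement):

```python
def max_pages(branching: int, max_depth: int) -> int:
    # Upper bound on pages reached: sum of branching**d for d = 0..max_depth
    return sum(branching ** d for d in range(max_depth + 1))

for depth in range(1, 5):
    print(f"max_depth={depth}: up to {max_pages(20, depth)} pages")
# With 20 links per page, depth 3 already allows up to 8421 pages,
# and depth 4 allows up to 168421.
```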

2. **Don't neglect the scoring component.** BestFirstCrawling works best with well-tuned scorers. Experiment with keyword weights for optimal prioritization.

3. **Be a good web citizen.** Respect robots.txt (checking is disabled by default).
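Recent Crawl4AI releases expose a switch for this on `CrawlerRunConfig`; the parameter name below reflects the current docs but may vary by version, so verify it against your installed release:

```python
from crawl4ai import CrawlerRunConfig

# Opt in to robots.txt compliance (off by default)
config = CrawlerRunConfig(
    check_robots_txt=True,  # parameter name assumed from recent releases
)
```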

4. **Handle page errors gracefully.** Not all pages will be accessible. Check `result.success` and `result.error_message` when processing results.

---

## 9. Summary & Next Steps

In this **Deep Crawling with Crawl4AI** tutorial, you learned to:

- Configure **BFSDeepCrawlStrategy** and **BestFirstCrawlingStrategy**
- Process results in streaming or non-streaming mode
- Apply filters to target specific content
- Use scorers to prioritize the most relevant pages
- Build a complete advanced crawler with combined techniques

With these tools, you can efficiently extract structured data from websites at scale, focusing precisely on the content you need for your specific use case.
@@ -10,11 +10,10 @@

 In **`CrawlerRunConfig`**, you can specify a **`content_filter`** to shape how content is pruned or ranked before final markdown generation. A filter’s logic is applied **before** or **during** the HTML→Markdown process, producing:

-- **`result.markdown_v2.raw_markdown`** (unfiltered)
-- **`result.markdown_v2.fit_markdown`** (filtered or “fit” version)
-- **`result.markdown_v2.fit_html`** (the corresponding HTML snippet that produced `fit_markdown`)
+- **`result.markdown.raw_markdown`** (unfiltered)
+- **`result.markdown.fit_markdown`** (filtered or “fit” version)
+- **`result.markdown.fit_html`** (the corresponding HTML snippet that produced `fit_markdown`)

-> **Note**: We’re currently storing the result in `markdown_v2`, but eventually we’ll unify it as `result.markdown`.

 ### 1.2 Common Filters
@@ -62,8 +61,8 @@ async def main():

     if result.success:
         # 'fit_markdown' is your pruned content, focusing on "denser" text
-        print("Raw Markdown length:", len(result.markdown_v2.raw_markdown))
-        print("Fit Markdown length:", len(result.markdown_v2.fit_markdown))
+        print("Raw Markdown length:", len(result.markdown.raw_markdown))
+        print("Fit Markdown length:", len(result.markdown.fit_markdown))
     else:
         print("Error:", result.error_message)
@@ -123,7 +122,7 @@ async def main():
     )
     if result.success:
         print("Fit Markdown (BM25 query-based):")
-        print(result.markdown_v2.fit_markdown)
+        print(result.markdown.fit_markdown)
     else:
         print("Error:", result.error_message)
@@ -144,11 +143,11 @@ if __name__ == "__main__":

 ## 4. Accessing the “Fit” Output

-After the crawl, your “fit” content is found in **`result.markdown_v2.fit_markdown`**. In future versions, it will be **`result.markdown.fit_markdown`**. Meanwhile:
+After the crawl, your “fit” content is found in **`result.markdown.fit_markdown`**.

 ```python
-fit_md = result.markdown_v2.fit_markdown
-fit_html = result.markdown_v2.fit_html
+fit_md = result.markdown.fit_markdown
+fit_html = result.markdown.fit_html
 ```

 If the content filter is **BM25**, you might see additional logic or references in `fit_markdown` that highlight relevant segments. If it’s **Pruning**, the text is typically well-cleaned but not necessarily matched to a query.
@@ -167,7 +166,6 @@ prune_filter = PruningContentFilter(
 )
 md_generator = DefaultMarkdownGenerator(content_filter=prune_filter)
 config = CrawlerRunConfig(markdown_generator=md_generator)
-# => result.markdown_v2.fit_markdown
 ```

 ### 5.2 BM25
@@ -179,7 +177,6 @@ bm25_filter = BM25ContentFilter(
 )
 md_generator = DefaultMarkdownGenerator(content_filter=bm25_filter)
 config = CrawlerRunConfig(markdown_generator=md_generator)
-# => result.markdown_v2.fit_markdown
 ```

 ---
@@ -203,7 +200,7 @@ Thus, **multi-level** filtering occurs:

 1. The crawler’s `excluded_tags` are removed from the HTML first.
 2. The content filter (Pruning, BM25, or custom) prunes or ranks the remaining text blocks.
-3. The final “fit” content is generated in `result.markdown_v2.fit_markdown`.
+3. The final “fit” content is generated in `result.markdown.fit_markdown`.

 ---
@@ -241,7 +238,7 @@ class MyCustomFilter(RelevantContentFilter):
 - **PruningContentFilter**: Great if you just want the “meatiest” text without a user query.
 - **BM25ContentFilter**: Perfect for query-based extraction or searching.
 - Combine with **`excluded_tags`, `exclude_external_links`, `word_count_threshold`** to refine your final “fit” text.
-- Fit markdown ends up in **`result.markdown_v2.fit_markdown`**; eventually **`result.markdown.fit_markdown`** in future versions.
+- Fit markdown ends up in **`result.markdown.fit_markdown`**.

 With these tools, you can **zero in** on the text that truly matters, ignoring spammy or boilerplate content, and produce a concise, relevant “fit markdown” for your AI or data pipelines. Happy pruning and searching!
@@ -204,7 +204,7 @@ async def main():

     async with AsyncWebCrawler() as crawler:
         result = await crawler.arun("https://example.com", config=config)
-        print(result.fit_markdown)  # Filtered markdown content
+        print(result.markdown.fit_markdown)  # Filtered markdown content
 ```

 **Key Features:**
@@ -249,14 +249,11 @@ filter = LLMContentFilter(

 ## 5. Using Fit Markdown

-When a content filter is active, the library produces two forms of markdown inside `result.markdown_v2` or (if using the simplified field) `result.markdown`:
+When a content filter is active, the library produces two forms of markdown inside `result.markdown`:

 1. **`raw_markdown`**: The full unfiltered markdown.
 2. **`fit_markdown`**: A “fit” version where the filter has removed or trimmed noisy segments.

-**Note**:
-> In earlier examples, you may see references to `result.markdown_v2`. Depending on your library version, you might access `result.markdown`, `result.markdown_v2`, or an object named `MarkdownGenerationResult`. The idea is the same: you’ll have a raw version and a filtered (“fit”) version if a filter is used.

 ```python
 import asyncio
 from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
@@ -276,7 +273,7 @@ async def main():
         print("Raw markdown:\n", result.markdown)

         # If a filter is used, we also have .fit_markdown:
-        md_object = result.markdown_v2  # or your equivalent
+        md_object = result.markdown  # or your equivalent
         print("Filtered markdown:\n", md_object.fit_markdown)
     else:
         print("Crawl failed:", result.error_message)
@@ -300,7 +297,7 @@ If your library stores detailed markdown output in an object like `MarkdownGener

 **Example**:

 ```python
-md_obj = result.markdown_v2  # your library’s naming may vary
+md_obj = result.markdown  # your library’s naming may vary
 print("RAW:\n", md_obj.raw_markdown)
 print("CITED:\n", md_obj.markdown_with_citations)
 print("REFERENCES:\n", md_obj.references_markdown)
@@ -296,7 +296,7 @@ async def quick_parallel_example():
         # Stream results as they complete
         async for result in await crawler.arun_many(urls, config=run_conf):
             if result.success:
-                print(f"[OK] {result.url}, length: {len(result.markdown_v2.raw_markdown)}")
+                print(f"[OK] {result.url}, length: {len(result.markdown.raw_markdown)}")
             else:
                 print(f"[ERROR] {result.url} => {result.error_message}")
@@ -305,7 +305,7 @@ async def quick_parallel_example():
     results = await crawler.arun_many(urls, config=run_conf)
     for res in results:
         if res.success:
-            print(f"[OK] {res.url}, length: {len(res.markdown_v2.raw_markdown)}")
+            print(f"[OK] {res.url}, length: {len(res.markdown.raw_markdown)}")
         else:
             print(f"[ERROR] {res.url} => {res.error_message}")
@@ -39,8 +39,8 @@ result = await crawler.arun(
 # Different content formats
 print(result.html)            # Raw HTML
 print(result.cleaned_html)    # Cleaned HTML
-print(result.markdown)        # Markdown version
-print(result.fit_markdown)    # Most relevant content in markdown
+print(result.markdown.raw_markdown)  # Raw markdown from cleaned html
+print(result.markdown.fit_markdown)  # Most relevant content in markdown

 # Check success status
 print(result.success)  # True if crawl succeeded