chore: Clean up test artifacts and disable test workflow

fix: Re-enable multi-platform Docker builds for ARM64 support
fix: Use hardcoded Docker repository name to avoid masking issues
2025-07-25 17:31:52 +08:00 · 2025-07-25 16:38:11 +08:00 · 2025-07-25 15:52:26 +08:00 · 2025-07-25 15:41:16 +08:00 · 2025-07-25 15:37:44 +08:00 · 2025-07-25 15:35:53 +08:00
24 changed files with 941 additions and 329 deletions
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -0,0 +1,141 @@
+name: Release Pipeline
+on:
+  push:
+    tags:
+      - 'v*'
+      - '!test-v*'  # Exclude test tags
+
+jobs:
+  release:
+    runs-on: ubuntu-latest
+    
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+      
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+      
+      - name: Extract version from tag
+        id: get_version
+        run: |
+          TAG_VERSION=${GITHUB_REF#refs/tags/v}
+          echo "VERSION=$TAG_VERSION" >> $GITHUB_OUTPUT
+          echo "Releasing version: $TAG_VERSION"
+      
+      - name: Install package dependencies
+        run: |
+          pip install -e .
+      
+      - name: Check version consistency
+        run: |
+          TAG_VERSION=${{ steps.get_version.outputs.VERSION }}
+          PACKAGE_VERSION=$(python -c "from crawl4ai.__version__ import __version__; print(__version__)")
+          
+          echo "Tag version: $TAG_VERSION"
+          echo "Package version: $PACKAGE_VERSION"
+          
+          if [ "$TAG_VERSION" != "$PACKAGE_VERSION" ]; then
+            echo "❌ Version mismatch! Tag: $TAG_VERSION, Package: $PACKAGE_VERSION"
+            echo "Please update crawl4ai/__version__.py to match the tag version"
+            exit 1
+          fi
+          echo "✅ Version check passed: $TAG_VERSION"
+      
+      - name: Install build dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install build twine
+      
+      - name: Build package
+        run: python -m build
+      
+      - name: Check package
+        run: twine check dist/*
+      
+      - name: Upload to PyPI
+        env:
+          TWINE_USERNAME: __token__
+          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
+        run: |
+          echo "📦 Uploading to PyPI..."
+          twine upload dist/*
+          echo "✅ Package uploaded to https://pypi.org/project/crawl4ai/"
+      
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+      
+      - name: Log in to Docker Hub
+        uses: docker/login-action@v3
+        with:
+          username: ${{ secrets.DOCKER_USERNAME }}
+          password: ${{ secrets.DOCKER_TOKEN }}
+      
+      - name: Extract major and minor versions
+        id: versions
+        run: |
+          VERSION=${{ steps.get_version.outputs.VERSION }}
+          MAJOR=$(echo $VERSION | cut -d. -f1)
+          MINOR=$(echo $VERSION | cut -d. -f1-2)
+          echo "MAJOR=$MAJOR" >> $GITHUB_OUTPUT
+          echo "MINOR=$MINOR" >> $GITHUB_OUTPUT
+      
+      - name: Build and push Docker images
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          push: true
+          tags: |
+            unclecode/crawl4ai:${{ steps.get_version.outputs.VERSION }}
+            unclecode/crawl4ai:${{ steps.versions.outputs.MINOR }}
+            unclecode/crawl4ai:${{ steps.versions.outputs.MAJOR }}
+            unclecode/crawl4ai:latest
+          platforms: linux/amd64,linux/arm64
+      
+      - name: Create GitHub Release
+        uses: actions/create-release@v1
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        with:
+          tag_name: v${{ steps.get_version.outputs.VERSION }}
+          release_name: Release v${{ steps.get_version.outputs.VERSION }}
+          body: |
+            ## 🎉 Crawl4AI v${{ steps.get_version.outputs.VERSION }} Released!
+            
+            ### 📦 Installation
+            
+            **PyPI:**
+            ```bash
+            pip install crawl4ai==${{ steps.get_version.outputs.VERSION }}
+            ```
+            
+            **Docker:**
+            ```bash
+            docker pull unclecode/crawl4ai:${{ steps.get_version.outputs.VERSION }}
+            docker pull unclecode/crawl4ai:latest
+            ```
+            
+            ### 📝 What's Changed
+            See [CHANGELOG.md](https://github.com/${{ github.repository }}/blob/main/CHANGELOG.md) for details.
+          draft: false
+          prerelease: false
+      
+      - name: Summary
+        run: |
+          echo "## 🚀 Release Complete!" >> $GITHUB_STEP_SUMMARY
+          echo "" >> $GITHUB_STEP_SUMMARY
+          echo "### 📦 PyPI Package" >> $GITHUB_STEP_SUMMARY
+          echo "- Version: ${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
+          echo "- URL: https://pypi.org/project/crawl4ai/" >> $GITHUB_STEP_SUMMARY
+          echo "- Install: \`pip install crawl4ai==${{ steps.get_version.outputs.VERSION }}\`" >> $GITHUB_STEP_SUMMARY
+          echo "" >> $GITHUB_STEP_SUMMARY
+          echo "### 🐳 Docker Images" >> $GITHUB_STEP_SUMMARY
+          echo "- \`unclecode/crawl4ai:${{ steps.get_version.outputs.VERSION }}\`" >> $GITHUB_STEP_SUMMARY
+          echo "- \`unclecode/crawl4ai:${{ steps.versions.outputs.MINOR }}\`" >> $GITHUB_STEP_SUMMARY
+          echo "- \`unclecode/crawl4ai:${{ steps.versions.outputs.MAJOR }}\`" >> $GITHUB_STEP_SUMMARY
+          echo "- \`unclecode/crawl4ai:latest\`" >> $GITHUB_STEP_SUMMARY
+          echo "" >> $GITHUB_STEP_SUMMARY
+          echo "### 📋 GitHub Release" >> $GITHUB_STEP_SUMMARY
+          echo "https://github.com/${{ github.repository }}/releases/tag/v${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
--- a/.github/workflows/test-release.yml.disabled
+++ b/.github/workflows/test-release.yml.disabled
@@ -0,0 +1,116 @@
+name: Test Release Pipeline
+on:
+  push:
+    tags:
+      - 'test-v*'
+
+jobs:
+  test-release:
+    runs-on: ubuntu-latest
+    
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+      
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+      
+      - name: Extract version from tag
+        id: get_version
+        run: |
+          TAG_VERSION=${GITHUB_REF#refs/tags/test-v}
+          echo "VERSION=$TAG_VERSION" >> $GITHUB_OUTPUT
+          echo "Testing with version: $TAG_VERSION"
+      
+      - name: Install package dependencies
+        run: |
+          pip install -e .
+      
+      - name: Check version consistency
+        run: |
+          TAG_VERSION=${{ steps.get_version.outputs.VERSION }}
+          PACKAGE_VERSION=$(python -c "from crawl4ai.__version__ import __version__; print(__version__)")
+          
+          echo "Tag version: $TAG_VERSION"
+          echo "Package version: $PACKAGE_VERSION"
+          
+          if [ "$TAG_VERSION" != "$PACKAGE_VERSION" ]; then
+            echo "❌ Version mismatch! Tag: $TAG_VERSION, Package: $PACKAGE_VERSION"
+            echo "Please update crawl4ai/__version__.py to match the tag version"
+            exit 1
+          fi
+          echo "✅ Version check passed: $TAG_VERSION"
+      
+      - name: Install build dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install build twine
+      
+      - name: Build package
+        run: python -m build
+      
+      - name: Check package
+        run: twine check dist/*
+      
+      - name: Upload to Test PyPI
+        env:
+          TWINE_USERNAME: __token__
+          TWINE_PASSWORD: ${{ secrets.TEST_PYPI_TOKEN }}
+        run: |
+          echo "📦 Uploading to Test PyPI..."
+          twine upload --repository testpypi dist/* || {
+            if [ $? -eq 1 ]; then
+              echo "⚠️ Upload failed - likely version already exists on Test PyPI"
+              echo "Continuing anyway for test purposes..."
+            else
+              exit 1
+            fi
+          }
+          echo "✅ Test PyPI step complete"
+      
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+      
+      - name: Log in to Docker Hub
+        uses: docker/login-action@v3
+        with:
+          username: ${{ secrets.DOCKER_USERNAME }}
+          password: ${{ secrets.DOCKER_TOKEN }}
+      
+      - name: Build and push Docker test images
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          push: true
+          tags: |
+            unclecode/crawl4ai:test-${{ steps.get_version.outputs.VERSION }}
+            unclecode/crawl4ai:test-latest
+          platforms: linux/amd64,linux/arm64
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+      
+      - name: Summary
+        run: |
+          echo "## 🎉 Test Release Complete!" >> $GITHUB_STEP_SUMMARY
+          echo "" >> $GITHUB_STEP_SUMMARY
+          echo "### 📦 Test PyPI Package" >> $GITHUB_STEP_SUMMARY
+          echo "- Version: ${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
+          echo "- URL: https://test.pypi.org/project/crawl4ai/" >> $GITHUB_STEP_SUMMARY
+          echo "- Install: \`pip install -i https://test.pypi.org/simple/ crawl4ai==${{ steps.get_version.outputs.VERSION }}\`" >> $GITHUB_STEP_SUMMARY
+          echo "" >> $GITHUB_STEP_SUMMARY
+          echo "### 🐳 Docker Test Images" >> $GITHUB_STEP_SUMMARY
+          echo "- \`unclecode/crawl4ai:test-${{ steps.get_version.outputs.VERSION }}\`" >> $GITHUB_STEP_SUMMARY
+          echo "- \`unclecode/crawl4ai:test-latest\`" >> $GITHUB_STEP_SUMMARY
+          echo "" >> $GITHUB_STEP_SUMMARY
+          echo "### 🧹 Cleanup Commands" >> $GITHUB_STEP_SUMMARY
+          echo "\`\`\`bash" >> $GITHUB_STEP_SUMMARY
+          echo "# Remove test tag" >> $GITHUB_STEP_SUMMARY
+          echo "git tag -d test-v${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
+          echo "git push origin :test-v${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
+          echo "" >> $GITHUB_STEP_SUMMARY
+          echo "# Remove Docker test images" >> $GITHUB_STEP_SUMMARY
+          echo "docker rmi unclecode/crawl4ai:test-${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
+          echo "docker rmi unclecode/crawl4ai:test-latest" >> $GITHUB_STEP_SUMMARY
+          echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@ Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant

 [✨ Check out latest update v0.7.0](#-recent-updates)

-🎉 **Version 0.7.0 is now available!** The Adaptive Intelligence Update introduces groundbreaking features: Adaptive Crawling that learns website patterns, Virtual Scroll support for infinite pages, intelligent Link Preview with 3-layer scoring, Async URL Seeder for massive discovery, and significant performance improvements. [Read the release notes →](https://docs.crawl4ai.com/blog/release-v0.7.0)
+🎉 **Version 0.7.0 is now available!** The Adaptive Intelligence Update introduces groundbreaking features: Adaptive Crawling that learns website patterns, Virtual Scroll support for infinite pages, intelligent Link Preview with 3-layer scoring, Async URL Seeder for massive discovery, and significant performance improvements. [Read the release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.0.md)

 <details>
 <summary>🤓 <strong>My Personal Story</strong></summary>
--- a/crawl4ai/init.py
+++ b/crawl4ai/init.py
@@ -3,7 +3,7 @@ import warnings

 from .async_webcrawler import AsyncWebCrawler, CacheMode
 # MODIFIED: Add SeedingConfig and VirtualScrollConfig here
-from .async_configs import BrowserConfig, CrawlerRunConfig, HTTPCrawlerConfig, LLMConfig, ProxyConfig, GeolocationConfig, SeedingConfig, VirtualScrollConfig
+from .async_configs import BrowserConfig, CrawlerRunConfig, HTTPCrawlerConfig, LLMConfig, ProxyConfig, GeolocationConfig, SeedingConfig, VirtualScrollConfig, LinkPreviewConfig

 from .content_scraping_strategy import (
    ContentScrapingStrategy,
@@ -173,6 +173,7 @@ __all__ = [
    "CompilationResult",
    "ValidationResult",
    "ErrorDetail",
+    "LinkPreviewConfig"
 ]


--- a/crawl4ai/version.py
+++ b/crawl4ai/version.py
@@ -1,7 +1,7 @@
 # crawl4ai/__version__.py

 # This is the version that will be used for stable releases
-__version__ = "0.7.0"
+__version__ = "0.7.1"

 # For nightly builds, this gets set during build process
 __nightly_version__ = None
--- a/crawl4ai/async_crawler_strategy.py
+++ b/crawl4ai/async_crawler_strategy.py
@@ -824,7 +824,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
            except Error:
                visibility_info = await self.check_visibility(page)

-                if self.browser_config.config.verbose:
+                if self.browser_config.verbose:
                    self.logger.debug(
                        message="Body visibility info: {info}",
                        tag="DEBUG",
--- a/crawl4ai/async_webcrawler.py
+++ b/crawl4ai/async_webcrawler.py
@@ -502,9 +502,12 @@ class AsyncWebCrawler:
            metadata = result.get("metadata", {})
        else:
            cleaned_html = sanitize_input_encode(result.cleaned_html)
-            media = result.media.model_dump()
-            tables = media.pop("tables", [])
-            links = result.links.model_dump()
+            # media = result.media.model_dump()
+            # tables = media.pop("tables", [])
+            # links = result.links.model_dump()
+            media = result.media.model_dump() if hasattr(result.media, 'model_dump') else result.media
+            tables = media.pop("tables", []) if isinstance(media, dict) else []
+            links = result.links.model_dump() if hasattr(result.links, 'model_dump') else result.links
            metadata = result.metadata

        fit_html = preprocess_html_for_schema(html_content=html, text_threshold= 500, max_size= 300_000)
--- a/crawl4ai/browser_manager.py
+++ b/crawl4ai/browser_manager.py
@@ -14,23 +14,8 @@ import hashlib
 from .js_snippet import load_js_script
 from .config import DOWNLOAD_PAGE_TIMEOUT
 from .async_configs import BrowserConfig, CrawlerRunConfig
-from playwright_stealth import StealthConfig
 from .utils import get_chromium_path

-stealth_config = StealthConfig(
-    webdriver=True,
-    chrome_app=True,
-    chrome_csi=True,
-    chrome_load_times=True,
-    chrome_runtime=True,
-    navigator_languages=True,
-    navigator_plugins=True,
-    navigator_permissions=True,
-    webgl_vendor=True,
-    outerdimensions=True,
-    navigator_hardware_concurrency=True,
-    media_codecs=True,
-)

 BROWSER_DISABLE_OPTIONS = [
    "--disable-background-networking",
--- a/crawl4ai/cli.py
+++ b/crawl4ai/cli.py
@@ -27,7 +27,10 @@ from crawl4ai import (
    PruningContentFilter,
    BrowserProfiler,
    DefaultMarkdownGenerator,
-    LLMConfig
+    LLMConfig,
+    BFSDeepCrawlStrategy,
+    DFSDeepCrawlStrategy,
+    BestFirstCrawlingStrategy,
 )
 from crawl4ai.config import USER_SETTINGS
 from litellm import completion
@@ -1014,9 +1017,11 @@ def cdp_cmd(user_data_dir: Optional[str], port: int, browser_type: str, headless
@click.option("--question", "-q", help="Ask a question about the crawled content")
@click.option("--verbose", "-v", is_flag=True)
@click.option("--profile", "-p", help="Use a specific browser profile (by name)")
+@click.option("--deep-crawl", type=click.Choice(["bfs", "dfs", "best-first"]), help="Enable deep crawling with specified strategy (bfs, dfs, or best-first)")
+@click.option("--max-pages", type=int, default=10, help="Maximum number of pages to crawl in deep crawl mode")
 def crawl_cmd(url: str, browser_config: str, crawler_config: str, filter_config: str, 
           extraction_config: str, json_extract: str, schema: str, browser: Dict, crawler: Dict,
-           output: str, output_file: str, bypass_cache: bool, question: str, verbose: bool, profile: str):
+           output: str, output_file: str, bypass_cache: bool, question: str, verbose: bool, profile: str, deep_crawl: str, max_pages: int):
    """Crawl a website and extract content
    
    Simple Usage:
@@ -1156,6 +1161,27 @@ Always return valid, properly formatted JSON."""

        crawler_cfg.scraping_strategy = LXMLWebScrapingStrategy()    

+        # Handle deep crawling configuration
+        if deep_crawl:
+            if deep_crawl == "bfs":
+                crawler_cfg.deep_crawl_strategy = BFSDeepCrawlStrategy(
+                    max_depth=3,
+                    max_pages=max_pages
+                )
+            elif deep_crawl == "dfs":
+                crawler_cfg.deep_crawl_strategy = DFSDeepCrawlStrategy(
+                    max_depth=3,
+                    max_pages=max_pages
+                )
+            elif deep_crawl == "best-first":
+                crawler_cfg.deep_crawl_strategy = BestFirstCrawlingStrategy(
+                    max_depth=3,
+                    max_pages=max_pages
+                )
+            
+            if verbose:
+                console.print(f"[green]Deep crawling enabled:[/green] {deep_crawl} strategy, max {max_pages} pages")
+
        config = get_global_config()
        
        browser_cfg.verbose = config.get("VERBOSE", False)
@@ -1170,39 +1196,60 @@ Always return valid, properly formatted JSON."""
            verbose
        )

+        # Handle deep crawl results (list) vs single result
+        if isinstance(result, list):
+            if len(result) == 0:
+                click.echo("No results found during deep crawling")
+                return
+            # Use the first result for question answering and output
+            main_result = result[0]
+            all_results = result
+        else:
+            # Single result from regular crawling
+            main_result = result
+            all_results = [result]
+
        # Handle question
        if question:
            provider, token = setup_llm_config()
-            markdown = result.markdown.raw_markdown
+            markdown = main_result.markdown.raw_markdown
            anyio.run(stream_llm_response, url, markdown, question, provider, token)
            return
        
        # Handle output
        if not output_file:
            if output == "all":
-                click.echo(json.dumps(result.model_dump(), indent=2))
+                if isinstance(result, list):
+                    output_data = [r.model_dump() for r in all_results]
+                    click.echo(json.dumps(output_data, indent=2))
+                else:
+                    click.echo(json.dumps(main_result.model_dump(), indent=2))
            elif output == "json":
-                print(result.extracted_content)
-                extracted_items = json.loads(result.extracted_content)
+                print(main_result.extracted_content)
+                extracted_items = json.loads(main_result.extracted_content)
                click.echo(json.dumps(extracted_items, indent=2))
                
            elif output in ["markdown", "md"]:
-                click.echo(result.markdown.raw_markdown)
+                click.echo(main_result.markdown.raw_markdown)
            elif output in ["markdown-fit", "md-fit"]:
-                click.echo(result.markdown.fit_markdown)
+                click.echo(main_result.markdown.fit_markdown)
        else:
            if output == "all":
                with open(output_file, "w") as f:
-                    f.write(json.dumps(result.model_dump(), indent=2))
+                    if isinstance(result, list):
+                        output_data = [r.model_dump() for r in all_results]
+                        f.write(json.dumps(output_data, indent=2))
+                    else:
+                        f.write(json.dumps(main_result.model_dump(), indent=2))
            elif output == "json":
                with open(output_file, "w") as f:
-                    f.write(result.extracted_content)
+                    f.write(main_result.extracted_content)
            elif output in ["markdown", "md"]:
                with open(output_file, "w") as f:
-                    f.write(result.markdown.raw_markdown)
+                    f.write(main_result.markdown.raw_markdown)
            elif output in ["markdown-fit", "md-fit"]:
                with open(output_file, "w") as f:
-                    f.write(result.markdown.fit_markdown)
+                    f.write(main_result.markdown.fit_markdown)
            
    except Exception as e:
        raise click.ClickException(str(e))
@@ -1354,9 +1401,11 @@ def profiles_cmd():
@click.option("--question", "-q", help="Ask a question about the crawled content")
@click.option("--verbose", "-v", is_flag=True)
@click.option("--profile", "-p", help="Use a specific browser profile (by name)")
+@click.option("--deep-crawl", type=click.Choice(["bfs", "dfs", "best-first"]), help="Enable deep crawling with specified strategy")
+@click.option("--max-pages", type=int, default=10, help="Maximum number of pages to crawl in deep crawl mode")
 def default(url: str, example: bool, browser_config: str, crawler_config: str, filter_config: str, 
        extraction_config: str, json_extract: str, schema: str, browser: Dict, crawler: Dict,
-        output: str, bypass_cache: bool, question: str, verbose: bool, profile: str):
+        output: str, bypass_cache: bool, question: str, verbose: bool, profile: str, deep_crawl: str, max_pages: int):
    """Crawl4AI CLI - Web content extraction tool

    Simple Usage:
@@ -1406,7 +1455,9 @@ def default(url: str, example: bool, browser_config: str, crawler_config: str, f
        bypass_cache=bypass_cache,
        question=question,
        verbose=verbose,
-        profile=profile
+        profile=profile,
+        deep_crawl=deep_crawl,
+        max_pages=max_pages
    )

 def main():
--- a/crawl4ai/content_scraping_strategy.py
+++ b/crawl4ai/content_scraping_strategy.py
@@ -1145,10 +1145,10 @@ class LXMLWebScrapingStrategy(WebScrapingStrategy):
                        link_data["intrinsic_score"] = intrinsic_score
                    except Exception:
                        # Fail gracefully - assign default score
-                        link_data["intrinsic_score"] = float('inf')
+                        link_data["intrinsic_score"] = 0
                else:
                    # No scoring enabled - assign infinity (all links equal priority)
-                    link_data["intrinsic_score"] = float('inf')
+                    link_data["intrinsic_score"] = 0

                is_external = is_external_url(normalized_href, base_domain)
                if is_external:
--- a/crawl4ai/utils.py
+++ b/crawl4ai/utils.py
@@ -3342,7 +3342,13 @@ async def get_text_embeddings(
    # Default: use sentence-transformers
    else:
        # Lazy load to avoid importing heavy libraries unless needed
-        from sentence_transformers import SentenceTransformer
+        try:
+            from sentence_transformers import SentenceTransformer
+        except ImportError:
+            raise ImportError(
+                "sentence-transformers is required for local embeddings. "
+                "Install it with: pip install 'crawl4ai[transformer]' or pip install sentence-transformers"
+            )
        
        # Cache the model in function attribute to avoid reloading
        if not hasattr(get_text_embeddings, '_models'):
--- a/deploy/docker/api.py
+++ b/deploy/docker/api.py
@@ -5,6 +5,7 @@ from typing import List, Tuple, Dict
 from functools import partial
 from uuid import uuid4
 from datetime import datetime
+from base64 import b64encode

 import logging
 from typing import Optional, AsyncGenerator
@@ -371,6 +372,9 @@ async def stream_results(crawler: AsyncWebCrawler, results_gen: AsyncGenerator)
                server_memory_mb = _get_memory_mb()
                result_dict = result.model_dump()
                result_dict['server_memory_mb'] = server_memory_mb
+                # If PDF exists, encode it to base64
+                if result_dict.get('pdf') is not None:
+                    result_dict['pdf'] = b64encode(result_dict['pdf']).decode('utf-8')
                logger.info(f"Streaming result for {result_dict.get('url', 'unknown')}")
                data = json.dumps(result_dict, default=datetime_handler) + "\n"
                yield data.encode('utf-8')
@@ -443,10 +447,19 @@ async def handle_crawl_request(
            mem_delta_mb = end_mem_mb - start_mem_mb # <--- Calculate delta
            peak_mem_mb = max(peak_mem_mb if peak_mem_mb else 0, end_mem_mb) # <--- Get peak memory
        logger.info(f"Memory usage: Start: {start_mem_mb} MB, End: {end_mem_mb} MB, Delta: {mem_delta_mb} MB, Peak: {peak_mem_mb} MB")
-                              
+
+        # Process results to handle PDF bytes
+        processed_results = []
+        for result in results:
+            result_dict = result.model_dump()
+            # If PDF exists, encode it to base64
+            if result_dict.get('pdf') is not None:
+                result_dict['pdf'] = b64encode(result_dict['pdf']).decode('utf-8')
+            processed_results.append(result_dict)
+            
        return {
            "success": True,
-            "results": [result.model_dump() for result in results],
+            "results": processed_results,
            "server_processing_time_s": end_time - start_time,
            "server_memory_delta_mb": mem_delta_mb,
            "server_peak_memory_mb": peak_mem_mb
--- a/docs/blog/release-v0.7.0.md
+++ b/docs/blog/release-v0.7.0.md
@@ -30,33 +30,40 @@ The Adaptive Crawler maintains a persistent state for each domain, tracking:

 ```python
 from crawl4ai import AsyncWebCrawler, AdaptiveCrawler, AdaptiveConfig
+import asyncio

-# Initialize with custom adaptive parameters
-config = AdaptiveConfig(
-    confidence_threshold=0.7,    # Min confidence to stop crawling
-    max_depth=5,                # Maximum crawl depth
-    max_pages=20,               # Maximum number of pages to crawl
-    top_k_links=3,              # Number of top links to follow per page
-    strategy="statistical",     # 'statistical' or 'embedding'
-    coverage_weight=0.4,        # Weight for coverage in confidence calculation
-    consistency_weight=0.3,     # Weight for consistency in confidence calculation
-    saturation_weight=0.3       # Weight for saturation in confidence calculation
-)
-
-# Initialize adaptive crawler with web crawler
-async with AsyncWebCrawler() as crawler:
-    adaptive_crawler = AdaptiveCrawler(crawler, config)
+async def main():
    
-    # Crawl and learn patterns
-    state = await adaptive_crawler.digest(
-        start_url="https://news.example.com/article/12345",
-        query="latest news articles and content"
+    # Configure adaptive crawler
+    config = AdaptiveConfig(
+        strategy="statistical",  # or "embedding" for semantic understanding
+        max_pages=10,
+        confidence_threshold=0.7,  # Stop at 70% confidence
+        top_k_links=3,  # Follow top 3 links per page
+        min_gain_threshold=0.05  # Need 5% information gain to continue
    )
    
-    # Access results and confidence
-    print(f"Confidence Level: {adaptive_crawler.confidence:.0%}")
-    print(f"Pages Crawled: {len(state.crawled_urls)}")
-    print(f"Knowledge Base: {len(adaptive_crawler.state.knowledge_base)} documents")
+    async with AsyncWebCrawler(verbose=False) as crawler:
+        adaptive = AdaptiveCrawler(crawler, config)
+        
+        print("Starting adaptive crawl about Python decorators...")
+        result = await adaptive.digest(
+            start_url="https://docs.python.org/3/glossary.html",
+            query="python decorators functions wrapping"
+        )
+        
+        print(f"\n✅ Crawling Complete!")
+        print(f"• Confidence Level: {adaptive.confidence:.0%}")
+        print(f"• Pages Crawled: {len(result.crawled_urls)}")
+        print(f"• Knowledge Base: {len(adaptive.state.knowledge_base)} documents")
+        
+        # Get most relevant content
+        relevant = adaptive.get_relevant_content(top_k=3)
+        print(f"\nMost Relevant Pages:")
+        for i, page in enumerate(relevant, 1):
+            print(f"{i}. {page['url']} (relevance: {page['score']:.2%})")
+
+asyncio.run(main())
 ```

 **Expected Real-World Impact:**
@@ -141,56 +148,47 @@ async with AsyncWebCrawler() as crawler:

 **My Solution:** I implemented a three-layer scoring system that analyzes links like a human would—considering their position, context, and relevance to your goals.

-### The Three-Layer Scoring System
+### Intelligent Link Analysis and Scoring

 ```python
-from crawl4ai import LinkPreviewConfig, CrawlerRunConfig, CacheMode
+import asyncio
+from crawl4ai import CrawlerRunConfig, CacheMode, AsyncWebCrawler
+from crawl4ai.adaptive_crawler import LinkPreviewConfig

-# Configure intelligent link analysis
-link_config = LinkPreviewConfig(
-    include_internal=True,
-    include_external=False,
-    max_links=10,
-    concurrency=5,
-    query="python tutorial",  # For contextual scoring
-    score_threshold=0.3,
-    verbose=True
-)
-
-# Use in your crawl
-result = await crawler.arun(
-    "https://tech-blog.example.com",
-    config=CrawlerRunConfig(
-        link_preview_config=link_config,
-        score_links=True,  # Enable intrinsic scoring
-        cache_mode=CacheMode.BYPASS
+async def main():
+    # Configure intelligent link analysis
+    link_config = LinkPreviewConfig(
+        include_internal=True,
+        include_external=False,
+        max_links=10,
+        concurrency=5,
+        query="python tutorial",  # For contextual scoring
+        score_threshold=0.3,
+        verbose=True
    )
-)
+    # Use in your crawl
+    async with AsyncWebCrawler() as crawler:
+        result = await crawler.arun(
+            "https://www.geeksforgeeks.org/",
+            config=CrawlerRunConfig(
+                link_preview_config=link_config,
+                score_links=True,  # Enable intrinsic scoring
+                cache_mode=CacheMode.BYPASS
+            )
+        )

-# Access scored and sorted links
-if result.success and result.links:
-# Get scored links
-internal_links = result.links.get("internal", [])
-scored_links = [l for l in internal_links if l.get("total_score")]
-scored_links.sort(key=lambda x: x.get("total_score", 0), reverse=True)
+        # Access scored and sorted links
+        if result.success and result.links:
+            for link in result.links.get("internal", []):
+                text = link.get('text', 'No text')[:40]
+                print(
+                    text,
+                    f"{link.get('intrinsic_score', 0):.1f}/10" if link.get('intrinsic_score') is not None else "0.0/10",
+                    f"{link.get('contextual_score', 0):.2f}/1" if link.get('contextual_score') is not None else "0.00/1",
+                    f"{link.get('total_score', 0):.3f}" if link.get('total_score') is not None else "0.000"
+                )

-# Create a scoring table
-table = Table(title="Link Scoring Results", box=box.ROUNDED)
-table.add_column("Link Text", style="cyan", width=40)
-table.add_column("Intrinsic Score", justify="center")
-table.add_column("Contextual Score", justify="center")
-table.add_column("Total Score", justify="center", style="bold green")
-
-for link in scored_links[:5]:
-    text = link.get('text', 'No text')[:40]
-    table.add_row(
-        text,
-        f"{link.get('intrinsic_score', 0):.1f}/10",
-        f"{link.get('contextual_score', 0):.2f}/1",
-        f"{link.get('total_score', 0):.3f}"
-    )
-
-console.print(table)
+asyncio.run(main())
 ```

 **Scoring Components:**
@@ -223,58 +221,34 @@ console.print(table)
 ### Technical Architecture

 ```python
+import asyncio
 from crawl4ai import AsyncUrlSeeder, SeedingConfig

-# Basic discovery - find all product pages
-seeder_config = SeedingConfig(
-    # Discovery sources
-    source="cc+sitemap",        # Sitemap + Common Crawl
-    
-    # Filtering
-    pattern="*/product/*",      # URL pattern matching
-    
-    # Validation
-    live_check=True,           # Verify URLs are alive
-    max_urls=50,             # Stop at 50 URLs
-    
-    # Performance  
-    concurrency=100,           # Maximum concurrent requests for live checks/head extraction
-    hits_per_sec=10           # Rate limit in requests per second to avoid overwhelming servers
-)
+async def main():
+    async with AsyncUrlSeeder() as seeder:
+        # Discover Python tutorial URLs
+        config = SeedingConfig(
+            source="sitemap",  # Use sitemap
+            pattern="*python*",  # URL pattern filter
+            extract_head=True,  # Get metadata
+            query="python tutorial",  # For relevance scoring
+            scoring_method="bm25",
+            score_threshold=0.2,
+            max_urls=10
+        )
+        
+        print("Discovering Python async tutorial URLs...")
+        urls = await seeder.urls("https://www.geeksforgeeks.org/", config)
+        
+        print(f"\n✅ Found {len(urls)} relevant URLs:")
+        for i, url_info in enumerate(urls[:5], 1):
+            print(f"\n{i}. {url_info['url']}")
+            if url_info.get('relevance_score'):
+                print(f"   Relevance: {url_info['relevance_score']:.3f}")
+            if url_info.get('head_data', {}).get('title'):
+                print(f"   Title: {url_info['head_data']['title'][:60]}...")

-async with AsyncUrlSeeder() as seeder:
-    console.print("Discovering URLs from Python docs...")
-    urls = await seeder.urls("docs.python.org", seeding_config)
-    console.print(f"\n✓ Discovered {len(urls)} URLs")
-
-# Advanced: Relevance-based discovery
-research_config = SeedingConfig(
-    source="sitemap+cc",       # Sitemap + Common Crawl
-    pattern="*/blog/*",        # Blog posts only
-    
-    # Content relevance
-    extract_head=True,         # Get meta tags
-    query="quantum computing tutorials",
-    scoring_method="bm25",     # BM25 scoring method
-    score_threshold=0.4,       # High relevance only
-    
-    # Smart filtering
-    filter_nonsense_urls=True,  # Remove .xml, .txt, etc.
-    
-    force=True                 # Bypass cache
-)
-
-# Discover with progress tracking
-discovered = []
-async with AsyncUrlSeeder() as seeder:
-    discovered = await seeder.urls("https://physics-blog.com", research_config)
-    console.print(f"\n✓ Discovered {len(discovered)} URLs")
-
-# Results include scores and metadata
-for url_data in discovered[:5]:
-    print(f"URL: {url_data['url']}")
-    print(f"Score: {url_data['relevance_score']:.3f}")
-    print(f"Title: {url_data['head_data']['title']}")
+asyncio.run(main())
 ```

 **Discovery Methods:**
--- a/docs/blog/release-v0.7.1.md
+++ b/docs/blog/release-v0.7.1.md
@@ -0,0 +1,43 @@
+# 🛠️ Crawl4AI v0.7.1: Minor Cleanup Update
+
+*July 17, 2025 • 2 min read*
+
+---
+
+A small maintenance release that removes unused code and improves documentation.
+
+## 🎯 What's Changed
+
+- **Removed unused StealthConfig** from `crawl4ai/browser_manager.py`
+- **Updated documentation** with better examples and parameter explanations
+- **Fixed virtual scroll configuration** examples in docs
+
+## 🧹 Code Cleanup
+
+Removed unused `StealthConfig` import and configuration that wasn't being used anywhere in the codebase. The project uses its own custom stealth implementation through JavaScript injection instead.
+
+```python
+# Removed unused code:
+from playwright_stealth import StealthConfig
+stealth_config = StealthConfig(...)  # This was never used
+```
+
+## 📖 Documentation Updates
+
+- Fixed adaptive crawling parameter examples
+- Updated session management documentation
+- Corrected virtual scroll configuration examples
+
+## 🚀 Installation
+
+```bash
+pip install crawl4ai==0.7.1
+```
+
+No breaking changes - upgrade directly from v0.7.0.
+
+---
+
+Questions? Issues? 
+- GitHub: [github.com/unclecode/crawl4ai](https://github.com/unclecode/crawl4ai)
+- Discord: [discord.gg/crawl4ai](https://discord.gg/jP8KfhDhyN)
--- a/docs/examples/link_head_extraction_example.py
+++ b/docs/examples/link_head_extraction_example.py
@@ -18,7 +18,7 @@ Usage:

 import asyncio
 from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
-from crawl4ai.async_configs import LinkPreviewConfig
+from crawl4ai import LinkPreviewConfig


 async def basic_link_head_extraction():
--- a/docs/md_v2/blog/releases/0.7.0.md
+++ b/docs/md_v2/blog/releases/0.7.0.md
@@ -30,33 +30,40 @@ The Adaptive Crawler maintains a persistent state for each domain, tracking:

 ```python
 from crawl4ai import AsyncWebCrawler, AdaptiveCrawler, AdaptiveConfig
+import asyncio

-# Initialize with custom adaptive parameters
-config = AdaptiveConfig(
-    confidence_threshold=0.7,    # Min confidence to stop crawling
-    max_depth=5,                # Maximum crawl depth
-    max_pages=20,               # Maximum number of pages to crawl
-    top_k_links=3,              # Number of top links to follow per page
-    strategy="statistical",     # 'statistical' or 'embedding'
-    coverage_weight=0.4,        # Weight for coverage in confidence calculation
-    consistency_weight=0.3,     # Weight for consistency in confidence calculation
-    saturation_weight=0.3       # Weight for saturation in confidence calculation
-)
-
-# Initialize adaptive crawler with web crawler
-async with AsyncWebCrawler() as crawler:
-    adaptive_crawler = AdaptiveCrawler(crawler, config)
+async def main():
    
-    # Crawl and learn patterns
-    state = await adaptive_crawler.digest(
-        start_url="https://news.example.com/article/12345",
-        query="latest news articles and content"
+    # Configure adaptive crawler
+    config = AdaptiveConfig(
+        strategy="statistical",  # or "embedding" for semantic understanding
+        max_pages=10,
+        confidence_threshold=0.7,  # Stop at 70% confidence
+        top_k_links=3,  # Follow top 3 links per page
+        min_gain_threshold=0.05  # Need 5% information gain to continue
    )
    
-    # Access results and confidence
-    print(f"Confidence Level: {adaptive_crawler.confidence:.0%}")
-    print(f"Pages Crawled: {len(state.crawled_urls)}")
-    print(f"Knowledge Base: {len(adaptive_crawler.state.knowledge_base)} documents")
+    async with AsyncWebCrawler(verbose=False) as crawler:
+        adaptive = AdaptiveCrawler(crawler, config)
+        
+        print("Starting adaptive crawl about Python decorators...")
+        result = await adaptive.digest(
+            start_url="https://docs.python.org/3/glossary.html",
+            query="python decorators functions wrapping"
+        )
+        
+        print(f"\n✅ Crawling Complete!")
+        print(f"• Confidence Level: {adaptive.confidence:.0%}")
+        print(f"• Pages Crawled: {len(result.crawled_urls)}")
+        print(f"• Knowledge Base: {len(adaptive.state.knowledge_base)} documents")
+        
+        # Get most relevant content
+        relevant = adaptive.get_relevant_content(top_k=3)
+        print(f"\nMost Relevant Pages:")
+        for i, page in enumerate(relevant, 1):
+            print(f"{i}. {page['url']} (relevance: {page['score']:.2%})")
+
+asyncio.run(main())
 ```

 **Expected Real-World Impact:**
@@ -141,56 +148,47 @@ async with AsyncWebCrawler() as crawler:

 **My Solution:** I implemented a three-layer scoring system that analyzes links like a human would—considering their position, context, and relevance to your goals.

-### The Three-Layer Scoring System
+### Intelligent Link Analysis and Scoring

 ```python
-from crawl4ai import LinkPreviewConfig, CrawlerRunConfig, CacheMode
+import asyncio
+from crawl4ai import CrawlerRunConfig, CacheMode, AsyncWebCrawler
+from crawl4ai.adaptive_crawler import LinkPreviewConfig

-# Configure intelligent link analysis
-link_config = LinkPreviewConfig(
-    include_internal=True,
-    include_external=False,
-    max_links=10,
-    concurrency=5,
-    query="python tutorial",  # For contextual scoring
-    score_threshold=0.3,
-    verbose=True
-)
-
-# Use in your crawl
-result = await crawler.arun(
-    "https://tech-blog.example.com",
-    config=CrawlerRunConfig(
-        link_preview_config=link_config,
-        score_links=True,  # Enable intrinsic scoring
-        cache_mode=CacheMode.BYPASS
+async def main():
+    # Configure intelligent link analysis
+    link_config = LinkPreviewConfig(
+        include_internal=True,
+        include_external=False,
+        max_links=10,
+        concurrency=5,
+        query="python tutorial",  # For contextual scoring
+        score_threshold=0.3,
+        verbose=True
    )
-)
+    # Use in your crawl
+    async with AsyncWebCrawler() as crawler:
+        result = await crawler.arun(
+            "https://www.geeksforgeeks.org/",
+            config=CrawlerRunConfig(
+                link_preview_config=link_config,
+                score_links=True,  # Enable intrinsic scoring
+                cache_mode=CacheMode.BYPASS
+            )
+        )

-# Access scored and sorted links
-if result.success and result.links:
-# Get scored links
-internal_links = result.links.get("internal", [])
-scored_links = [l for l in internal_links if l.get("total_score")]
-scored_links.sort(key=lambda x: x.get("total_score", 0), reverse=True)
+        # Access scored and sorted links
+        if result.success and result.links:
+            for link in result.links.get("internal", []):
+                text = link.get('text', 'No text')[:40]
+                print(
+                    text,
+                    f"{link.get('intrinsic_score', 0):.1f}/10" if link.get('intrinsic_score') is not None else "0.0/10",
+                    f"{link.get('contextual_score', 0):.2f}/1" if link.get('contextual_score') is not None else "0.00/1",
+                    f"{link.get('total_score', 0):.3f}" if link.get('total_score') is not None else "0.000"
+                )

-# Create a scoring table
-table = Table(title="Link Scoring Results", box=box.ROUNDED)
-table.add_column("Link Text", style="cyan", width=40)
-table.add_column("Intrinsic Score", justify="center")
-table.add_column("Contextual Score", justify="center")
-table.add_column("Total Score", justify="center", style="bold green")
-
-for link in scored_links[:5]:
-    text = link.get('text', 'No text')[:40]
-    table.add_row(
-        text,
-        f"{link.get('intrinsic_score', 0):.1f}/10",
-        f"{link.get('contextual_score', 0):.2f}/1",
-        f"{link.get('total_score', 0):.3f}"
-    )
-
-console.print(table)
+asyncio.run(main())
 ```

 **Scoring Components:**
@@ -223,58 +221,34 @@ console.print(table)
 ### Technical Architecture

 ```python
+import asyncio
 from crawl4ai import AsyncUrlSeeder, SeedingConfig

-# Basic discovery - find all product pages
-seeder_config = SeedingConfig(
-    # Discovery sources
-    source="cc+sitemap",        # Sitemap + Common Crawl
-    
-    # Filtering
-    pattern="*/product/*",      # URL pattern matching
-    
-    # Validation
-    live_check=True,           # Verify URLs are alive
-    max_urls=50,             # Stop at 50 URLs
-    
-    # Performance  
-    concurrency=100,           # Maximum concurrent requests for live checks/head extraction
-    hits_per_sec=10           # Rate limit in requests per second to avoid overwhelming servers
-)
+async def main():
+    async with AsyncUrlSeeder() as seeder:
+        # Discover Python tutorial URLs
+        config = SeedingConfig(
+            source="sitemap",  # Use sitemap
+            pattern="*python*",  # URL pattern filter
+            extract_head=True,  # Get metadata
+            query="python tutorial",  # For relevance scoring
+            scoring_method="bm25",
+            score_threshold=0.2,
+            max_urls=10
+        )
+        
+        print("Discovering Python async tutorial URLs...")
+        urls = await seeder.urls("https://www.geeksforgeeks.org/", config)
+        
+        print(f"\n✅ Found {len(urls)} relevant URLs:")
+        for i, url_info in enumerate(urls[:5], 1):
+            print(f"\n{i}. {url_info['url']}")
+            if url_info.get('relevance_score'):
+                print(f"   Relevance: {url_info['relevance_score']:.3f}")
+            if url_info.get('head_data', {}).get('title'):
+                print(f"   Title: {url_info['head_data']['title'][:60]}...")

-async with AsyncUrlSeeder() as seeder:
-    console.print("Discovering URLs from Python docs...")
-    urls = await seeder.urls("docs.python.org", seeding_config)
-    console.print(f"\n✓ Discovered {len(urls)} URLs")
-
-# Advanced: Relevance-based discovery
-research_config = SeedingConfig(
-    source="sitemap+cc",       # Sitemap + Common Crawl
-    pattern="*/blog/*",        # Blog posts only
-    
-    # Content relevance
-    extract_head=True,         # Get meta tags
-    query="quantum computing tutorials",
-    scoring_method="bm25",     # BM25 scoring method
-    score_threshold=0.4,       # High relevance only
-    
-    # Smart filtering
-    filter_nonsense_urls=True,  # Remove .xml, .txt, etc.
-    
-    force=True                 # Bypass cache
-)
-
-# Discover with progress tracking
-discovered = []
-async with AsyncUrlSeeder() as seeder:
-    discovered = await seeder.urls("https://physics-blog.com", research_config)
-    console.print(f"\n✓ Discovered {len(discovered)} URLs")
-
-# Results include scores and metadata
-for url_data in discovered[:5]:
-    print(f"URL: {url_data['url']}")
-    print(f"Score: {url_data['relevance_score']:.3f}")
-    print(f"Title: {url_data['head_data']['title']}")
+asyncio.run(main())
 ```

 **Discovery Methods:**
--- a/docs/md_v2/core/c4a-script.md
+++ b/docs/md_v2/core/c4a-script.md
@@ -52,11 +52,9 @@ That's it! In just a few lines, you've automated a complete search workflow.

 Want to learn by doing? We've got you covered:

-**🚀 [Live Demo](https://docs.crawl4ai.com/c4a-script/demo)** - Try C4A-Script in your browser right now!
+**🚀 [Live Demo](https://docs.crawl4ai.com/apps/c4a-script/)** - Try C4A-Script in your browser right now!

-**📁 [Tutorial Examples](/examples/c4a_script/)** - Complete examples with source code
-
-**🛠️ [Local Tutorial](/examples/c4a_script/tutorial/)** - Run the interactive tutorial on your machine
+**📁 [Tutorial Examples](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/c4a_script/)** - Complete examples with source code

 ### Running the Tutorial Locally

--- a/docs/md_v2/core/link-media.md
+++ b/docs/md_v2/core/link-media.md
@@ -125,7 +125,7 @@ Here's a full example you can copy, paste, and run immediately:
 ```python
 import asyncio
 from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
-from crawl4ai.async_configs import LinkPreviewConfig
+from crawl4ai import LinkPreviewConfig

 async def extract_link_heads_example():
    """
@@ -237,7 +237,7 @@ if __name__ == "__main__":
 The `LinkPreviewConfig` class supports these options:

 ```python
-from crawl4ai.async_configs import LinkPreviewConfig
+from crawl4ai import LinkPreviewConfig

 link_preview_config = LinkPreviewConfig(
    # BASIC SETTINGS
--- a/docs/releases_review/crawl4ai_v0_7_0_showcase.py
+++ b/docs/releases_review/crawl4ai_v0_7_0_showcase.py
@@ -28,7 +28,7 @@ from rich import box

 from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, AdaptiveCrawler, AdaptiveConfig, BrowserConfig, CacheMode
 from crawl4ai import AsyncUrlSeeder, SeedingConfig
-from crawl4ai.async_configs import LinkPreviewConfig, VirtualScrollConfig
+from crawl4ai import LinkPreviewConfig, VirtualScrollConfig
 from crawl4ai import c4a_compile, CompilationResult

 # Initialize Rich console for beautiful output
--- a/docs/releases_review/v0_7_0_features_demo.py
+++ b/docs/releases_review/v0_7_0_features_demo.py
@@ -13,14 +13,13 @@ from crawl4ai import (
    BrowserConfig,
    CacheMode,
    # New imports for v0.7.0
-    LinkPreviewConfig,
    VirtualScrollConfig,
+    LinkPreviewConfig,
    AdaptiveCrawler,
    AdaptiveConfig,
    AsyncUrlSeeder,
    SeedingConfig,
    c4a_compile,
-    CompilationResult
 )


@@ -170,16 +169,16 @@ async def demo_url_seeder():
        # Discover Python tutorial URLs
        config = SeedingConfig(
            source="sitemap",  # Use sitemap
-            pattern="*tutorial*",  # URL pattern filter
+            pattern="*python*",  # URL pattern filter
            extract_head=True,  # Get metadata
-            query="python async programming",  # For relevance scoring
+            query="python tutorial",  # For relevance scoring
            scoring_method="bm25",
            score_threshold=0.2,
            max_urls=10
        )
        
        print("Discovering Python async tutorial URLs...")
-        urls = await seeder.urls("docs.python.org", config)
+        urls = await seeder.urls("https://www.geeksforgeeks.org/", config)
        
        print(f"\n✅ Found {len(urls)} relevant URLs:")
        for i, url_info in enumerate(urls[:5], 1):
@@ -245,39 +244,6 @@ IF (EXISTS `.price-filter`) THEN CLICK `input[data-max-price="100"]`
        print(f"❌ Compilation error: {result.first_error.message}")


-async def demo_pdf_support():
-    """
-    Demo 6: PDF Parsing Support
-    
-    Shows how to extract content from PDF files.
-    Note: Requires 'pip install crawl4ai[pdf]'
-    """
-    print("\n" + "="*60)
-    print("📄 DEMO 6: PDF Parsing Support")
-    print("="*60)
-    
-    try:
-        # Check if PDF support is installed
-        import PyPDF2
-        
-        # Example: Process a PDF URL
-        config = CrawlerRunConfig(
-            cache_mode=CacheMode.BYPASS,
-            pdf=True,  # Enable PDF generation
-            extract_text_from_pdf=True  # Extract text content
-        )
-        
-        print("PDF parsing is available!")
-        print("You can now crawl PDF URLs and extract their content.")
-        print("\nExample usage:")
-        print('  result = await crawler.arun("https://example.com/document.pdf")')
-        print('  pdf_text = result.extracted_content  # Contains extracted text')
-        
-    except ImportError:
-        print("⚠️  PDF support not installed.")
-        print("Install with: pip install crawl4ai[pdf]")
-
-
 async def main():
    """Run all demos"""
    print("\n🚀 Crawl4AI v0.7.0 Feature Demonstrations")
@@ -289,7 +255,6 @@ async def main():
        ("Virtual Scroll", demo_virtual_scroll),
        ("URL Seeder", demo_url_seeder),
        ("C4A Script", demo_c4a_script),
-        ("PDF Support", demo_pdf_support)
    ]
    
    for name, demo_func in demos:
@@ -309,7 +274,6 @@ async def main():
    print("• Virtual Scroll: Capture all content from modern web pages")
    print("• URL Seeder: Pre-discover and filter URLs efficiently")
    print("• C4A Script: Simple language for complex automations")
-    print("• PDF Support: Extract content from PDF documents")


 if __name__ == "__main__":
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -44,7 +44,6 @@ dependencies = [
    "brotli>=1.1.0",
    "humanize>=4.10.0",
    "lark>=1.2.2",
-    "sentence-transformers>=2.2.0",
    "alphashape>=1.3.1",
    "shapely>=2.0.0"
 ]
@@ -62,8 +61,8 @@ classifiers = [
 [project.optional-dependencies]
 pdf = ["PyPDF2"]  
 torch = ["torch", "nltk", "scikit-learn"]
-transformer = ["transformers", "tokenizers"]
-cosine = ["torch", "transformers", "nltk"]
+transformer = ["transformers", "tokenizers", "sentence-transformers"]
+cosine = ["torch", "transformers", "nltk", "sentence-transformers"]
 sync = ["selenium"]
 all = [
    "PyPDF2",
@@ -72,8 +71,8 @@ all = [
    "scikit-learn",
    "transformers",
    "tokenizers",
-    "selenium",
-    "PyPDF2"  
+    "sentence-transformers",
+    "selenium"
 ]

 [project.scripts]
--- a/requirements.txt
+++ b/requirements.txt
@@ -24,7 +24,6 @@ cssselect>=1.2.0
 chardet>=5.2.0
 brotli>=1.1.0
 httpx[http2]>=0.27.2
-sentence-transformers>=2.2.0
 alphashape>=1.3.1
 shapely>=2.0.0

--- a/tests/docker/simple_api_test.py
+++ b/tests/docker/simple_api_test.py
@@ -0,0 +1,345 @@
+#!/usr/bin/env python3
+"""
+Simple API Test for Crawl4AI Docker Server v0.7.0
+Uses only built-in Python modules to test all endpoints.
+"""
+
+import urllib.request
+import urllib.parse
+import json
+import time
+import sys
+from typing import Dict, List, Optional
+
+# Configuration
+BASE_URL = "http://localhost:11234"  # Change to your server URL
+TEST_TIMEOUT = 30
+
+class SimpleApiTester:
+    def __init__(self, base_url: str = BASE_URL):
+        self.base_url = base_url
+        self.token = None
+        self.results = []
+        
+    def log(self, message: str):
+        print(f"[INFO] {message}")
+    
+    def test_get_endpoint(self, endpoint: str) -> Dict:
+        """Test a GET endpoint"""
+        url = f"{self.base_url}{endpoint}"
+        start_time = time.time()
+        
+        try:
+            req = urllib.request.Request(url)
+            if self.token:
+                req.add_header('Authorization', f'Bearer {self.token}')
+            
+            with urllib.request.urlopen(req, timeout=TEST_TIMEOUT) as response:
+                response_time = time.time() - start_time
+                status_code = response.getcode()
+                content = response.read().decode('utf-8')
+                
+                # Try to parse JSON
+                try:
+                    data = json.loads(content)
+                except:
+                    data = {"raw_response": content[:200]}
+                
+                return {
+                    "endpoint": endpoint,
+                    "method": "GET",
+                    "status": "PASS" if status_code < 400 else "FAIL",
+                    "status_code": status_code,
+                    "response_time": response_time,
+                    "data": data
+                }
+        except Exception as e:
+            response_time = time.time() - start_time
+            return {
+                "endpoint": endpoint,
+                "method": "GET",
+                "status": "FAIL",
+                "status_code": None,
+                "response_time": response_time,
+                "error": str(e)
+            }
+    
+    def test_post_endpoint(self, endpoint: str, payload: Dict) -> Dict:
+        """Test a POST endpoint"""
+        url = f"{self.base_url}{endpoint}"
+        start_time = time.time()
+        
+        try:
+            data = json.dumps(payload).encode('utf-8')
+            req = urllib.request.Request(url, data=data, method='POST')
+            req.add_header('Content-Type', 'application/json')
+            
+            if self.token:
+                req.add_header('Authorization', f'Bearer {self.token}')
+            
+            with urllib.request.urlopen(req, timeout=TEST_TIMEOUT) as response:
+                response_time = time.time() - start_time
+                status_code = response.getcode()
+                content = response.read().decode('utf-8')
+                
+                # Try to parse JSON
+                try:
+                    data = json.loads(content)
+                except:
+                    data = {"raw_response": content[:200]}
+                
+                return {
+                    "endpoint": endpoint,
+                    "method": "POST",
+                    "status": "PASS" if status_code < 400 else "FAIL",
+                    "status_code": status_code,
+                    "response_time": response_time,
+                    "data": data
+                }
+        except Exception as e:
+            response_time = time.time() - start_time
+            return {
+                "endpoint": endpoint,
+                "method": "POST",
+                "status": "FAIL",
+                "status_code": None,
+                "response_time": response_time,
+                "error": str(e)
+            }
+    
+    def print_result(self, result: Dict):
+        """Print a formatted test result"""
+        status_color = {
+            "PASS": "✅",
+            "FAIL": "❌",
+            "SKIP": "⏭️"
+        }
+        
+        print(f"{status_color[result['status']]} {result['method']} {result['endpoint']} "
+              f"| {result['response_time']:.3f}s | Status: {result['status_code'] or 'N/A'}")
+        
+        if result['status'] == 'FAIL' and 'error' in result:
+            print(f"    Error: {result['error']}")
+        
+        self.results.append(result)
+    
+    def run_all_tests(self):
+        """Run all API tests"""
+        print("🚀 Starting Crawl4AI v0.7.0 API Test Suite")
+        print(f"📡 Testing server at: {self.base_url}")
+        print("=" * 60)
+        
+        # # Test basic endpoints
+        # print("\n=== BASIC ENDPOINTS ===")
+        
+        # # Health check
+        # result = self.test_get_endpoint("/health")
+        # self.print_result(result)
+        
+        
+        # # Schema endpoint
+        # result = self.test_get_endpoint("/schema")
+        # self.print_result(result)
+        
+        # # Metrics endpoint
+        # result = self.test_get_endpoint("/metrics")
+        # self.print_result(result)
+        
+        # # Root redirect
+        # result = self.test_get_endpoint("/")
+        # self.print_result(result)
+        
+        # # Test authentication
+        # print("\n=== AUTHENTICATION ===")
+        
+        # # Get token
+        # token_payload = {"email": "test@example.com"}
+        # result = self.test_post_endpoint("/token", token_payload)
+        # self.print_result(result)
+        
+        # # Extract token if successful
+        # if result['status'] == 'PASS' and 'data' in result:
+        #     token = result['data'].get('access_token')
+        #     if token:
+        #         self.token = token
+        #         self.log(f"Successfully obtained auth token: {token[:20]}...")
+        
+        # Test core APIs
+        print("\n=== CORE APIs ===")
+        
+        test_url = "https://example.com"
+        
+        # Test markdown endpoint
+        md_payload = {
+            "url": test_url,
+            "f": "fit",
+            "q": "test query",
+            "c": "0"
+        }
+        result = self.test_post_endpoint("/md", md_payload)
+        # print(result['data'].get('markdown', ''))
+        self.print_result(result)
+        
+        # Test HTML endpoint
+        html_payload = {"url": test_url}
+        result = self.test_post_endpoint("/html", html_payload)
+        self.print_result(result)
+        
+        # Test screenshot endpoint
+        screenshot_payload = {
+            "url": test_url,
+            "screenshot_wait_for": 2
+        }
+        result = self.test_post_endpoint("/screenshot", screenshot_payload)
+        self.print_result(result)
+        
+        # Test PDF endpoint
+        pdf_payload = {"url": test_url}
+        result = self.test_post_endpoint("/pdf", pdf_payload)
+        self.print_result(result)
+        
+        # Test JavaScript execution
+        js_payload = {
+            "url": test_url,
+            "scripts": ["(() => document.title)()"]
+        }
+        result = self.test_post_endpoint("/execute_js", js_payload)
+        self.print_result(result)
+        
+        # Test crawl endpoint
+        crawl_payload = {
+            "urls": [test_url],
+            "browser_config": {},
+            "crawler_config": {}
+        }
+        result = self.test_post_endpoint("/crawl", crawl_payload)
+        self.print_result(result)
+        
+        # Test config dump
+        config_payload = {"code": "CrawlerRunConfig()"}
+        result = self.test_post_endpoint("/config/dump", config_payload)
+        self.print_result(result)
+        
+        # Test LLM endpoint
+        llm_endpoint = f"/llm/{test_url}?q=Extract%20main%20content"
+        result = self.test_get_endpoint(llm_endpoint)
+        self.print_result(result)
+        
+        # Test ask endpoint
+        ask_endpoint = "/ask?context_type=all&query=crawl4ai&max_results=5"
+        result = self.test_get_endpoint(ask_endpoint)
+        print(result)
+        self.print_result(result)
+        
+        # Test job APIs
+        print("\n=== JOB APIs ===")
+        
+        # Test LLM job
+        llm_job_payload = {
+            "url": test_url,
+            "q": "Extract main content",
+            "cache": False
+        }
+        result = self.test_post_endpoint("/llm/job", llm_job_payload)
+        self.print_result(result)
+        
+        # Test crawl job
+        crawl_job_payload = {
+            "urls": [test_url],
+            "browser_config": {},
+            "crawler_config": {}
+        }
+        result = self.test_post_endpoint("/crawl/job", crawl_job_payload)
+        self.print_result(result)
+        
+        # Test MCP
+        print("\n=== MCP APIs ===")
+        
+        # Test MCP schema
+        result = self.test_get_endpoint("/mcp/schema")
+        self.print_result(result)
+        
+        # Test error handling
+        print("\n=== ERROR HANDLING ===")
+        
+        # Test invalid URL
+        invalid_payload = {"url": "invalid-url", "f": "fit"}
+        result = self.test_post_endpoint("/md", invalid_payload)
+        self.print_result(result)
+        
+        # Test invalid endpoint
+        result = self.test_get_endpoint("/nonexistent")
+        self.print_result(result)
+        
+        # Print summary
+        self.print_summary()
+    
+    def print_summary(self):
+        """Print test results summary"""
+        print("\n" + "=" * 60)
+        print("📊 TEST RESULTS SUMMARY")
+        print("=" * 60)
+        
+        total = len(self.results)
+        passed = sum(1 for r in self.results if r['status'] == 'PASS')
+        failed = sum(1 for r in self.results if r['status'] == 'FAIL')
+        
+        print(f"Total Tests: {total}")
+        print(f"✅ Passed: {passed}")
+        print(f"❌ Failed: {failed}")
+        print(f"📈 Success Rate: {(passed/total)*100:.1f}%")
+        
+        if failed > 0:
+            print("\n❌ FAILED TESTS:")
+            for result in self.results:
+                if result['status'] == 'FAIL':
+                    print(f"  • {result['method']} {result['endpoint']}")
+                    if 'error' in result:
+                        print(f"    Error: {result['error']}")
+        
+        # Performance statistics
+        response_times = [r['response_time'] for r in self.results if r['response_time'] > 0]
+        if response_times:
+            avg_time = sum(response_times) / len(response_times)
+            max_time = max(response_times)
+            print(f"\n⏱️  Average Response Time: {avg_time:.3f}s")
+            print(f"⏱️  Max Response Time: {max_time:.3f}s")
+        
+        # Save detailed report
+        report_file = f"crawl4ai_test_report_{int(time.time())}.json"
+        with open(report_file, 'w') as f:
+            json.dump({
+                "timestamp": time.time(),
+                "server_url": self.base_url,
+                "version": "0.7.0",
+                "summary": {
+                    "total": total,
+                    "passed": passed,
+                    "failed": failed
+                },
+                "results": self.results
+            }, f, indent=2)
+        
+        print(f"\n📄 Detailed report saved to: {report_file}")
+
+def main():
+    """Main test runner"""
+    import argparse
+    
+    parser = argparse.ArgumentParser(description='Crawl4AI v0.7.0 API Test Suite')
+    parser.add_argument('--url', default=BASE_URL, help='Base URL of the server')
+    
+    args = parser.parse_args()
+    
+    tester = SimpleApiTester(args.url)
+    
+    try:
+        tester.run_all_tests()
+    except KeyboardInterrupt:
+        print("\n🛑 Test suite interrupted by user")
+    except Exception as e:
+        print(f"\n💥 Test suite failed with error: {e}")
+        sys.exit(1)
+
+if __name__ == "__main__":
+    main()
--- a/tests/test_link_extractor.py
+++ b/tests/test_link_extractor.py
@@ -5,7 +5,7 @@ Test script for Link Extractor functionality

 from crawl4ai.models import Link
 from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
-from crawl4ai.async_configs import LinkPreviewConfig
+from crawl4ai import LinkPreviewConfig
 import asyncio
 import sys
 import os
@@ -237,7 +237,7 @@ def test_config_examples():
            print(f"     {key}: {value}")

        print("   Usage:")
-        print("     from crawl4ai.async_configs import LinkPreviewConfig")
+        print("     from crawl4ai import LinkPreviewConfig")
        print("     config = CrawlerRunConfig(")
        print("         link_preview_config=LinkPreviewConfig(")
        for key, value in config_dict.items():
Author	SHA1	Message	Date
UncleCode	9f9ea3bb3b	chore: Clean up test artifacts and disable test workflow	2025-07-25 17:31:52 +08:00
UncleCode	d58b93c207	fix: Re-enable multi-platform Docker builds for ARM64 support	2025-07-25 16:38:11 +08:00
UncleCode	e2b4705010	fix: Use hardcoded Docker repository name to avoid masking issues	2025-07-25 15:52:26 +08:00
UncleCode	4a1abd5086	fix: Handle existing version on Test PyPI gracefully	2025-07-25 15:41:16 +08:00
UncleCode	04258cd4f2	fix: Speed up Docker test builds by using single platform and caching	2025-07-25 15:37:44 +08:00
UncleCode	84e462d9f8	Merge remote-tracking branch 'origin/develop'	2025-07-25 15:35:53 +08:00
UncleCode	9546773a07	fix: Move sentence-transformers to optional dependencies - Moved sentence-transformers from core to optional dependencies in pyproject.toml - Removed sentence-transformers from requirements.txt - Added proper ImportError handling with helpful installation message - This prevents ~2.5GB of NVIDIA CUDA libraries from being installed by default - Users who need embedding features can install with: pip install 'crawl4ai[transformer]'	2025-07-24 21:24:40 +08:00
UncleCode	66a979ad11	fix: Install dependencies before version check in workflows	2025-07-24 21:01:36 +08:00
UncleCode	0c31e91b53	feat: Add CI/CD workflows for automated PyPI and Docker releases	2025-07-24 20:58:43 +08:00
ntohidi	1b6a31f88f	fix: encode PDF results to base64 in /crawl endpoint. ref #1301	2025-07-23 13:52:18 +02:00
Nasrin	b8c261780f	Merge pull request #1319 from volumetric/fix_for_bug_#1310 Removed the incorrect reference in browser_config variable	2025-07-23 12:45:12 +02:00
ntohidi	db6ad7a79d	fix: update links in README and C4A-Script documentation for accuracy	2025-07-23 09:47:18 +02:00
Nasrin	004d514f33	Merge pull request #1265 from unclecode/feature/nasrin-cli-deep-crawl Feature/CLI - deep-crawl: Add --deep-crawl CLI option with BFS/DFS/Best-First strategies and fix serialization error. ref #874	2025-07-23 09:40:33 +02:00
Vinit Agrawal	3a9e2c716e	Remvoed the incorrect reference in browser_config variable	2025-07-18 10:01:00 +05:30
unclecode	0163bd797c	Merge branch 'release/v0.7.1'	2025-07-17 17:42:04 +08:00
ntohidi	26bad799e4	chore: update version to 0.7.1	2025-07-17 11:37:41 +02:00
ntohidi	cf8badfe27	feat: cleanup unused code and enhance documentation for v0.7.1 - Remove unused StealthConfig from browser_manager.py - Update LinkPreviewConfig import path in __init__.py and examples - Fix infinity handling in content_scraping_strategy.py (use 0 instead of float('inf')) - Remove sanitize_json_data functions from API endpoints - Add comprehensive C4A Script documentation to release notes - Update v0.7.0 release notes with improved code examples - Create v0.7.1 release notes focusing on cleanup and documentation improvements - Update demo files with corrected import paths and examples - Fix virtual scroll and adaptive crawling examples across documentation 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>	2025-07-17 11:35:16 +02:00
ntohidi	ccbe3c105c	refactor: improve link scoring output format in release notes	2025-07-17 09:13:20 +02:00
Nasrin	761c19d54b	Merge pull request #1307 from unclecode/fix/json-infinity-serialization fix: Handle infinity values in JSON serialization for API responses	2025-07-16 13:34:25 +02:00
Nasrin	14b0ecb137	Merge pull request #1305 from unclecode/fix/release-notes-demo-code Fix: Update release notes and demo code	2025-07-16 13:33:53 +02:00
ntohidi	0eaa9f9895	fix: handle infinity values in JSON serialization for API responses - Add sanitize_json_data() function to convert infinity/NaN to JSON-compliant strings - Fix /execute_js endpoint returning ValueError: Out of range float values are not JSON compliant: inf - Fix /crawl endpoint batch responses with infinity values - Fix /crawl/stream endpoint streaming responses with infinity values - Fix /crawl/job endpoint background job responses with infinity values The sanitize_json_data() function recursively processes response data: - float('inf') → \"Infinity\" - float('-inf') → \"-Infinity\" - float('nan') → \"NaN\" This prevents JSON serialization errors when JavaScript execution or crawling operations produce infinity values, ensuring all API endpoints return valid JSON. Fixes: API endpoints crashing with infinity JSON serialization errors Affects: /execute_js, /crawl, /crawl/stream, /crawl/job endpoints	2025-07-15 13:49:07 +02:00
UncleCode	bde1bba6a2	docs: Add missing documentation pages to mkdocs.yml - Added Adaptive Crawling to Core section - Added URL Seeding to Core section - Added Adaptive Strategies to Advanced section	2025-07-12 19:56:33 +08:00
ntohidi	ee25c771d8	feat(cli): add deep crawling options with configurable strategies and max pages. ref #874	2025-07-02 14:07:23 +02:00