fix(dependencies): Update and clean up package versions in pyproject.toml to reduce the bundle size.

feat(extraction_strategy): Enhance schema generation with improved validation and task description handling
fix(prompts): Update GENERATE_SCRIPT_PROMPT to a raw string for better formatting
docs: Add missing import for GENERATE_SCRIPT_PROMPT in hello_world example
2025-07-29 19:56:27 +08:00 · 2025-07-29 19:33:36 +08:00 · 2025-07-24 20:11:43 +08:00 · 2025-07-21 21:19:37 +08:00 · 2025-07-18 16:27:19 +08:00
96 changed files with 2818 additions and 12298 deletions
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -1,142 +0,0 @@
-name: Release Pipeline
-on:
-  push:
-    tags:
-      - 'v*'
-      - '!test-v*'  # Exclude test tags
-
-jobs:
-  release:
-    runs-on: ubuntu-latest
-    permissions:
-      contents: write  # Required for creating releases
-    
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-      
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.12'
-      
-      - name: Extract version from tag
-        id: get_version
-        run: |
-          TAG_VERSION=${GITHUB_REF#refs/tags/v}
-          echo "VERSION=$TAG_VERSION" >> $GITHUB_OUTPUT
-          echo "Releasing version: $TAG_VERSION"
-      
-      - name: Install package dependencies
-        run: |
-          pip install -e .
-      
-      - name: Check version consistency
-        run: |
-          TAG_VERSION=${{ steps.get_version.outputs.VERSION }}
-          PACKAGE_VERSION=$(python -c "from crawl4ai.__version__ import __version__; print(__version__)")
-          
-          echo "Tag version: $TAG_VERSION"
-          echo "Package version: $PACKAGE_VERSION"
-          
-          if [ "$TAG_VERSION" != "$PACKAGE_VERSION" ]; then
-            echo "❌ Version mismatch! Tag: $TAG_VERSION, Package: $PACKAGE_VERSION"
-            echo "Please update crawl4ai/__version__.py to match the tag version"
-            exit 1
-          fi
-          echo "✅ Version check passed: $TAG_VERSION"
-      
-      - name: Install build dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install build twine
-      
-      - name: Build package
-        run: python -m build
-      
-      - name: Check package
-        run: twine check dist/*
-      
-      - name: Upload to PyPI
-        env:
-          TWINE_USERNAME: __token__
-          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
-        run: |
-          echo "📦 Uploading to PyPI..."
-          twine upload dist/*
-          echo "✅ Package uploaded to https://pypi.org/project/crawl4ai/"
-      
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-      
-      - name: Log in to Docker Hub
-        uses: docker/login-action@v3
-        with:
-          username: ${{ secrets.DOCKER_USERNAME }}
-          password: ${{ secrets.DOCKER_TOKEN }}
-      
-      - name: Extract major and minor versions
-        id: versions
-        run: |
-          VERSION=${{ steps.get_version.outputs.VERSION }}
-          MAJOR=$(echo $VERSION | cut -d. -f1)
-          MINOR=$(echo $VERSION | cut -d. -f1-2)
-          echo "MAJOR=$MAJOR" >> $GITHUB_OUTPUT
-          echo "MINOR=$MINOR" >> $GITHUB_OUTPUT
-      
-      - name: Build and push Docker images
-        uses: docker/build-push-action@v5
-        with:
-          context: .
-          push: true
-          tags: |
-            unclecode/crawl4ai:${{ steps.get_version.outputs.VERSION }}
-            unclecode/crawl4ai:${{ steps.versions.outputs.MINOR }}
-            unclecode/crawl4ai:${{ steps.versions.outputs.MAJOR }}
-            unclecode/crawl4ai:latest
-          platforms: linux/amd64,linux/arm64
-      
-      - name: Create GitHub Release
-        uses: softprops/action-gh-release@v2
-        with:
-          tag_name: v${{ steps.get_version.outputs.VERSION }}
-          name: Release v${{ steps.get_version.outputs.VERSION }}
-          body: |
-            ## 🎉 Crawl4AI v${{ steps.get_version.outputs.VERSION }} Released!
-            
-            ### 📦 Installation
-            
-            **PyPI:**
-            ```bash
-            pip install crawl4ai==${{ steps.get_version.outputs.VERSION }}
-            ```
-            
-            **Docker:**
-            ```bash
-            docker pull unclecode/crawl4ai:${{ steps.get_version.outputs.VERSION }}
-            docker pull unclecode/crawl4ai:latest
-            ```
-            
-            ### 📝 What's Changed
-            See [CHANGELOG.md](https://github.com/${{ github.repository }}/blob/main/CHANGELOG.md) for details.
-          draft: false
-          prerelease: false
-          token: ${{ secrets.GITHUB_TOKEN }}
-      
-      - name: Summary
-        run: |
-          echo "## 🚀 Release Complete!" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "### 📦 PyPI Package" >> $GITHUB_STEP_SUMMARY
-          echo "- Version: ${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
-          echo "- URL: https://pypi.org/project/crawl4ai/" >> $GITHUB_STEP_SUMMARY
-          echo "- Install: \`pip install crawl4ai==${{ steps.get_version.outputs.VERSION }}\`" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "### 🐳 Docker Images" >> $GITHUB_STEP_SUMMARY
-          echo "- \`unclecode/crawl4ai:${{ steps.get_version.outputs.VERSION }}\`" >> $GITHUB_STEP_SUMMARY
-          echo "- \`unclecode/crawl4ai:${{ steps.versions.outputs.MINOR }}\`" >> $GITHUB_STEP_SUMMARY
-          echo "- \`unclecode/crawl4ai:${{ steps.versions.outputs.MAJOR }}\`" >> $GITHUB_STEP_SUMMARY
-          echo "- \`unclecode/crawl4ai:latest\`" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "### 📋 GitHub Release" >> $GITHUB_STEP_SUMMARY
-          echo "https://github.com/${{ github.repository }}/releases/tag/v${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
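For reference, the string handling in the deleted release pipeline (the "Extract version from tag", "Check version consistency" and "Extract major and minor versions" steps) can be restated in Python. This is a minimal sketch that assumes Python 3.9+ and tags of the form `v<version>`. `release_docker_tags` is a hypothetical helper and is not part of crawl4ai or GitHub Actions.

```python
# Hypothetical restatement of the deleted release.yml tag logic (Python 3.9+).

def release_docker_tags(github_ref: str, package_version: str,
                        image: str = "unclecode/crawl4ai") -> list[str]:
    # Same as the shell expansion ${GITHUB_REF#refs/tags/v}:
    # strip the prefix only when it is present.
    version = github_ref.removeprefix("refs/tags/v")

    # "Check version consistency": refuse to publish on a mismatch.
    if version != package_version:
        raise ValueError(
            f"Version mismatch! Tag: {version}, Package: {package_version}"
        )

    parts = version.split(".")
    major = parts[0]              # cut -d. -f1
    minor = ".".join(parts[:2])   # cut -d. -f1-2

    # Same order as the `tags:` list of "Build and push Docker images".
    return [f"{image}:{tag}" for tag in (version, minor, major, "latest")]


print(release_docker_tags("refs/tags/v0.7.1", "0.7.1"))
```

For `refs/tags/v0.7.1` with a matching `__version__` of `0.7.1`, this yields the image tags `0.7.1`, `0.7`, `0` and `latest`, the same set the workflow pushed.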
--- a/.github/workflows/test-release.yml.disabled
+++ b/.github/workflows/test-release.yml.disabled
@@ -1,116 +0,0 @@
-name: Test Release Pipeline
-on:
-  push:
-    tags:
-      - 'test-v*'
-
-jobs:
-  test-release:
-    runs-on: ubuntu-latest
-    
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-      
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.12'
-      
-      - name: Extract version from tag
-        id: get_version
-        run: |
-          TAG_VERSION=${GITHUB_REF#refs/tags/test-v}
-          echo "VERSION=$TAG_VERSION" >> $GITHUB_OUTPUT
-          echo "Testing with version: $TAG_VERSION"
-      
-      - name: Install package dependencies
-        run: |
-          pip install -e .
-      
-      - name: Check version consistency
-        run: |
-          TAG_VERSION=${{ steps.get_version.outputs.VERSION }}
-          PACKAGE_VERSION=$(python -c "from crawl4ai.__version__ import __version__; print(__version__)")
-          
-          echo "Tag version: $TAG_VERSION"
-          echo "Package version: $PACKAGE_VERSION"
-          
-          if [ "$TAG_VERSION" != "$PACKAGE_VERSION" ]; then
-            echo "❌ Version mismatch! Tag: $TAG_VERSION, Package: $PACKAGE_VERSION"
-            echo "Please update crawl4ai/__version__.py to match the tag version"
-            exit 1
-          fi
-          echo "✅ Version check passed: $TAG_VERSION"
-      
-      - name: Install build dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install build twine
-      
-      - name: Build package
-        run: python -m build
-      
-      - name: Check package
-        run: twine check dist/*
-      
-      - name: Upload to Test PyPI
-        env:
-          TWINE_USERNAME: __token__
-          TWINE_PASSWORD: ${{ secrets.TEST_PYPI_TOKEN }}
-        run: |
-          echo "📦 Uploading to Test PyPI..."
-          twine upload --repository testpypi dist/* || {
-            if [ $? -eq 1 ]; then
-              echo "⚠️ Upload failed - likely version already exists on Test PyPI"
-              echo "Continuing anyway for test purposes..."
-            else
-              exit 1
-            fi
-          }
-          echo "✅ Test PyPI step complete"
-      
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-      
-      - name: Log in to Docker Hub
-        uses: docker/login-action@v3
-        with:
-          username: ${{ secrets.DOCKER_USERNAME }}
-          password: ${{ secrets.DOCKER_TOKEN }}
-      
-      - name: Build and push Docker test images
-        uses: docker/build-push-action@v5
-        with:
-          context: .
-          push: true
-          tags: |
-            unclecode/crawl4ai:test-${{ steps.get_version.outputs.VERSION }}
-            unclecode/crawl4ai:test-latest
-          platforms: linux/amd64,linux/arm64
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-      
-      - name: Summary
-        run: |
-          echo "## 🎉 Test Release Complete!" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "### 📦 Test PyPI Package" >> $GITHUB_STEP_SUMMARY
-          echo "- Version: ${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
-          echo "- URL: https://test.pypi.org/project/crawl4ai/" >> $GITHUB_STEP_SUMMARY
-          echo "- Install: \`pip install -i https://test.pypi.org/simple/ crawl4ai==${{ steps.get_version.outputs.VERSION }}\`" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "### 🐳 Docker Test Images" >> $GITHUB_STEP_SUMMARY
-          echo "- \`unclecode/crawl4ai:test-${{ steps.get_version.outputs.VERSION }}\`" >> $GITHUB_STEP_SUMMARY
-          echo "- \`unclecode/crawl4ai:test-latest\`" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "### 🧹 Cleanup Commands" >> $GITHUB_STEP_SUMMARY
-          echo "\`\`\`bash" >> $GITHUB_STEP_SUMMARY
-          echo "# Remove test tag" >> $GITHUB_STEP_SUMMARY
-          echo "git tag -d test-v${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
-          echo "git push origin :test-v${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "# Remove Docker test images" >> $GITHUB_STEP_SUMMARY
-          echo "docker rmi unclecode/crawl4ai:test-${{ steps.get_version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
-          echo "docker rmi unclecode/crawl4ai:test-latest" >> $GITHUB_STEP_SUMMARY
-          echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
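The two deleted workflows split releases by tag prefix. `release.yml` ran on `v*` (excluding `test-v*`), and `test-release.yml.disabled` ran on `test-v*`. The `.disabled` suffix presumably stopped GitHub Actions from loading the second file at all. The sketch below approximates that routing with Python's `fnmatch` and applies patterns in order, so a later `!` pattern excludes an earlier match, as GitHub's filter syntax does. It assumes tag names contain no `/`, because `fnmatch`'s `*` also matches `/` while GitHub's does not. `matches` and `pipeline_for` are hypothetical helpers.

```python
from fnmatch import fnmatchcase
from typing import List, Optional

# Patterns copied from the two deleted workflows' `on.push.tags` lists.
RELEASE_PATTERNS = ["v*", "!test-v*"]   # release.yml
TEST_PATTERNS = ["test-v*"]             # test-release.yml.disabled


def matches(tag: str, patterns: List[str]) -> bool:
    """Evaluate patterns in order; a later '!' pattern can exclude a match."""
    matched = False
    for pattern in patterns:
        if pattern.startswith("!"):
            if fnmatchcase(tag, pattern[1:]):
                matched = False
        elif fnmatchcase(tag, pattern):
            matched = True
    return matched


def pipeline_for(tag: str) -> Optional[str]:
    """Hypothetical router: which deleted workflow a pushed tag would trigger."""
    if matches(tag, RELEASE_PATTERNS):
        return "release"
    if matches(tag, TEST_PATTERNS):
        return "test"
    return None


for tag in ["v0.7.1", "test-v0.7.1", "0.7.1"]:
    print(tag, "->", pipeline_for(tag))
```

Under this approximation, no tag that begins with `test-v` can match `v*`, so the `!test-v*` line acted as a safeguard rather than a necessary filter.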
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -21,21 +21,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

-### Added
-- **Flexible LLM Provider Configuration** (Docker): 
-  - Support for `LLM_PROVIDER` environment variable to override default provider
-  - Per-request provider override via optional `provider` parameter in API endpoints
-  - Automatic provider validation with clear error messages
-  - Updated Docker documentation and examples
-
-### Changed
-- **WebScrapingStrategy Refactoring**: Simplified content scraping architecture
-  - `WebScrapingStrategy` is now an alias for `LXMLWebScrapingStrategy` for backward compatibility
-  - Removed redundant BeautifulSoup-based implementation (~1000 lines of code)
-  - `LXMLWebScrapingStrategy` now inherits directly from `ContentScrapingStrategy`
-  - All existing code using `WebScrapingStrategy` continues to work without modification
-  - Default scraping strategy remains `LXMLWebScrapingStrategy` for optimal performance
-
 ### Added
 - **AsyncUrlSeeder**: High-performance URL discovery system for intelligent crawling at scale
  - Discover URLs from sitemaps and Common Crawl index
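The removed `[Unreleased]` entry describes a provider override chain for the Docker server. Below is a minimal sketch of one plausible resolution order. The precedence (per-request `provider`, then `LLM_PROVIDER`, then a built-in default), the `resolve_provider` name, and the `<provider>/<model>` validation rule are all assumptions and do not reproduce the server's actual code. The model strings are placeholders.

```python
import os
from typing import Mapping, Optional


def resolve_provider(request_provider: Optional[str],
                     default_provider: str,
                     env: Optional[Mapping[str, str]] = None) -> str:
    """Pick an LLM provider string (hypothetical helper).

    Assumed precedence: explicit per-request value, then the LLM_PROVIDER
    environment variable, then the server's default.
    """
    if env is None:
        env = os.environ
    provider = request_provider or env.get("LLM_PROVIDER") or default_provider

    # Assumed validation rule: a "<provider>/<model>" string with both parts set.
    name, sep, model = provider.partition("/")
    if not (name and sep and model):
        raise ValueError(
            f"Invalid LLM provider {provider!r}: expected '<provider>/<model>'"
        )
    return provider


print(resolve_provider(None, "openai/model-a", env={"LLM_PROVIDER": "groq/model-b"}))
```

Passing `env` explicitly keeps the function testable without touching the real process environment.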
--- a/PROGRESSIVE_CRAWLING.md
+++ b/PROGRESSIVE_CRAWLING.md
@@ -216,7 +216,7 @@ Under certain assumptions about link preview accuracy:

 ### 8.1 Core Components

-1. **CrawlState**: Maintains crawl history and metrics
+1. **AdaptiveCrawlResult**: Maintains crawl history and metrics
 2. **AdaptiveConfig**: Configuration parameters
 3. **CrawlStrategy**: Pluggable strategy interface
 4. **AdaptiveCrawler**: Main orchestrator
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@ Crawl4AI is the #1 trending GitHub repository, actively maintained by a vibrant

 [✨ Check out latest update v0.7.0](#-recent-updates)

-🎉 **Version 0.7.0 is now available!** The Adaptive Intelligence Update introduces groundbreaking features: Adaptive Crawling that learns website patterns, Virtual Scroll support for infinite pages, intelligent Link Preview with 3-layer scoring, Async URL Seeder for massive discovery, and significant performance improvements. [Read the release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.0.md)
+🎉 **Version 0.7.0 is now available!** The Adaptive Intelligence Update introduces groundbreaking features: Adaptive Crawling that learns website patterns, Virtual Scroll support for infinite pages, intelligent Link Preview with 3-layer scoring, Async URL Seeder for massive discovery, and significant performance improvements. [Read the release notes →](https://docs.crawl4ai.com/blog/release-v0.7.0)

 <details>
 <summary>🤓 <strong>My Personal Story</strong></summary>
@@ -618,16 +618,16 @@ Read the full details in our [0.7.0 Release Notes](https://docs.crawl4ai.com/blo
        # Process results
        raw_df = pd.DataFrame()
        for result in results:
-            if result.success and result.tables:
+            if result.success and result.media["tables"]:
                raw_df = pd.DataFrame(
-                    result.tables[0]["rows"],
-                    columns=result.tables[0]["headers"],
+                    result.media["tables"][0]["rows"],
+                    columns=result.media["tables"][0]["headers"],
                )
                break
        print(raw_df.head())

    finally:
-        await crawler.close()
+        await crawler.stop()
  ```

 - **🚀 Browser Pooling**: Pages launch hot with pre-warmed browser instances for lower latency and memory usage
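The README snippet now reads tables from `result.media["tables"]` instead of `result.tables`. Below is a minimal sketch of the same guard that runs without a browser. It uses `SimpleNamespace` stand-ins rather than real `CrawlResult` objects and assumes each table is a dict with `"headers"` and `"rows"` keys, as the snippet implies. It also swaps the subscript for `.get()`, so a result without a `"tables"` key is skipped instead of raising `KeyError`. That is a defensive choice, not the README's code.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def first_table(results: Iterable[Any]) -> Optional[dict]:
    """Return the first table of the first successful result that has any."""
    for result in results:
        if not result.success:
            continue
        # .get() skips results whose media dict lacks a "tables" key.
        tables = result.media.get("tables")
        if tables:
            return tables[0]
    return None


# Stand-ins shaped the way the README snippet expects; not real CrawlResults.
results = [
    SimpleNamespace(success=False, media={}),
    SimpleNamespace(success=True, media={"tables": []}),
    SimpleNamespace(success=True, media={"tables": [
        {"headers": ["col_a", "col_b"], "rows": [["x", 1], ["y", 2]]},
    ]}),
]

table = first_table(results)
if table is not None:
    # With pandas installed, the README builds the frame the same way:
    # pd.DataFrame(table["rows"], columns=table["headers"])
    print(table["headers"], table["rows"])
```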
--- a/crawl4ai/__init__.py
+++ b/crawl4ai/__init__.py
@@ -3,12 +3,12 @@ import warnings

 from .async_webcrawler import AsyncWebCrawler, CacheMode
 # MODIFIED: Add SeedingConfig and VirtualScrollConfig here
-from .async_configs import BrowserConfig, CrawlerRunConfig, HTTPCrawlerConfig, LLMConfig, ProxyConfig, GeolocationConfig, SeedingConfig, VirtualScrollConfig, LinkPreviewConfig, MatchMode
+from .async_configs import BrowserConfig, CrawlerRunConfig, HTTPCrawlerConfig, LLMConfig, ProxyConfig, GeolocationConfig, SeedingConfig, VirtualScrollConfig, LinkPreviewConfig

 from .content_scraping_strategy import (
    ContentScrapingStrategy,
+    WebScrapingStrategy,
    LXMLWebScrapingStrategy,
-    WebScrapingStrategy,  # Backward compatibility alias
 )
 from .async_logger import (
    AsyncLoggerBase,
@@ -73,7 +73,7 @@ from .async_url_seeder import AsyncUrlSeeder
 from .adaptive_crawler import (
    AdaptiveCrawler,
    AdaptiveConfig,
-    CrawlState,
+    AdaptiveCrawlResult,
    CrawlStrategy,
    StatisticalStrategy
 )
@@ -88,13 +88,6 @@ from .script import (
    ErrorDetail
 )

-# Browser Adapters
-from .browser_adapter import (
-    BrowserAdapter,
-    PlaywrightAdapter,
-    UndetectedAdapter
-)
-
 from .utils import (
    start_colab_display_server,
    setup_colab_environment
@@ -115,7 +108,7 @@ __all__ = [
    # Adaptive Crawler
    "AdaptiveCrawler",
    "AdaptiveConfig", 
-    "CrawlState",
+    "AdaptiveCrawlResult",
    "CrawlStrategy",
    "StatisticalStrategy",
    "DeepCrawlStrategy",
@@ -139,7 +132,6 @@ __all__ = [
    "CrawlResult",
    "CrawlerHub",
    "CacheMode",
-    "MatchMode",
    "ContentScrapingStrategy",
    "WebScrapingStrategy",
    "LXMLWebScrapingStrategy",
@@ -181,10 +173,6 @@ __all__ = [
    "CompilationResult",
    "ValidationResult",
    "ErrorDetail",
-    # Browser Adapters
-    "BrowserAdapter",
-    "PlaywrightAdapter", 
-    "UndetectedAdapter",
    "LinkPreviewConfig"
 ]

--- a/crawl4ai/__version__.py
+++ b/crawl4ai/__version__.py
@@ -1,7 +1,7 @@
 # crawl4ai/__version__.py

 # This is the version that will be used for stable releases
-__version__ = "0.7.2"
+__version__ = "0.7.1"

 # For nightly builds, this gets set during build process
 __nightly_version__ = None
--- a/crawl4ai/adaptive_crawler
+++ b/crawl4ai/adaptive_crawler
--- a/crawl4ai/adaptive_crawler.py
+++ b/crawl4ai/adaptive_crawler.py
@@ -24,7 +24,7 @@ from crawl4ai.models import Link, CrawlResult
 import numpy as np

@dataclass
-class CrawlState:
+class AdaptiveCrawlResult:
    """Tracks the current state of adaptive crawling"""
    crawled_urls: Set[str] = field(default_factory=set)
    knowledge_base: List[CrawlResult] = field(default_factory=list)
@@ -80,7 +80,7 @@ class CrawlState:
            json.dump(state_dict, f, indent=2)
    
    @classmethod
-    def load(cls, path: Union[str, Path]) -> 'CrawlState':
+    def load(cls, path: Union[str, Path]) -> 'AdaptiveCrawlResult':
        """Load state from disk"""
        path = Path(path)
        with open(path, 'r') as f:
@@ -256,22 +256,22 @@ class CrawlStrategy(ABC):
    """Abstract base class for crawling strategies"""
    
    @abstractmethod
-    async def calculate_confidence(self, state: CrawlState) -> float:
+    async def calculate_confidence(self, state: AdaptiveCrawlResult) -> float:
        """Calculate overall confidence that we have sufficient information"""
        pass
    
    @abstractmethod
-    async def rank_links(self, state: CrawlState, config: AdaptiveConfig) -> List[Tuple[Link, float]]:
+    async def rank_links(self, state: AdaptiveCrawlResult, config: AdaptiveConfig) -> List[Tuple[Link, float]]:
        """Rank pending links by expected information gain"""
        pass
    
    @abstractmethod
-    async def should_stop(self, state: CrawlState, config: AdaptiveConfig) -> bool:
+    async def should_stop(self, state: AdaptiveCrawlResult, config: AdaptiveConfig) -> bool:
        """Determine if crawling should stop"""
        pass
    
    @abstractmethod
-    async def update_state(self, state: CrawlState, new_results: List[CrawlResult]) -> None:
+    async def update_state(self, state: AdaptiveCrawlResult, new_results: List[CrawlResult]) -> None:
        """Update state with new crawl results"""
        pass

@@ -284,7 +284,7 @@ class StatisticalStrategy(CrawlStrategy):
        self.bm25_k1 = 1.2  # BM25 parameter
        self.bm25_b = 0.75  # BM25 parameter
        
-    async def calculate_confidence(self, state: CrawlState) -> float:
+    async def calculate_confidence(self, state: AdaptiveCrawlResult) -> float:
        """Calculate confidence using coverage, consistency, and saturation"""
        if not state.knowledge_base:
            return 0.0
@@ -303,7 +303,7 @@ class StatisticalStrategy(CrawlStrategy):
        
        return confidence
    
-    def _calculate_coverage(self, state: CrawlState) -> float:
+    def _calculate_coverage(self, state: AdaptiveCrawlResult) -> float:
        """Coverage scoring - measures query term presence across knowledge base
        
        Returns a score between 0 and 1, where:
@@ -344,7 +344,7 @@ class StatisticalStrategy(CrawlStrategy):
        # This helps differentiate between partial and good coverage
        return min(1.0, math.sqrt(coverage))
    
-    def _calculate_consistency(self, state: CrawlState) -> float:
+    def _calculate_consistency(self, state: AdaptiveCrawlResult) -> float:
        """Information overlap between pages - high overlap suggests coherent topic coverage"""
        if len(state.knowledge_base) < 2:
            return 1.0  # Single or no documents are perfectly consistent
@@ -371,7 +371,7 @@ class StatisticalStrategy(CrawlStrategy):
            
        return consistency
    
-    def _calculate_saturation(self, state: CrawlState) -> float:
+    def _calculate_saturation(self, state: AdaptiveCrawlResult) -> float:
        """Diminishing returns indicator - are we still discovering new information?"""
        if not state.new_terms_history:
            return 0.0
@@ -388,7 +388,7 @@ class StatisticalStrategy(CrawlStrategy):
        
        return max(0.0, min(saturation, 1.0))
    
-    async def rank_links(self, state: CrawlState, config: AdaptiveConfig) -> List[Tuple[Link, float]]:
+    async def rank_links(self, state: AdaptiveCrawlResult, config: AdaptiveConfig) -> List[Tuple[Link, float]]:
        """Rank links by expected information gain"""
        scored_links = []
        
@@ -415,7 +415,7 @@ class StatisticalStrategy(CrawlStrategy):
        
        return scored_links
    
-    def _calculate_relevance(self, link: Link, state: CrawlState) -> float:
+    def _calculate_relevance(self, link: Link, state: AdaptiveCrawlResult) -> float:
        """BM25 relevance score between link preview and query"""
        if not state.query or not link:
            return 0.0
@@ -447,7 +447,7 @@ class StatisticalStrategy(CrawlStrategy):
        overlap = len(query_terms & link_terms) / len(query_terms)
        return overlap
    
-    def _calculate_novelty(self, link: Link, state: CrawlState) -> float:
+    def _calculate_novelty(self, link: Link, state: AdaptiveCrawlResult) -> float:
        """Estimate how much new information this link might provide"""
        if not state.knowledge_base:
            return 1.0  # First links are maximally novel
@@ -502,7 +502,7 @@ class StatisticalStrategy(CrawlStrategy):
            
        return min(score, 1.0)
    
-    async def should_stop(self, state: CrawlState, config: AdaptiveConfig) -> bool:
+    async def should_stop(self, state: AdaptiveCrawlResult, config: AdaptiveConfig) -> bool:
        """Determine if crawling should stop"""
        # Check confidence threshold
        confidence = state.metrics.get('confidence', 0.0)
@@ -523,7 +523,7 @@ class StatisticalStrategy(CrawlStrategy):
            
        return False
    
-    async def update_state(self, state: CrawlState, new_results: List[CrawlResult]) -> None:
+    async def update_state(self, state: AdaptiveCrawlResult, new_results: List[CrawlResult]) -> None:
        """Update state with new crawl results"""
        for result in new_results:
            # Track new terms
@@ -921,7 +921,7 @@ class EmbeddingStrategy(CrawlStrategy):
            
        return sorted(scored_links, key=lambda x: x[1], reverse=True)

-    async def calculate_confidence(self, state: CrawlState) -> float:
+    async def calculate_confidence(self, state: AdaptiveCrawlResult) -> float:
        """Coverage-based learning score (0–1)."""
        # Guard clauses
        if state.kb_embeddings is None or state.query_embeddings is None:
@@ -951,7 +951,7 @@ class EmbeddingStrategy(CrawlStrategy):


    
-    # async def calculate_confidence(self, state: CrawlState) -> float:
+    # async def calculate_confidence(self, state: AdaptiveCrawlResult) -> float:
    #     """Calculate learning score for adaptive crawling (used for stopping)"""
    #     
        
@@ -1021,7 +1021,7 @@ class EmbeddingStrategy(CrawlStrategy):
    #     # For stopping criteria, return learning score
    #     return float(learning_score)
        
-    async def rank_links(self, state: CrawlState, config: AdaptiveConfig) -> List[Tuple[Link, float]]:
+    async def rank_links(self, state: AdaptiveCrawlResult, config: AdaptiveConfig) -> List[Tuple[Link, float]]:
        """Main entry point for link ranking"""
        # Store config for use in other methods
        self.config = config
@@ -1052,7 +1052,7 @@ class EmbeddingStrategy(CrawlStrategy):
            state.kb_embeddings
        )
        
-    async def validate_coverage(self, state: CrawlState) -> float:
+    async def validate_coverage(self, state: AdaptiveCrawlResult) -> float:
        """Validate coverage using held-out queries with caching"""
        if not hasattr(self, '_validation_queries') or not self._validation_queries:
            return state.metrics.get('confidence', 0.0)
@@ -1088,7 +1088,7 @@ class EmbeddingStrategy(CrawlStrategy):
        
        return validation_confidence
    
-    async def should_stop(self, state: CrawlState, config: AdaptiveConfig) -> bool:
+    async def should_stop(self, state: AdaptiveCrawlResult, config: AdaptiveConfig) -> bool:
        """Stop based on learning curve convergence"""
        confidence = state.metrics.get('confidence', 0.0)
        
@@ -1139,7 +1139,7 @@ class EmbeddingStrategy(CrawlStrategy):
        
        return False
        
-    def get_quality_confidence(self, state: CrawlState) -> float:
+    def get_quality_confidence(self, state: AdaptiveCrawlResult) -> float:
        """Calculate quality-based confidence score for display"""
        learning_score = state.metrics.get('learning_score', 0.0)
        validation_score = state.metrics.get('validation_confidence', 0.0)
@@ -1166,7 +1166,7 @@ class EmbeddingStrategy(CrawlStrategy):
            
        return confidence
    
-    async def update_state(self, state: CrawlState, new_results: List[CrawlResult]) -> None:
+    async def update_state(self, state: AdaptiveCrawlResult, new_results: List[CrawlResult]) -> None:
        """Update embeddings and coverage metrics with deduplication"""
        from .utils import get_text_embeddings
        
@@ -1246,7 +1246,7 @@ class AdaptiveCrawler:
            self.strategy = self._create_strategy(self.config.strategy)
        
        # Initialize state
-        self.state: Optional[CrawlState] = None
+        self.state: Optional[AdaptiveCrawlResult] = None
        
        # Track if we own the crawler (for cleanup)
        self._owns_crawler = crawler is None
@@ -1266,14 +1266,14 @@ class AdaptiveCrawler:
    async def digest(self, 
                               start_url: str, 
                               query: str,
-                               resume_from: Optional[str] = None) -> CrawlState:
+                               resume_from: Optional[str] = None) -> AdaptiveCrawlResult:
        """Main entry point for adaptive crawling"""
        # Initialize or resume state
        if resume_from:
-            self.state = CrawlState.load(resume_from)
+            self.state = AdaptiveCrawlResult.load(resume_from)
            self.state.query = query  # Update query in case it changed
        else:
-            self.state = CrawlState(
+            self.state = AdaptiveCrawlResult(
                crawled_urls=set(),
                knowledge_base=[],
                pending_links=[],
@@ -1803,7 +1803,7 @@ class AdaptiveCrawler:
            
            # Initialize state if needed
            if not self.state:
-                self.state = CrawlState()
+                self.state = AdaptiveCrawlResult()
            
            # Add imported results
            self.state.knowledge_base.extend(imported_results)
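The hunks above only rename `CrawlState` to `AdaptiveCrawlResult`. The strategy interface (`calculate_confidence`, `rank_links`, `should_stop`, `update_state`) is unchanged. Below is a minimal, self-contained sketch of how an orchestrator could drive those four hooks. Everything in it is hypothetical and is not crawl4ai's API: `ToyState`, `ToyConfig`, `ToyCoverageStrategy`, `toy_digest`, the fetch callable, and the `confidence_threshold`/`max_pages` field names. Its scoring is a toy query-term coverage, far simpler than `StatisticalStrategy`'s coverage, consistency and saturation scores.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ToyConfig:
    confidence_threshold: float = 0.8  # hypothetical field names
    max_pages: int = 10


@dataclass
class ToyState:
    """Stand-in for AdaptiveCrawlResult holding only what this sketch needs."""
    query: str
    crawled_urls: Set[str] = field(default_factory=set)
    knowledge_base: List[str] = field(default_factory=list)  # page texts
    pending_links: List[str] = field(default_factory=list)
    metrics: Dict[str, float] = field(default_factory=dict)


class ToyCoverageStrategy:
    """Implements the four CrawlStrategy hooks with toy heuristics."""

    async def calculate_confidence(self, state: ToyState) -> float:
        # Fraction of query terms that appear anywhere in the crawled text.
        terms = set(state.query.lower().split())
        seen = set(" ".join(state.knowledge_base).lower().split())
        return len(terms & seen) / len(terms) if terms else 0.0

    async def rank_links(self, state: ToyState,
                         config: ToyConfig) -> List[Tuple[str, float]]:
        # Score uncrawled links by how many query terms occur in the URL.
        terms = state.query.lower().split()
        scored = [(url, float(sum(t in url.lower() for t in terms)))
                  for url in state.pending_links if url not in state.crawled_urls]
        return sorted(scored, key=lambda item: item[1], reverse=True)

    async def should_stop(self, state: ToyState, config: ToyConfig) -> bool:
        return (state.metrics.get("confidence", 0.0) >= config.confidence_threshold
                or len(state.crawled_urls) >= config.max_pages)

    async def update_state(self, state: ToyState,
                           new_results: List[Tuple[str, str, List[str]]]) -> None:
        for url, text, links in new_results:
            state.crawled_urls.add(url)
            state.knowledge_base.append(text)
            state.pending_links.extend(
                link for link in links if link not in state.crawled_urls)


async def toy_digest(start_url: str, query: str,
                     fetch: Callable[[str], Tuple[str, List[str]]],
                     strategy: ToyCoverageStrategy, config: ToyConfig) -> ToyState:
    state = ToyState(query=query, pending_links=[start_url])
    while True:
        state.metrics["confidence"] = await strategy.calculate_confidence(state)
        if await strategy.should_stop(state, config):
            break
        ranked = await strategy.rank_links(state, config)
        if not ranked:
            break  # nothing left to crawl
        url = ranked[0][0]
        text, links = fetch(url)
        await strategy.update_state(state, [(url, text, links)])
    return state


# A fake four-page site: url -> (page text, outgoing links).
SITE = {
    "/": ("welcome page", ["/pricing", "/docs/install"]),
    "/docs/install": ("install with pip", ["/docs/config"]),
    "/docs/config": ("config options", []),
    "/pricing": ("plans", []),
}

state = asyncio.run(toy_digest("/", "install pip", SITE.__getitem__,
                               ToyCoverageStrategy(), ToyConfig()))
print(sorted(state.crawled_urls), state.metrics["confidence"])
```

Here the loop crawls `/` and then `/docs/install`. At that point both query terms have been seen, and the confidence of 1.0 crosses the threshold, so `/pricing` and `/docs/config` are never fetched.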
--- a/crawl4ai/async_configs.py
+++ b/crawl4ai/async_configs.py
@@ -18,24 +18,17 @@ from .extraction_strategy import ExtractionStrategy, LLMExtractionStrategy
 from .chunking_strategy import ChunkingStrategy, RegexChunking

 from .markdown_generation_strategy import MarkdownGenerationStrategy, DefaultMarkdownGenerator
-from .content_scraping_strategy import ContentScrapingStrategy, LXMLWebScrapingStrategy
+from .content_scraping_strategy import ContentScrapingStrategy, WebScrapingStrategy, LXMLWebScrapingStrategy
 from .deep_crawling import DeepCrawlStrategy

 from .cache_context import CacheMode
 from .proxy_strategy import ProxyRotationStrategy

-from typing import Union, List, Callable
+from typing import Union, List
 import inspect
 from typing import Any, Dict, Optional
 from enum import Enum

-# Type alias for URL matching
-UrlMatcher = Union[str, Callable[[str], bool], List[Union[str, Callable[[str], bool]]]]
-
-class MatchMode(Enum):
-    OR = "or"
-    AND = "and"
-
 # from .proxy_strategy import ProxyConfig


@@ -390,8 +383,6 @@ class BrowserConfig:
        light_mode (bool): Disables certain background features for performance gains. Default: False.
        extra_args (list): Additional command-line arguments passed to the browser.
                           Default: [].
-        enable_stealth (bool): If True, applies playwright-stealth to bypass basic bot detection.
-                              Cannot be used with use_undetected browser mode. Default: False.
    """

    def __init__(
@@ -432,7 +423,6 @@ class BrowserConfig:
        extra_args: list = None,
        debugging_port: int = 9222,
        host: str = "localhost",
-        enable_stealth: bool = False,
    ):
        self.browser_type = browser_type
        self.headless = headless 
@@ -473,7 +463,6 @@ class BrowserConfig:
        self.verbose = verbose
        self.debugging_port = debugging_port
        self.host = host
-        self.enable_stealth = enable_stealth

        fa_user_agenr_generator = ValidUAGenerator()
        if self.user_agent_mode == "random":
@@ -505,13 +494,6 @@ class BrowserConfig:
        # If persistent context is requested, ensure managed browser is enabled
        if self.use_persistent_context:
            self.use_managed_browser = True
-            
-        # Validate stealth configuration
-        if self.enable_stealth and self.use_managed_browser and self.browser_mode == "builtin":
-            raise ValueError(
-                "enable_stealth cannot be used with browser_mode='builtin'. "
-                "Stealth mode requires a dedicated browser instance."
-            )

    @staticmethod
    def from_kwargs(kwargs: dict) -> "BrowserConfig":
@@ -548,7 +530,6 @@ class BrowserConfig:
            extra_args=kwargs.get("extra_args", []),
            debugging_port=kwargs.get("debugging_port", 9222),
            host=kwargs.get("host", "localhost"),
-            enable_stealth=kwargs.get("enable_stealth", False),
        )

    def to_dict(self):
@@ -583,7 +564,6 @@ class BrowserConfig:
            "verbose": self.verbose,
            "debugging_port": self.debugging_port,
            "host": self.host,
-            "enable_stealth": self.enable_stealth,
        }

                
@@ -882,7 +862,7 @@ class CrawlerRunConfig():
        parser_type (str): Type of parser to use for HTML parsing.
                           Default: "lxml".
        scraping_strategy (ContentScrapingStrategy): Scraping strategy to use.
-                           Default: LXMLWebScrapingStrategy.
+                           Default: WebScrapingStrategy.
        proxy_config (ProxyConfig or dict or None): Detailed proxy configuration, e.g. {"server": "...", "username": "..."}.
                                     If None, no additional proxy config. Default: None.

@@ -1133,9 +1113,6 @@ class CrawlerRunConfig():
        link_preview_config: Union[LinkPreviewConfig, Dict[str, Any]] = None,
        # Virtual Scroll Parameters
        virtual_scroll_config: Union[VirtualScrollConfig, Dict[str, Any]] = None,
-        # URL Matching Parameters
-        url_matcher: Optional[UrlMatcher] = None,
-        match_mode: MatchMode = MatchMode.OR,
        # Experimental Parameters
        experimental: Dict[str, Any] = None,
    ):
@@ -1289,10 +1266,6 @@ class CrawlerRunConfig():
        else:
            raise ValueError("virtual_scroll_config must be VirtualScrollConfig object or dict")
        
-        # URL Matching Parameters
-        self.url_matcher = url_matcher
-        self.match_mode = match_mode
-        
        # Experimental Parameters
        self.experimental = experimental or {}
        
@@ -1348,51 +1321,6 @@ class CrawlerRunConfig():
            if "compilation error" not in str(e).lower():
                raise ValueError(f"Failed to compile C4A script: {str(e)}")
            raise
-    
-    def is_match(self, url: str) -> bool:
-        """Check if this config matches the given URL.
-        
-        Args:
-            url: The URL to check against this config's matcher
-            
-        Returns:
-            bool: True if this config should be used for the URL or if no matcher is set.
-        """
-        if self.url_matcher is None:
-            return True
-            
-        if callable(self.url_matcher):
-            # Single function matcher
-            return self.url_matcher(url)
-        
-        elif isinstance(self.url_matcher, str):
-            # Single pattern string
-            from fnmatch import fnmatch
-            return fnmatch(url, self.url_matcher)
-        
-        elif isinstance(self.url_matcher, list):
-            # List of mixed matchers
-            if not self.url_matcher:  # Empty list
-                return False
-                
-            results = []
-            for matcher in self.url_matcher:
-                if callable(matcher):
-                    results.append(matcher(url))
-                elif isinstance(matcher, str):
-                    from fnmatch import fnmatch
-                    results.append(fnmatch(url, matcher))
-                else:
-                    # Skip invalid matchers
-                    continue
-            
-            # Apply match mode logic
-            if self.match_mode == MatchMode.OR:
-                return any(results) if results else False
-            else:  # AND mode
-                return all(results) if results else False
-        
-        return False


    def __getattr__(self, name):
@@ -1515,9 +1443,6 @@ class CrawlerRunConfig():
            # Link Extraction Parameters
            link_preview_config=kwargs.get("link_preview_config"),
            url=kwargs.get("url"),
-            # URL Matching Parameters
-            url_matcher=kwargs.get("url_matcher"),
-            match_mode=kwargs.get("match_mode", MatchMode.OR),
            # Experimental Parameters 
            experimental=kwargs.get("experimental"),
        )
@@ -1615,8 +1540,6 @@ class CrawlerRunConfig():
            "deep_crawl_strategy": self.deep_crawl_strategy,
            "link_preview_config": self.link_preview_config.to_dict() if self.link_preview_config else None,
            "url": self.url,
-            "url_matcher": self.url_matcher,
-            "match_mode": self.match_mode,
            "experimental": self.experimental,
        }

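For reference, the URL-matching behaviour this change removes from `CrawlerRunConfig` (`is_match`) and from the dispatchers (`select_config`) can be reproduced outside the library. The sketch below is a standalone approximation, not crawl4ai API: `RunConfig` is a hypothetical stand-in for `CrawlerRunConfig`, and `MatchMode` is redefined locally. It follows the removed logic: no matcher matches every URL, a string is an `fnmatch` glob, a callable is a predicate, a list combines its results with OR/AND, and the first matching config wins (`None` means "skip this URL").

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatch
from typing import Callable, List, Optional, Sequence, Union

# A matcher is either a glob pattern or a predicate over the URL.
Matcher = Union[str, Callable[[str], bool]]


class MatchMode(Enum):
    """Local stand-in for the removed MatchMode (OR/AND)."""
    OR = "or"
    AND = "and"


@dataclass
class RunConfig:
    """Hypothetical stand-in for CrawlerRunConfig; only matching fields are modeled."""
    name: str
    url_matcher: Optional[Union[Matcher, List[Matcher]]] = None
    match_mode: MatchMode = MatchMode.OR

    def is_match(self, url: str) -> bool:
        if self.url_matcher is None:
            return True  # no matcher: applies to every URL
        if isinstance(self.url_matcher, str):
            return fnmatch(url, self.url_matcher)
        if callable(self.url_matcher):
            return bool(self.url_matcher(url))
        if isinstance(self.url_matcher, list):
            results = []
            for matcher in self.url_matcher:
                if isinstance(matcher, str):
                    results.append(fnmatch(url, matcher))
                elif callable(matcher):
                    results.append(bool(matcher(url)))
                # Other matcher types are skipped, as in the removed code.
            if not results:
                return False  # an empty or all-invalid list never matches
            if self.match_mode is MatchMode.OR:
                return any(results)
            return all(results)
        return False


def select_config(url: str, configs: Union[RunConfig, Sequence[RunConfig]]) -> Optional[RunConfig]:
    """Return the first matching config, or None if the URL should be skipped."""
    if isinstance(configs, RunConfig):
        return configs
    for config in configs:
        if config.is_match(url):
            return config
    return None


configs = [
    RunConfig("pdf", url_matcher="*.pdf"),
    RunConfig("api", url_matcher=["*/api/*", lambda u: u.startswith("https://")],
              match_mode=MatchMode.AND),
    RunConfig("default"),  # catch-all goes last
]
print(select_config("https://example.com/report.pdf", configs).name)  # pdf
print(select_config("https://example.com/api/v1", configs).name)      # api
print(select_config("http://example.com/api/v1", configs).name)       # default
```

Ordering matters in this sketch: the catch-all `RunConfig("default")` must come last, or it would shadow the more specific entries. `fnmatch` also case-normalizes both arguments with `os.path.normcase`, so on Windows the globs match case-insensitively.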
--- a/crawl4ai/async_crawler_strategy.back.py
+++ b/crawl4ai/async_crawler_strategy.back.py
--- a/crawl4ai/async_crawler_strategy.py
+++ b/crawl4ai/async_crawler_strategy.py
@@ -21,7 +21,6 @@ from .async_logger import AsyncLogger
 from .ssl_certificate import SSLCertificate
 from .user_agent_generator import ValidUAGenerator
 from .browser_manager import BrowserManager
-from .browser_adapter import BrowserAdapter, PlaywrightAdapter, UndetectedAdapter

 import aiofiles
 import aiohttp
@@ -72,7 +71,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
    """

    def __init__(
-        self, browser_config: BrowserConfig = None, logger: AsyncLogger = None, browser_adapter: BrowserAdapter = None, **kwargs
+        self, browser_config: BrowserConfig = None, logger: AsyncLogger = None, **kwargs
    ):
        """
        Initialize the AsyncPlaywrightCrawlerStrategy with a browser configuration.
@@ -81,16 +80,11 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
            browser_config (BrowserConfig): Configuration object containing browser settings.
                                          If None, will be created from kwargs for backwards compatibility.
            logger: Logger instance for recording events and errors.
-            browser_adapter (BrowserAdapter): Browser adapter for handling browser-specific operations.
-                                           If None, defaults to PlaywrightAdapter.
            **kwargs: Additional arguments for backwards compatibility and extending functionality.
        """
        # Initialize browser config, either from provided object or kwargs
        self.browser_config = browser_config or BrowserConfig.from_kwargs(kwargs)
        self.logger = logger
-        
-        # Initialize browser adapter
-        self.adapter = browser_adapter or PlaywrightAdapter()

        # Initialize session management
        self._downloaded_files = []
@@ -110,9 +104,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):

        # Initialize browser manager with config
        self.browser_manager = BrowserManager(
-            browser_config=self.browser_config, 
-            logger=self.logger,
-            use_undetected=isinstance(self.adapter, UndetectedAdapter)
+            browser_config=self.browser_config, logger=self.logger
        )

    async def __aenter__(self):
@@ -330,7 +322,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
        """

        try:
-            result = await self.adapter.evaluate(page, wrapper_js)
+            result = await page.evaluate(wrapper_js)
            return result
        except Exception as e:
            if "Error evaluating condition" in str(e):
@@ -375,7 +367,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):

                    # Replace the iframe with a div containing the extracted content
                    _iframe = iframe_content.replace("`", "\\`")
-                    await self.adapter.evaluate(page,
+                    await page.evaluate(
                        f"""
                        () => {{
                            const iframe = document.getElementById('iframe-{i}');
@@ -636,16 +628,91 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
            page.on("requestfailed", handle_request_failed_capture)

        # Console Message Capturing
-        handle_console = None
-        handle_error = None
        if config.capture_console_messages:
-            # Set up console capture using adapter
-            handle_console = await self.adapter.setup_console_capture(page, captured_console)
-            handle_error = await self.adapter.setup_error_capture(page, captured_console)
+            def handle_console_capture(msg):
+                try:
+                    message_type = "unknown"
+                    try:
+                        message_type = msg.type
+                    except Exception:
+                        pass
+
+                    message_text = "unknown"
+                    try:
+                        message_text = msg.text
+                    except Exception:
+                        pass
+                        
+                    # Basic console message with minimal content
+                    entry = {
+                        "type": message_type,
+                        "text": message_text,
+                        "timestamp": time.time()
+                    }
+                    
+                    captured_console.append(entry)
+                    
+                except Exception as e:
+                    if self.logger:
+                        self.logger.warning(f"Error capturing console message: {e}", tag="CAPTURE")
+                    # Still add something to the list even on error
+                    captured_console.append({
+                        "type": "console_capture_error", 
+                        "error": str(e), 
+                        "timestamp": time.time()
+                    })
+
+            def handle_pageerror_capture(err):
+                try:
+                    error_message = "Unknown error"
+                    try:
+                        error_message = err.message
+                    except Exception:
+                        pass
+
+                    error_stack = ""
+                    try:
+                        error_stack = err.stack
+                    except Exception:
+                        pass
+                        
+                    captured_console.append({
+                        "type": "error",
+                        "text": error_message,
+                        "stack": error_stack,
+                        "timestamp": time.time()
+                    })
+                except Exception as e:
+                    if self.logger:
+                        self.logger.warning(f"Error capturing page error: {e}", tag="CAPTURE")
+                    captured_console.append({
+                        "type": "pageerror_capture_error", 
+                        "error": str(e), 
+                        "timestamp": time.time()
+                    })
+
+            # Add event listeners directly
+            page.on("console", handle_console_capture)
+            page.on("pageerror", handle_pageerror_capture)

        # Set up console logging if requested
-        # Note: For undetected browsers, console logging won't work directly
-        # but captured messages can still be logged after retrieval
+        if config.log_console:
+            def log_console(
+                msg, console_log_type="debug"
+            ):
+                if console_log_type == "error":
+                    self.logger.error(
+                        message=f"Console error: {msg}",
+                        tag="CONSOLE"
+                    )
+                elif console_log_type == "debug":
+                    self.logger.debug(
+                        message=f"Console: {msg}",
+                        tag="CONSOLE"
+                    )
+
+            page.on("console", log_console)
+            page.on("pageerror", lambda e: log_console(e, "error"))

        try:
            # Get SSL certificate information if requested and URL is HTTPS
@@ -757,7 +824,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
            except Error:
                visibility_info = await self.check_visibility(page)

                if self.browser_config.verbose:
                    self.logger.debug(
                        message="Body visibility info: {info}",
                        tag="DEBUG",
@@ -931,7 +998,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
                        await page.wait_for_load_state("domcontentloaded", timeout=5)
                    except PlaywrightTimeoutError:
                        pass
-                    await self.adapter.evaluate(page, update_image_dimensions_js)
+                    await page.evaluate(update_image_dimensions_js)
                except Exception as e:
                    self.logger.error(
                        message="Error updating image dimensions: {error}",
@@ -960,7 +1027,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
                    
                    for selector in selectors:
                        try:
-                            content = await self.adapter.evaluate(page,
+                            content = await page.evaluate(
                                f"""Array.from(document.querySelectorAll("{selector}"))
                                    .map(el => el.outerHTML)
                                    .join('')"""
@@ -1018,11 +1085,6 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
                await asyncio.sleep(delay)
                return await page.content()

-            # For undetected browsers, retrieve console messages before returning
-            if config.capture_console_messages and hasattr(self.adapter, 'retrieve_console_messages'):
-                final_messages = await self.adapter.retrieve_console_messages(page)
-                captured_console.extend(final_messages)
-
            # Return complete response
            return AsyncCrawlResponse(
                html=html,
@@ -1061,13 +1123,8 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
                    page.remove_listener("response", handle_response_capture)
                    page.remove_listener("requestfailed", handle_request_failed_capture)
                if config.capture_console_messages:
-                    # Retrieve any final console messages for undetected browsers
-                    if hasattr(self.adapter, 'retrieve_console_messages'):
-                        final_messages = await self.adapter.retrieve_console_messages(page)
-                        captured_console.extend(final_messages)
-                    
-                    # Clean up console capture
-                    await self.adapter.cleanup_console_capture(page, handle_console, handle_error)
+                    page.remove_listener("console", handle_console_capture)
+                    page.remove_listener("pageerror", handle_pageerror_capture)
                
                # Close the page
                await page.close()
@@ -1297,7 +1354,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
            """
            
            # Execute virtual scroll capture
-            result = await self.adapter.evaluate(page, virtual_scroll_js, config.to_dict())
+            result = await page.evaluate(virtual_scroll_js, config.to_dict())
            
            if result.get("replaced", False):
                self.logger.success(
@@ -1381,7 +1438,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
        remove_overlays_js = load_js_script("remove_overlay_elements")

        try:
-            await self.adapter.evaluate(page,
+            await page.evaluate(
                f"""
                (() => {{
                    try {{
@@ -1786,7 +1843,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
                        # When {script} contains statements (e.g., const link = …; link.click();), 
                        # this forms invalid JavaScript, causing Playwright execution error: SyntaxError: Unexpected token 'const'.
                        # """
-                        result = await self.adapter.evaluate(page,
+                        result = await page.evaluate(
                            f"""
                        (async () => {{
                            try {{
@@ -1908,7 +1965,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
            for script in scripts:
                try:
                    # Execute the script and wait for network idle
-                    result = await self.adapter.evaluate(page,
+                    result = await page.evaluate(
                        f"""
                        (() => {{
                            return new Promise((resolve) => {{
@@ -1992,7 +2049,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
        Returns:
            Boolean indicating visibility
        """
-        return await self.adapter.evaluate(page,
+        return await page.evaluate(
            """
            () => {
                const element = document.body;
@@ -2033,7 +2090,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
            Dict containing scroll status and position information
        """
        try:
-            result = await self.adapter.evaluate(page,
+            result = await page.evaluate(
                f"""() => {{
                    try {{
                        const startX = window.scrollX;
@@ -2090,7 +2147,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
        Returns:
            Dict containing width and height of the page
        """
-        return await self.adapter.evaluate(page,
+        return await page.evaluate(
            """
            () => {
                const {scrollWidth, scrollHeight} = document.documentElement;
@@ -2110,7 +2167,7 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
            bool: True if page needs scrolling
        """
        try:
-            need_scroll = await self.adapter.evaluate(page,
+            need_scroll = await page.evaluate(
                """
            () => {
                const scrollHeight = document.documentElement.scrollHeight;
@@ -2129,3 +2186,265 @@ class AsyncPlaywrightCrawlerStrategy(AsyncCrawlerStrategy):
            return True  # Default to scrolling if check fails


+####################################################################################################
+# HTTP Crawler Strategy
+####################################################################################################
+
+class HTTPCrawlerError(Exception):
+    """Base error class for HTTP crawler specific exceptions"""
+    pass
+
+
+class ConnectionTimeoutError(HTTPCrawlerError):
+    """Raised when connection timeout occurs"""
+    pass
+
+
+class HTTPStatusError(HTTPCrawlerError):
+    """Raised for unexpected status codes"""
+    def __init__(self, status_code: int, message: str):
+        self.status_code = status_code
+        super().__init__(f"HTTP {status_code}: {message}")
+
+
+class AsyncHTTPCrawlerStrategy(AsyncCrawlerStrategy):
+    """
+    Fast, lightweight HTTP-only crawler strategy optimized for memory efficiency.
+    """
+    
+    __slots__ = ('logger', 'max_connections', 'dns_cache_ttl', 'chunk_size', '_session', 'hooks', 'browser_config')
+
+    DEFAULT_TIMEOUT: Final[int] = 30
+    DEFAULT_CHUNK_SIZE: Final[int] = 64 * 1024  
+    DEFAULT_MAX_CONNECTIONS: Final[int] = min(32, (os.cpu_count() or 1) * 4)
+    DEFAULT_DNS_CACHE_TTL: Final[int] = 300
+    VALID_SCHEMES: Final = frozenset({'http', 'https', 'file', 'raw'})
+
+    _BASE_HEADERS: Final = MappingProxyType({
+        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
+        'Accept-Language': 'en-US,en;q=0.5',
+        'Accept-Encoding': 'gzip, deflate, br',
+        'Connection': 'keep-alive',
+        'Upgrade-Insecure-Requests': '1',
+        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
+    })
+    
+    def __init__(
+        self, 
+        browser_config: Optional[HTTPCrawlerConfig] = None,
+        logger: Optional[AsyncLogger] = None,
+        max_connections: int = DEFAULT_MAX_CONNECTIONS,
+        dns_cache_ttl: int = DEFAULT_DNS_CACHE_TTL,
+        chunk_size: int = DEFAULT_CHUNK_SIZE
+    ):
+        """Initialize the HTTP crawler with config"""
+        self.browser_config = browser_config or HTTPCrawlerConfig()
+        self.logger = logger
+        self.max_connections = max_connections
+        self.dns_cache_ttl = dns_cache_ttl
+        self.chunk_size = chunk_size
+        self._session: Optional[aiohttp.ClientSession] = None
+        
+        self.hooks = {
+            k: partial(self._execute_hook, k) 
+            for k in ('before_request', 'after_request', 'on_error')
+        }
+
+        # Set default hooks
+        self.set_hook('before_request', lambda *args, **kwargs: None)
+        self.set_hook('after_request', lambda *args, **kwargs: None)
+        self.set_hook('on_error', lambda *args, **kwargs: None)
+                      
+
+    async def __aenter__(self) -> AsyncHTTPCrawlerStrategy:
+        await self.start()
+        return self
+        
+    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
+        await self.close()
+
+    @contextlib.asynccontextmanager
+    async def _session_context(self):
+        try:
+            if not self._session:
+                await self.start()
+            yield self._session
+        finally:
+            pass
+
+    def set_hook(self, hook_type: str, hook_func: Callable) -> None:
+        if hook_type in self.hooks:
+            self.hooks[hook_type] = partial(self._execute_hook, hook_type, hook_func)
+        else:
+            raise ValueError(f"Invalid hook type: {hook_type}")
+
+    async def _execute_hook(
+        self, 
+        hook_type: str, 
+        hook_func: Callable,
+        *args: Any, 
+        **kwargs: Any
+    ) -> Any:
+        if asyncio.iscoroutinefunction(hook_func):
+            return await hook_func(*args, **kwargs)
+        return hook_func(*args, **kwargs)
+
+    async def start(self) -> None:
+        if not self._session:
+            connector = aiohttp.TCPConnector(
+                limit=self.max_connections,
+                ttl_dns_cache=self.dns_cache_ttl,
+                use_dns_cache=True,
+                force_close=False
+            )
+            self._session = aiohttp.ClientSession(
+                headers=dict(self._BASE_HEADERS),
+                connector=connector,
+                timeout=ClientTimeout(total=self.DEFAULT_TIMEOUT)
+            )
+
+    async def close(self) -> None:
+        if self._session and not self._session.closed:
+            try:
+                await asyncio.wait_for(self._session.close(), timeout=5.0)
+            except asyncio.TimeoutError:
+                if self.logger:
+                    self.logger.warning(
+                        message="Session cleanup timed out",
+                        tag="CLEANUP"
+                    )
+            finally:
+                self._session = None
+
+    async def _stream_file(self, path: str) -> AsyncGenerator[memoryview, None]:
+        async with aiofiles.open(path, mode='rb') as f:
+            while chunk := await f.read(self.chunk_size):
+                yield memoryview(chunk)
+
+    async def _handle_file(self, path: str) -> AsyncCrawlResponse:
+        if not os.path.exists(path):
+            raise FileNotFoundError(f"Local file not found: {path}")
+            
+        chunks = []
+        async for chunk in self._stream_file(path):
+            chunks.append(chunk.tobytes().decode('utf-8', errors='replace'))
+            
+        return AsyncCrawlResponse(
+            html=''.join(chunks),
+            response_headers={},
+            status_code=200
+        )
+
+    async def _handle_raw(self, content: str) -> AsyncCrawlResponse:
+        return AsyncCrawlResponse(
+            html=content,
+            response_headers={},
+            status_code=200
+        )
+
+
+    async def _handle_http(
+        self, 
+        url: str, 
+        config: CrawlerRunConfig
+    ) -> AsyncCrawlResponse:
+        async with self._session_context() as session:
+            timeout = ClientTimeout(
+                total=config.page_timeout or self.DEFAULT_TIMEOUT,
+                connect=10,
+                sock_read=30
+            )
+            
+            headers = dict(self._BASE_HEADERS)
+            if self.browser_config.headers:
+                headers.update(self.browser_config.headers)
+
+            request_kwargs = {
+                'timeout': timeout,
+                'allow_redirects': self.browser_config.follow_redirects,
+                'ssl': self.browser_config.verify_ssl,
+                'headers': headers
+            }
+
+            if self.browser_config.method == "POST":
+                if self.browser_config.data:
+                    request_kwargs['data'] = self.browser_config.data
+                if self.browser_config.json:
+                    request_kwargs['json'] = self.browser_config.json
+
+            await self.hooks['before_request'](url, request_kwargs)
+
+            try:
+                async with session.request(self.browser_config.method, url, **request_kwargs) as response:
+                    content = memoryview(await response.read())
+                    
+                    if not (200 <= response.status < 300):
+                        raise HTTPStatusError(
+                            response.status,
+                            f"Unexpected status code for {url}"
+                        )
+                    
+                    encoding = response.charset
+                    if not encoding:
+                        encoding = chardet.detect(content.tobytes())['encoding'] or 'utf-8'                    
+                    
+                    result = AsyncCrawlResponse(
+                        html=content.tobytes().decode(encoding, errors='replace'),
+                        response_headers=dict(response.headers),
+                        status_code=response.status,
+                        redirected_url=str(response.url)
+                    )
+                    
+                    await self.hooks['after_request'](result)
+                    return result
+
+            except aiohttp.ServerTimeoutError as e:
+                await self.hooks['on_error'](e)
+                raise ConnectionTimeoutError(f"Request timed out: {str(e)}")
+                
+            except aiohttp.ClientConnectorError as e:
+                await self.hooks['on_error'](e)
+                raise ConnectionError(f"Connection failed: {str(e)}")
+                
+            except aiohttp.ClientError as e:
+                await self.hooks['on_error'](e)
+                raise HTTPCrawlerError(f"HTTP client error: {str(e)}")
+            
+            except asyncio.exceptions.TimeoutError as e:
+                await self.hooks['on_error'](e)
+                raise ConnectionTimeoutError(f"Request timed out: {str(e)}")
+            
+            except Exception as e:
+                await self.hooks['on_error'](e)
+                raise e if isinstance(e, HTTPStatusError) else HTTPCrawlerError(f"HTTP request failed: {str(e)}")
+
+    async def crawl(
+        self, 
+        url: str, 
+        config: Optional[CrawlerRunConfig] = None, 
+        **kwargs
+    ) -> AsyncCrawlResponse:
+        config = config or CrawlerRunConfig.from_kwargs(kwargs)
+        
+        parsed = urlparse(url)
+        scheme = parsed.scheme  # urlparse already lowercases the scheme
+        
+        if scheme not in self.VALID_SCHEMES:
+            raise ValueError(f"Unsupported URL scheme: {scheme}")
+            
+        try:
+            if scheme == 'file':
+                return await self._handle_file(parsed.path)
+            elif scheme == 'raw':
+                return await self._handle_raw(url[4:])  # parsed.path would drop anything after "?" or "#"
+            else:  # http or https
+                return await self._handle_http(url, config)
+                
+        except Exception as e:
+            if self.logger:
+                self.logger.error(
+                    message="Crawl failed: {error}",
+                    tag="CRAWL",
+                    params={"error": str(e), "url": url}
+                )
+            raise
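The hook plumbing and scheme routing in `AsyncHTTPCrawlerStrategy` can be exercised without aiohttp. The sketch below is a hypothetical, self-contained reduction (`HookRegistry`, `route` and `demo` are illustrative names, not part of crawl4ai). `set_hook` stores a `functools.partial` bound to an async executor, so sync and async hooks are both awaited the same way; `route` validates the scheme and picks a handler. It slices off the `raw:` prefix instead of using `parsed.path`, because `urlparse` moves anything after `?` or `#` into the query and fragment. `inspect.iscoroutinefunction` is used here for the same check the diff does with `asyncio.iscoroutinefunction`.

```python
import asyncio
import inspect
from functools import partial
from typing import Any, Callable, Dict, Tuple
from urllib.parse import urlparse

VALID_SCHEMES = frozenset({"http", "https", "file", "raw"})


class HookRegistry:
    """Hypothetical reduction of the set_hook/_execute_hook pattern."""

    HOOK_TYPES = ("before_request", "after_request", "on_error")

    def __init__(self) -> None:
        self.hooks: Dict[str, Callable[..., Any]] = {}
        # Default every hook to a no-op so callers can always await it.
        for hook_type in self.HOOK_TYPES:
            self.set_hook(hook_type, lambda *args, **kwargs: None)

    def set_hook(self, hook_type: str, hook_func: Callable[..., Any]) -> None:
        if hook_type not in self.HOOK_TYPES:
            raise ValueError(f"Invalid hook type: {hook_type}")
        # Calling the partial returns a coroutine, whatever kind hook_func is.
        self.hooks[hook_type] = partial(self._execute_hook, hook_type, hook_func)

    async def _execute_hook(self, hook_type: str, hook_func: Callable[..., Any],
                            *args: Any, **kwargs: Any) -> Any:
        # Await async hooks; call sync hooks directly.
        if inspect.iscoroutinefunction(hook_func):
            return await hook_func(*args, **kwargs)
        return hook_func(*args, **kwargs)


def route(url: str) -> Tuple[str, str]:
    """Return (handler, payload) for a URL, mirroring crawl()'s dispatch."""
    parsed = urlparse(url)
    scheme = parsed.scheme  # urlparse already lowercases the scheme
    if scheme not in VALID_SCHEMES:
        raise ValueError(f"Unsupported URL scheme: {scheme}")
    if scheme == "file":
        return ("file", parsed.path)
    if scheme == "raw":
        # Slice the prefix: parsed.path would lose text after "?" or "#".
        return ("raw", url[len("raw:"):])
    return ("http", url)


async def demo() -> list:
    seen = []
    registry = HookRegistry()
    registry.set_hook("before_request", lambda url, kwargs: seen.append(("sync", url)))

    async def after_request(result: str) -> None:
        seen.append(("async", result))

    registry.set_hook("after_request", after_request)
    await registry.hooks["before_request"]("https://example.com", {})
    await registry.hooks["after_request"]("ok")
    await registry.hooks["on_error"](RuntimeError("ignored by the default hook"))
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [('sync', 'https://example.com'), ('async', 'ok')]
    print(route("raw:<p>a?b#c</p>"))  # ('raw', '<p>a?b#c</p>')
```

Wrapping every hook in an async executor keeps call sites uniform (`await self.hooks[...](...)`) at the cost of one coroutine per call, which is negligible next to a network request.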
--- a/crawl4ai/async_dispatcher.py
+++ b/crawl4ai/async_dispatcher.py
@@ -1,4 +1,4 @@
-from typing import Dict, Optional, List, Tuple, Union
+from typing import Dict, Optional, List, Tuple
 from .async_configs import CrawlerRunConfig
 from .models import (
    CrawlResult,
@@ -22,8 +22,6 @@ from urllib.parse import urlparse
 import random
 from abc import ABC, abstractmethod

-from .memory_utils import get_true_memory_usage_percent
-

 class RateLimiter:
    def __init__(
@@ -98,37 +96,11 @@ class BaseDispatcher(ABC):
        self.rate_limiter = rate_limiter
        self.monitor = monitor

-    def select_config(self, url: str, configs: Union[CrawlerRunConfig, List[CrawlerRunConfig]]) -> Optional[CrawlerRunConfig]:
-        """Select the appropriate config for a given URL.
-        
-        Args:
-            url: The URL to match against
-            configs: Single config or list of configs to choose from
-            
-        Returns:
-            The matching config, or None if no match found
-        """
-        # Single config - return as is
-        if isinstance(configs, CrawlerRunConfig):
-            return configs
-        
-        # Empty list - return None
-        if not configs:
-            return None
-        
-        # Find first matching config
-        for config in configs:
-            if config.is_match(url):
-                return config
-        
-        # No match found - return None to indicate URL should be skipped
-        return None
-
    @abstractmethod
    async def crawl_url(
        self,
        url: str,
-        config: Union[CrawlerRunConfig, List[CrawlerRunConfig]],
+        config: CrawlerRunConfig,
        task_id: str,
        monitor: Optional[CrawlerMonitor] = None,
    ) -> CrawlerTaskResult:
@@ -139,7 +111,7 @@ class BaseDispatcher(ABC):
        self,
        urls: List[str],
        crawler: AsyncWebCrawler,  # noqa: F821
-        config: Union[CrawlerRunConfig, List[CrawlerRunConfig]],
+        config: CrawlerRunConfig,
        monitor: Optional[CrawlerMonitor] = None,
    ) -> List[CrawlerTaskResult]:
        pass
@@ -175,7 +147,7 @@ class MemoryAdaptiveDispatcher(BaseDispatcher):
    async def _memory_monitor_task(self):
        """Background task to continuously monitor memory usage and update state"""
        while True:
-            self.current_memory_percent = get_true_memory_usage_percent()
+            self.current_memory_percent = psutil.virtual_memory().percent

            # Enter memory pressure mode if we cross the threshold
            if self.current_memory_percent >= self.memory_threshold_percent:
@@ -228,7 +200,7 @@ class MemoryAdaptiveDispatcher(BaseDispatcher):
    async def crawl_url(
        self,
        url: str,
-        config: Union[CrawlerRunConfig, List[CrawlerRunConfig]],
+        config: CrawlerRunConfig,
        task_id: str,
        retry_count: int = 0,
    ) -> CrawlerTaskResult:
@@ -236,37 +208,6 @@ class MemoryAdaptiveDispatcher(BaseDispatcher):
        error_message = ""
        memory_usage = peak_memory = 0.0
        
-        # Select appropriate config for this URL
-        selected_config = self.select_config(url, config)
-        
-        # If no config matches, return failed result
-        if selected_config is None:
-            error_message = f"No matching configuration found for URL: {url}"
-            if self.monitor:
-                self.monitor.update_task(
-                    task_id, 
-                    status=CrawlStatus.FAILED,
-                    error_message=error_message
-                )
-            
-            return CrawlerTaskResult(
-                task_id=task_id,
-                url=url,
-                result=CrawlResult(
-                    url=url, 
-                    html="", 
-                    metadata={"status": "no_config_match"}, 
-                    success=False, 
-                    error_message=error_message
-                ),
-                memory_usage=0,
-                peak_memory=0,
-                start_time=start_time,
-                end_time=time.time(),
-                error_message=error_message,
-                retry_count=retry_count
-            )
-        
        # Get starting memory for accurate measurement
        process = psutil.Process()
        start_memory = process.memory_info().rss / (1024 * 1024)
@@ -316,8 +257,8 @@ class MemoryAdaptiveDispatcher(BaseDispatcher):
                    retry_count=retry_count + 1
                )
            
-            # Execute the crawl with selected config
-            result = await self.crawler.arun(url, config=selected_config, session_id=task_id)
+            # Execute the crawl
+            result = await self.crawler.arun(url, config=config, session_id=task_id)
            
            # Measure memory usage
            end_memory = process.memory_info().rss / (1024 * 1024)
@@ -375,7 +316,7 @@ class MemoryAdaptiveDispatcher(BaseDispatcher):
        self,
        urls: List[str],
        crawler: AsyncWebCrawler,
-        config: Union[CrawlerRunConfig, List[CrawlerRunConfig]],
+        config: CrawlerRunConfig,
    ) -> List[CrawlerTaskResult]:
        self.crawler = crawler
        
@@ -529,7 +470,7 @@ class MemoryAdaptiveDispatcher(BaseDispatcher):
        self,
        urls: List[str],
        crawler: AsyncWebCrawler,
-        config: Union[CrawlerRunConfig, List[CrawlerRunConfig]],
+        config: CrawlerRunConfig,
    ) -> AsyncGenerator[CrawlerTaskResult, None]:
        self.crawler = crawler
        
@@ -631,7 +572,7 @@ class SemaphoreDispatcher(BaseDispatcher):
    async def crawl_url(
        self,
        url: str,
-        config: Union[CrawlerRunConfig, List[CrawlerRunConfig]],
+        config: CrawlerRunConfig,
        task_id: str,
        semaphore: asyncio.Semaphore = None,
    ) -> CrawlerTaskResult:
@@ -639,36 +580,6 @@ class SemaphoreDispatcher(BaseDispatcher):
        error_message = ""
        memory_usage = peak_memory = 0.0

-        # Select appropriate config for this URL
-        selected_config = self.select_config(url, config)
-        
-        # If no config matches, return failed result
-        if selected_config is None:
-            error_message = f"No matching configuration found for URL: {url}"
-            if self.monitor:
-                self.monitor.update_task(
-                    task_id, 
-                    status=CrawlStatus.FAILED,
-                    error_message=error_message
-                )
-            
-            return CrawlerTaskResult(
-                task_id=task_id,
-                url=url,
-                result=CrawlResult(
-                    url=url, 
-                    html="", 
-                    metadata={"status": "no_config_match"}, 
-                    success=False, 
-                    error_message=error_message
-                ),
-                memory_usage=0,
-                peak_memory=0,
-                start_time=start_time,
-                end_time=time.time(),
-                error_message=error_message
-            )
-
        try:
            if self.monitor:
                self.monitor.update_task(
@@ -681,7 +592,7 @@ class SemaphoreDispatcher(BaseDispatcher):
            async with semaphore:
                process = psutil.Process()
                start_memory = process.memory_info().rss / (1024 * 1024)
-                result = await self.crawler.arun(url, config=selected_config, session_id=task_id)
+                result = await self.crawler.arun(url, config=config, session_id=task_id)
                end_memory = process.memory_info().rss / (1024 * 1024)

                memory_usage = peak_memory = end_memory - start_memory
@@ -743,7 +654,7 @@ class SemaphoreDispatcher(BaseDispatcher):
        self,
        crawler: AsyncWebCrawler,  # noqa: F821
        urls: List[str],
-        config: Union[CrawlerRunConfig, List[CrawlerRunConfig]],
+        config: CrawlerRunConfig,
    ) -> List[CrawlerTaskResult]:
        self.crawler = crawler
        if self.monitor:
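With this change the dispatchers and `arun_many` accept only a single `CrawlerRunConfig`, so the per-URL `select_config` step and its `no_config_match` failure result are gone. Callers that relied on a list of configs could pre-group URLs themselves and call `arun_many` once per group. The sketch below covers only the grouping step; `group_urls` and the glob-style patterns are hypothetical illustrations, not crawl4ai APIs, and plain strings stand in for real `CrawlerRunConfig` objects.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple, TypeVar

C = TypeVar("C")  # stands in for CrawlerRunConfig in this sketch


def group_urls(
    urls: List[str], rules: List[Tuple[str, C]]
) -> Tuple[Dict[int, List[str]], List[str]]:
    """Assign each URL to the first rule whose glob pattern matches it.

    Hypothetical helper. Groups are keyed by rule index rather than by the
    config object, since config objects are not guaranteed to be hashable.
    URLs that match no rule are returned separately (the removed dispatcher
    code reported those as 'no_config_match').
    """
    groups: Dict[int, List[str]] = {}
    unmatched: List[str] = []
    for url in urls:
        for idx, (pattern, _config) in enumerate(rules):
            # fnmatchcase is case-sensitive on every platform;
            # '*' also matches '/' characters.
            if fnmatchcase(url, pattern):
                groups.setdefault(idx, []).append(url)
                break  # first matching rule wins
        else:
            unmatched.append(url)
    return groups, unmatched
```

Each group could then be crawled with `await crawler.arun_many(group, config=rules[idx][1])`; what to do with the unmatched URLs (skip them, or crawl them with a default config) is left to the caller.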
--- a/crawl4ai/async_url_seeder.py
+++ b/crawl4ai/async_url_seeder.py
@@ -829,7 +829,7 @@ class AsyncUrlSeeder:

    async def _iter_sitemap(self, url: str):
        try:
-            r = await self.client.get(url, timeout=15, follow_redirects=True)
+            r = await self.client.get(url, timeout=15)
            r.raise_for_status()
        except httpx.HTTPStatusError as e:
            self._log("warning", "Failed to fetch sitemap {url}: HTTP {status_code}",
--- a/crawl4ai/async_webcrawler.py
+++ b/crawl4ai/async_webcrawler.py
@@ -47,6 +47,7 @@ from .utils import (
    get_error_context,
    RobotsParser,
    preprocess_html_for_schema,
+    should_crawl_based_on_head,
 )


@@ -268,31 +269,56 @@ class AsyncWebCrawler:
                    cached_result = await async_db_manager.aget_cached_url(url)

                if cached_result:
-                    html = sanitize_input_encode(cached_result.html)
-                    extracted_content = sanitize_input_encode(
-                        cached_result.extracted_content or ""
-                    )
-                    extracted_content = (
-                        None
-                        if not extracted_content or extracted_content == "[]"
-                        else extracted_content
-                    )
-                    # If screenshot is requested but its not in cache, then set cache_result to None
-                    screenshot_data = cached_result.screenshot
-                    pdf_data = cached_result.pdf
-                    # if config.screenshot and not screenshot or config.pdf and not pdf:
-                    if config.screenshot and not screenshot_data:
-                        cached_result = None
+                    # Check if SMART mode requires validation
+                    if cache_context.cache_mode == CacheMode.SMART:
+                        # Perform HEAD check to see if content has changed
+                        user_agent = getattr(self.crawler_strategy, "user_agent", "Mozilla/5.0")
+                        should_crawl, reason = await should_crawl_based_on_head(
+                            url=url,
+                            cached_headers=cached_result.response_headers or {},
+                            user_agent=user_agent,
+                            timeout=5
+                        )
+                        
+                        if should_crawl:
+                            self.logger.info(
+                                f"SMART cache: {reason} - Re-crawling {url}",
+                                tag="SMART"
+                            )
+                            cached_result = None  # Force re-crawl
+                        else:
+                            self.logger.info(
+                                f"SMART cache: {reason} - Using cache for {url}",
+                                tag="SMART"
+                            )
+                    
+                    # Process cached result if still valid
+                    if cached_result:
+                        html = sanitize_input_encode(cached_result.html)
+                        extracted_content = sanitize_input_encode(
+                            cached_result.extracted_content or ""
+                        )
+                        extracted_content = (
+                            None
+                            if not extracted_content or extracted_content == "[]"
+                            else extracted_content
+                        )
+                        # If a screenshot is requested but it's not in the cache, set cached_result to None
+                        screenshot_data = cached_result.screenshot
+                        pdf_data = cached_result.pdf
+                        # if config.screenshot and not screenshot or config.pdf and not pdf:
+                        if config.screenshot and not screenshot_data:
+                            cached_result = None

-                    if config.pdf and not pdf_data:
-                        cached_result = None
+                        if config.pdf and not pdf_data:
+                            cached_result = None

-                    self.logger.url_status(
-                        url=cache_context.display_url,
-                        success=bool(html),
-                        timing=time.perf_counter() - start_time,
-                        tag="FETCH",
-                    )
+                        self.logger.url_status(
+                            url=cache_context.display_url,
+                            success=bool(html),
+                            timing=time.perf_counter() - start_time,
+                            tag="FETCH",
+                        )

                # Update proxy configuration from rotation strategy if available
                if config and config.proxy_rotation_strategy:
@@ -502,12 +528,9 @@ class AsyncWebCrawler:
            metadata = result.get("metadata", {})
        else:
            cleaned_html = sanitize_input_encode(result.cleaned_html)
-            # media = result.media.model_dump()
-            # tables = media.pop("tables", [])
-            # links = result.links.model_dump()
-            media = result.media.model_dump() if hasattr(result.media, 'model_dump') else result.media
-            tables = media.pop("tables", []) if isinstance(media, dict) else []
-            links = result.links.model_dump() if hasattr(result.links, 'model_dump') else result.links
+            media = result.media.model_dump()
+            tables = media.pop("tables", [])
+            links = result.links.model_dump()
            metadata = result.metadata

        fit_html = preprocess_html_for_schema(html_content=html, text_threshold= 500, max_size= 300_000)
@@ -653,7 +676,7 @@ class AsyncWebCrawler:
    async def arun_many(
        self,
        urls: List[str],
-        config: Optional[Union[CrawlerRunConfig, List[CrawlerRunConfig]]] = None,
+        config: Optional[CrawlerRunConfig] = None,
        dispatcher: Optional[BaseDispatcher] = None,
        # Legacy parameters maintained for backwards compatibility
        # word_count_threshold=MIN_WORD_THRESHOLD,
@@ -674,9 +697,7 @@ class AsyncWebCrawler:

        Args:
        urls: List of URLs to crawl
-        config: Configuration object(s) controlling crawl behavior. Can be:
-            - Single CrawlerRunConfig: Used for all URLs
-            - List[CrawlerRunConfig]: Configs with url_matcher for URL-specific settings
+        config: Configuration object controlling crawl behavior for all URLs
        dispatcher: The dispatcher strategy instance to use. Defaults to MemoryAdaptiveDispatcher
        [other parameters maintained for backwards compatibility]

@@ -741,11 +762,7 @@ class AsyncWebCrawler:
                or task_result.result
            )

-        # Handle stream setting - use first config's stream setting if config is a list
-        if isinstance(config, list):
-            stream = config[0].stream if config else False
-        else:
-            stream = config.stream
+        stream = config.stream

        if stream:

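In `SMART` mode a cache hit is no longer used blindly. `arun` first calls `should_crawl_based_on_head` with the cached response headers, and if that returns `should_crawl=True` it discards the cached result and re-crawls. The helper's body is not part of this diff, so the sketch below is only an assumption about the kind of comparison such a check performs: it compares the standard HTTP validators `ETag` and `Last-Modified` between the cached headers and headers already obtained from a fresh HEAD request. `compare_head_headers` is hypothetical, and the network request itself is omitted.

```python
from typing import Dict, Tuple


def compare_head_headers(
    cached: Dict[str, str], fresh: Dict[str, str]
) -> Tuple[bool, str]:
    """Return (should_crawl, reason) by comparing HTTP cache validators.

    Hypothetical helper: it assumes ETag takes priority over Last-Modified.
    The real should_crawl_based_on_head may use different rules.
    """
    # HTTP header names are case-insensitive, so normalise keys first.
    old = {k.lower(): v for k, v in cached.items()}
    new = {k.lower(): v for k, v in fresh.items()}

    if "etag" in old and "etag" in new:
        if old["etag"] == new["etag"]:
            return False, "ETag unchanged"
        return True, "ETag changed"

    if "last-modified" in old and "last-modified" in new:
        if old["last-modified"] == new["last-modified"]:
            return False, "Last-Modified unchanged"
        return True, "Last-Modified changed"

    # No validator present on both sides: re-crawl to be safe.
    return True, "No validators to compare"
```

Falling back to a re-crawl when no validator can be compared trades extra fetches for freshness. A stricter policy could instead trust the cache in that case.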
--- a/crawl4ai/browser_adapter.py
+++ b/crawl4ai/browser_adapter.py
@@ -1,293 +0,0 @@
-# browser_adapter.py
-"""
-Browser adapter for Crawl4AI to support both Playwright and undetected browsers
-with minimal changes to existing codebase.
-"""
-
-from abc import ABC, abstractmethod
-from typing import List, Dict, Any, Optional, Callable
-import time
-import json
-
-# Import both, but use conditionally
-try:
-    from playwright.async_api import Page
-except ImportError:
-    Page = Any
-
-try:
-    from patchright.async_api import Page as UndetectedPage
-except ImportError:
-    UndetectedPage = Any
-
-
-class BrowserAdapter(ABC):
-    """Abstract adapter for browser-specific operations"""
-    
-    @abstractmethod
-    async def evaluate(self, page: Page, expression: str, arg: Any = None) -> Any:
-        """Execute JavaScript in the page"""
-        pass
-    
-    @abstractmethod
-    async def setup_console_capture(self, page: Page, captured_console: List[Dict]) -> Optional[Callable]:
-        """Setup console message capturing, returns handler function if needed"""
-        pass
-    
-    @abstractmethod
-    async def setup_error_capture(self, page: Page, captured_console: List[Dict]) -> Optional[Callable]:
-        """Setup error capturing, returns handler function if needed"""
-        pass
-    
-    @abstractmethod
-    async def retrieve_console_messages(self, page: Page) -> List[Dict]:
-        """Retrieve captured console messages (for undetected browsers)"""
-        pass
-    
-    @abstractmethod
-    async def cleanup_console_capture(self, page: Page, handle_console: Optional[Callable], handle_error: Optional[Callable]):
-        """Clean up console event listeners"""
-        pass
-    
-    @abstractmethod
-    def get_imports(self) -> tuple:
-        """Get the appropriate imports for this adapter"""
-        pass
-
-
-class PlaywrightAdapter(BrowserAdapter):
-    """Adapter for standard Playwright"""
-    
-    async def evaluate(self, page: Page, expression: str, arg: Any = None) -> Any:
-        """Standard Playwright evaluate"""
-        if arg is not None:
-            return await page.evaluate(expression, arg)
-        return await page.evaluate(expression)
-    
-    async def setup_console_capture(self, page: Page, captured_console: List[Dict]) -> Optional[Callable]:
-        """Setup console capture using Playwright's event system"""
-        def handle_console_capture(msg):
-            try:
-                message_type = "unknown"
-                try:
-                    message_type = msg.type
-                except:
-                    pass
-                    
-                message_text = "unknown"
-                try:
-                    message_text = msg.text
-                except:
-                    pass
-                    
-                entry = {
-                    "type": message_type,
-                    "text": message_text,
-                    "timestamp": time.time()
-                }
-                
-                captured_console.append(entry)
-                
-            except Exception as e:
-                captured_console.append({
-                    "type": "console_capture_error", 
-                    "error": str(e), 
-                    "timestamp": time.time()
-                })
-        
-        page.on("console", handle_console_capture)
-        return handle_console_capture
-    
-    async def setup_error_capture(self, page: Page, captured_console: List[Dict]) -> Optional[Callable]:
-        """Setup error capture using Playwright's event system"""
-        def handle_pageerror_capture(err):
-            try:
-                error_message = "Unknown error"
-                try:
-                    error_message = err.message
-                except:
-                    pass
-                    
-                error_stack = ""
-                try:
-                    error_stack = err.stack
-                except:
-                    pass
-                    
-                captured_console.append({
-                    "type": "error",
-                    "text": error_message,
-                    "stack": error_stack,
-                    "timestamp": time.time()
-                })
-            except Exception as e:
-                captured_console.append({
-                    "type": "pageerror_capture_error", 
-                    "error": str(e), 
-                    "timestamp": time.time()
-                })
-        
-        page.on("pageerror", handle_pageerror_capture)
-        return handle_pageerror_capture
-    
-    async def retrieve_console_messages(self, page: Page) -> List[Dict]:
-        """Not needed for Playwright - messages are captured via events"""
-        return []
-    
-    async def cleanup_console_capture(self, page: Page, handle_console: Optional[Callable], handle_error: Optional[Callable]):
-        """Remove event listeners"""
-        if handle_console:
-            page.remove_listener("console", handle_console)
-        if handle_error:
-            page.remove_listener("pageerror", handle_error)
-    
-    def get_imports(self) -> tuple:
-        """Return Playwright imports"""
-        from playwright.async_api import Page, Error
-        from playwright.async_api import TimeoutError as PlaywrightTimeoutError
-        return Page, Error, PlaywrightTimeoutError
-
-
-class UndetectedAdapter(BrowserAdapter):
-    """Adapter for undetected browser automation with stealth features"""
-    
-    def __init__(self):
-        self._console_script_injected = {}
-    
-    async def evaluate(self, page: UndetectedPage, expression: str, arg: Any = None) -> Any:
-        """Undetected browser evaluate with isolated context"""
-        # For most evaluations, use isolated context for stealth
-        # Only use non-isolated when we need to access our injected console capture
-        isolated = not (
-            "__console" in expression or 
-            "__captured" in expression or
-            "__error" in expression or
-            "window.__" in expression
-        )
-        
-        if arg is not None:
-            return await page.evaluate(expression, arg, isolated_context=isolated)
-        return await page.evaluate(expression, isolated_context=isolated)
-    
-    async def setup_console_capture(self, page: UndetectedPage, captured_console: List[Dict]) -> Optional[Callable]:
-        """Setup console capture using JavaScript injection for undetected browsers"""
-        if not self._console_script_injected.get(page, False):
-            await page.add_init_script("""
-                // Initialize console capture
-                window.__capturedConsole = [];
-                window.__capturedErrors = [];
-                
-                // Store original console methods
-                const originalConsole = {};
-                ['log', 'info', 'warn', 'error', 'debug'].forEach(method => {
-                    originalConsole[method] = console[method];
-                    console[method] = function(...args) {
-                        try {
-                            window.__capturedConsole.push({
-                                type: method,
-                                text: args.map(arg => {
-                                    try {
-                                        if (typeof arg === 'object') {
-                                            return JSON.stringify(arg);
-                                        }
-                                        return String(arg);
-                                    } catch (e) {
-                                        return '[Object]';
-                                    }
-                                }).join(' '),
-                                timestamp: Date.now()
-                            });
-                        } catch (e) {
-                            // Fail silently to avoid detection
-                        }
-                        
-                        // Call original method
-                        originalConsole[method].apply(console, args);
-                    };
-                });
-            """)
-            self._console_script_injected[page] = True
-        
-        return None  # No handler function needed for undetected browser
-    
-    async def setup_error_capture(self, page: UndetectedPage, captured_console: List[Dict]) -> Optional[Callable]:
-        """Setup error capture using JavaScript injection for undetected browsers"""
-        if not self._console_script_injected.get(page, False):
-            await page.add_init_script("""
-                // Capture errors
-                window.addEventListener('error', (event) => {
-                    try {
-                        window.__capturedErrors.push({
-                            type: 'error',
-                            text: event.message,
-                            stack: event.error ? event.error.stack : '',
-                            filename: event.filename,
-                            lineno: event.lineno,
-                            colno: event.colno,
-                            timestamp: Date.now()
-                        });
-                    } catch (e) {
-                        // Fail silently
-                    }
-                });
-                
-                // Capture unhandled promise rejections
-                window.addEventListener('unhandledrejection', (event) => {
-                    try {
-                        window.__capturedErrors.push({
-                            type: 'unhandledrejection',
-                            text: event.reason ? String(event.reason) : 'Unhandled Promise Rejection',
-                            stack: event.reason && event.reason.stack ? event.reason.stack : '',
-                            timestamp: Date.now()
-                        });
-                    } catch (e) {
-                        // Fail silently
-                    }
-                });
-            """)
-            self._console_script_injected[page] = True
-        
-        return None  # No handler function needed for undetected browser
-    
-    async def retrieve_console_messages(self, page: UndetectedPage) -> List[Dict]:
-        """Retrieve captured console messages and errors from the page"""
-        messages = []
-        
-        try:
-            # Get console messages
-            console_messages = await page.evaluate(
-                "() => { const msgs = window.__capturedConsole || []; window.__capturedConsole = []; return msgs; }",
-                isolated_context=False
-            )
-            messages.extend(console_messages)
-            
-            # Get errors
-            errors = await page.evaluate(
-                "() => { const errs = window.__capturedErrors || []; window.__capturedErrors = []; return errs; }",
-                isolated_context=False
-            )
-            messages.extend(errors)
-            
-            # Convert timestamps from JS to Python format
-            for msg in messages:
-                if 'timestamp' in msg and isinstance(msg['timestamp'], (int, float)):
-                    msg['timestamp'] = msg['timestamp'] / 1000.0  # Convert from ms to seconds
-                    
-        except Exception:
-            # If retrieval fails, return empty list
-            pass
-        
-        return messages
-    
-    async def cleanup_console_capture(self, page: UndetectedPage, handle_console: Optional[Callable], handle_error: Optional[Callable]):
-        """Clean up for undetected browser - retrieve final messages"""
-        # For undetected browser, we don't have event listeners to remove
-        # but we should retrieve any final messages
-        final_messages = await self.retrieve_console_messages(page)
-        return final_messages
-    
-    def get_imports(self) -> tuple:
-        """Return undetected browser imports"""
-        from patchright.async_api import Page, Error
-        from patchright.async_api import TimeoutError as PlaywrightTimeoutError
-        return Page, Error, PlaywrightTimeoutError
--- a/crawl4ai/browser_manager.py
+++ b/crawl4ai/browser_manager.py
@@ -573,26 +573,21 @@ class BrowserManager:
    _playwright_instance = None
    
    @classmethod
-    async def get_playwright(cls, use_undetected: bool = False):
-        if use_undetected:
-            from patchright.async_api import async_playwright
-        else:
-            from playwright.async_api import async_playwright
+    async def get_playwright(cls):
+        from playwright.async_api import async_playwright
        cls._playwright_instance = await async_playwright().start()
        return cls._playwright_instance    

-    def __init__(self, browser_config: BrowserConfig, logger=None, use_undetected: bool = False):
+    def __init__(self, browser_config: BrowserConfig, logger=None):
        """
        Initialize the BrowserManager with a browser configuration.

        Args:
            browser_config (BrowserConfig): Configuration object containing all browser settings
            logger: Logger instance for recording events and errors
-            use_undetected (bool): Whether to use undetected browser (Patchright)
        """
        self.config: BrowserConfig = browser_config
        self.logger = logger
-        self.use_undetected = use_undetected

        # Browser state
        self.browser = None
@@ -606,11 +601,7 @@ class BrowserManager:

        # Keep track of contexts by a "config signature," so each unique config reuses a single context
        self.contexts_by_config = {}
-        self._contexts_lock = asyncio.Lock()
-        
-        # Stealth-related attributes
-        self._stealth_instance = None
-        self._stealth_cm = None 
+        self._contexts_lock = asyncio.Lock()

        # Initialize ManagedBrowser if needed
        if self.config.use_managed_browser:
@@ -639,21 +630,9 @@ class BrowserManager:
        if self.playwright is not None:
            await self.close()
            
-        if self.use_undetected:
-            from patchright.async_api import async_playwright
-        else:
-            from playwright.async_api import async_playwright
+        from playwright.async_api import async_playwright

-        # Initialize playwright with or without stealth
-        if self.config.enable_stealth and not self.use_undetected:
-            # Import stealth only when needed
-            from playwright_stealth import Stealth
-            # Use the recommended stealth wrapper approach
-            self._stealth_instance = Stealth()
-            self._stealth_cm = self._stealth_instance.use_async(async_playwright())
-            self.playwright = await self._stealth_cm.__aenter__()
-        else:
-            self.playwright = await async_playwright().start()
+        self.playwright = await async_playwright().start()

        if self.config.cdp_url or self.config.use_managed_browser:
            self.config.use_managed_browser = True
@@ -1115,19 +1094,5 @@ class BrowserManager:
            self.managed_browser = None

        if self.playwright:
-            # Handle stealth context manager cleanup if it exists
-            if hasattr(self, '_stealth_cm') and self._stealth_cm is not None:
-                try:
-                    await self._stealth_cm.__aexit__(None, None, None)
-                except Exception as e:
-                    if self.logger:
-                        self.logger.error(
-                            message="Error closing stealth context: {error}",
-                            tag="ERROR", 
-                            params={"error": str(e)}
-                        )
-                self._stealth_cm = None
-                self._stealth_instance = None
-            else:
-                await self.playwright.stop()
+            await self.playwright.stop()
            self.playwright = None
--- a/crawl4ai/cache_context.py
+++ b/crawl4ai/cache_context.py
@@ -11,6 +11,7 @@ class CacheMode(Enum):
    - READ_ONLY: Only read from cache, don't write
    - WRITE_ONLY: Only write to cache, don't read
    - BYPASS: Bypass cache for this operation
+    - SMART: Validate the cached entry with a HEAD request before using it
    """

    ENABLED = "enabled"
@@ -18,6 +19,7 @@ class CacheMode(Enum):
    READ_ONLY = "read_only"
    WRITE_ONLY = "write_only"
    BYPASS = "bypass"
+    SMART = "smart"


 class CacheContext:
@@ -62,14 +64,14 @@ class CacheContext:

        How it works:
        1. If always_bypass is True or is_cacheable is False, return False.
-        2. If cache_mode is ENABLED or READ_ONLY, return True.
+        2. If cache_mode is ENABLED, READ_ONLY, or SMART, return True.

        Returns:
            bool: True if cache should be read, False otherwise.
        """
        if self.always_bypass or not self.is_cacheable:
            return False
-        return self.cache_mode in [CacheMode.ENABLED, CacheMode.READ_ONLY]
+        return self.cache_mode in [CacheMode.ENABLED, CacheMode.READ_ONLY, CacheMode.SMART]

    def should_write(self) -> bool:
        """
@@ -77,14 +79,14 @@ class CacheContext:

        How it works:
        1. If always_bypass is True or is_cacheable is False, return False.
-        2. If cache_mode is ENABLED or WRITE_ONLY, return True.
+        2. If cache_mode is ENABLED, WRITE_ONLY, or SMART, return True.

        Returns:
            bool: True if cache should be written, False otherwise.
        """
        if self.always_bypass or not self.is_cacheable:
            return False
-        return self.cache_mode in [CacheMode.ENABLED, CacheMode.WRITE_ONLY]
+        return self.cache_mode in [CacheMode.ENABLED, CacheMode.WRITE_ONLY, CacheMode.SMART]

    @property
    def display_url(self) -> str:
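After this change `SMART` behaves like `ENABLED` at the `CacheContext` level: it both reads from and writes to the cache. The extra HEAD validation happens in `AsyncWebCrawler.arun`, not here. The standalone sketch below restates `should_read` and `should_write` as free functions so the resulting table can be printed. It mirrors only the enum members visible in this diff and is not the library's own code.

```python
from enum import Enum


class CacheMode(Enum):
    # Partial local mirror of crawl4ai's CacheMode (only members shown in this diff).
    ENABLED = "enabled"
    READ_ONLY = "read_only"
    WRITE_ONLY = "write_only"
    BYPASS = "bypass"
    SMART = "smart"


def should_read(mode: CacheMode, always_bypass: bool = False, is_cacheable: bool = True) -> bool:
    # Same rule as CacheContext.should_read after this change.
    if always_bypass or not is_cacheable:
        return False
    return mode in (CacheMode.ENABLED, CacheMode.READ_ONLY, CacheMode.SMART)


def should_write(mode: CacheMode, always_bypass: bool = False, is_cacheable: bool = True) -> bool:
    # Same rule as CacheContext.should_write after this change.
    if always_bypass or not is_cacheable:
        return False
    return mode in (CacheMode.ENABLED, CacheMode.WRITE_ONLY, CacheMode.SMART)


if __name__ == "__main__":
    # Print the read/write decision for every mode.
    for mode in CacheMode:
        print(f"{mode.name:<10} read={should_read(mode)!s:<5} write={should_write(mode)}")
```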
--- a/crawl4ai/cli.py
+++ b/crawl4ai/cli.py
@@ -27,10 +27,7 @@ from crawl4ai import (
    PruningContentFilter,
    BrowserProfiler,
    DefaultMarkdownGenerator,
-    LLMConfig,
-    BFSDeepCrawlStrategy,
-    DFSDeepCrawlStrategy,
-    BestFirstCrawlingStrategy,
+    LLMConfig
 )
 from crawl4ai.config import USER_SETTINGS
 from litellm import completion
@@ -1017,11 +1014,9 @@ def cdp_cmd(user_data_dir: Optional[str], port: int, browser_type: str, headless
@click.option("--question", "-q", help="Ask a question about the crawled content")
@click.option("--verbose", "-v", is_flag=True)
@click.option("--profile", "-p", help="Use a specific browser profile (by name)")
-@click.option("--deep-crawl", type=click.Choice(["bfs", "dfs", "best-first"]), help="Enable deep crawling with specified strategy (bfs, dfs, or best-first)")
-@click.option("--max-pages", type=int, default=10, help="Maximum number of pages to crawl in deep crawl mode")
 def crawl_cmd(url: str, browser_config: str, crawler_config: str, filter_config: str, 
           extraction_config: str, json_extract: str, schema: str, browser: Dict, crawler: Dict,
-           output: str, output_file: str, bypass_cache: bool, question: str, verbose: bool, profile: str, deep_crawl: str, max_pages: int):
+           output: str, output_file: str, bypass_cache: bool, question: str, verbose: bool, profile: str):
    """Crawl a website and extract content
    
    Simple Usage:
@@ -1161,27 +1156,6 @@ Always return valid, properly formatted JSON."""

        crawler_cfg.scraping_strategy = LXMLWebScrapingStrategy()    

-        # Handle deep crawling configuration
-        if deep_crawl:
-            if deep_crawl == "bfs":
-                crawler_cfg.deep_crawl_strategy = BFSDeepCrawlStrategy(
-                    max_depth=3,
-                    max_pages=max_pages
-                )
-            elif deep_crawl == "dfs":
-                crawler_cfg.deep_crawl_strategy = DFSDeepCrawlStrategy(
-                    max_depth=3,
-                    max_pages=max_pages
-                )
-            elif deep_crawl == "best-first":
-                crawler_cfg.deep_crawl_strategy = BestFirstCrawlingStrategy(
-                    max_depth=3,
-                    max_pages=max_pages
-                )
-            
-            if verbose:
-                console.print(f"[green]Deep crawling enabled:[/green] {deep_crawl} strategy, max {max_pages} pages")
-
        config = get_global_config()
        
        browser_cfg.verbose = config.get("VERBOSE", False)
@@ -1196,60 +1170,39 @@ Always return valid, properly formatted JSON."""
            verbose
        )

-        # Handle deep crawl results (list) vs single result
-        if isinstance(result, list):
-            if len(result) == 0:
-                click.echo("No results found during deep crawling")
-                return
-            # Use the first result for question answering and output
-            main_result = result[0]
-            all_results = result
-        else:
-            # Single result from regular crawling
-            main_result = result
-            all_results = [result]
-
        # Handle question
        if question:
            provider, token = setup_llm_config()
-            markdown = main_result.markdown.raw_markdown
+            markdown = result.markdown.raw_markdown
            anyio.run(stream_llm_response, url, markdown, question, provider, token)
            return
        
        # Handle output
        if not output_file:
            if output == "all":
-                if isinstance(result, list):
-                    output_data = [r.model_dump() for r in all_results]
-                    click.echo(json.dumps(output_data, indent=2))
-                else:
-                    click.echo(json.dumps(main_result.model_dump(), indent=2))
+                click.echo(json.dumps(result.model_dump(), indent=2))
            elif output == "json":
-                print(main_result.extracted_content)
-                extracted_items = json.loads(main_result.extracted_content)
+                print(result.extracted_content)
+                extracted_items = json.loads(result.extracted_content)
                click.echo(json.dumps(extracted_items, indent=2))
                
            elif output in ["markdown", "md"]:
-                click.echo(main_result.markdown.raw_markdown)
+                click.echo(result.markdown.raw_markdown)
            elif output in ["markdown-fit", "md-fit"]:
-                click.echo(main_result.markdown.fit_markdown)
+                click.echo(result.markdown.fit_markdown)
        else:
            if output == "all":
                with open(output_file, "w") as f:
-                    if isinstance(result, list):
-                        output_data = [r.model_dump() for r in all_results]
-                        f.write(json.dumps(output_data, indent=2))
-                    else:
-                        f.write(json.dumps(main_result.model_dump(), indent=2))
+                    f.write(json.dumps(result.model_dump(), indent=2))
            elif output == "json":
                with open(output_file, "w") as f:
-                    f.write(main_result.extracted_content)
+                    f.write(result.extracted_content)
            elif output in ["markdown", "md"]:
                with open(output_file, "w") as f:
-                    f.write(main_result.markdown.raw_markdown)
+                    f.write(result.markdown.raw_markdown)
            elif output in ["markdown-fit", "md-fit"]:
                with open(output_file, "w") as f:
-                    f.write(main_result.markdown.fit_markdown)
+                    f.write(result.markdown.fit_markdown)
            
    except Exception as e:
        raise click.ClickException(str(e))
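With the deep-crawl branch removed, the CLI maps each `output` value to one attribute of a single result, once for stdout and once for file output. Below is a minimal sketch of that mapping as one pure function that both branches could share. `FakeResult`, `FakeMarkdown`, and `render_output` are hypothetical stand-ins, and `dataclasses.asdict` plays the role of the result's `model_dump()`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FakeMarkdown:  # hypothetical stand-in for result.markdown
    raw_markdown: str
    fit_markdown: str


@dataclass
class FakeResult:  # hypothetical stand-in for a single crawl result
    extracted_content: Optional[str]
    markdown: FakeMarkdown


def render_output(result: FakeResult, output: str) -> str:
    """Return the text the CLI would print or write for a given output format."""
    if output == "all":
        # Full dump of the result, analogous to model_dump() + json.dumps
        return json.dumps(asdict(result), indent=2)
    if output == "json":
        # extracted_content is a JSON string; re-serialize it pretty-printed
        return json.dumps(json.loads(result.extracted_content), indent=2)
    if output in ("markdown", "md"):
        return result.markdown.raw_markdown
    if output in ("markdown-fit", "md-fit"):
        return result.markdown.fit_markdown
    raise ValueError(f"unknown output format: {output}")
```

Returning a string instead of printing keeps the stdout path (`click.echo(render_output(...))`) and the file path (`f.write(render_output(...))`) from drifting apart.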
@@ -1401,11 +1354,9 @@ def profiles_cmd():
@click.option("--question", "-q", help="Ask a question about the crawled content")
@click.option("--verbose", "-v", is_flag=True)
@click.option("--profile", "-p", help="Use a specific browser profile (by name)")
-@click.option("--deep-crawl", type=click.Choice(["bfs", "dfs", "best-first"]), help="Enable deep crawling with specified strategy")
-@click.option("--max-pages", type=int, default=10, help="Maximum number of pages to crawl in deep crawl mode")
 def default(url: str, example: bool, browser_config: str, crawler_config: str, filter_config: str, 
        extraction_config: str, json_extract: str, schema: str, browser: Dict, crawler: Dict,
-        output: str, bypass_cache: bool, question: str, verbose: bool, profile: str, deep_crawl: str, max_pages: int):
+        output: str, bypass_cache: bool, question: str, verbose: bool, profile: str):
    """Crawl4AI CLI - Web content extraction tool

    Simple Usage:
@@ -1455,9 +1406,7 @@ def default(url: str, example: bool, browser_config: str, crawler_config: str, f
        bypass_cache=bypass_cache,
        question=question,
        verbose=verbose,
-        profile=profile,
-        deep_crawl=deep_crawl,
-        max_pages=max_pages
+        profile=profile
    )

 def main():
--- a/crawl4ai/content_scraping_strategy.py
+++ b/crawl4ai/content_scraping_strategy.py
@@ -98,20 +98,20 @@ class ContentScrapingStrategy(ABC):
        pass


-class LXMLWebScrapingStrategy(ContentScrapingStrategy):
+class WebScrapingStrategy(ContentScrapingStrategy):
    """
-    LXML-based implementation for fast web content scraping.
-    
-    This is the primary scraping strategy in Crawl4AI, providing high-performance
-    HTML parsing and content extraction using the lxml library.
-    
-    Note: WebScrapingStrategy is now an alias for this class to maintain
-    backward compatibility.
+    BeautifulSoup-based strategy for scraping web page content.
+
+    How it works:
+    1. Extract content from HTML using BeautifulSoup.
+    2. Clean the extracted content using a content cleaning strategy.
+    3. Filter the cleaned content using a content filtering strategy.
+    4. Generate markdown content from the filtered content.
+    5. Return the markdown content.
    """
+
    def __init__(self, logger=None):
        self.logger = logger
-        self.DIMENSION_REGEX = re.compile(r"(\d+)(\D*)")
-        self.BASE64_PATTERN = re.compile(r'data:image/[^;]+;base64,([^"]+)')

    def _log(self, level, message, tag="SCRAPE", **kwargs):
        """Helper method to safely use logger."""
@@ -132,7 +132,7 @@ class LXMLWebScrapingStrategy(ContentScrapingStrategy):
            ScrapingResult: A structured result containing the scraped content.
        """
        actual_url = kwargs.get("redirected_url", url)
-        raw_result = self._scrap(actual_url, html, **kwargs)
+        raw_result = self._scrap(actual_url, html, is_async=False, **kwargs)
        if raw_result is None:
            return ScrapingResult(
                cleaned_html="",
@@ -196,9 +196,376 @@ class LXMLWebScrapingStrategy(ContentScrapingStrategy):
        Returns:
            ScrapingResult: A structured result containing the scraped content.
        """
-        return await asyncio.to_thread(self.scrap, url, html, **kwargs)
+        return await asyncio.to_thread(self._scrap, url, html, **kwargs)
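`ascrap` keeps the event loop free by running a blocking scraping routine in a worker thread via `asyncio.to_thread` (Python 3.9+). Here is a minimal sketch of that pattern. `scrape_sync` and `scrape_async` are hypothetical stand-ins for the real methods. The coroutine returns exactly what the wrapped callable returns, so the choice of wrapped method determines the result type.

```python
import asyncio
import threading
from typing import Any, Dict


def scrape_sync(url: str, html: str, **kwargs: Any) -> Dict[str, Any]:
    # Stand-in for a blocking parse; records which thread executed it.
    return {"url": url, "length": len(html), "thread": threading.current_thread().name}


async def scrape_async(url: str, html: str, **kwargs: Any) -> Dict[str, Any]:
    # to_thread runs the function in the loop's default executor and
    # forwards positional and keyword arguments unchanged.
    return await asyncio.to_thread(scrape_sync, url, html, **kwargs)


if __name__ == "__main__":
    print(asyncio.run(scrape_async("https://example.com", "<html></html>")))
```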

-    def process_element(self, url, element: lhtml.HtmlElement, **kwargs) -> Dict[str, Any]:
+    def is_data_table(self, table: Tag, **kwargs) -> bool:
+        """
+        Determine if a table element is a data table (not a layout table).
+
+        Args:
+            table (Tag): BeautifulSoup Tag representing a table element
+            **kwargs: Additional keyword arguments including table_score_threshold
+
+        Returns:
+            bool: True if the table is a data table, False otherwise
+        """
+        score = 0
+        
+        # Check for thead and tbody
+        has_thead = len(table.select('thead')) > 0
+        has_tbody = len(table.select('tbody')) > 0
+        if has_thead:
+            score += 2
+        if has_tbody:
+            score += 1
+            
+        # Check for th elements
+        th_count = len(table.select('th'))
+        if th_count > 0:
+            score += 2
+            if has_thead or len(table.select('tr:first-child th')) > 0:
+                score += 1
+                
+        # Check for nested tables
+        if len(table.select('table')) > 0:
+            score -= 3
+            
+        # Role attribute check
+        role = table.get('role', '').lower()
+        if role in {'presentation', 'none'}:
+            score -= 3
+            
+        # Column consistency
+        rows = table.select('tr')
+        if not rows:
+            return False
+            
+        col_counts = [len(row.select('td, th')) for row in rows]
+        avg_cols = sum(col_counts) / len(col_counts)
+        variance = sum((c - avg_cols)**2 for c in col_counts) / len(col_counts)
+        if variance < 1:
+            score += 2
+            
+        # Caption and summary
+        if table.select('caption'):
+            score += 2
+        if table.has_attr('summary') and table['summary']:
+            score += 1
+            
+        # Text density
+        total_text = sum(len(cell.get_text().strip()) for row in rows for cell in row.select('td, th'))
+        total_tags = sum(1 for _ in table.descendants if isinstance(_, Tag))
+        text_ratio = total_text / (total_tags + 1e-5)
+        if text_ratio > 20:
+            score += 3
+        elif text_ratio > 10:
+            score += 2
+            
+        # Data attributes
+        data_attrs = sum(1 for attr in table.attrs if attr.startswith('data-'))
+        score += data_attrs * 0.5
+        
+        # Size check
+        if avg_cols >= 2 and len(rows) >= 2:
+            score += 2
+            
+        threshold = kwargs.get('table_score_threshold', 7)
+        return score >= threshold
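`is_data_table` is an additive score compared against `table_score_threshold` (default 7). The dependency-free sketch below covers only the structural terms and assumes each row has already been reduced to a list of cell texts. `score_table_rows` is a hypothetical name, and it omits the nested-table, `role`, caption, text-density, and `data-` attribute terms.

```python
from statistics import pvariance
from typing import List


def score_table_rows(rows: List[List[str]], has_thead: bool = False, th_count: int = 0) -> float:
    """Hypothetical subset of the is_data_table heuristics (structure terms only)."""
    if not rows:
        return 0.0
    score = 0.0
    if has_thead:
        score += 2  # explicit header section
    if th_count > 0:
        score += 2  # header cells present
    col_counts = [len(row) for row in rows]
    # Population variance: the same sum((c - avg) ** 2) / n formula used above
    if pvariance(col_counts) < 1:
        score += 2  # rows have a consistent number of cells
    avg_cols = sum(col_counts) / len(col_counts)
    if avg_cols >= 2 and len(rows) >= 2:
        score += 2  # at least a 2x2 grid
    return score
```

In this sketch, a regular 2x2 grid with `<thead>` and `<th>` cells scores 8. The same grid without header markup scores 4, so it would need points from the omitted terms to reach the default threshold.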
+    
+    def extract_table_data(self, table: Tag) -> dict:
+        """
+        Extract structured data from a table element.
+        
+        Args:
+            table (Tag): BeautifulSoup Tag representing a table element
+            
+        Returns:
+            dict: Dictionary containing table data (headers, rows, caption, summary)
+        """
+        caption_elem = table.select_one('caption')
+        caption = caption_elem.get_text().strip() if caption_elem else ""
+        summary = table.get('summary', '').strip()
+        
+        # Extract headers with colspan handling
+        headers = []
+        thead_rows = table.select('thead tr')
+        if thead_rows:
+            header_cells = thead_rows[0].select('th')
+            for cell in header_cells:
+                text = cell.get_text().strip()
+                colspan = int(cell.get('colspan', 1))
+                headers.extend([text] * colspan)
+        else:
+            first_row = table.select('tr:first-child')
+            if first_row:
+                for cell in first_row[0].select('th, td'):
+                    text = cell.get_text().strip()
+                    colspan = int(cell.get('colspan', 1))
+                    headers.extend([text] * colspan)
+        
+        # Extract rows with colspan handling
+        rows = []
+        all_rows = table.select('tr')
+        thead = table.select_one('thead')
+        tbody_rows = []
+
+        if thead:
+            thead_rows = thead.select('tr')
+            tbody_rows = [row for row in all_rows if row not in thead_rows]
+        else:
+            if all_rows and all_rows[0].select('th'):
+                tbody_rows = all_rows[1:]
+            else:
+                tbody_rows = all_rows
+                
+        for row in tbody_rows:        
+        # for row in table.select('tr:not(:has(ancestor::thead))'):
+            row_data = []
+            for cell in row.select('td'):
+                text = cell.get_text().strip()
+                colspan = int(cell.get('colspan', 1))
+                row_data.extend([text] * colspan)
+            if row_data:
+                rows.append(row_data)
+                
+        # Align rows with headers
+        max_columns = len(headers) if headers else (max(len(row) for row in rows) if rows else 0)
+        aligned_rows = []
+        for row in rows:
+            aligned = row[:max_columns] + [''] * (max_columns - len(row))
+            aligned_rows.append(aligned)
+            
+        if not headers:
+            headers = [f"Column {i+1}" for i in range(max_columns)]
+            
+        return {
+            "headers": headers,
+            "rows": aligned_rows,
+            "caption": caption,
+            "summary": summary,
+        }
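`extract_table_data` repeats a cell's text once per `colspan` column, then pads or truncates every row to the header width and synthesizes `Column N` headers when none exist. Here is a minimal sketch of those two steps on plain Python data. It assumes cells are already `(text, colspan)` pairs, and `expand_row` and `align_rows` are hypothetical helpers.

```python
from typing import List, Tuple

Cell = Tuple[str, int]  # (cell text, colspan)


def expand_row(cells: List[Cell]) -> List[str]:
    """Repeat each cell's stripped text once per spanned column."""
    row: List[str] = []
    for text, colspan in cells:
        row.extend([text.strip()] * colspan)
    return row


def align_rows(headers: List[str], rows: List[List[str]]) -> Tuple[List[str], List[List[str]]]:
    """Truncate or pad rows to the header width; synthesize headers if missing."""
    width = len(headers) if headers else max((len(r) for r in rows), default=0)
    # A negative repeat count yields [], so long rows are only truncated
    aligned = [r[:width] + [""] * (width - len(r)) for r in rows]
    if not headers:
        headers = [f"Column {i + 1}" for i in range(width)]
    return headers, aligned
```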
+    
+    def flatten_nested_elements(self, node):
+        """
+        Flatten chains of same-named nested elements (e.g. <div><div>...</div></div>) in an HTML tree.
+
+        Args:
+            node (Tag): The root node of the HTML tree.
+
+        Returns:
+            Tag: The flattened HTML tree.
+        """
+        if isinstance(node, NavigableString):
+            return node
+        if (
+            len(node.contents) == 1
+            and isinstance(node.contents[0], Tag)
+            and node.contents[0].name == node.name
+        ):
+            return self.flatten_nested_elements(node.contents[0])
+        node.contents = [self.flatten_nested_elements(child) for child in node.contents]
+        return node
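`flatten_nested_elements` collapses a node whose only child is an element with the same tag name, recursing until the chain ends. Below is a minimal sketch on a hypothetical `Node` class instead of BeautifulSoup tags. As in the method above, the innermost element replaces the chain, so anything attached to the outer wrappers is dropped.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:  # hypothetical stand-in for a parsed HTML element
    name: str
    children: List[Union["Node", str]] = field(default_factory=list)


def flatten(node: Union[Node, str]) -> Union[Node, str]:
    if isinstance(node, str):
        return node  # text nodes are returned unchanged
    only = node.children[0] if len(node.children) == 1 else None
    if isinstance(only, Node) and only.name == node.name:
        return flatten(only)  # <div><div>...</div></div> collapses to the inner div
    node.children = [flatten(child) for child in node.children]
    return node
```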
+
+    def find_closest_parent_with_useful_text(self, tag, **kwargs):
+        """
+        Find the closest parent with useful text.
+
+        Args:
+            tag (Tag): The starting tag to search from.
+            **kwargs: Additional keyword arguments.
+
+        Returns:
+            str: The text of the closest parent with enough words, or None if not found.
+        """
+        image_description_min_word_threshold = kwargs.get(
+            "image_description_min_word_threshold", IMAGE_DESCRIPTION_MIN_WORD_THRESHOLD
+        )
+        current_tag = tag
+        while current_tag:
+            current_tag = current_tag.parent
+            # Get the text content of the parent tag
+            if current_tag:
+                text_content = current_tag.get_text(separator=" ", strip=True)
+                # Check if the text content has at least image_description_min_word_threshold words
+                if len(text_content.split()) >= image_description_min_word_threshold:
+                    return text_content
+        return None
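`find_closest_parent_with_useful_text` walks up the parent chain and returns the first ancestor's text whose word count reaches the threshold. Here is a minimal sketch of that walk. It uses a hypothetical `El` class with parent pointers and takes the threshold as an explicit argument, because the value of `IMAGE_DESCRIPTION_MIN_WORD_THRESHOLD` is not shown here.

```python
from typing import List, Optional


class El:  # hypothetical element with a parent pointer
    def __init__(self, text: str = "", parent: Optional["El"] = None):
        self.own_text = text
        self.parent = parent
        self.children: List["El"] = []
        if parent is not None:
            parent.children.append(self)

    def get_text(self) -> str:
        # Own text followed by all descendant text, space-separated.
        parts = [self.own_text] + [child.get_text() for child in self.children]
        return " ".join(p for p in parts if p)


def closest_parent_text(el: El, min_words: int) -> Optional[str]:
    current = el.parent
    while current is not None:
        text = current.get_text()
        if len(text.split()) >= min_words:
            return text  # first ancestor with enough words wins
        current = current.parent
    return None
```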
+
+    def remove_unwanted_attributes(
+        self, element, important_attrs, keep_data_attributes=False
+    ):
+        """
+        Remove unwanted attributes from an HTML element.
+
+        Args:
+            element (Tag): The HTML element to remove attributes from.
+            important_attrs (list): List of important attributes to keep.
+            keep_data_attributes (bool): Whether to keep data attributes.
+
+        Returns:
+            None
+        """
+        attrs_to_remove = []
+        for attr in element.attrs:
+            if attr not in important_attrs:
+                if keep_data_attributes:
+                    if not attr.startswith("data-"):
+                        attrs_to_remove.append(attr)
+                else:
+                    attrs_to_remove.append(attr)
+
+        for attr in attrs_to_remove:
+            del element[attr]
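The removal loop in `remove_unwanted_attributes` keeps an attribute when it is in `important_attrs`, or when `keep_data_attributes` is set and the name starts with `data-`. The sketch below expresses the same rule as a dictionary comprehension over a plain `dict`. `filter_attributes` is a hypothetical helper, not part of the class.

```python
from typing import Dict, Iterable


def filter_attributes(attrs: Dict[str, str], important: Iterable[str],
                      keep_data: bool = False) -> Dict[str, str]:
    """Return only the attributes the removal loop above would leave in place."""
    important = set(important)
    return {
        name: value
        for name, value in attrs.items()
        if name in important or (keep_data and name.startswith("data-"))
    }
```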
+
+    def process_image(self, img, url, index, total_images, **kwargs):
+        """
+        Process an image element.
+
+        How it works:
+        1. Check that the image is displayed and is not inside undesired HTML elements.
+        2. Score the image for its usefulness.
+        3. Extract image metadata such as size hints and file format.
+        4. Generate a dictionary with the processed image information.
+        5. Return the processed image information.
+
+        Args:
+            img (Tag): The image element to process.
+            url (str): The URL of the page containing the image.
+            index (int): The index of the image in the list of images.
+            total_images (int): The total number of images in the list.
+            **kwargs: Additional keyword arguments.
+
+        Returns:
+            dict: A dictionary containing the processed image information.
+        """
+        # parse_srcset = lambda s: [{'url': u.strip().split()[0], 'width': u.strip().split()[-1].rstrip('w')
+        #                 if ' ' in u else None}
+        #                 for u in [f"http{p}" for p in s.split("http") if p]]
+
+        # Constants for checks
+        classes_to_check = frozenset(["button", "icon", "logo"])
+        tags_to_check = frozenset(["button", "input"])
+        image_formats = frozenset(["jpg", "jpeg", "png", "webp", "avif", "gif"])
+
+        # Pre-fetch commonly used attributes
+        style = img.get("style", "")
+        alt = img.get("alt", "")
+        src = img.get("src", "")
+        data_src = img.get("data-src", "")
+        srcset = img.get("srcset", "")
+        data_srcset = img.get("data-srcset", "")
+        width = img.get("width")
+        height = img.get("height")
+        parent = img.parent
+        parent_classes = parent.get("class", [])
+
+        # Quick validation checks
+        if (
+            "display:none" in style
+            or parent.name in tags_to_check
+            or any(c in cls for c in parent_classes for cls in classes_to_check)
+            or any(c in src for c in classes_to_check)
+            or any(c in alt for c in classes_to_check)
+        ):
+            return None
+
+        # Quick score calculation
+        score = 0
+        if width and width.isdigit():
+            width_val = int(width)
+            score += 1 if width_val > 150 else 0
+        if height and height.isdigit():
+            height_val = int(height)
+            score += 1 if height_val > 150 else 0
+        if alt:
+            score += 1
+        score += 1 if index / total_images < 0.5 else 0
+
+        # image_format = ''
+        # if "data:image/" in src:
+        #     image_format = src.split(',')[0].split(';')[0].split('/')[1].split(';')[0]
+        # else:
+        #     image_format = os.path.splitext(src)[1].lower().strip('.').split('?')[0]
+
+        # if image_format in ('jpg', 'png', 'webp', 'avif'):
+        #     score += 1
+
+        # Check for image format in all possible sources
+        def has_image_format(url):
+            return any(fmt in url.lower() for fmt in image_formats)
+
+        # Score for having proper image sources
+        if any(has_image_format(url) for url in [src, data_src, srcset, data_srcset]):
+            score += 1
+        if srcset or data_srcset:
+            score += 1
+        if img.find_parent("picture"):
+            score += 1
+
+        # Detect format from any available source
+        detected_format = None
+        for url in [src, data_src, srcset, data_srcset]:
+            if url:
+                format_matches = [fmt for fmt in image_formats if fmt in url.lower()]
+                if format_matches:
+                    detected_format = format_matches[0]
+                    break
+
+        if score <= kwargs.get("image_score_threshold", IMAGE_SCORE_THRESHOLD):
+            return None
+
+        # Use set for deduplication
+        unique_urls = set()
+        image_variants = []
+
+        # Generate a unique group ID for this set of variants
+        group_id = index
+
+        # Base image info template
+        base_info = {
+            "alt": alt,
+            "desc": self.find_closest_parent_with_useful_text(img, **kwargs),
+            "score": score,
+            "type": "image",
+            "group_id": group_id,  # Group ID for this set of variants
+            "format": detected_format,
+        }
+
+        # Inline function for adding variants
+        def add_variant(src, width=None):
+            if src and not src.startswith("data:") and src not in unique_urls:
+                unique_urls.add(src)
+                image_variants.append({**base_info, "src": src, "width": width})
+
+        # Process all sources
+        add_variant(src)
+        add_variant(data_src)
+
+        # Handle srcset and data-srcset in one pass
+        for attr in ("srcset", "data-srcset"):
+            if value := img.get(attr):
+                for source in parse_srcset(value):
+                    add_variant(source["url"], source["width"])
+
+        # Quick picture element check
+        if picture := img.find_parent("picture"):
+            for source in picture.find_all("source"):
+                if srcset := source.get("srcset"):
+                    for src in parse_srcset(srcset):
+                        add_variant(src["url"], src["width"])
+
+        # Framework-specific attributes in one pass
+        for attr, value in img.attrs.items():
+            if (
+                attr.startswith("data-")
+                and ("src" in attr or "srcset" in attr)
+                and "http" in value
+            ):
+                add_variant(value)
+
+        return image_variants if image_variants else None
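`process_image` calls `parse_srcset` and reads `url` and `width` from each entry. Its definition is not part of this hunk: only a commented-out lambda appears above. The sketch below is a hypothetical `parse_srcset_sketch` that returns entries of that shape. It splits naively on commas, so a URL that itself contains commas (such as a `data:` URI) would be split incorrectly.

```python
from typing import Dict, List, Optional


def parse_srcset_sketch(srcset: str) -> List[Dict[str, Optional[str]]]:
    """Split 'a.jpg 480w, b.jpg 2x' into [{'url': ..., 'width': ...}, ...]."""
    candidates: List[Dict[str, Optional[str]]] = []
    for part in srcset.split(","):
        tokens = part.strip().split()
        if not tokens:
            continue  # skip empty candidates, e.g. a trailing comma
        width = None
        # Only width descriptors ("480w") carry a width; density ("2x") does not
        if len(tokens) > 1 and tokens[1].endswith("w"):
            width = tokens[1][:-1]
        candidates.append({"url": tokens[0], "width": width})
    return candidates
```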
+
+    def process_element(self, url, element: PageElement, **kwargs) -> Dict[str, Any]:
        """
        Process an HTML element.

@@ -210,7 +577,7 @@ class LXMLWebScrapingStrategy(ContentScrapingStrategy):

        Args:
            url (str): The URL of the page containing the element.
-            element (lhtml.HtmlElement): The HTML element to process.
+            element (Tag): The HTML element to process.
            **kwargs: Additional keyword arguments.

        Returns:
@@ -228,6 +595,514 @@ class LXMLWebScrapingStrategy(ContentScrapingStrategy):
            "external_links_dict": external_links_dict,
        }

+    def _process_element(
+        self,
+        url,
+        element: PageElement,
+        media: Dict[str, Any],
+        internal_links_dict: Dict[str, Any],
+        external_links_dict: Dict[str, Any],
+        **kwargs,
+    ) -> bool:
+        """
+        Process an HTML element.
+        """
+        try:
+            if isinstance(element, NavigableString):
+                if isinstance(element, Comment):
+                    element.extract()
+                return False
+
+            # if element.name == 'img':
+            #     process_image(element, url, 0, 1)
+            #     return True
+            base_domain = kwargs.get("base_domain", get_base_domain(url))
+
+            if element.name in ["script", "style", "link", "meta", "noscript"]:
+                element.decompose()
+                return False
+
+            keep_element = False
+            # Special case for table elements - always preserve structure
+            if element.name in ["tr", "td", "th"]:
+                keep_element = True
+
+            exclude_domains = kwargs.get("exclude_domains", [])
+            # exclude_social_media_domains = kwargs.get('exclude_social_media_domains', set(SOCIAL_MEDIA_DOMAINS))
+            # exclude_social_media_domains = SOCIAL_MEDIA_DOMAINS + kwargs.get('exclude_social_media_domains', [])
+            # exclude_social_media_domains = list(set(exclude_social_media_domains))
+
+            try:
+                if element.name == "a" and element.get("href"):
+                    href = element.get("href", "").strip()
+                    if not href:  # Skip empty hrefs
+                        return False
+
+                    # url_base = url.split("/")[2]
+
+                    # Normalize the URL
+                    try:
+                        normalized_href = normalize_url(href, url)
+                    except ValueError:
+                        # logging.warning(f"Invalid URL format: {href}, Error: {str(e)}")
+                        return False
+
+                    link_data = {
+                        "href": normalized_href,
+                        "text": element.get_text().strip(),
+                        "title": element.get("title", "").strip(),
+                        "base_domain": base_domain,
+                    }
+
+                    is_external = is_external_url(normalized_href, base_domain)
+
+                    keep_element = True
+
+                    # Handle external link exclusions
+                    if is_external:
+                        link_base_domain = get_base_domain(normalized_href)
+                        link_data["base_domain"] = link_base_domain
+                        if kwargs.get("exclude_external_links", False):
+                            element.decompose()
+                            return False
+                        # elif kwargs.get('exclude_social_media_links', False):
+                        #     if link_base_domain in exclude_social_media_domains:
+                        #         element.decompose()
+                        #         return False
+                        # if any(domain in normalized_href.lower() for domain in exclude_social_media_domains):
+                        #     element.decompose()
+                        #     return False
+                        elif exclude_domains:
+                            if link_base_domain in exclude_domains:
+                                element.decompose()
+                                return False
+                            # if any(domain in normalized_href.lower() for domain in kwargs.get('exclude_domains', [])):
+                            #     element.decompose()
+                            #     return False
+
+                    if is_external:
+                        if normalized_href not in external_links_dict:
+                            external_links_dict[normalized_href] = link_data
+                    else:
+                        if kwargs.get("exclude_internal_links", False):
+                            element.decompose()
+                            return False
+                        if normalized_href not in internal_links_dict:
+                            internal_links_dict[normalized_href] = link_data
+
+            except Exception as e:
+                raise Exception(f"Error processing links: {str(e)}")
+
+            try:
+                if element.name == "img":
+                    potential_sources = [
+                        "src",
+                        "data-src",
+                        "srcset", "data-lazy-src",
+                        "data-original",
+                    ]
+                    src = element.get("src", "")
+                    while not src and potential_sources:
+                        src = element.get(potential_sources.pop(0), "")
+                    if not src:
+                        element.decompose()
+                        return False
+
+                    # If it is srcset pick up the first image
+                    if "srcset" in element.attrs:
+                        src = element.attrs["srcset"].split(",")[0].split(" ")[0]
+
+                    # If image src is internal, then skip
+                    if not is_external_url(src, base_domain):
+                        return True
+
+                    image_src_base_domain = get_base_domain(src)
+
+                    # Check flag if we should remove external images
+                    if kwargs.get("exclude_external_images", False):
+                        # Handle relative URLs (which are always from the same domain)
+                        if not src.startswith('http') and not src.startswith('//'):
+                            return True  # Keep relative URLs
+                        
+                        # For absolute URLs, compare the base domains using the existing function
+                        src_base_domain = get_base_domain(src)
+                        url_base_domain = get_base_domain(url)
+                        
+                        # If the domains don't match and both are valid, the image is external
+                        if src_base_domain and url_base_domain and src_base_domain != url_base_domain:
+                            element.decompose()
+                            return False
+
+                    # if kwargs.get('exclude_social_media_links', False):
+                    #     if image_src_base_domain in exclude_social_media_domains:
+                    #         element.decompose()
+                    #         return False
+                    # src_url_base = src.split('/')[2]
+                    # url_base = url.split('/')[2]
+                    # if any(domain in src for domain in exclude_social_media_domains):
+                    #     element.decompose()
+                    #     return False
+
+                    # Handle exclude domains
+                    if exclude_domains:
+                        if image_src_base_domain in exclude_domains:
+                            element.decompose()
+                            return False
+                        # if any(domain in src for domain in kwargs.get('exclude_domains', [])):
+                        #     element.decompose()
+                        #     return False
+
+                    return True  # Always keep image elements
+            except Exception as e:
+                raise Exception(f"Error processing images: {str(e)}")
+
+            # Check if flag to remove all forms is set
+            if kwargs.get("remove_forms", False) and element.name == "form":
+                element.decompose()
+                return False
+
+            if element.name in ["video", "audio"]:
+                media[f"{element.name}s"].append(
+                    {
+                        "src": element.get("src"),
+                        "alt": element.get("alt"),
+                        "type": element.name,
+                        "description": self.find_closest_parent_with_useful_text(
+                            element, **kwargs
+                        ),
+                    }
+                )
+                source_tags = element.find_all("source")
+                for source_tag in source_tags:
+                    media[f"{element.name}s"].append(
+                        {
+                            "src": source_tag.get("src"),
+                            "alt": element.get("alt"),
+                            "type": element.name,
+                            "description": self.find_closest_parent_with_useful_text(
+                                element, **kwargs
+                            ),
+                        }
+                    )
+                return True  # Always keep video and audio elements
+
+            if element.name in ONLY_TEXT_ELIGIBLE_TAGS:
+                if kwargs.get("only_text", False):
+                    element.replace_with(element.get_text())
+
+            try:
+                self.remove_unwanted_attributes(
+                    element, IMPORTANT_ATTRS + kwargs.get("keep_attrs", []) , kwargs.get("keep_data_attributes", False)
+                )
+            except Exception as e:
+                # print('Error removing unwanted attributes:', str(e))
+                self._log(
+                    "error",
+                    message="Error removing unwanted attributes: {error}",
+                    tag="SCRAPE",
+                    params={"error": str(e)},
+                )
+            # Process children
+            for child in list(element.children):
+                if isinstance(child, NavigableString) and not isinstance(
+                    child, Comment
+                ):
+                    if len(child.strip()) > 0:
+                        keep_element = True
+                else:
+                    if self._process_element(
+                        url,
+                        child,
+                        media,
+                        internal_links_dict,
+                        external_links_dict,
+                        **kwargs,
+                    ):
+                        keep_element = True
+
+            # Check word count
+            word_count_threshold = kwargs.get(
+                "word_count_threshold", MIN_WORD_THRESHOLD
+            )
+            if not keep_element:
+                word_count = len(element.get_text(strip=True).split())
+                keep_element = word_count >= word_count_threshold
+
+            if not keep_element:
+                element.decompose()
+
+            return keep_element
+        except Exception as e:
+            # print('Error processing element:', str(e))
+            self._log(
+                "error",
+                message="Error processing element: {error}",
+                tag="SCRAPE",
+                params={"error": str(e)},
+            )
+            return False
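`_process_element` prunes the tree bottom-up. Skipped tags are dropped outright, while table cells are always kept. An element also survives if it has non-empty direct text or any surviving child, and otherwise it must meet `word_count_threshold`. The sketch below implements only that keep-or-drop rule on a hypothetical `Elem` class and leaves out the link, image, media, and attribute handling.

```python
from dataclasses import dataclass, field
from typing import List, Union

SKIP_TAGS = {"script", "style", "link", "meta", "noscript"}
TABLE_TAGS = {"tr", "td", "th"}


@dataclass
class Elem:  # hypothetical minimal element: a tag name plus children
    name: str
    children: List[Union["Elem", str]] = field(default_factory=list)

    def text(self) -> str:
        return " ".join(c if isinstance(c, str) else c.text() for c in self.children)


def prune(node: Elem, word_threshold: int) -> bool:
    """Return True if node survives; dropped children are removed in place."""
    if node.name in SKIP_TAGS:
        return False
    keep = node.name in TABLE_TAGS  # table cells always keep their structure
    survivors: List[Union[Elem, str]] = []
    for child in node.children:
        if isinstance(child, str):
            survivors.append(child)  # text nodes are never removed here
            if child.strip():
                keep = True  # direct, non-empty text keeps the parent
        elif prune(child, word_threshold):
            survivors.append(child)
            keep = True  # any surviving child keeps the parent
    node.children = survivors
    if not keep:
        keep = len(node.text().split()) >= word_threshold
    return keep
```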
+
+    def _scrap(
+        self,
+        url: str,
+        html: str,
+        word_count_threshold: int = MIN_WORD_THRESHOLD,
+        css_selector: str = None,
+        target_elements: List[str] = None,
+        **kwargs,
+    ) -> Dict[str, Any]:
+        """
+        Extract content from HTML using BeautifulSoup.
+
+        Args:
+            url (str): The URL of the page to scrape.
+            html (str): The HTML content of the page to scrape.
+            word_count_threshold (int): The minimum word count threshold for content extraction.
+            css_selector (str): The CSS selector to use for content extraction.
+            **kwargs: Additional keyword arguments.
+
+        Returns:
+            dict: A dictionary containing the extracted content.
+        """
+        success = True
+        if not html:
+            return None
+
+        parser_type = kwargs.get("parser", "lxml")
+        soup = BeautifulSoup(html, parser_type)
+        body = soup.body
+        if body is None:
+            raise Exception("'<body>' tag is not found in fetched html. Consider adding wait_for=\"css:body\" to wait for body tag to be loaded into DOM.")
+        base_domain = get_base_domain(url)
+        
+        # Early removal of all images if exclude_all_images is set
+        # This happens before any processing to minimize memory usage
+        if kwargs.get("exclude_all_images", False):
+            for img in body.find_all('img'):
+                img.decompose()
+
+        try:
+            meta = extract_metadata("", soup)
+        except Exception as e:
+            self._log(
+                "error",
+                message="Error extracting metadata: {error}",
+                tag="SCRAPE",
+                params={"error": str(e)},
+            )
+            meta = {}
+
+        # Handle tag-based removal first - faster than CSS selection
+        excluded_tags = set(kwargs.get("excluded_tags", []) or [])
+        if excluded_tags:
+            for element in body.find_all(lambda tag: tag.name in excluded_tags):
+                element.extract()
+
+        # Handle CSS selector-based removal
+        excluded_selector = kwargs.get("excluded_selector", "")
+        if excluded_selector:
+            is_single_selector = (
+                "," not in excluded_selector and " " not in excluded_selector
+            )
+            if is_single_selector:
+                while element := body.select_one(excluded_selector):
+                    element.extract()
+            else:
+                for element in body.select(excluded_selector):
+                    element.extract()
+
+        content_element = None
+        if target_elements:
+            try:
+                for_content_targeted_element = []
+                for target_element in target_elements:
+                    for_content_targeted_element.extend(body.select(target_element))
+                content_element = soup.new_tag("div")
+                for el in for_content_targeted_element:
+                    content_element.append(copy.deepcopy(el))
+            except Exception as e:
+                self._log("error", f"Error with target element detection: {str(e)}", "SCRAPE")
+                return None
+        else:
+            content_element = body     
+
+        kwargs["exclude_social_media_domains"] = set(
+            kwargs.get("exclude_social_media_domains", []) + SOCIAL_MEDIA_DOMAINS
+        )
+        kwargs["exclude_domains"] = set(kwargs.get("exclude_domains", []))
+        if kwargs.get("exclude_social_media_links", False):
+            kwargs["exclude_domains"] = kwargs["exclude_domains"].union(
+                kwargs["exclude_social_media_domains"]
+            )
+
+        result_obj = self.process_element(
+            url,
+            body,
+            word_count_threshold=word_count_threshold,
+            base_domain=base_domain,
+            **kwargs,
+        )
+
+        links = {"internal": [], "external": []}
+        media = result_obj["media"]
+        internal_links_dict = result_obj["internal_links_dict"]
+        external_links_dict = result_obj["external_links_dict"]
+
+        # Update the links dictionary with unique links
+        links["internal"] = list(internal_links_dict.values())
+        links["external"] = list(external_links_dict.values())
+        
+        # Extract head content for links if configured
+        link_preview_config = kwargs.get("link_preview_config")
+        if link_preview_config is not None:
+            try:
+                import asyncio
+                from .link_preview import LinkPreview
+                from .models import Links, Link
+                
+                verbose = link_preview_config.verbose
+                
+                if verbose:
+                    self._log("info", "Starting link head extraction for {internal} internal and {external} external links",
+                              params={"internal": len(links["internal"]), "external": len(links["external"])}, tag="LINK_EXTRACT")
+                
+                # Convert dict links to Link objects
+                internal_links = [Link(**link_data) for link_data in links["internal"]]
+                external_links = [Link(**link_data) for link_data in links["external"]]
+                links_obj = Links(internal=internal_links, external=external_links)
+                
+                # Lightweight stand-in for CrawlerRunConfig carrying only link_preview_config and score_links
+                class TempCrawlerRunConfig:
+                    def __init__(self, link_config, score_links):
+                        self.link_preview_config = link_config
+                        self.score_links = score_links
+                
+                config = TempCrawlerRunConfig(link_preview_config, kwargs.get("score_links", False))
+                
+                # Extract head content (run async operation in sync context)
+                async def extract_links():
+                    async with LinkPreview(self.logger) as extractor:
+                        return await extractor.extract_link_heads(links_obj, config)
+                
+                # Run the async operation
+                try:
+                    # Check if we're already in an async context
+                    loop = asyncio.get_running_loop()
+                    # If we're in an async context, we need to run in a thread
+                    import concurrent.futures
+                    with concurrent.futures.ThreadPoolExecutor() as executor:
+                        future = executor.submit(asyncio.run, extract_links())
+                        updated_links = future.result()
+                except RuntimeError:
+                    # No running loop, we can use asyncio.run directly
+                    updated_links = asyncio.run(extract_links())
+                
+                # Convert back to dict format
+                links["internal"] = [link.dict() for link in updated_links.internal]
+                links["external"] = [link.dict() for link in updated_links.external]
+                
+                if verbose:
+                    successful_internal = sum(1 for link in updated_links.internal if link.head_extraction_status == "valid")
+                    successful_external = sum(1 for link in updated_links.external if link.head_extraction_status == "valid")
+                    self._log("info", "Link head extraction completed: {internal_success}/{internal_total} internal, {external_success}/{external_total} external",
+                              params={
+                                  "internal_success": successful_internal,
+                                  "internal_total": len(updated_links.internal),
+                                  "external_success": successful_external,
+                                  "external_total": len(updated_links.external)
+                              }, tag="LINK_EXTRACT")
+                else:
+                    self._log("info", "Link head extraction completed successfully", tag="LINK_EXTRACT")
+                
+            except Exception as e:
+                self._log("error", f"Link head extraction failed: {str(e)}", tag="LINK_EXTRACT")
+                # Continue with original links if extraction fails
+
+        # Process images (sequentially; no thread pool is used here)
+        imgs = body.find_all("img")
+
+        media["images"] = [
+            img
+            for result in (
+                self.process_image(img, url, i, len(imgs), **kwargs)
+                for i, img in enumerate(imgs)
+            )
+            if result is not None
+            for img in result
+        ]
+        
+        # Process tables if not excluded
+        excluded_tags = set(kwargs.get("excluded_tags", []) or [])
+        if 'table' not in excluded_tags:
+            tables = body.find_all('table')
+            for table in tables:
+                if self.is_data_table(table, **kwargs):
+                    table_data = self.extract_table_data(table)
+                    media["tables"].append(table_data)
+
+        body = self.flatten_nested_elements(body)
+        base64_pattern = re.compile(r'data:image/[^;]+;base64,([^"]+)')
+        for img in imgs:
+            src = img.get("src", "")
+            if base64_pattern.match(src):
+                # Replace base64 data with empty string
+                img["src"] = base64_pattern.sub("", src)
+
+        str_body = ""
+        try:
+            str_body = content_element.encode_contents().decode("utf-8")
+        except Exception:
+            # Reset body to the original HTML
+            success = False
+            body = BeautifulSoup(html, "html.parser")
+
+            # Create a new div with a special ID
+            error_div = body.new_tag("div", id="crawl4ai_error_message")
+            error_div.string = """
+            Crawl4AI Error: This page is not fully supported.
+            
+            Possible reasons:
+            1. The page may have restrictions that prevent crawling.
+            2. The page might not be fully loaded.
+            
+            Suggestions:
+            - Try calling the crawl function with the parameter
+              magic=True.
+            - Set headless=False to visualize what's happening on the page.
+            
+            If the issue persists, please check the page's structure and any potential anti-crawling measures.
+            """
+
+            # Append the error div to the body
+            body.append(error_div)
+            str_body = body.encode_contents().decode("utf-8")
+
+            print(
+                "[LOG] 😧 Error: After processing the crawled HTML and removing irrelevant tags, nothing was left in the page. Check the markdown for further details."
+            )
+            self._log(
+                "error",
+                message="After processing the crawled HTML and removing irrelevant tags, nothing was left in the page. Check the markdown for further details.",
+                tag="SCRAPE",
+            )
+
+        cleaned_html = str_body.replace("\n\n", "\n").replace("  ", " ")
+
+        return {
+            "cleaned_html": cleaned_html,
+            "success": success,
+            "media": media,
+            "links": links,
+            "metadata": meta,
+        }
+
+
+class LXMLWebScrapingStrategy(WebScrapingStrategy):
+    def __init__(self, logger=None):
+        super().__init__(logger)
+        self.DIMENSION_REGEX = re.compile(r"(\d+)(\D*)")
+        self.BASE64_PATTERN = re.compile(r'data:image/[^;]+;base64,([^"]+)')
+
    def _process_element(
        self,
        url: str,
@@ -987,7 +1862,3 @@ class LXMLWebScrapingStrategy(ContentScrapingStrategy):
                "links": {"internal": [], "external": []},
                "metadata": {},
            }
-
-
-# Backward compatibility alias
-WebScrapingStrategy = LXMLWebScrapingStrategy
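The link-preview block in `scrap` runs an async coroutine from synchronous code: it calls `asyncio.run` when no event loop is running, and otherwise runs `asyncio.run` on a worker thread. One caveat is that its `except RuntimeError` also wraps the thread-pool call, so a `RuntimeError` raised by the coroutine itself would make the fallback call `asyncio.run(extract_links())` inside the running loop, replacing the original error with an unrelated one. Below is a minimal sketch of the same bridge with the `try` narrowed to the loop check only; `run_coroutine_sync` is a hypothetical helper, not part of crawl4ai. When a loop is running, the calling thread still blocks on `.result()`, which stalls that loop until the coroutine finishes.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_coroutine_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run `coro` to completion from synchronous code and return its result."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running in this thread, so asyncio.run is safe here.
        return asyncio.run(coro)
    # A loop is already running in this thread and asyncio.run would refuse,
    # so run the coroutine on a fresh loop in a worker thread and block on it.
    # Exceptions raised by the coroutine propagate unchanged from .result().
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def answer() -> int:
    await asyncio.sleep(0)
    return 42


async def caller_with_running_loop() -> int:
    # Takes the worker-thread path because a loop is running here.
    return run_coroutine_sync(answer())


print(run_coroutine_sync(answer()))             # 42
print(asyncio.run(caller_with_running_loop()))  # 42
```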
--- a/crawl4ai/extraction_strategy.py
+++ b/crawl4ai/extraction_strategy.py
@@ -1088,111 +1088,147 @@ class JsonElementExtractionStrategy(ExtractionStrategy):
    @staticmethod
    def generate_schema(
        html: str,
-        schema_type: str = "CSS", # or XPATH
-        query: str = None,
-        target_json_example: str = None,
-        llm_config: 'LLMConfig' = create_llm_config(),
-        provider: str = None,
-        api_token: str = None,
-        **kwargs
+        *,
+        schema_type: str = "CSS",              # "CSS" or "XPATH"
+        query: str | None = None,
+        target_json_example: str | None = None,
+        last_instruction: str | None = None,   # extra “IMPORTANT” notes
+        llm_config: "LLMConfig" = create_llm_config(),
+        token_usages: Optional[list["TokenUsage"]] = None,
+        prompt: str | None = None,
+        **kwargs,
    ) -> dict:
        """
-        Generate extraction schema from HTML content and optional query.
-        
-        Args:
-            html (str): The HTML content to analyze
-            query (str, optional): Natural language description of what data to extract
-            provider (str): Legacy Parameter. LLM provider to use 
-            api_token (str): Legacy Parameter. API token for LLM provider
-            llm_config (LLMConfig): LLM configuration object
-            prompt (str, optional): Custom prompt template to use
-            **kwargs: Additional args passed to LLM processor
-            
-        Returns:
-            dict: Generated schema following the JsonElementExtractionStrategy format
+        Produce a JSON extraction schema from raw HTML.
+
+        - If `query` is given, the task section echoes it.
+        - If no `query` but `target_json_example` exists,
+          we instruct the model to fit the schema to that example.
+        - If neither is provided, we ask the model to detect
+          the most obvious repeating data and build a schema.
+
+        Returns
+        -------
+        dict
+            A schema compliant with JsonElementExtractionStrategy.
        """
-        from .prompts import JSON_SCHEMA_BUILDER
+        import json, re, textwrap
+        from .prompts import JSON_SCHEMA_BUILDER, JSON_SCHEMA_BUILDER_XPATH
        from .utils import perform_completion_with_backoff
-        for name, message in JsonElementExtractionStrategy._GENERATE_SCHEMA_UNWANTED_PROPS.items():
-            if locals()[name] is not None:
-                raise AttributeError(f"Setting '{name}' is deprecated. {message}")
-        
-        # Use default or custom prompt
-        prompt_template = JSON_SCHEMA_BUILDER if schema_type == "CSS" else JSON_SCHEMA_BUILDER_XPATH
-        
-        # Build the prompt
-        system_message = {
-            "role": "system", 
-            "content": f"""You specialize in generating special JSON schemas for web scraping. This schema uses CSS or XPATH selectors to present a repetitive pattern in crawled HTML, such as a product in a product list or a search result item in a list of search results. We use this JSON schema to pass to a language model along with the HTML content to extract structured data from the HTML. The language model uses the JSON schema to extract data from the HTML and retrieve values for fields in the JSON schema, following the schema.

-Generating this HTML manually is not feasible, so you need to generate the JSON schema using the HTML content. The HTML copied from the crawled website is provided below, which we believe contains the repetitive pattern.
+        # ─── basic validation ────────────────────────────────────
+        if not html or not html.strip():
+            raise ValueError("html must be non-empty")
+        if schema_type not in {"CSS", "XPATH"}:
+            raise ValueError("schema_type must be 'CSS' or 'XPATH'")
+        for name, msg in JsonElementExtractionStrategy._GENERATE_SCHEMA_UNWANTED_PROPS.items():
+            if kwargs.get(name) is not None:
+                raise AttributeError(f"Setting '{name}' is deprecated. {msg}")

-# Schema main keys:
- name: This is the name of the schema.
- baseSelector: This is the CSS or XPATH selector that identifies the base element that contains all the repetitive patterns.
- baseFields: This is a list of fields that you extract from the base element itself.
- fields: This is a list of fields that you extract from the children of the base element. {{name, selector, type}} based on the type, you may have extra keys such as "attribute" when the type is "attribute".
-
-# Extra Context:
-In this context, the following items may or may not be present:
- Example of target JSON object: This is a sample of the final JSON object that we hope to extract from the HTML using the schema you are generating.
- Extra Instructions: This is optional instructions to consider when generating the schema provided by the user.
- Query or explanation of target/goal data item: This is a description of what data we are trying to extract from the HTML. This explanation means we're not sure about the rigid schema of the structures we want, so we leave it to you to use your expertise to create the best and most comprehensive structures aimed at maximizing data extraction from this page. You must ensure that you do not pick up nuances that may exist on a particular page. The focus should be on the data we are extracting, and it must be valid, safe, and robust based on the given HTML.
-
-# What if there is no example of target JSON object and also no extra instructions or even no explanation of target/goal data item?
-In this scenario, use your best judgment to generate the schema. You need to examine the content of the page and understand the data it provides. If the page contains repetitive data, such as lists of items, products, jobs, places, books, or movies, focus on one single item that repeats. If the page is a detailed page about one product or item, create a schema to extract the entire structured data. At this stage, you must think and decide for yourself. Try to maximize the number of fields that you can extract from the HTML.
-
-# What are the instructions and details for this schema generation?
-{prompt_template}"""
-        }
-        
-        user_message = {
-            "role": "user",
-            "content": f"""
-                HTML to analyze:
-                ```html
-                {html}
-                ```
-                """
-        }
+        # ─── prompt selection ────────────────────────────────────
+        prompt_template = (
+            prompt
+            if prompt is not None
+            else (JSON_SCHEMA_BUILDER if schema_type == "CSS" else JSON_SCHEMA_BUILDER_XPATH)
+        )

+        # ─── derive task description ─────────────────────────────
        if query:
-            user_message["content"] += f"\n\n## Query or explanation of target/goal data item:\n{query}"
-        if target_json_example:
-            user_message["content"] += f"\n\n## Example of target JSON object:\n```json\n{target_json_example}\n```"
-
-        if query and not target_json_example:
-            user_message["content"] += """IMPORTANT: To remind you, in this process, we are not providing a rigid example of the adjacent objects we seek. We rely on your understanding of the explanation provided in the above section. Make sure to grasp what we are looking for and, based on that, create the best schema.."""
-        elif not query and target_json_example:
-            user_message["content"] += """IMPORTANT: Please remember that in this process, we provided a proper example of a target JSON object. Make sure to adhere to the structure and create a schema that exactly fits this example. If you find that some elements on the page do not match completely, vote for the majority."""
-        elif not query and not target_json_example:
-            user_message["content"] += """IMPORTANT: Since we neither have a query nor an example, it is crucial to rely solely on the HTML content provided. Leverage your expertise to determine the schema based on the repetitive patterns observed in the content."""
-        
-        user_message["content"] += """IMPORTANT: 
-        0/ Ensure your schema remains reliable by avoiding selectors that appear to generate dynamically and are not dependable. You want a reliable schema, as it consistently returns the same data even after many page reloads.
-        1/ DO NOT USE use base64 kind of classes, they are temporary and not reliable.
-        2/ Every selector must refer to only one unique element. You should ensure your selector points to a single element and is unique to the place that contains the information. You have to use available techniques based on CSS or XPATH requested schema to make sure your selector is unique and also not fragile, meaning if we reload the page now or in the future, the selector should remain reliable.
-        3/ Do not use Regex as much as possible.
-
-        Analyze the HTML and generate a JSON schema that follows the specified format. Only output valid JSON schema, nothing else.
-        """
-
-        try:
-            # Call LLM with backoff handling
-            response = perform_completion_with_backoff(
-                provider=llm_config.provider,
-                prompt_with_variables="\n\n".join([system_message["content"], user_message["content"]]),
-                json_response = True,                
-                api_token=llm_config.api_token,
-                base_url=llm_config.base_url,
-                extra_args=kwargs
+            task_line = query.strip()
+        elif target_json_example:
+            task_line = (
+                "Use the example JSON below to infer all required fields, "
+                "then generate a schema that extracts matching data."
            )
-            
-            # Extract and return schema
-            return json.loads(response.choices[0].message.content)
-            
-        except Exception as e:
-            raise Exception(f"Failed to generate schema: {str(e)}")
+        else:
+            task_line = (
+                "Detect the most obvious repeating data on this page and "
+                "generate a schema that captures it completely."
+            )
+
+        # ─── build user prompt body ──────────────────────────────
+        html_clean = re.sub(r"\s{2,}", " ", textwrap.dedent(html).strip())
+
+        parts: list[str] = [
+            f"{prompt_template}",
+            "\n\n## Extracted HTML\n"
+            "==================== Beginning of Html ====================\n",
+            html_clean,
+            "\n==================== End of Html ====================\n",
+        ]
+
+        if target_json_example:
+            parts.extend(
+                [
+                    "\n## Example of end result\n",
+                    target_json_example.strip(),
+                    "\n",
+                ]
+            )
+
+        if last_instruction:
+            parts.extend(
+                [
+                    "\n## Important\n",
+                    last_instruction.strip(),
+                    "\n",
+                ]
+            )
+
+        parts.extend(
+            [
+                "\n## Task:\n",
+                task_line,
+            ]
+        )
+
+        user_message = {"role": "user", "content": "".join(parts)}
+
+        # Slim system message: the prompt template (JSON_SCHEMA_BUILDER[_XPATH] or a custom prompt) carries the detailed guidance
+        system_message = {
+            "role": "system",
+            "content": (
+                "You generate reliable JSON schemas for structured extraction. "
+                "Return valid JSON only."
+            ),
+        }
+
+        # ─── call LLM ─────────────────────────────────────────────
+        response = perform_completion_with_backoff(
+            provider=llm_config.provider,
+            prompt_with_variables="\n\n".join(
+                [system_message["content"], user_message["content"]]
+            ),
+            json_response=True,
+            api_token=llm_config.api_token,
+            base_url=llm_config.base_url,
+            extra_args=kwargs,
+        )
+
+        # ─── token usage accounting ──────────────────────────────
+        if token_usages is not None and hasattr(response, "usage"):
+            token_usages.append(
+                TokenUsage(
+                    completion_tokens=getattr(response.usage, "completion_tokens", 0),
+                    prompt_tokens=getattr(response.usage, "prompt_tokens", 0),
+                    total_tokens=getattr(response.usage, "total_tokens", 0),
+                )
+            )
+
+        # ─── parse and validate JSON answer ──────────────────────
+        try:
+            schema = json.loads(response.choices[0].message.content)
+        except Exception as exc:
+            raise ValueError(f"LLM returned invalid JSON: {exc}") from exc
+
+        required = {"name", "baseSelector", "fields"}
+        if not required.issubset(schema):
+            missing = required - set(schema)
+            raise ValueError(f"Generated schema missing required keys: {missing}")
+
+        return schema
+
+

 class JsonCssExtractionStrategy(JsonElementExtractionStrategy):
    """
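`generate_schema` now ends by parsing the reply with `json.loads` and checking for `name`, `baseSelector` and `fields`. If the model returns a JSON array instead of an object, `required.issubset(schema)` has to hash the array's items, so a list of objects raises `TypeError` rather than the `ValueError` the other failure paths use. A minimal sketch of the same validation step with an explicit object check; `parse_generated_schema` and `REQUIRED_SCHEMA_KEYS` are hypothetical names, not crawl4ai API:

```python
import json
from typing import Any, Dict

# Mirrors the `required` set checked at the end of generate_schema.
REQUIRED_SCHEMA_KEYS = frozenset({"name", "baseSelector", "fields"})


def parse_generated_schema(raw: str) -> Dict[str, Any]:
    """Parse an LLM reply into a schema dict, raising ValueError if unusable."""
    try:
        schema = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM returned invalid JSON: {exc}") from exc

    # Reject arrays and scalars before the key check, which assumes an object.
    if not isinstance(schema, dict):
        raise ValueError(f"Expected a JSON object, got {type(schema).__name__}")

    missing = REQUIRED_SCHEMA_KEYS - set(schema)
    if missing:
        raise ValueError(f"Generated schema missing required keys: {sorted(missing)}")
    return schema


schema = parse_generated_schema('{"name": "Items", "baseSelector": "li.item", "fields": []}')
print(schema["baseSelector"])  # li.item

try:
    parse_generated_schema('[{"name": "Items"}]')
except ValueError as err:
    print(err)  # Expected a JSON object, got list
```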
--- a/crawl4ai/install.py
+++ b/crawl4ai/install.py
@@ -119,32 +119,6 @@ def install_playwright():
        logger.warning(
            f"Please run '{sys.executable} -m playwright install --with-deps' manually after the installation."
        )
-    
-    # Install Patchright browsers for undetected browser support
-    logger.info("Installing Patchright browsers for undetected mode...", tag="INIT")
-    try:
-        subprocess.check_call(
-            [
-                sys.executable,
-                "-m",
-                "patchright",
-                "install",
-                "--with-deps",
-                "--force",
-                "chromium",
-            ]
-        )
-        logger.success(
-            "Patchright installation completed successfully.", tag="COMPLETE"
-        )
-    except subprocess.CalledProcessError:
-        logger.warning(
-            f"Please run '{sys.executable} -m patchright install --with-deps' manually after the installation."
-        )
-    except Exception:
-        logger.warning(
-            f"Please run '{sys.executable} -m patchright install --with-deps' manually after the installation."
-        )


 def run_migration():
--- a/crawl4ai/legacy/web_crawler.py
+++ b/crawl4ai/legacy/web_crawler.py
@@ -11,7 +11,7 @@ from .extraction_strategy import *
 from .crawler_strategy import *
 from typing import List
 from concurrent.futures import ThreadPoolExecutor
-from ..content_scraping_strategy import LXMLWebScrapingStrategy as WebScrapingStrategy
+from .content_scraping_strategy import WebScrapingStrategy
 from .config import *
 import warnings
 import json
--- a/crawl4ai/memory_utils.py
+++ b/crawl4ai/memory_utils.py
@@ -1,79 +0,0 @@
-import psutil
-import platform
-import subprocess
-from typing import Tuple
-
-
-def get_true_available_memory_gb() -> float:
-    """Get truly available memory including inactive pages (cross-platform)"""
-    vm = psutil.virtual_memory()
-
-    if platform.system() == 'Darwin':  # macOS
-        # On macOS, we need to include inactive memory too
-        try:
-            # Use vm_stat to get accurate values
-            result = subprocess.run(['vm_stat'], capture_output=True, text=True)
-            lines = result.stdout.split('\n')
-
-            page_size = 16384  # macOS page size
-            pages = {}
-
-            for line in lines:
-                if 'Pages free:' in line:
-                    pages['free'] = int(line.split()[-1].rstrip('.'))
-                elif 'Pages inactive:' in line:
-                    pages['inactive'] = int(line.split()[-1].rstrip('.'))
-                elif 'Pages speculative:' in line:
-                    pages['speculative'] = int(line.split()[-1].rstrip('.'))
-                elif 'Pages purgeable:' in line:
-                    pages['purgeable'] = int(line.split()[-1].rstrip('.'))
-
-            # Calculate total available (free + inactive + speculative + purgeable)
-            total_available_pages = (
-                pages.get('free', 0) + 
-                pages.get('inactive', 0) + 
-                pages.get('speculative', 0) + 
-                pages.get('purgeable', 0)
-            )
-            available_gb = (total_available_pages * page_size) / (1024**3)
-
-            return available_gb
-        except:
-            # Fallback to psutil
-            return vm.available / (1024**3)
-    else:
-        # For Windows and Linux, psutil.available is accurate
-        return vm.available / (1024**3)
-
-
-def get_true_memory_usage_percent() -> float:
-    """
-    Get memory usage percentage that accounts for platform differences.
-    
-    Returns:
-        float: Memory usage percentage (0-100)
-    """
-    vm = psutil.virtual_memory()
-    total_gb = vm.total / (1024**3)
-    available_gb = get_true_available_memory_gb()
-    
-    # Calculate used percentage based on truly available memory
-    used_percent = 100.0 * (total_gb - available_gb) / total_gb
-    
-    # Ensure it's within valid range
-    return max(0.0, min(100.0, used_percent))
-
-
-def get_memory_stats() -> Tuple[float, float, float]:
-    """
-    Get comprehensive memory statistics.
-    
-    Returns:
-        Tuple[float, float, float]: (used_percent, available_gb, total_gb)
-    """
-    vm = psutil.virtual_memory()
-    total_gb = vm.total / (1024**3)
-    available_gb = get_true_available_memory_gb()
-    used_percent = get_true_memory_usage_percent()
-    
-    return used_percent, available_gb, total_gb
--- a/crawl4ai/types.py
+++ b/crawl4ai/types.py
@@ -23,9 +23,8 @@ SeedingConfig = Union['SeedingConfigType']

 # Content scraping types
 ContentScrapingStrategy = Union['ContentScrapingStrategyType']
+WebScrapingStrategy = Union['WebScrapingStrategyType']
 LXMLWebScrapingStrategy = Union['LXMLWebScrapingStrategyType']
-# Backward compatibility alias
-WebScrapingStrategy = Union['LXMLWebScrapingStrategyType']

 # Proxy types
 ProxyRotationStrategy = Union['ProxyRotationStrategyType']
@@ -115,6 +114,7 @@ if TYPE_CHECKING:
    # Content scraping imports
    from .content_scraping_strategy import (
        ContentScrapingStrategy as ContentScrapingStrategyType,
+        WebScrapingStrategy as WebScrapingStrategyType,
        LXMLWebScrapingStrategy as LXMLWebScrapingStrategyType,
    )
    
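`should_crawl_based_on_head`, added to `crawl4ai/utils.py` in the next file diff, checks validators in a fixed order: a 304 status, then `Content-Digest`, a strong `ETag`, and `Last-Modified`. HTTP field names are case-insensitive, but the helper looks them up by lowercase key in a plain `dict(response.headers)`, so a server sending `ETag` or `Last-Modified` in their usual casing would likely not match. The sketch below isolates that decision as a pure function over header mappings and lower-cases names first. `decide_recrawl` is a hypothetical name, and its final fallback (re-crawl when nothing proves the page unchanged) is an assumption, since this excerpt ends at the helper's `Content-Length` check.

```python
import email.utils
from typing import Mapping, Tuple


def _lowercase_keys(headers: Mapping[str, str]) -> dict:
    # HTTP field names are case-insensitive, so normalise them once up front.
    return {name.lower(): value for name, value in headers.items()}


def decide_recrawl(
    status: int,
    response_headers: Mapping[str, str],
    cached_headers: Mapping[str, str],
) -> Tuple[bool, str]:
    """Return (should_crawl, reason) from a HEAD response and cached headers."""
    new = _lowercase_keys(response_headers)
    old = _lowercase_keys(cached_headers)

    if status == 304:
        return False, "304 Not Modified - Content unchanged"

    digest = new.get("content-digest")
    if digest and digest == old.get("content-digest"):
        return False, "Content-Digest matches - Content unchanged"

    # Only strong validators (quoted, no W/ prefix) imply identical bytes.
    etag = new.get("etag")
    if etag and etag.startswith('"') and etag == old.get("etag"):
        return False, "Strong ETag matches - Content unchanged"

    new_lm, old_lm = new.get("last-modified"), old.get("last-modified")
    if new_lm and old_lm:
        try:
            if email.utils.parsedate_to_datetime(new_lm) <= email.utils.parsedate_to_datetime(old_lm):
                return False, "Last-Modified not newer - Content unchanged"
        except (TypeError, ValueError):
            pass  # Unparseable or mixed naive/aware dates: treat as inconclusive.

    # Assumption: with no validator proving the page unchanged, re-crawl it.
    return True, "No unchanged signal - re-crawl"


cached = {"ETag": '"abc"', "Last-Modified": "Tue, 01 Jul 2025 10:00:00 GMT"}
print(decide_recrawl(200, {"etag": '"abc"'}, cached))
# (False, 'Strong ETag matches - Content unchanged')
print(decide_recrawl(200, {"Last-Modified": "Wed, 02 Jul 2025 10:00:00 GMT"}, cached))
# (True, 'No unchanged signal - re-crawl')
```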
--- a/crawl4ai/utils.py
+++ b/crawl4ai/utils.py
@@ -1517,29 +1517,8 @@ def extract_metadata_using_lxml(html, doc=None):
    head = head[0]

    # Title - using XPath
-    # title = head.xpath(".//title/text()")
-    # metadata["title"] = title[0].strip() if title else None
-
-    # === Title Extraction - New Approach ===
-    # Attempt to extract <title> using XPath
    title = head.xpath(".//title/text()")
-    title = title[0] if title else None
-
-    # Fallback: Use .find() in case XPath fails due to malformed HTML
-    if not title:
-        title_el = doc.find(".//title")
-        title = title_el.text if title_el is not None else None
-
-    # Final fallback: Use OpenGraph or Twitter title if <title> is missing or empty
-    if not title:
-        title_candidates = (
-            doc.xpath("//meta[@property='og:title']/@content") or
-            doc.xpath("//meta[@name='twitter:title']/@content")
-        )
-        title = title_candidates[0] if title_candidates else None
-
-    # Strip and assign title
-    metadata["title"] = title.strip() if title else None
+    metadata["title"] = title[0].strip() if title else None

    # Meta description - using XPath with multiple attribute conditions
    description = head.xpath('.//meta[@name="description"]/@content')
@@ -3363,13 +3342,7 @@ async def get_text_embeddings(
    # Default: use sentence-transformers
    else:
        # Lazy load to avoid importing heavy libraries unless needed
-        try:
-            from sentence_transformers import SentenceTransformer
-        except ImportError:
-            raise ImportError(
-                "sentence-transformers is required for local embeddings. "
-                "Install it with: pip install 'crawl4ai[transformer]' or pip install sentence-transformers"
-            )
+        from sentence_transformers import SentenceTransformer
        
        # Cache the model in function attribute to avoid reloading
        if not hasattr(get_text_embeddings, '_models'):
@@ -3414,3 +3387,90 @@ def cosine_distance(vec1: np.ndarray, vec2: np.ndarray) -> float:
    """Calculate cosine distance (1 - similarity) between two vectors"""
    return 1 - cosine_similarity(vec1, vec2)

+
+async def should_crawl_based_on_head(
+    url: str, 
+    cached_headers: Dict[str, str], 
+    user_agent: str = "Mozilla/5.0",
+    timeout: int = 5
+) -> tuple[bool, str]:
+    """
+    Check whether content has changed using a conditional HEAD request.
+    
+    Args:
+        url: The URL to check
+        cached_headers: The cached response headers from previous crawl
+        user_agent: User agent string to use for the HEAD request
+        timeout: Timeout in seconds for the HEAD request
+        
+    Returns:
+        Tuple of (should_crawl: bool, reason: str)
+        - should_crawl: True if content has changed and should be re-crawled, False otherwise
+        - reason: Explanation of the decision
+    """
+    import email.utils
+    
+    if not cached_headers:
+        return True, "No cached headers available, must crawl"
+    
+    headers = {
+        "Accept-Encoding": "identity",
+        "User-Agent": user_agent,
+        "Want-Content-Digest": "sha-256=1",  # Request RFC 9530 digest (integer preference 1-10)
+    }
+    
+    # Add conditional headers if available in cache
+    if cached_headers.get("etag"):
+        headers["If-None-Match"] = cached_headers["etag"]
+    if cached_headers.get("last-modified"):
+        headers["If-Modified-Since"] = cached_headers["last-modified"]
+    
+    try:
+        async with aiohttp.ClientSession() as session:
+            async with session.head(
+                url, 
+                headers=headers, 
+                timeout=aiohttp.ClientTimeout(total=timeout),
+                allow_redirects=True
+            ) as response:
+                # 304 Not Modified - content hasn't changed
+                if response.status == 304:
+                    return False, "304 Not Modified - Content unchanged"
+                
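+                # No 304: compare validators. Lower-case names; servers send e.g. "ETag" and plain dict keys are case-sensitive
+                new_headers = {k.lower(): v for k, v in response.headers.items()}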
+                
+                # Check Content-Digest (most reliable)
+                if new_headers.get("content-digest") and cached_headers.get("content-digest"):
+                    if new_headers["content-digest"] == cached_headers["content-digest"]:
+                        return False, "Content-Digest matches - Content unchanged"
+                
+                # Check strong ETag
+                if new_headers.get("etag") and cached_headers.get("etag"):
+                    # Strong ETags start with '"'
+                    if (new_headers["etag"].startswith('"') and 
+                        new_headers["etag"] == cached_headers["etag"]):
+                        return False, "Strong ETag matches - Content unchanged"
+                
+                # Check Last-Modified
+                if new_headers.get("last-modified") and cached_headers.get("last-modified"):
+                    try:
+                        new_lm = email.utils.parsedate_to_datetime(new_headers["last-modified"])
+                        cached_lm = email.utils.parsedate_to_datetime(cached_headers["last-modified"])
+                        if new_lm <= cached_lm:
+                            return False, "Last-Modified not newer - Content unchanged"
+                    except Exception:
+                        pass
+                
+                # Content-Length changed is a positive signal
+                if (new_headers.get("content-length") and cached_headers.get("content-length") and
+                    new_headers["content-length"] != cached_headers["content-length"]):
+                    return True, f"Content-Length changed ({cached_headers['content-length']} -> {new_headers['content-length']})"
+                
+                # Default: assume content has changed
+                return True, "No definitive cache headers matched - Assuming content changed"
+                
+    except Exception as e:
+        # On error, assume content has changed (safe default)
+        return True, f"HEAD request failed: {str(e)} - Assuming content changed"
+
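You can test the header checks in `should_crawl_based_on_head` without network access by moving them into a pure function. The sketch below assumes both header sets arrive as plain dicts. `compare_validators` is a hypothetical helper, not part of crawl4ai. It lower-cases header names first because a plain `dict` built from server headers keeps the server's casing (for example `ETag`). The checks run in the same order as the function: Content-Digest, then strong ETag, then Last-Modified, then Content-Length.

```python
import email.utils
from typing import Dict, Tuple


def compare_validators(cached: Dict[str, str], fresh: Dict[str, str]) -> Tuple[bool, str]:
    """Hypothetical offline mirror of the header checks in should_crawl_based_on_head.

    Returns (should_crawl, reason). Header names are lower-cased first so that
    "ETag" and "etag" are treated as the same header.
    """
    cached = {k.lower(): v for k, v in cached.items()}
    fresh = {k.lower(): v for k, v in fresh.items()}

    # 1. Content-Digest (RFC 9530): identical digests mean identical bytes.
    digest = fresh.get("content-digest")
    if digest and digest == cached.get("content-digest"):
        return False, "Content-Digest matches"

    # 2. Strong ETag only: weak validators start with W/ and are skipped.
    etag = fresh.get("etag", "")
    if etag.startswith('"') and etag == cached.get("etag"):
        return False, "Strong ETag matches"

    # 3. Last-Modified that is not newer than the cached value.
    if fresh.get("last-modified") and cached.get("last-modified"):
        try:
            fresh_lm = email.utils.parsedate_to_datetime(fresh["last-modified"])
            cached_lm = email.utils.parsedate_to_datetime(cached["last-modified"])
            if fresh_lm <= cached_lm:
                return False, "Last-Modified not newer"
        except (TypeError, ValueError):
            pass  # unparsable date or naive/aware mismatch: fall through

    # 4. A changed Content-Length is a positive signal that content changed.
    if (fresh.get("content-length") and cached.get("content-length")
            and fresh["content-length"] != cached["content-length"]):
        return True, "Content-Length changed"

    # Nothing conclusive: crawl again (the safe default).
    return True, "No validator matched"
```

The function never returns "unchanged" unless a validator proves it. Any ambiguity falls through to a re-crawl, which matches the safe default in the original function.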
--- a/deploy/docker/.llm.env.example
+++ b/deploy/docker/.llm.env.example
@@ -5,9 +5,4 @@ ANTHROPIC_API_KEY=your_anthropic_key_here
 GROQ_API_KEY=your_groq_key_here
 TOGETHER_API_KEY=your_together_key_here
 MISTRAL_API_KEY=your_mistral_key_here
-GEMINI_API_TOKEN=your_gemini_key_here
-
-# Optional: Override the default LLM provider
-# Examples: "openai/gpt-4", "anthropic/claude-3-opus", "deepseek/chat", etc.
-# If not set, uses the provider specified in config.yml (default: openai/gpt-4o-mini)
-# LLM_PROVIDER=anthropic/claude-3-opus
+GEMINI_API_TOKEN=your_gemini_key_here
--- a/deploy/docker/README.md
+++ b/deploy/docker/README.md
@@ -154,29 +154,6 @@ cp deploy/docker/.llm.env.example .llm.env
 # Now edit .llm.env and add your API keys
 ```

-**Flexible LLM Provider Configuration:**
-
-The Docker setup now supports flexible LLM provider configuration through three methods:
-
-1. **Environment Variable** (Highest Priority): Set `LLM_PROVIDER` to override the default
-   ```bash
-   export LLM_PROVIDER="anthropic/claude-3-opus"
-   # Or in your .llm.env file:
-   # LLM_PROVIDER=anthropic/claude-3-opus
-   ```
-
-2. **API Request Parameter**: Specify provider per request
-   ```json
-   {
-     "url": "https://example.com",
-     "provider": "groq/mixtral-8x7b"
-   }
-   ```
-
-3. **Config File Default**: Falls back to `config.yml` (default: `openai/gpt-4o-mini`)
-
-The system automatically selects the appropriate API key based on the provider.
-
 #### 3. Build and Run with Compose

 The `docker-compose.yml` file in the project root provides a simplified approach that automatically handles architecture detection using buildx.
@@ -691,7 +668,7 @@ app:

 # Default LLM Configuration
 llm:
-  provider: "openai/gpt-4o-mini"  # Can be overridden by LLM_PROVIDER env var
+  provider: "openai/gpt-4o-mini"
  api_key_env: "OPENAI_API_KEY"
  # api_key: sk-...  # If you pass the API key directly then api_key_env will be ignored

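With `LLM_PROVIDER` removed, the API key comes only from the `llm` block of `config.yml`. A literal `api_key` takes precedence, and `api_key_env` is then ignored, as the comment in the config notes. The sketch below shows that lookup. `resolve_llm_api_key` is a hypothetical name, and the `env` parameter exists only so the logic can be tested without touching the real environment.

```python
import os
from collections.abc import Mapping
from typing import Optional


def resolve_llm_api_key(llm_config: Mapping, env: Optional[Mapping] = None) -> str:
    """Hypothetical helper mirroring the key lookup in process_llm_extraction.

    An explicit `api_key` wins; otherwise the variable named by `api_key_env`
    is read from the environment. Missing values resolve to "".
    """
    env = os.environ if env is None else env
    if "api_key" in llm_config:
        return llm_config["api_key"]
    # Default the variable name to "" rather than None: os.environ.get(None) raises TypeError.
    return env.get(llm_config.get("api_key_env", ""), "")
```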
--- a/deploy/docker/api.py
+++ b/deploy/docker/api.py
@@ -5,7 +5,6 @@ from typing import List, Tuple, Dict
 from functools import partial
 from uuid import uuid4
 from datetime import datetime
-from base64 import b64encode

 import logging
 from typing import Optional, AsyncGenerator
@@ -40,9 +39,7 @@ from utils import (
    get_base_url,
    is_task_id,
    should_cleanup_task,
-    decode_redis_hash,
-    get_llm_api_key,
-    validate_llm_provider
+    decode_redis_hash
 )

 import psutil, time
@@ -91,12 +88,10 @@ async def handle_llm_qa(

    Answer:"""

-        # api_token=os.environ.get(config["llm"].get("api_key_env", ""))
-
        response = perform_completion_with_backoff(
            provider=config["llm"]["provider"],
            prompt_with_variables=prompt,
-            api_token=get_llm_api_key(config)
+            api_token=os.environ.get(config["llm"].get("api_key_env", ""))
        )

        return response.choices[0].message.content
@@ -114,23 +109,19 @@ async def process_llm_extraction(
    url: str,
    instruction: str,
    schema: Optional[str] = None,
-    cache: str = "0",
-    provider: Optional[str] = None
+    cache: str = "0"
 ) -> None:
    """Process LLM extraction in background."""
    try:
-        # Validate provider
-        is_valid, error_msg = validate_llm_provider(config, provider)
-        if not is_valid:
-            await redis.hset(f"task:{task_id}", mapping={
-                "status": TaskStatus.FAILED,
-                "error": error_msg
-            })
-            return
-        api_key = get_llm_api_key(config, provider)
+        # If config['llm'] has api_key then ignore the api_key_env
+        api_key = ""
+        if "api_key" in config["llm"]:
+            api_key = config["llm"]["api_key"]
+        else:
+            api_key = os.environ.get(config["llm"].get("api_key_env", ""), "")
        llm_strategy = LLMExtractionStrategy(
            llm_config=LLMConfig(
-                provider=provider or config["llm"]["provider"],
+                provider=config["llm"]["provider"],
                api_token=api_key
            ),
            instruction=instruction,
@@ -177,19 +168,10 @@ async def handle_markdown_request(
    filter_type: FilterType,
    query: Optional[str] = None,
    cache: str = "0",
-    config: Optional[dict] = None,
-    provider: Optional[str] = None
+    config: Optional[dict] = None
 ) -> str:
    """Handle markdown generation requests."""
    try:
-        # Validate provider if using LLM filter
-        if filter_type == FilterType.LLM:
-            is_valid, error_msg = validate_llm_provider(config, provider)
-            if not is_valid:
-                raise HTTPException(
-                    status_code=status.HTTP_400_BAD_REQUEST,
-                    detail=error_msg
-                )
        decoded_url = unquote(url)
        if not decoded_url.startswith(('http://', 'https://')):
            decoded_url = 'https://' + decoded_url
@@ -202,8 +184,8 @@ async def handle_markdown_request(
                FilterType.BM25: BM25ContentFilter(user_query=query or ""),
                FilterType.LLM: LLMContentFilter(
                    llm_config=LLMConfig(
-                        provider=provider or config["llm"]["provider"],
-                        api_token=get_llm_api_key(config, provider),
+                        provider=config["llm"]["provider"],
+                        api_token=os.environ.get(config["llm"].get("api_key_env", ""), ""),
                    ),
                    instruction=query or "Extract main content"
                )
@@ -247,8 +229,7 @@ async def handle_llm_request(
    query: Optional[str] = None,
    schema: Optional[str] = None,
    cache: str = "0",
-    config: Optional[dict] = None,
-    provider: Optional[str] = None
+    config: Optional[dict] = None
 ) -> JSONResponse:
    """Handle LLM extraction requests."""
    base_url = get_base_url(request)
@@ -278,8 +259,7 @@ async def handle_llm_request(
            schema,
            cache,
            base_url,
-            config,
-            provider
+            config
        )

    except Exception as e:
@@ -323,8 +303,7 @@ async def create_new_task(
    schema: Optional[str],
    cache: str,
    base_url: str,
-    config: dict,
-    provider: Optional[str] = None
+    config: dict
 ) -> JSONResponse:
    """Create and initialize a new task."""
    decoded_url = unquote(input_path)
@@ -348,8 +327,7 @@ async def create_new_task(
        decoded_url,
        query,
        schema,
-        cache,
-        provider
+        cache
    )

    return JSONResponse({
@@ -393,9 +371,6 @@ async def stream_results(crawler: AsyncWebCrawler, results_gen: AsyncGenerator)
                server_memory_mb = _get_memory_mb()
                result_dict = result.model_dump()
                result_dict['server_memory_mb'] = server_memory_mb
-                # If PDF exists, encode it to base64
-                if result_dict.get('pdf') is not None:
-                    result_dict['pdf'] = b64encode(result_dict['pdf']).decode('utf-8')
                logger.info(f"Streaming result for {result_dict.get('url', 'unknown')}")
                data = json.dumps(result_dict, default=datetime_handler) + "\n"
                yield data.encode('utf-8')
@@ -419,15 +394,13 @@ async def handle_crawl_request(
    urls: List[str],
    browser_config: dict,
    crawler_config: dict,
-    config: dict,
-    hooks_config: Optional[dict] = None
+    config: dict
 ) -> dict:
-    """Handle non-streaming crawl requests with optional hooks."""
+    """Handle non-streaming crawl requests."""
    start_mem_mb = _get_memory_mb() # <--- Get memory before
    start_time = time.time()
    mem_delta_mb = None
    peak_mem_mb = start_mem_mb
-    hook_manager = None
    
    try:
        urls = [('https://' + url) if not url.startswith(('http://', 'https://')) else url for url in urls]
@@ -447,19 +420,6 @@ async def handle_crawl_request(
        # crawler: AsyncWebCrawler = AsyncWebCrawler(config=browser_config)
        # await crawler.start()
        
-        # Attach hooks if provided
-        hooks_status = {}
-        if hooks_config:
-            from hook_manager import attach_user_hooks_to_crawler, UserHookManager
-            hook_manager = UserHookManager(timeout=hooks_config.get('timeout', 30))
-            hooks_status, hook_manager = await attach_user_hooks_to_crawler(
-                crawler,
-                hooks_config.get('code', {}),
-                timeout=hooks_config.get('timeout', 30),
-                hook_manager=hook_manager
-            )
-            logger.info(f"Hooks attachment status: {hooks_status['status']}")
-        
        base_config = config["crawler"]["base_config"]
        # Iterate on key-value pairs in global_config then use haseattr to set them 
        for key, value in base_config.items():
@@ -473,10 +433,6 @@ async def handle_crawl_request(
                                config=crawler_config, 
                                dispatcher=dispatcher)
        results = await partial_func()
-        
-        # Ensure results is always a list
-        if not isinstance(results, list):
-            results = [results]

        # await crawler.close()
        
@@ -487,72 +443,14 @@ async def handle_crawl_request(
            mem_delta_mb = end_mem_mb - start_mem_mb # <--- Calculate delta
            peak_mem_mb = max(peak_mem_mb if peak_mem_mb else 0, end_mem_mb) # <--- Get peak memory
        logger.info(f"Memory usage: Start: {start_mem_mb} MB, End: {end_mem_mb} MB, Delta: {mem_delta_mb} MB, Peak: {peak_mem_mb} MB")
-
-        # Process results to handle PDF bytes
-        processed_results = []
-        for result in results:
-            try:
-                # Check if result has model_dump method (is a proper CrawlResult)
-                if hasattr(result, 'model_dump'):
-                    result_dict = result.model_dump()
-                elif isinstance(result, dict):
-                    result_dict = result
-                else:
-                    # Handle unexpected result type
-                    logger.warning(f"Unexpected result type: {type(result)}")
-                    result_dict = {
-                        "url": str(result) if hasattr(result, '__str__') else "unknown",
-                        "success": False,
-                        "error_message": f"Unexpected result type: {type(result).__name__}"
-                    }
-                
-                # If PDF exists, encode it to base64
-                if result_dict.get('pdf') is not None and isinstance(result_dict.get('pdf'), bytes):
-                    result_dict['pdf'] = b64encode(result_dict['pdf']).decode('utf-8')
-                    
-                processed_results.append(result_dict)
-            except Exception as e:
-                logger.error(f"Error processing result: {e}")
-                processed_results.append({
-                    "url": "unknown",
-                    "success": False,
-                    "error_message": str(e)
-                })
-            
-        response = {
+
+        return {
            "success": True,
-            "results": processed_results,
+            "results": [result.model_dump() for result in results],
            "server_processing_time_s": end_time - start_time,
            "server_memory_delta_mb": mem_delta_mb,
            "server_peak_memory_mb": peak_mem_mb
        }
-        
-        # Add hooks information if hooks were used
-        if hooks_config and hook_manager:
-            from hook_manager import UserHookManager
-            if isinstance(hook_manager, UserHookManager):
-                try:
-                    # Ensure all hook data is JSON serializable
-                    import json
-                    hook_data = {
-                        "status": hooks_status,
-                        "execution_log": hook_manager.execution_log,
-                        "errors": hook_manager.errors,
-                        "summary": hook_manager.get_summary()
-                    }
-                    # Test that it's serializable
-                    json.dumps(hook_data)
-                    response["hooks"] = hook_data
-                except (TypeError, ValueError) as e:
-                    logger.error(f"Hook data not JSON serializable: {e}")
-                    response["hooks"] = {
-                        "status": {"status": "error", "message": "Hook data serialization failed"},
-                        "execution_log": [],
-                        "errors": [{"error": str(e)}],
-                        "summary": {}
-                    }
-        
-        return response

    except Exception as e:
        logger.error(f"Crawl error: {str(e)}", exc_info=True)
@@ -581,11 +479,9 @@ async def handle_stream_crawl_request(
    urls: List[str],
    browser_config: dict,
    crawler_config: dict,
-    config: dict,
-    hooks_config: Optional[dict] = None
-) -> Tuple[AsyncWebCrawler, AsyncGenerator, Optional[Dict]]:
-    """Handle streaming crawl requests with optional hooks."""
-    hooks_info = None
+    config: dict
+) -> Tuple[AsyncWebCrawler, AsyncGenerator]:
+    """Handle streaming crawl requests."""
    try:
        browser_config = BrowserConfig.load(browser_config)
        # browser_config.verbose = True # Set to False or remove for production stress testing
@@ -606,20 +502,6 @@ async def handle_stream_crawl_request(

        # crawler = AsyncWebCrawler(config=browser_config)
        # await crawler.start()
-        
-        # Attach hooks if provided
-        if hooks_config:
-            from hook_manager import attach_user_hooks_to_crawler, UserHookManager
-            hook_manager = UserHookManager(timeout=hooks_config.get('timeout', 30))
-            hooks_status, hook_manager = await attach_user_hooks_to_crawler(
-                crawler,
-                hooks_config.get('code', {}),
-                timeout=hooks_config.get('timeout', 30),
-                hook_manager=hook_manager
-            )
-            logger.info(f"Hooks attachment status for streaming: {hooks_status['status']}")
-            # Include hook manager in hooks_info for proper tracking
-            hooks_info = {'status': hooks_status, 'manager': hook_manager}

        results_gen = await crawler.arun_many(
            urls=urls,
@@ -627,7 +509,7 @@ async def handle_stream_crawl_request(
            dispatcher=dispatcher
        )

-        return crawler, results_gen, hooks_info
+        return crawler, results_gen

    except Exception as e:
        # Make sure to close crawler if started during an error here
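`stream_results` frames each result as newline-delimited JSON (NDJSON): one `json.dumps(..., default=datetime_handler)` object per line, encoded as UTF-8. The sketch below shows that framing plus the matching client-side parse. It assumes results are already plain dicts, as `model_dump()` produces. The `datetime_handler` here is a hypothetical stand-in that emits ISO 8601 strings, since the server's own version is not shown in this diff. Any value the handler does not cover, such as raw `bytes`, raises `TypeError` during encoding.

```python
import json
from datetime import datetime
from typing import Any, Dict, Iterable, Iterator


def datetime_handler(obj: Any) -> str:
    # Hypothetical stand-in for the server's datetime_handler.
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def to_ndjson(results: Iterable[Dict[str, Any]]) -> Iterator[bytes]:
    """Frame each result dict as one UTF-8 JSON line, as stream_results does."""
    for result_dict in results:
        yield (json.dumps(result_dict, default=datetime_handler) + "\n").encode("utf-8")


def from_ndjson(payload: bytes) -> list[dict[str, Any]]:
    """Client side: one json.loads per non-empty line."""
    return [json.loads(line) for line in payload.decode("utf-8").splitlines() if line.strip()]
```

Because each line is a complete JSON document, a client can decode results as they arrive instead of waiting for the whole response.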
--- a/deploy/docker/hook_manager.py
+++ b/deploy/docker/hook_manager.py
@@ -1,512 +0,0 @@
-"""
-Hook Manager for User-Provided Hook Functions
-Handles validation, compilation, and safe execution of user-provided hook code
-"""
-
-import ast
-import asyncio
-import traceback
-from typing import Dict, Callable, Optional, Tuple, List, Any
-import logging
-
-logger = logging.getLogger(__name__)
-
-
-class UserHookManager:
-    """Manages user-provided hook functions with error isolation"""
-    
-    # Expected signatures for each hook point
-    HOOK_SIGNATURES = {
-        "on_browser_created": ["browser"],
-        "on_page_context_created": ["page", "context"],
-        "before_goto": ["page", "context", "url"],
-        "after_goto": ["page", "context", "url", "response"],
-        "on_user_agent_updated": ["page", "context", "user_agent"],
-        "on_execution_started": ["page", "context"],
-        "before_retrieve_html": ["page", "context"],
-        "before_return_html": ["page", "context", "html"]
-    }
-    
-    # Default timeout for hook execution (in seconds)
-    DEFAULT_TIMEOUT = 30
-    
-    def __init__(self, timeout: int = DEFAULT_TIMEOUT):
-        self.timeout = timeout
-        self.errors: List[Dict[str, Any]] = []
-        self.compiled_hooks: Dict[str, Callable] = {}
-        self.execution_log: List[Dict[str, Any]] = []
-    
-    def validate_hook_structure(self, hook_code: str, hook_point: str) -> Tuple[bool, str]:
-        """
-        Validate the structure of user-provided hook code
-        
-        Args:
-            hook_code: The Python code string containing the hook function
-            hook_point: The hook point name (e.g., 'on_page_context_created')
-            
-        Returns:
-            Tuple of (is_valid, error_message)
-        """
-        try:
-            # Parse the code
-            tree = ast.parse(hook_code)
-            
-            # Check if it's empty
-            if not tree.body:
-                return False, "Hook code is empty"
-            
-            # Find the function definition
-            func_def = None
-            for node in tree.body:
-                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
-                    func_def = node
-                    break
-            
-            if not func_def:
-                return False, "Hook must contain a function definition (def or async def)"
-            
-            # Check if it's async (all hooks should be async)
-            if not isinstance(func_def, ast.AsyncFunctionDef):
-                return False, f"Hook function must be async (use 'async def' instead of 'def')"
-            
-            # Get function name for better error messages
-            func_name = func_def.name
-            
-            # Validate parameters
-            expected_params = self.HOOK_SIGNATURES.get(hook_point, [])
-            if not expected_params:
-                return False, f"Unknown hook point: {hook_point}"
-            
-            func_params = [arg.arg for arg in func_def.args.args]
-            
-            # Check if it has **kwargs for flexibility
-            has_kwargs = func_def.args.kwarg is not None
-            
-            # Must have at least the expected parameters
-            missing_params = []
-            for expected in expected_params:
-                if expected not in func_params:
-                    missing_params.append(expected)
-            
-            if missing_params and not has_kwargs:
-                return False, f"Hook function '{func_name}' must accept parameters: {', '.join(expected_params)} (missing: {', '.join(missing_params)})"
-            
-            # Check if it returns something (should return page or browser)
-            has_return = any(isinstance(node, ast.Return) for node in ast.walk(func_def))
-            if not has_return:
-                # Warning, not error - we'll handle this
-                logger.warning(f"Hook function '{func_name}' should return the {expected_params[0]} object")
-            
-            return True, "Valid"
-            
-        except SyntaxError as e:
-            return False, f"Syntax error at line {e.lineno}: {str(e)}"
-        except Exception as e:
-            return False, f"Failed to parse hook code: {str(e)}"
-    
-    def compile_hook(self, hook_code: str, hook_point: str) -> Optional[Callable]:
-        """
-        Compile user-provided hook code into a callable function
-        
-        Args:
-            hook_code: The Python code string
-            hook_point: The hook point name
-            
-        Returns:
-            Compiled function or None if compilation failed
-        """
-        try:
-            # Create a safe namespace for the hook
-            # Use a more complete builtins that includes __import__
-            import builtins
-            safe_builtins = {}
-            
-            # Add safe built-in functions
-            allowed_builtins = [
-                'print', 'len', 'str', 'int', 'float', 'bool',
-                'list', 'dict', 'set', 'tuple', 'range', 'enumerate',
-                'zip', 'map', 'filter', 'any', 'all', 'sum', 'min', 'max',
-                'sorted', 'reversed', 'abs', 'round', 'isinstance', 'type',
-                'getattr', 'hasattr', 'setattr', 'callable', 'iter', 'next',
-                '__import__', '__build_class__'  # Required for exec
-            ]
-            
-            for name in allowed_builtins:
-                if hasattr(builtins, name):
-                    safe_builtins[name] = getattr(builtins, name)
-            
-            namespace = {
-                '__name__': f'user_hook_{hook_point}',
-                '__builtins__': safe_builtins
-            }
-            
-            # Add commonly needed imports
-            exec("import asyncio", namespace)
-            exec("import json", namespace)
-            exec("import re", namespace)
-            exec("from typing import Dict, List, Optional", namespace)
-            
-            # Execute the code to define the function
-            exec(hook_code, namespace)
-            
-            # Find the async function in the namespace
-            for name, obj in namespace.items():
-                if callable(obj) and not name.startswith('_') and asyncio.iscoroutinefunction(obj):
-                    return obj
-            
-            # If no async function found, look for any function
-            for name, obj in namespace.items():
-                if callable(obj) and not name.startswith('_'):
-                    logger.warning(f"Found non-async function '{name}' - wrapping it")
-                    # Wrap sync function in async
-                    async def async_wrapper(*args, **kwargs):
-                        return obj(*args, **kwargs)
-                    return async_wrapper
-            
-            raise ValueError("No callable function found in hook code")
-            
-        except Exception as e:
-            error = {
-                'hook_point': hook_point,
-                'error': f"Failed to compile hook: {str(e)}",
-                'type': 'compilation_error',
-                'traceback': traceback.format_exc()
-            }
-            self.errors.append(error)
-            logger.error(f"Hook compilation failed for {hook_point}: {str(e)}")
-            return None
-    
-    async def execute_hook_safely(
-        self, 
-        hook_func: Callable, 
-        hook_point: str,
-        *args, 
-        **kwargs
-    ) -> Tuple[Any, Optional[Dict]]:
-        """
-        Execute a user hook with error isolation and timeout
-        
-        Args:
-            hook_func: The compiled hook function
-            hook_point: The hook point name
-            *args, **kwargs: Arguments to pass to the hook
-            
-        Returns:
-            Tuple of (result, error_dict)
-        """
-        start_time = asyncio.get_event_loop().time()
-        
-        try:
-            # Add timeout to prevent infinite loops
-            result = await asyncio.wait_for(
-                hook_func(*args, **kwargs),
-                timeout=self.timeout
-            )
-            
-            # Log successful execution
-            execution_time = asyncio.get_event_loop().time() - start_time
-            self.execution_log.append({
-                'hook_point': hook_point,
-                'status': 'success',
-                'execution_time': execution_time,
-                'timestamp': start_time
-            })
-            
-            return result, None
-            
-        except asyncio.TimeoutError:
-            error = {
-                'hook_point': hook_point,
-                'error': f'Hook execution timed out ({self.timeout}s limit)',
-                'type': 'timeout',
-                'execution_time': self.timeout
-            }
-            self.errors.append(error)
-            self.execution_log.append({
-                'hook_point': hook_point,
-                'status': 'timeout',
-                'error': error['error'],
-                'execution_time': self.timeout,
-                'timestamp': start_time
-            })
-            # Return the first argument (usually page/browser) to continue
-            return args[0] if args else None, error
-            
-        except Exception as e:
-            execution_time = asyncio.get_event_loop().time() - start_time
-            error = {
-                'hook_point': hook_point,
-                'error': str(e),
-                'type': type(e).__name__,
-                'traceback': traceback.format_exc(),
-                'execution_time': execution_time
-            }
-            self.errors.append(error)
-            self.execution_log.append({
-                'hook_point': hook_point,
-                'status': 'failed',
-                'error': str(e),
-                'error_type': type(e).__name__,
-                'execution_time': execution_time,
-                'timestamp': start_time
-            })
-            # Return the first argument (usually page/browser) to continue
-            return args[0] if args else None, error
-    
-    def get_summary(self) -> Dict[str, Any]:
-        """Get a summary of hook execution"""
-        total_hooks = len(self.execution_log)
-        successful = sum(1 for log in self.execution_log if log['status'] == 'success')
-        failed = sum(1 for log in self.execution_log if log['status'] == 'failed')
-        timed_out = sum(1 for log in self.execution_log if log['status'] == 'timeout')
-        
-        return {
-            'total_executions': total_hooks,
-            'successful': successful,
-            'failed': failed,
-            'timed_out': timed_out,
-            'success_rate': (successful / total_hooks * 100) if total_hooks > 0 else 0,
-            'total_errors': len(self.errors)
-        }
-
-
-class IsolatedHookWrapper:
-    """Wraps user hooks with error isolation and reporting"""
-    
-    def __init__(self, hook_manager: UserHookManager):
-        self.hook_manager = hook_manager
-    
-    def create_hook_wrapper(self, user_hook: Callable, hook_point: str) -> Callable:
-        """
-        Create a wrapper that isolates hook errors from main process
-        
-        Args:
-            user_hook: The compiled user hook function
-            hook_point: The hook point name
-            
-        Returns:
-            Wrapped async function that handles errors gracefully
-        """
-        
-        async def wrapped_hook(*args, **kwargs):
-            """Wrapped hook with error isolation"""
-            # Get the main return object (page/browser)
-            # This ensures we always have something to return
-            return_obj = None
-            if args:
-                return_obj = args[0]
-            elif 'page' in kwargs:
-                return_obj = kwargs['page']
-            elif 'browser' in kwargs:
-                return_obj = kwargs['browser']
-            
-            try:
-                # Execute user hook with safety
-                result, error = await self.hook_manager.execute_hook_safely(
-                    user_hook, 
-                    hook_point,
-                    *args, 
-                    **kwargs
-                )
-                
-                if error:
-                    # Hook failed but we continue with original object
-                    logger.warning(f"User hook failed at {hook_point}: {error['error']}")
-                    return return_obj
-                
-                # Hook succeeded - return its result or the original object
-                if result is None:
-                    logger.debug(f"Hook at {hook_point} returned None, using original object")
-                    return return_obj
-                
-                return result
-                
-            except Exception as e:
-                # This should rarely happen due to execute_hook_safely
-                logger.error(f"Unexpected error in hook wrapper for {hook_point}: {e}")
-                return return_obj
-        
-        # Set function name for debugging
-        wrapped_hook.__name__ = f"wrapped_{hook_point}"
-        return wrapped_hook
-
-
-async def process_user_hooks(
-    hooks_input: Dict[str, str],
-    timeout: int = 30
-) -> Tuple[Dict[str, Callable], List[Dict], UserHookManager]:
-    """
-    Process and compile user-provided hook functions
-    
-    Args:
-        hooks_input: Dictionary mapping hook points to code strings
-        timeout: Timeout for each hook execution
-        
-    Returns:
-        Tuple of (compiled_hooks, validation_errors, hook_manager)
-    """
-    
-    hook_manager = UserHookManager(timeout=timeout)
-    wrapper = IsolatedHookWrapper(hook_manager)
-    compiled_hooks = {}
-    validation_errors = []
-    
-    for hook_point, hook_code in hooks_input.items():
-        # Skip empty hooks
-        if not hook_code or not hook_code.strip():
-            continue
-        
-        # Validate hook point
-        if hook_point not in UserHookManager.HOOK_SIGNATURES:
-            validation_errors.append({
-                'hook_point': hook_point,
-                'error': f'Unknown hook point. Valid points: {", ".join(UserHookManager.HOOK_SIGNATURES.keys())}',
-                'code_preview': hook_code[:100] + '...' if len(hook_code) > 100 else hook_code
-            })
-            continue
-        
-        # Validate structure
-        is_valid, message = hook_manager.validate_hook_structure(hook_code, hook_point)
-        if not is_valid:
-            validation_errors.append({
-                'hook_point': hook_point,
-                'error': message,
-                'code_preview': hook_code[:100] + '...' if len(hook_code) > 100 else hook_code
-            })
-            continue
-        
-        # Compile the hook
-        hook_func = hook_manager.compile_hook(hook_code, hook_point)
-        if hook_func:
-            # Wrap with error isolation
-            wrapped_hook = wrapper.create_hook_wrapper(hook_func, hook_point)
-            compiled_hooks[hook_point] = wrapped_hook
-            logger.info(f"Successfully compiled hook for {hook_point}")
-        else:
-            validation_errors.append({
-                'hook_point': hook_point,
-                'error': 'Failed to compile hook function - check syntax and structure',
-                'code_preview': hook_code[:100] + '...' if len(hook_code) > 100 else hook_code
-            })
-    
-    return compiled_hooks, validation_errors, hook_manager
-
-
-async def process_user_hooks_with_manager(
-    hooks_input: Dict[str, str],
-    hook_manager: UserHookManager
-) -> Tuple[Dict[str, Callable], List[Dict]]:
-    """
-    Process and compile user-provided hook functions with existing manager
-    
-    Args:
-        hooks_input: Dictionary mapping hook points to code strings
-        hook_manager: Existing UserHookManager instance
-        
-    Returns:
-        Tuple of (compiled_hooks, validation_errors)
-    """
-    
-    wrapper = IsolatedHookWrapper(hook_manager)
-    compiled_hooks = {}
-    validation_errors = []
-    
-    for hook_point, hook_code in hooks_input.items():
-        # Skip empty hooks
-        if not hook_code or not hook_code.strip():
-            continue
-        
-        # Validate hook point
-        if hook_point not in UserHookManager.HOOK_SIGNATURES:
-            validation_errors.append({
-                'hook_point': hook_point,
-                'error': f'Unknown hook point. Valid points: {", ".join(UserHookManager.HOOK_SIGNATURES.keys())}',
-                'code_preview': hook_code[:100] + '...' if len(hook_code) > 100 else hook_code
-            })
-            continue
-        
-        # Validate structure
-        is_valid, message = hook_manager.validate_hook_structure(hook_code, hook_point)
-        if not is_valid:
-            validation_errors.append({
-                'hook_point': hook_point,
-                'error': message,
-                'code_preview': hook_code[:100] + '...' if len(hook_code) > 100 else hook_code
-            })
-            continue
-        
-        # Compile the hook
-        hook_func = hook_manager.compile_hook(hook_code, hook_point)
-        if hook_func:
-            # Wrap with error isolation
-            wrapped_hook = wrapper.create_hook_wrapper(hook_func, hook_point)
-            compiled_hooks[hook_point] = wrapped_hook
-            logger.info(f"Successfully compiled hook for {hook_point}")
-        else:
-            validation_errors.append({
-                'hook_point': hook_point,
-                'error': 'Failed to compile hook function - check syntax and structure',
-                'code_preview': hook_code[:100] + '...' if len(hook_code) > 100 else hook_code
-            })
-    
-    return compiled_hooks, validation_errors
-
-
-async def attach_user_hooks_to_crawler(
-    crawler,  # AsyncWebCrawler instance
-    user_hooks: Dict[str, str],
-    timeout: int = 30,
-    hook_manager: Optional[UserHookManager] = None
-) -> Tuple[Dict[str, Any], UserHookManager]:
-    """
-    Attach user-provided hooks to crawler with full error reporting
-    
-    Args:
-        crawler: AsyncWebCrawler instance
-        user_hooks: Dictionary mapping hook points to code strings
-        timeout: Timeout for each hook execution
-        hook_manager: Optional existing UserHookManager instance
-        
-    Returns:
-        Tuple of (status_dict, hook_manager)
-    """
-    
-    # Use provided hook_manager or create a new one
-    if hook_manager is None:
-        hook_manager = UserHookManager(timeout=timeout)
-    
-    # Process hooks with the hook_manager
-    compiled_hooks, validation_errors = await process_user_hooks_with_manager(
-        user_hooks, hook_manager
-    )
-    
-    # Log validation errors
-    if validation_errors:
-        logger.warning(f"Hook validation errors: {validation_errors}")
-    
-    # Attach successfully compiled hooks
-    attached_hooks = []
-    for hook_point, wrapped_hook in compiled_hooks.items():
-        try:
-            crawler.crawler_strategy.set_hook(hook_point, wrapped_hook)
-            attached_hooks.append(hook_point)
-            logger.info(f"Attached hook to {hook_point}")
-        except Exception as e:
-            logger.error(f"Failed to attach hook to {hook_point}: {e}")
-            validation_errors.append({
-                'hook_point': hook_point,
-                'error': f'Failed to attach hook: {str(e)}'
-            })
-    
-    status = 'success' if not validation_errors else ('partial' if attached_hooks else 'failed')
-    
-    status_dict = {
-        'status': status,
-        'attached_hooks': attached_hooks,
-        'validation_errors': validation_errors,
-        'total_hooks_provided': len(user_hooks),
-        'successfully_attached': len(attached_hooks),
-        'failed_validation': len(validation_errors)
-    }
-    
-    return status_dict, hook_manager
--- a/deploy/docker/job.py
+++ b/deploy/docker/job.py
@@ -36,7 +36,6 @@ class LlmJobPayload(BaseModel):
     q:      str
     schema: Optional[str] = None
     cache:  bool = False
-    provider: Optional[str] = None


 class CrawlJobPayload(BaseModel):
@@ -62,7 +61,6 @@ async def llm_job_enqueue(
         schema=payload.schema,
         cache=payload.cache,
         config=_config,
-        provider=payload.provider,
     )


--- a/deploy/docker/schemas.py
+++ b/deploy/docker/schemas.py
@@ -9,57 +9,12 @@ class CrawlRequest(BaseModel):
     browser_config: Optional[Dict] = Field(default_factory=dict)
     crawler_config: Optional[Dict] = Field(default_factory=dict)

-
-class HookConfig(BaseModel):
-    """Configuration for user-provided hooks"""
-    code: Dict[str, str] = Field(
-        default_factory=dict,
-        description="Map of hook points to Python code strings"
-    )
-    timeout: int = Field(
-        default=30,
-        ge=1,
-        le=120,
-        description="Timeout in seconds for each hook execution"
-    )
-    
-    class Config:
-        schema_extra = {
-            "example": {
-                "code": {
-                    "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    # Block images to speed up crawling
-    await context.route("**/*.{png,jpg,jpeg,gif}", lambda route: route.abort())
-    return page
-""",
-                    "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    # Scroll to load lazy content
-    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
-    await page.wait_for_timeout(2000)
-    return page
-"""
-                },
-                "timeout": 30
-            }
-        }
-
-
-class CrawlRequestWithHooks(CrawlRequest):
-    """Extended crawl request with hooks support"""
-    hooks: Optional[HookConfig] = Field(
-        default=None,
-        description="Optional user-provided hook functions"
-    )
-
 class MarkdownRequest(BaseModel):
     """Request body for the /md endpoint."""
     url: str                    = Field(...,  description="Absolute http/https URL to fetch")
     f:   FilterType             = Field(FilterType.FIT, description="Content‑filter strategy: fit, raw, bm25, or llm")
     q:   Optional[str] = Field(None,  description="Query string used by BM25/LLM filters")
     c:   Optional[str] = Field("0",   description="Cache‑bust / revision counter")
-    provider: Optional[str] = Field(None, description="LLM provider override (e.g., 'anthropic/claude-3-opus')")


 class RawCode(BaseModel):
--- a/deploy/docker/server.py
+++ b/deploy/docker/server.py
@@ -23,7 +23,7 @@ from api import (
     stream_results
 )
 from schemas import (
-    CrawlRequestWithHooks,
+    CrawlRequest,
     MarkdownRequest,
     RawCode,
     HTMLRequest,
@@ -241,7 +241,7 @@ async def get_markdown(
         raise HTTPException(
             400, "URL must be absolute and start with http/https")
     markdown = await handle_markdown_request(
-        body.url, body.f, body.q, body.c, config, body.provider
+        body.url, body.f, body.q, body.c, config
     )
     return JSONResponse({
         "url": body.url,
@@ -414,72 +414,6 @@ async def get_schema():
             "crawler": CrawlerRunConfig().dump()}


-@app.get("/hooks/info")
-async def get_hooks_info():
-    """Get information about available hook points and their signatures"""
-    from hook_manager import UserHookManager
-    
-    hook_info = {}
-    for hook_point, params in UserHookManager.HOOK_SIGNATURES.items():
-        hook_info[hook_point] = {
-            "parameters": params,
-            "description": get_hook_description(hook_point),
-            "example": get_hook_example(hook_point)
-        }
-    
-    return JSONResponse({
-        "available_hooks": hook_info,
-        "timeout_limits": {
-            "min": 1,
-            "max": 120,
-            "default": 30
-        }
-    })
-
-
-def get_hook_description(hook_point: str) -> str:
-    """Get description for each hook point"""
-    descriptions = {
-        "on_browser_created": "Called after browser instance is created",
-        "on_page_context_created": "Called after page and context are created - ideal for authentication",
-        "before_goto": "Called before navigating to the target URL",
-        "after_goto": "Called after navigation is complete",
-        "on_user_agent_updated": "Called when user agent is updated",
-        "on_execution_started": "Called when custom JavaScript execution begins",
-        "before_retrieve_html": "Called before retrieving the final HTML - ideal for scrolling",
-        "before_return_html": "Called just before returning the HTML content"
-    }
-    return descriptions.get(hook_point, "")
-
-
-def get_hook_example(hook_point: str) -> str:
-    """Get example code for each hook point"""
-    examples = {
-        "on_page_context_created": """async def hook(page, context, **kwargs):
-    # Add authentication cookie
-    await context.add_cookies([{
-        'name': 'session',
-        'value': 'my-session-id',
-        'domain': '.example.com'
-    }])
-    return page""",
-        
-        "before_retrieve_html": """async def hook(page, context, **kwargs):
-    # Scroll to load lazy content
-    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
-    await page.wait_for_timeout(2000)
-    return page""",
-        
-        "before_goto": """async def hook(page, context, url, **kwargs):
-    # Set custom headers
-    await page.set_extra_http_headers({
-        'X-Custom-Header': 'value'
-    })
-    return page"""
-    }
-    return examples.get(hook_point, "# Implement your hook logic here\nreturn page")
-
-
 @app.get(config["observability"]["health_check"]["endpoint"])
 async def health():
     return {"status": "ok", "timestamp": time.time(), "version": __version__}
@@ -495,30 +429,19 @@ async def metrics():
 @mcp_tool("crawl")
 async def crawl(
     request: Request,
-    crawl_request: CrawlRequestWithHooks,
+    crawl_request: CrawlRequest,
     _td: Dict = Depends(token_dep),
 ):
     """
     Crawl a list of URLs and return the results as JSON.
-    Supports optional user-provided hook functions for customization.
     """
     if not crawl_request.urls:
         raise HTTPException(400, "At least one URL required")
-    
-    # Prepare hooks config if provided
-    hooks_config = None
-    if crawl_request.hooks:
-        hooks_config = {
-            'code': crawl_request.hooks.code,
-            'timeout': crawl_request.hooks.timeout
-        }
-    
     res = await handle_crawl_request(
         urls=crawl_request.urls,
         browser_config=crawl_request.browser_config,
         crawler_config=crawl_request.crawler_config,
         config=config,
-        hooks_config=hooks_config
     )
     return JSONResponse(res)

@@ -527,42 +450,25 @@ async def crawl(
 @limiter.limit(config["rate_limiting"]["default_limit"])
 async def crawl_stream(
     request: Request,
-    crawl_request: CrawlRequestWithHooks,
+    crawl_request: CrawlRequest,
     _td: Dict = Depends(token_dep),
 ):
     if not crawl_request.urls:
         raise HTTPException(400, "At least one URL required")
-    
-    # Prepare hooks config if provided
-    hooks_config = None
-    if crawl_request.hooks:
-        hooks_config = {
-            'code': crawl_request.hooks.code,
-            'timeout': crawl_request.hooks.timeout
-        }
-    
-    crawler, gen, hooks_info = await handle_stream_crawl_request(
+    crawler, gen = await handle_stream_crawl_request(
         urls=crawl_request.urls,
         browser_config=crawl_request.browser_config,
         crawler_config=crawl_request.crawler_config,
         config=config,
-        hooks_config=hooks_config
     )
-    
-    # Add hooks info to response headers if available
-    headers = {
-        "Cache-Control": "no-cache",
-        "Connection": "keep-alive",
-        "X-Stream-Status": "active",
-    }
-    if hooks_info:
-        import json
-        headers["X-Hooks-Status"] = json.dumps(hooks_info['status']['status'])
-    
     return StreamingResponse(
         stream_results(crawler, gen),
         media_type="application/x-ndjson",
-        headers=headers,
+        headers={
+            "Cache-Control": "no-cache",
+            "Connection": "keep-alive",
+            "X-Stream-Status": "active",
+        },
     )


--- a/deploy/docker/utils.py
+++ b/deploy/docker/utils.py
@@ -1,7 +1,6 @@
 import dns.resolver
 import logging
 import yaml
-import os
 from datetime import datetime
 from enum import Enum
 from pathlib import Path
@@ -20,24 +19,10 @@ class FilterType(str, Enum):
     LLM = "llm"

 def load_config() -> Dict:
-    """Load and return application configuration with environment variable overrides."""
+    """Load and return application configuration."""
     config_path = Path(__file__).parent / "config.yml"
     with open(config_path, "r") as config_file:
-        config = yaml.safe_load(config_file)
-    
-    # Override LLM provider from environment if set
-    llm_provider = os.environ.get("LLM_PROVIDER")
-    if llm_provider:
-        config["llm"]["provider"] = llm_provider
-        logging.info(f"LLM provider overridden from environment: {llm_provider}")
-    
-    # Also support direct API key from environment if the provider-specific key isn't set
-    llm_api_key = os.environ.get("LLM_API_KEY")
-    if llm_api_key and "api_key" not in config["llm"]:
-        config["llm"]["api_key"] = llm_api_key
-        logging.info("LLM API key loaded from LLM_API_KEY environment variable")
-    
-    return config
+        return yaml.safe_load(config_file)

 def setup_logging(config: Dict) -> None:
     """Configure application logging."""
@@ -71,52 +56,6 @@ def decode_redis_hash(hash_data: Dict[bytes, bytes]) -> Dict[str, str]:



-def get_llm_api_key(config: Dict, provider: Optional[str] = None) -> str:
-    """Get the appropriate API key based on the LLM provider.
-    
-    Args:
-        config: The application configuration dictionary
-        provider: Optional provider override (e.g., "openai/gpt-4")
-    
-    Returns:
-        The API key for the provider, or empty string if not found
-    """
-        
-    # Use provided provider or fall back to config
-    if not provider:
-        provider = config["llm"]["provider"]
-    
-    # Check if direct API key is configured
-    if "api_key" in config["llm"]:
-        return config["llm"]["api_key"]
-    
-    # Fall back to the configured api_key_env if no match
-    return os.environ.get(config["llm"].get("api_key_env", ""), "")
-
-
-def validate_llm_provider(config: Dict, provider: Optional[str] = None) -> tuple[bool, str]:
-    """Validate that the LLM provider has an associated API key.
-    
-    Args:
-        config: The application configuration dictionary
-        provider: Optional provider override (e.g., "openai/gpt-4")
-    
-    Returns:
-        Tuple of (is_valid, error_message)
-    """
-    # Use provided provider or fall back to config
-    if not provider:
-        provider = config["llm"]["provider"]
-    
-    # Get the API key for this provider
-    api_key = get_llm_api_key(config, provider)
-    
-    if not api_key:
-        return False, f"No API key found for provider '{provider}'. Please set the appropriate environment variable."
-    
-    return True, ""
-
-
 def verify_email_domain(email: str) -> bool:
     try:
         domain = email.split('@')[1]
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -14,7 +14,6 @@ x-base-config: &base-config
     - TOGETHER_API_KEY=${TOGETHER_API_KEY:-}
     - MISTRAL_API_KEY=${MISTRAL_API_KEY:-}
     - GEMINI_API_TOKEN=${GEMINI_API_TOKEN:-}
-    - LLM_PROVIDER=${LLM_PROVIDER:-}  # Optional: Override default provider (e.g., "anthropic/claude-3-opus")
   volumes:
     - /dev/shm:/dev/shm  # Chromium performance
   deploy:
--- a/docs/examples/adaptive_crawling/custom_strategies.py
+++ b/docs/examples/adaptive_crawling/custom_strategies.py
@@ -9,7 +9,7 @@ import asyncio
 import re
 from typing import List, Dict, Set
 from crawl4ai import AsyncWebCrawler, AdaptiveCrawler, AdaptiveConfig
-from crawl4ai.adaptive_crawler import CrawlState, Link
+from crawl4ai.adaptive_crawler import AdaptiveCrawlResult, Link
 import math


@@ -45,7 +45,7 @@ class APIDocumentationStrategy:
             r'/legal/'
         ]
     
-    def score_link(self, link: Link, query: str, state: CrawlState) -> float:
+    def score_link(self, link: Link, query: str, state: AdaptiveCrawlResult) -> float:
         """Custom link scoring for API documentation"""
         score = 1.0
         url = link.href.lower()
@@ -77,7 +77,7 @@ class APIDocumentationStrategy:
        
         return score
     
-    def calculate_api_coverage(self, state: CrawlState, query: str) -> Dict[str, float]:
+    def calculate_api_coverage(self, state: AdaptiveCrawlResult, query: str) -> Dict[str, float]:
         """Calculate specialized coverage metrics for API documentation"""
         metrics = {
             'endpoint_coverage': 0.0,
--- a/docs/examples/c4a_script/api_usage_examples.py
+++ b/docs/examples/c4a_script/api_usage_examples.py
@@ -3,8 +3,8 @@ C4A-Script API Usage Examples
 Shows how to use the new Result-based API in various scenarios
 """

-from crawl4ai.script.c4a_compile import compile, validate, compile_file
-from crawl4ai.script.c4a_result import CompilationResult, ValidationResult
+from c4a_compile import compile, validate, compile_file
+from c4a_result import CompilationResult, ValidationResult
 import json


--- a/docs/examples/c4a_script/c4a_script_hello_world.py
+++ b/docs/examples/c4a_script/c4a_script_hello_world.py
@@ -3,7 +3,7 @@ C4A-Script Hello World
 A concise example showing how to use the C4A-Script compiler
 """

-from crawl4ai.script.c4a_compile import compile
+from c4a_compile import compile

 # Define your C4A-Script
 script = """
--- a/docs/examples/c4a_script/c4a_script_hello_world_error.py
+++ b/docs/examples/c4a_script/c4a_script_hello_world_error.py
@@ -3,7 +3,7 @@ C4A-Script Hello World - Error Example
 Shows how error handling works
 """

-from crawl4ai.script.c4a_compile import compile
+from c4a_compile import compile

 # Define a script with an error (missing THEN)
 script = """
--- a/docs/examples/demo_multi_config_clean.py
+++ b/docs/examples/demo_multi_config_clean.py
@@ -1,303 +0,0 @@
-"""
-🎯 Multi-Config URL Matching Demo
-=================================
-Learn how to use different crawler configurations for different URL patterns
-in a single crawl batch with Crawl4AI's multi-config feature.
-
-Part 1: Understanding URL Matching (Pattern Testing)
-Part 2: Practical Example with Real Crawling
-"""
-
-import asyncio
-from crawl4ai import (
-    AsyncWebCrawler, 
-    CrawlerRunConfig,
-    MatchMode
-)
-from crawl4ai.processors.pdf import PDFContentScrapingStrategy
-from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
-from crawl4ai.content_filter_strategy import PruningContentFilter
-from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
-
-
-def print_section(title):
-    """Print a formatted section header"""
-    print(f"\n{'=' * 60}")
-    print(f"{title}")
-    print(f"{'=' * 60}\n")
-
-
-def test_url_matching(config, test_urls, config_name):
-    """Test URL matching for a config and show results"""
-    print(f"Config: {config_name}")
-    print(f"Matcher: {config.url_matcher}")
-    if hasattr(config, 'match_mode'):
-        print(f"Mode: {config.match_mode.value}")
-    print("-" * 40)
-    
-    for url in test_urls:
-        matches = config.is_match(url)
-        symbol = "✓" if matches else "✗"
-        print(f"{symbol} {url}")
-    print()
-
-
-# ==============================================================================
-# PART 1: Understanding URL Matching
-# ==============================================================================
-
-def demo_part1_pattern_matching():
-    """Part 1: Learn how URL matching works without crawling"""
-    
-    print_section("PART 1: Understanding URL Matching")
-    print("Let's explore different ways to match URLs with configs.\n")
-    
-    # Test URLs we'll use throughout
-    test_urls = [
-        "https://example.com/report.pdf",
-        "https://example.com/data.json",
-        "https://example.com/blog/post-1",
-        "https://example.com/article/news",
-        "https://api.example.com/v1/users",
-        "https://example.com/about"
-    ]
-    
-    # 1.1 Simple String Pattern
-    print("1.1 Simple String Pattern Matching")
-    print("-" * 40)
-    
-    pdf_config = CrawlerRunConfig(
-        url_matcher="*.pdf"
-    )
-    
-    test_url_matching(pdf_config, test_urls, "PDF Config")
-    
-    
-    # 1.2 Multiple String Patterns
-    print("1.2 Multiple String Patterns (OR logic)")
-    print("-" * 40)
-    
-    blog_config = CrawlerRunConfig(
-        url_matcher=["*/blog/*", "*/article/*", "*/news/*"],
-        match_mode=MatchMode.OR  # This is default, shown for clarity
-    )
-    
-    test_url_matching(blog_config, test_urls, "Blog/Article Config")
-    
-    
-    # 1.3 Single Function Matcher
-    print("1.3 Function-based Matching")
-    print("-" * 40)
-    
-    api_config = CrawlerRunConfig(
-        url_matcher=lambda url: 'api' in url or url.endswith('.json')
-    )
-    
-    test_url_matching(api_config, test_urls, "API Config")
-    
-    
-    # 1.4 List of Functions
-    print("1.4 Multiple Functions with AND Logic")
-    print("-" * 40)
-    
-    # Must be HTTPS AND contain 'api' AND have version number
-    secure_api_config = CrawlerRunConfig(
-        url_matcher=[
-            lambda url: url.startswith('https://'),
-            lambda url: 'api' in url,
-            lambda url: '/v' in url  # Version indicator
-        ],
-        match_mode=MatchMode.AND
-    )
-    
-    test_url_matching(secure_api_config, test_urls, "Secure API Config")
-    
-    
-    # 1.5 Mixed: String and Function Together
-    print("1.5 Mixed Patterns: String + Function")
-    print("-" * 40)
-    
-    # Match JSON files OR any API endpoint
-    json_or_api_config = CrawlerRunConfig(
-        url_matcher=[
-            "*.json",  # String pattern
-            lambda url: 'api' in url  # Function
-        ],
-        match_mode=MatchMode.OR
-    )
-    
-    test_url_matching(json_or_api_config, test_urls, "JSON or API Config")
-    
-    
-    # 1.6 Complex: Multiple Strings + Multiple Functions
-    print("1.6 Complex Matcher: Mixed Types with AND Logic")
-    print("-" * 40)
-    
-    # Must be: HTTPS AND (.com domain) AND (blog OR article) AND NOT a PDF
-    complex_config = CrawlerRunConfig(
-        url_matcher=[
-            lambda url: url.startswith('https://'),  # Function: HTTPS check
-            "*.com/*",  # String: .com domain
-            lambda url: any(pattern in url for pattern in ['/blog/', '/article/']),  # Function: Blog OR article
-            lambda url: not url.endswith('.pdf')  # Function: Not PDF
-        ],
-        match_mode=MatchMode.AND
-    )
-    
-    test_url_matching(complex_config, test_urls, "Complex Mixed Config")
-    
-    print("\n✅ Key Takeaway: First matching config wins when passed to arun_many()!")
-
-
-# ==============================================================================
-# PART 2: Practical Multi-URL Crawling
-# ==============================================================================
-
-async def demo_part2_practical_crawling():
-    """Part 2: Real-world example with different content types"""
-    
-    print_section("PART 2: Practical Multi-URL Crawling")
-    print("Now let's see multi-config in action with real URLs.\n")
-    
-    # Create specialized configs for different content types
-    configs = [
-        # Config 1: PDF documents - only match files ending with .pdf
-        CrawlerRunConfig(
-            url_matcher="*.pdf",
-            scraping_strategy=PDFContentScrapingStrategy()
-        ),
-        
-        # Config 2: Blog/article pages with content filtering
-        CrawlerRunConfig(
-            url_matcher=["*/blog/*", "*/article/*", "*python.org*"],
-            markdown_generator=DefaultMarkdownGenerator(
-                content_filter=PruningContentFilter(threshold=0.48)
-            )
-        ),
-        
-        # Config 3: Dynamic pages requiring JavaScript
-        CrawlerRunConfig(
-            url_matcher=lambda url: 'github.com' in url,
-            js_code="window.scrollTo(0, 500);"  # Scroll to load content
-        ),
-        
-        # Config 4: Mixed matcher - API endpoints (string OR function)
-        CrawlerRunConfig(
-            url_matcher=[
-                "*.json",  # String pattern for JSON files
-                lambda url: 'api' in url or 'httpbin.org' in url  # Function for API endpoints
-            ],
-            match_mode=MatchMode.OR,
-        ),
-        
-        # Config 5: Complex matcher - Secure documentation sites
-        CrawlerRunConfig(
-            url_matcher=[
-                lambda url: url.startswith('https://'),  # Must be HTTPS
-                "*.org/*",  # String: .org domain
-                lambda url: any(doc in url for doc in ['docs', 'documentation', 'reference']),  # Has docs
-                lambda url: not url.endswith(('.pdf', '.json'))  # Not PDF or JSON
-            ],
-            match_mode=MatchMode.AND,
-            # wait_for="css:.content, css:article"  # Wait for content to load
-        ),
-        
-        # Default config for everything else
-        # CrawlerRunConfig()  # No url_matcher means it matches everything (use it as fallback)
-    ]
-    
-    # URLs to crawl - each will use a different config
-    urls = [
-        "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",  # → PDF config
-        "https://blog.python.org/",  # → Blog config with content filter
-        "https://github.com/microsoft/playwright",  # → JS config
-        "https://httpbin.org/json",  # → Mixed matcher config (API)
-        "https://docs.python.org/3/reference/",  # → Complex matcher config
-        "https://www.w3schools.com/",  # → Default config, if you uncomment the default config line above, if not you will see `Error: No matching configuration`
-    ]
-    
-    print("URLs to crawl:")
-    for i, url in enumerate(urls, 1):
-        print(f"{i}. {url}")
-    
-    print("\nCrawling with appropriate config for each URL...\n")
-    
-    async with AsyncWebCrawler() as crawler:
-        results = await crawler.arun_many(
-            urls=urls,
-            config=configs
-        )
-        
-        # Display results
-        print("Results:")
-        print("-" * 60)
-        
-        for result in results:
-            if result.success:
-                # Determine which config was used
-                config_type = "Default"
-                if result.url.endswith('.pdf'):
-                    config_type = "PDF Strategy"
-                elif any(pattern in result.url for pattern in ['blog', 'python.org']) and 'docs' not in result.url:
-                    config_type = "Blog + Content Filter"
-                elif 'github.com' in result.url:
-                    config_type = "JavaScript Enabled"
-                elif 'httpbin.org' in result.url or result.url.endswith('.json'):
-                    config_type = "Mixed Matcher (API)"
-                elif 'docs.python.org' in result.url:
-                    config_type = "Complex Matcher (Secure Docs)"
-                
-                print(f"\n✓ {result.url}")
-                print(f"  Config used: {config_type}")
-                print(f"  Content size: {len(result.markdown)} chars")
-                
-                # Show if we have fit_markdown (from content filter)
-                if hasattr(result.markdown, 'fit_markdown') and result.markdown.fit_markdown:
-                    print(f"  Fit markdown size: {len(result.markdown.fit_markdown)} chars")
-                    reduction = (1 - len(result.markdown.fit_markdown) / len(result.markdown)) * 100
-                    print(f"  Content reduced by: {reduction:.1f}%")
-                
-                # Show extracted data if using extraction strategy
-                if hasattr(result, 'extracted_content') and result.extracted_content:
-                    print(f"  Extracted data: {str(result.extracted_content)[:100]}...")
-            else:
-                print(f"\n✗ {result.url}")
-                print(f"  Error: {result.error_message}")
-    
-    print("\n" + "=" * 60)
-    print("✅ Multi-config crawling complete!")
-    print("\nBenefits demonstrated:")
-    print("- PDFs handled with specialized scraper")
-    print("- Blog content filtered for relevance") 
-    print("- JavaScript executed only where needed")
-    print("- Mixed matchers (string + function) for flexible matching")
-    print("- Complex matchers for precise URL targeting")
-    print("- Each URL got optimal configuration automatically!")
-
-
-async def main():
-    """Run both parts of the demo"""
-    
-    print("""
-🎯 Multi-Config URL Matching Demo
-=================================
-Learn how Crawl4AI can use different configurations
-for different URLs in a single batch.
-    """)
-    
-    # Part 1: Pattern matching
-    demo_part1_pattern_matching()
-    
-    print("\nPress Enter to continue to Part 2...")
-    try:
-        input()
-    except EOFError:
-        # Running in non-interactive mode, skip input
-        pass
-    
-    # Part 2: Practical crawling
-    await demo_part2_practical_crawling()
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
--- a/docs/examples/docker_hooks_examples.py
+++ b/docs/examples/docker_hooks_examples.py
@@ -1,513 +0,0 @@
-#!/usr/bin/env python3
-"""
-Comprehensive test demonstrating all hook types from hooks_example.py
-adapted for the Docker API with real URLs
-"""
-
-import requests
-import json
-import time
-from typing import Dict, Any
-
-# API_BASE_URL = "http://localhost:11234"
-API_BASE_URL = "http://localhost:11235"
-
-
-def test_all_hooks_demo():
-    """Demonstrate all 8 hook types with practical examples"""
-    print("=" * 70)
-    print("Testing: All Hooks Comprehensive Demo")
-    print("=" * 70)
-    
-    hooks_code = {
-        "on_browser_created": """
-async def hook(browser, **kwargs):
-    # Hook called after browser is created
-    print("[HOOK] on_browser_created - Browser is ready!")
-    # Browser-level configurations would go here
-    return browser
-""",
-        
-        "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    # Hook called after a new page and context are created
-    print("[HOOK] on_page_context_created - New page created!")
-    
-    # Set viewport size for consistent rendering
-    await page.set_viewport_size({"width": 1920, "height": 1080})
-    
-    # Add cookies for the session (using httpbin.org domain)
-    await context.add_cookies([
-        {
-            "name": "test_session",
-            "value": "abc123xyz",
-            "domain": ".httpbin.org",
-            "path": "/",
-            "httpOnly": True,
-            "secure": True
-        }
-    ])
-    
-    # Block ads and tracking scripts to speed up crawling
-    await context.route("**/*.{png,jpg,jpeg,gif,webp,svg}", lambda route: route.abort())
-    await context.route("**/analytics/*", lambda route: route.abort())
-    await context.route("**/ads/*", lambda route: route.abort())
-    
-    print("[HOOK] Viewport set, cookies added, and ads blocked")
-    return page
-""",
-        
-        "on_user_agent_updated": """
-async def hook(page, context, user_agent, **kwargs):
-    # Hook called when user agent is updated
-    print(f"[HOOK] on_user_agent_updated - User agent: {user_agent[:50]}...")
-    return page
-""",
-        
-        "before_goto": """
-async def hook(page, context, url, **kwargs):
-    # Hook called before navigating to each URL
-    print(f"[HOOK] before_goto - About to visit: {url}")
-    
-    # Add custom headers for the request
-    await page.set_extra_http_headers({
-        "X-Custom-Header": "crawl4ai-test",
-        "Accept-Language": "en-US,en;q=0.9",
-        "DNT": "1"
-    })
-    
-    return page
-""",
-        
-        "after_goto": """
-async def hook(page, context, url, response, **kwargs):
-    # Hook called after navigating to each URL
-    print(f"[HOOK] after_goto - Successfully loaded: {url}")
-    
-    # Wait a moment for dynamic content to load
-    await page.wait_for_timeout(1000)
-    
-    # Check if specific elements exist (with error handling)
-    try:
-        # For httpbin.org, wait for body element
-        await page.wait_for_selector("body", timeout=2000)
-        print("[HOOK] Body element found and loaded")
-    except:
-        print("[HOOK] Timeout waiting for body, continuing anyway")
-    
-    return page
-""",
-        
-        "on_execution_started": """
-async def hook(page, context, **kwargs):
-    # Hook called after custom JavaScript execution
-    print("[HOOK] on_execution_started - Custom JS executed!")
-    
-    # You could inject additional JavaScript here if needed
-    await page.evaluate("console.log('[INJECTED] Hook JS running');")
-    
-    return page
-""",
-        
-        "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    # Hook called before retrieving the HTML content
-    print("[HOOK] before_retrieve_html - Preparing to get HTML")
-    
-    # Scroll to bottom to trigger lazy loading
-    await page.evaluate("window.scrollTo(0, document.body.scrollHeight);")
-    await page.wait_for_timeout(500)
-    
-    # Scroll back to top
-    await page.evaluate("window.scrollTo(0, 0);")
-    await page.wait_for_timeout(500)
-    
-    # One more scroll to middle for good measure
-    await page.evaluate("window.scrollTo(0, document.body.scrollHeight / 2);")
-    
-    print("[HOOK] Scrolling completed for lazy-loaded content")
-    return page
-""",
-        
-        "before_return_html": """
-async def hook(page, context, html, **kwargs):
-    # Hook called before returning the HTML content
-    print(f"[HOOK] before_return_html - HTML length: {len(html)} characters")
-    
-    # Log some page metrics
-    metrics = await page.evaluate('''() => {
-        return {
-            images: document.images.length,
-            links: document.links.length,
-            scripts: document.scripts.length
-        }
-    }''')
-    
-    print(f"[HOOK] Page metrics - Images: {metrics['images']}, Links: {metrics['links']}, Scripts: {metrics['scripts']}")
-    
-    return page
-"""
-    }
-    
-    # Create request payload
-    payload = {
-        "urls": ["https://httpbin.org/html"],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 30
-        },
-        "crawler_config": {
-            "js_code": "window.scrollTo(0, document.body.scrollHeight);",
-            "wait_for": "body",
-            "cache_mode": "bypass"
-        }
-    }
-    
-    print("\nSending request with all 8 hooks...")
-    start_time = time.time()
-    
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    elapsed_time = time.time() - start_time
-    print(f"Request completed in {elapsed_time:.2f} seconds")
-    
-    if response.status_code == 200:
-        data = response.json()
-        print("\n✅ Request successful!")
-        
-        # Check hooks execution
-        if 'hooks' in data:
-            hooks_info = data['hooks']
-            print("\n📊 Hooks Execution Summary:")
-            print(f"  Status: {hooks_info['status']['status']}")
-            print(f"  Attached hooks: {len(hooks_info['status']['attached_hooks'])}")
-            
-            for hook_name in hooks_info['status']['attached_hooks']:
-                print(f"    ✓ {hook_name}")
-            
-            if 'summary' in hooks_info:
-                summary = hooks_info['summary']
-                print(f"\n📈 Execution Statistics:")
-                print(f"  Total executions: {summary['total_executions']}")
-                print(f"  Successful: {summary['successful']}")
-                print(f"  Failed: {summary['failed']}")
-                print(f"  Timed out: {summary['timed_out']}")
-                print(f"  Success rate: {summary['success_rate']:.1f}%")
-            
-            if hooks_info.get('execution_log'):
-                print(f"\n📝 Execution Log:")
-                for log_entry in hooks_info['execution_log']:
-                    status_icon = "✅" if log_entry['status'] == 'success' else "❌"
-                    exec_time = log_entry.get('execution_time', 0)
-                    print(f"  {status_icon} {log_entry['hook_point']}: {exec_time:.3f}s")
-        
-        # Check crawl results
-        if 'results' in data and len(data['results']) > 0:
-            print(f"\n📄 Crawl Results:")
-            for result in data['results']:
-                print(f"  URL: {result['url']}")
-                print(f"  Success: {result.get('success', False)}")
-                if result.get('html'):
-                    print(f"  HTML length: {len(result['html'])} characters")
-    
-    else:
-        print(f"❌ Error: {response.status_code}")
-        try:
-            error_data = response.json()
-            print(f"Error details: {json.dumps(error_data, indent=2)}")
-        except:
-            print(f"Error text: {response.text[:500]}")
-
-
-def test_authentication_flow():
-    """Test a complete authentication flow with multiple hooks"""
-    print("\n" + "=" * 70)
-    print("Testing: Authentication Flow with Multiple Hooks")
-    print("=" * 70)
-    
-    hooks_code = {
-        "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    print("[HOOK] Setting up authentication context")
-    
-    # Add authentication cookies
-    await context.add_cookies([
-        {
-            "name": "auth_token",
-            "value": "fake_jwt_token_here",
-            "domain": ".httpbin.org",
-            "path": "/",
-            "httpOnly": True,
-            "secure": True
-        }
-    ])
-    
-    # Set localStorage items (for SPA authentication)
-    await page.evaluate('''
-        localStorage.setItem('user_id', '12345');
-        localStorage.setItem('auth_time', new Date().toISOString());
-    ''')
-    
-    return page
-""",
-        
-        "before_goto": """
-async def hook(page, context, url, **kwargs):
-    print(f"[HOOK] Adding auth headers for {url}")
-    
-    # Add Authorization header
-    import base64
-    credentials = base64.b64encode(b"user:passwd").decode('ascii')
-    
-    await page.set_extra_http_headers({
-        'Authorization': f'Basic {credentials}',
-        'X-API-Key': 'test-api-key-123'
-    })
-    
-    return page
-"""
-    }
-    
-    payload = {
-        "urls": [
-            "https://httpbin.org/basic-auth/user/passwd"
-        ],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 15
-        }
-    }
-    
-    print("\nTesting authentication with httpbin endpoints...")
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    if response.status_code == 200:
-        data = response.json()
-        print("✅ Authentication test completed")
-        
-        if 'results' in data:
-            for i, result in enumerate(data['results']):
-                print(f"\n  URL {i+1}: {result['url']}")
-                if result.get('success'):
-                    # Check for authentication success indicators
-                    html_content = result.get('html', '')
-                    if '"authenticated"' in html_content and 'true' in html_content:
-                        print("    ✅ Authentication successful! Basic auth worked.")
-                    else:
-                        print("    ⚠️ Page loaded but auth status unclear")
-                else:
-                    print(f"    ❌ Failed: {result.get('error_message', 'Unknown error')}")
-    else:
-        print(f"❌ Error: {response.status_code}")
-
-
-def test_performance_optimization_hooks():
-    """Test hooks for performance optimization"""
-    print("\n" + "=" * 70)
-    print("Testing: Performance Optimization Hooks")
-    print("=" * 70)
-    
-    hooks_code = {
-        "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    print("[HOOK] Optimizing page for performance")
-    
-    # Block resource-heavy content
-    await context.route("**/*.{png,jpg,jpeg,gif,webp,svg,ico}", lambda route: route.abort())
-    await context.route("**/*.{woff,woff2,ttf,otf}", lambda route: route.abort())
-    await context.route("**/*.{mp4,webm,ogg,mp3,wav}", lambda route: route.abort())
-    await context.route("**/googletagmanager.com/*", lambda route: route.abort())
-    await context.route("**/google-analytics.com/*", lambda route: route.abort())
-    await context.route("**/doubleclick.net/*", lambda route: route.abort())
-    await context.route("**/facebook.com/*", lambda route: route.abort())
-    
-    # Disable animations and transitions
-    await page.add_style_tag(content='''
-        *, *::before, *::after {
-            animation-duration: 0s !important;
-            animation-delay: 0s !important;
-            transition-duration: 0s !important;
-            transition-delay: 0s !important;
-        }
-    ''')
-    
-    print("[HOOK] Performance optimizations applied")
-    return page
-""",
-        
-        "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    print("[HOOK] Removing unnecessary elements before extraction")
-    
-    # Remove ads, popups, and other unnecessary elements
-    await page.evaluate('''() => {
-        // Remove common ad containers
-        const adSelectors = [
-            '.ad', '.ads', '.advertisement', '[id*="ad-"]', '[class*="ad-"]',
-            '.popup', '.modal', '.overlay', '.cookie-banner', '.newsletter-signup'
-        ];
-        
-        adSelectors.forEach(selector => {
-            document.querySelectorAll(selector).forEach(el => el.remove());
-        });
-        
-        // Remove script tags to clean up HTML
-        document.querySelectorAll('script').forEach(el => el.remove());
-        
-        // Remove style tags we don't need
-        document.querySelectorAll('style').forEach(el => el.remove());
-    }''')
-    
-    return page
-"""
-    }
-    
-    payload = {
-        "urls": ["https://httpbin.org/html"],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 10
-        }
-    }
-    
-    print("\nTesting performance optimization hooks...")
-    start_time = time.time()
-    
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    elapsed_time = time.time() - start_time
-    print(f"Request completed in {elapsed_time:.2f} seconds")
-    
-    if response.status_code == 200:
-        data = response.json()
-        print("✅ Performance optimization test completed")
-        
-        if 'results' in data and len(data['results']) > 0:
-            result = data['results'][0]
-            if result.get('html'):
-                print(f"  HTML size: {len(result['html'])} characters")
-                print("  Resources blocked, ads removed, animations disabled")
-    else:
-        print(f"❌ Error: {response.status_code}")
-
-
-def test_content_extraction_hooks():
-    """Test hooks for intelligent content extraction"""
-    print("\n" + "=" * 70)
-    print("Testing: Content Extraction Hooks")
-    print("=" * 70)
-    
-    hooks_code = {
-        "after_goto": """
-async def hook(page, context, url, response, **kwargs):
-    print(f"[HOOK] Waiting for dynamic content on {url}")
-    
-    # Wait for any lazy-loaded content
-    await page.wait_for_timeout(2000)
-    
-    # Trigger any "Load More" buttons
-    try:
-        load_more = await page.query_selector('[class*="load-more"], [class*="show-more"], button:has-text("Load More")')
-        if load_more:
-            await load_more.click()
-            await page.wait_for_timeout(1000)
-            print("[HOOK] Clicked 'Load More' button")
-    except:
-        pass
-    
-    return page
-""",
-        
-        "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    print("[HOOK] Extracting structured data")
-    
-    # Extract metadata
-    metadata = await page.evaluate('''() => {
-        const getMeta = (name) => {
-            const element = document.querySelector(`meta[name="${name}"], meta[property="${name}"]`);
-            return element ? element.getAttribute('content') : null;
-        };
-        
-        return {
-            title: document.title,
-            description: getMeta('description') || getMeta('og:description'),
-            author: getMeta('author'),
-            keywords: getMeta('keywords'),
-            ogTitle: getMeta('og:title'),
-            ogImage: getMeta('og:image'),
-            canonical: document.querySelector('link[rel="canonical"]')?.href,
-            jsonLd: Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
-                .map(el => el.textContent).filter(Boolean)
-        };
-    }''')
-    
-    print(f"[HOOK] Extracted metadata: {json.dumps(metadata, indent=2)}")
-    
-    # Infinite scroll handling
-    for i in range(3):
-        await page.evaluate("window.scrollTo(0, document.body.scrollHeight);")
-        await page.wait_for_timeout(1000)
-        print(f"[HOOK] Scroll iteration {i+1}/3")
-    
-    return page
-"""
-    }
-    
-    payload = {
-        "urls": ["https://httpbin.org/html", "https://httpbin.org/json"],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 20
-        }
-    }
-    
-    print("\nTesting content extraction hooks...")
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    if response.status_code == 200:
-        data = response.json()
-        print("✅ Content extraction test completed")
-        
-        if 'hooks' in data and 'summary' in data['hooks']:
-            summary = data['hooks']['summary']
-            print(f"  Hooks executed: {summary['successful']}/{summary['total_executions']}")
-        
-        if 'results' in data:
-            for result in data['results']:
-                print(f"\n  URL: {result['url']}")
-                print(f"  Success: {result.get('success', False)}")
-    else:
-        print(f"❌ Error: {response.status_code}")
-
-
-def main():
-    """Run comprehensive hook tests"""
-    print("🔧 Crawl4AI Docker API - Comprehensive Hooks Testing")
-    print("Based on docs/examples/hooks_example.py")
-    print("=" * 70)
-    
-    tests = [
-        ("All Hooks Demo", test_all_hooks_demo),
-        ("Authentication Flow", test_authentication_flow),
-        ("Performance Optimization", test_performance_optimization_hooks),
-        ("Content Extraction", test_content_extraction_hooks),
-    ]
-    
-    for i, (name, test_func) in enumerate(tests, 1):
-        print(f"\n📌 Test {i}/{len(tests)}: {name}")
-        try:
-            test_func()
-            print(f"✅ {name} completed")
-        except Exception as e:
-            print(f"❌ {name} failed: {e}")
-            import traceback
-            traceback.print_exc()
-    
-    print("\n" + "=" * 70)
-    print("🎉 All comprehensive hook tests completed!")
-    print("=" * 70)
-
-
-if __name__ == "__main__":
-    main()
--- a/docs/examples/hello_world.py
+++ b/docs/examples/hello_world.py
@@ -8,6 +8,8 @@ from crawl4ai import (
    CrawlResult
 )

+from crawl4ai.prompts import GENERATE_SCRIPT_PROMPT
+

 async def main():
    browser_config = BrowserConfig(
--- a/docs/examples/hello_world_undetected.py
+++ b/docs/examples/hello_world_undetected.py
@@ -1,57 +0,0 @@
-import asyncio
-from crawl4ai import (
-    AsyncWebCrawler,
-    BrowserConfig,
-    CrawlerRunConfig,
-    DefaultMarkdownGenerator,
-    PruningContentFilter,
-    CrawlResult,
-    UndetectedAdapter
-)
-from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-
-
-async def main():
-    # Create browser config
-    browser_config = BrowserConfig(
-        headless=False,
-        verbose=True,
-    )
-    
-    # Create the undetected adapter
-    undetected_adapter = UndetectedAdapter()
-    
-    # Create the crawler strategy with the undetected adapter
-    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=browser_config,
-        browser_adapter=undetected_adapter
-    )
-    
-    # Create the crawler with our custom strategy
-    async with AsyncWebCrawler(
-        crawler_strategy=crawler_strategy,
-        config=browser_config
-    ) as crawler:
-        # Configure the crawl
-        crawler_config = CrawlerRunConfig(
-            markdown_generator=DefaultMarkdownGenerator(
-                content_filter=PruningContentFilter()
-            ),
-            capture_console_messages=True,  # Enable console capture to test adapter
-        )
-        
-        # Test on a site that typically detects bots
-        print("Testing undetected adapter...")
-        result: CrawlResult = await crawler.arun(
-            url="https://www.helloworld.org", 
-            config=crawler_config
-        )
-        
-        print(f"Status: {result.status_code}")
-        print(f"Success: {result.success}")
-        print(f"Console messages captured: {len(result.console_messages or [])}")
-        print(f"Markdown content (first 500 chars):\n{result.markdown.raw_markdown[:500]}")
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
--- a/docs/examples/scraping_strategies_performance.py
+++ b/docs/examples/scraping_strategies_performance.py
@@ -1,6 +1,5 @@
 import time, re
-from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
-# WebScrapingStrategy is now an alias for LXMLWebScrapingStrategy
+from crawl4ai.content_scraping_strategy import WebScrapingStrategy, LXMLWebScrapingStrategy
 import time
 import functools
 from collections import defaultdict
@@ -58,7 +57,7 @@ methods_to_profile = [


 # Apply decorators to both strategies
-for strategy, name in [(LXMLWebScrapingStrategy, "LXML")]:
+for strategy, name in [(WebScrapingStrategy, "Original"), (LXMLWebScrapingStrategy, "LXML")]:
    for method in methods_to_profile:
        apply_decorators(strategy, method, name)

@@ -86,7 +85,7 @@ def generate_large_html(n_elements=1000):

 def test_scraping():
    # Initialize both scrapers
-    original_scraper = LXMLWebScrapingStrategy()
+    original_scraper = WebScrapingStrategy()
    selected_scraper = LXMLWebScrapingStrategy()
    
    # Generate test HTML
--- a/docs/examples/simple_anti_bot_examples.py
+++ b/docs/examples/simple_anti_bot_examples.py
@@ -1,59 +0,0 @@
-import asyncio
-from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, UndetectedAdapter
-from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-
-# Example 1: Stealth Mode
-async def stealth_mode_example():
-    browser_config = BrowserConfig(
-        enable_stealth=True,
-        headless=False
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        result = await crawler.arun("https://example.com")
-        return result.html[:500]
-
-# Example 2: Undetected Browser
-async def undetected_browser_example():
-    browser_config = BrowserConfig(
-        headless=False
-    )
-    
-    adapter = UndetectedAdapter()
-    strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=browser_config,
-        browser_adapter=adapter
-    )
-    
-    async with AsyncWebCrawler(
-        crawler_strategy=strategy,
-        config=browser_config
-    ) as crawler:
-        result = await crawler.arun("https://example.com")
-        return result.html[:500]
-
-# Example 3: Both Combined
-async def combined_example():
-    browser_config = BrowserConfig(
-        enable_stealth=True,
-        headless=False
-    )
-    
-    adapter = UndetectedAdapter()
-    strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=browser_config,
-        browser_adapter=adapter
-    )
-    
-    async with AsyncWebCrawler(
-        crawler_strategy=strategy,
-        config=browser_config
-    ) as crawler:
-        result = await crawler.arun("https://example.com")
-        return result.html[:500]
-
-# Run examples
-if __name__ == "__main__":
-    asyncio.run(stealth_mode_example())
-    asyncio.run(undetected_browser_example())
-    asyncio.run(combined_example())
--- a/docs/examples/smart_cache.py
+++ b/docs/examples/smart_cache.py
@@ -0,0 +1,202 @@
+"""
+SMART Cache Mode Example for Crawl4AI
+
+This example demonstrates how to use the SMART cache mode, which validates
+cached content before reusing it. SMART mode can save 70-95% of bandwidth on
+unchanged pages while still fetching fresh data when a page has changed.
+
+SMART Cache Mode: Only Re-crawl When Content Changes
+"""
+
+import sys
+import os
+sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))
+
+import asyncio
+import time
+from crawl4ai import AsyncWebCrawler
+from crawl4ai.cache_context import CacheMode
+from crawl4ai.async_configs import CrawlerRunConfig
+
+
+async def basic_smart_cache_example():
+    """Basic example showing SMART cache mode in action"""
+    print("=== Basic SMART Cache Example ===\n")
+    
+    async with AsyncWebCrawler(verbose=True) as crawler:
+        url = "https://example.com"
+        
+        # First crawl: Cache the content
+        print("1. Initial crawl to cache the content:")
+        config = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
+        result1 = await crawler.arun(url=url, config=config)
+        print(f"   Initial crawl: {len(result1.html)} bytes\n")
+        
+        # Second crawl: Use SMART mode
+        print("2. SMART mode crawl (should use cache for static content):")
+        smart_config = CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        start_time = time.time()
+        result2 = await crawler.arun(url=url, config=smart_config)
+        elapsed = time.time() - start_time
+        print(f"   SMART crawl: {len(result2.html)} bytes in {elapsed:.2f}s")
+        print(f"   Content identical: {result1.html == result2.html}\n")
+
+
+async def news_site_monitoring():
+    """Monitor a news site for changes using SMART cache mode"""
+    print("=== News Site Monitoring Example ===\n")
+    
+    async with AsyncWebCrawler(verbose=True) as crawler:
+        config = CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        url = "https://news.ycombinator.com"
+        
+        print("Monitoring Hacker News for changes...\n")
+        
+        previous_length = 0
+        for i in range(3):
+            result = await crawler.arun(url=url, config=config)
+            current_length = len(result.html)
+            
+            if i == 0:
+                print(f"Check {i+1}: Initial crawl - {current_length} bytes")
+            else:
+                if current_length != previous_length:
+                    print(f"Check {i+1}: Content changed! {previous_length} -> {current_length} bytes")
+                else:
+                    print(f"Check {i+1}: Content unchanged - {current_length} bytes")
+            
+            previous_length = current_length
+            
+            if i < 2:  # Don't wait after last check
+                print("   Waiting 10 seconds before next check...")
+                await asyncio.sleep(10)
+        
+        print()
+
+
+async def compare_cache_modes():
+    """Compare different cache modes to understand SMART mode benefits"""
+    print("=== Cache Mode Comparison ===\n")
+    
+    async with AsyncWebCrawler(verbose=False) as crawler:
+        url = "https://www.wikipedia.org"
+        
+        # First, populate the cache
+        config = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
+        await crawler.arun(url=url, config=config)
+        print("Cache populated.\n")
+        
+        # Test different cache modes
+        modes = [
+            (CacheMode.ENABLED, "ENABLED (always uses cache if available)"),
+            (CacheMode.BYPASS, "BYPASS (never uses cache)"),
+            (CacheMode.SMART, "SMART (validates cache before using)")
+        ]
+        
+        for mode, description in modes:
+            config = CrawlerRunConfig(cache_mode=mode)
+            start_time = time.time()
+            result = await crawler.arun(url=url, config=config)
+            elapsed = time.time() - start_time
+            
+            print(f"{description}:")
+            print(f"  Time: {elapsed:.2f}s")
+            print(f"  Size: {len(result.html)} bytes\n")
+
+
+async def dynamic_content_example():
+    """Show how SMART mode handles dynamic content"""
+    print("=== Dynamic Content Example ===\n")
+    
+    async with AsyncWebCrawler(verbose=True) as crawler:
+        # URL that returns different content each time
+        dynamic_url = "https://httpbin.org/uuid"
+        
+        print("Testing with dynamic content (changes every request):\n")
+        
+        # First crawl
+        config = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
+        result1 = await crawler.arun(url=dynamic_url, config=config)
+        
+        # Extract UUID from the response
+        import re
+        uuid1 = re.search(r'"uuid":\s*"([^"]+)"', result1.html)
+        if uuid1:
+            print(f"1. First crawl UUID: {uuid1.group(1)}")
+        
+        # SMART mode crawl - should detect change and re-crawl
+        smart_config = CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        result2 = await crawler.arun(url=dynamic_url, config=smart_config)
+        
+        uuid2 = re.search(r'"uuid":\s*"([^"]+)"', result2.html)
+        if uuid1 and uuid2:
+            print(f"2. SMART crawl UUID: {uuid2.group(1)}")
+            print(f"   Different UUIDs: {uuid1.group(1) != uuid2.group(1)} (should be True)")
+
+
+async def bandwidth_savings_demo():
+    """Demonstrate bandwidth savings with SMART mode"""
+    print("=== Bandwidth Savings Demo ===\n")
+    
+    async with AsyncWebCrawler(verbose=True) as crawler:
+        # List of URLs to crawl
+        urls = [
+            "https://example.com",
+            "https://www.python.org",
+            "https://docs.python.org/3/",
+        ]
+        
+        print("Crawling multiple URLs twice to show bandwidth savings:\n")
+        
+        # First pass: Cache all URLs
+        print("First pass - Caching all URLs:")
+        total_bytes_pass1 = 0
+        config = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
+        
+        for url in urls:
+            result = await crawler.arun(url=url, config=config)
+            total_bytes_pass1 += len(result.html)
+            print(f"  {url}: {len(result.html)} bytes")
+        
+        print(f"\nTotal downloaded in first pass: {total_bytes_pass1} bytes")
+        
+        # Second pass: Use SMART mode
+        print("\nSecond pass - Using SMART mode:")
+        smart_config = CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        
+        for url in urls:
+            start_time = time.time()
+            result = await crawler.arun(url=url, config=smart_config)
+            elapsed = time.time() - start_time
+            # Unchanged pages should be served from cache after a lightweight HEAD check
+            print(f"  {url}: {len(result.html)} bytes in {elapsed:.2f}s")
+        print(f"\nUp to ~{total_bytes_pass1} bytes saved if no page changed (only HEAD requests sent)")
+
+
+async def main():
+    """Run all examples"""
+    examples = [
+        basic_smart_cache_example,
+        news_site_monitoring,
+        compare_cache_modes,
+        dynamic_content_example,
+        bandwidth_savings_demo
+    ]
+    
+    for example in examples:
+        await example()
+        print("\n" + "="*50 + "\n")
+        await asyncio.sleep(2)  # Brief pause between examples
+
+
+if __name__ == "__main__":
+    print("""
+Crawl4AI SMART Cache Mode Examples
+==================================
+
+These examples demonstrate the SMART cache mode that intelligently
+validates cached content using HEAD requests before deciding whether
+to use cache or perform a fresh crawl.
+
+""")
+    asyncio.run(main())
--- a/docs/examples/stealth_mode_example.py
+++ b/docs/examples/stealth_mode_example.py
@@ -1,522 +0,0 @@
-"""
-Stealth Mode Example with Crawl4AI
-
-This example demonstrates how to use the stealth mode feature to bypass basic bot detection.
-The stealth mode uses playwright-stealth to modify browser fingerprints and behaviors
-that are commonly used to detect automated browsers.
-
-Key features demonstrated:
-1. Comparing crawling with and without stealth mode
-2. Testing against bot detection sites
-3. Accessing sites that block automated browsers
-4. Best practices for stealth crawling
-"""
-
-import asyncio
-import json
-from typing import Dict, Any
-from colorama import Fore, Style, init
-
-from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
-from crawl4ai.async_logger import AsyncLogger
-
-# Initialize colorama for colored output
-init()
-
-# Create a logger for better output
-logger = AsyncLogger(verbose=True)
-
-
-async def test_bot_detection(use_stealth: bool = False) -> Dict[str, Any]:
-    """Test against a bot detection service"""
-    
-    logger.info(
-        f"Testing bot detection with stealth={'ON' if use_stealth else 'OFF'}",
-        tag="STEALTH"
-    )
-    
-    # Configure browser with or without stealth
-    browser_config = BrowserConfig(
-        headless=False,  # Use False to see the browser in action
-        enable_stealth=use_stealth,
-        viewport_width=1280,
-        viewport_height=800
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        # JavaScript to extract bot detection results
-        detection_script = """
-        // Comprehensive bot detection checks
-        (() => {
-        const detectionResults = {
-            // Basic WebDriver detection
-            webdriver: navigator.webdriver,
-            
-            // Chrome specific
-            chrome: !!window.chrome,
-            chromeRuntime: !!window.chrome?.runtime,
-            
-            // Automation indicators
-            automationControlled: navigator.webdriver,
-            
-            // Permissions API
-            permissionsPresent: !!navigator.permissions?.query,
-            
-            // Plugins
-            pluginsLength: navigator.plugins.length,
-            pluginsArray: Array.from(navigator.plugins).map(p => p.name),
-            
-            // Languages
-            languages: navigator.languages,
-            language: navigator.language,
-            
-            // User agent
-            userAgent: navigator.userAgent,
-            
-            // Screen and window properties
-            screen: {
-                width: screen.width,
-                height: screen.height,
-                availWidth: screen.availWidth,
-                availHeight: screen.availHeight,
-                colorDepth: screen.colorDepth,
-                pixelDepth: screen.pixelDepth
-            },
-            
-            // WebGL vendor
-            webglVendor: (() => {
-                try {
-                    const canvas = document.createElement('canvas');
-                    const gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
-                    const ext = gl.getExtension('WEBGL_debug_renderer_info');
-                    return gl.getParameter(ext.UNMASKED_VENDOR_WEBGL);
-                } catch (e) {
-                    return 'Error';
-                }
-            })(),
-            
-            // Platform
-            platform: navigator.platform,
-            
-            // Hardware concurrency
-            hardwareConcurrency: navigator.hardwareConcurrency,
-            
-            // Device memory
-            deviceMemory: navigator.deviceMemory,
-            
-            // Connection
-            connection: navigator.connection?.effectiveType
-        };
-        
-        // Log results for console capture
-        console.log('DETECTION_RESULTS:', JSON.stringify(detectionResults, null, 2));
-        
-        // Return results
-        return detectionResults;
-        })();
-        """
-        
-        # Crawl bot detection test page
-        config = CrawlerRunConfig(
-            js_code=detection_script,
-            capture_console_messages=True,
-            wait_until="networkidle",
-            delay_before_return_html=2.0  # Give time for all checks to complete
-        )
-        
-        result = await crawler.arun(
-            url="https://bot.sannysoft.com",
-            config=config
-        )
-        
-        if result.success:
-            # Extract detection results from console
-            detection_data = None
-            for msg in result.console_messages or []:
-                if "DETECTION_RESULTS:" in msg.get("text", ""):
-                    try:
-                        json_str = msg["text"].replace("DETECTION_RESULTS:", "").strip()
-                        detection_data = json.loads(json_str)
-                    except json.JSONDecodeError:
-                        pass
-            
-            # Also try to get from JavaScript execution result
-            if not detection_data and result.js_execution_result:
-                detection_data = result.js_execution_result
-            
-            return {
-                "success": True,
-                "url": result.url,
-                "detection_data": detection_data,
-                "page_title": (result.metadata or {}).get("title", ""),
-                "stealth_enabled": use_stealth
-            }
-        else:
-            return {
-                "success": False,
-                "error": result.error_message,
-                "stealth_enabled": use_stealth
-            }
-
-
-async def test_cloudflare_site(use_stealth: bool = False) -> Dict[str, Any]:
-    """Test accessing a Cloudflare-protected site"""
-    
-    logger.info(
-        f"Testing Cloudflare site with stealth={'ON' if use_stealth else 'OFF'}",
-        tag="STEALTH"
-    )
-    
-    browser_config = BrowserConfig(
-        headless=True,  # Headless run; headless browsers are typically easier to fingerprint
-        enable_stealth=use_stealth,
-        viewport_width=1920,
-        viewport_height=1080
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        config = CrawlerRunConfig(
-            wait_until="networkidle",
-            page_timeout=30000,  # 30 seconds
-            delay_before_return_html=3.0
-        )
-        
-        # Test on a site that often shows Cloudflare challenges
-        result = await crawler.arun(
-            url="https://nowsecure.nl",
-            config=config
-        )
-        
-        # Check if we hit Cloudflare challenge
-        cloudflare_detected = False
-        if result.html:
-            cloudflare_indicators = [
-                "Checking your browser",
-                "Just a moment",
-                "cf-browser-verification",
-                "cf-challenge",
-                "ray ID"
-            ]
-            cloudflare_detected = any(indicator in result.html for indicator in cloudflare_indicators)
-        
-        return {
-            "success": result.success,
-            "url": result.url,
-            "cloudflare_challenge": cloudflare_detected,
-            "status_code": result.status_code,
-            "page_title": result.metadata.get("title", "") if result.metadata else "",
-            "stealth_enabled": use_stealth,
-            "html_snippet": result.html[:500] if result.html else ""
-        }
-
-
-async def test_anti_bot_site(use_stealth: bool = False) -> Dict[str, Any]:
-    """Test against sites with anti-bot measures"""
-    
-    logger.info(
-        f"Testing anti-bot site with stealth={'ON' if use_stealth else 'OFF'}",
-        tag="STEALTH"
-    )
-    
-    browser_config = BrowserConfig(
-        headless=False,
-        enable_stealth=use_stealth,
-        # Additional browser arguments that help with stealth
-        extra_args=[
-            "--disable-blink-features=AutomationControlled",
-            "--disable-features=site-per-process"
-        ] if not use_stealth else []  # These are automatically applied with stealth
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        # Some sites check for specific behaviors
-        behavior_script = """
-        (async () => {
-            // Simulate human-like behavior
-            const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
-            
-            // Random offsets (computed only; no mouse events are dispatched)
-            const moveX = Math.random() * 100;
-            const moveY = Math.random() * 100;
-            
-            // Simulate reading time
-            await sleep(1000 + Math.random() * 2000);
-            
-            // Scroll slightly
-            window.scrollBy(0, 100 + Math.random() * 200);
-            
-            console.log('Human behavior simulation complete');
-            return true;
-        })()
-        """
-        
-        config = CrawlerRunConfig(
-            js_code=behavior_script,
-            wait_until="networkidle",
-            delay_before_return_html=5.0,  # Longer delay to appear more human
-            capture_console_messages=True
-        )
-        
-        # Test on a site that implements anti-bot measures
-        result = await crawler.arun(
-            url="https://www.g2.com/",
-            config=config
-        )
-        
-        # Check for common anti-bot blocks
-        blocked_indicators = [
-            "Access Denied",
-            "403 Forbidden", 
-            "Security Check",
-            "Verify you are human",
-            "captcha",
-            "challenge"
-        ]
-        
-        blocked = False
-        if result.html:
-            blocked = any(indicator.lower() in result.html.lower() for indicator in blocked_indicators)
-        
-        return {
-            "success": result.success and not blocked,
-            "url": result.url,
-            "blocked": blocked,
-            "status_code": result.status_code,
-            "page_title": result.metadata.get("title", "") if result.metadata else "",
-            "stealth_enabled": use_stealth
-        }
-
-
-async def compare_results():
-    """Run all tests with and without stealth mode and compare results"""
-    
-    print(f"\n{Fore.CYAN}{'='*60}{Style.RESET_ALL}")
-    print(f"{Fore.CYAN}Crawl4AI Stealth Mode Comparison{Style.RESET_ALL}")
-    print(f"{Fore.CYAN}{'='*60}{Style.RESET_ALL}\n")
-    
-    # Test 1: Bot Detection
-    print(f"{Fore.YELLOW}1. Bot Detection Test (bot.sannysoft.com){Style.RESET_ALL}")
-    print("-" * 40)
-    
-    # Without stealth
-    regular_detection = await test_bot_detection(use_stealth=False)
-    if regular_detection["success"] and regular_detection["detection_data"]:
-        print(f"{Fore.RED}Without Stealth:{Style.RESET_ALL}")
-        data = regular_detection["detection_data"]
-        print(f"  • WebDriver detected: {data.get('webdriver', 'Unknown')}")
-        print(f"  • Chrome: {data.get('chrome', 'Unknown')}")
-        print(f"  • Languages: {data.get('languages', 'Unknown')}")
-        print(f"  • Plugins: {data.get('pluginsLength', 'Unknown')}")
-        print(f"  • User Agent: {data.get('userAgent', 'Unknown')[:60]}...")
-    
-    # With stealth
-    stealth_detection = await test_bot_detection(use_stealth=True)
-    if stealth_detection["success"] and stealth_detection["detection_data"]:
-        print(f"\n{Fore.GREEN}With Stealth:{Style.RESET_ALL}")
-        data = stealth_detection["detection_data"]
-        print(f"  • WebDriver detected: {data.get('webdriver', 'Unknown')}")
-        print(f"  • Chrome: {data.get('chrome', 'Unknown')}")
-        print(f"  • Languages: {data.get('languages', 'Unknown')}")
-        print(f"  • Plugins: {data.get('pluginsLength', 'Unknown')}")
-        print(f"  • User Agent: {data.get('userAgent', 'Unknown')[:60]}...")
-    
-    # Test 2: Cloudflare Site
-    print(f"\n\n{Fore.YELLOW}2. Cloudflare Protected Site Test{Style.RESET_ALL}")
-    print("-" * 40)
-    
-    # Without stealth
-    regular_cf = await test_cloudflare_site(use_stealth=False)
-    print(f"{Fore.RED}Without Stealth:{Style.RESET_ALL}")
-    print(f"  • Success: {regular_cf['success']}")
-    print(f"  • Cloudflare Challenge: {regular_cf['cloudflare_challenge']}")
-    print(f"  • Status Code: {regular_cf['status_code']}")
-    print(f"  • Page Title: {regular_cf['page_title']}")
-    
-    # With stealth
-    stealth_cf = await test_cloudflare_site(use_stealth=True)
-    print(f"\n{Fore.GREEN}With Stealth:{Style.RESET_ALL}")
-    print(f"  • Success: {stealth_cf['success']}")
-    print(f"  • Cloudflare Challenge: {stealth_cf['cloudflare_challenge']}")
-    print(f"  • Status Code: {stealth_cf['status_code']}")
-    print(f"  • Page Title: {stealth_cf['page_title']}")
-    
-    # Test 3: Anti-bot Site
-    print(f"\n\n{Fore.YELLOW}3. Anti-Bot Site Test{Style.RESET_ALL}")
-    print("-" * 40)
-    
-    # Without stealth
-    regular_antibot = await test_anti_bot_site(use_stealth=False)
-    print(f"{Fore.RED}Without Stealth:{Style.RESET_ALL}")
-    print(f"  • Success: {regular_antibot['success']}")
-    print(f"  • Blocked: {regular_antibot['blocked']}")
-    print(f"  • Status Code: {regular_antibot['status_code']}")
-    print(f"  • Page Title: {regular_antibot['page_title']}")
-    
-    # With stealth
-    stealth_antibot = await test_anti_bot_site(use_stealth=True)
-    print(f"\n{Fore.GREEN}With Stealth:{Style.RESET_ALL}")
-    print(f"  • Success: {stealth_antibot['success']}")
-    print(f"  • Blocked: {stealth_antibot['blocked']}")
-    print(f"  • Status Code: {stealth_antibot['status_code']}")
-    print(f"  • Page Title: {stealth_antibot['page_title']}")
-    
-    # Summary
-    print(f"\n{Fore.CYAN}{'='*60}{Style.RESET_ALL}")
-    print(f"{Fore.CYAN}Summary:{Style.RESET_ALL}")
-    print(f"{Fore.CYAN}{'='*60}{Style.RESET_ALL}")
-    print(f"\nStealth mode helps bypass basic bot detection by:")
-    print(f"  • Hiding webdriver property")
-    print(f"  • Modifying browser fingerprints")
-    print(f"  • Adjusting navigator properties")
-    print(f"  • Emulating real browser plugin behavior")
-    print(f"\n{Fore.YELLOW}Note:{Style.RESET_ALL} Stealth mode is not a silver bullet.")
-    print(f"Advanced anti-bot systems may still detect automation.")
-    print(f"Always respect robots.txt and website terms of service.")
-
-
-async def stealth_best_practices():
-    """Demonstrate best practices for using stealth mode"""
-    
-    print(f"\n\n{Fore.CYAN}{'='*60}{Style.RESET_ALL}")
-    print(f"{Fore.CYAN}Stealth Mode Best Practices{Style.RESET_ALL}")
-    print(f"{Fore.CYAN}{'='*60}{Style.RESET_ALL}\n")
-    
-    # Best Practice 1: Combine with realistic behavior
-    print(f"{Fore.YELLOW}1. Combine with Realistic Behavior:{Style.RESET_ALL}")
-    
-    browser_config = BrowserConfig(
-        headless=False,
-        enable_stealth=True,
-        viewport_width=1920,
-        viewport_height=1080
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        # Simulate human-like behavior
-        human_behavior_script = """
-        (async () => {
-            // Wait random time between actions
-            const randomWait = () => Math.random() * 2000 + 1000;
-            
-            // Simulate reading
-            await new Promise(resolve => setTimeout(resolve, randomWait()));
-            
-            // Smooth scroll
-            const smoothScroll = async () => {
-                const totalHeight = document.body.scrollHeight;
-                const viewHeight = window.innerHeight;
-                let currentPosition = 0;
-                
-                while (currentPosition < totalHeight - viewHeight) {
-                    const scrollAmount = Math.random() * 300 + 100;
-                    window.scrollBy({
-                        top: scrollAmount,
-                        behavior: 'smooth'
-                    });
-                    currentPosition += scrollAmount;
-                    await new Promise(resolve => setTimeout(resolve, randomWait()));
-                }
-            };
-            
-            await smoothScroll();
-            console.log('Human-like behavior simulation completed');
-            return true;
-        })()
-        """
-        
-        config = CrawlerRunConfig(
-            js_code=human_behavior_script,
-            wait_until="networkidle",
-            delay_before_return_html=3.0,
-            capture_console_messages=True
-        )
-        
-        result = await crawler.arun(
-            url="https://example.com",
-            config=config
-        )
-        
-        print(f"  ✓ Simulated human-like scrolling and reading patterns")
-        print(f"  ✓ Added random delays between actions")
-        print(f"  ✓ Result: {result.success}")
-    
-    # Best Practice 2: Use appropriate viewport and user agent
-    print(f"\n{Fore.YELLOW}2. Use Realistic Viewport and User Agent:{Style.RESET_ALL}")
-    
-    # Get a realistic user agent
-    from crawl4ai.user_agent_generator import UserAgentGenerator
-    ua_generator = UserAgentGenerator()
-    
-    browser_config = BrowserConfig(
-        headless=True,
-        enable_stealth=True,
-        viewport_width=1920,
-        viewport_height=1080,
-        user_agent=ua_generator.generate(device_type="desktop", browser_type="chrome")
-    )
-    
-    print(f"  ✓ Using realistic viewport: 1920x1080")
-    print(f"  ✓ Using current Chrome user agent")
-    print(f"  ✓ Stealth mode will ensure consistency")
-    
-    # Best Practice 3: Manage request rate
-    print(f"\n{Fore.YELLOW}3. Manage Request Rate:{Style.RESET_ALL}")
-    print(f"  ✓ Add delays between requests")
-    print(f"  ✓ Randomize timing patterns")
-    print(f"  ✓ Respect robots.txt")
-    
-    # Best Practice 4: Session management
-    print(f"\n{Fore.YELLOW}4. Use Session Management:{Style.RESET_ALL}")
-    
-    browser_config = BrowserConfig(
-        headless=False,
-        enable_stealth=True
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        # Create a session for multiple requests
-        session_id = "stealth_session_1"
-        
-        config = CrawlerRunConfig(
-            session_id=session_id,
-            wait_until="domcontentloaded"
-        )
-        
-        # First request
-        result1 = await crawler.arun(
-            url="https://example.com",
-            config=config
-        )
-        
-        # Subsequent request reuses the same browser context
-        result2 = await crawler.arun(
-            url="https://example.com/about",
-            config=config
-        )
-        
-        print(f"  ✓ Reused browser session for multiple requests")
-        print(f"  ✓ Maintains cookies and state between requests")
-        print(f"  ✓ More efficient and realistic browsing pattern")
-    
-    print(f"\n{Fore.CYAN}{'='*60}{Style.RESET_ALL}")
-
-
-async def main():
-    """Run all examples"""
-    
-    # Run comparison tests
-    await compare_results()
-    
-    # Show best practices
-    await stealth_best_practices()
-    
-    print(f"\n{Fore.GREEN}Examples completed!{Style.RESET_ALL}")
-    print(f"\n{Fore.YELLOW}Remember:{Style.RESET_ALL}")
-    print(f"• Stealth mode helps with basic bot detection")
-    print(f"• Always respect website terms of service")
-    print(f"• Consider rate limiting and ethical scraping practices")
-    print(f"• For advanced protection, consider additional measures")
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
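The first deleted example above pulls the `DETECTION_RESULTS:` payload out of `result.console_messages` and scans raw HTML for block-page phrases, case-sensitively in the Cloudflare test and case-insensitively in the anti-bot test. If that logic is revived elsewhere, a minimal sketch of both checks as pure functions might look like this. It assumes, as the deleted code does, that each console message is a dict with a `"text"` key; `extract_detection_results` and `find_block_indicators` are hypothetical helper names, not part of the Crawl4AI API.

```python
import json
from typing import Any, Dict, Iterable, List, Optional

# Prefix the deleted detection script logged before its JSON payload.
MARKER = "DETECTION_RESULTS:"


def extract_detection_results(
    console_messages: Optional[Iterable[Dict[str, Any]]],
) -> Optional[Dict[str, Any]]:
    """Return the first JSON object logged after MARKER, or None.

    Hypothetical helper; assumes each message is a dict with a "text" key.
    """
    for msg in console_messages or []:
        text = msg.get("text", "")
        if MARKER not in text:
            continue
        payload = text.split(MARKER, 1)[1].strip()
        try:
            data = json.loads(payload)
        except json.JSONDecodeError:
            # Malformed payload: keep scanning later messages.
            continue
        if isinstance(data, dict):
            return data
    return None


def find_block_indicators(html: Optional[str], indicators: List[str]) -> List[str]:
    """Return the indicators found in html, compared case-insensitively.

    Hypothetical helper; plain substring matching, so false positives are possible.
    """
    if not html:
        return []
    haystack = html.lower()
    return [ind for ind in indicators if ind.lower() in haystack]
```

Matching case-insensitively in both places avoids the original's mismatch where `"ray ID"` would miss a page that prints `Ray ID`. Substring matching can still misfire, for example on a page that merely mentions "captcha".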
--- a/docs/examples/stealth_mode_quick_start.py
+++ b/docs/examples/stealth_mode_quick_start.py
@@ -1,215 +0,0 @@
-"""
-Quick Start: Using Stealth Mode in Crawl4AI
-
-This example shows practical use cases for the stealth mode feature.
-Stealth mode helps bypass basic bot detection mechanisms.
-"""
-
-import asyncio
-from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
-
-
-async def example_1_basic_stealth():
-    """Example 1: Basic stealth mode usage"""
-    print("\n=== Example 1: Basic Stealth Mode ===")
-    
-    # Enable stealth mode in browser config
-    browser_config = BrowserConfig(
-        enable_stealth=True,  # This is the key parameter
-        headless=True
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        result = await crawler.arun(url="https://example.com")
-        print(f"✓ Crawled {result.url} successfully")
-        print(f"✓ Title: {result.metadata.get('title', 'N/A')}")
-
-
-async def example_2_stealth_with_screenshot():
-    """Example 2: Stealth mode with screenshot to show detection results"""
-    print("\n=== Example 2: Stealth Mode Visual Verification ===")
-    
-    browser_config = BrowserConfig(
-        enable_stealth=True,
-        headless=False  # Set to False to see the browser
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        config = CrawlerRunConfig(
-            screenshot=True,
-            wait_until="networkidle"
-        )
-        
-        result = await crawler.arun(
-            url="https://bot.sannysoft.com",
-            config=config
-        )
-        
-        if result.success:
-            print(f"✓ Successfully crawled bot detection site")
-            print(f"✓ With stealth enabled, many detection tests should show as passed")
-            
-            if result.screenshot:
-                # Save screenshot for verification
-                import base64
-                with open("stealth_detection_results.png", "wb") as f:
-                    f.write(base64.b64decode(result.screenshot))
-                print(f"✓ Screenshot saved as 'stealth_detection_results.png'")
-                print(f"  Check the screenshot to see detection results!")
-
-
-async def example_3_stealth_for_protected_sites():
-    """Example 3: Using stealth for sites with bot protection"""
-    print("\n=== Example 3: Stealth for Protected Sites ===")
-    
-    browser_config = BrowserConfig(
-        enable_stealth=True,
-        headless=True,
-        viewport_width=1920,
-        viewport_height=1080
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        # Add human-like behavior
-        config = CrawlerRunConfig(
-            wait_until="networkidle",
-            delay_before_return_html=2.0,  # Wait 2 seconds
-            js_code="""
-            // Simulate human-like scrolling
-            window.scrollTo({
-                top: document.body.scrollHeight / 2,
-                behavior: 'smooth'
-            });
-            """
-        )
-        
-        # Try accessing a site that might have bot protection
-        result = await crawler.arun(
-            url="https://www.g2.com/products/slack/reviews",
-            config=config
-        )
-        
-        if result.success:
-            print(f"✓ Successfully accessed protected site")
-            print(f"✓ Retrieved {len(result.html)} characters of HTML")
-        else:
-            print(f"✗ Failed to access site: {result.error_message}")
-
-
-async def example_4_stealth_with_sessions():
-    """Example 4: Stealth mode with session management"""
-    print("\n=== Example 4: Stealth + Session Management ===")
-    
-    browser_config = BrowserConfig(
-        enable_stealth=True,
-        headless=False
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        session_id = "my_stealth_session"
-        
-        # First request - establish session
-        config = CrawlerRunConfig(
-            session_id=session_id,
-            wait_until="domcontentloaded"
-        )
-        
-        result1 = await crawler.arun(
-            url="https://news.ycombinator.com",
-            config=config
-        )
-        print(f"✓ First request completed: {result1.url}")
-        
-        # Second request - reuse session
-        await asyncio.sleep(2)  # Brief delay between requests
-        
-        result2 = await crawler.arun(
-            url="https://news.ycombinator.com/best",
-            config=config
-        )
-        print(f"✓ Second request completed: {result2.url}")
-        print(f"✓ Session reused, maintaining cookies and state")
-
-
-async def example_5_stealth_comparison():
-    """Example 5: Compare results with and without stealth using screenshots"""
-    print("\n=== Example 5: Stealth Mode Comparison ===")
-    
-    test_url = "https://bot.sannysoft.com"
-    
-    # First test WITHOUT stealth
-    print("\nWithout stealth:")
-    regular_config = BrowserConfig(
-        enable_stealth=False,
-        headless=True
-    )
-    
-    async with AsyncWebCrawler(config=regular_config) as crawler:
-        config = CrawlerRunConfig(
-            screenshot=True,
-            wait_until="networkidle"
-        )
-        result = await crawler.arun(url=test_url, config=config)
-        
-        if result.success and result.screenshot:
-            import base64
-            with open("comparison_without_stealth.png", "wb") as f:
-                f.write(base64.b64decode(result.screenshot))
-            print(f"  ✓ Screenshot saved: comparison_without_stealth.png")
-            print(f"  Many tests will show as FAILED (red)")
-    
-    # Then test WITH stealth
-    print("\nWith stealth:")
-    stealth_config = BrowserConfig(
-        enable_stealth=True,
-        headless=True
-    )
-    
-    async with AsyncWebCrawler(config=stealth_config) as crawler:
-        config = CrawlerRunConfig(
-            screenshot=True,
-            wait_until="networkidle"
-        )
-        result = await crawler.arun(url=test_url, config=config)
-        
-        if result.success and result.screenshot:
-            import base64
-            with open("comparison_with_stealth.png", "wb") as f:
-                f.write(base64.b64decode(result.screenshot))
-            print(f"  ✓ Screenshot saved: comparison_with_stealth.png")
-            print(f"  More tests should show as PASSED (green)")
-    
-    print("\nCompare the two screenshots to see the difference!")
-
-
-async def main():
-    """Run all examples"""
-    print("Crawl4AI Stealth Mode Examples")
-    print("==============================")
-    
-    # Run basic example
-    await example_1_basic_stealth()
-    
-    # Run screenshot verification example
-    await example_2_stealth_with_screenshot()
-    
-    # Run protected site example
-    await example_3_stealth_for_protected_sites()
-    
-    # Run session example
-    await example_4_stealth_with_sessions()
-    
-    # Run comparison example
-    await example_5_stealth_comparison()
-    
-    print("\n" + "="*50)
-    print("Tips for using stealth mode effectively:")
-    print("- Use realistic viewport sizes (1920x1080, 1366x768)")
-    print("- Add delays between requests to appear more human")
-    print("- Combine with session management for better results")
-    print("- Remember: stealth mode is for legitimate scraping only")
-    print("="*50)
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
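The quick-start example above repeats the same three steps each time it persists `result.screenshot`: check that it is non-empty, base64-decode it, and write the bytes to disk. A hedged sketch of that pattern as a single helper, assuming (as the deleted code does) that the screenshot arrives as a base64-encoded string; `save_base64_png` is a hypothetical name, not a Crawl4AI function:

```python
import base64
from pathlib import Path
from typing import Optional


def save_base64_png(screenshot_b64: Optional[str], path: str) -> Optional[Path]:
    """Decode a base64-encoded screenshot and write it to path.

    Returns the written Path, or None when there is no screenshot to save.
    """
    if not screenshot_b64:
        return None
    out = Path(path)
    # b64decode accepts str input containing only ASCII base64 characters.
    out.write_bytes(base64.b64decode(screenshot_b64))
    return out
```

Returning `None` for a missing screenshot keeps the caller's `if result.screenshot:` guard optional.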
--- a/docs/examples/stealth_test_simple.py
+++ b/docs/examples/stealth_test_simple.py
@@ -1,62 +0,0 @@
-"""
-Simple test to verify stealth mode is working
-"""
-
-import asyncio
-from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
-
-
-async def test_stealth():
-    """Test stealth mode effectiveness"""
-    
-    # Test WITHOUT stealth
-    print("=== WITHOUT Stealth ===")
-    config1 = BrowserConfig(
-        headless=False,
-        enable_stealth=False
-    )
-    
-    async with AsyncWebCrawler(config=config1) as crawler:
-        result = await crawler.arun(
-            url="https://bot.sannysoft.com",
-            config=CrawlerRunConfig(
-                wait_until="networkidle",
-                screenshot=True
-            )
-        )
-        print(f"Success: {result.success}")
-        # Take screenshot
-        if result.screenshot:
-            with open("without_stealth.png", "wb") as f:
-                import base64
-                f.write(base64.b64decode(result.screenshot))
-            print("Screenshot saved: without_stealth.png")
-    
-    # Test WITH stealth
-    print("\n=== WITH Stealth ===")
-    config2 = BrowserConfig(
-        headless=False,
-        enable_stealth=True
-    )
-    
-    async with AsyncWebCrawler(config=config2) as crawler:
-        result = await crawler.arun(
-            url="https://bot.sannysoft.com",
-            config=CrawlerRunConfig(
-                wait_until="networkidle",
-                screenshot=True
-            )
-        )
-        print(f"Success: {result.success}")
-        # Take screenshot
-        if result.screenshot:
-            with open("with_stealth.png", "wb") as f:
-                import base64
-                f.write(base64.b64decode(result.screenshot))
-            print("Screenshot saved: with_stealth.png")
-    
-    print("\nCheck the screenshots to see the difference in bot detection results!")
-
-
-if __name__ == "__main__":
-    asyncio.run(test_stealth())
--- a/docs/examples/undetectability/undetected_basic_test.py
+++ b/docs/examples/undetectability/undetected_basic_test.py
@@ -1,74 +0,0 @@
-"""
-Basic Undetected Browser Test
-Simple example to test if undetected mode works
-"""
-
-import asyncio
-from crawl4ai import AsyncWebCrawler, BrowserConfig
-
-async def test_regular_mode():
-    """Test with regular browser"""
-    print("Testing Regular Browser Mode...")
-    browser_config = BrowserConfig(
-        headless=False,
-        verbose=True
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        result = await crawler.arun(url="https://www.example.com")
-        print(f"Regular Mode - Success: {result.success}")
-        print(f"Regular Mode - Status: {result.status_code}")
-        print(f"Regular Mode - Content length: {len(result.markdown.raw_markdown)}")
-        print(f"Regular Mode - First 100 chars: {result.markdown.raw_markdown[:100]}...")
-        return result.success
-
-async def test_undetected_mode():
-    """Test with undetected browser"""
-    print("\nTesting Undetected Browser Mode...")
-    from crawl4ai import UndetectedAdapter
-    from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-    
-    browser_config = BrowserConfig(
-        headless=False,
-        verbose=True
-    )
-    
-    # Create undetected adapter
-    undetected_adapter = UndetectedAdapter()
-    
-    # Create strategy with undetected adapter
-    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=browser_config,
-        browser_adapter=undetected_adapter
-    )
-    
-    async with AsyncWebCrawler(
-        crawler_strategy=crawler_strategy,
-        config=browser_config
-    ) as crawler:
-        result = await crawler.arun(url="https://www.example.com")
-        print(f"Undetected Mode - Success: {result.success}")
-        print(f"Undetected Mode - Status: {result.status_code}")
-        print(f"Undetected Mode - Content length: {len(result.markdown.raw_markdown)}")
-        print(f"Undetected Mode - First 100 chars: {result.markdown.raw_markdown[:100]}...")
-        return result.success
-
-async def main():
-    """Run both tests"""
-    print("🤖 Crawl4AI Basic Adapter Test\n")
-    
-    # Test regular mode
-    regular_success = await test_regular_mode()
-    
-    # Test undetected mode
-    undetected_success = await test_undetected_mode()
-    
-    # Summary
-    print("\n" + "="*50)
-    print("Summary:")
-    print(f"Regular Mode: {'✅ Success' if regular_success else '❌ Failed'}")
-    print(f"Undetected Mode: {'✅ Success' if undetected_success else '❌ Failed'}")
-    print("="*50)
-
-if __name__ == "__main__":
-    asyncio.run(main())
--- a/docs/examples/undetectability/undetected_bot_test.py
+++ b/docs/examples/undetectability/undetected_bot_test.py
@@ -1,155 +0,0 @@
-"""
-Bot Detection Test - Compare Regular vs Undetected
-Tests browser fingerprinting differences at bot.sannysoft.com
-"""
-
-import asyncio
-from crawl4ai import (
-    AsyncWebCrawler, 
-    BrowserConfig, 
-    CrawlerRunConfig,
-    UndetectedAdapter,
-    CrawlResult
-)
-from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-
-# Bot detection test site
-TEST_URL = "https://bot.sannysoft.com"
-
-def analyze_bot_detection(result: CrawlResult) -> dict:
-    """Analyze bot detection results from the page"""
-    detections = {
-        "webdriver": False,
-        "headless": False, 
-        "automation": False,
-        "user_agent": False,
-        "total_tests": 0,
-        "failed_tests": 0
-    }
-    
-    if not result.success or not result.html:
-        return detections
-    
-    # Look for specific test results in the HTML
-    html_lower = result.html.lower()
-    
-    # Check for common bot indicators
-    if "webdriver" in html_lower and ("fail" in html_lower or "true" in html_lower):
-        detections["webdriver"] = True
-        detections["failed_tests"] += 1
-    
-    if "headless" in html_lower and ("fail" in html_lower or "true" in html_lower):
-        detections["headless"] = True
-        detections["failed_tests"] += 1
-    
-    if "automation" in html_lower and "detected" in html_lower:
-        detections["automation"] = True
-        detections["failed_tests"] += 1
-    
-    # Count total tests (approximate)
-    detections["total_tests"] = html_lower.count("test") + html_lower.count("check")
-    
-    return detections
-
-async def test_browser_mode(adapter_name: str, adapter=None):
-    """Test a browser mode and return results"""
-    print(f"\n{'='*60}")
-    print(f"Testing: {adapter_name}")
-    print(f"{'='*60}")
-    
-    browser_config = BrowserConfig(
-        headless=False,  # Run in headed mode for better results
-        verbose=True,
-        viewport_width=1920,
-        viewport_height=1080,
-    )
-    
-    if adapter:
-        # Use undetected mode
-        crawler_strategy = AsyncPlaywrightCrawlerStrategy(
-            browser_config=browser_config,
-            browser_adapter=adapter
-        )
-        crawler = AsyncWebCrawler(
-            crawler_strategy=crawler_strategy,
-            config=browser_config
-        )
-    else:
-        # Use regular mode
-        crawler = AsyncWebCrawler(config=browser_config)
-    
-    async with crawler:
-        config = CrawlerRunConfig(
-            delay_before_return_html=3.0,  # Let detection scripts run
-            wait_for_images=True,
-            screenshot=True,
-            simulate_user=False,  # Don't simulate for accurate detection
-        )
-        
-        result = await crawler.arun(url=TEST_URL, config=config)
-        
-        print(f"\n✓ Success: {result.success}")
-        print(f"✓ Status Code: {result.status_code}")
-        
-        if result.success:
-            # Analyze detection results
-            detections = analyze_bot_detection(result)
-            
-            print(f"\n🔍 Bot Detection Analysis:")
-            print(f"  - WebDriver Detected: {'❌ Yes' if detections['webdriver'] else '✅ No'}")
-            print(f"  - Headless Detected: {'❌ Yes' if detections['headless'] else '✅ No'}")
-            print(f"  - Automation Detected: {'❌ Yes' if detections['automation'] else '✅ No'}")
-            print(f"  - Failed Tests: {detections['failed_tests']}")
-            
-            # Show some content
-            if result.markdown.raw_markdown:
-                print(f"\nContent preview:")
-                lines = result.markdown.raw_markdown.split('\n')
-                for line in lines[:20]:  # Show first 20 lines
-                    if any(keyword in line.lower() for keyword in ['test', 'pass', 'fail', 'yes', 'no']):
-                        print(f"  {line.strip()}")
-        
-        return result, detections if result.success else {}
-
-async def main():
-    """Run the comparison"""
-    print("🤖 Crawl4AI - Bot Detection Test")
-    print(f"Testing at: {TEST_URL}")
-    print("This site runs various browser fingerprinting tests\n")
-    
-    # Test regular browser
-    regular_result, regular_detections = await test_browser_mode("Regular Browser")
-    
-    # Small delay
-    await asyncio.sleep(2)
-    
-    # Test undetected browser
-    undetected_adapter = UndetectedAdapter()
-    undetected_result, undetected_detections = await test_browser_mode(
-        "Undetected Browser", 
-        undetected_adapter
-    )
-    
-    # Summary comparison
-    print(f"\n{'='*60}")
-    print("COMPARISON SUMMARY")
-    print(f"{'='*60}")
-    
-    print(f"\n{'Test':<25} {'Regular':<15} {'Undetected':<15}")
-    print(f"{'-'*55}")
-    
-    if regular_detections and undetected_detections:
-        print(f"{'WebDriver Detection':<25} {'❌ Detected' if regular_detections['webdriver'] else '✅ Passed':<15} {'❌ Detected' if undetected_detections['webdriver'] else '✅ Passed':<15}")
-        print(f"{'Headless Detection':<25} {'❌ Detected' if regular_detections['headless'] else '✅ Passed':<15} {'❌ Detected' if undetected_detections['headless'] else '✅ Passed':<15}")
-        print(f"{'Automation Detection':<25} {'❌ Detected' if regular_detections['automation'] else '✅ Passed':<15} {'❌ Detected' if undetected_detections['automation'] else '✅ Passed':<15}")
-        print(f"{'Failed Tests':<25} {regular_detections['failed_tests']:<15} {undetected_detections['failed_tests']:<15}")
-    
-    print(f"\n{'='*60}")
-    
-    if undetected_detections.get('failed_tests', 0) < regular_detections.get('failed_tests', 1):
-        print("✅ Undetected browser performed better at evading detection!")
-    else:
-        print("ℹ️  Both browsers had similar detection results")
-
-if __name__ == "__main__":
-    asyncio.run(main())
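In the removed `analyze_bot_detection`, each check runs against the whole page. A single "fail" or "true" anywhere therefore flags both the WebDriver and the headless probe, and `total_tests` counts every occurrence of "test" or "check" rather than actual tests. The sketch below scopes each check to a single line. It assumes that a probe's name and its verdict appear on the same line of extracted text, which has not been verified against bot.sannysoft.com's markup. The helper names `probe_failed` and `analyze_lines` are hypothetical, and the sample text is invented.

```python
from typing import Dict, Iterable


def probe_failed(page_text: str, probe: str, fail_markers: Iterable[str] = ("fail", "true")) -> bool:
    """Return True if some line mentioning `probe` also contains a failure marker.

    Hypothetical helper: assumes a probe's name and verdict share one line.
    """
    probe = probe.lower()
    markers = [m.lower() for m in fail_markers]
    for line in page_text.lower().splitlines():
        if probe in line and any(marker in line for marker in markers):
            return True
    return False


def analyze_lines(page_text: str, probes: Iterable[str] = ("webdriver", "headless")) -> Dict[str, object]:
    """Map each probe to its apparent verdict and count the failures."""
    probes = list(probes)
    report: Dict[str, object] = {p: probe_failed(page_text, p) for p in probes}
    report["failed_tests"] = sum(1 for p in probes if report[p])
    return report


# Invented sample text, used only to exercise the helpers.
sample = "WebDriver (New)  missing (passed)\nHeadless Chrome  true\nPlugins Length  3"
print(analyze_lines(sample))
```

Scoping by line trades one assumption for another. If the real page puts verdicts in a separate column or element, the extraction step would need to keep each row together.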
--- a/docs/examples/undetectability/undetected_cloudflare_test.py
+++ b/docs/examples/undetectability/undetected_cloudflare_test.py
@@ -1,164 +0,0 @@
-"""
-Undetected Browser Test - Cloudflare Protected Site
-Tests the difference between regular and undetected modes on a Cloudflare-protected site
-"""
-
-import asyncio
-from crawl4ai import (
-    AsyncWebCrawler, 
-    BrowserConfig, 
-    CrawlerRunConfig,
-    UndetectedAdapter
-)
-from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-
-# Test URL with Cloudflare protection
-TEST_URL = "https://nowsecure.nl"
-
-async def test_regular_browser():
-    """Test with regular browser - likely to be blocked"""
-    print("=" * 60)
-    print("Testing with Regular Browser")
-    print("=" * 60)
-    
-    browser_config = BrowserConfig(
-        headless=False,
-        verbose=True,
-        viewport_width=1920,
-        viewport_height=1080,
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        config = CrawlerRunConfig(
-            delay_before_return_html=2.0,
-            simulate_user=True,
-            magic=True,  # Try with magic mode too
-        )
-        
-        result = await crawler.arun(url=TEST_URL, config=config)
-        
-        print(f"\n✓ Success: {result.success}")
-        print(f"✓ Status Code: {result.status_code}")
-        print(f"✓ HTML Length: {len(result.html)}")
-        
-        # Check for Cloudflare challenge
-        if result.html:
-            cf_indicators = [
-                "Checking your browser",
-                "Please stand by",
-                "cloudflare",
-                "cf-browser-verification",
-                "Access denied",
-                "Ray ID"
-            ]
-            
-            detected = False
-            for indicator in cf_indicators:
-                if indicator.lower() in result.html.lower():
-                    print(f"⚠️  Cloudflare Challenge Detected: '{indicator}' found")
-                    detected = True
-                    break
-            
-            if not detected and len(result.markdown.raw_markdown) > 100:
-                print("✅ Successfully bypassed Cloudflare!")
-                print(f"Content preview: {result.markdown.raw_markdown[:200]}...")
-            elif not detected:
-                print("⚠️  Page loaded but content seems minimal")
-        
-        return result
-
-async def test_undetected_browser():
-    """Test with undetected browser - should bypass Cloudflare"""
-    print("\n" + "=" * 60)
-    print("Testing with Undetected Browser")
-    print("=" * 60)
-    
-    browser_config = BrowserConfig(
-        headless=False,  # Headless is easier to detect
-        verbose=True,
-        viewport_width=1920,
-        viewport_height=1080,
-    )
-    
-    # Create undetected adapter
-    undetected_adapter = UndetectedAdapter()
-    
-    # Create strategy with undetected adapter
-    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=browser_config,
-        browser_adapter=undetected_adapter
-    )
-    
-    async with AsyncWebCrawler(
-        crawler_strategy=crawler_strategy,
-        config=browser_config
-    ) as crawler:
-        config = CrawlerRunConfig(
-            delay_before_return_html=2.0,
-            simulate_user=True,
-        )
-        
-        result = await crawler.arun(url=TEST_URL, config=config)
-        
-        print(f"\n✓ Success: {result.success}")
-        print(f"✓ Status Code: {result.status_code}")
-        print(f"✓ HTML Length: {len(result.html)}")
-        
-        # Check for Cloudflare challenge
-        if result.html:
-            cf_indicators = [
-                "Checking your browser",
-                "Please stand by",
-                "cloudflare",
-                "cf-browser-verification",
-                "Access denied",
-                "Ray ID"
-            ]
-            
-            detected = False
-            for indicator in cf_indicators:
-                if indicator.lower() in result.html.lower():
-                    print(f"⚠️  Cloudflare Challenge Detected: '{indicator}' found")
-                    detected = True
-                    break
-            
-            if not detected and len(result.markdown.raw_markdown) > 100:
-                print("✅ Successfully bypassed Cloudflare!")
-                print(f"Content preview: {result.markdown.raw_markdown[:200]}...")
-            elif not detected:
-                print("⚠️  Page loaded but content seems minimal")
-        
-        return result
-
-async def main():
-    """Compare regular vs undetected browser"""
-    print("🤖 Crawl4AI - Cloudflare Bypass Test")
-    print(f"Testing URL: {TEST_URL}\n")
-    
-    # Test regular browser
-    regular_result = await test_regular_browser()
-    
-    # Small delay
-    await asyncio.sleep(2)
-    
-    # Test undetected browser
-    undetected_result = await test_undetected_browser()
-    
-    # Summary
-    print("\n" + "=" * 60)
-    print("SUMMARY")
-    print("=" * 60)
-    print(f"Regular Browser:")
-    print(f"  - Success: {regular_result.success}")
-    print(f"  - Content Length: {len(regular_result.markdown.raw_markdown) if regular_result.markdown else 0}")
-    
-    print(f"\nUndetected Browser:")
-    print(f"  - Success: {undetected_result.success}")
-    print(f"  - Content Length: {len(undetected_result.markdown.raw_markdown) if undetected_result.markdown else 0}")
-    
-    if undetected_result.success and len(undetected_result.markdown.raw_markdown) > len(regular_result.markdown.raw_markdown):
-        print("\n✅ Undetected browser successfully bypassed protection!")
-    print("=" * 60)
-
-if __name__ == "__main__":
-    asyncio.run(main())
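The removed Cloudflare test repeats the same indicator scan in both `test_regular_browser` and `test_undetected_browser`. Below is a sketch of a shared helper. The indicator list is copied from the test, and the names `find_challenge_indicator` and `verdict` are hypothetical. A bare `"cloudflare"` substring may also match pages that Cloudflare merely serves without presenting a challenge, so treat a hit as a hint rather than proof.

```python
from typing import Iterable, Optional

# Indicator strings copied from the removed test; matched case-insensitively.
CF_INDICATORS = (
    "Checking your browser",
    "Please stand by",
    "cloudflare",
    "cf-browser-verification",
    "Access denied",
    "Ray ID",
)


def find_challenge_indicator(html: Optional[str], indicators: Iterable[str] = CF_INDICATORS) -> Optional[str]:
    """Return the first indicator present in `html`, or None when nothing matches."""
    if not html:
        return None
    haystack = html.lower()
    for indicator in indicators:
        if indicator.lower() in haystack:
            return indicator
    return None


def verdict(html: Optional[str], markdown: Optional[str], min_chars: int = 100) -> str:
    """Hypothetical three-way label mirroring the test's printed outcomes."""
    indicator = find_challenge_indicator(html)
    if indicator is not None:
        return f"challenge ({indicator})"
    if len(markdown or "") > min_chars:
        return "bypassed"
    return "minimal content"
```

With the helper in place, each test function reduces to a call such as `find_challenge_indicator(result.html)`. Separately, the final comparison in `main()` reads `markdown.raw_markdown` without the `if ... markdown` guard used in the two summary lines just above it.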
--- a/docs/examples/undetectability/undetected_vs_regular_comparison.py
+++ b/docs/examples/undetectability/undetected_vs_regular_comparison.py
@@ -1,184 +0,0 @@
-"""
-Undetected vs Regular Browser Comparison
-This example demonstrates the difference between regular and undetected browser modes
-when accessing sites with bot detection services.
-
-Based on tested anti-bot services:
-- Cloudflare
-- Kasada
-- Akamai
-- DataDome
-- Bet365
-- And others
-"""
-
-import asyncio
-from crawl4ai import (
-    AsyncWebCrawler,
-    BrowserConfig,
-    CrawlerRunConfig,
-    PlaywrightAdapter,
-    UndetectedAdapter,
-    CrawlResult
-)
-from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-
-
-# Test URLs for various bot detection services
-TEST_SITES = {
-    "Cloudflare Protected": "https://nowsecure.nl",
-    # "Bot Detection Test": "https://bot.sannysoft.com",
-    # "Fingerprint Test": "https://fingerprint.com/products/bot-detection",
-    # "Browser Scan": "https://browserscan.net",
-    # "CreepJS": "https://abrahamjuliot.github.io/creepjs",
-}
-
-
-async def test_with_adapter(url: str, adapter_name: str, adapter):
-    """Test a URL with a specific adapter"""
-    browser_config = BrowserConfig(
-        headless=False,  # Better for avoiding detection
-        viewport_width=1920,
-        viewport_height=1080,
-        verbose=True,
-    )
-    
-    # Create the crawler strategy with the adapter
-    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=browser_config,
-        browser_adapter=adapter
-    )
-    
-    print(f"\n{'='*60}")
-    print(f"Testing with {adapter_name} adapter")
-    print(f"URL: {url}")
-    print(f"{'='*60}")
-    
-    try:
-        async with AsyncWebCrawler(
-            crawler_strategy=crawler_strategy,
-            config=browser_config
-        ) as crawler:
-            crawler_config = CrawlerRunConfig(
-                delay_before_return_html=3.0,  # Give page time to load
-                wait_for_images=True,
-                screenshot=True,
-                simulate_user=True,  # Add user simulation
-            )
-            
-            result: CrawlResult = await crawler.arun(
-                url=url,
-                config=crawler_config
-            )
-            
-            # Check results
-            print(f"✓ Status Code: {result.status_code}")
-            print(f"✓ Success: {result.success}")
-            print(f"✓ HTML Length: {len(result.html)}")
-            print(f"✓ Markdown Length: {len(result.markdown.raw_markdown)}")
-            
-            # Check for common bot detection indicators
-            detection_indicators = [
-                "Access denied",
-                "Please verify you are human",
-                "Checking your browser",
-                "Enable JavaScript",
-                "captcha",
-                "403 Forbidden",
-                "Bot detection",
-                "Security check"
-            ]
-            
-            content_lower = result.markdown.raw_markdown.lower()
-            detected = False
-            for indicator in detection_indicators:
-                if indicator.lower() in content_lower:
-                    print(f"⚠️  Possible detection: Found '{indicator}'")
-                    detected = True
-                    break
-            
-            if not detected:
-                print("✅ No obvious bot detection triggered!")
-                # Show first 200 chars of content
-                print(f"Content preview: {result.markdown.raw_markdown[:200]}...")
-            
-            return result.success and not detected
-            
-    except Exception as e:
-        print(f"❌ Error: {str(e)}")
-        return False
-
-
-async def compare_adapters(url: str, site_name: str):
-    """Compare regular and undetected adapters on the same URL"""
-    print(f"\n{'#'*60}")
-    print(f"# Testing: {site_name}")
-    print(f"{'#'*60}")
-    
-    # Test with regular adapter
-    regular_adapter = PlaywrightAdapter()
-    regular_success = await test_with_adapter(url, "Regular", regular_adapter)
-    
-    # Small delay between tests
-    await asyncio.sleep(2)
-    
-    # Test with undetected adapter
-    undetected_adapter = UndetectedAdapter()
-    undetected_success = await test_with_adapter(url, "Undetected", undetected_adapter)
-    
-    # Summary
-    print(f"\n{'='*60}")
-    print(f"Summary for {site_name}:")
-    print(f"Regular Adapter: {'✅ Passed' if regular_success else '❌ Blocked/Detected'}")
-    print(f"Undetected Adapter: {'✅ Passed' if undetected_success else '❌ Blocked/Detected'}")
-    print(f"{'='*60}")
-    
-    return regular_success, undetected_success
-
-
-async def main():
-    """Run comparison tests on multiple sites"""
-    print("🤖 Crawl4AI Browser Adapter Comparison")
-    print("Testing regular vs undetected browser modes\n")
-    
-    results = {}
-    
-    # Test each site
-    for site_name, url in TEST_SITES.items():
-        regular, undetected = await compare_adapters(url, site_name)
-        results[site_name] = {
-            "regular": regular,
-            "undetected": undetected
-        }
-        
-        # Delay between different sites
-        await asyncio.sleep(3)
-    
-    # Final summary
-    print(f"\n{'#'*60}")
-    print("# FINAL RESULTS")
-    print(f"{'#'*60}")
-    print(f"{'Site':<30} {'Regular':<15} {'Undetected':<15}")
-    print(f"{'-'*60}")
-    
-    for site, result in results.items():
-        regular_status = "✅ Passed" if result["regular"] else "❌ Blocked"
-        undetected_status = "✅ Passed" if result["undetected"] else "❌ Blocked"
-        print(f"{site:<30} {regular_status:<15} {undetected_status:<15}")
-    
-    # Calculate success rates
-    regular_success = sum(1 for r in results.values() if r["regular"])
-    undetected_success = sum(1 for r in results.values() if r["undetected"])
-    total = len(results)
-    
-    print(f"\n{'='*60}")
-    print(f"Success Rates:")
-    print(f"Regular Adapter: {regular_success}/{total} ({regular_success/total*100:.1f}%)")
-    print(f"Undetected Adapter: {undetected_success}/{total} ({undetected_success/total*100:.1f}%)")
-    print(f"{'='*60}")
-
-
-if __name__ == "__main__":
-    # Note: This example may take a while to run as it tests multiple sites
-    # You can comment out sites in TEST_SITES to run faster tests
-    asyncio.run(main())
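The closing comment in the removed comparison script suggests commenting out entries in `TEST_SITES`. If every entry is commented out, `total` is 0 and the success-rate lines raise `ZeroDivisionError`. The sketch below guards that summary. It takes the same `{site: {"regular": bool, "undetected": bool}}` shape that `main()` builds, and the name `success_rates` is hypothetical.

```python
from typing import Dict, Sequence, Tuple


def success_rates(
    results: Dict[str, Dict[str, bool]],
    modes: Sequence[str] = ("regular", "undetected"),
) -> Dict[str, Tuple[int, int, float]]:
    """Return {mode: (passed, total, percent)}; percent is 0.0 when there are no sites."""
    total = len(results)
    rates = {}
    for mode in modes:
        passed = sum(1 for outcome in results.values() if outcome[mode])
        percent = passed / total * 100 if total else 0.0
        rates[mode] = (passed, total, percent)
    return rates


for mode, (passed, total, percent) in success_rates({}).items():
    print(f"{mode}: {passed}/{total} ({percent:.1f}%)")
```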
--- a/docs/examples/undetected_simple_demo.py
+++ b/docs/examples/undetected_simple_demo.py
@@ -1,118 +0,0 @@
-"""
-Simple Undetected Browser Demo
-Demonstrates the basic usage of undetected browser mode
-"""
-
-import asyncio
-from crawl4ai import (
-    AsyncWebCrawler, 
-    BrowserConfig, 
-    CrawlerRunConfig,
-    UndetectedAdapter
-)
-from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-
-async def crawl_with_regular_browser(url: str):
-    """Crawl with regular browser"""
-    print("\n[Regular Browser Mode]")
-    browser_config = BrowserConfig(
-        headless=False,
-        verbose=True,
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        result = await crawler.arun(
-            url=url,
-            config=CrawlerRunConfig(
-                delay_before_return_html=2.0
-            )
-        )
-        
-        print(f"Success: {result.success}")
-        print(f"Status: {result.status_code}")
-        print(f"Content length: {len(result.markdown.raw_markdown)}")
-        
-        # Check for bot detection keywords
-        content = result.markdown.raw_markdown.lower()
-        if any(word in content for word in ["cloudflare", "checking your browser", "please wait"]):
-            print("⚠️  Bot detection triggered!")
-        else:
-            print("✅ Page loaded successfully")
-        
-        return result
-
-async def crawl_with_undetected_browser(url: str):
-    """Crawl with undetected browser"""
-    print("\n[Undetected Browser Mode]")
-    browser_config = BrowserConfig(
-        headless=False,
-        verbose=True,
-    )
-    
-    # Create undetected adapter and strategy
-    undetected_adapter = UndetectedAdapter()
-    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=browser_config,
-        browser_adapter=undetected_adapter
-    )
-    
-    async with AsyncWebCrawler(
-        crawler_strategy=crawler_strategy,
-        config=browser_config
-    ) as crawler:
-        result = await crawler.arun(
-            url=url,
-            config=CrawlerRunConfig(
-                delay_before_return_html=2.0
-            )
-        )
-        
-        print(f"Success: {result.success}")
-        print(f"Status: {result.status_code}")
-        print(f"Content length: {len(result.markdown.raw_markdown)}")
-        
-        # Check for bot detection keywords
-        content = result.markdown.raw_markdown.lower()
-        if any(word in content for word in ["cloudflare", "checking your browser", "please wait"]):
-            print("⚠️  Bot detection triggered!")
-        else:
-            print("✅ Page loaded successfully")
-        
-        return result
-
-async def main():
-    """Demo comparing regular vs undetected modes"""
-    print("🤖 Crawl4AI Undetected Browser Demo")
-    print("="*50)
-    
-    # Test URLs - you can change these
-    test_urls = [
-        "https://www.example.com",  # Simple site
-        "https://httpbin.org/headers",  # Shows request headers
-    ]
-    
-    for url in test_urls:
-        print(f"\n📍 Testing URL: {url}")
-        
-        # Test with regular browser
-        regular_result = await crawl_with_regular_browser(url)
-        
-        # Small delay
-        await asyncio.sleep(2)
-        
-        # Test with undetected browser
-        undetected_result = await crawl_with_undetected_browser(url)
-        
-        # Compare results
-        print(f"\n📊 Comparison for {url}:")
-        print(f"Regular browser content: {len(regular_result.markdown.raw_markdown)} chars")
-        print(f"Undetected browser content: {len(undetected_result.markdown.raw_markdown)} chars")
-        
-        if url == "https://httpbin.org/headers":
-            # Show headers for comparison
-            print("\nHeaders seen by server:")
-            print("Regular:", regular_result.markdown.raw_markdown[:500])
-            print("\nUndetected:", undetected_result.markdown.raw_markdown[:500])
-
-if __name__ == "__main__":
-    asyncio.run(main())
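The removed demo compares the first 500 characters of each `httpbin.org/headers` response by eye. That endpoint returns a JSON object of the form `{"headers": {...}}`, so a more direct check is to parse it and inspect `User-Agent`. Chromium's legacy headless mode advertises a `HeadlessChrome` token there. The sketch assumes you already have the raw JSON body as a string; extracting it from `result.html` or `result.markdown` is not shown. The name `user_agent_report` is hypothetical, and the sample body is invented.

```python
import json
from typing import Dict, Union


def user_agent_report(body: str) -> Dict[str, Union[str, bool]]:
    """Extract User-Agent from an httpbin-style {"headers": {...}} JSON body."""
    headers = json.loads(body).get("headers", {})
    user_agent = headers.get("User-Agent", "")
    return {
        "user_agent": user_agent,
        # Legacy headless Chromium includes this token; headed Chrome does not.
        "headless_token": "HeadlessChrome" in user_agent,
    }


body = '{"headers": {"Accept": "*/*", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/120.0.0.0 Safari/537.36"}}'
print(user_agent_report(body))
```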
--- a/docs/md_v2/advanced/adaptive-strategies.md
+++ b/docs/md_v2/advanced/adaptive-strategies.md
@@ -130,7 +130,7 @@ Factors:

 ```python
 class CustomLinkScorer:
-    def score(self, link: Link, query: str, state: CrawlState) -> float:
+    def score(self, link: Link, query: str, state: AdaptiveCrawlResult) -> float:
        # Prioritize specific URL patterns
        if "/api/reference/" in link.href:
            return 2.0  # Double the score
@@ -325,17 +325,17 @@ with open("crawl_analysis.json", "w") as f:
 from crawl4ai.adaptive_crawler import BaseStrategy

 class DomainSpecificStrategy(BaseStrategy):
-    def calculate_coverage(self, state: CrawlState) -> float:
+    def calculate_coverage(self, state: AdaptiveCrawlResult) -> float:
        # Custom coverage calculation
        # e.g., weight certain terms more heavily
        pass
    
-    def calculate_consistency(self, state: CrawlState) -> float:
+    def calculate_consistency(self, state: AdaptiveCrawlResult) -> float:
        # Custom consistency logic
        # e.g., domain-specific validation
        pass
    
-    def rank_links(self, links: List[Link], state: CrawlState) -> List[Link]:
+    def rank_links(self, links: List[Link], state: AdaptiveCrawlResult) -> List[Link]:
        # Custom link ranking
        # e.g., prioritize specific URL patterns
        pass
@@ -359,7 +359,7 @@ class HybridStrategy(BaseStrategy):
            URLPatternStrategy()
        ]
    
-    def calculate_confidence(self, state: CrawlState) -> float:
+    def calculate_confidence(self, state: AdaptiveCrawlResult) -> float:
        # Weighted combination of strategies
        scores = [s.calculate_confidence(state) for s in self.strategies]
        weights = [0.5, 0.3, 0.2]
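The `HybridStrategy` hunk shows only the start of `calculate_confidence`: per-strategy scores and a fixed weight list. The sketch below is one way such a weighted combination could be completed. It assumes scores lie in [0, 1] and that weights align positionally with `self.strategies`. Dividing by the weight sum keeps the result in range even if the weights do not add up to 1. The function name `weighted_confidence` is hypothetical and not part of `crawl4ai.adaptive_crawler`.

```python
from typing import Sequence


def weighted_confidence(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-strategy scores into one weighted average."""
    if len(scores) != len(weights):
        # zip() would silently drop the extra items, so fail loudly instead.
        raise ValueError("expected one weight per strategy score")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


print(weighted_confidence([1.0, 0.5, 0.0], [0.5, 0.3, 0.2]))
```

With three strategies scoring 1.0, 0.5 and 0.0 under the weights from the hunk, the result is 0.5 * 1.0 + 0.3 * 0.5 + 0.2 * 0.0 = 0.65, up to floating-point rounding.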
--- a/docs/md_v2/advanced/advanced-features.md
+++ b/docs/md_v2/advanced/advanced-features.md
@@ -358,77 +358,9 @@ if __name__ == "__main__":

 ---

----
-
-## 7. Anti-Bot Features (Stealth Mode & Undetected Browser)
-
-Crawl4AI provides two powerful features to bypass bot detection:
-
-### 7.1 Stealth Mode
-
-Stealth mode uses playwright-stealth to modify browser fingerprints and behaviors. Enable it with a simple flag:
-
-```python
-browser_config = BrowserConfig(
-    enable_stealth=True,  # Activates stealth mode
-    headless=False
-)
-```
-
-**When to use**: Sites with basic bot detection (checking navigator.webdriver, plugins, etc.)
-
-### 7.2 Undetected Browser
-
-For advanced bot detection, use the undetected browser adapter:
-
-```python
-from crawl4ai import UndetectedAdapter
-from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-
-# Create undetected adapter
-adapter = UndetectedAdapter()
-strategy = AsyncPlaywrightCrawlerStrategy(
-    browser_config=browser_config,
-    browser_adapter=adapter
-)
-
-async with AsyncWebCrawler(crawler_strategy=strategy, config=browser_config) as crawler:
-    # Your crawling code
-```
-
-**When to use**: Sites with sophisticated bot detection (Cloudflare, DataDome, etc.)
-
-### 7.3 Combining Both
-
-For maximum evasion, combine stealth mode with undetected browser:
-
-```python
-browser_config = BrowserConfig(
-    enable_stealth=True,  # Enable stealth
-    headless=False
-)
-
-adapter = UndetectedAdapter()  # Use undetected browser
-```
-
-### Choosing the Right Approach
-
-| Detection Level | Recommended Approach |
-|----------------|---------------------|
-| No protection | Regular browser |
-| Basic checks | Regular + Stealth mode |
-| Advanced protection | Undetected browser |
-| Maximum evasion | Undetected + Stealth mode |
-
-**Best Practice**: Start with regular browser + stealth mode. Only use undetected browser if needed, as it may be slightly slower.
-
-See [Undetected Browser Mode](undetected-browser.md) for detailed examples.
-
----
-
 ## Conclusion & Next Steps

-You've now explored several **advanced** features:
+You’ve now explored several **advanced** features:

 - **Proxy Usage**  
 - **PDF & Screenshot** capturing for large or critical pages  
@@ -436,10 +368,7 @@ You've now explored several **advanced** features:
 - **Custom Headers** for language or specialized requests  
 - **Session Persistence** via storage state
 - **Robots.txt Compliance**
-- **Anti-Bot Features** (Stealth Mode & Undetected Browser)

-With these power tools, you can build robust scraping workflows that mimic real user behavior, handle secure sites, capture detailed snapshots, manage sessions across multiple runs, and bypass bot detection—streamlining your entire data collection pipeline.
+With these power tools, you can build robust scraping workflows that mimic real user behavior, handle secure sites, capture detailed snapshots, and manage sessions across multiple runs—streamlining your entire data collection pipeline.

-**Note**: In future versions, we may enable stealth mode and undetected browser by default. For now, users should explicitly enable these features when needed.
-
-**Last Updated**: 2025-01-17
+**Last Updated**: 2025-01-01
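The removed anti-bot section recommends progressive enhancement: start with a regular browser plus stealth mode, and move to the undetected browser only when blocked. That escalation loop can be sketched independently of the crawler. Here `fetch(mode)` and `is_blocked(content)` are hypothetical callables you would implement. For example, `fetch` could build a `BrowserConfig(enable_stealth=...)` and, for the undetected modes, an `AsyncPlaywrightCrawlerStrategy(..., browser_adapter=UndetectedAdapter())`, as the removed docs show. Since this commit drops those docs, check that your installed crawl4ai version still exposes both before relying on them.

```python
from typing import Callable, Optional, Sequence, Tuple

# Escalation order taken from the removed "Best Practice" guidance.
MODES = ("regular+stealth", "undetected", "undetected+stealth")


def escalate(
    fetch: Callable[[str], str],
    is_blocked: Callable[[str], bool],
    modes: Sequence[str] = MODES,
) -> Tuple[Optional[str], Optional[str]]:
    """Try each mode in order and return (mode, content) for the first unblocked fetch.

    Returns (None, None) if every mode is blocked.
    """
    for mode in modes:
        content = fetch(mode)
        if not is_blocked(content):
            return mode, content
    return None, None


# Fake fetcher for illustration: only the stealth-only mode gets challenged.
def fake_fetch(mode: str) -> str:
    return "Checking your browser..." if mode == "regular+stealth" else f"page via {mode}"


print(escalate(fake_fetch, lambda text: "checking your browser" in text.lower()))
```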
--- a/docs/md_v2/advanced/multi-url-crawling.md
+++ b/docs/md_v2/advanced/multi-url-crawling.md
@@ -404,182 +404,7 @@ for result in results:
        print(f"Duration: {dr.end_time - dr.start_time}")
 ```

-## 6. URL-Specific Configurations
-
-When crawling diverse content types, you often need different configurations for different URLs. For example:
-- PDFs need specialized extraction
-- Blog pages benefit from content filtering
-- Dynamic sites need JavaScript execution
-- API endpoints need JSON parsing
-
-### 6.1 Basic URL Pattern Matching
-
-```python
-from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, MatchMode
-from crawl4ai.processors.pdf import PDFContentScrapingStrategy
-from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
-from crawl4ai.content_filter_strategy import PruningContentFilter
-from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
-
-async def crawl_mixed_content():
-    # Configure different strategies for different content
-    configs = [
-        # PDF files - specialized extraction
-        CrawlerRunConfig(
-            url_matcher="*.pdf",
-            scraping_strategy=PDFContentScrapingStrategy()
-        ),
-        
-        # Blog/article pages - content filtering
-        CrawlerRunConfig(
-            url_matcher=["*/blog/*", "*/article/*"],
-            markdown_generator=DefaultMarkdownGenerator(
-                content_filter=PruningContentFilter(threshold=0.48)
-            )
-        ),
-        
-        # Dynamic pages - JavaScript execution
-        CrawlerRunConfig(
-            url_matcher=lambda url: 'github.com' in url,
-            js_code="window.scrollTo(0, 500);"
-        ),
-        
-        # API endpoints - JSON extraction
-        CrawlerRunConfig(
-            url_matcher=lambda url: 'api' in url or url.endswith('.json'),
-            # Custome settings for JSON extraction
-        ),
-        
-        # Default config for everything else
-        CrawlerRunConfig()  # No url_matcher means it matches ALL URLs (fallback)
-    ]
-    
-    # Mixed URLs
-    urls = [
-        "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
-        "https://blog.python.org/",
-        "https://github.com/microsoft/playwright",
-        "https://httpbin.org/json",
-        "https://example.com/"
-    ]
-    
-    async with AsyncWebCrawler() as crawler:
-        results = await crawler.arun_many(
-            urls=urls,
-            config=configs  # Pass list of configs
-        )
-        
-        for result in results:
-            print(f"{result.url}: {len(result.markdown)} chars")
-```
-
-### 6.2 Advanced Pattern Matching
-
-**Important**: A `CrawlerRunConfig` without `url_matcher` (or with `url_matcher=None`) matches ALL URLs. This makes it perfect as a default/fallback configuration.
-
-The `url_matcher` parameter supports three types of patterns:
-
-#### Glob Patterns (Strings)
-```python
-# Simple patterns
-"*.pdf"                    # Any PDF file
-"*/api/*"                  # Any URL with /api/ in path
-"https://*.example.com/*"  # Subdomain matching
-"*://example.com/blog/*"   # Any protocol
-```
-
-#### Custom Functions
-```python
-# Complex logic with lambdas
-lambda url: url.startswith('https://') and 'secure' in url
-lambda url: len(url) > 50 and url.count('/') > 5
-lambda url: any(domain in url for domain in ['api.', 'data.', 'feed.'])
-```
-
-#### Mixed Lists with AND/OR Logic
-```python
-# Combine multiple conditions
-CrawlerRunConfig(
-    url_matcher=[
-        "https://*",                        # Must be HTTPS
-        lambda url: 'internal' in url,      # Must contain 'internal'
-        lambda url: not url.endswith('.pdf') # Must not be PDF
-    ],
-    match_mode=MatchMode.AND  # ALL conditions must match
-)
-```
-
-### 6.3 Practical Example: News Site Crawler
-
-```python
-async def crawl_news_site():
-    dispatcher = MemoryAdaptiveDispatcher(
-        memory_threshold_percent=70.0,
-        rate_limiter=RateLimiter(base_delay=(1.0, 2.0))
-    )
-    
-    configs = [
-        # Homepage - light extraction
-        CrawlerRunConfig(
-            url_matcher=lambda url: url.rstrip('/') == 'https://news.ycombinator.com',
-            css_selector="nav, .headline",
-            extraction_strategy=None
-        ),
-        
-        # Article pages - full extraction
-        CrawlerRunConfig(
-            url_matcher="*/article/*",
-            extraction_strategy=CosineStrategy(
-                semantic_filter="article content",
-                word_count_threshold=100
-            ),
-            screenshot=True,
-            excluded_tags=["nav", "aside", "footer"]
-        ),
-        
-        # Author pages - metadata focus
-        CrawlerRunConfig(
-            url_matcher="*/author/*",
-            extraction_strategy=JsonCssExtractionStrategy({
-                "name": "h1.author-name",
-                "bio": ".author-bio",
-                "articles": "article.post-card h2"
-            })
-        ),
-        
-        # Everything else
-        CrawlerRunConfig()
-    ]
-    
-    async with AsyncWebCrawler() as crawler:
-        results = await crawler.arun_many(
-            urls=news_urls,
-            config=configs,
-            dispatcher=dispatcher
-        )
-```
-
-### 6.4 Best Practices
-
-1. **Order Matters**: Configs are evaluated in order - put specific patterns before general ones
-2. **Default Config Behavior**: 
-   - A config without `url_matcher` matches ALL URLs
-   - Always include a default config as the last item if you want to handle all URLs
-   - Without a default config, unmatched URLs will fail with "No matching configuration found"
-3. **Test Your Patterns**: Use the config's `is_match()` method to test patterns:
-   ```python
-   config = CrawlerRunConfig(url_matcher="*.pdf")
-   print(config.is_match("https://example.com/doc.pdf"))  # True
-   
-   default_config = CrawlerRunConfig()  # No url_matcher
-   print(default_config.is_match("https://any-url.com"))  # True - matches everything!
-   ```
-4. **Optimize for Performance**: 
-   - Disable JS for static content
-   - Skip screenshots for data APIs
-   - Use appropriate extraction strategies
-
-## 7. Summary
+## 6. Summary

 1. **Two Dispatcher Types**:

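The removed section defines the rules for `url_matcher`:

- It accepts glob strings, callables, or lists combined under `MatchMode.AND` or `MatchMode.OR`.
- Configs are tried in order.
- A config without a matcher matches every URL.

The sketch below models those rules in plain Python to make the ordering and fallback behaviour concrete. It uses `fnmatch` for globs, which is an assumption, since crawl4ai's own glob handling may differ. Plain dicts stand in for `CrawlerRunConfig`, and `url_matches` and `pick_config` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import Callable, Optional, Sequence, Union

Rule = Union[str, Callable[[str], bool]]
Matcher = Optional[Union[Rule, Sequence[Rule]]]


def url_matches(url: str, matcher: Matcher, mode: str = "OR") -> bool:
    """Return True if `url` satisfies `matcher`; None matches every URL."""
    if matcher is None:
        return True
    if isinstance(matcher, str):
        # fnmatch's "*" also crosses "/", so "*.pdf" matches any URL ending in .pdf.
        return fnmatchcase(url, matcher)
    if callable(matcher):
        return bool(matcher(url))
    checks = [url_matches(url, rule) for rule in matcher]
    return all(checks) if mode == "AND" else any(checks)


def pick_config(url: str, configs: Sequence[dict]) -> dict:
    """Return the first config whose matcher accepts `url`, so order matters."""
    for config in configs:
        if url_matches(url, config.get("url_matcher"), config.get("match_mode", "OR")):
            return config
    raise LookupError(f"No matching configuration found for {url}")


configs = [
    {"name": "pdf", "url_matcher": "*.pdf"},
    {"name": "secure-internal", "url_matcher": ["https://*", lambda u: "internal" in u], "match_mode": "AND"},
    {"name": "default"},  # no url_matcher: catches everything else
]
for url in ["https://a.com/x.pdf", "https://a.com/internal/x", "https://blog.python.org/"]:
    print(url, "->", pick_config(url, configs)["name"])
```

Under this model, `*/blog/*` does not match `https://blog.python.org/`, because `blog` there is a subdomain rather than a path segment. That URL from the removed example would therefore fall through to the default config.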
--- a/docs/md_v2/advanced/undetected-browser.md
+++ b/docs/md_v2/advanced/undetected-browser.md
@@ -1,394 +0,0 @@
-# Undetected Browser Mode
-
-## Overview
-
-Crawl4AI offers two powerful anti-bot features to help you access websites with bot detection:
-
-1. **Stealth Mode** - Uses playwright-stealth to modify browser fingerprints and behaviors
-2. **Undetected Browser Mode** - Advanced browser adapter with deep-level patches for sophisticated bot detection
-
-This guide covers both features and helps you choose the right approach for your needs.
-
-## Anti-Bot Features Comparison
-
-| Feature | Regular Browser | Stealth Mode | Undetected Browser |
-|---------|----------------|--------------|-------------------|
-| WebDriver Detection | ❌ | ✅ | ✅ |
-| Navigator Properties | ❌ | ✅ | ✅ |
-| Plugin Emulation | ❌ | ✅ | ✅ |
-| CDP Detection | ❌ | Partial | ✅ |
-| Deep Browser Patches | ❌ | ❌ | ✅ |
-| Performance Impact | None | Minimal | Moderate |
-| Setup Complexity | None | None | Minimal |
-
-## When to Use Each Approach
-
-### Use Regular Browser + Stealth Mode When:
-- Sites have basic bot detection (checking navigator.webdriver, plugins, etc.)
-- You need good performance with basic protection
-- Sites check for common automation indicators
-
-### Use Undetected Browser When:
-- Sites employ sophisticated bot detection services (Cloudflare, DataDome, etc.)
-- Stealth mode alone isn't sufficient
-- You're willing to trade some performance for better evasion
-
-### Best Practice: Progressive Enhancement
-1. **Start with**: Regular browser + Stealth mode
-2. **If blocked**: Switch to Undetected browser
-3. **If still blocked**: Combine Undetected browser + Stealth mode
-
-## Stealth Mode
-
-Stealth mode is the simpler anti-bot solution that works with both regular and undetected browsers:
-
-```python
-from crawl4ai import AsyncWebCrawler, BrowserConfig
-
-# Enable stealth mode with regular browser
-browser_config = BrowserConfig(
-    enable_stealth=True,  # Simple flag to enable
-    headless=False       # Better for avoiding detection
-)
-
-async with AsyncWebCrawler(config=browser_config) as crawler:
-    result = await crawler.arun("https://example.com")
-```
-
-### What Stealth Mode Does:
-- Removes `navigator.webdriver` flag
-- Modifies browser fingerprints
-- Emulates realistic plugin behavior
-- Adjusts navigator properties
-- Fixes common automation leaks
-
-## Undetected Browser Mode
-
-For sites with sophisticated bot detection that stealth mode can't bypass, use the undetected browser adapter:
-
-### Key Features
-
-- **Drop-in Replacement**: Uses the same API as regular browser mode
-- **Enhanced Stealth**: Built-in patches to evade common detection methods
-- **Browser Adapter Pattern**: Seamlessly switch between regular and undetected modes
-- **Automatic Installation**: `crawl4ai-setup` installs all necessary browser dependencies
-
-### Quick Start
-
-```python
-import asyncio
-from crawl4ai import (
-    AsyncWebCrawler, 
-    BrowserConfig, 
-    CrawlerRunConfig,
-    UndetectedAdapter
-)
-from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-
-async def main():
-    # Create the undetected adapter
-    undetected_adapter = UndetectedAdapter()
-    
-    # Create browser config
-    browser_config = BrowserConfig(
-        headless=False,  # Headless mode can be detected easier
-        verbose=True,
-    )
-    
-    # Create the crawler strategy with undetected adapter
-    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=browser_config,
-        browser_adapter=undetected_adapter
-    )
-    
-    # Create the crawler with our custom strategy
-    async with AsyncWebCrawler(
-        crawler_strategy=crawler_strategy,
-        config=browser_config
-    ) as crawler:
-        # Your crawling code here
-        result = await crawler.arun(
-            url="https://example.com",
-            config=CrawlerRunConfig()
-        )
-        print(result.markdown[:500])
-
-asyncio.run(main())
-```
-
-## Combining Both Features
-
-For maximum evasion, combine stealth mode with undetected browser:
-
-```python
-from crawl4ai import AsyncWebCrawler, BrowserConfig, UndetectedAdapter
-from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-
-# Create browser config with stealth enabled
-browser_config = BrowserConfig(
-    enable_stealth=True,  # Enable stealth mode
-    headless=False
-)
-
-# Create undetected adapter
-adapter = UndetectedAdapter()
-
-# Create strategy with both features
-strategy = AsyncPlaywrightCrawlerStrategy(
-    browser_config=browser_config,
-    browser_adapter=adapter
-)
-
-async with AsyncWebCrawler(
-    crawler_strategy=strategy,
-    config=browser_config
-) as crawler:
-    result = await crawler.arun("https://protected-site.com")
-```
-
-## Examples
-
-### Example 1: Basic Stealth Mode
-
-```python
-import asyncio
-from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
-
-async def test_stealth_mode():
-    # Simple stealth mode configuration
-    browser_config = BrowserConfig(
-        enable_stealth=True,
-        headless=False
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        result = await crawler.arun(
-            url="https://bot.sannysoft.com",
-            config=CrawlerRunConfig(screenshot=True)
-        )
-        
-        if result.success:
-            print("✓ Successfully accessed bot detection test site")
-            # Save screenshot to verify detection results
-            if result.screenshot:
-                import base64
-                with open("stealth_test.png", "wb") as f:
-                    f.write(base64.b64decode(result.screenshot))
-                print("✓ Screenshot saved - check for green (passed) tests")
-
-asyncio.run(test_stealth_mode())
-```
-
-### Example 2: Undetected Browser Mode
-
-```python
-import asyncio
-from crawl4ai import (
-    AsyncWebCrawler,
-    BrowserConfig,
-    CrawlerRunConfig,
-    UndetectedAdapter
-)
-from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
-
-
-async def main():
-    # Create browser config
-    browser_config = BrowserConfig(
-        headless=False,
-        verbose=True,
-    )
-    
-    # Create the undetected adapter
-    undetected_adapter = UndetectedAdapter()
-    
-    # Create the crawler strategy with the undetected adapter
-    crawler_strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=browser_config,
-        browser_adapter=undetected_adapter
-    )
-    
-    # Create the crawler with our custom strategy
-    async with AsyncWebCrawler(
-        crawler_strategy=crawler_strategy,
-        config=browser_config
-    ) as crawler:
-        # Configure the crawl
-        crawler_config = CrawlerRunConfig(
-            markdown_generator=DefaultMarkdownGenerator(
-                content_filter=PruningContentFilter()
-            ),
-            capture_console_messages=True,  # Test adapter console capture
-        )
-        
-        # Test on a site that typically detects bots
-        print("Testing undetected adapter...")
-        result: CrawlResult = await crawler.arun(
-            url="https://www.helloworld.org", 
-            config=crawler_config
-        )
-        
-        print(f"Status: {result.status_code}")
-        print(f"Success: {result.success}")
-        print(f"Console messages captured: {len(result.console_messages or [])}")
-        print(f"Markdown content (first 500 chars):\n{result.markdown.raw_markdown[:500]}")
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
-```
-
-## Browser Adapter Pattern
-
-The undetected browser support is implemented using an adapter pattern, allowing seamless switching between different browser implementations:
-
-```python
-# Regular browser adapter (default)
-from crawl4ai import PlaywrightAdapter
-regular_adapter = PlaywrightAdapter()
-
-# Undetected browser adapter
-from crawl4ai import UndetectedAdapter
-undetected_adapter = UndetectedAdapter()
-```
-
-The adapter handles:
-- JavaScript execution
-- Console message capture
-- Error handling
-- Browser-specific optimizations
-
-## Best Practices
-
-1. **Avoid Headless Mode**: Detection is easier in headless mode
-   ```python
-   browser_config = BrowserConfig(headless=False)
-   ```
-
-2. **Use Reasonable Delays**: Don't rush through pages
-   ```python
-   crawler_config = CrawlerRunConfig(
-       wait_time=3.0,  # Wait 3 seconds after page load
-       delay_before_return_html=2.0  # Additional delay
-   )
-   ```
-
-3. **Rotate User Agents**: You can customize user agents
-   ```python
-   browser_config = BrowserConfig(
-       headers={"User-Agent": "your-user-agent"}
-   )
-   ```
-
-4. **Handle Failures Gracefully**: Some sites may still detect and block
-   ```python
-   if not result.success:
-       print(f"Crawl failed: {result.error_message}")
-   ```
-
-## Advanced Usage Tips
-
-### Progressive Detection Handling
-
-```python
-async def crawl_with_progressive_evasion(url):
-    # Step 1: Try regular browser with stealth
-    browser_config = BrowserConfig(
-        enable_stealth=True,
-        headless=False
-    )
-    
-    async with AsyncWebCrawler(config=browser_config) as crawler:
-        result = await crawler.arun(url)
-        if result.success and "Access Denied" not in result.html:
-            return result
-    
-    # Step 2: If blocked, try undetected browser
-    print("Regular + stealth blocked, trying undetected browser...")
-    
-    adapter = UndetectedAdapter()
-    strategy = AsyncPlaywrightCrawlerStrategy(
-        browser_config=browser_config,
-        browser_adapter=adapter
-    )
-    
-    async with AsyncWebCrawler(
-        crawler_strategy=strategy,
-        config=browser_config
-    ) as crawler:
-        result = await crawler.arun(url)
-        return result
-```
-
-## Installation
-
-The undetected browser dependencies are automatically installed when you run:
-
-```bash
-crawl4ai-setup
-```
-
-This command installs all necessary browser dependencies for both regular and undetected modes.
-
-## Limitations
-
-- **Performance**: Slightly slower than regular mode due to additional patches
-- **Headless Detection**: Some sites can still detect headless mode
-- **Resource Usage**: May use more resources than regular mode
-- **Not 100% Guaranteed**: Advanced anti-bot services are constantly evolving
-
-## Troubleshooting
-
-### Browser Not Found
-
-Run the setup command:
-```bash
-crawl4ai-setup
-```
-
-### Detection Still Occurring
-
-Try combining with other features:
-```python
-crawler_config = CrawlerRunConfig(
-    simulate_user=True,  # Add user simulation
-    magic=True,  # Enable magic mode
-    wait_time=5.0,  # Longer waits
-)
-```
-
-### Performance Issues
-
-If experiencing slow performance:
-```python
-# Use selective undetected mode only for protected sites
-if is_protected_site(url):
-    adapter = UndetectedAdapter()
-else:
-    adapter = PlaywrightAdapter()  # Default adapter
-```
-
-## Future Plans
-
-**Note**: In future versions of Crawl4AI, we may enable stealth mode and undetected browser by default to provide better out-of-the-box success rates. For now, users should explicitly enable these features when needed.
-
-## Conclusion
-
-Crawl4AI provides flexible anti-bot solutions:
-
-1. **Start Simple**: Use regular browser + stealth mode for most sites
-2. **Escalate if Needed**: Switch to undetected browser for sophisticated protection
-3. **Combine for Maximum Effect**: Use both features together when facing the toughest challenges
-
-Remember:
-- Always respect robots.txt and website terms of service
-- Use appropriate delays to avoid overwhelming servers
-- Consider the performance trade-offs of each approach
-- Test progressively to find the minimum necessary evasion level
-
-## See Also
-
-- [Advanced Features](advanced-features.md) - Overview of all advanced features
-- [Proxy & Security](proxy-security.md) - Using proxies with anti-bot features
-- [Session Management](session-management.md) - Maintaining sessions across requests
-- [Identity Based Crawling](identity-based-crawling.md) - Additional anti-detection strategies
--- a/docs/md_v2/api/adaptive-crawler.md
+++ b/docs/md_v2/api/adaptive-crawler.md
@@ -27,7 +27,7 @@ async def digest(
    start_url: str,
    query: str,
    resume_from: Optional[Union[str, Path]] = None
-) -> CrawlState
+) -> AdaptiveCrawlResult
 ```

 #### Parameters
@@ -38,7 +38,7 @@ async def digest(

 #### Returns

-- **CrawlState**: The final crawl state containing all crawled URLs, knowledge base, and metrics
+- **AdaptiveCrawlResult**: The final crawl state containing all crawled URLs, knowledge base, and metrics

 #### Example

@@ -92,7 +92,7 @@ Access to the current crawl state.

 ```python
 @property
-def state(self) -> CrawlState
+def state(self) -> AdaptiveCrawlResult
 ```

 ## Methods
--- a/docs/md_v2/api/arun_many.md
+++ b/docs/md_v2/api/arun_many.md
@@ -7,7 +7,7 @@
 ```python
 async def arun_many(
    urls: Union[List[str], List[Any]],
-    config: Optional[Union[CrawlerRunConfig, List[CrawlerRunConfig]]] = None,
+    config: Optional[CrawlerRunConfig] = None,
    dispatcher: Optional[BaseDispatcher] = None,
    ...
 ) -> Union[List[CrawlResult], AsyncGenerator[CrawlResult, None]]:
@@ -15,9 +15,7 @@ async def arun_many(
    Crawl multiple URLs concurrently or in batches.

    :param urls: A list of URLs (or tasks) to crawl.
-    :param config: (Optional) Either:
-        - A single `CrawlerRunConfig` applying to all URLs
-        - A list of `CrawlerRunConfig` objects with url_matcher patterns
+    :param config: (Optional) A default `CrawlerRunConfig` applying to each crawl.
    :param dispatcher: (Optional) A concurrency controller (e.g. MemoryAdaptiveDispatcher).
    ...
    :return: Either a list of `CrawlResult` objects, or an async generator if streaming is enabled.
@@ -97,70 +95,10 @@ results = await crawler.arun_many(
 )
 ```

-### URL-Specific Configurations
-
-Instead of using one config for all URLs, provide a list of configs with `url_matcher` patterns:
-
-```python
-from crawl4ai import CrawlerRunConfig, MatchMode
-from crawl4ai.processors.pdf import PDFContentScrapingStrategy
-from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
-from crawl4ai.content_filter_strategy import PruningContentFilter
-from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
-
-# PDF files - specialized extraction
-pdf_config = CrawlerRunConfig(
-    url_matcher="*.pdf",
-    scraping_strategy=PDFContentScrapingStrategy()
-)
-
-# Blog/article pages - content filtering
-blog_config = CrawlerRunConfig(
-    url_matcher=["*/blog/*", "*/article/*", "*python.org*"],
-    markdown_generator=DefaultMarkdownGenerator(
-        content_filter=PruningContentFilter(threshold=0.48)
-    )
-)
-
-# Dynamic pages - JavaScript execution
-github_config = CrawlerRunConfig(
-    url_matcher=lambda url: 'github.com' in url,
-    js_code="window.scrollTo(0, 500);"
-)
-
-# API endpoints - JSON extraction
-api_config = CrawlerRunConfig(
-    url_matcher=lambda url: 'api' in url or url.endswith('.json'),
-    # Custome settings for JSON extraction
-)
-
-# Default fallback config
-default_config = CrawlerRunConfig()  # No url_matcher means it never matches except as fallback
-
-# Pass the list of configs - first match wins!
-results = await crawler.arun_many(
-    urls=[
-        "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",  # → pdf_config
-        "https://blog.python.org/",  # → blog_config
-        "https://github.com/microsoft/playwright",  # → github_config
-        "https://httpbin.org/json",  # → api_config
-        "https://example.com/"  # → default_config
-    ],
-    config=[pdf_config, blog_config, github_config, api_config, default_config]
-)
-```
-
-**URL Matching Features**:
-- **String patterns**: `"*.pdf"`, `"*/blog/*"`, `"*python.org*"`
-- **Function matchers**: `lambda url: 'api' in url`
-- **Mixed patterns**: Combine strings and functions with `MatchMode.OR` or `MatchMode.AND`
-- **First match wins**: Configs are evaluated in order
-
 **Key Points**:
 - Each URL is processed by the same or separate sessions, depending on the dispatcher’s strategy.
 - `dispatch_result` in each `CrawlResult` (if using concurrency) can hold memory and timing info.  
 - If you need to handle authentication or session IDs, pass them in each individual task or within your run config.
-- **Important**: Always include a default config (without `url_matcher`) as the last item if you want to handle all URLs. Otherwise, unmatched URLs will fail.

 ### Return Value

--- a/docs/md_v2/api/digest.md
+++ b/docs/md_v2/api/digest.md
@@ -9,7 +9,7 @@ async def digest(
    start_url: str,
    query: str,
    resume_from: Optional[Union[str, Path]] = None
-) -> CrawlState
+) -> AdaptiveCrawlResult
 ```

 ## Parameters
@@ -31,7 +31,7 @@ async def digest(

 ## Return Value

-Returns a `CrawlState` object containing:
+Returns an `AdaptiveCrawlResult` object containing:

 - **crawled_urls** (`Set[str]`): All URLs that have been crawled
 - **knowledge_base** (`List[CrawlResult]`): Collection of crawled pages with content
--- a/docs/md_v2/api/parameters.md
+++ b/docs/md_v2/api/parameters.md
@@ -208,71 +208,6 @@ config = CrawlerRunConfig(

 See [Virtual Scroll documentation](../../advanced/virtual-scroll.md) for detailed examples.

----
-
-### I) **URL Matching Configuration**
-
-| **Parameter**          | **Type / Default**           | **What It Does**                                                                                                                    |
-|------------------------|------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
-| **`url_matcher`**      | `UrlMatcher` (None)          | Pattern(s) to match URLs against. Can be: string (glob), function, or list of mixed types. **None means match ALL URLs**         |
-| **`match_mode`**       | `MatchMode` (MatchMode.OR)   | How to combine multiple matchers in a list: `MatchMode.OR` (any match) or `MatchMode.AND` (all must match)                       |
-
-The `url_matcher` parameter enables URL-specific configurations when used with `arun_many()`:
-
-```python
-from crawl4ai import CrawlerRunConfig, MatchMode
-from crawl4ai.processors.pdf import PDFContentScrapingStrategy
-from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
-
-# Simple string pattern (glob-style)
-pdf_config = CrawlerRunConfig(
-    url_matcher="*.pdf",
-    scraping_strategy=PDFContentScrapingStrategy()
-)
-
-# Multiple patterns with OR logic (default)
-blog_config = CrawlerRunConfig(
-    url_matcher=["*/blog/*", "*/article/*", "*/news/*"],
-    match_mode=MatchMode.OR  # Any pattern matches
-)
-
-# Function matcher
-api_config = CrawlerRunConfig(
-    url_matcher=lambda url: 'api' in url or url.endswith('.json'),
-    # Other settings like extraction_strategy
-)
-
-# Mixed: String + Function with AND logic
-complex_config = CrawlerRunConfig(
-    url_matcher=[
-        lambda url: url.startswith('https://'),  # Must be HTTPS
-        "*.org/*",                               # Must be .org domain
-        lambda url: 'docs' in url                # Must contain 'docs'
-    ],
-    match_mode=MatchMode.AND  # ALL conditions must match
-)
-
-# Combined patterns and functions with AND logic
-secure_docs = CrawlerRunConfig(
-    url_matcher=["https://*", lambda url: '.doc' in url],
-    match_mode=MatchMode.AND  # Must be HTTPS AND contain .doc
-)
-
-# Default config - matches ALL URLs
-default_config = CrawlerRunConfig()  # No url_matcher = matches everything
-```
-
-**UrlMatcher Types:**
-- **None (default)**: When `url_matcher` is None or not set, the config matches ALL URLs
-- **String patterns**: Glob-style patterns like `"*.pdf"`, `"*/api/*"`, `"https://*.example.com/*"`
-- **Functions**: `lambda url: bool` - Custom logic for complex matching
-- **Lists**: Mix strings and functions, combined with `MatchMode.OR` or `MatchMode.AND`
-
-**Important Behavior:**
-- When passing a list of configs to `arun_many()`, URLs are matched against each config's `url_matcher` in order. First match wins!
-- If no config matches a URL and there's no default config (one without `url_matcher`), the URL will fail with "No matching configuration found"
-- Always include a default config as the last item if you want to handle all URLs
-
 ---## 2.2 Helper Methods

 Both `BrowserConfig` and `CrawlerRunConfig` provide a `clone()` method to create modified copies:
--- a/docs/md_v2/blog/releases/0.7.1.md
+++ b/docs/md_v2/blog/releases/0.7.1.md
@@ -1,43 +0,0 @@
-# 🛠️ Crawl4AI v0.7.1: Minor Cleanup Update
-
-*July 17, 2025 • 2 min read*
-
----
-
-A small maintenance release that removes unused code and improves documentation.
-
-## 🎯 What's Changed
-
-- **Removed unused StealthConfig** from `crawl4ai/browser_manager.py`
-- **Updated documentation** with better examples and parameter explanations
-- **Fixed virtual scroll configuration** examples in docs
-
-## 🧹 Code Cleanup
-
-Removed unused `StealthConfig` import and configuration that wasn't being used anywhere in the codebase. The project uses its own custom stealth implementation through JavaScript injection instead.
-
-```python
-# Removed unused code:
-from playwright_stealth import StealthConfig
-stealth_config = StealthConfig(...)  # This was never used
-```
-
-## 📖 Documentation Updates
-
-- Fixed adaptive crawling parameter examples
-- Updated session management documentation
-- Corrected virtual scroll configuration examples
-
-## 🚀 Installation
-
-```bash
-pip install crawl4ai==0.7.1
-```
-
-No breaking changes - upgrade directly from v0.7.0.
-
----
-
-Questions? Issues? 
-- GitHub: [github.com/unclecode/crawl4ai](https://github.com/unclecode/crawl4ai)
-- Discord: [discord.gg/crawl4ai](https://discord.gg/jP8KfhDhyN)
--- a/docs/md_v2/blog/releases/0.7.2.md
+++ b/docs/md_v2/blog/releases/0.7.2.md
@@ -1,98 +0,0 @@
-# 🚀 Crawl4AI v0.7.2: CI/CD & Dependency Optimization Update
-
-*July 25, 2025 • 3 min read*
-
----
-
-This release introduces automated CI/CD pipelines for seamless releases and optimizes dependencies for a lighter, more efficient package.
-
-## 🎯 What's New
-
-### 🔄 Automated Release Pipeline
-- **GitHub Actions CI/CD**: Automated PyPI and Docker Hub releases on tag push
-- **Multi-platform Docker images**: Support for both AMD64 and ARM64 architectures
-- **Version consistency checks**: Ensures tag, package, and Docker versions align
-- **Automated release notes**: GitHub releases created automatically
-
-### 📦 Dependency Optimization
-- **Moved sentence-transformers to optional dependencies**: Significantly reduces default installation size
-- **Lighter Docker images**: Optimized Dockerfile for faster builds and smaller images
-- **Better dependency management**: Core vs. optional dependencies clearly separated
-
-## 🏗️ CI/CD Pipeline
-
-The new automated release process ensures consistent, reliable releases:
-
-```yaml
-# Trigger releases with a simple tag
-git tag v0.7.2
-git push origin v0.7.2
-
-# Automatically:
-# ✅ Validates version consistency
-# ✅ Builds and publishes to PyPI
-# ✅ Builds multi-platform Docker images
-# ✅ Pushes to Docker Hub with proper tags
-# ✅ Creates GitHub release
-```
-
-## 💾 Lighter Installation
-
-Default installation is now significantly smaller:
-
-```bash
-# Core installation (smaller, faster)
-pip install crawl4ai==0.7.2
-
-# With ML features (includes sentence-transformers)
-pip install crawl4ai[transformer]==0.7.2
-
-# Full installation
-pip install crawl4ai[all]==0.7.2
-```
-
-## 🐳 Docker Improvements
-
-Enhanced Docker support with multi-platform images:
-
-```bash
-# Pull the latest version
-docker pull unclecode/crawl4ai:0.7.2
-docker pull unclecode/crawl4ai:latest
-
-# Available tags:
-# - unclecode/crawl4ai:0.7.2 (specific version)
-# - unclecode/crawl4ai:0.7 (minor version)
-# - unclecode/crawl4ai:0 (major version)
-# - unclecode/crawl4ai:latest
-```
-
-## 🔧 Technical Details
-
-### Dependency Changes
-- `sentence-transformers` moved from required to optional dependencies
-- Reduces default installation by ~500MB
-- No impact on functionality when transformer features aren't needed
-
-### CI/CD Configuration
-- GitHub Actions workflows for automated releases
-- Version validation before publishing
-- Parallel PyPI and Docker Hub deployments
-- Automatic tagging strategy for Docker images
-
-## 🚀 Installation
-
-```bash
-pip install crawl4ai==0.7.2
-```
-
-No breaking changes - direct upgrade from v0.7.0 or v0.7.1.
-
----
-
-Questions? Issues? 
-- GitHub: [github.com/unclecode/crawl4ai](https://github.com/unclecode/crawl4ai)
-- Discord: [discord.gg/crawl4ai](https://discord.gg/jP8KfhDhyN)
-- Twitter: [@unclecode](https://x.com/unclecode)
-
-*P.S. The new CI/CD pipeline will make future releases faster and more reliable. Thanks for your patience as we improve our release process!*
--- a/docs/md_v2/core/browser-crawler-config.md
+++ b/docs/md_v2/core/browser-crawler-config.md
@@ -29,7 +29,6 @@ class BrowserConfig:
        text_mode=False,
        light_mode=False,
        extra_args=None,
-        enable_stealth=False,
        # ... other advanced parameters omitted here
    ):
        ...
@@ -85,11 +84,6 @@ class BrowserConfig:
    - Additional flags for the underlying browser.  
    - E.g. `["--disable-extensions"]`.

-11. **`enable_stealth`**:  
-    - If `True`, enables stealth mode using playwright-stealth.  
-    - Modifies browser fingerprints to avoid basic bot detection.  
-    - Default is `False`. Recommended for sites with bot protection.
-
 ### Helper Methods

 Both configuration classes provide a `clone()` method to create modified copies:
@@ -215,13 +209,7 @@ class CrawlerRunConfig:
    - The maximum number of concurrent crawl sessions.  
    - Helps prevent overwhelming the system.

-14. **`url_matcher`** & **`match_mode`**:  
-    - Enable URL-specific configurations when used with `arun_many()`.
-    - Set `url_matcher` to patterns (glob, function, or list) to match specific URLs.
-    - Use `match_mode` (OR/AND) to control how multiple patterns combine.
-    - See [URL-Specific Configurations](../api/arun_many.md#url-specific-configurations) for examples.
-
-15. **`display_mode`**:  
+14. **`display_mode`**:  
    - The display mode for progress information (`DETAILED`, `BRIEF`, etc.).  
    - Affects how much information is printed during the crawl.

--- a/docs/md_v2/core/c4a-script.md
+++ b/docs/md_v2/core/c4a-script.md
@@ -52,9 +52,11 @@ That's it! In just a few lines, you've automated a complete search workflow.

 Want to learn by doing? We've got you covered:

-**🚀 [Live Demo](https://docs.crawl4ai.com/apps/c4a-script/)** - Try C4A-Script in your browser right now!
+**🚀 [Live Demo](https://docs.crawl4ai.com/c4a-script/demo)** - Try C4A-Script in your browser right now!

-**📁 [Tutorial Examples](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/c4a_script/)** - Complete examples with source code
+**📁 [Tutorial Examples](/examples/c4a_script/)** - Complete examples with source code
+
+**🛠️ [Local Tutorial](/examples/c4a_script/tutorial/)** - Run the interactive tutorial on your machine

 ### Running the Tutorial Locally

--- a/docs/md_v2/core/cache-modes.md
+++ b/docs/md_v2/core/cache-modes.md
@@ -19,6 +19,7 @@ The new system uses a single `CacheMode` enum:
 - `CacheMode.READ_ONLY`: Only read from cache
 - `CacheMode.WRITE_ONLY`: Only write to cache
 - `CacheMode.BYPASS`: Skip cache for this operation
+- `CacheMode.SMART`: **NEW** - Validate cached content with HEAD requests before reusing it

 ## Migration Example

@@ -72,4 +73,128 @@ if __name__ == "__main__":
 | `bypass_cache=True`   | `cache_mode=CacheMode.BYPASS`  |
 | `disable_cache=True`  | `cache_mode=CacheMode.DISABLED`|
 | `no_cache_read=True`  | `cache_mode=CacheMode.WRITE_ONLY` |
-| `no_cache_write=True` | `cache_mode=CacheMode.READ_ONLY` |
+| `no_cache_write=True` | `cache_mode=CacheMode.READ_ONLY` |
+
+## SMART Cache Mode: Only Crawl When Content Changes
+
+Starting from version 0.7.1, Crawl4AI introduces the **SMART cache mode**, a caching strategy that validates cached content before using it. It sends an HTTP HEAD request to check whether the content has changed, potentially saving 70-95% of bandwidth on unchanged content.
+
+### How SMART Mode Works
+
+When you use `CacheMode.SMART`, Crawl4AI:
+
+1. **Retrieves cached content** (if available)
+2. **Sends a HEAD request** with conditional headers (ETag, Last-Modified)
+3. **Validates the response**:
+   - If the server returns `304 Not Modified` → uses cache
+   - If the headers (ETag, Last-Modified, etc.) indicate a change → performs a fresh crawl
+   - If no header gives a definitive answer → performs a fresh crawl
+
+### Benefits
+
+- **Bandwidth Efficient**: Only downloads full content when necessary
+- **Always Fresh**: Ensures you get the latest content when it changes
+- **Cost Effective**: Reduces API calls and bandwidth usage
+- **Intelligent**: Uses multiple signals to detect changes (ETag, Last-Modified, Content-Length)
+
+### Basic Usage
+
+```python
+import asyncio
+from crawl4ai import AsyncWebCrawler
+from crawl4ai.cache_context import CacheMode
+from crawl4ai.async_configs import CrawlerRunConfig
+
+async def smart_crawl():
+    async with AsyncWebCrawler(verbose=True) as crawler:
+        # First crawl - caches the content
+        config = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
+        result1 = await crawler.arun(
+            url="https://example.com",
+            config=config
+        )
+        print(f"First crawl: {len(result1.html)} bytes")
+        
+        # Second crawl - uses SMART mode
+        smart_config = CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        result2 = await crawler.arun(
+            url="https://example.com",
+            config=smart_config
+        )
+        print(f"SMART crawl: {len(result2.html)} bytes (from cache if unchanged)")
+
+asyncio.run(smart_crawl())
+```
+
+### When to Use SMART Mode
+
+SMART mode is ideal for:
+
+- **Periodic crawling** of websites that update irregularly
+- **News sites** where you want fresh content but avoid re-downloading unchanged pages
+- **API endpoints** that provide proper caching headers
+- **Large-scale crawling** where bandwidth costs are significant
+
+### How It Detects Changes
+
+SMART mode checks these signals in order:
+
+1. **304 Not Modified** status (most reliable)
+2. **Content-Digest** header (RFC 9530)
+3. **Strong ETag** comparison
+4. **Last-Modified** timestamp
+5. **Content-Length** changes (as a hint)
+
+### Example: News Site Monitoring
+
+```python
+async def monitor_news_site():
+    async with AsyncWebCrawler(verbose=True) as crawler:
+        config = CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        
+        # Check multiple times
+        for i in range(3):
+            result = await crawler.arun(
+                url="https://news.ycombinator.com",
+                config=config
+            )
+            
+            # SMART mode will only re-crawl if content changed
+            print(f"Check {i+1}: Retrieved {len(result.html)} bytes")
+            await asyncio.sleep(300)  # Wait 5 minutes
+
+asyncio.run(monitor_news_site())
+```
+
+### Understanding SMART Mode Logs
+
+When using SMART mode with `verbose=True`, you'll see informative logs:
+
+```
+[SMART] ℹ SMART cache: 304 Not Modified - Content unchanged - Using cache for https://example.com
+[SMART] ℹ SMART cache: Content-Length changed (12345 -> 12789) - Re-crawling https://example.com
+[SMART] ℹ SMART cache: No definitive cache headers matched - Assuming content changed - Re-crawling https://example.com
+```
+
+### Limitations
+
+- Some servers don't properly support HEAD requests
+- Dynamic content without proper cache headers will always be re-crawled
+- Content changes must be reflected in HTTP headers for detection
+
+### Advanced Example
+
+For a complete example demonstrating SMART mode with both static and dynamic content, check out `docs/examples/smart_cache.py`.
+
+## Cache Mode Reference
+
+| Mode | Read from Cache | Write to Cache | Use Case |
+|------|----------------|----------------|----------|
+| `ENABLED` | ✓ | ✓ | Normal operation |
+| `DISABLED` | ✗ | ✗ | No caching needed |
+| `READ_ONLY` | ✓ | ✗ | Use existing cache only |
+| `WRITE_ONLY` | ✗ | ✓ | Refresh cache only |
+| `BYPASS` | ✗ | ✗ | Skip cache for this request |
+| `SMART` | ✓* | ✓ | Validate before using cache |
+
+*SMART mode reads from cache but validates it first with a HEAD request.
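The ordered checks listed under "How It Detects Changes" can be illustrated with a small decision function. This is a minimal sketch under stated assumptions, not Crawl4AI's actual implementation: `should_use_cache` and `_header` are hypothetical helpers, and it assumes you already have the HEAD response status, the HEAD response headers, and the headers stored with the cached page. It uses only the Python standard library.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional, Tuple


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Hypothetical helper: case-insensitive header lookup."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def should_use_cache(
    status: int,
    head_headers: Mapping[str, str],
    cached_headers: Mapping[str, str],
) -> Tuple[bool, str]:
    """Hypothetical SMART-style check returning (use_cache, reason)."""
    # 1. 304 Not Modified: the server itself reports no change.
    if status == 304:
        return True, "304 Not Modified"

    # 2. Content-Digest (RFC 9530): compare digests of the representation.
    new, old = _header(head_headers, "Content-Digest"), _header(cached_headers, "Content-Digest")
    if new and old:
        return (new == old), "Content-Digest compared"

    # 3. Strong ETag comparison; weak validators (prefixed "W/") are skipped.
    new, old = _header(head_headers, "ETag"), _header(cached_headers, "ETag")
    if new and old and not new.startswith("W/") and not old.startswith("W/"):
        return (new == old), "Strong ETag compared"

    # 4. Last-Modified: treat as unchanged if the server's date is not newer.
    new, old = _header(head_headers, "Last-Modified"), _header(cached_headers, "Last-Modified")
    if new and old:
        try:
            return parsedate_to_datetime(new) <= parsedate_to_datetime(old), "Last-Modified compared"
        except (TypeError, ValueError):
            pass  # Unparseable or mixed naive/aware dates: fall through.

    # 5. Content-Length is only a hint: a different length implies a change.
    new, old = _header(head_headers, "Content-Length"), _header(cached_headers, "Content-Length")
    if new and old and new != old:
        return False, f"Content-Length changed ({old} -> {new})"

    # Nothing conclusive: assume the content changed and re-crawl.
    return False, "No definitive cache headers matched"
```

An equal `Content-Length` is deliberately not treated as proof that nothing changed, so the sketch falls back to re-crawling, in line with the "Assuming content changed" log message shown earlier.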
--- a/docs/md_v2/core/content-selection.md
+++ b/docs/md_v2/core/content-selection.md
@@ -350,22 +350,15 @@ if __name__ == "__main__":

 ## 6. Scraping Modes

-Crawl4AI uses `LXMLWebScrapingStrategy` (LXML-based) as the default scraping strategy for HTML content processing. This strategy offers excellent performance, especially for large HTML documents.
-
-**Note:** For backward compatibility, `WebScrapingStrategy` is still available as an alias for `LXMLWebScrapingStrategy`.
+Crawl4AI provides two different scraping strategies for HTML content processing: `WebScrapingStrategy` (BeautifulSoup-based, default) and `LXMLWebScrapingStrategy` (LXML-based). The LXML strategy offers significantly better performance, especially for large HTML documents.

 ```python
 from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LXMLWebScrapingStrategy

 async def main():
-    # Default configuration already uses LXMLWebScrapingStrategy
-    config = CrawlerRunConfig()
-    
-    # Or explicitly specify it if desired
-    config_explicit = CrawlerRunConfig(
-        scraping_strategy=LXMLWebScrapingStrategy()
+    config = CrawlerRunConfig(
+        scraping_strategy=LXMLWebScrapingStrategy()  # Faster alternative to default BeautifulSoup
    )
-    
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://example.com", 
@@ -424,20 +417,21 @@ class CustomScrapingStrategy(ContentScrapingStrategy):

 ### Performance Considerations

-The LXML strategy provides excellent performance, particularly when processing large HTML documents, offering up to 10-20x faster processing compared to BeautifulSoup-based approaches.
+The LXML strategy can be up to 10-20x faster than the BeautifulSoup strategy, particularly when processing large HTML documents. However, please note:

-Benefits of LXML strategy:
-- Fast processing of large HTML documents (especially >100KB)
-- Efficient memory usage
-- Good handling of well-formed HTML
-- Robust table detection and extraction
+1. LXML strategy is currently experimental
+2. In some edge cases, the parsing results might differ slightly from BeautifulSoup
+3. If you encounter any inconsistencies between LXML and BeautifulSoup results, please [raise an issue](https://github.com/codeium/crawl4ai/issues) with a reproducible example

-### Backward Compatibility
+Choose LXML strategy when:
+- Processing large HTML documents (recommended for >100KB)
+- Performance is critical
+- Working with well-formed HTML

-For users upgrading from earlier versions:
-- `WebScrapingStrategy` is now an alias for `LXMLWebScrapingStrategy`
-- Existing code using `WebScrapingStrategy` will continue to work without modification
-- No changes are required to your existing code
+Stick to BeautifulSoup strategy (default) when:
+- Maximum compatibility is needed
+- Working with malformed HTML
+- Exact parsing behavior is critical

 ---

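The "Choose LXML strategy when / Stick to BeautifulSoup strategy when" guidance can be condensed into a small decision helper that runs before you build `CrawlerRunConfig(scraping_strategy=...)`. This is a hypothetical sketch: `choose_scraping_strategy`, its flags, and reading "100KB" as 100 * 1024 bytes of UTF-8 are assumptions. Only the two returned class names come from Crawl4AI.

```python
# Hypothetical helper encoding the LXML vs. BeautifulSoup guidance above.
LXML_SIZE_THRESHOLD = 100 * 1024  # assumption: "100KB" read as 100 KiB of UTF-8


def choose_scraping_strategy(
    html: str,
    malformed: bool = False,
    strict_parity: bool = False,
    performance_critical: bool = False,
) -> str:
    """Return the name of the scraping strategy class to use for `html`.

    malformed: the caller knows the markup is badly broken.
    strict_parity: exact BeautifulSoup parsing behavior is required.
    performance_critical: speed matters more than edge-case parity.
    """
    # Compatibility concerns override speed: keep the BeautifulSoup default.
    if malformed or strict_parity:
        return "WebScrapingStrategy"
    # Large, well-formed documents (or speed-sensitive jobs) favor LXML.
    if performance_critical or len(html.encode("utf-8")) > LXML_SIZE_THRESHOLD:
        return "LXMLWebScrapingStrategy"
    return "WebScrapingStrategy"


print(choose_scraping_strategy("<p>small page</p>"))                # WebScrapingStrategy
print(choose_scraping_strategy("<p>" + "x" * 200_000 + "</p>"))     # LXMLWebScrapingStrategy
```

In real code you would map the returned name to the imported class (for example `LXMLWebScrapingStrategy()`) and pass the instance as `scraping_strategy`. Compatibility flags are checked first because the section recommends BeautifulSoup whenever parsing parity matters more than speed.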
--- a/docs/md_v2/core/crawler-result.md
+++ b/docs/md_v2/core/crawler-result.md
@@ -19,15 +19,13 @@ class MarkdownGenerationResult(BaseModel):
 class CrawlResult(BaseModel):
    url: str
    html: str
-    fit_html: Optional[str] = None
    success: bool
    cleaned_html: Optional[str] = None
    media: Dict[str, List[Dict]] = {}
    links: Dict[str, List[Dict]] = {}
    downloaded_files: Optional[List[str]] = None
-    js_execution_result: Optional[Dict[str, Any]] = None
    screenshot: Optional[str] = None
-    pdf: Optional[bytes] = None
+    pdf : Optional[bytes] = None
    mhtml: Optional[str] = None
    markdown: Optional[Union[str, MarkdownGenerationResult]] = None
    extracted_content: Optional[str] = None
@@ -37,12 +35,6 @@ class CrawlResult(BaseModel):
    response_headers: Optional[dict] = None
    status_code: Optional[int] = None
    ssl_certificate: Optional[SSLCertificate] = None
-    dispatch_result: Optional[DispatchResult] = None
-    redirected_url: Optional[str] = None
-    network_requests: Optional[List[Dict[str, Any]]] = None
-    console_messages: Optional[List[Dict[str, Any]]] = None
-    tables: List[Dict] = Field(default_factory=list)
-
    class Config:
        arbitrary_types_allowed = True
 ```
@@ -53,13 +45,11 @@ class CrawlResult(BaseModel):
 |-------------------------------------------|-----------------------------------------------------------------------------------------------------|
 | **url (`str`)**                           | The final or actual URL crawled (in case of redirects).                                             |
 | **html (`str`)**                          | Original, unmodified page HTML. Good for debugging or custom processing.                            |
-| **fit_html (`Optional[str]`)**            | Preprocessed HTML optimized for extraction and content filtering.                                    |
 | **success (`bool`)**                      | `True` if the crawl completed without major errors, else `False`.                                   |
 | **cleaned_html (`Optional[str]`)**        | Sanitized HTML with scripts/styles removed; can exclude tags if configured via `excluded_tags` etc. |
 | **media (`Dict[str, List[Dict]]`)**       | Extracted media info (images, audio, etc.), each with attributes like `src`, `alt`, `score`, etc.   |
 | **links (`Dict[str, List[Dict]]`)**       | Extracted link data, split by `internal` and `external`. Each link usually has `href`, `text`, etc. |
 | **downloaded_files (`Optional[List[str]]`)** | If `accept_downloads=True` in `BrowserConfig`, this lists the filepaths of saved downloads.         |
-| **js_execution_result (`Optional[Dict[str, Any]]`)** | Results from JavaScript execution during crawling. |
 | **screenshot (`Optional[str]`)**          | Screenshot of the page (base64-encoded) if `screenshot=True`.                                       |
 | **pdf (`Optional[bytes]`)**               | PDF of the page if `pdf=True`.                                                                      |
 | **mhtml (`Optional[str]`)**               | MHTML snapshot of the page if `capture_mhtml=True`. Contains the full page with all resources.      |
@@ -71,11 +61,6 @@ class CrawlResult(BaseModel):
 | **response_headers (`Optional[dict]`)**   | HTTP response headers, if captured.                                                                 |
 | **status_code (`Optional[int]`)**         | HTTP status code (e.g., 200 for OK).                                                                |
 | **ssl_certificate (`Optional[SSLCertificate]`)** | SSL certificate info if `fetch_ssl_certificate=True`.                                               |
-| **dispatch_result (`Optional[DispatchResult]`)** | Additional concurrency and resource usage information when crawling URLs in parallel.               |
-| **redirected_url (`Optional[str]`)**      | The URL after any redirects (different from `url` which is the final URL).                          |
-| **network_requests (`Optional[List[Dict[str, Any]]]`)** | List of network requests, responses, and failures captured during the crawl if `capture_network_requests=True`. |
-| **console_messages (`Optional[List[Dict[str, Any]]]`)** | List of browser console messages captured during the crawl if `capture_console_messages=True`.       |
-| **tables (`List[Dict]`)**                 | Table data extracted from HTML tables with structure `[{headers, rows, caption, summary}]`.           |

 ---

@@ -187,7 +172,7 @@ Here:

 ---

-## 5. More Fields: Links, Media, Tables and More
+## 5. More Fields: Links, Media, and More

 ### 5.1 `links`

@@ -207,77 +192,7 @@ for img in images:
    print("Image URL:", img["src"], "Alt:", img.get("alt"))
 ```

-### 5.3 `tables`
-
-The `tables` field contains structured data extracted from HTML tables found on the crawled page. Tables are analyzed based on various criteria to determine if they are actual data tables (as opposed to layout tables), including:
-
-- Presence of thead and tbody sections
-- Use of th elements for headers
-- Column consistency
-- Text density
-- And other factors
-
-Tables that score above the threshold (default: 7) are extracted and stored in result.tables.
-
-### Accessing Table data:
-```python
-import asyncio
-from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
-
-async def main():
-    async with AsyncWebCrawler() as crawler:
-        result = await crawler.arun(
-            url="https://www.w3schools.com/html/html_tables.asp",
-            config=CrawlerRunConfig(
-                table_score_threshold=7  # Minimum score for table detection
-            )
-        )
-
-        if result.success and result.tables:
-            print(f"Found {len(result.tables)} tables")
-
-            for i, table in enumerate(result.tables):
-                print(f"\nTable {i+1}:")
-                print(f"Caption: {table.get('caption', 'No caption')}")
-                print(f"Headers: {table['headers']}")
-                print(f"Rows: {len(table['rows'])}")
-
-                # Print first few rows as example
-                for j, row in enumerate(table['rows'][:3]):
-                    print(f"  Row {j+1}: {row}")
-
-if __name__ == "__main__":
-    asyncio.run(main())
-
-```
-
-### Configuring Table Extraction:
-
-You can adjust the sensitivity of the table detection algorithm with:
-
-```python
-config = CrawlerRunConfig(
-    table_score_threshold=5  # Lower value = more tables detected (default: 7)
-)
-```
-
-Each extracted table contains: 
-
-- `headers`: Column header names 
-- `rows`: List of rows, each containing cell values
-- `caption`: Table caption text (if available) 
-- `summary`: Table summary attribute (if specified)
-
-### Table Extraction Tips
-
-- Not all HTML tables are extracted - only those detected as "data tables" vs. layout tables.
-- Tables with inconsistent cell counts, nested tables, or those used purely for layout may be skipped.
-- If you're missing tables, try adjusting the `table_score_threshold` to a lower value (default is 7).
-
-The table detection algorithm scores tables based on features like consistent columns, presence of headers, text density, and more. Tables scoring above the threshold are considered data tables worth extracting.
-
-
-### 5.4 `screenshot`, `pdf`, and `mhtml`
+### 5.3 `screenshot`, `pdf`, and `mhtml`

 If you set `screenshot=True`, `pdf=True`, or `capture_mhtml=True` in **`CrawlerRunConfig`**, then:

@@ -298,7 +213,7 @@ if result.mhtml:

 The MHTML (MIME HTML) format is particularly useful as it captures the entire web page including all of its resources (CSS, images, scripts, etc.) in a single file, making it perfect for archiving or offline viewing.

-### 5.5 `ssl_certificate`
+### 5.4 `ssl_certificate`

 If `fetch_ssl_certificate=True`, `result.ssl_certificate` holds details about the site’s SSL cert, such as issuer, validity dates, etc.

--- a/docs/md_v2/core/docker-deployment.md
+++ b/docs/md_v2/core/docker-deployment.md
@@ -154,30 +154,6 @@ cp deploy/docker/.llm.env.example .llm.env
 # Now edit .llm.env and add your API keys
 ```

-**Flexible LLM Provider Configuration:**
-
-The Docker setup now supports flexible LLM provider configuration through three methods:
-
-1. **Environment Variable** (Highest Priority): Set `LLM_PROVIDER` to override the default
-   ```bash
-   export LLM_PROVIDER="anthropic/claude-3-opus"
-   # Or in your .llm.env file:
-   # LLM_PROVIDER=anthropic/claude-3-opus
-   ```
-
-2. **API Request Parameter**: Specify provider per request
-   ```json
-   {
-     "url": "https://example.com",
-     "f": "llm",
-     "provider": "groq/mixtral-8x7b"
-   }
-   ```
-
-3. **Config File Default**: Falls back to `config.yml` (default: `openai/gpt-4o-mini`)
-
-The system automatically selects the appropriate API key based on the configured `api_key_env` in the config file.
-
 #### 3. Build and Run with Compose

 The `docker-compose.yml` file in the project root provides a simplified approach that automatically handles architecture detection using buildx.
@@ -405,409 +381,6 @@ Executes JavaScript snippets on the specified URL and returns the full crawl res

 ---

-## User-Provided Hooks API
-
-The Docker API supports user-provided hook functions, allowing you to customize the crawling behavior by injecting your own Python code at specific points in the crawling pipeline. This powerful feature enables authentication, performance optimization, and custom content extraction without modifying the server code.
-
-> ⚠️ **IMPORTANT SECURITY WARNING**: 
-> - **Never use hooks with untrusted code or on untrusted websites**
-> - **Be extremely careful when crawling sites that might be phishing or malicious**
-> - **Hook code has access to page context and can interact with the website**
-> - **Always validate and sanitize any data extracted through hooks**
-> - **Never expose credentials or sensitive data in hook code**
-> - **Consider running the Docker container in an isolated network when testing**
-
-### Hook Information Endpoint
-
-```
-GET /hooks/info
-```
-
-Returns information about available hook points and their signatures:
-
-```bash
-curl http://localhost:11235/hooks/info
-```
-
-### Available Hook Points
-
-The API supports 8 hook points that match the local SDK:
-
-| Hook Point | Parameters | Description | Best Use Cases |
-|------------|------------|-------------|----------------|
-| `on_browser_created` | `browser` | After browser instance creation | Light setup tasks |
-| `on_page_context_created` | `page, context` | After page/context creation | **Authentication, cookies, route blocking** |
-| `before_goto` | `page, context, url` | Before navigating to URL | Custom headers, logging |
-| `after_goto` | `page, context, url, response` | After navigation completes | Verification, waiting for elements |
-| `on_user_agent_updated` | `page, context, user_agent` | When user agent changes | UA-specific logic |
-| `on_execution_started` | `page, context` | When JS execution begins | JS-related setup |
-| `before_retrieve_html` | `page, context` | Before getting final HTML | **Scrolling, lazy loading** |
-| `before_return_html` | `page, context, html` | Before returning HTML | Final modifications, metrics |
-
-### Using Hooks in Requests
-
-Add hooks to any crawl request by including the `hooks` parameter:
-
-```json
-{
-  "urls": ["https://httpbin.org/html"],
-  "hooks": {
-    "code": {
-      "hook_point_name": "async def hook(...): ...",
-      "another_hook": "async def hook(...): ..."
-    },
-    "timeout": 30  // Optional, default 30 seconds (max 120)
-  }
-}
-```
-
-### Hook Examples with Real URLs
-
-#### 1. Authentication with Cookies (GitHub)
-
-```python
-import requests
-
-# Example: Setting GitHub session cookie (use your actual session)
-hooks_code = {
-    "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    # Add authentication cookies for GitHub
-    # WARNING: Never hardcode real credentials!
-    await context.add_cookies([
-        {
-            'name': 'user_session',
-            'value': 'your_github_session_token',  # Replace with actual token
-            'domain': '.github.com',
-            'path': '/',
-            'httpOnly': True,
-            'secure': True,
-            'sameSite': 'Lax'
-        }
-    ])
-    return page
-"""
-}
-
-response = requests.post("http://localhost:11235/crawl", json={
-    "urls": ["https://github.com/settings/profile"],  # Protected page
-    "hooks": {"code": hooks_code, "timeout": 30}
-})
-```
-
-#### 2. Basic Authentication (httpbin.org for testing)
-
-```python
-# Safe testing with httpbin.org (a service designed for HTTP testing)
-hooks_code = {
-    "before_goto": """
-async def hook(page, context, url, **kwargs):
-    import base64
-    # httpbin.org/basic-auth expects username="user" and password="passwd"
-    credentials = base64.b64encode(b"user:passwd").decode('ascii')
-    
-    await page.set_extra_http_headers({
-        'Authorization': f'Basic {credentials}'
-    })
-    return page
-"""
-}
-
-response = requests.post("http://localhost:11235/crawl", json={
-    "urls": ["https://httpbin.org/basic-auth/user/passwd"],
-    "hooks": {"code": hooks_code, "timeout": 15}
-})
-```
-
-#### 3. Performance Optimization (News Sites)
-
-```python
-# Example: Optimizing crawling of news sites like CNN or BBC
-hooks_code = {
-    "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    # Block images, fonts, and media to speed up crawling
-    await context.route("**/*.{png,jpg,jpeg,gif,webp,svg,ico}", lambda route: route.abort())
-    await context.route("**/*.{woff,woff2,ttf,otf,eot}", lambda route: route.abort())
-    await context.route("**/*.{mp4,webm,ogg,mp3,wav,flac}", lambda route: route.abort())
-    
-    # Block common tracking and ad domains
-    await context.route("**/googletagmanager.com/*", lambda route: route.abort())
-    await context.route("**/google-analytics.com/*", lambda route: route.abort())
-    await context.route("**/doubleclick.net/*", lambda route: route.abort())
-    await context.route("**/facebook.com/tr/*", lambda route: route.abort())
-    await context.route("**/amazon-adsystem.com/*", lambda route: route.abort())
-    
-    # Disable CSS animations for faster rendering
-    await page.add_style_tag(content='''
-        *, *::before, *::after {
-            animation-duration: 0s !important;
-            transition-duration: 0s !important;
-        }
-    ''')
-    
-    return page
-"""
-}
-
-response = requests.post("http://localhost:11235/crawl", json={
-    "urls": ["https://www.bbc.com/news"],  # Heavy news site
-    "hooks": {"code": hooks_code, "timeout": 30}
-})
-```
-
-#### 4. Handling Infinite Scroll (Twitter/X)
-
-```python
-# Example: Scrolling on Twitter/X (requires authentication)
-hooks_code = {
-    "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    # Scroll to load more tweets
-    previous_height = 0
-    for i in range(5):  # Limit scrolls to avoid infinite loop
-        current_height = await page.evaluate("document.body.scrollHeight")
-        if current_height == previous_height:
-            break  # No more content to load
-            
-        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
-        await page.wait_for_timeout(2000)  # Wait for content to load
-        previous_height = current_height
-    
-    return page
-"""
-}
-
-# Note: Twitter requires authentication for most content
-response = requests.post("http://localhost:11235/crawl", json={
-    "urls": ["https://twitter.com/nasa"],  # Public profile
-    "hooks": {"code": hooks_code, "timeout": 30}
-})
-```
-
-#### 5. E-commerce Login (Example Pattern)
-
-```python
-# SECURITY WARNING: This is a pattern example. 
-# Never use real credentials in code!
-# Always use environment variables or secure vaults.
-
-hooks_code = {
-    "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    # Example pattern for e-commerce sites
-    # DO NOT use real credentials here!
-    
-    # Navigate to login page first
-    await page.goto("https://example-shop.com/login")
-    
-    # Wait for login form to load
-    await page.wait_for_selector("#email", timeout=5000)
-    
-    # Fill login form (use environment variables in production!)
-    await page.fill("#email", "test@example.com")  # Never use real email
-    await page.fill("#password", "test_password")   # Never use real password
-    
-    # Handle "Remember Me" checkbox if present
-    try:
-        await page.uncheck("#remember_me")  # Don't remember on shared systems
-    except:
-        pass
-    
-    # Submit form
-    await page.click("button[type='submit']")
-    
-    # Wait for redirect after login
-    await page.wait_for_url("**/account/**", timeout=10000)
-    
-    return page
-"""
-}
-```
-
-#### 6. Extracting Structured Data (Wikipedia)
-
-```python
-# Safe example using Wikipedia
-hooks_code = {
-    "after_goto": """
-async def hook(page, context, url, response, **kwargs):
-    # Wait for Wikipedia content to load
-    await page.wait_for_selector("#content", timeout=5000)
-    return page
-""",
-    
-    "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    # Extract structured data from Wikipedia infobox
-    metadata = await page.evaluate('''() => {
-        const infobox = document.querySelector('.infobox');
-        if (!infobox) return null;
-        
-        const data = {};
-        const rows = infobox.querySelectorAll('tr');
-        
-        rows.forEach(row => {
-            const header = row.querySelector('th');
-            const value = row.querySelector('td');
-            if (header && value) {
-                data[header.innerText.trim()] = value.innerText.trim();
-            }
-        });
-        
-        return data;
-    }''')
-    
-    if metadata:
-        console.log("Extracted metadata:", metadata)
-    
-    return page
-"""
-}
-
-response = requests.post("http://localhost:11235/crawl", json={
-    "urls": ["https://en.wikipedia.org/wiki/Python_(programming_language)"],
-    "hooks": {"code": hooks_code, "timeout": 20}
-})
-```
-
-### Security Best Practices
-
-> 🔒 **Critical Security Guidelines**:
-
-1. **Never Trust User Input**: If accepting hook code from users, always validate and sandbox it
-2. **Avoid Phishing Sites**: Never use hooks on suspicious or unverified websites
-3. **Protect Credentials**: 
-   - Never hardcode passwords, tokens, or API keys in hook code
-   - Use environment variables or secure secret management
-   - Rotate credentials regularly
-4. **Network Isolation**: Run the Docker container in an isolated network when testing
-5. **Audit Hook Code**: Always review hook code before execution
-6. **Limit Permissions**: Use the least privileged access needed
-7. **Monitor Execution**: Check hook execution logs for suspicious behavior
-8. **Timeout Protection**: Always set reasonable timeouts (default 30s)
-
-### Hook Response Information
-
-When hooks are used, the response includes detailed execution information:
-
-```json
-{
-  "success": true,
-  "results": [...],
-  "hooks": {
-    "status": {
-      "status": "success",  // or "partial" or "failed"
-      "attached_hooks": ["on_page_context_created", "before_retrieve_html"],
-      "validation_errors": [],
-      "successfully_attached": 2,
-      "failed_validation": 0
-    },
-    "execution_log": [
-      {
-        "hook_point": "on_page_context_created",
-        "status": "success",
-        "execution_time": 0.523,
-        "timestamp": 1234567890.123
-      }
-    ],
-    "errors": [],  // Any runtime errors
-    "summary": {
-      "total_executions": 2,
-      "successful": 2,
-      "failed": 0,
-      "timed_out": 0,
-      "success_rate": 100.0
-    }
-  }
-}
-```
-
-### Error Handling
-
-The hooks system is designed to be resilient:
-
-1. **Validation Errors**: Caught before execution (syntax errors, wrong parameters)
-2. **Runtime Errors**: Handled gracefully - crawl continues with original page object
-3. **Timeout Protection**: Hooks automatically terminated after timeout (configurable 1-120s)
-
-### Complete Example: Safe Multi-Hook Crawling
-
-```python
-import requests
-import json
-import os
-
-# Safe example using httpbin.org for testing
-hooks_code = {
-    "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    # Set viewport and test cookies
-    await page.set_viewport_size({"width": 1920, "height": 1080})
-    await context.add_cookies([
-        {"name": "test_cookie", "value": "test_value", "domain": ".httpbin.org", "path": "/"}
-    ])
-    
-    # Block unnecessary resources for httpbin
-    await context.route("**/*.{png,jpg,jpeg}", lambda route: route.abort())
-    return page
-""",
-    
-    "before_goto": """
-async def hook(page, context, url, **kwargs):
-    # Add custom headers for testing
-    await page.set_extra_http_headers({
-        "X-Test-Header": "crawl4ai-test",
-        "Accept-Language": "en-US,en;q=0.9"
-    })
-    print(f"[HOOK] Navigating to: {url}")
-    return page
-""",
-    
-    "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    # Simple scroll for any lazy-loaded content
-    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
-    await page.wait_for_timeout(1000)
-    return page
-"""
-}
-
-# Make the request to safe testing endpoints
-response = requests.post("http://localhost:11235/crawl", json={
-    "urls": [
-        "https://httpbin.org/html",
-        "https://httpbin.org/json"
-    ],
-    "hooks": {
-        "code": hooks_code,
-        "timeout": 30
-    },
-    "crawler_config": {
-        "cache_mode": "bypass"
-    }
-})
-
-# Check results
-if response.status_code == 200:
-    data = response.json()
-    
-    # Check hook execution
-    if data['hooks']['status']['status'] == 'success':
-        print(f"✅ All {len(data['hooks']['status']['attached_hooks'])} hooks executed successfully")
-        print(f"Execution stats: {data['hooks']['summary']}")
-    
-    # Process crawl results
-    for result in data['results']:
-        print(f"Crawled: {result['url']} - Success: {result['success']}")
-else:
-    print(f"Error: {response.status_code}")
-```
-
-> 💡 **Remember**: Always test your hooks on safe, known websites first before using them on production sites. Never crawl sites that you don't have permission to access or that might be malicious.
-
---
-
 ## Dockerfile Parameters

 You can customize the image build process using build arguments (`--build-arg`). These are typically used via `docker buildx build` or within the `docker-compose.yml` file.
@@ -1095,7 +668,7 @@ app:

 # Default LLM Configuration
 llm:
-  provider: "openai/gpt-4o-mini"  # Can be overridden by LLM_PROVIDER env var
+  provider: "openai/gpt-4o-mini"
  api_key_env: "OPENAI_API_KEY"
  # api_key: sk-...  # If you pass the API key directly then api_key_env will be ignored

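The comment in the `llm` block above describes a simple precedence: an `api_key` supplied directly in the config wins, and otherwise the server reads the environment variable named by `api_key_env` (for example one set through `.llm.env`). A minimal sketch of that rule, assuming the YAML has already been parsed into a dict; `resolve_llm_api_key` is a hypothetical name, not part of the Crawl4AI server.

```python
import os
from typing import Mapping, Optional


def resolve_llm_api_key(
    llm_config: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical resolver for the `llm` section of config.yml."""
    env = os.environ if environ is None else environ
    # A key given directly in the config wins; api_key_env is then ignored.
    direct = llm_config.get("api_key")
    if direct:
        return direct
    # Otherwise read the environment variable named by api_key_env.
    env_name = llm_config.get("api_key_env")
    if env_name:
        return env.get(env_name)
    return None


cfg = {"provider": "openai/gpt-4o-mini", "api_key_env": "OPENAI_API_KEY"}
print(resolve_llm_api_key(cfg, {"OPENAI_API_KEY": "sk-test"}))  # sk-test
```

Passing `environ` explicitly keeps the sketch testable; omitting it falls back to the real process environment.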
--- a/docs/md_v2/core/examples.md
+++ b/docs/md_v2/core/examples.md
@@ -28,12 +28,18 @@ This page provides a comprehensive list of example scripts that demonstrate vari
 | Example | Description | Link |
 |---------|-------------|------|
 | Deep Crawling | An extensive tutorial on deep crawling capabilities, demonstrating BFS and BestFirst strategies, stream vs. non-stream execution, filters, scorers, and advanced configurations. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/deepcrawl_example.py) |
 | Virtual Scroll | Comprehensive examples for handling virtualized scrolling on sites like Twitter, Instagram. Demonstrates different scrolling scenarios with local test server. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/virtual_scroll_example.py) |
 | Adaptive Crawling | Demonstrates intelligent crawling that automatically determines when sufficient information has been gathered. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/adaptive_crawling/) |
 | Dispatcher | Shows how to use the crawl dispatcher for advanced workload management. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/dispatcher_example.py) |
 | Storage State | Tutorial on managing browser storage state for persistence. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/storage_state_tutorial.md) |
 | Network Console Capture | Demonstrates how to capture and analyze network requests and console logs. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/network_console_capture_example.py) |

+## Caching & Performance
+
+| Example | Description | Link |
+|---------|-------------|------|
+| SMART Cache Mode | Demonstrates the intelligent SMART cache mode that validates cached content using HEAD requests, saving 70-95% bandwidth while ensuring fresh content. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/smart_cache.py) |
+
 ## Extraction Strategies

 | Example | Description | Link |
@@ -54,16 +63,6 @@ This page provides a comprehensive list of example scripts that demonstrate vari
 | Crypto Analysis | Demonstrates how to crawl and analyze cryptocurrency data. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/crypto_analysis_example.py) |
 | SERP API | Demonstrates using Crawl4AI with search engine result pages. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/serp_api_project_11_feb.py) |

-## Anti-Bot & Stealth Features
-
-| Example | Description | Link |
-|---------|-------------|------|
-| Stealth Mode Quick Start | Five practical examples showing how to use stealth mode for bypassing basic bot detection. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/stealth_mode_quick_start.py) |
-| Stealth Mode Comprehensive | Comprehensive demonstration of stealth mode features with bot detection testing and comparisons. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/stealth_mode_example.py) |
-| Undetected Browser | Simple example showing how to use the undetected browser adapter. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/hello_world_undetected.py) |
-| Undetected Browser Demo | Basic demo comparing regular and undetected browser modes. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/undetected_simple_demo.py) |
-| Undetected Tests | Advanced tests comparing regular vs undetected browsers on various bot detection services. | [View Folder](https://github.com/unclecode/crawl4ai/tree/main/docs/examples/undetectability/) |
-
 ## Customization & Security

 | Example | Description | Link |
@@ -124,4 +123,4 @@ Some examples may require:

 ## Contributing New Examples

-If you've created an interesting example that demonstrates a unique use case or feature of Crawl4AI, we encourage you to contribute it to our examples collection. Please see our [contribution guidelines](https://github.com/unclecode/crawl4ai/blob/main/CONTRIBUTORS.md) for more information.
+If you've created an interesting example that demonstrates a unique use case or feature of Crawl4AI, we encourage you to contribute it to our examples collection. Please see our [contribution guidelines](https://github.com/unclecode/crawl4ai/blob/main/CONTRIBUTORS.md) for more information.
--- a/docs/md_v2/core/installation.md
+++ b/docs/md_v2/core/installation.md
@@ -18,7 +18,7 @@ crawl4ai-setup
 ```

 **What does it do?**
-- Installs or updates required browser dependencies for both regular and undetected modes
+- Installs or updates required Playwright browsers (Chromium, Firefox, etc.)
 - Performs OS-level checks (e.g., missing libs on Linux)
 - Confirms your environment is ready to crawl

--- a/docs/md_v2/core/link-media.md
+++ b/docs/md_v2/core/link-media.md
@@ -520,8 +520,7 @@ This approach is handy when you still want external links but need to block cert

 ### 4.1 Accessing `result.media`

-By default, Crawl4AI collects images, audio and video URLs it finds on the page. These are stored in `result.media`, a dictionary keyed by media type (e.g., `images`, `videos`, `audio`).
-**Note: Tables have been moved from `result.media["tables"]` to the new `result.tables` format for better organization and direct access.**
+By default, Crawl4AI collects images, audio, video URLs, and data tables it finds on the page. These are stored in `result.media`, a dictionary keyed by media type (e.g., `images`, `videos`, `audio`, `tables`).

 **Basic Example**:

@@ -535,6 +534,14 @@ if result.success:
        print(f"           Alt text: {img.get('alt', '')}")
        print(f"           Score: {img.get('score')}")
        print(f"           Description: {img.get('desc', '')}\n")
+    
+    # Get tables
+    tables = result.media.get("tables", [])
+    print(f"Found {len(tables)} data tables in total.")
+    for i, table in enumerate(tables):
+        print(f"[Table {i}] Caption: {table.get('caption', 'No caption')}")
+        print(f"           Columns: {len(table.get('headers', []))}")
+        print(f"           Rows: {len(table.get('rows', []))}")
 ```

 **Structure Example**:
@@ -561,6 +568,19 @@ result.media = {
  "audio": [
    # Similar structure but with audio-specific fields
  ],
+  "tables": [
+    {
+      "headers": ["Name", "Age", "Location"],
+      "rows": [
+        ["John Doe", "34", "New York"],
+        ["Jane Smith", "28", "San Francisco"],
+        ["Alex Johnson", "42", "Chicago"]
+      ],
+      "caption": "Employee Directory",
+      "summary": "Directory of company employees"
+    },
+    # More tables if present
+  ]
 }
 ```

@@ -588,7 +608,53 @@ crawler_cfg = CrawlerRunConfig(

 This setting attempts to discard images from outside the primary domain, keeping only those from the site you’re crawling.

-### 4.3 Additional Media Config
+### 4.3 Working with Tables
+
+Crawl4AI can detect and extract structured data from HTML tables. Each table is scored against several criteria to determine whether it is a genuine data table (as opposed to a layout table), including:
+
+- Presence of `<thead>` and `<tbody>` sections
+- Use of `<th>` elements for headers
+- Column consistency
+- Text density
+- Other factors
+
+Tables that score above the threshold (default: 7) are extracted and stored in `result.media["tables"]`.
+
+**Accessing Table Data**:
+
+```python
+if result.success:
+    tables = result.media.get("tables", [])
+    print(f"Found {len(tables)} data tables on the page")
+    
+    if tables:
+        # Access the first table
+        first_table = tables[0]
+        print(f"Table caption: {first_table.get('caption', 'No caption')}")
+        print(f"Headers: {first_table.get('headers', [])}")
+        
+        # Print the first 3 rows
+        for i, row in enumerate(first_table.get('rows', [])[:3]):
+            print(f"Row {i+1}: {row}")
+```
+
+**Configuring Table Extraction**:
+
+You can adjust the sensitivity of the table detection algorithm with:
+
+```python
+crawler_cfg = CrawlerRunConfig(
+    table_score_threshold=5  # Lower value = more tables detected (default: 7)
+)
+```
+
+Each extracted table contains:
+- `headers`: Column header names
+- `rows`: List of rows, each containing cell values
+- `caption`: Table caption text (if available)
+- `summary`: Table summary attribute (if specified)
+
+### 4.4 Additional Media Config

 - **`screenshot`**: Set to `True` if you want a full-page screenshot stored as `base64` in `result.screenshot`.  
 - **`pdf`**: Set to `True` if you want a PDF version of the page in `result.pdf`.  
@@ -629,7 +695,7 @@ The MHTML format is particularly useful because:

 ---

-## 5. Putting It All Together: Link & Media Filtering
+## 5. Putting It All Together

 Here’s a combined example demonstrating how to filter out external links, skip certain domains, and exclude external images:

@@ -677,7 +743,7 @@ if __name__ == "__main__":

 ---

-## 6. Common Pitfalls & Tips
+## 6. Common Pitfalls & Tips

 1. **Conflicting Flags**:  
   - `exclude_external_links=True` but then also specifying `exclude_social_media_links=True` is typically fine, but understand that the first setting already discards *all* external links. The second becomes somewhat redundant.  
@@ -696,3 +762,10 @@ if __name__ == "__main__":
 ---

 **That’s it for Link & Media Analysis!** You’re now equipped to filter out unwanted sites and zero in on the images and videos that matter for your project.
+### Table Extraction Tips
+
+- Not all HTML tables are extracted: only those detected as data tables (rather than layout tables) are kept.
+- Tables with inconsistent cell counts, nested tables, or those used purely for layout may be skipped.
+- If tables you expect are missing, try lowering `table_score_threshold` (default: 7).
+
+The table detection algorithm scores tables based on features like consistent columns, presence of headers, text density, and more. Tables scoring above the threshold are considered data tables worth extracting.
--- a/docs/md_v2/core/quickstart.md
+++ b/docs/md_v2/core/quickstart.md
@@ -79,7 +79,7 @@ if __name__ == "__main__":
    asyncio.run(main())
 ```

-> IMPORTANT: By default cache mode is set to `CacheMode.ENABLED`. So to have fresh content, you need to set it to `CacheMode.BYPASS`
+> IMPORTANT: By default, the cache mode is set to `CacheMode.ENABLED`, so to get fresh content you need to set it to `CacheMode.BYPASS`. For smarter caching that validates content before reusing the cache, use the new `CacheMode.SMART`: it saves bandwidth while still ensuring fresh content.

 We’ll explore more advanced config in later tutorials (like enabling proxies, PDF output, multi-tab sessions, etc.). For now, just note how you pass these objects to manage crawling.

--- a/docs/md_v2/migration/webscraping-strategy-migration.md
+++ b/docs/md_v2/migration/webscraping-strategy-migration.md
@@ -1,92 +0,0 @@
-# WebScrapingStrategy Migration Guide
-
-## Overview
-
-Crawl4AI has simplified its content scraping architecture. The BeautifulSoup-based `WebScrapingStrategy` has been deprecated in favor of the faster LXML-based implementation. However, **no action is required** - your existing code will continue to work.
-
-## What Changed?
-
-1. **`WebScrapingStrategy` is now an alias** for `LXMLWebScrapingStrategy`
-2. **The BeautifulSoup implementation has been removed** (~1000 lines of redundant code)
-3. **`LXMLWebScrapingStrategy` inherits directly** from `ContentScrapingStrategy`
-4. **Performance remains optimal** with LXML as the sole implementation
-
-## Backward Compatibility
-
-**Your existing code continues to work without any changes:**
-
-```python
-# This still works perfectly
-from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, WebScrapingStrategy
-
-config = CrawlerRunConfig(
-    scraping_strategy=WebScrapingStrategy()  # Works as before
-)
-```
-
-## Migration Options
-
-You have three options:
-
-### Option 1: Do Nothing (Recommended)
-Your code will continue to work. `WebScrapingStrategy` is permanently aliased to `LXMLWebScrapingStrategy`.
-
-### Option 2: Update Imports (Optional)
-For clarity, you can update your imports:
-
-```python
-# Old (still works)
-from crawl4ai import WebScrapingStrategy
-strategy = WebScrapingStrategy()
-
-# New (more explicit)
-from crawl4ai import LXMLWebScrapingStrategy
-strategy = LXMLWebScrapingStrategy()
-```
-
-### Option 3: Use Default Configuration
-Since `LXMLWebScrapingStrategy` is the default, you can omit the strategy parameter:
-
-```python
-# Simplest approach - uses LXMLWebScrapingStrategy by default
-config = CrawlerRunConfig()
-```
-
-## Type Hints
-
-If you use type hints, both work:
-
-```python
-from crawl4ai import WebScrapingStrategy, LXMLWebScrapingStrategy
-
-def process_with_strategy(strategy: WebScrapingStrategy) -> None:
-    # Works with both WebScrapingStrategy and LXMLWebScrapingStrategy
-    pass
-
-# Both are valid
-process_with_strategy(WebScrapingStrategy())
-process_with_strategy(LXMLWebScrapingStrategy())
-```
-
-## Subclassing
-
-If you've subclassed `WebScrapingStrategy`, it continues to work:
-
-```python
-class MyCustomStrategy(WebScrapingStrategy):
-    def __init__(self):
-        super().__init__()
-        # Your custom code
-```
-
-## Performance Benefits
-
-By consolidating to LXML:
- **10-20x faster** HTML parsing for large documents
- **Lower memory usage**
- **Consistent behavior** across all use cases
- **Simplified maintenance** and bug fixes
-
-## Summary
-
-This change simplifies Crawl4AI's internals while maintaining 100% backward compatibility. Your existing code continues to work, and you get better performance automatically.
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -45,7 +45,6 @@ nav:
    - "Lazy Loading": "advanced/lazy-loading.md"
    - "Hooks & Auth": "advanced/hooks-auth.md"
    - "Proxy & Security": "advanced/proxy-security.md"
-    - "Undetected Browser": "advanced/undetected-browser.md"
    - "Session Management": "advanced/session-management.md"
    - "Multi-URL Crawling": "advanced/multi-url-crawling.md"
    - "Crawl Dispatcher": "advanced/crawl-dispatcher.md"
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -13,34 +13,32 @@ authors = [
    {name = "Unclecode", email = "unclecode@kidocode.com"}
 ]
 dependencies = [
-    "aiofiles>=24.1.0",
-    "aiohttp>=3.11.11",
    "aiosqlite~=0.20",
-    "anyio>=4.0.0",
    "lxml~=5.3",
    "litellm>=1.53.1",
    "numpy>=1.26.0,<3",
    "pillow>=10.4",
    "playwright>=1.49.0",
-    "patchright>=1.49.0",
    "python-dotenv~=1.0",
    "requests~=2.26",
    "beautifulsoup4~=4.12",
    "tf-playwright-stealth>=1.1.0",
    "xxhash~=3.4",
    "rank-bm25~=0.2",
+    "aiofiles>=24.1.0",
    "snowballstemmer~=2.2",
    "pydantic>=2.10",
    "pyOpenSSL>=24.3.0",
    "psutil>=6.1.1",
-    "PyYAML>=6.0",
-    "nltk>=3.9.1",
    "rich>=13.9.4",
+    "cssselect>=1.2.0",
    "httpx>=0.27.2",
    "httpx[http2]>=0.27.2",
    "fake-useragent>=2.0.3",
    "click>=8.1.7",
+    "pyperclip>=1.8.2",
    "chardet>=5.2.0",
+    "aiohttp>=3.11.11",
    "brotli>=1.1.0",
    "humanize>=4.10.0",
    "lark>=1.2.2",
@@ -59,20 +57,20 @@ classifiers = [
 ]

 [project.optional-dependencies]
-pdf = ["PyPDF2"]  
-torch = ["torch", "nltk", "scikit-learn"]
-transformer = ["transformers", "tokenizers", "sentence-transformers"]
-cosine = ["torch", "transformers", "nltk", "sentence-transformers"]
-sync = ["selenium"]
+pdf = ["pypdf>=3.0.0"]  # PyPDF2 is deprecated, use pypdf instead
+torch = ["torch>=2.0.0", "nltk>=3.9.1", "scikit-learn>=1.3.0"]
+transformer = ["transformers>=4.34.0", "tokenizers>=0.15.0", "sentence-transformers>=2.2.0"]
+cosine = ["torch>=2.0.0", "transformers>=4.34.0", "nltk>=3.9.1", "sentence-transformers>=2.2.0"]
+sync = ["selenium>=4.0.0"]
 all = [
-    "PyPDF2",
-    "torch",
-    "nltk",
-    "scikit-learn",
-    "transformers",
-    "tokenizers",
-    "sentence-transformers",
-    "selenium"
+    "pypdf>=3.0.0",
+    "torch>=2.0.0",
+    "nltk>=3.9.1",
+    "scikit-learn>=1.3.0",
+    "transformers>=4.34.0",
+    "tokenizers>=0.15.0",
+    "sentence-transformers>=2.2.0",
+    "selenium>=4.0.0"
 ]

 [project.scripts]
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,32 +1,30 @@
 # Note: These requirements are also specified in pyproject.toml
 # This file is kept for development environment setup and compatibility
-aiofiles>=24.1.0
-aiohttp>=3.11.11
 aiosqlite~=0.20
-anyio>=4.0.0
 lxml~=5.3
 litellm>=1.53.1
 numpy>=1.26.0,<3
 pillow>=10.4
 playwright>=1.49.0
-patchright>=1.49.0
 python-dotenv~=1.0
 requests~=2.26
 beautifulsoup4~=4.12
 tf-playwright-stealth>=1.1.0
 xxhash~=3.4
 rank-bm25~=0.2
+aiofiles>=24.1.0
 colorama~=0.4
 snowballstemmer~=2.2
 pydantic>=2.10
 pyOpenSSL>=24.3.0
 psutil>=6.1.1
-PyYAML>=6.0
 nltk>=3.9.1
 rich>=13.9.4
+cssselect>=1.2.0
 chardet>=5.2.0
 brotli>=1.1.0
 httpx[http2]>=0.27.2
+sentence-transformers>=2.2.0
 alphashape>=1.3.1
 shapely>=2.0.0

--- a/tests/adaptive/test_adaptive_crawler.py
+++ b/tests/adaptive/test_adaptive_crawler.py
@@ -23,7 +23,7 @@ from crawl4ai import (
    AsyncWebCrawler,
    AdaptiveCrawler,
    AdaptiveConfig,
-    CrawlState
+    AdaptiveCrawlResult
 )


--- a/tests/adaptive/test_confidence_debug.py
+++ b/tests/adaptive/test_confidence_debug.py
@@ -13,7 +13,7 @@ import math
 sys.path.append(str(Path(__file__).parent.parent))

 from crawl4ai import AsyncWebCrawler
-from crawl4ai.adaptive_crawler import CrawlState, StatisticalStrategy
+from crawl4ai.adaptive_crawler import AdaptiveCrawlResult, StatisticalStrategy
 from crawl4ai.models import CrawlResult


@@ -37,7 +37,7 @@ class ConfidenceTestHarness:
        print("=" * 80)
        
        # Initialize state
-        state = CrawlState(query=self.query)
+        state = AdaptiveCrawlResult(query=self.query)
        
        # Create crawler
        async with AsyncWebCrawler() as crawler:
@@ -107,7 +107,7 @@ class ConfidenceTestHarness:
                
                state.metrics['prev_confidence'] = confidence
    
-    def _debug_coverage_calculation(self, state: CrawlState, query_terms: List[str]):
+    def _debug_coverage_calculation(self, state: AdaptiveCrawlResult, query_terms: List[str]):
        """Debug coverage calculation step by step"""
        coverage_score = 0.0
        max_possible_score = 0.0
@@ -136,7 +136,7 @@ class ConfidenceTestHarness:
        new_coverage = self._calculate_coverage_new(state, query_terms)
        print(f"    → New Coverage: {new_coverage:.3f}")
    
-    def _calculate_coverage_new(self, state: CrawlState, query_terms: List[str]) -> float:
+    def _calculate_coverage_new(self, state: AdaptiveCrawlResult, query_terms: List[str]) -> float:
        """New coverage calculation without IDF"""
        if not query_terms or state.total_documents == 0:
            return 0.0
--- a/tests/adaptive/test_embedding_performance.py
+++ b/tests/adaptive/test_embedding_performance.py
@@ -15,7 +15,7 @@ import os
 sys.path.append(str(Path(__file__).parent.parent.parent))

 from crawl4ai import AsyncWebCrawler, AdaptiveCrawler, AdaptiveConfig
-from crawl4ai.adaptive_crawler import EmbeddingStrategy, CrawlState
+from crawl4ai.adaptive_crawler import EmbeddingStrategy, AdaptiveCrawlResult
 from crawl4ai.models import CrawlResult


@@ -132,7 +132,7 @@ async def test_embedding_performance():
    strategy.config = config
    
    # Initialize state
-    state = CrawlState()
+    state = AdaptiveCrawlResult()
    state.query = "async await coroutines event loops tasks"
    
    # Start performance monitoring
--- a/tests/adaptive/test_embedding_strategy.py
+++ b/tests/adaptive/test_embedding_strategy.py
@@ -20,7 +20,7 @@ from crawl4ai import (
    AsyncWebCrawler,
    AdaptiveCrawler,
    AdaptiveConfig,
-    CrawlState
+    AdaptiveCrawlResult
 )

 console = Console()
--- a/tests/async/test_content_scraper_strategy.py
+++ b/tests/async/test_content_scraper_strategy.py
@@ -12,8 +12,11 @@ parent_dir = os.path.dirname(
 sys.path.append(parent_dir)
 __location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))

-from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy
-# This test compares the same strategy with itself now since WebScrapingStrategy is deprecated
+from crawl4ai.content_scraping_strategy import WebScrapingStrategy
+from crawl4ai.content_scraping_strategy import (
+    WebScrapingStrategy as WebScrapingStrategyCurrent,
+)
+# from crawl4ai.content_scrapping_strategy_current import WebScrapingStrategy as WebScrapingStrategyCurrent


@dataclass
@@ -29,8 +32,8 @@ class TestResult:

 class StrategyTester:
    def __init__(self):
-        self.new_scraper = LXMLWebScrapingStrategy()
-        self.current_scraper = LXMLWebScrapingStrategy()  # Same strategy now
+        self.new_scraper = WebScrapingStrategy()
+        self.current_scraper = WebScrapingStrategyCurrent()
        with open(__location__ + "/sample_wikipedia.html", "r", encoding="utf-8") as f:
            self.WIKI_HTML = f.read()
        self.results = {"new": [], "current": []}
--- a/tests/check_dependencies.py
+++ b/tests/check_dependencies.py
@@ -1,344 +0,0 @@
-#!/usr/bin/env python3
-"""
-Dependency checker for Crawl4AI
-Analyzes imports in the codebase and shows which files use them
-"""
-
-import ast
-import os
-import sys
-from pathlib import Path
-from typing import Set, Dict, List, Tuple
-from collections import defaultdict
-import re
-import toml
-
-# Standard library modules to ignore
-STDLIB_MODULES = {
-    'abc', 'argparse', 'asyncio', 'base64', 'collections', 'concurrent', 'contextlib',
-    'copy', 'datetime', 'decimal', 'email', 'enum', 'functools', 'glob', 'hashlib',
-    'http', 'importlib', 'io', 'itertools', 'json', 'logging', 'math', 'mimetypes',
-    'multiprocessing', 'os', 'pathlib', 'pickle', 'platform', 'pprint', 'random',
-    're', 'shutil', 'signal', 'socket', 'sqlite3', 'string', 'subprocess', 'sys',
-    'tempfile', 'threading', 'time', 'traceback', 'typing', 'unittest', 'urllib',
-    'uuid', 'warnings', 'weakref', 'xml', 'zipfile', 'dataclasses', 'secrets',
-    'statistics', 'textwrap', 'queue', 'csv', 'gzip', 'tarfile', 'configparser',
-    'inspect', 'operator', 'struct', 'binascii', 'codecs', 'locale', 'gc',
-    'atexit', 'builtins', 'html', 'errno', 'fcntl', 'pwd', 'grp', 'resource',
-    'termios', 'tty', 'pty', 'select', 'selectors', 'ssl', 'zlib', 'bz2',
-    'lzma', 'types', 'copy', 'pydoc', 'profile', 'cProfile', 'timeit',
-    'trace', 'doctest', 'pdb', 'contextvars', 'dataclasses', 'graphlib',
-    'zoneinfo', 'tomllib', 'cgi', 'wsgiref', 'fileinput', 'linecache',
-    'tokenize', 'tabnanny', 'compileall', 'dis', 'pickletools', 'formatter',
-    '__future__', 'array', 'ctypes', 'heapq', 'bisect', 'array', 'weakref',
-    'types', 'copy', 'pprint', 'repr', 'numbers', 'cmath', 'fractions',
-    'statistics', 'itertools', 'functools', 'operator', 'pathlib', 'fileinput',
-    'stat', 'filecmp', 'tempfile', 'glob', 'fnmatch', 'linecache', 'shutil',
-    'pickle', 'copyreg', 'shelve', 'marshal', 'dbm', 'sqlite3', 'zlib', 'gzip',
-    'bz2', 'lzma', 'zipfile', 'tarfile', 'configparser', 'netrc', 'xdrlib',
-    'plistlib', 'hashlib', 'hmac', 'secrets', 'os', 'io', 'time', 'argparse',
-    'getopt', 'logging', 'getpass', 'curses', 'platform', 'errno', 'ctypes',
-    'threading', 'multiprocessing', 'concurrent', 'subprocess', 'sched', 'queue',
-    'contextvars', 'asyncio', 'socket', 'ssl', 'email', 'json', 'mailcap',
-    'mailbox', 'mimetypes', 'base64', 'binhex', 'binascii', 'quopri', 'uu',
-    'html', 'xml', 'webbrowser', 'cgi', 'cgitb', 'wsgiref', 'urllib', 'http',
-    'ftplib', 'poplib', 'imaplib', 'nntplib', 'smtplib', 'smtpd', 'telnetlib',
-    'uuid', 'socketserver', 'xmlrpc', 'ipaddress', 'audioop', 'aifc', 'sunau',
-    'wave', 'chunk', 'colorsys', 'imghdr', 'sndhdr', 'ossaudiodev', 'gettext',
-    'locale', 'turtle', 'cmd', 'shlex', 'tkinter', 'typing', 'pydoc', 'doctest',
-    'unittest', 'test', '2to3', 'distutils', 'venv', 'ensurepip', 'zipapp',
-    'py_compile', 'compileall', 'dis', 'pickletools', 'pdb', 'timeit', 'trace',
-    'tracemalloc', 'warnings', 'faulthandler', 'pdb', 'dataclasses', 'cgi', 
-    'cgitb', 'chunk', 'crypt', 'imghdr', 'mailcap', 'nis', 'nntplib', 'optparse',
-    'ossaudiodev', 'pipes', 'smtpd', 'sndhdr', 'spwd', 'sunau', 'telnetlib',
-    'uu', 'xdrlib', 'msilib', 'pstats', 'rlcompleter', 'tkinter', 'ast'
-}
-
-# Known package name mappings (import name -> package name)
-PACKAGE_MAPPINGS = {
-    'bs4': 'beautifulsoup4',
-    'PIL': 'pillow',
-    'cv2': 'opencv-python',
-    'sklearn': 'scikit-learn',
-    'yaml': 'PyYAML',
-    'OpenSSL': 'pyOpenSSL',
-    'sqlalchemy': 'SQLAlchemy',
-    'playwright': 'playwright',
-    'patchright': 'patchright',
-    'dotenv': 'python-dotenv',
-    'fake_useragent': 'fake-useragent',
-    'playwright_stealth': 'tf-playwright-stealth',
-    'sentence_transformers': 'sentence-transformers',
-    'rank_bm25': 'rank-bm25',
-    'snowballstemmer': 'snowballstemmer',
-    'PyPDF2': 'PyPDF2',
-    'pdf2image': 'pdf2image',
-}
-
-
-class ImportVisitor(ast.NodeVisitor):
-    """AST visitor to extract imports from Python files"""
-    
-    def __init__(self):
-        self.imports = {}  # Changed to dict to store line numbers
-        self.from_imports = {}
-    
-    def visit_Import(self, node):
-        for alias in node.names:
-            module_name = alias.name.split('.')[0]
-            if module_name not in self.imports:
-                self.imports[module_name] = []
-            self.imports[module_name].append(node.lineno)
-    
-    def visit_ImportFrom(self, node):
-        if node.module and node.level == 0:  # absolute imports only
-            module_name = node.module.split('.')[0]
-            if module_name not in self.from_imports:
-                self.from_imports[module_name] = []
-            self.from_imports[module_name].append(node.lineno)
-
-
-def extract_imports_from_file(filepath: Path) -> Dict[str, List[int]]:
-    """Extract all imports from a Python file with line numbers"""
-    all_imports = {}
-    
-    try:
-        with open(filepath, 'r', encoding='utf-8') as f:
-            content = f.read()
-        
-        tree = ast.parse(content)
-        visitor = ImportVisitor()
-        visitor.visit(tree)
-        
-        # Merge imports and from_imports
-        for module, lines in visitor.imports.items():
-            if module not in all_imports:
-                all_imports[module] = []
-            all_imports[module].extend(lines)
-            
-        for module, lines in visitor.from_imports.items():
-            if module not in all_imports:
-                all_imports[module] = []
-            all_imports[module].extend(lines)
-        
-    except Exception as e:
-        # Silently skip files that can't be parsed
-        pass
-    
-    return all_imports
-
-
-def get_codebase_imports_with_files(root_dir: Path) -> Dict[str, List[Tuple[str, List[int]]]]:
-    """Get all imports from the crawl4ai library and docs folders with file locations and line numbers"""
-    import_to_files = defaultdict(list)
-    
-    # Only scan crawl4ai library folder and docs folder
-    target_dirs = [
-        root_dir / 'crawl4ai',
-        root_dir / 'docs'
-    ]
-    
-    for target_dir in target_dirs:
-        if not target_dir.exists():
-            continue
-            
-        for py_file in target_dir.rglob('*.py'):
-            # Skip __pycache__ directories
-            if '__pycache__' in py_file.parts:
-                continue
-            
-            # Skip setup.py and similar files
-            if py_file.name in ['setup.py', 'setup.cfg', 'conf.py']:
-                continue
-                
-            imports = extract_imports_from_file(py_file)
-            
-            # Map each import to the file and line numbers
-            for imp, line_numbers in imports.items():
-                relative_path = py_file.relative_to(root_dir)
-                import_to_files[imp].append((str(relative_path), sorted(line_numbers)))
-    
-    return dict(import_to_files)
-
-
-def get_declared_dependencies() -> Set[str]:
-    """Get declared dependencies from pyproject.toml and requirements.txt"""
-    declared = set()
-    
-    # Read from pyproject.toml
-    if Path('pyproject.toml').exists():
-        with open('pyproject.toml', 'r') as f:
-            data = toml.load(f)
-        
-        # Get main dependencies
-        deps = data.get('project', {}).get('dependencies', [])
-        for dep in deps:
-            # Parse dependency string (e.g., "numpy>=1.26.0,<3")
-            match = re.match(r'^([a-zA-Z0-9_-]+)', dep)
-            if match:
-                pkg_name = match.group(1).lower()
-                declared.add(pkg_name)
-        
-        # Get optional dependencies
-        optional = data.get('project', {}).get('optional-dependencies', {})
-        for group, deps in optional.items():
-            for dep in deps:
-                match = re.match(r'^([a-zA-Z0-9_-]+)', dep)
-                if match:
-                    pkg_name = match.group(1).lower()
-                    declared.add(pkg_name)
-    
-    # Also check requirements.txt as backup
-    if Path('requirements.txt').exists():
-        with open('requirements.txt', 'r') as f:
-            for line in f:
-                line = line.strip()
-                if line and not line.startswith('#'):
-                    match = re.match(r'^([a-zA-Z0-9_-]+)', line)
-                    if match:
-                        pkg_name = match.group(1).lower()
-                        declared.add(pkg_name)
-    
-    return declared
-
-
-def normalize_package_name(name: str) -> str:
-    """Normalize package name for comparison"""
-    # Handle known mappings first
-    if name in PACKAGE_MAPPINGS:
-        return PACKAGE_MAPPINGS[name].lower()
-    
-    # Basic normalization
-    return name.lower().replace('_', '-')
-
-
-def check_missing_dependencies():
-    """Main function to check for missing dependencies"""
-    print("🔍 Analyzing crawl4ai library and docs folders...\n")
-    
-    # Get all imports with their file locations
-    root_dir = Path('.')
-    import_to_files = get_codebase_imports_with_files(root_dir)
-    
-    # Get declared dependencies
-    declared_deps = get_declared_dependencies()
-    
-    # Normalize declared dependencies
-    normalized_declared = {normalize_package_name(dep) for dep in declared_deps}
-    
-    # Categorize imports
-    external_imports = {}
-    local_imports = {}
-    
-    # Known local packages
-    local_packages = {'crawl4ai'}
-    
-    for imp, file_info in import_to_files.items():
-        # Skip standard library
-        if imp in STDLIB_MODULES:
-            continue
-            
-        # Check if it's a local import
-        if any(imp.startswith(local) for local in local_packages):
-            local_imports[imp] = file_info
-        else:
-            external_imports[imp] = file_info
-    
-    # Check which external imports are not declared
-    not_declared = {}
-    declared_imports = {}
-    
-    for imp, file_info in external_imports.items():
-        normalized_imp = normalize_package_name(imp)
-        
-        # Check if import is covered by declared dependencies
-        found = False
-        for declared in normalized_declared:
-            if normalized_imp == declared or normalized_imp.startswith(declared + '.') or declared.startswith(normalized_imp):
-                found = True
-                break
-        
-        if found:
-            declared_imports[imp] = file_info
-        else:
-            not_declared[imp] = file_info
-    
-    # Print results
-    print(f"📊 Summary:")
-    print(f"  - Total unique imports: {len(import_to_files)}")
-    print(f"  - External imports: {len(external_imports)}")
-    print(f"  - Declared dependencies: {len(declared_deps)}")
-    print(f"  - External imports NOT in dependencies: {len(not_declared)}\n")
-    
-    if not_declared:
-        print("❌ External imports NOT declared in pyproject.toml or requirements.txt:\n")
-        
-        # Sort by import name
-        for imp in sorted(not_declared.keys()):
-            file_info = not_declared[imp]
-            print(f"  📦 {imp}")
-            if imp in PACKAGE_MAPPINGS:
-                print(f"     → Package name: {PACKAGE_MAPPINGS[imp]}")
-            
-            # Show up to 3 files that use this import
-            for i, (file_path, line_numbers) in enumerate(file_info[:3]):
-                # Format line numbers for clickable output
-                if len(line_numbers) == 1:
-                    print(f"     - {file_path}:{line_numbers[0]}")
-                else:
-                    # Show first few line numbers
-                    line_str = ','.join(str(ln) for ln in line_numbers[:3])
-                    if len(line_numbers) > 3:
-                        line_str += f"... ({len(line_numbers)} imports)"
-                    print(f"     - {file_path}: lines {line_str}")
-            
-            if len(file_info) > 3:
-                print(f"     ... and {len(file_info) - 3} more files")
-            print()
-    
-    # Check for potentially unused dependencies
-    print("\n🔎 Checking declared dependencies usage...\n")
-    
-    # Get all used external packages
-    used_packages = set()
-    for imp in external_imports.keys():
-        normalized = normalize_package_name(imp)
-        used_packages.add(normalized)
-    
-    # Find unused
-    unused = []
-    for dep in declared_deps:
-        normalized_dep = normalize_package_name(dep)
-        
-        # Check if any import uses this dependency
-        found_usage = False
-        for used in used_packages:
-            if used == normalized_dep or used.startswith(normalized_dep) or normalized_dep.startswith(used):
-                found_usage = True
-                break
-        
-        if not found_usage:
-            # Some packages are commonly unused directly
-            indirect_deps = {'wheel', 'setuptools', 'pip', 'colorama', 'certifi', 'packaging', 'urllib3'}
-            if normalized_dep not in indirect_deps:
-                unused.append(dep)
-    
-    if unused:
-        print("⚠️  Declared dependencies with NO imports found:")
-        for dep in sorted(unused):
-            print(f"  - {dep}")
-        print("\n  Note: These might be used indirectly or by other dependencies")
-    else:
-        print("✅ All declared dependencies have corresponding imports")
-    
-    print("\n" + "="*60)
-    print("💡 How to use this report:")
-    print("  1. Check each ❌ import to see if it's legitimate")
-    print("  2. If legitimate, add the package to pyproject.toml")
-    print("  3. If it's an internal module or typo, fix the import")
-    print("  4. Review unused dependencies - remove if truly not needed")
-    print("="*60)
-
-
-if __name__ == '__main__':
-    check_missing_dependencies()
--- a/tests/docker/test_hooks_client.py
+++ b/tests/docker/test_hooks_client.py
@@ -1,372 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test client for demonstrating user-provided hooks in Crawl4AI Docker API
-"""
-
-import requests
-import json
-from typing import Dict, Any
-
-
-API_BASE_URL = "http://localhost:11234"  # Adjust if needed
-
-
-def test_hooks_info():
-    """Get information about available hooks"""
-    print("=" * 70)
-    print("Testing: GET /hooks/info")
-    print("=" * 70)
-    
-    response = requests.get(f"{API_BASE_URL}/hooks/info")
-    if response.status_code == 200:
-        data = response.json()
-        print("Available Hook Points:")
-        for hook, info in data['available_hooks'].items():
-            print(f"\n{hook}:")
-            print(f"  Parameters: {', '.join(info['parameters'])}")
-            print(f"  Description: {info['description']}")
-    else:
-        print(f"Error: {response.status_code}")
-        print(response.text)
-
-
-def test_basic_crawl_with_hooks():
-    """Test basic crawling with user-provided hooks"""
-    print("\n" + "=" * 70)
-    print("Testing: POST /crawl with hooks")
-    print("=" * 70)
-    
-    # Define hooks as Python code strings
-    hooks_code = {
-        "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    print("Hook: Setting up page context")
-    # Block images to speed up crawling
-    await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
-    print("Hook: Images blocked")
-    return page
-""",
-        
-        "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    print("Hook: Before retrieving HTML")
-    # Scroll to bottom to load lazy content
-    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
-    await page.wait_for_timeout(1000)
-    print("Hook: Scrolled to bottom")
-    return page
-""",
-        
-        "before_goto": """
-async def hook(page, context, url, **kwargs):
-    print(f"Hook: About to navigate to {url}")
-    # Add custom headers
-    await page.set_extra_http_headers({
-        'X-Test-Header': 'crawl4ai-hooks-test'
-    })
-    return page
-"""
-    }
-    
-    # Create request payload
-    payload = {
-        "urls": ["https://httpbin.org/html"],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 30
-        }
-    }
-    
-    print("Sending request with hooks...")
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    if response.status_code == 200:
-        data = response.json()
-        print("\n✅ Crawl successful!")
-        
-        # Check hooks status
-        if 'hooks' in data:
-            hooks_info = data['hooks']
-            print("\nHooks Execution Summary:")
-            print(f"  Status: {hooks_info['status']['status']}")
-            print(f"  Attached hooks: {', '.join(hooks_info['status']['attached_hooks'])}")
-            
-            if hooks_info['status']['validation_errors']:
-                print("\n⚠️ Validation Errors:")
-                for error in hooks_info['status']['validation_errors']:
-                    print(f"  - {error['hook_point']}: {error['error']}")
-            
-            if 'summary' in hooks_info:
-                summary = hooks_info['summary']
-                print(f"\nExecution Statistics:")
-                print(f"  Total executions: {summary['total_executions']}")
-                print(f"  Successful: {summary['successful']}")
-                print(f"  Failed: {summary['failed']}")
-                print(f"  Timed out: {summary['timed_out']}")
-                print(f"  Success rate: {summary['success_rate']:.1f}%")
-            
-            if hooks_info['execution_log']:
-                print("\nExecution Log:")
-                for log_entry in hooks_info['execution_log']:
-                    status_icon = "✅" if log_entry['status'] == 'success' else "❌"
-                    print(f"  {status_icon} {log_entry['hook_point']}: {log_entry['status']} ({log_entry.get('execution_time', 0):.2f}s)")
-            
-            if hooks_info['errors']:
-                print("\n❌ Hook Errors:")
-                for error in hooks_info['errors']:
-                    print(f"  - {error['hook_point']}: {error['error']}")
-        
-        # Show crawl results
-        if 'results' in data:
-            print(f"\nCrawled {len(data['results'])} URL(s)")
-            for result in data['results']:
-                print(f"  - {result['url']}: {'✅' if result['success'] else '❌'}")
-    
-    else:
-        print(f"❌ Error: {response.status_code}")
-        print(response.text)
-
-
-def test_invalid_hook():
-    """Test with an invalid hook to see error handling"""
-    print("\n" + "=" * 70)
-    print("Testing: Invalid hook handling")
-    print("=" * 70)
-    
-    # Intentionally broken hook
-    hooks_code = {
-        "on_page_context_created": """
-def hook(page, context):  # Missing async!
-    return page
-""",
-        
-        "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    # This will cause an error
-    await page.non_existent_method()
-    return page
-"""
-    }
-    
-    payload = {
-        "urls": ["https://httpbin.org/html"],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 5
-        }
-    }
-    
-    print("Sending request with invalid hooks...")
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    if response.status_code == 200:
-        data = response.json()
-        
-        if 'hooks' in data:
-            hooks_info = data['hooks']
-            print(f"\nHooks Status: {hooks_info['status']['status']}")
-            
-            if hooks_info['status']['validation_errors']:
-                print("\n✅ Validation caught errors (as expected):")
-                for error in hooks_info['status']['validation_errors']:
-                    print(f"  - {error['hook_point']}: {error['error']}")
-            
-            if hooks_info['errors']:
-                print("\n✅ Runtime errors handled gracefully:")
-                for error in hooks_info['errors']:
-                    print(f"  - {error['hook_point']}: {error['error']}")
-            
-            # The crawl should still succeed despite hook errors
-            if data.get('success'):
-                print("\n✅ Crawl succeeded despite hook errors (error isolation working!)")
-    
-    else:
-        print(f"Error: {response.status_code}")
-        print(response.text)
-
-
-def test_authentication_hook():
-    """Test authentication using hooks"""
-    print("\n" + "=" * 70)
-    print("Testing: Authentication with hooks")
-    print("=" * 70)
-    
-    hooks_code = {
-        "before_goto": """
-async def hook(page, context, url, **kwargs):
-    # For httpbin.org basic auth test, set Authorization header
-    import base64
-    
-    # httpbin.org/basic-auth/user/passwd expects username="user" and password="passwd"
-    credentials = base64.b64encode(b"user:passwd").decode('ascii')
-    
-    await page.set_extra_http_headers({
-        'Authorization': f'Basic {credentials}'
-    })
-    
-    print(f"Hook: Set Authorization header for {url}")
-    return page
-""",
-        "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    # Example: Add cookies for session tracking
-    await context.add_cookies([
-        {
-            'name': 'session_id',
-            'value': 'test_session_123',
-            'domain': '.httpbin.org',
-            'path': '/',
-            'httpOnly': True,
-            'secure': True
-        }
-    ])
-    
-    print("Hook: Added session cookie")
-    return page
-"""
-    }
-    
-    payload = {
-        "urls": ["https://httpbin.org/basic-auth/user/passwd"],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 30
-        }
-    }
-    
-    print("Sending request with authentication hook...")
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    if response.status_code == 200:
-        data = response.json()
-        if data.get('success'):
-            print("✅ Crawl with authentication hook successful")
-            
-            # Check if hooks executed
-            if 'hooks' in data:
-                hooks_info = data['hooks']
-                if hooks_info.get('summary', {}).get('successful', 0) > 0:
-                    print(f"✅ Authentication hooks executed: {hooks_info['summary']['successful']} successful")
-                
-                # Check for any hook errors
-                if hooks_info.get('errors'):
-                    print("⚠️ Hook errors:")
-                    for error in hooks_info['errors']:
-                        print(f"  - {error}")
-            
-            # Check if authentication worked by looking at the result
-            if 'results' in data and len(data['results']) > 0:
-                result = data['results'][0]
-                if result.get('success'):
-                    print("✅ Page crawled successfully (authentication worked!)")
-                    # httpbin.org/basic-auth returns JSON with authenticated=true when successful
-                    if 'authenticated' in str(result.get('html', '')):
-                        print("✅ Authentication confirmed in response content")
-                else:
-                    print(f"❌ Crawl failed: {result.get('error_message', 'Unknown error')}")
-        else:
-            print("❌ Request failed")
-            print(f"Response: {json.dumps(data, indent=2)}")
-    else:
-        print(f"❌ Error: {response.status_code}")
-        try:
-            error_data = response.json()
-            print(f"Error details: {json.dumps(error_data, indent=2)}")
-        except:
-            print(f"Error text: {response.text[:500]}")
-
-
-def test_streaming_with_hooks():
-    """Test streaming endpoint with hooks"""
-    print("\n" + "=" * 70)
-    print("Testing: POST /crawl/stream with hooks")
-    print("=" * 70)
-    
-    hooks_code = {
-        "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    await page.evaluate("document.querySelectorAll('img').forEach(img => img.remove())")
-    return page
-"""
-    }
-    
-    payload = {
-        "urls": ["https://httpbin.org/html", "https://httpbin.org/json"],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 10
-        }
-    }
-    
-    print("Sending streaming request with hooks...")
-    
-    with requests.post(f"{API_BASE_URL}/crawl/stream", json=payload, stream=True) as response:
-        if response.status_code == 200:
-            # Check headers for hooks status
-            hooks_status = response.headers.get('X-Hooks-Status')
-            if hooks_status:
-                print(f"Hooks Status (from header): {hooks_status}")
-            
-            print("\nStreaming results:")
-            for line in response.iter_lines():
-                if line:
-                    try:
-                        result = json.loads(line)
-                        if 'url' in result:
-                            print(f"  Received: {result['url']}")
-                        elif 'status' in result:
-                            print(f"  Stream status: {result['status']}")
-                    except json.JSONDecodeError:
-                        print(f"  Raw: {line.decode()}")
-        else:
-            print(f"Error: {response.status_code}")
-
-
-def test_basic_without_hooks():
-    """Test basic crawl without hooks"""
-    print("\n" + "=" * 70)
-    print("Testing: POST /crawl with no hooks")
-    print("=" * 70)
-
-    payload = {
-        "urls": ["https://httpbin.org/html", "https://httpbin.org/json"]
-    }
-
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    if response.status_code == 200:
-        data = response.json()
-        print(f"Response: {json.dumps(data, indent=2)}")
-    else:
-        print(f"Error: {response.status_code}")
-
-
-def main():
-    """Run all tests"""
-    print("🔧 Crawl4AI Docker API - Hooks Testing")
-    print("=" * 70)
-    
-    # Test 1: Get hooks information
-    # test_hooks_info()
-    
-    # Test 2: Basic crawl with hooks
-    # test_basic_crawl_with_hooks()
-    
-    # Test 3: Invalid hooks (error handling)
-    test_invalid_hook()
-    
-    # # Test 4: Authentication hook
-    # test_authentication_hook()
-    
-    # # Test 5: Streaming with hooks
-    # test_streaming_with_hooks()
-
-    # # Test 6: Basic crawl without hooks
-    # test_basic_without_hooks()
-
-    print("\n" + "=" * 70)
-    print("✅ All tests completed!")
-    print("=" * 70)
-
-
-if __name__ == "__main__":
-    main()
--- a/tests/docker/test_hooks_comprehensive.py
+++ b/tests/docker/test_hooks_comprehensive.py
@@ -1,512 +0,0 @@
-#!/usr/bin/env python3
-"""
-Comprehensive test demonstrating all hook types from hooks_example.py
-adapted for the Docker API with real URLs
-"""
-
-import requests
-import json
-import time
-from typing import Dict, Any
-
-API_BASE_URL = "http://localhost:11234"
-
-
-def test_all_hooks_demo():
-    """Demonstrate all 8 hook types with practical examples"""
-    print("=" * 70)
-    print("Testing: All Hooks Comprehensive Demo")
-    print("=" * 70)
-    
-    hooks_code = {
-        "on_browser_created": """
-async def hook(browser, **kwargs):
-    # Hook called after browser is created
-    print("[HOOK] on_browser_created - Browser is ready!")
-    # Browser-level configurations would go here
-    return browser
-""",
-        
-        "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    # Hook called after a new page and context are created
-    print("[HOOK] on_page_context_created - New page created!")
-    
-    # Set viewport size for consistent rendering
-    await page.set_viewport_size({"width": 1920, "height": 1080})
-    
-    # Add cookies for the session (using httpbin.org domain)
-    await context.add_cookies([
-        {
-            "name": "test_session",
-            "value": "abc123xyz",
-            "domain": ".httpbin.org",
-            "path": "/",
-            "httpOnly": True,
-            "secure": True
-        }
-    ])
-    
-    # Block ads and tracking scripts to speed up crawling
-    await context.route("**/*.{png,jpg,jpeg,gif,webp,svg}", lambda route: route.abort())
-    await context.route("**/analytics/*", lambda route: route.abort())
-    await context.route("**/ads/*", lambda route: route.abort())
-    
-    print("[HOOK] Viewport set, cookies added, and ads blocked")
-    return page
-""",
-        
-        "on_user_agent_updated": """
-async def hook(page, context, user_agent, **kwargs):
-    # Hook called when user agent is updated
-    print(f"[HOOK] on_user_agent_updated - User agent: {user_agent[:50]}...")
-    return page
-""",
-        
-        "before_goto": """
-async def hook(page, context, url, **kwargs):
-    # Hook called before navigating to each URL
-    print(f"[HOOK] before_goto - About to visit: {url}")
-    
-    # Add custom headers for the request
-    await page.set_extra_http_headers({
-        "X-Custom-Header": "crawl4ai-test",
-        "Accept-Language": "en-US,en;q=0.9",
-        "DNT": "1"
-    })
-    
-    return page
-""",
-        
-        "after_goto": """
-async def hook(page, context, url, response, **kwargs):
-    # Hook called after navigating to each URL
-    print(f"[HOOK] after_goto - Successfully loaded: {url}")
-    
-    # Wait a moment for dynamic content to load
-    await page.wait_for_timeout(1000)
-    
-    # Check if specific elements exist (with error handling)
-    try:
-        # For httpbin.org, wait for body element
-        await page.wait_for_selector("body", timeout=2000)
-        print("[HOOK] Body element found and loaded")
-    except:
-        print("[HOOK] Timeout waiting for body, continuing anyway")
-    
-    return page
-""",
-        
-        "on_execution_started": """
-async def hook(page, context, **kwargs):
-    # Hook called after custom JavaScript execution
-    print("[HOOK] on_execution_started - Custom JS executed!")
-    
-    # You could inject additional JavaScript here if needed
-    await page.evaluate("console.log('[INJECTED] Hook JS running');")
-    
-    return page
-""",
-        
-        "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    # Hook called before retrieving the HTML content
-    print("[HOOK] before_retrieve_html - Preparing to get HTML")
-    
-    # Scroll to bottom to trigger lazy loading
-    await page.evaluate("window.scrollTo(0, document.body.scrollHeight);")
-    await page.wait_for_timeout(500)
-    
-    # Scroll back to top
-    await page.evaluate("window.scrollTo(0, 0);")
-    await page.wait_for_timeout(500)
-    
-    # One more scroll to middle for good measure
-    await page.evaluate("window.scrollTo(0, document.body.scrollHeight / 2);")
-    
-    print("[HOOK] Scrolling completed for lazy-loaded content")
-    return page
-""",
-        
-        "before_return_html": """
-async def hook(page, context, html, **kwargs):
-    # Hook called before returning the HTML content
-    print(f"[HOOK] before_return_html - HTML length: {len(html)} characters")
-    
-    # Log some page metrics
-    metrics = await page.evaluate('''() => {
-        return {
-            images: document.images.length,
-            links: document.links.length,
-            scripts: document.scripts.length
-        }
-    }''')
-    
-    print(f"[HOOK] Page metrics - Images: {metrics['images']}, Links: {metrics['links']}, Scripts: {metrics['scripts']}")
-    
-    return page
-"""
-    }
-    
-    # Create request payload
-    payload = {
-        "urls": ["https://httpbin.org/html"],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 30
-        },
-        "crawler_config": {
-            "js_code": "window.scrollTo(0, document.body.scrollHeight);",
-            "wait_for": "body",
-            "cache_mode": "bypass"
-        }
-    }
-    
-    print("\nSending request with all 8 hooks...")
-    start_time = time.time()
-    
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    elapsed_time = time.time() - start_time
-    print(f"Request completed in {elapsed_time:.2f} seconds")
-    
-    if response.status_code == 200:
-        data = response.json()
-        print("\n✅ Request successful!")
-        
-        # Check hooks execution
-        if 'hooks' in data:
-            hooks_info = data['hooks']
-            print("\n📊 Hooks Execution Summary:")
-            print(f"  Status: {hooks_info['status']['status']}")
-            print(f"  Attached hooks: {len(hooks_info['status']['attached_hooks'])}")
-            
-            for hook_name in hooks_info['status']['attached_hooks']:
-                print(f"    ✓ {hook_name}")
-            
-            if 'summary' in hooks_info:
-                summary = hooks_info['summary']
-                print(f"\n📈 Execution Statistics:")
-                print(f"  Total executions: {summary['total_executions']}")
-                print(f"  Successful: {summary['successful']}")
-                print(f"  Failed: {summary['failed']}")
-                print(f"  Timed out: {summary['timed_out']}")
-                print(f"  Success rate: {summary['success_rate']:.1f}%")
-            
-            if hooks_info.get('execution_log'):
-                print(f"\n📝 Execution Log:")
-                for log_entry in hooks_info['execution_log']:
-                    status_icon = "✅" if log_entry['status'] == 'success' else "❌"
-                    exec_time = log_entry.get('execution_time', 0)
-                    print(f"  {status_icon} {log_entry['hook_point']}: {exec_time:.3f}s")
-        
-        # Check crawl results
-        if 'results' in data and len(data['results']) > 0:
-            print(f"\n📄 Crawl Results:")
-            for result in data['results']:
-                print(f"  URL: {result['url']}")
-                print(f"  Success: {result.get('success', False)}")
-                if result.get('html'):
-                    print(f"  HTML length: {len(result['html'])} characters")
-    
-    else:
-        print(f"❌ Error: {response.status_code}")
-        try:
-            error_data = response.json()
-            print(f"Error details: {json.dumps(error_data, indent=2)}")
-        except:
-            print(f"Error text: {response.text[:500]}")
-
-
-def test_authentication_flow():
-    """Test a complete authentication flow with multiple hooks"""
-    print("\n" + "=" * 70)
-    print("Testing: Authentication Flow with Multiple Hooks")
-    print("=" * 70)
-    
-    hooks_code = {
-        "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    print("[HOOK] Setting up authentication context")
-    
-    # Add authentication cookies
-    await context.add_cookies([
-        {
-            "name": "auth_token",
-            "value": "fake_jwt_token_here",
-            "domain": ".httpbin.org",
-            "path": "/",
-            "httpOnly": True,
-            "secure": True
-        }
-    ])
-    
-    # Set localStorage items (for SPA authentication)
-    await page.evaluate('''
-        localStorage.setItem('user_id', '12345');
-        localStorage.setItem('auth_time', new Date().toISOString());
-    ''')
-    
-    return page
-""",
-        
-        "before_goto": """
-async def hook(page, context, url, **kwargs):
-    print(f"[HOOK] Adding auth headers for {url}")
-    
-    # Add Authorization header
-    import base64
-    credentials = base64.b64encode(b"user:passwd").decode('ascii')
-    
-    await page.set_extra_http_headers({
-        'Authorization': f'Basic {credentials}',
-        'X-API-Key': 'test-api-key-123'
-    })
-    
-    return page
-"""
-    }
-    
-    payload = {
-        "urls": [
-            "https://httpbin.org/basic-auth/user/passwd"
-        ],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 15
-        }
-    }
-    
-    print("\nTesting authentication with httpbin endpoints...")
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    if response.status_code == 200:
-        data = response.json()
-        print("✅ Authentication test completed")
-        
-        if 'results' in data:
-            for i, result in enumerate(data['results']):
-                print(f"\n  URL {i+1}: {result['url']}")
-                if result.get('success'):
-                    # Check for authentication success indicators
-                    html_content = result.get('html', '')
-                    if '"authenticated"' in html_content and 'true' in html_content:
-                        print("    ✅ Authentication successful! Basic auth worked.")
-                    else:
-                        print("    ⚠️ Page loaded but auth status unclear")
-                else:
-                    print(f"    ❌ Failed: {result.get('error_message', 'Unknown error')}")
-    else:
-        print(f"❌ Error: {response.status_code}")
-
-
-def test_performance_optimization_hooks():
-    """Test hooks for performance optimization"""
-    print("\n" + "=" * 70)
-    print("Testing: Performance Optimization Hooks")
-    print("=" * 70)
-    
-    hooks_code = {
-        "on_page_context_created": """
-async def hook(page, context, **kwargs):
-    print("[HOOK] Optimizing page for performance")
-    
-    # Block resource-heavy content
-    await context.route("**/*.{png,jpg,jpeg,gif,webp,svg,ico}", lambda route: route.abort())
-    await context.route("**/*.{woff,woff2,ttf,otf}", lambda route: route.abort())
-    await context.route("**/*.{mp4,webm,ogg,mp3,wav}", lambda route: route.abort())
-    await context.route("**/googletagmanager.com/*", lambda route: route.abort())
-    await context.route("**/google-analytics.com/*", lambda route: route.abort())
-    await context.route("**/doubleclick.net/*", lambda route: route.abort())
-    await context.route("**/facebook.com/*", lambda route: route.abort())
-    
-    # Disable animations and transitions
-    await page.add_style_tag(content='''
-        *, *::before, *::after {
-            animation-duration: 0s !important;
-            animation-delay: 0s !important;
-            transition-duration: 0s !important;
-            transition-delay: 0s !important;
-        }
-    ''')
-    
-    print("[HOOK] Performance optimizations applied")
-    return page
-""",
-        
-        "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    print("[HOOK] Removing unnecessary elements before extraction")
-    
-    # Remove ads, popups, and other unnecessary elements
-    await page.evaluate('''() => {
-        // Remove common ad containers
-        const adSelectors = [
-            '.ad', '.ads', '.advertisement', '[id*="ad-"]', '[class*="ad-"]',
-            '.popup', '.modal', '.overlay', '.cookie-banner', '.newsletter-signup'
-        ];
-        
-        adSelectors.forEach(selector => {
-            document.querySelectorAll(selector).forEach(el => el.remove());
-        });
-        
-        // Remove script tags to clean up HTML
-        document.querySelectorAll('script').forEach(el => el.remove());
-        
-        // Remove style tags we don't need
-        document.querySelectorAll('style').forEach(el => el.remove());
-    }''')
-    
-    return page
-"""
-    }
-    
-    payload = {
-        "urls": ["https://httpbin.org/html"],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 10
-        }
-    }
-    
-    print("\nTesting performance optimization hooks...")
-    start_time = time.time()
-    
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    elapsed_time = time.time() - start_time
-    print(f"Request completed in {elapsed_time:.2f} seconds")
-    
-    if response.status_code == 200:
-        data = response.json()
-        print("✅ Performance optimization test completed")
-        
-        if 'results' in data and len(data['results']) > 0:
-            result = data['results'][0]
-            if result.get('html'):
-                print(f"  HTML size: {len(result['html'])} characters")
-                print("  Resources blocked, ads removed, animations disabled")
-    else:
-        print(f"❌ Error: {response.status_code}")
-
-
-def test_content_extraction_hooks():
-    """Test hooks for intelligent content extraction"""
-    print("\n" + "=" * 70)
-    print("Testing: Content Extraction Hooks")
-    print("=" * 70)
-    
-    hooks_code = {
-        "after_goto": """
-async def hook(page, context, url, response, **kwargs):
-    print(f"[HOOK] Waiting for dynamic content on {url}")
-    
-    # Wait for any lazy-loaded content
-    await page.wait_for_timeout(2000)
-    
-    # Trigger any "Load More" buttons
-    try:
-        load_more = await page.query_selector('[class*="load-more"], [class*="show-more"], button:has-text("Load More")')
-        if load_more:
-            await load_more.click()
-            await page.wait_for_timeout(1000)
-            print("[HOOK] Clicked 'Load More' button")
-    except:
-        pass
-    
-    return page
-""",
-        
-        "before_retrieve_html": """
-async def hook(page, context, **kwargs):
-    print("[HOOK] Extracting structured data")
-    
-    # Extract metadata
-    metadata = await page.evaluate('''() => {
-        const getMeta = (name) => {
-            const element = document.querySelector(`meta[name="${name}"], meta[property="${name}"]`);
-            return element ? element.getAttribute('content') : null;
-        };
-        
-        return {
-            title: document.title,
-            description: getMeta('description') || getMeta('og:description'),
-            author: getMeta('author'),
-            keywords: getMeta('keywords'),
-            ogTitle: getMeta('og:title'),
-            ogImage: getMeta('og:image'),
-            canonical: document.querySelector('link[rel="canonical"]')?.href,
-            jsonLd: Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
-                .map(el => el.textContent).filter(Boolean)
-        };
-    }''')
-    
-    print(f"[HOOK] Extracted metadata: {json.dumps(metadata, indent=2)}")
-    
-    # Infinite scroll handling
-    for i in range(3):
-        await page.evaluate("window.scrollTo(0, document.body.scrollHeight);")
-        await page.wait_for_timeout(1000)
-        print(f"[HOOK] Scroll iteration {i+1}/3")
-    
-    return page
-"""
-    }
-    
-    payload = {
-        "urls": ["https://httpbin.org/html", "https://httpbin.org/json"],
-        "hooks": {
-            "code": hooks_code,
-            "timeout": 20
-        }
-    }
-    
-    print("\nTesting content extraction hooks...")
-    response = requests.post(f"{API_BASE_URL}/crawl", json=payload)
-    
-    if response.status_code == 200:
-        data = response.json()
-        print("✅ Content extraction test completed")
-        
-        if 'hooks' in data and 'summary' in data['hooks']:
-            summary = data['hooks']['summary']
-            print(f"  Hooks executed: {summary['successful']}/{summary['total_executions']}")
-        
-        if 'results' in data:
-            for result in data['results']:
-                print(f"\n  URL: {result['url']}")
-                print(f"  Success: {result.get('success', False)}")
-    else:
-        print(f"❌ Error: {response.status_code}")
-
-
-def main():
-    """Run comprehensive hook tests"""
-    print("🔧 Crawl4AI Docker API - Comprehensive Hooks Testing")
-    print("Based on docs/examples/hooks_example.py")
-    print("=" * 70)
-    
-    tests = [
-        ("All Hooks Demo", test_all_hooks_demo),
-        ("Authentication Flow", test_authentication_flow),
-        ("Performance Optimization", test_performance_optimization_hooks),
-        ("Content Extraction", test_content_extraction_hooks),
-    ]
-    
-    for i, (name, test_func) in enumerate(tests, 1):
-        print(f"\n📌 Test {i}/{len(tests)}: {name}")
-        try:
-            test_func()
-            print(f"✅ {name} completed")
-        except Exception as e:
-            print(f"❌ {name} failed: {e}")
-            import traceback
-            traceback.print_exc()
-    
-    print("\n" + "=" * 70)
-    print("🎉 All comprehensive hook tests completed!")
-    print("=" * 70)
-
-
-if __name__ == "__main__":
-    main()
--- a/tests/test_arun_many.py
+++ b/tests/test_arun_many.py
@@ -1,42 +0,0 @@
-"""
-Test example for multiple crawler configs feature
-"""
-import asyncio
-import sys
-from pathlib import Path
-
-# Add parent directory to path for imports
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
-from crawl4ai.processors.pdf import PDFContentScrapingStrategy
-
-
-async def test_run_many():
-    default_config = CrawlerRunConfig(
-        cache_mode=CacheMode.BYPASS,
-        # scraping_strategy=PDFContentScrapingStrategy()
-    )
-    
-    test_urls = [
-        # "https://blog.python.org/",  # Blog URL  
-        "https://www.python.org/",  # Generic HTTPS page
-        "https://www.kidocode.com/",  # Generic HTTPS page
-        "https://www.example.com/",  # Generic HTTPS page
-        # "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
-    ]
-    
-    async with AsyncWebCrawler() as crawler:
-        # Single config - traditional usage still works
-        print("Test 1: Single config (backwards compatible)")
-        result = await crawler.arun_many(
-            urls=test_urls[:2],
-            config=default_config
-        )
-        print(f"Crawled {len(result)} URLs with single config\n")
-        for item in result:
-            print(f"  {item.url} -> {item.status_code}")
-        
-
-if __name__ == "__main__":
-    asyncio.run(test_run_many())
--- a/tests/test_config_matching_only.py
+++ b/tests/test_config_matching_only.py
@@ -1,131 +0,0 @@
-"""
-Test only the config matching logic without running crawler
-"""
-import sys
-from pathlib import Path
-
-# Add parent directory to path for imports
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-from crawl4ai.async_configs import CrawlerRunConfig, MatchMode
-
-def test_all_matching_scenarios():
-    print("Testing CrawlerRunConfig.is_match() method")
-    print("=" * 50)
-    
-    # Test 1: Single string pattern
-    print("\n1. Single string pattern (glob style)")
-    config = CrawlerRunConfig(
-        url_matcher="*.pdf",
-        # For example we can set this => scraping_strategy=PDFContentScrapingStrategy()
-    )
-    test_urls = [
-        ("https://example.com/file.pdf", True),
-        ("https://example.com/doc.PDF", False),  # Case sensitive
-        ("https://example.com/file.txt", False),
-        ("file.pdf", True),
-    ]
-    for url, expected in test_urls:
-        result = config.is_match(url)
-        status = "✓" if result == expected else "✗"
-        print(f"  {status} {url} -> {result}")
-    
-    # Test 2: List of patterns with OR
-    print("\n2. List of patterns with OR (default)")
-    config = CrawlerRunConfig(
-        url_matcher=["*/article/*", "*/blog/*", "*.html"],
-        match_mode=MatchMode.OR
-    )
-    test_urls = [
-        ("https://example.com/article/news", True),
-        ("https://example.com/blog/post", True),
-        ("https://example.com/page.html", True),
-        ("https://example.com/page.php", False),
-    ]
-    for url, expected in test_urls:
-        result = config.is_match(url)
-        status = "✓" if result == expected else "✗"
-        print(f"  {status} {url} -> {result}")
-    
-    # Test 3: Custom function
-    print("\n3. Custom function matcher")
-    config = CrawlerRunConfig(
-        url_matcher=lambda url: 'api' in url and (url.endswith('.json') or url.endswith('.xml'))
-    )
-    test_urls = [
-        ("https://api.example.com/data.json", True),
-        ("https://api.example.com/data.xml", True),
-        ("https://api.example.com/data.html", False),
-        ("https://example.com/data.json", False),  # No 'api'
-    ]
-    for url, expected in test_urls:
-        result = config.is_match(url)
-        status = "✓" if result == expected else "✗"
-        print(f"  {status} {url} -> {result}")
-    
-    # Test 4: Mixed list with AND
-    print("\n4. Mixed patterns and functions with AND")
-    config = CrawlerRunConfig(
-        url_matcher=[
-            "https://*",  # Must be HTTPS
-            lambda url: '.com' in url,  # Must have .com
-            lambda url: len(url) < 50  # Must be short
-        ],
-        match_mode=MatchMode.AND
-    )
-    test_urls = [
-        ("https://example.com/page", True),
-        ("http://example.com/page", False),  # Not HTTPS
-        ("https://example.org/page", False),  # No .com
-        ("https://example.com/" + "x" * 50, False),  # Too long
-    ]
-    for url, expected in test_urls:
-        result = config.is_match(url)
-        status = "✓" if result == expected else "✗"
-        print(f"  {status} {url} -> {result}")
-    
-    # Test 5: Complex real-world scenario
-    print("\n5. Complex pattern combinations")
-    config = CrawlerRunConfig(
-        url_matcher=[
-            "*/api/v[0-9]/*",  # API versioned endpoints
-            lambda url: 'graphql' in url,  # GraphQL endpoints
-            "*.json"  # JSON files
-        ],
-        match_mode=MatchMode.OR
-    )
-    test_urls = [
-        ("https://example.com/api/v1/users", True),
-        ("https://example.com/api/v2/posts", True),
-        ("https://example.com/graphql", True),
-        ("https://example.com/data.json", True),
-        ("https://example.com/api/users", False),  # No version
-    ]
-    for url, expected in test_urls:
-        result = config.is_match(url)
-        status = "✓" if result == expected else "✗"
-        print(f"  {status} {url} -> {result}")
-    
-    # Test 6: Edge cases
-    print("\n6. Edge cases")
-    
-    # No matcher
-    config = CrawlerRunConfig()
-    result = config.is_match("https://example.com")
-    print(f"  {'✓' if not result else '✗'} No matcher -> {result}")
-    
-    # Empty list
-    config = CrawlerRunConfig(url_matcher=[])
-    result = config.is_match("https://example.com")
-    print(f"  {'✓' if not result else '✗'} Empty list -> {result}")
-    
-    # None in list (should be skipped)
-    config = CrawlerRunConfig(url_matcher=["*.pdf", None, "*.doc"])
-    result = config.is_match("test.pdf")
-    print(f"  {'✓' if result else '✗'} List with None -> {result}")
-    
-    print("\n" + "=" * 50)
-    print("All matching tests completed!")
-
-if __name__ == "__main__":
-    test_all_matching_scenarios()
--- a/tests/test_config_selection.py
+++ b/tests/test_config_selection.py
@@ -1,87 +0,0 @@
-"""
-Test config selection logic in dispatchers
-"""
-import asyncio
-import sys
-from pathlib import Path
-from unittest.mock import AsyncMock, MagicMock
-
-# Add parent directory to path for imports
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-from crawl4ai.async_configs import CrawlerRunConfig, MatchMode
-from crawl4ai.async_dispatcher import BaseDispatcher, MemoryAdaptiveDispatcher
-
-class TestDispatcher(BaseDispatcher):
-    """Simple test dispatcher to verify config selection"""
-    
-    async def crawl_url(self, url, config, task_id, **kwargs):
-        # Just return which config was selected
-        selected = self.select_config(url, config)
-        return {"url": url, "config_id": id(selected)}
-    
-    async def run_urls(self, urls, crawler, config):
-        results = []
-        for url in urls:
-            result = await self.crawl_url(url, config, "test")
-            results.append(result)
-        return results
-
-async def test_dispatcher_config_selection():
-    print("Testing dispatcher config selection")
-    print("=" * 50)
-    
-    # Create test configs with different matchers
-    pdf_config = CrawlerRunConfig(url_matcher="*.pdf")
-    api_config = CrawlerRunConfig(url_matcher=lambda url: 'api' in url)
-    default_config = CrawlerRunConfig()  # No matcher
-    
-    configs = [pdf_config, api_config, default_config]
-    
-    # Create test dispatcher
-    dispatcher = TestDispatcher()
-    
-    # Test single config
-    print("\nTest 1: Single config")
-    result = await dispatcher.crawl_url("https://example.com/file.pdf", pdf_config, "test1")
-    assert result["config_id"] == id(pdf_config)
-    print("✓ Single config works")
-    
-    # Test config list selection
-    print("\nTest 2: Config list selection")
-    test_cases = [
-        ("https://example.com/file.pdf", id(pdf_config)),
-        ("https://api.example.com/data", id(api_config)),
-        ("https://example.com/page", id(configs[0])),  # No match, uses first
-    ]
-    
-    for url, expected_id in test_cases:
-        result = await dispatcher.crawl_url(url, configs, "test")
-        assert result["config_id"] == expected_id, f"URL {url} got wrong config"
-        print(f"✓ {url} -> correct config selected")
-    
-    # Test with MemoryAdaptiveDispatcher
-    print("\nTest 3: MemoryAdaptiveDispatcher config selection")
-    mem_dispatcher = MemoryAdaptiveDispatcher()
-    
-    # Test select_config method directly
-    selected = mem_dispatcher.select_config("https://example.com/doc.pdf", configs)
-    assert selected == pdf_config
-    print("✓ MemoryAdaptiveDispatcher.select_config works")
-    
-    # Test empty config list
-    print("\nTest 4: Edge cases")
-    selected = mem_dispatcher.select_config("https://example.com", [])
-    assert isinstance(selected, CrawlerRunConfig)  # Should return default
-    print("✓ Empty config list returns default config")
-    
-    # Test None config
-    selected = mem_dispatcher.select_config("https://example.com", None)
-    assert isinstance(selected, CrawlerRunConfig)  # Should return default
-    print("✓ None config returns default config")
-    
-    print("\n" + "=" * 50)
-    print("All dispatcher tests passed! ✓")
-
-if __name__ == "__main__":
-    asyncio.run(test_dispatcher_config_selection())
--- a/tests/test_docker_api_with_llm_provider.py
+++ b/tests/test_docker_api_with_llm_provider.py
@@ -1,122 +0,0 @@
-#!/usr/bin/env python3
-"""Test script to verify Docker API with LLM provider configuration."""
-
-import requests
-import json
-import time
-
-BASE_URL = "http://localhost:11235"
-
-def test_health():
-    """Test health endpoint."""
-    print("1. Testing health endpoint...")
-    response = requests.get(f"{BASE_URL}/health")
-    print(f"   Status: {response.status_code}")
-    print(f"   Response: {response.json()}")
-    print()
-
-def test_schema():
-    """Test schema endpoint to see configuration."""
-    print("2. Testing schema endpoint...")
-    response = requests.get(f"{BASE_URL}/schema")
-    print(f"   Status: {response.status_code}")
-    # Print only browser config to keep output concise
-    print(f"   Browser config keys: {list(response.json().get('browser', {}).keys())[:5]}...")
-    print()
-
-def test_markdown_with_llm_filter():
-    """Test markdown endpoint with LLM filter (should use configured provider)."""
-    print("3. Testing markdown endpoint with LLM filter...")
-    print("   This should use the Groq provider from LLM_PROVIDER env var")
-    
-    # Note: This will fail with dummy API keys, but we can see if it tries to use Groq
-    payload = {
-        "url": "https://httpbin.org/html",
-        "f": "llm",
-        "q": "Extract the main content"
-    }
-    
-    response = requests.post(f"{BASE_URL}/md", json=payload)
-    print(f"   Status: {response.status_code}")
-    
-    if response.status_code != 200:
-        print(f"   Error: {response.text[:200]}...")
-    else:
-        print(f"   Success! Markdown length: {len(response.json().get('markdown', ''))} chars")
-    print()
-
-def test_markdown_with_provider_override():
-    """Test markdown endpoint with provider override in request."""
-    print("4. Testing markdown endpoint with provider override...")
-    print("   This should use OpenAI provider from request parameter")
-    
-    payload = {
-        "url": "https://httpbin.org/html",
-        "f": "llm",
-        "q": "Extract the main content",
-        "provider": "openai/gpt-4"  # Override to use OpenAI
-    }
-    
-    response = requests.post(f"{BASE_URL}/md", json=payload)
-    print(f"   Status: {response.status_code}")
-    
-    if response.status_code != 200:
-        print(f"   Error: {response.text[:200]}...")
-    else:
-        print(f"   Success! Markdown length: {len(response.json().get('markdown', ''))} chars")
-    print()
-
-def test_simple_crawl():
-    """Test simple crawl without LLM."""
-    print("5. Testing simple crawl (no LLM required)...")
-    
-    payload = {
-        "urls": ["https://httpbin.org/html"],
-        "browser_config": {
-            "type": "BrowserConfig",
-            "params": {"headless": True}
-        },
-        "crawler_config": {
-            "type": "CrawlerRunConfig",
-            "params": {"cache_mode": "bypass"}
-        }
-    }
-    
-    response = requests.post(f"{BASE_URL}/crawl", json=payload)
-    print(f"   Status: {response.status_code}")
-    
-    if response.status_code == 200:
-        result = response.json()
-        print(f"   Success: {result.get('success')}")
-        print(f"   Results count: {len(result.get('results', []))}")
-        if result.get('results'):
-            print(f"   First result success: {result['results'][0].get('success')}")
-    else:
-        print(f"   Error: {response.text[:200]}...")
-    print()
-
-def test_playground():
-    """Test if playground is accessible."""
-    print("6. Testing playground interface...")
-    response = requests.get(f"{BASE_URL}/playground")
-    print(f"   Status: {response.status_code}")
-    print(f"   Content-Type: {response.headers.get('content-type')}")
-    print()
-
-if __name__ == "__main__":
-    print("=== Crawl4AI Docker API Tests ===\n")
-    print(f"Testing API at {BASE_URL}\n")
-    
-    # Wait a bit for server to be fully ready
-    time.sleep(2)
-    
-    test_health()
-    test_schema()
-    test_simple_crawl()
-    test_playground()
-    
-    print("\nTesting LLM functionality (these may fail with dummy API keys):\n")
-    test_markdown_with_llm_filter()
-    test_markdown_with_provider_override()
-    
-    print("\nTests completed!")
--- a/tests/test_memory_macos.py
+++ b/tests/test_memory_macos.py
@@ -1,71 +0,0 @@
-#!/usr/bin/env python3
-"""Test script to verify macOS memory calculation accuracy."""
-
-import psutil
-import platform
-import time
-from crawl4ai.memory_utils import get_true_memory_usage_percent, get_memory_stats, get_true_available_memory_gb
-
-
-def test_memory_calculation():
-    """Test and compare memory calculations."""
-    print(f"Platform: {platform.system()}")
-    print(f"Python version: {platform.python_version()}")
-    print("-" * 60)
-    
-    # Get psutil's view
-    vm = psutil.virtual_memory()
-    psutil_percent = vm.percent
-    psutil_available_gb = vm.available / (1024**3)
-    total_gb = vm.total / (1024**3)
-    
-    # Get our corrected view
-    true_percent = get_true_memory_usage_percent()
-    true_available_gb = get_true_available_memory_gb()
-    true_percent_calc, available_calc, total_calc = get_memory_stats()
-    
-    print("Memory Statistics Comparison:")
-    print(f"Total Memory: {total_gb:.2f} GB")
-    print()
-    
-    print("PSUtil (Standard) Calculation:")
-    print(f"  - Memory Used: {psutil_percent:.1f}%")
-    print(f"  - Available: {psutil_available_gb:.2f} GB")
-    print()
-    
-    print("Platform-Aware Calculation:")
-    print(f"  - Memory Used: {true_percent:.1f}%")
-    print(f"  - Available: {true_available_gb:.2f} GB")
-    print(f"  - Difference: {true_available_gb - psutil_available_gb:.2f} GB of reclaimable memory")
-    print()
-    
-    # Show the impact on dispatcher behavior
-    print("Impact on MemoryAdaptiveDispatcher:")
-    thresholds = {
-        "Normal": 90.0,
-        "Critical": 95.0,
-        "Recovery": 85.0
-    }
-    
-    for name, threshold in thresholds.items():
-        psutil_triggered = psutil_percent >= threshold
-        true_triggered = true_percent >= threshold
-        print(f"  - {name} Threshold ({threshold}%):")
-        print(f"    PSUtil: {'TRIGGERED' if psutil_triggered else 'OK'}")
-        print(f"    Platform-Aware: {'TRIGGERED' if true_triggered else 'OK'}")
-        if psutil_triggered != true_triggered:
-            print(f"    → Difference: Platform-aware prevents false {'pressure' if psutil_triggered else 'recovery'}")
-    print()
-    
-    # Monitor for a few seconds
-    print("Monitoring memory for 10 seconds...")
-    for i in range(10):
-        vm = psutil.virtual_memory()
-        true_pct = get_true_memory_usage_percent()
-        print(f"  {i+1}s - PSUtil: {vm.percent:.1f}% | Platform-Aware: {true_pct:.1f}%", end="\r")
-        time.sleep(1)
-    print("\n")
-
-
-if __name__ == "__main__":
-    test_memory_calculation()
--- a/tests/test_multi_config.py
+++ b/tests/test_multi_config.py
@@ -1,117 +0,0 @@
-"""
-Test example for multiple crawler configs feature
-"""
-import asyncio
-import sys
-from pathlib import Path
-
-# Add parent directory to path for imports
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, MatchMode, CacheMode
-
-async def test_multi_config():
-    # Create different configs for different URL patterns
-    
-    # Config for PDF files
-    pdf_config = CrawlerRunConfig(
-        url_matcher="*.pdf",
-    )
-    
-    # Config for articles (using multiple patterns with OR logic)
-    article_config = CrawlerRunConfig(
-        url_matcher=["*/news/*", "*blog*", "*/article/*"],
-        match_mode=MatchMode.OR,
-        screenshot=True,
-    )
-    
-    # Config using custom matcher function
-    api_config = CrawlerRunConfig(
-        url_matcher=lambda url: 'api' in url or 'json' in url,
-    )
-    
-    # Config combining patterns and functions with AND logic
-    secure_docs_config = CrawlerRunConfig(
-        url_matcher=[
-            "*.doc*",  # Matches .doc, .docx
-            lambda url: url.startswith('https://')  # Must be HTTPS
-        ],
-        match_mode=MatchMode.AND,
-    )
-    
-    # Default config (no url_matcher means it won't match anything unless it's the fallback)
-    default_config = CrawlerRunConfig(
-        # cache_mode=CacheMode.BYPASS,
-    )
-    
-    # List of configs - order matters! First match wins
-    configs = [
-        pdf_config,
-        article_config, 
-        api_config,
-        secure_docs_config,
-        default_config  # Fallback
-    ]
-    
-    # Test URLs - using real URLs that exist
-    test_urls = [
-        "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",  # Real PDF
-        "https://www.bbc.com/news/articles/c5y3e3glnldo",  # News article
-        "https://blog.python.org/",  # Blog URL  
-        "https://api.github.com/users/github",  # GitHub API (returns JSON)
-        "https://httpbin.org/json",  # API endpoint that returns JSON
-        "https://www.python.org/",  # Generic HTTPS page
-        "http://info.cern.ch/",  # HTTP (not HTTPS) page
-        "https://example.com/",  # → Default config
-    ]
-    
-    # Test the matching logic
-    print("Config matching test:")
-    print("-" * 50)
-    for url in test_urls:
-        for i, config in enumerate(configs):
-            if config.is_match(url):
-                print(f"{url} -> Config {i} matches")
-                break
-        else:
-            print(f"{url} -> No match, will use fallback (first config)")
-    
-    print("\n" + "=" * 50 + "\n")
-    
-    # Now test with actual crawler
-    async with AsyncWebCrawler() as crawler:
-        # Single config - traditional usage still works
-        print("Test 1: Single config (backwards compatible)")
-        result = await crawler.arun_many(
-            urls=["https://www.python.org/"],
-            config=default_config
-        )
-        print(f"Crawled {len(result)} URLs with single config\n")
-        
-        # Multiple configs - new feature
-        print("Test 2: Multiple configs")
-        # Just test with 2 URLs to avoid timeout
-        results = await crawler.arun_many(
-            urls=test_urls[:2],  # Just test first 2 URLs
-            config=configs  # Pass list of configs
-        )
-        print(f"Crawled {len(results)} URLs with multiple configs")
-        
-        # Using custom matcher inline
-        print("\nTest 3: Inline custom matcher")
-        custom_config = CrawlerRunConfig(
-            url_matcher=lambda url: len(url) > 50 and 'python' in url.lower(),
-            verbose=False
-        )
-        results = await crawler.arun_many(
-            urls=[
-                "https://docs.python.org/3/library/asyncio.html",  # Long URL with 'python'
-                "https://python.org/",  # Short URL with 'python' - won't match
-                "https://www.google.com/"  # No 'python' - won't match
-            ],
-            config=[custom_config, default_config]
-        )
-        print(f"Crawled {len(results)} URLs with custom matcher")
-
-if __name__ == "__main__":
-    asyncio.run(test_multi_config())
--- a/tests/validity/test_head_change_detection.py
+++ b/tests/validity/test_head_change_detection.py
@@ -0,0 +1,211 @@
+import asyncio
+import httpx
+import email.utils
+from datetime import datetime
+import json
+from typing import Dict, Optional
+import time
+
+
+async def should_crawl(url: str, cache: Optional[Dict[str, str]] = None) -> bool:
+    """
+    Check if a URL should be crawled based on HEAD request headers.
+    
+    Args:
+        url: The URL to check
+        cache: Previous cache data containing etag, last_modified, digest, content_length
+    
+    Returns:
+        True if the page has changed and should be crawled, False otherwise
+    """
+    if cache is None:
+        cache = {}
+    
+    headers = {
+        "Accept-Encoding": "identity",
+        "Want-Content-Digest": "sha-256",
+    }
+    
+    if cache.get("etag"):
+        headers["If-None-Match"] = cache["etag"]
+    if cache.get("last_modified"):
+        headers["If-Modified-Since"] = cache["last_modified"]
+    
+    try:
+        async with httpx.AsyncClient(follow_redirects=True, timeout=5) as client:
+            response = await client.head(url, headers=headers)
+        
+        # 304 Not Modified - content hasn't changed
+        if response.status_code == 304:
+            print(f"✓ 304 Not Modified - No need to crawl {url}")
+            return False
+        
+        h = response.headers
+        
+        # Check Content-Digest (most reliable)
+        if h.get("content-digest") and h["content-digest"] == cache.get("digest"):
+            print(f"✓ Content-Digest matches - No need to crawl {url}")
+            return False
+        
+        # Check strong ETag
+        if h.get("etag") and h["etag"].startswith('"') and h["etag"] == cache.get("etag"):
+            print(f"✓ Strong ETag matches - No need to crawl {url}")
+            return False
+        
+        # Check Last-Modified
+        if h.get("last-modified") and cache.get("last_modified"):
+            try:
+                lm_new = email.utils.parsedate_to_datetime(h["last-modified"])
+                lm_old = email.utils.parsedate_to_datetime(cache["last_modified"])
+                if lm_new <= lm_old:
+                    print(f"✓ Last-Modified not newer - No need to crawl {url}")
+                    return False
+            except (TypeError, ValueError):
+                pass
+        
+        # Check Content-Length (weakest signal - only as a hint, not definitive)
+        # Note: Same content length doesn't mean same content!
+        # This should only be used when no other signals are available
+        if h.get("content-length") and cache.get("content_length"):
+            try:
+                if int(h["content-length"]) != cache.get("content_length"):
+                    print(f"✗ Content-Length changed - Should crawl {url}")
+                    return True
+                else:
+                    print(f"⚠️  Content-Length unchanged but content might have changed - Should crawl {url}")
+                    return True  # When in doubt, crawl!
+            except ValueError:
+                pass
+        
+        print(f"✗ Content has changed - Should crawl {url}")
+        return True
+        
+    except Exception as e:
+        print(f"✗ Error checking {url}: {e}")
+        return True  # On error, assume we should crawl
+
+
+async def crawl_page(url: str) -> Dict[str, str]:
+    """
+    Simulate crawling a page and extracting cache headers.
+    """
+    print(f"\n🕷️  Crawling {url}...")
+    
+    async with httpx.AsyncClient(follow_redirects=True, timeout=10) as client:
+        response = await client.get(url)
+    
+    cache_data = {}
+    h = response.headers
+    
+    if h.get("etag"):
+        cache_data["etag"] = h["etag"]
+        print(f"  Stored ETag: {h['etag']}")
+    
+    if h.get("last-modified"):
+        cache_data["last_modified"] = h["last-modified"]
+        print(f"  Stored Last-Modified: {h['last-modified']}")
+    
+    if h.get("content-digest"):
+        cache_data["digest"] = h["content-digest"]
+        print(f"  Stored Content-Digest: {h['content-digest']}")
+    
+    if h.get("content-length"):
+        cache_data["content_length"] = int(h["content-length"])
+        print(f"  Stored Content-Length: {h['content-length']}")
+    
+    print(f"  Response size: {len(response.content)} bytes")
+    return cache_data
+
+
+async def test_static_site():
+    """Test with a static website (example.com)"""
+    print("=" * 60)
+    print("Testing with static site: example.com")
+    print("=" * 60)
+    
+    url = "https://example.com"
+    
+    # First crawl - always happens
+    cache = await crawl_page(url)
+    
+    # Wait a bit
+    await asyncio.sleep(2)
+    
+    # Second check - should not need to crawl
+    print(f"\n📊 Checking if we need to re-crawl...")
+    needs_crawl = await should_crawl(url, cache)
+    
+    if not needs_crawl:
+        print("✅ Correctly identified: No need to re-crawl static content")
+    else:
+        print("❌ Unexpected: Static content flagged as changed")
+
+
+async def test_dynamic_site():
+    """Test with dynamic websites that change frequently"""
+    print("\n" + "=" * 60)
+    print("Testing with dynamic sites")
+    print("=" * 60)
+    
+    # Test with a few dynamic sites
+    dynamic_sites = [
+        "https://api.github.com/",  # GitHub API root (changes with rate limit info)
+        "https://worldtimeapi.org/api/timezone/UTC",  # Current time API
+        "https://httpbin.org/uuid",  # Generates new UUID each request
+    ]
+    
+    for url in dynamic_sites:
+        print(f"\n🔄 Testing {url}")
+        try:
+            # First crawl
+            cache = await crawl_page(url)
+            
+            # Wait a bit
+            await asyncio.sleep(2)
+            
+            # Check if content changed
+            print(f"\n📊 Checking if we need to re-crawl...")
+            needs_crawl = await should_crawl(url, cache)
+            
+            if needs_crawl:
+                print("✅ Correctly identified: Dynamic content has changed")
+            else:
+                print("⚠️  Note: Dynamic content appears unchanged (might have caching)")
+                
+        except Exception as e:
+            print(f"❌ Error testing {url}: {e}")
+
+
+async def test_conditional_get():
+    """Test that a cached ETag sent as If-None-Match is answered with 304 Not Modified"""
+    print("\n" + "=" * 60)
+    print("Testing conditional GET scenario")
+    print("=" * 60)
+    
+    url = "https://httpbin.org/etag/test-etag-123"
+    
+    # Simulate a scenario where we have an ETag
+    cache = {"etag": '"test-etag-123"'}
+    
+    print(f"Testing with cached ETag: {cache['etag']}")
+    needs_crawl = await should_crawl(url, cache)
+    
+    if not needs_crawl:
+        print("✅ ETag matched - no crawl needed")
+    else:
+        print("❌ Unexpected: matching ETag did not produce 304 - crawl flagged")
+
+
+async def main():
+    """Run all tests"""
+    print("🚀 Starting HEAD request change detection tests\n")
+    
+    await test_static_site()
+    await test_dynamic_site()
+    await test_conditional_get()
+    
+    print("\n✨ All tests completed!")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
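The precedence used by `should_crawl` (304 first, then `Content-Digest`, then a strong `ETag`, then `Last-Modified`, otherwise crawl) can be isolated into a pure function and exercised without network access. This is a minimal sketch that assumes the HEAD status code and headers have already been fetched; the name `needs_recrawl` is hypothetical and not part of crawl4ai. The Content-Length step is left out because in `should_crawl` both of its branches return True.

```python
import email.utils
from typing import Mapping


def needs_recrawl(status: int, headers: Mapping[str, str], cache: Mapping[str, object]) -> bool:
    """Hypothetical offline version of the should_crawl decision order."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}

    # 304 Not Modified: the server confirmed our cached validators.
    if status == 304:
        return False

    # Content-Digest: a matching digest means identical bytes.
    digest = h.get("content-digest")
    if digest and digest == cache.get("digest"):
        return False

    # Strong ETag (quoted, no W/ prefix) that matches the cached one.
    etag = h.get("etag")
    if etag and etag.startswith('"') and etag == cache.get("etag"):
        return False

    # Last-Modified not newer than the cached timestamp.
    lm_new, lm_old = h.get("last-modified"), cache.get("last_modified")
    if lm_new and isinstance(lm_old, str):
        try:
            if email.utils.parsedate_to_datetime(lm_new) <= email.utils.parsedate_to_datetime(lm_old):
                return False
        except (TypeError, ValueError):
            pass  # unparseable or incomparable dates: fall through

    # No reliable "unchanged" signal: crawl to be safe.
    return True
```

Keeping the decision separate from the HTTP call makes each rule testable with hand-written headers, while the network wrapper only handles fetching and error fallback.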
--- a/tests/validity/test_head_with_real_changes.py
+++ b/tests/validity/test_head_with_real_changes.py
@@ -0,0 +1,186 @@
+import asyncio
+import httpx
+import email.utils
+from datetime import datetime
+import json
+from typing import Dict, Optional
+import time
+
+
+async def should_crawl(url: str, cache: Optional[Dict[str, str]] = None) -> bool:
+    """
+    Check if a URL should be crawled based on HEAD request headers.
+    """
+    if cache is None:
+        cache = {}
+    
+    headers = {
+        "Accept-Encoding": "identity",
+        "Want-Content-Digest": "sha-256",
+        "User-Agent": "Mozilla/5.0 (compatible; crawl4ai/1.0)"
+    }
+    
+    if cache.get("etag"):
+        headers["If-None-Match"] = cache["etag"]
+    if cache.get("last_modified"):
+        headers["If-Modified-Since"] = cache["last_modified"]
+    
+    try:
+        async with httpx.AsyncClient(follow_redirects=True, timeout=5) as client:
+            response = await client.head(url, headers=headers)
+        
+        print(f"\nHEAD Response Status: {response.status_code}")
+        print(f"Headers received: {dict(response.headers)}")
+        
+        # 304 Not Modified
+        if response.status_code == 304:
+            return False
+        
+        h = response.headers
+        
+        # Check headers in order of reliability
+        if h.get("content-digest") and h["content-digest"] == cache.get("digest"):
+            return False
+        
+        if h.get("etag") and h["etag"].startswith('"') and h["etag"] == cache.get("etag"):
+            return False
+        
+        if h.get("last-modified") and cache.get("last_modified"):
+            try:
+                lm_new = email.utils.parsedate_to_datetime(h["last-modified"])
+                lm_old = email.utils.parsedate_to_datetime(cache["last_modified"])
+                if lm_new <= lm_old:
+                    return False
+            except (TypeError, ValueError):
+                pass
+        
+        # Check Content-Length (weakest signal - only as a hint, not definitive)
+        # Note: Same content length doesn't mean same content!
+        if h.get("content-length") and cache.get("content_length"):
+            try:
+                if int(h["content-length"]) != cache.get("content_length"):
+                    return True  # Length changed, likely content changed
+                # If length is same, we can't be sure - default to crawling
+            except ValueError:
+                pass
+        
+        return True
+        
+    except Exception as e:
+        print(f"Error during HEAD request: {e}")
+        return True
+
+
+async def test_with_changing_content():
+    """Test with a real changing website"""
+    print("=" * 60)
+    print("Testing with real changing content")
+    print("=" * 60)
+    
+    # httpbin's /cache/{n} endpoint only sets a Cache-Control max-age of n seconds
+    url = "https://httpbin.org/cache/1"  # Cache for 1 second
+    
+    print(f"\n1️⃣ First request to {url}")
+    async with httpx.AsyncClient() as client:
+        response1 = await client.get(url)
+        cache = {}
+        if response1.headers.get("etag"):
+            cache["etag"] = response1.headers["etag"]
+        if response1.headers.get("last-modified"):
+            cache["last_modified"] = response1.headers["last-modified"]
+        print(f"Cached ETag: {cache.get('etag', 'None')}")
+        print(f"Cached Last-Modified: {cache.get('last_modified', 'None')}")
+    
+    # Check immediately (should not need crawl)
+    print(f"\n2️⃣ Checking immediately after first request...")
+    needs_crawl = await should_crawl(url, cache)
+    print(f"Result: {'NEED TO CRAWL' if needs_crawl else 'NO NEED TO CRAWL'}")
+    
+    # Wait for cache to expire
+    print(f"\n⏳ Waiting 2 seconds for cache to expire...")
+    await asyncio.sleep(2)
+    
+    # Check again (should need crawl now)
+    print(f"\n3️⃣ Checking after cache expiry...")
+    needs_crawl = await should_crawl(url, cache)
+    print(f"Result: {'NEED TO CRAWL' if needs_crawl else 'NO NEED TO CRAWL'}")
+
+
+async def test_news_website():
+    """Test with a news website that updates frequently"""
+    print("\n" + "=" * 60)
+    print("Testing with news website (BBC)")
+    print("=" * 60)
+    
+    url = "https://www.bbc.com"
+    
+    print(f"\n1️⃣ First crawl of {url}")
+    async with httpx.AsyncClient() as client:
+        response1 = await client.get(url)
+        cache = {}
+        h = response1.headers
+        
+        if h.get("etag"):
+            cache["etag"] = h["etag"]
+            print(f"Stored ETag: {h['etag'][:50]}...")
+        if h.get("last-modified"):
+            cache["last_modified"] = h["last-modified"]
+            print(f"Stored Last-Modified: {h['last-modified']}")
+        if h.get("content-length"):
+            cache["content_length"] = int(h["content-length"])
+            print(f"Stored Content-Length: {h['content-length']}")
+    
+    # Check multiple times
+    for i in range(3):
+        await asyncio.sleep(5)
+        print(f"\n📊 Check #{i+2} - {datetime.now().strftime('%H:%M:%S')}")
+        needs_crawl = await should_crawl(url, cache)
+        print(f"Result: {'NEED TO CRAWL ✓' if needs_crawl else 'NO NEED TO CRAWL ✗'}")
+
+
+async def test_api_endpoint():
+    """Test with an API that provides proper caching headers"""
+    print("\n" + "=" * 60)
+    print("Testing with GitHub API")
+    print("=" * 60)
+    
+    # GitHub user API (updates when user data changes)
+    url = "https://api.github.com/users/github"
+    
+    headers = {"User-Agent": "crawl4ai-test"}
+    
+    print(f"\n1️⃣ First request to {url}")
+    async with httpx.AsyncClient() as client:
+        response1 = await client.get(url, headers=headers)
+        cache = {}
+        h = response1.headers
+        
+        if h.get("etag"):
+            cache["etag"] = h["etag"]
+            print(f"Stored ETag: {h['etag']}")
+        if h.get("last-modified"):
+            cache["last_modified"] = h["last-modified"]
+            print(f"Stored Last-Modified: {h['last-modified']}")
+        
+        # Print rate limit info
+        print(f"Rate Limit Remaining: {h.get('x-ratelimit-remaining', 'N/A')}")
+    
+    # Check if content changed
+    print(f"\n2️⃣ Checking if content changed...")
+    needs_crawl = await should_crawl(url, cache)
+    print(f"Result: {'NEED TO CRAWL' if needs_crawl else 'NO NEED TO CRAWL (content unchanged)'}")
+
+
+async def main():
+    """Run all tests"""
+    print("🚀 Testing HEAD request change detection with real websites\n")
+    
+    await test_with_changing_content()
+    await test_news_website()
+    await test_api_endpoint()
+    
+    print("\n✨ All tests completed!")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
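`should_crawl` only trusts an ETag match when the tag is strong (quoted, without the `W/` prefix). RFC 9110 defines two comparison functions for entity-tags: strong comparison requires both tags to be strong with identical opaque-tags, while weak comparison ignores the `W/` prefix. The sketch below implements both; the helper names `parse_etag`, `strong_match`, and `weak_match` are hypothetical.

```python
from typing import Tuple


def parse_etag(tag: str) -> Tuple[bool, str]:
    """Split an entity-tag into (is_weak, opaque_tag)."""
    tag = tag.strip()
    weak = tag.startswith("W/")
    return weak, (tag[2:] if weak else tag)


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags strong and opaque-tags identical."""
    weak_a, opaque_a = parse_etag(a)
    weak_b, opaque_b = parse_etag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque-tags identical, weakness ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

Servers evaluate `If-None-Match` with weak comparison, so a 304 can legitimately come back for a weak tag even though the strong-only check in `should_crawl` would not treat that same tag as proof of identical content.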
--- a/tests/validity/test_smart_cache_mode.py
+++ b/tests/validity/test_smart_cache_mode.py
@@ -0,0 +1,196 @@
+"""
+Test SMART cache mode functionality in crawl4ai.
+
+This test demonstrates:
+1. Initial crawl with caching enabled
+2. Re-crawl with SMART mode on static content (should use cache)
+3. Re-crawl with SMART mode on dynamic content (should re-crawl)
+"""
+
+import asyncio
+from crawl4ai import AsyncWebCrawler
+from crawl4ai.async_configs import CrawlerRunConfig
+from crawl4ai.cache_context import CacheMode
+import time
+from datetime import datetime
+
+
+async def test_smart_cache_mode():
+    """Test the SMART cache mode with both static and dynamic URLs"""
+    
+    print("=" * 60)
+    print("Testing SMART Cache Mode")
+    print("=" * 60)
+    
+    # URLs for testing
+    static_url = "https://example.com"  # Rarely changes
+    dynamic_url = "https://httpbin.org/uuid"  # Changes every request
+    
+    async with AsyncWebCrawler(verbose=True) as crawler:
+        
+        # Test 1: Initial crawl with caching enabled
+        print("\n1️⃣ Initial crawl with ENABLED cache mode")
+        print("-" * 40)
+        
+        # Crawl static URL
+        config_static = CrawlerRunConfig(
+            cache_mode=CacheMode.ENABLED,
+            verbose=True
+        )
+        result_static_1 = await crawler.arun(url=static_url, config=config_static)
+        print(f"✓ Static URL crawled: {len(result_static_1.html)} characters")
+        print(f"  Response headers: {list((result_static_1.response_headers or {}).keys())[:5]}...")
+        
+        # Crawl dynamic URL
+        config_dynamic = CrawlerRunConfig(
+            cache_mode=CacheMode.ENABLED,
+            verbose=True
+        )
+        result_dynamic_1 = await crawler.arun(url=dynamic_url, config=config_dynamic)
+        print(f"✓ Dynamic URL crawled: {len(result_dynamic_1.html)} characters")
+        dynamic_content_1 = result_dynamic_1.html
+        
+        # Wait a bit
+        await asyncio.sleep(2)
+        
+        # Test 2: Re-crawl static URL with SMART mode
+        print("\n2️⃣ Re-crawl static URL with SMART cache mode")
+        print("-" * 40)
+        
+        config_smart = CrawlerRunConfig(
+            cache_mode=CacheMode.SMART,  # Revalidate cached content with a HEAD request before reuse
+            verbose=True
+        )
+        
+        start_time = time.time()
+        result_static_2 = await crawler.arun(url=static_url, config=config_smart)
+        elapsed = time.time() - start_time
+        
+        print(f"✓ Static URL with SMART mode completed in {elapsed:.2f}s")
+        print("  Should use cache (content unchanged)")
+        print(f"  HTML length: {len(result_static_2.html)} characters")
+        
+        # Test 3: Re-crawl dynamic URL with SMART mode
+        print("\n3️⃣ Re-crawl dynamic URL with SMART cache mode")
+        print("-" * 40)
+        
+        start_time = time.time()
+        result_dynamic_2 = await crawler.arun(url=dynamic_url, config=config_smart)
+        elapsed = time.time() - start_time
+        dynamic_content_2 = result_dynamic_2.html
+        
+        print(f"✓ Dynamic URL with SMART mode completed in {elapsed:.2f}s")
+        print("  Should re-crawl (content changes every request)")
+        print(f"  HTML length: {len(result_dynamic_2.html)} characters")
+        print(f"  Content changed: {dynamic_content_1 != dynamic_content_2}")
+        
+        # Test 4: Test with a news website (content changes frequently)
+        print("\n4️⃣ Testing with news website")
+        print("-" * 40)
+        
+        news_url = "https://news.ycombinator.com"
+        
+        # First crawl
+        result_news_1 = await crawler.arun(
+            url=news_url, 
+            config=CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
+        )
+        print(f"✓ News site initial crawl: {len(result_news_1.html)} characters")
+        
+        # Wait a bit
+        await asyncio.sleep(5)
+        
+        # Re-crawl with SMART mode
+        start_time = time.time()
+        result_news_2 = await crawler.arun(
+            url=news_url,
+            config=CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        )
+        elapsed = time.time() - start_time
+        
+        print(f"✓ News site SMART mode completed in {elapsed:.2f}s")
+        print(f"  Content length changed: {len(result_news_1.html) != len(result_news_2.html)}")
+        
+        # Summary
+        print("\n" + "=" * 60)
+        print("Summary")
+        print("=" * 60)
+        print("✅ SMART cache mode should:")
+        print("   - Use cache for static content (example.com)")
+        print("   - Re-crawl dynamic content (httpbin.org/uuid)")
+        print("   - Make intelligent decisions based on HEAD requests")
+        print("   - Save bandwidth on unchanged content")
+
+
+async def test_smart_cache_edge_cases():
+    """Test edge cases for SMART cache mode"""
+    
+    print("\n" + "=" * 60)
+    print("Testing SMART Cache Mode Edge Cases")
+    print("=" * 60)
+    
+    async with AsyncWebCrawler(verbose=True) as crawler:
+        
+        # Test with URL that doesn't support HEAD
+        print("\n🔧 Testing URL with potential HEAD issues")
+        print("-" * 40)
+        
+        # Some servers don't handle HEAD well
+        problematic_url = "https://httpbin.org/status/200"
+        
+        # Initial crawl
+        await crawler.arun(
+            url=problematic_url,
+            config=CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
+        )
+        
+        # Try SMART mode
+        result = await crawler.arun(
+            url=problematic_url,
+            config=CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        )
+        print(f"✓ Handled potentially problematic URL: {result.success}")
+        
+        # Test with URL that has no caching headers
+        print("\n🔧 Testing URL with no cache headers")
+        print("-" * 40)
+        
+        no_cache_url = "https://httpbin.org/html"
+        
+        # Initial crawl
+        await crawler.arun(
+            url=no_cache_url,
+            config=CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
+        )
+        
+        # SMART mode should handle gracefully
+        result = await crawler.arun(
+            url=no_cache_url,
+            config=CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        )
+        print(f"✓ Handled URL with no cache headers: {result.success}")
+
+
+async def main():
+    """Run all tests"""
+    try:
+        # Run main test
+        await test_smart_cache_mode()
+        
+        # Run edge case tests
+        await test_smart_cache_edge_cases()
+        
+        print("\n✨ All tests completed!")
+        
+    except Exception as e:
+        print(f"\n❌ Error during testing: {e}")
+        import traceback
+        traceback.print_exc()
+
+
+if __name__ == "__main__":
+    # Note: This test requires a crawl4ai version that provides CacheMode.SMART
+    print("⚠️  Note: This test requires CacheMode.SMART")
+    print("⚠️  Older crawl4ai versions without it will raise AttributeError\n")
+    
+    asyncio.run(main())
--- a/tests/validity/test_smart_cache_simple.py
+++ b/tests/validity/test_smart_cache_simple.py
@@ -0,0 +1,69 @@
+"""
+Simple test for SMART cache mode functionality.
+"""
+
+import sys
+import os
+sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))
+
+import asyncio
+from crawl4ai import AsyncWebCrawler
+from crawl4ai.async_configs import CrawlerRunConfig
+from crawl4ai.cache_context import CacheMode
+import time
+
+
+async def test_smart_cache():
+    """Test SMART cache mode with a simple example"""
+    
+    print("Testing SMART Cache Mode")
+    print("-" * 40)
+    
+    # Test URL
+    url = "https://example.com"
+    
+    async with AsyncWebCrawler(verbose=True) as crawler:
+        # First crawl with normal caching
+        print("\n1. Initial crawl with ENABLED mode:")
+        config1 = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
+        result1 = await crawler.arun(url=url, config=config1)
+        print(f"   Crawled: {len(result1.html)} characters")
+        print(f"   Headers: {list((result1.response_headers or {}).keys())[:3]}...")
+        
+        # Wait a moment
+        await asyncio.sleep(2)
+        
+        # Re-crawl with SMART mode
+        print("\n2. Re-crawl with SMART mode:")
+        config2 = CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        start = time.time()
+        result2 = await crawler.arun(url=url, config=config2)
+        elapsed = time.time() - start
+        
+        print(f"   Time: {elapsed:.2f}s")
+        print(f"   Result: {len(result2.html)} characters")
+        print("   Should use cache (content unchanged)")
+        
+        # Test with dynamic content
+        print("\n3. Testing with dynamic URL:")
+        dynamic_url = "https://httpbin.org/uuid"
+        
+        # First crawl
+        config3 = CrawlerRunConfig(cache_mode=CacheMode.ENABLED)
+        result3 = await crawler.arun(url=dynamic_url, config=config3)
+        content1 = result3.html
+        
+        # Re-crawl with SMART
+        config4 = CrawlerRunConfig(cache_mode=CacheMode.SMART)
+        result4 = await crawler.arun(url=dynamic_url, config=config4)
+        content2 = result4.html
+        
+        print(f"   Content changed: {content1 != content2}")
+        print("   Should re-crawl (dynamic content)")
+
+
+if __name__ == "__main__":
+    print(f"Python path: {sys.path[0]}")
+    print(f"CacheMode values: {[e.value for e in CacheMode]}")
+    print()
+    asyncio.run(test_smart_cache())
Author	SHA1	Message	Date
UncleCode	8a906fcad0	fix(dependencies): Update and clean up package versions in pyproject.toml, the bundle size will be much smaller.	2025-07-29 19:56:27 +08:00
UncleCode	54ae10d957	feat(extraction_strategy): Enhance schema generation with improved validation and task description handling fix(prompts): Update GENERATE_SCRIPT_PROMPT to raw string for better formatting docs: Add missing import for GENERATE_SCRIPT_PROMPT in hello_world example	2025-07-29 19:33:36 +08:00
UncleCode	843457a9cb	Refactor adaptive crawling state management - Renamed `CrawlState` to `AdaptiveCrawlResult` to better reflect its purpose. - Updated all references to `CrawlState` in the codebase, including method signatures and documentation. - Modified the `AdaptiveCrawler` class to initialize and manage the new `AdaptiveCrawlResult` state. - Adjusted example strategies and documentation to align with the new state class. - Ensured all tests are updated to use `AdaptiveCrawlResult` instead of `CrawlState`.	2025-07-24 20:11:43 +08:00
UncleCode	d1de82a332	feat(crawl4ai): Implement SMART cache mode This commit introduces a new cache mode, SMART, to the crawl4ai library. The SMART mode intelligently validates cached content using HEAD requests before using it, saving significant bandwidth while ensuring fresh content. The changes include modifications to the async_webcrawler.py, cache_context.py, and utils.py files in the crawl4ai directory. The async_webcrawler.py file now includes a check for the SMART cache mode and performs a HEAD check to see if the content has changed. If the content has changed, the url is re-crawled; otherwise, the cached result is used. The cache_context.py and utils.py files have been updated to support these changes. The documentation has also been updated to reflect these changes. The cache-modes.md file now includes a detailed explanation of the SMART mode, its logs, limitations, and an advanced example. The examples.md file now includes a link to the SMART Cache Mode example. The quickstart.md file now mentions the SMART mode in the note about cache modes. These changes improve the efficiency of the crawl4ai library by reducing unnecessary re-crawling and bandwidth usage. BREAKING CHANGE: The introduction of the SMART cache mode may affect existing code that uses the crawl4ai library and does not expect this new mode. Users should review the updated documentation to understand how to use this new mode.	2025-07-21 21:19:37 +08:00
UncleCode	8a04351406	feat(crawl4ai): Update to version 0.7.1 with improvements and new tests This commit includes several updates to the crawl4ai package, including changes to the browser manager and content scraping strategy. The version number has been updated to 0.7.1. Significant modifications have been made to the documentation, including updates to the release notes for version 0.7.0 and the addition of release notes for version 0.7.1. Examples and core documentation have also been updated to reflect the changes in this version. Additionally, a new simple API test has been added to the Docker tests. These changes were made to improve the functionality of the crawl4ai package and to provide clearer, more up-to-date documentation for users. The new test will help ensure the API is working as expected. BREAKING CHANGE: The updates to the browser manager and content scraping strategy may affect how these components interact with the rest of the package. Users should review the updated documentation for details on these changes.	2025-07-18 16:27:19 +08:00
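The SMART cache mode described in commit d1de82a332 reuses a cached page only after a HEAD request suggests the content is unchanged. The sketch below illustrates that decision in plain Python; it is not crawl4ai's implementation. The names `needs_recrawl` and `fetch_head_headers` are hypothetical, and the order in which validators are compared (ETag, then Last-Modified, then Content-Length) is an assumption based on standard HTTP validator semantics rather than something the commit specifies.

```python
import asyncio
import urllib.request
from typing import Mapping, Optional


async def fetch_head_headers(url: str, timeout: float = 10.0) -> dict:
    """Hypothetical helper: send a HEAD request off the event loop and return its headers."""
    def _head() -> dict:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return dict(resp.headers.items())
    # urllib is blocking, so run it in a worker thread (Python 3.9+).
    return await asyncio.to_thread(_head)


def needs_recrawl(cached: Optional[Mapping[str, str]],
                  fresh: Optional[Mapping[str, str]]) -> bool:
    """Hypothetical helper: True if the page should be re-crawled, False if the cache can be reused.

    Assumption: validators are compared in the order ETag, Last-Modified,
    Content-Length. When nothing comparable exists, it re-crawls to be safe.
    """
    if not cached or not fresh:
        return True  # no cached headers or no HEAD response: fetch fresh content

    # HTTP header names are case-insensitive, so normalize before comparing.
    old = {k.lower(): v for k, v in cached.items()}
    new = {k.lower(): v for k, v in fresh.items()}

    # ETag identifies a specific representation of the resource.
    if "etag" in old and "etag" in new:
        return old["etag"] != new["etag"]

    # Last-Modified is a coarser, timestamp-based validator.
    if "last-modified" in old and "last-modified" in new:
        return old["last-modified"] != new["last-modified"]

    # A different Content-Length proves a change; an equal one proves nothing.
    if "content-length" in old and "content-length" in new:
        if old["content-length"] != new["content-length"]:
            return True

    # No usable validator: the cache cannot be shown to be fresh.
    return True


if __name__ == "__main__":
    cached = {"ETag": '"abc123"'}
    print(needs_recrawl(cached, {"etag": '"abc123"'}))  # False: reuse the cache
    print(needs_recrawl(cached, {"ETag": '"def456"'}))  # True: content changed
```

In a SMART-style flow, a caller would wrap `fetch_head_headers` in a try/except and treat any failure (timeouts, or servers that handle HEAD badly, as the edge-case test anticipates) as a reason to re-crawl rather than trust the cache.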