Merge branch 'release/v0.7.5' into develop

58  README.md
@@ -27,11 +27,13 @@
 Crawl4AI turns the web into clean, LLM ready Markdown for RAG, agents, and data pipelines. Fast, controllable, battle tested by a 50k+ star community.

-[✨ Check out latest update v0.7.4](#-recent-updates)
+[✨ Check out latest update v0.7.5](#-recent-updates)

-✨ New in v0.7.4: Revolutionary LLM Table Extraction with intelligent chunking, enhanced concurrency fixes, memory management refactor, and critical stability improvements. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.4.md)
+✨ New in v0.7.5: Docker Hooks System with function-based API for pipeline customization, Enhanced LLM Integration with custom providers, HTTPS Preservation, and multiple community-reported bug fixes. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.5.md)

-✨ Recent v0.7.3: Undetected Browser Support, Multi-URL Configurations, Memory Monitoring, Enhanced Table Extraction, GitHub Sponsors. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.3.md)
+✨ Recent v0.7.4: Revolutionary LLM Table Extraction with intelligent chunking, enhanced concurrency fixes, memory management refactor, and critical stability improvements. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.4.md)
+
+✨ Previous v0.7.3: Undetected Browser Support, Multi-URL Configurations, Memory Monitoring, Enhanced Table Extraction, GitHub Sponsors. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.3.md)

 <details>
 <summary>🤓 <strong>My Personal Story</strong></summary>
@@ -177,7 +179,7 @@ No rate-limited APIs. No lock-in. Build and own your data pipeline with direct g

 - 📸 **Screenshots**: Capture page screenshots during crawling for debugging or analysis.
 - 📂 **Raw Data Crawling**: Directly process raw HTML (`raw:`) or local files (`file://`).
 - 🔗 **Comprehensive Link Extraction**: Extracts internal and external links, and embedded iframe content.
-- 🛠️ **Customizable Hooks**: Define hooks at every step to customize crawling behavior.
+- 🛠️ **Customizable Hooks**: Define hooks at every step to customize crawling behavior (supports both string and function-based APIs).
 - 💾 **Caching**: Cache data for improved speed and to avoid redundant fetches.
 - 📄 **Metadata Extraction**: Retrieve structured metadata from web pages.
 - 📡 **IFrame Content Extraction**: Seamless extraction from embedded iframe content.
@@ -544,6 +546,54 @@ async def test_news_crawl():

## ✨ Recent Updates

<details>
<summary><strong>Version 0.7.5 Release Highlights - The Docker Hooks & Security Update</strong></summary>

- **🔧 Docker Hooks System**: Complete pipeline customization with user-provided Python functions at 8 key points
- **✨ Function-Based Hooks API (NEW)**: Write hooks as regular Python functions with full IDE support:

  ```python
  from crawl4ai import hooks_to_string
  from crawl4ai.docker_client import Crawl4aiDockerClient

  # Define hooks as regular Python functions
  async def on_page_context_created(page, context, **kwargs):
      """Block images to speed up crawling"""
      await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
      await page.set_viewport_size({"width": 1920, "height": 1080})
      return page

  async def before_goto(page, context, url, **kwargs):
      """Add custom headers"""
      await page.set_extra_http_headers({'X-Crawl4AI': 'v0.7.5'})
      return page

  # Option 1: Use the hooks_to_string() utility for the REST API
  hooks_code = hooks_to_string({
      "on_page_context_created": on_page_context_created,
      "before_goto": before_goto
  })

  # Option 2: Docker client with automatic conversion (recommended)
  client = Crawl4aiDockerClient(base_url="http://localhost:11235")
  results = await client.crawl(
      urls=["https://httpbin.org/html"],
      hooks={
          "on_page_context_created": on_page_context_created,
          "before_goto": before_goto
      }
  )
  # ✓ Full IDE support, type checking, and reusability!
  ```

- **🤖 Enhanced LLM Integration**: Custom providers with temperature control and base_url configuration
- **🔒 HTTPS Preservation**: Secure internal link handling with `preserve_https_for_internal_links=True`
- **🐍 Python 3.10+ Support**: Modern language features and enhanced performance
- **🛠️ Bug Fixes**: Resolved multiple community-reported issues including URL processing, JWT authentication, and proxy configuration

[Full v0.7.5 Release Notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.5.md)

</details>

<details>
<summary><strong>Version 0.7.4 Release Highlights - The Intelligent Table Extraction & Performance Update</strong></summary>
@@ -1,7 +1,7 @@
 # crawl4ai/__version__.py

 # This is the version that will be used for stable releases
-__version__ = "0.7.4"
+__version__ = "0.7.5"

 # For nightly builds, this gets set during build process
 __nightly_version__ = None
@@ -10,7 +10,6 @@ Today I'm releasing Crawl4AI v0.7.4—the Intelligent Table Extraction & Perform

- **🚀 LLMTableExtraction**: Revolutionary table extraction with intelligent chunking for massive tables
- **⚡ Enhanced Concurrency**: True concurrency improvements for fast-completing tasks in batch operations
- **🧹 Memory Management Refactor**: Streamlined memory utilities and better resource management
- **🔧 Browser Manager Fixes**: Resolved race conditions in concurrent page creation
- **⌨️ Cross-Platform Browser Profiler**: Improved keyboard handling and quit mechanisms
- **🔗 Advanced URL Processing**: Better handling of raw URLs and base tag link resolution
@@ -158,40 +157,6 @@ async with AsyncWebCrawler() as crawler:

- **Monitoring Systems**: Faster health checks and status page monitoring
- **Data Aggregation**: Improved performance for real-time data collection

## 🧹 Memory Management Refactor: Cleaner Architecture

**The Problem:** Memory utilities were scattered and difficult to maintain, with potential import conflicts and unclear organization.

**My Solution:** I consolidated all memory-related utilities into the main `utils.py` module, creating a cleaner, more maintainable architecture.

### Improved Memory Handling

```python
# All memory utilities now consolidated
from crawl4ai.utils import get_true_memory_usage_percent, MemoryMonitor

# Enhanced memory monitoring
monitor = MemoryMonitor()
monitor.start_monitoring()

async with AsyncWebCrawler() as crawler:
    # Memory-efficient batch processing
    results = await crawler.arun_many(large_url_list)

# Get accurate memory metrics
memory_usage = get_true_memory_usage_percent()
memory_report = monitor.get_report()

print(f"Memory efficiency: {memory_report['efficiency']:.1f}%")
print(f"Peak usage: {memory_report['peak_mb']:.1f} MB")
```
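The monitoring pattern above depends on crawl4ai internals. The same idea can be shown self-contained with the standard library; `SimpleMemoryMonitor` below is a hypothetical stand-in that tracks Python allocations via `tracemalloc`, whereas crawl4ai's `MemoryMonitor` reports process-level memory:

```python
import tracemalloc

class SimpleMemoryMonitor:
    """Track current and peak Python allocation during a workload."""

    def start_monitoring(self) -> None:
        tracemalloc.start()

    def get_report(self) -> dict:
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return {"current_mb": current / 1e6, "peak_mb": peak / 1e6}

monitor = SimpleMemoryMonitor()
monitor.start_monitoring()

data = [bytes(1_000_000) for _ in range(20)]  # allocate roughly 20 MB

report = monitor.get_report()
print(f"Peak usage: {report['peak_mb']:.1f} MB")
```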
**Expected Real-World Impact:**

- **Production Stability**: More reliable memory tracking and management
- **Code Maintainability**: Cleaner architecture for easier debugging
- **Import Clarity**: Resolved potential conflicts and import issues
- **Developer Experience**: Simpler API for memory monitoring

## 🔧 Critical Stability Fixes

### Browser Manager Race Condition Resolution

318  docs/blog/release-v0.7.5.md  Normal file
@@ -0,0 +1,318 @@
# 🚀 Crawl4AI v0.7.5: The Docker Hooks & Security Update

*September 29, 2025 • 8 min read*

---

Today I'm releasing Crawl4AI v0.7.5—focused on extensibility and security. This update introduces the Docker Hooks System for pipeline customization, enhanced LLM integration, and important security improvements.

## 🎯 What's New at a Glance

- **Docker Hooks System**: Custom Python functions at key pipeline points with a function-based API
- **Function-Based Hooks**: New `hooks_to_string()` utility with Docker client auto-conversion
- **Enhanced LLM Integration**: Custom providers with temperature control
- **HTTPS Preservation**: Secure internal link handling
- **Bug Fixes**: Resolved multiple community-reported issues
- **Improved Docker Error Handling**: Better debugging and reliability
## 🔧 Docker Hooks System: Pipeline Customization

Every scraping project needs custom logic—authentication, performance optimization, content processing. Traditional solutions require forking or complex workarounds. Docker Hooks let you inject custom Python functions at 8 key points in the crawling pipeline.

### Real Example: Authentication & Performance

```python
import requests

# Real working hooks for httpbin.org
hooks_config = {
    "on_page_context_created": """
async def hook(page, context, **kwargs):
    print("Hook: Setting up page context")
    # Block images to speed up crawling
    await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
    print("Hook: Images blocked")
    return page
""",

    "before_retrieve_html": """
async def hook(page, context, **kwargs):
    print("Hook: Before retrieving HTML")
    # Scroll to bottom to load lazy content
    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    await page.wait_for_timeout(1000)
    print("Hook: Scrolled to bottom")
    return page
""",

    "before_goto": """
async def hook(page, context, url, **kwargs):
    print(f"Hook: About to navigate to {url}")
    # Add custom headers
    await page.set_extra_http_headers({
        'X-Test-Header': 'crawl4ai-hooks-test'
    })
    return page
"""
}

# Test with the Docker API
payload = {
    "urls": ["https://httpbin.org/html"],
    "hooks": {
        "code": hooks_config,
        "timeout": 30
    }
}

response = requests.post("http://localhost:11235/crawl", json=payload)
result = response.json()

if result.get('success'):
    print("✅ Hooks executed successfully!")
    print(f"Content length: {len(result.get('markdown', ''))} characters")
```
**Available Hook Points:**

- `on_browser_created`: Browser setup
- `on_page_context_created`: Page context configuration
- `before_goto`: Pre-navigation setup
- `after_goto`: Post-navigation processing
- `on_user_agent_updated`: User agent changes
- `on_execution_started`: Crawl initialization
- `before_retrieve_html`: Pre-extraction processing
- `before_return_html`: Final HTML processing
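To make the string-based contract above concrete, here is a minimal, self-contained sketch of how a hook string can be executed. `run_hook` is a hypothetical illustration of the general `exec`-then-`await` pattern, not crawl4ai's actual server runner (which also sandboxes hooks and enforces the configured timeout):

```python
import asyncio

hook_src = """
async def hook(page, context, **kwargs):
    # Pretend to configure the page and report back
    return f"configured {page}"
"""

async def run_hook(src: str, *args, **kwargs):
    ns: dict = {}
    exec(src, ns)                      # define `hook` from the user-supplied source
    return await ns["hook"](*args, **kwargs)

result = asyncio.run(run_hook(hook_src, "page-1", None))
print(result)  # configured page-1
```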
### Function-Based Hooks API

Writing hooks as strings works, but lacks IDE support and type checking. v0.7.5 introduces a function-based approach with automatic conversion!

**Option 1: Using the `hooks_to_string()` Utility**

```python
import requests

from crawl4ai import hooks_to_string

# Define hooks as regular Python functions (with full IDE support!)
async def on_page_context_created(page, context, **kwargs):
    """Block images to speed up crawling"""
    await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
    await page.set_viewport_size({"width": 1920, "height": 1080})
    return page

async def before_goto(page, context, url, **kwargs):
    """Add custom headers"""
    await page.set_extra_http_headers({
        'X-Crawl4AI': 'v0.7.5',
        'X-Custom-Header': 'my-value'
    })
    return page

# Convert functions to strings
hooks_code = hooks_to_string({
    "on_page_context_created": on_page_context_created,
    "before_goto": before_goto
})

# Use with the REST API
payload = {
    "urls": ["https://httpbin.org/html"],
    "hooks": {"code": hooks_code, "timeout": 30}
}
response = requests.post("http://localhost:11235/crawl", json=payload)
```
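Under the hood, converting a function to a string plausibly amounts to grabbing its source and renaming it to the `hook` entry point the server expects. The helper below (`hooks_to_source`, a hypothetical name) sketches that idea with `inspect`; it is not crawl4ai's actual `hooks_to_string` implementation:

```python
import inspect
import textwrap

def hooks_to_source(hooks: dict) -> dict:
    """Turn {hook_name: function} into {hook_name: source string}."""
    out = {}
    for name, fn in hooks.items():
        src = textwrap.dedent(inspect.getsource(fn))
        # The server-side runner expects the entry point to be named `hook`
        out[name] = src.replace(f"async def {fn.__name__}", "async def hook", 1)
    return out

async def before_goto(page, context, url, **kwargs):
    await page.set_extra_http_headers({'X-Crawl4AI': 'v0.7.5'})
    return page

converted = hooks_to_source({"before_goto": before_goto})
print(converted["before_goto"].splitlines()[0])
```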

**Option 2: Docker Client with Automatic Conversion (Recommended!)**

```python
from crawl4ai.docker_client import Crawl4aiDockerClient

# Define hooks as functions (same as above)
async def on_page_context_created(page, context, **kwargs):
    await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
    return page

async def before_retrieve_html(page, context, **kwargs):
    # Scroll to load lazy content
    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    await page.wait_for_timeout(1000)
    return page

# Use the Docker client - conversion happens automatically!
client = Crawl4aiDockerClient(base_url="http://localhost:11235")

results = await client.crawl(
    urls=["https://httpbin.org/html"],
    hooks={
        "on_page_context_created": on_page_context_created,
        "before_retrieve_html": before_retrieve_html
    },
    hooks_timeout=30
)

if results and results.success:
    print(f"✅ Hooks executed! HTML length: {len(results.html)}")
```

**Benefits of Function-Based Hooks:**

- ✅ Full IDE support (autocomplete, syntax highlighting)
- ✅ Type checking and linting
- ✅ Easier to test and debug
- ✅ Reusable across projects
- ✅ Automatic conversion in the Docker client
- ✅ No breaking changes - string hooks still work!
## 🤖 Enhanced LLM Integration

Enhanced LLM integration with custom providers, temperature control, and base URL configuration.

### Multi-Provider Support

```python
import requests

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Test with different providers
async def test_llm_providers():
    # Gemini with custom temperature
    gemini_strategy = LLMExtractionStrategy(
        provider="gemini/gemini-2.5-flash-lite",
        api_token="your-api-token",
        temperature=0.7,  # New in v0.7.5
        instruction="Summarize this page in one sentence"
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            "https://example.com",
            config=CrawlerRunConfig(extraction_strategy=gemini_strategy)
        )

        if result.success:
            print("✅ LLM extraction completed")
            print(result.extracted_content)

# Docker API with enhanced LLM config
llm_payload = {
    "url": "https://example.com",
    "f": "llm",
    "q": "Summarize this page in one sentence.",
    "provider": "gemini/gemini-2.5-flash-lite",
    "temperature": 0.7
}

response = requests.post("http://localhost:11235/md", json=llm_payload)
```

**New Features:**

- Custom `temperature` parameter for creativity control
- `base_url` for custom API endpoints
- Multi-provider environment variable support
- Docker API integration
## 🔒 HTTPS Preservation

**The Problem:** Modern web apps require HTTPS everywhere. When crawlers downgrade internal links from HTTPS to HTTP, authentication breaks and security warnings appear.

**Solution:** HTTPS preservation maintains secure protocols throughout crawling.

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, FilterChain, URLPatternFilter, BFSDeepCrawlStrategy

async def test_https_preservation():
    # Enable HTTPS preservation
    url_filter = URLPatternFilter(
        patterns=[r"^(https://)?quotes\.toscrape\.com(/.*)?$"]
    )

    config = CrawlerRunConfig(
        exclude_external_links=True,
        preserve_https_for_internal_links=True,  # New in v0.7.5
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=2,
            max_pages=5,
            filter_chain=FilterChain([url_filter])
        )
    )

    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun(
            url="https://quotes.toscrape.com",
            config=config
        ):
            # All internal links maintain HTTPS
            internal_links = [link['href'] for link in result.links['internal']]
            https_links = [link for link in internal_links if link.startswith('https://')]

            print(f"HTTPS links preserved: {len(https_links)}/{len(internal_links)}")
            for link in https_links[:3]:
                print(f"  → {link}")
```
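Conceptually, the flag rewrites same-host `http://` links back to `https://` when the page itself was served over HTTPS. A self-contained sketch of that rule (not crawl4ai's internal code) using only the standard library:

```python
from urllib.parse import urlsplit, urlunsplit

def upgrade_internal_link(link: str, page_url: str) -> str:
    """Keep same-host links on HTTPS when the page itself is HTTPS."""
    page, target = urlsplit(page_url), urlsplit(link)
    if page.scheme == "https" and target.scheme == "http" and target.netloc == page.netloc:
        # Same host, downgraded scheme: restore HTTPS
        return urlunsplit(("https",) + tuple(target)[1:])
    return link

print(upgrade_internal_link("http://quotes.toscrape.com/page/2/",
                            "https://quotes.toscrape.com"))
# https://quotes.toscrape.com/page/2/
```

External links and already-secure links pass through unchanged, so only same-host downgrades are touched.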

## 🛠️ Bug Fixes and Improvements

### Major Fixes

- **URL Processing**: Fixed '+' sign preservation in query parameters (#1332)
- **Proxy Configuration**: Enhanced proxy string parsing (old `proxy` parameter deprecated)
- **Docker Error Handling**: Comprehensive error messages with status codes
- **Memory Management**: Fixed leaks in long-running sessions
- **JWT Authentication**: Fixed Docker JWT validation issues (#1442)
- **Playwright Stealth**: Fixed stealth features for Playwright integration (#1481)
- **API Configuration**: Fixed config handling to prevent overriding user-provided settings (#1505)
- **Docker Filter Serialization**: Resolved JSON encoding errors in deep crawl strategy (#1419)
- **LLM Provider Support**: Fixed custom LLM provider integration for the adaptive crawler (#1291)
- **Performance Issues**: Resolved backoff strategy failures and timeout handling (#989)
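The '+' fix (#1332) concerns a classic URL-encoding trap: in a query string a bare `+` decodes to a space, so a literal plus must survive as `%2B`. A quick stdlib illustration of the behavior the fix guards against (not the crawl4ai patch itself):

```python
from urllib.parse import parse_qsl, quote_plus, urlsplit

# Encoding: a literal '+' must become %2B, or it will later decode as a space.
print(quote_plus("c++"))  # c%2B%2B

# Decoding: %2B round-trips back to '+'.
url = "https://example.com/search?q=c%2B%2B&page=1"
params = dict(parse_qsl(urlsplit(url).query))
print(params["q"])  # c++

# A bare '+' is lossy: it turns into a space.
lossy = dict(parse_qsl("q=c++"))
print(repr(lossy["q"]))  # 'c  '
```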

### Community-Reported Issues Fixed

This release addresses multiple issues reported by the community through GitHub issues and Discord discussions:

- Fixed browser configuration reference errors
- Resolved dependency conflicts with cssselect
- Improved error messaging for failed authentications
- Enhanced compatibility with various proxy configurations
- Fixed edge cases in URL normalization
### Configuration Updates

```python
from crawl4ai import BrowserConfig

# Old proxy config (deprecated)
# browser_config = BrowserConfig(proxy="http://proxy:8080")

# New enhanced proxy config
browser_config = BrowserConfig(
    proxy_config={
        "server": "http://proxy:8080",
        "username": "optional-user",
        "password": "optional-pass"
    }
)
```
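Migrating from the deprecated string form can be mechanical. This hypothetical helper (`parse_proxy_string` is not a crawl4ai API) shows one way to split a legacy proxy URL into the new `proxy_config` fields:

```python
from urllib.parse import urlsplit

def parse_proxy_string(proxy: str) -> dict:
    """Convert a legacy proxy URL into the proxy_config structure."""
    p = urlsplit(proxy)
    config = {"server": f"{p.scheme}://{p.hostname}:{p.port}"}
    if p.username:
        config["username"] = p.username
    if p.password:
        config["password"] = p.password
    return config

print(parse_proxy_string("http://user:secret@proxy:8080"))
# {'server': 'http://proxy:8080', 'username': 'user', 'password': 'secret'}
```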

## 🔄 Breaking Changes

1. **Python 3.10+ Required**: Upgrade from Python 3.9
2. **Proxy Parameter Deprecated**: Use the new `proxy_config` structure
3. **New Dependency**: Added `cssselect` for better CSS handling
## 🚀 Get Started

```bash
# Install the latest version
pip install crawl4ai==0.7.5

# Docker deployment
docker pull unclecode/crawl4ai:latest
docker run -p 11235:11235 unclecode/crawl4ai:latest
```

**Try the Demo:**

```bash
# Run working examples
python docs/releases_review/demo_v0.7.5.py
```

**Resources:**

- 📖 Documentation: [docs.crawl4ai.com](https://docs.crawl4ai.com)
- 🐙 GitHub: [github.com/unclecode/crawl4ai](https://github.com/unclecode/crawl4ai)
- 💬 Discord: [discord.gg/crawl4ai](https://discord.gg/jP8KfhDhyN)
- 🐦 Twitter: [@unclecode](https://x.com/unclecode)

Happy crawling! 🕷️
@@ -20,17 +20,26 @@ Ever wondered why your AI coding assistant struggles with your library despite c

## Latest Release

### [Crawl4AI v0.7.5 – The Docker Hooks & Security Update](../blog/release-v0.7.5.md)
*September 29, 2025*

Crawl4AI v0.7.5 introduces the powerful Docker Hooks System for complete pipeline customization, enhanced LLM integration with custom providers, HTTPS preservation for modern web security, and resolves multiple community-reported issues.

Key highlights:

- **🔧 Docker Hooks System**: Custom Python functions at 8 key pipeline points for unprecedented customization
- **🤖 Enhanced LLM Integration**: Custom providers with temperature control and base_url configuration
- **🔒 HTTPS Preservation**: Secure internal link handling for modern web applications
- **🐍 Python 3.10+ Support**: Modern language features and enhanced performance
- **🛠️ Bug Fixes**: Resolved multiple community-reported issues including URL processing, JWT authentication, and proxy configuration

[Read full release notes →](../blog/release-v0.7.5.md)

## Recent Releases

### [Crawl4AI v0.7.4 – The Intelligent Table Extraction & Performance Update](../blog/release-v0.7.4.md)
*August 17, 2025*

Crawl4AI v0.7.4 introduces revolutionary LLM-powered table extraction with intelligent chunking, performance improvements for concurrent crawling, enhanced browser management, and critical stability fixes that make Crawl4AI more robust for production workloads.

Key highlights:

- **🚀 LLMTableExtraction**: Revolutionary table extraction with intelligent chunking for massive tables
- **⚡ Dispatcher Bug Fix**: Fixed sequential processing issue in arun_many for fast-completing tasks
- **🧹 Memory Management Refactor**: Streamlined memory utilities and better resource management
- **🔧 Browser Manager Fixes**: Resolved race conditions in concurrent page creation
- **🔗 Advanced URL Processing**: Better handling of raw URLs and base tag link resolution

[Read full release notes →](../blog/release-v0.7.4.md)
318
docs/md_v2/blog/releases/v0.7.5.md
Normal file
318
docs/md_v2/blog/releases/v0.7.5.md
Normal file
@@ -0,0 +1,318 @@
|
||||
# 🚀 Crawl4AI v0.7.5: The Docker Hooks & Security Update
|
||||
|
||||
*September 29, 2025 • 8 min read*
|
||||
|
||||
---
|
||||
|
||||
Today I'm releasing Crawl4AI v0.7.5—focused on extensibility and security. This update introduces the Docker Hooks System for pipeline customization, enhanced LLM integration, and important security improvements.
|
||||
|
||||
## 🎯 What's New at a Glance
|
||||
|
||||
- **Docker Hooks System**: Custom Python functions at key pipeline points with function-based API
|
||||
- **Function-Based Hooks**: New `hooks_to_string()` utility with Docker client auto-conversion
|
||||
- **Enhanced LLM Integration**: Custom providers with temperature control
|
||||
- **HTTPS Preservation**: Secure internal link handling
|
||||
- **Bug Fixes**: Resolved multiple community-reported issues
|
||||
- **Improved Docker Error Handling**: Better debugging and reliability
|
||||
|
||||
## 🔧 Docker Hooks System: Pipeline Customization
|
||||
|
||||
Every scraping project needs custom logic—authentication, performance optimization, content processing. Traditional solutions require forking or complex workarounds. Docker Hooks let you inject custom Python functions at 8 key points in the crawling pipeline.
|
||||
|
||||
### Real Example: Authentication & Performance
|
||||
|
||||
```python
|
||||
import requests
|
||||
|
||||
# Real working hooks for httpbin.org
|
||||
hooks_config = {
|
||||
"on_page_context_created": """
|
||||
async def hook(page, context, **kwargs):
|
||||
print("Hook: Setting up page context")
|
||||
# Block images to speed up crawling
|
||||
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
|
||||
print("Hook: Images blocked")
|
||||
return page
|
||||
""",
|
||||
|
||||
"before_retrieve_html": """
|
||||
async def hook(page, context, **kwargs):
|
||||
print("Hook: Before retrieving HTML")
|
||||
# Scroll to bottom to load lazy content
|
||||
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
|
||||
await page.wait_for_timeout(1000)
|
||||
print("Hook: Scrolled to bottom")
|
||||
return page
|
||||
""",
|
||||
|
||||
"before_goto": """
|
||||
async def hook(page, context, url, **kwargs):
|
||||
print(f"Hook: About to navigate to {url}")
|
||||
# Add custom headers
|
||||
await page.set_extra_http_headers({
|
||||
'X-Test-Header': 'crawl4ai-hooks-test'
|
||||
})
|
||||
return page
|
||||
"""
|
||||
}
|
||||
|
||||
# Test with Docker API
|
||||
payload = {
|
||||
"urls": ["https://httpbin.org/html"],
|
||||
"hooks": {
|
||||
"code": hooks_config,
|
||||
"timeout": 30
|
||||
}
|
||||
}
|
||||
|
||||
response = requests.post("http://localhost:11235/crawl", json=payload)
|
||||
result = response.json()
|
||||
|
||||
if result.get('success'):
|
||||
print("✅ Hooks executed successfully!")
|
||||
print(f"Content length: {len(result.get('markdown', ''))} characters")
|
||||
```
|
||||
|
||||
**Available Hook Points:**
|
||||
- `on_browser_created`: Browser setup
|
||||
- `on_page_context_created`: Page context configuration
|
||||
- `before_goto`: Pre-navigation setup
|
||||
- `after_goto`: Post-navigation processing
|
||||
- `on_user_agent_updated`: User agent changes
|
||||
- `on_execution_started`: Crawl initialization
|
||||
- `before_retrieve_html`: Pre-extraction processing
|
||||
- `before_return_html`: Final HTML processing
|
||||
|
||||
### Function-Based Hooks API
|
||||
|
||||
Writing hooks as strings works, but lacks IDE support and type checking. v0.7.5 introduces a function-based approach with automatic conversion!
|
||||
|
||||
**Option 1: Using the `hooks_to_string()` Utility**
|
||||
|
||||
```python
|
||||
from crawl4ai import hooks_to_string
|
||||
import requests
|
||||
|
||||
# Define hooks as regular Python functions (with full IDE support!)
|
||||
async def on_page_context_created(page, context, **kwargs):
|
||||
"""Block images to speed up crawling"""
|
||||
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
|
||||
await page.set_viewport_size({"width": 1920, "height": 1080})
|
||||
return page
|
||||
|
||||
async def before_goto(page, context, url, **kwargs):
|
||||
"""Add custom headers"""
|
||||
await page.set_extra_http_headers({
|
||||
'X-Crawl4AI': 'v0.7.5',
|
||||
'X-Custom-Header': 'my-value'
|
||||
})
|
||||
return page
|
||||
|
||||
# Convert functions to strings
|
||||
hooks_code = hooks_to_string({
|
||||
"on_page_context_created": on_page_context_created,
|
||||
"before_goto": before_goto
|
||||
})
|
||||
|
||||
# Use with REST API
|
||||
payload = {
|
||||
"urls": ["https://httpbin.org/html"],
|
||||
"hooks": {"code": hooks_code, "timeout": 30}
|
||||
}
|
||||
response = requests.post("http://localhost:11235/crawl", json=payload)
|
||||
```
|
||||
|
||||
**Option 2: Docker Client with Automatic Conversion (Recommended!)**
|
||||
|
||||
```python
|
||||
from crawl4ai.docker_client import Crawl4aiDockerClient
|
||||
|
||||
# Define hooks as functions (same as above)
|
||||
async def on_page_context_created(page, context, **kwargs):
|
||||
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
|
||||
return page
|
||||
|
||||
async def before_retrieve_html(page, context, **kwargs):
|
||||
# Scroll to load lazy content
|
||||
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
|
||||
await page.wait_for_timeout(1000)
|
||||
return page
|
||||
|
||||
# Use Docker client - conversion happens automatically!
|
||||
client = Crawl4aiDockerClient(base_url="http://localhost:11235")
|
||||
|
||||
results = await client.crawl(
|
||||
urls=["https://httpbin.org/html"],
|
||||
hooks={
|
||||
"on_page_context_created": on_page_context_created,
|
||||
"before_retrieve_html": before_retrieve_html
|
||||
},
|
||||
hooks_timeout=30
|
||||
)
|
||||
|
||||
if results and results.success:
|
||||
print(f"✅ Hooks executed! HTML length: {len(results.html)}")
|
||||
```
|
||||
|
||||
**Benefits of Function-Based Hooks:**
|
||||
- ✅ Full IDE support (autocomplete, syntax highlighting)
|
||||
- ✅ Type checking and linting
|
||||
- ✅ Easier to test and debug
|
||||
- ✅ Reusable across projects
|
||||
- ✅ Automatic conversion in Docker client
|
||||
- ✅ No breaking changes - string hooks still work!
|
||||
|
||||
## 🤖 Enhanced LLM Integration
|
||||
|
||||
Enhanced LLM integration with custom providers, temperature control, and base URL configuration.
|
||||
|
||||
### Multi-Provider Support
|
||||
|
||||
```python
|
||||
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
|
||||
from crawl4ai.extraction_strategy import LLMExtractionStrategy
|
||||
|
||||
# Test with different providers
|
||||
async def test_llm_providers():
|
||||
# OpenAI with custom temperature
|
||||
openai_strategy = LLMExtractionStrategy(
|
||||
provider="gemini/gemini-2.5-flash-lite",
|
||||
api_token="your-api-token",
|
||||
temperature=0.7, # New in v0.7.5
|
||||
instruction="Summarize this page in one sentence"
|
||||
)
|
||||
|
||||
async with AsyncWebCrawler() as crawler:
|
||||
result = await crawler.arun(
|
||||
"https://example.com",
|
||||
config=CrawlerRunConfig(extraction_strategy=openai_strategy)
|
||||
)
|
||||
|
||||
if result.success:
|
||||
print("✅ LLM extraction completed")
|
||||
print(result.extracted_content)
|
||||
|
||||
# Docker API with enhanced LLM config
|
||||
llm_payload = {
|
||||
"url": "https://example.com",
|
||||
"f": "llm",
|
||||
"q": "Summarize this page in one sentence.",
|
||||
"provider": "gemini/gemini-2.5-flash-lite",
|
||||
"temperature": 0.7
|
||||
}
|
||||
|
||||
response = requests.post("http://localhost:11235/md", json=llm_payload)
|
||||
```
|
||||
|
||||
**New Features:**
|
||||
- Custom `temperature` parameter for creativity control
|
||||
- `base_url` for custom API endpoints
|
||||
- Multi-provider environment variable support
|
||||
- Docker API integration
|
||||
|
||||
## 🔒 HTTPS Preservation

**The Problem:** Modern web apps require HTTPS everywhere. When crawlers downgrade internal links from HTTPS to HTTP, authentication breaks and security warnings appear.

**Solution:** HTTPS preservation maintains secure protocols throughout crawling.

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, FilterChain, URLPatternFilter, BFSDeepCrawlStrategy

async def test_https_preservation():
    # Restrict the crawl to quotes.toscrape.com
    url_filter = URLPatternFilter(
        patterns=[r"^(https://)?quotes\.toscrape\.com(/.*)?$"]
    )

    config = CrawlerRunConfig(
        exclude_external_links=True,
        stream=True,  # needed to iterate deep-crawl results with `async for`
        preserve_https_for_internal_links=True,  # New in v0.7.5
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=2,
            max_pages=5,
            filter_chain=FilterChain([url_filter])
        )
    )

    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun(
            url="https://quotes.toscrape.com",
            config=config
        ):
            # All internal links maintain HTTPS
            internal_links = [link['href'] for link in result.links['internal']]
            https_links = [link for link in internal_links if link.startswith('https://')]

            print(f"HTTPS links preserved: {len(https_links)}/{len(internal_links)}")
            for link in https_links[:3]:
                print(f"  → {link}")
```

## 🛠️ Bug Fixes and Improvements

### Major Fixes
- **URL Processing**: Fixed '+' sign preservation in query parameters (#1332)
- **Proxy Configuration**: Enhanced proxy string parsing (old `proxy` parameter deprecated)
- **Docker Error Handling**: Comprehensive error messages with status codes
- **Memory Management**: Fixed leaks in long-running sessions
- **JWT Authentication**: Fixed Docker JWT validation issues (#1442)
- **Playwright Stealth**: Fixed stealth features for Playwright integration (#1481)
- **API Configuration**: Fixed config handling to prevent overriding user-provided settings (#1505)
- **Docker Filter Serialization**: Resolved JSON encoding errors in deep crawl strategy (#1419)
- **LLM Provider Support**: Fixed custom LLM provider integration for adaptive crawler (#1291)
- **Performance Issues**: Resolved backoff strategy failures and timeout handling (#989)

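The '+' preservation fix (#1332) targets a classic encoding pitfall: in a query string a literal `+` decodes as a space, so it must be percent-encoded as `%2B` to survive a round trip. A standard-library sketch of the failure mode (illustrating the class of bug, not Crawl4AI's internal fix):

```python
from urllib.parse import parse_qs, urlencode

# A raw '+' left unencoded in a query value decodes back as spaces:
print(parse_qs("lang=C++"))   # {'lang': ['C  ']}  -- the plus signs are lost

# Percent-encoding preserves the value through the round trip:
encoded = urlencode({"lang": "C++"})
print(encoded)                # lang=C%2B%2B
print(parse_qs(encoded))      # {'lang': ['C++']}
```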
### Community-Reported Issues Fixed
This release addresses multiple issues reported by the community through GitHub issues and Discord discussions:
- Fixed browser configuration reference errors
- Resolved dependency conflicts with cssselect
- Improved error messaging for failed authentications
- Enhanced compatibility with various proxy configurations
- Fixed edge cases in URL normalization

### Configuration Updates
```python
# Old proxy config (deprecated)
# browser_config = BrowserConfig(proxy="http://proxy:8080")

# New enhanced proxy config
browser_config = BrowserConfig(
    proxy_config={
        "server": "http://proxy:8080",
        "username": "optional-user",
        "password": "optional-pass"
    }
)
```

## 🔄 Breaking Changes

1. **Python 3.10+ Required**: Upgrade from Python 3.9
2. **Proxy Parameter Deprecated**: Use the new `proxy_config` structure
3. **New Dependency**: Added `cssselect` for better CSS handling

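Since the Python floor moved to 3.10, a quick interpreter check helps fail fast before upgrading. This is a generic sketch; `check_python` is not a Crawl4AI helper:

```python
import sys

def check_python(minimum=(3, 10)):
    """True if the running interpreter meets Crawl4AI v0.7.5's new floor."""
    return sys.version_info[:2] >= minimum

print(check_python())          # True on any supported interpreter
print(check_python((3, 99)))   # False: hypothetical future floor
```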
## 🚀 Get Started

```bash
# Install latest version
pip install crawl4ai==0.7.5

# Docker deployment
docker pull unclecode/crawl4ai:latest
docker run -p 11235:11235 unclecode/crawl4ai:latest
```

**Try the Demo:**
```bash
# Run working examples
python docs/releases_review/demo_v0.7.5.py
```

**Resources:**
- 📖 Documentation: [docs.crawl4ai.com](https://docs.crawl4ai.com)
- 🐙 GitHub: [github.com/unclecode/crawl4ai](https://github.com/unclecode/crawl4ai)
- 💬 Discord: [discord.gg/crawl4ai](https://discord.gg/jP8KfhDhyN)
- 🐦 Twitter: [@unclecode](https://x.com/unclecode)

Happy crawling! 🕷️
docs/releases_review/demo_v0.7.5.py (new file, 338 lines)
@@ -0,0 +1,338 @@
"""
🚀 Crawl4AI v0.7.5 Release Demo - Working Examples
==================================================
This demo showcases key features introduced in v0.7.5 with real, executable examples.

Featured Demos:
1. ✅ Docker Hooks System - Real API calls with custom hooks (string & function-based)
2. ✅ Enhanced LLM Integration - Working LLM configurations
3. ✅ HTTPS Preservation - Live crawling with HTTPS maintenance

Requirements:
- crawl4ai v0.7.5 installed
- Docker running with crawl4ai image (optional for Docker demos)
- Valid API keys for LLM demos (optional)
"""

import asyncio
import sys
import time

import requests

from crawl4ai import (AsyncWebCrawler, CrawlerRunConfig, BrowserConfig,
                      CacheMode, FilterChain, URLPatternFilter, BFSDeepCrawlStrategy,
                      hooks_to_string)
from crawl4ai.docker_client import Crawl4aiDockerClient


def print_section(title: str, description: str = ""):
    """Print a section header"""
    print(f"\n{'=' * 60}")
    print(title)
    if description:
        print(description)
    print(f"{'=' * 60}\n")


async def demo_1_docker_hooks_system():
    """Demo 1: Docker Hooks System - Real API calls with custom hooks"""
    print_section(
        "Demo 1: Docker Hooks System",
        "Testing both string-based and function-based hooks (NEW in v0.7.5!)"
    )

    # Check Docker service availability
    def check_docker_service():
        try:
            response = requests.get("http://localhost:11235/", timeout=3)
            return response.status_code == 200
        except requests.RequestException:
            return False

    print("Checking Docker service...")
    docker_running = check_docker_service()

    if not docker_running:
        print("⚠️ Docker service not running on localhost:11235")
        print("To test Docker hooks:")
        print("1. Run: docker run -p 11235:11235 unclecode/crawl4ai:latest")
        print("2. Wait for service to start")
        print("3. Re-run this demo\n")
        return

    print("✓ Docker service detected!")

    # ========================================================================
    # PART 1: Traditional String-Based Hooks (Works with REST API)
    # ========================================================================
    print("\n" + "─" * 60)
    print("Part 1: String-Based Hooks (REST API)")
    print("─" * 60)

    hooks_config_string = {
        "on_page_context_created": """
async def hook(page, context, **kwargs):
    print("[String Hook] Setting up page context")
    await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
    return page
""",
        "before_retrieve_html": """
async def hook(page, context, **kwargs):
    print("[String Hook] Before retrieving HTML")
    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    await page.wait_for_timeout(1000)
    return page
"""
    }

    payload = {
        "urls": ["https://httpbin.org/html"],
        "hooks": {
            "code": hooks_config_string,
            "timeout": 30
        }
    }

    print("🔧 Using string-based hooks for REST API...")
    try:
        start_time = time.time()
        response = requests.post("http://localhost:11235/crawl", json=payload, timeout=60)
        execution_time = time.time() - start_time

        if response.status_code == 200:
            result = response.json()
            print(f"✅ String-based hooks executed in {execution_time:.2f}s")
            if result.get('results') and result['results'][0].get('success'):
                html_length = len(result['results'][0].get('html', ''))
                print(f"  📄 HTML length: {html_length} characters")
        else:
            print(f"❌ Request failed: {response.status_code}")
    except Exception as e:
        print(f"❌ Error: {str(e)}")

    # ========================================================================
    # PART 2: NEW Function-Based Hooks with Docker Client (v0.7.5)
    # ========================================================================
    print("\n" + "─" * 60)
    print("Part 2: Function-Based Hooks with Docker Client (✨ NEW!)")
    print("─" * 60)

    # Define hooks as regular Python functions
    async def on_page_context_created_func(page, context, **kwargs):
        """Block images to speed up crawling"""
        print("[Function Hook] Setting up page context")
        await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
        await page.set_viewport_size({"width": 1920, "height": 1080})
        return page

    async def before_goto_func(page, context, url, **kwargs):
        """Add custom headers before navigation"""
        print(f"[Function Hook] About to navigate to {url}")
        await page.set_extra_http_headers({
            'X-Crawl4AI': 'v0.7.5-function-hooks',
            'X-Test-Header': 'demo'
        })
        return page

    async def before_retrieve_html_func(page, context, **kwargs):
        """Scroll to load lazy content"""
        print("[Function Hook] Scrolling page for lazy-loaded content")
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await page.wait_for_timeout(500)
        await page.evaluate("window.scrollTo(0, 0)")
        return page

    # Use the hooks_to_string utility (can be used standalone)
    print("\n📦 Converting functions to strings with hooks_to_string()...")
    hooks_as_strings = hooks_to_string({
        "on_page_context_created": on_page_context_created_func,
        "before_goto": before_goto_func,
        "before_retrieve_html": before_retrieve_html_func
    })
    print(f"  ✓ Converted {len(hooks_as_strings)} hooks to string format")

    # OR use the Docker client, which does the conversion automatically!
    print("\n🐳 Using Docker Client with automatic conversion...")
    try:
        client = Crawl4aiDockerClient(base_url="http://localhost:11235")

        # Pass function objects directly - conversion happens automatically!
        results = await client.crawl(
            urls=["https://httpbin.org/html"],
            hooks={
                "on_page_context_created": on_page_context_created_func,
                "before_goto": before_goto_func,
                "before_retrieve_html": before_retrieve_html_func
            },
            hooks_timeout=30
        )

        if results and results.success:
            print("✅ Function-based hooks executed successfully!")
            print(f"  📄 HTML length: {len(results.html)} characters")
            print(f"  🎯 URL: {results.url}")
        else:
            print("⚠️ Crawl completed but may have warnings")

    except Exception as e:
        print(f"❌ Docker client error: {str(e)}")

    # Show the benefits
    print("\n" + "=" * 60)
    print("✨ Benefits of Function-Based Hooks:")
    print("=" * 60)
    print("✓ Full IDE support (autocomplete, syntax highlighting)")
    print("✓ Type checking and linting")
    print("✓ Easier to test and debug")
    print("✓ Reusable across projects")
    print("✓ Automatic conversion in Docker client")
    print("=" * 60)


async def demo_2_enhanced_llm_integration():
    """Demo 2: Enhanced LLM Integration - Working LLM configurations"""
    print_section(
        "Demo 2: Enhanced LLM Integration",
        "Testing custom LLM providers and configurations"
    )

    print("🤖 Testing Enhanced LLM Integration Features")

    provider = "gemini/gemini-2.5-flash-lite"
    payload = {
        "url": "https://example.com",
        "f": "llm",
        "q": "Summarize this page in one sentence.",
        "provider": provider,  # Explicitly set provider
        "temperature": 0.7
    }
    try:
        response = requests.post(
            "http://localhost:11235/md",
            json=payload,
            timeout=60
        )
        if response.status_code == 200:
            result = response.json()
            print(f"✓ Request successful with provider: {provider}")
            print(f"  - Response keys: {list(result.keys())}")
            print(f"  - Content length: {len(result.get('markdown', ''))} characters")
            print("  - Note: Actual LLM call may fail without valid API key")
        else:
            print(f"❌ Request failed: {response.status_code}")
            print(f"  - Response: {response.text[:500]}")

    except Exception as e:
        print(f"❌ Error: {e}")


async def demo_3_https_preservation():
    """Demo 3: HTTPS Preservation - Live crawling with HTTPS maintenance"""
    print_section(
        "Demo 3: HTTPS Preservation",
        "Testing HTTPS preservation for internal links"
    )

    print("🔒 Testing HTTPS Preservation Feature")

    # Test with HTTPS preservation enabled
    print("\nTest 1: HTTPS Preservation ENABLED")

    url_filter = URLPatternFilter(
        patterns=[r"^(https://)?quotes\.toscrape\.com(/.*)?$"]
    )
    config = CrawlerRunConfig(
        exclude_external_links=True,
        stream=True,
        verbose=False,
        preserve_https_for_internal_links=True,
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=2,
            max_pages=5,
            filter_chain=FilterChain([url_filter])
        )
    )

    test_url = "https://quotes.toscrape.com"
    print(f"🎯 Testing URL: {test_url}")

    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun(url=test_url, config=config):
            print("✓ HTTPS Preservation Test Completed")
            internal_links = [i['href'] for i in result.links['internal']]
            for link in internal_links:
                print(f"  → {link}")


async def main():
    """Run all demos"""
    print("\n" + "=" * 60)
    print("🚀 Crawl4AI v0.7.5 Working Demo")
    print("=" * 60)

    # Check system requirements
    print("🔍 System Requirements Check:")
    print(f"  - Python version: {sys.version.split()[0]} "
          f"{'✓' if sys.version_info >= (3, 10) else '❌ (3.10+ required)'}")

    try:
        import requests  # noqa: F401 -- availability check
        print("  - Requests library: ✓")
    except ImportError:
        print("  - Requests library: ❌")

    print()

    demos = [
        ("Docker Hooks System", demo_1_docker_hooks_system),
        ("Enhanced LLM Integration", demo_2_enhanced_llm_integration),
        ("HTTPS Preservation", demo_3_https_preservation),
    ]

    for i, (name, demo_func) in enumerate(demos, 1):
        try:
            print(f"\n📍 Starting Demo {i}/{len(demos)}: {name}")
            await demo_func()

            if i < len(demos):
                print(f"\n✨ Demo {i} complete! Press Enter for next demo...")
                input()

        except KeyboardInterrupt:
            print("\n⏹️ Demo interrupted by user")
            break
        except Exception as e:
            print(f"❌ Demo {i} error: {str(e)}")
            print("Continuing to next demo...")
            continue

    print("\n" + "=" * 60)
    print("🎉 Demo Complete!")
    print("=" * 60)
    print("You've experienced the power of Crawl4AI v0.7.5!")
    print("")
    print("Key Features Demonstrated:")
    print("🔧 Docker Hooks - String-based & function-based (NEW!)")
    print("   • hooks_to_string() utility for function conversion")
    print("   • Docker client with automatic conversion")
    print("   • Full IDE support and type checking")
    print("🤖 Enhanced LLM - Better AI integration")
    print("🔒 HTTPS Preservation - Secure link handling")
    print("")
    print("Ready to build something amazing? 🚀")
    print("")
    print("📖 Docs: https://docs.crawl4ai.com/")
    print("🐙 GitHub: https://github.com/unclecode/crawl4ai")
    print("=" * 60)


if __name__ == "__main__":
    print("🚀 Crawl4AI v0.7.5 Live Demo Starting...")
    print("Press Ctrl+C anytime to exit\n")

    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\n👋 Demo stopped by user. Thanks for trying Crawl4AI v0.7.5!")
    except Exception as e:
        print(f"\n❌ Demo error: {str(e)}")
        print("Make sure you have the required dependencies installed.")
docs/releases_review/v0.7.5_docker_hooks_demo.py (new file, 655 lines)
@@ -0,0 +1,655 @@
#!/usr/bin/env python3
"""
🚀 Crawl4AI v0.7.5 - Docker Hooks System Complete Demonstration
================================================================

This file demonstrates the NEW Docker Hooks System introduced in v0.7.5.

The Docker Hooks System is a completely NEW feature that provides pipeline
customization through user-provided Python functions. It offers three approaches:

1. String-based hooks for the REST API
2. hooks_to_string() utility to convert functions
3. Docker Client with automatic conversion (most convenient)

All three approaches are part of this NEW v0.7.5 feature!

Perfect for video recording and demonstration purposes.

Requirements:
- Docker container running: docker run -p 11235:11235 unclecode/crawl4ai:latest
- crawl4ai v0.7.5 installed: pip install crawl4ai==0.7.5
"""

import asyncio
import json
import time
from typing import Dict, Any

import requests

# Import Crawl4AI components
from crawl4ai import hooks_to_string
from crawl4ai.docker_client import Crawl4aiDockerClient

# Configuration
DOCKER_URL = "http://localhost:11235"
# DOCKER_URL = "http://localhost:11234"
TEST_URLS = [
    # "https://httpbin.org/html",
    "https://www.kidocode.com",
    "https://quotes.toscrape.com",
]


def print_section(title: str, description: str = ""):
    """Print a formatted section header"""
    print("\n" + "=" * 70)
    print(f" {title}")
    if description:
        print(f" {description}")
    print("=" * 70 + "\n")


def check_docker_service() -> bool:
    """Check if the Docker service is running"""
    try:
        response = requests.get(f"{DOCKER_URL}/health", timeout=3)
        return response.status_code == 200
    except requests.RequestException:
        return False


# ============================================================================
# REUSABLE HOOK LIBRARY (NEW in v0.7.5)
# ============================================================================

async def performance_optimization_hook(page, context, **kwargs):
    """
    Performance Hook: Block unnecessary resources to speed up crawling
    """
    print("   [Hook] 🚀 Optimizing performance - blocking images and ads...")

    # Block images
    await context.route(
        "**/*.{png,jpg,jpeg,gif,webp,svg,ico}",
        lambda route: route.abort()
    )

    # Block ads and analytics
    await context.route("**/analytics/*", lambda route: route.abort())
    await context.route("**/ads/*", lambda route: route.abort())
    await context.route("**/google-analytics.com/*", lambda route: route.abort())

    print("   [Hook] ✓ Performance optimization applied")
    return page


async def viewport_setup_hook(page, context, **kwargs):
    """
    Viewport Hook: Set consistent viewport size for rendering
    """
    print("   [Hook] 🖥️ Setting viewport to 1920x1080...")
    await page.set_viewport_size({"width": 1920, "height": 1080})
    print("   [Hook] ✓ Viewport configured")
    return page


async def authentication_headers_hook(page, context, url, **kwargs):
    """
    Headers Hook: Add custom authentication and tracking headers
    """
    print(f"   [Hook] 🔐 Adding custom headers for {url[:50]}...")

    await page.set_extra_http_headers({
        'X-Crawl4AI-Version': '0.7.5',
        'X-Custom-Hook': 'function-based-demo',
        'Accept-Language': 'en-US,en;q=0.9',
        'User-Agent': 'Crawl4AI/0.7.5 (Educational Demo)'
    })

    print("   [Hook] ✓ Custom headers added")
    return page


async def lazy_loading_handler_hook(page, context, **kwargs):
    """
    Content Hook: Handle lazy-loaded content by scrolling
    """
    print("   [Hook] 📜 Scrolling to load lazy content...")

    # Scroll to bottom
    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    await page.wait_for_timeout(1000)

    # Scroll to middle
    await page.evaluate("window.scrollTo(0, document.body.scrollHeight / 2)")
    await page.wait_for_timeout(500)

    # Scroll back to top
    await page.evaluate("window.scrollTo(0, 0)")
    await page.wait_for_timeout(500)

    print("   [Hook] ✓ Lazy content loaded")
    return page


async def page_analytics_hook(page, context, **kwargs):
    """
    Analytics Hook: Log page metrics before extraction
    """
    print("   [Hook] 📊 Collecting page analytics...")

    metrics = await page.evaluate('''
        () => ({
            title: document.title,
            images: document.images.length,
            links: document.links.length,
            scripts: document.scripts.length,
            headings: document.querySelectorAll('h1, h2, h3').length,
            paragraphs: document.querySelectorAll('p').length
        })
    ''')

    print(f"   [Hook] 📈 Page: {metrics['title'][:50]}...")
    print(f"          Links: {metrics['links']}, Images: {metrics['images']}, "
          f"Headings: {metrics['headings']}, Paragraphs: {metrics['paragraphs']}")

    return page


# ============================================================================
# DEMO 1: String-Based Hooks (NEW Docker Hooks System)
# ============================================================================

def demo_1_string_based_hooks():
    """
    Demonstrate string-based hooks with the REST API (part of the NEW Docker Hooks System)
    """
    print_section(
        "DEMO 1: String-Based Hooks (REST API)",
        "Part of the NEW Docker Hooks System - hooks as strings"
    )

    # Define hooks as strings
    hooks_config = {
        "on_page_context_created": """
async def hook(page, context, **kwargs):
    print("   [String Hook] Setting up page context...")
    # Block images for performance
    await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
    await page.set_viewport_size({"width": 1920, "height": 1080})
    return page
""",

        "before_goto": """
async def hook(page, context, url, **kwargs):
    print(f"   [String Hook] Navigating to {url[:50]}...")
    await page.set_extra_http_headers({
        'X-Crawl4AI': 'string-based-hooks',
        'X-Demo': 'v0.7.5'
    })
    return page
""",

        "before_retrieve_html": """
async def hook(page, context, **kwargs):
    print("   [String Hook] Scrolling page...")
    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    await page.wait_for_timeout(1000)
    return page
"""
    }

    # Prepare request payload
    payload = {
        "urls": [TEST_URLS[0]],
        "hooks": {
            "code": hooks_config,
            "timeout": 30
        },
        "crawler_config": {
            "cache_mode": "bypass"
        }
    }

    print(f"🎯 Target URL: {TEST_URLS[0]}")
    print(f"🔧 Configured {len(hooks_config)} string-based hooks")
    print("📡 Sending request to Docker API...\n")

    try:
        start_time = time.time()
        response = requests.post(f"{DOCKER_URL}/crawl", json=payload, timeout=60)
        execution_time = time.time() - start_time

        if response.status_code == 200:
            result = response.json()

            print(f"\n✅ Request successful! (took {execution_time:.2f}s)")

            # Display results
            if result.get('results') and result['results'][0].get('success'):
                crawl_result = result['results'][0]
                html_length = len(crawl_result.get('html', ''))
                markdown_length = len(crawl_result.get('markdown', ''))

                print("\n📊 Results:")
                print(f"   • HTML length: {html_length:,} characters")
                print(f"   • Markdown length: {markdown_length:,} characters")
                print(f"   • URL: {crawl_result.get('url')}")

                # Check hooks execution
                if 'hooks' in result:
                    hooks_info = result['hooks']
                    print("\n🎣 Hooks Execution:")
                    print(f"   • Status: {hooks_info['status']['status']}")
                    print(f"   • Attached hooks: {len(hooks_info['status']['attached_hooks'])}")

                    if 'summary' in hooks_info:
                        summary = hooks_info['summary']
                        print(f"   • Total executions: {summary['total_executions']}")
                        print(f"   • Successful: {summary['successful']}")
                        print(f"   • Success rate: {summary['success_rate']:.1f}%")
            else:
                print("⚠️ Crawl completed but no results")

        else:
            print(f"❌ Request failed with status {response.status_code}")
            print(f"   Error: {response.text[:200]}")

    except requests.exceptions.Timeout:
        print("⏰ Request timed out after 60 seconds")
    except Exception as e:
        print(f"❌ Error: {str(e)}")

    print("\n" + "─" * 70)
    print("✓ String-based hooks demo complete\n")


# ============================================================================
# DEMO 2: Function-Based Hooks with hooks_to_string() Utility
# ============================================================================

def demo_2_hooks_to_string_utility():
    """
    Demonstrate the new hooks_to_string() utility for converting functions
    """
    print_section(
        "DEMO 2: hooks_to_string() Utility (NEW! ✨)",
        "Convert Python functions to strings for the REST API"
    )

    print("📦 Creating hook functions...")
    print("   • performance_optimization_hook")
    print("   • viewport_setup_hook")
    print("   • authentication_headers_hook")
    print("   • lazy_loading_handler_hook")

    # Convert function objects to strings using the NEW utility
    print("\n🔄 Converting functions to strings with hooks_to_string()...")

    hooks_dict = {
        "on_page_context_created": performance_optimization_hook,
        "before_goto": authentication_headers_hook,
        "before_retrieve_html": lazy_loading_handler_hook,
    }

    hooks_as_strings = hooks_to_string(hooks_dict)

    print(f"✅ Successfully converted {len(hooks_as_strings)} functions to strings")

    # Show a preview
    print("\n📝 Sample converted hook (first 250 characters):")
    print("─" * 70)
    sample_hook = list(hooks_as_strings.values())[0]
    print(sample_hook[:250] + "...")
    print("─" * 70)

    # Use the converted hooks with the REST API
    print("\n📡 Using converted hooks with REST API...")

    payload = {
        "urls": [TEST_URLS[0]],
        "hooks": {
            "code": hooks_as_strings,
            "timeout": 30
        }
    }

    try:
        start_time = time.time()
        response = requests.post(f"{DOCKER_URL}/crawl", json=payload, timeout=60)
        execution_time = time.time() - start_time

        if response.status_code == 200:
            result = response.json()
            print(f"\n✅ Request successful! (took {execution_time:.2f}s)")

            if result.get('results') and result['results'][0].get('success'):
                crawl_result = result['results'][0]
                print(f"   • HTML length: {len(crawl_result.get('html', '')):,} characters")
                print("   • Hooks executed successfully!")
        else:
            print(f"❌ Request failed: {response.status_code}")

    except Exception as e:
        print(f"❌ Error: {str(e)}")

    print("\n💡 Benefits of hooks_to_string():")
    print("   ✓ Write hooks as regular Python functions")
    print("   ✓ Full IDE support (autocomplete, syntax highlighting)")
    print("   ✓ Type checking and linting")
    print("   ✓ Easy to test and debug")
    print("   ✓ Reusable across projects")
    print("   ✓ Works with any REST API client")

    print("\n" + "─" * 70)
    print("✓ hooks_to_string() utility demo complete\n")


# ============================================================================
# DEMO 3: Docker Client with Automatic Conversion (RECOMMENDED! 🌟)
# ============================================================================

async def demo_3_docker_client_auto_conversion():
    """
    Demonstrate the Docker Client with automatic hook conversion (RECOMMENDED)
    """
    print_section(
        "DEMO 3: Docker Client with Auto-Conversion (RECOMMENDED! 🌟)",
        "Pass function objects directly - conversion happens automatically!"
    )

    print("🐳 Initializing Crawl4AI Docker Client...")
    client = Crawl4aiDockerClient(base_url=DOCKER_URL)

    print("✅ Client ready!\n")

    # Use our reusable hook library - just pass the function objects!
    print("📚 Using reusable hook library:")
    print("   • performance_optimization_hook")
    print("   • viewport_setup_hook")
    print("   • authentication_headers_hook")
    print("   • lazy_loading_handler_hook")
    print("   • page_analytics_hook")

    print("\n🎯 Target URL: " + TEST_URLS[0])
    print("🚀 Starting crawl with automatic hook conversion...\n")

    try:
        start_time = time.time()

        # Pass function objects directly - NO manual conversion needed! ✨
        results = await client.crawl(
            urls=[TEST_URLS[0]],
            hooks={
                "on_page_context_created": performance_optimization_hook,
                "before_goto": authentication_headers_hook,
                "before_retrieve_html": lazy_loading_handler_hook,
                "before_return_html": page_analytics_hook,
            },
            hooks_timeout=30
        )

        execution_time = time.time() - start_time

        print(f"\n✅ Crawl completed! (took {execution_time:.2f}s)\n")

        # Display results
        if results and results.success:
            result = results
            print("📊 Results:")
            print(f"   • URL: {result.url}")
            print(f"   • Success: {result.success}")
            print(f"   • HTML length: {len(result.html):,} characters")
            print(f"   • Markdown length: {len(result.markdown):,} characters")

            # Show metadata
            if result.metadata:
                print("\n📋 Metadata:")
                print(f"   • Title: {result.metadata.get('title', 'N/A')}")
                print(f"   • Description: {result.metadata.get('description', 'N/A')}")

            # Show links
            if result.links:
                internal_count = len(result.links.get('internal', []))
                external_count = len(result.links.get('external', []))
                print("\n🔗 Links Found:")
                print(f"   • Internal: {internal_count}")
                print(f"   • External: {external_count}")
        else:
            print("⚠️ Crawl completed but no successful results")
            if results:
                print(f"   Error: {results.error_message}")

    except Exception as e:
        print(f"❌ Error: {str(e)}")
        import traceback
        traceback.print_exc()

    print("\n🌟 Why the Docker Client is RECOMMENDED:")
    print("   ✓ Automatic function-to-string conversion")
    print("   ✓ No manual hooks_to_string() calls needed")
    print("   ✓ Cleaner, more Pythonic code")
    print("   ✓ Full type hints and IDE support")
    print("   ✓ Built-in error handling")
    print("   ✓ Async/await support")

    print("\n" + "─" * 70)
    print("✓ Docker Client auto-conversion demo complete\n")


# ============================================================================
# DEMO 4: Advanced Use Case - Complete Hook Pipeline
# ============================================================================

async def demo_4_complete_hook_pipeline():
    """
    Demonstrate a complete hook pipeline using all 8 hook points
    """
    print_section(
        "DEMO 4: Complete Hook Pipeline",
        "Using all 8 available hook points for comprehensive control"
    )

    # Define all 8 hooks
    async def on_browser_created_hook(browser, **kwargs):
        """Hook 1: Called after the browser is created"""
        print("  [Pipeline] 1/8 Browser created")
        return browser

    async def on_page_context_created_hook(page, context, **kwargs):
        """Hook 2: Called after the page context is created"""
        print("  [Pipeline] 2/8 Page context created - setting up...")
        await page.set_viewport_size({"width": 1920, "height": 1080})
        return page

    async def on_user_agent_updated_hook(page, context, user_agent, **kwargs):
        """Hook 3: Called when the user agent is updated"""
        print(f"  [Pipeline] 3/8 User agent updated: {user_agent[:50]}...")
        return page

    async def before_goto_hook(page, context, url, **kwargs):
        """Hook 4: Called before navigating to the URL"""
        print(f"  [Pipeline] 4/8 Before navigation to: {url[:60]}...")
        return page

    async def after_goto_hook(page, context, url, response, **kwargs):
        """Hook 5: Called after navigation completes"""
        print(f"  [Pipeline] 5/8 After navigation - Status: {response.status if response else 'N/A'}")
        await page.wait_for_timeout(1000)
        return page

    async def on_execution_started_hook(page, context, **kwargs):
        """Hook 6: Called when JavaScript execution starts"""
        print("  [Pipeline] 6/8 JavaScript execution started")
        return page

    async def before_retrieve_html_hook(page, context, **kwargs):
        """Hook 7: Called before retrieving the HTML"""
        print("  [Pipeline] 7/8 Before HTML retrieval - scrolling...")
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        return page

    async def before_return_html_hook(page, context, html, **kwargs):
        """Hook 8: Called before returning the HTML"""
        print(f"  [Pipeline] 8/8 Before return - HTML length: {len(html):,} chars")
        return page

    print("🎯 Target URL: " + TEST_URLS[0])
    print("🔧 Configured ALL 8 hook points for complete pipeline control\n")

    client = Crawl4aiDockerClient(base_url=DOCKER_URL)

    try:
        print("🚀 Starting complete pipeline crawl...\n")
        start_time = time.time()

        results = await client.crawl(
            urls=[TEST_URLS[0]],
            hooks={
                "on_browser_created": on_browser_created_hook,
                "on_page_context_created": on_page_context_created_hook,
                "on_user_agent_updated": on_user_agent_updated_hook,
                "before_goto": before_goto_hook,
                "after_goto": after_goto_hook,
                "on_execution_started": on_execution_started_hook,
                "before_retrieve_html": before_retrieve_html_hook,
                "before_return_html": before_return_html_hook,
            },
            hooks_timeout=45
        )

        execution_time = time.time() - start_time

        if results and results.success:
            print(f"\n✅ Complete pipeline executed successfully! (took {execution_time:.2f}s)")
            print("   • All 8 hooks executed in sequence")
            print(f"   • HTML length: {len(results.html):,} characters")
        else:
            print("⚠️ Pipeline completed with warnings")

    except Exception as e:
        print(f"❌ Error: {str(e)}")

    print("\n📚 Available Hook Points:")
    print("   1. on_browser_created       - Browser initialization")
    print("   2. on_page_context_created  - Page context setup")
    print("   3. on_user_agent_updated    - User agent configuration")
    print("   4. before_goto              - Pre-navigation setup")
    print("   5. after_goto               - Post-navigation processing")
    print("   6. on_execution_started     - JavaScript execution start")
    print("   7. before_retrieve_html     - Pre-extraction processing")
    print("   8. before_return_html       - Final HTML processing")

    print("\n" + "─" * 70)
    print("✓ Complete hook pipeline demo complete\n")


# ============================================================================
# MAIN EXECUTION
# ============================================================================

async def main():
    """
    Run all demonstrations
    """
    print("\n" + "=" * 70)
    print("  🚀 Crawl4AI v0.7.5 - Docker Hooks Complete Demonstration")
    print("=" * 70)

    # Check Docker service
    print("\n🔍 Checking Docker service status...")
    if not check_docker_service():
        print("❌ Docker service is not running!")
        print("\n📋 To start the Docker service:")
        print("   docker run -p 11235:11235 unclecode/crawl4ai:latest")
        print("\nPlease start the service and run this demo again.")
        return

    print("✅ Docker service is running!\n")

    # Run all demos
    demos = [
        ("String-Based Hooks (REST API)", demo_1_string_based_hooks, False),
        ("hooks_to_string() Utility", demo_2_hooks_to_string_utility, False),
        ("Docker Client Auto-Conversion", demo_3_docker_client_auto_conversion, True),
        # ("Complete Hook Pipeline", demo_4_complete_hook_pipeline, True),
    ]

    for i, (name, demo_func, is_async) in enumerate(demos, 1):
        print(f"\n{'🔷' * 35}")
        print(f"Starting Demo {i}/{len(demos)}: {name}")
        print(f"{'🔷' * 35}\n")

        try:
            if is_async:
                await demo_func()
            else:
                demo_func()

            print(f"✅ Demo {i} completed successfully!")

            # Pause between demos (except after the last one)
            if i < len(demos):
                print("\n⏸️ Press Enter to continue to the next demo...")
                # input()

        except KeyboardInterrupt:
            print("\n⏹️ Demo interrupted by user")
            break
        except Exception as e:
            print(f"\n❌ Demo {i} failed: {str(e)}")
            import traceback
            traceback.print_exc()
            print("\nContinuing to next demo...\n")
            continue

    # Final summary
    print("\n" + "=" * 70)
    print("  🎉 All Demonstrations Complete!")
    print("=" * 70)

    print("\n📊 Summary of v0.7.5 Docker Hooks System:")
    print("\n🆕 COMPLETELY NEW FEATURE in v0.7.5:")
    print("   The Docker Hooks System lets you customize the crawling pipeline")
    print("   with user-provided Python functions at 8 strategic points.")

    print("\n✨ Three Ways to Use Docker Hooks (All NEW!):")
    print("   1. String-based       - Write hooks as strings for the REST API")
    print("   2. hooks_to_string()  - Convert Python functions to strings")
    print("   3. Docker Client      - Automatic conversion (RECOMMENDED)")

    print("\n💡 Key Benefits:")
    print("   ✓ Full IDE support (autocomplete, syntax highlighting)")
    print("   ✓ Type checking and linting")
    print("   ✓ Easy to test and debug")
    print("   ✓ Reusable across projects")
    print("   ✓ Complete pipeline control")

    print("\n🎯 8 Hook Points Available:")
    print("   • on_browser_created, on_page_context_created")
    print("   • on_user_agent_updated, before_goto, after_goto")
    print("   • on_execution_started, before_retrieve_html, before_return_html")

    print("\n📚 Resources:")
    print("   • Docs: https://docs.crawl4ai.com")
    print("   • GitHub: https://github.com/unclecode/crawl4ai")
    print("   • Discord: https://discord.gg/jP8KfhDhyN")

    print("\n" + "=" * 70)
    print("  Happy Crawling with v0.7.5! 🕷️")
    print("=" * 70 + "\n")


if __name__ == "__main__":
    print("\n🎬 Starting Crawl4AI v0.7.5 Docker Hooks Demonstration...")
    print("Press Ctrl+C anytime to exit\n")

    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\n\n👋 Demo stopped by user. Thanks for exploring Crawl4AI v0.7.5!")
    except Exception as e:
        print(f"\n\n❌ Demo error: {str(e)}")
        import traceback
        traceback.print_exc()
1516 docs/releases_review/v0.7.5_video_walkthrough.ipynb (new file; diff suppressed because it is too large)