Compare commits

..

2 Commits

Author SHA1 Message Date
ntohidi
13e116610d fix(marketplace): improve app detail page content rendering and UX
Fixed multiple issues with app detail page content display and formatting
2025-10-23 16:12:30 +02:00
ntohidi
97c92c4f62 fix(marketplace): replace hardcoded app detail content with database-driven fields.
The app detail page was displaying hardcoded/templated content instead of
using actual data from the database. This prevented admins from controlling
the content shown in Overview, Integration, and Documentation tabs.
2025-10-21 15:39:04 +02:00
13 changed files with 203 additions and 3405 deletions

View File

@@ -27,13 +27,11 @@
Crawl4AI turns the web into clean, LLM ready Markdown for RAG, agents, and data pipelines. Fast, controllable, battle tested by a 50k+ star community.
[✨ Check out latest update v0.7.5](#-recent-updates)
[✨ Check out latest update v0.7.4](#-recent-updates)
✨ New in v0.7.5: Docker Hooks System with function-based API for pipeline customization, Enhanced LLM Integration with custom providers, HTTPS Preservation, and multiple community-reported bug fixes. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.5.md)
✨ New in v0.7.4: Revolutionary LLM Table Extraction with intelligent chunking, enhanced concurrency fixes, memory management refactor, and critical stability improvements. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.4.md)
✨ Recent v0.7.4: Revolutionary LLM Table Extraction with intelligent chunking, enhanced concurrency fixes, memory management refactor, and critical stability improvements. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.4.md)
✨ Previous v0.7.3: Undetected Browser Support, Multi-URL Configurations, Memory Monitoring, Enhanced Table Extraction, GitHub Sponsors. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.3.md)
✨ Recent v0.7.3: Undetected Browser Support, Multi-URL Configurations, Memory Monitoring, Enhanced Table Extraction, GitHub Sponsors. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.3.md)
<details>
<summary>🤓 <strong>My Personal Story</strong></summary>
@@ -179,7 +177,7 @@ No rate-limited APIs. No lock-in. Build and own your data pipeline with direct g
- 📸 **Screenshots**: Capture page screenshots during crawling for debugging or analysis.
- 📂 **Raw Data Crawling**: Directly process raw HTML (`raw:`) or local files (`file://`).
- 🔗 **Comprehensive Link Extraction**: Extracts internal, external links, and embedded iframe content.
- 🛠️ **Customizable Hooks**: Define hooks at every step to customize crawling behavior (supports both string and function-based APIs).
- 🛠️ **Customizable Hooks**: Define hooks at every step to customize crawling behavior.
- 💾 **Caching**: Cache data for improved speed and to avoid redundant fetches.
- 📄 **Metadata Extraction**: Retrieve structured metadata from web pages.
- 📡 **IFrame Content Extraction**: Seamless extraction from embedded iframe content.
@@ -546,54 +544,6 @@ async def test_news_crawl():
## ✨ Recent Updates
<details>
<summary><strong>Version 0.7.5 Release Highlights - The Docker Hooks & Security Update</strong></summary>
- **🔧 Docker Hooks System**: Complete pipeline customization with user-provided Python functions at 8 key points
- **✨ Function-Based Hooks API (NEW)**: Write hooks as regular Python functions with full IDE support:
```python
from crawl4ai import hooks_to_string
from crawl4ai.docker_client import Crawl4aiDockerClient
# Define hooks as regular Python functions
async def on_page_context_created(page, context, **kwargs):
"""Block images to speed up crawling"""
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
await page.set_viewport_size({"width": 1920, "height": 1080})
return page
async def before_goto(page, context, url, **kwargs):
"""Add custom headers"""
await page.set_extra_http_headers({'X-Crawl4AI': 'v0.7.5'})
return page
# Option 1: Use hooks_to_string() utility for REST API
hooks_code = hooks_to_string({
"on_page_context_created": on_page_context_created,
"before_goto": before_goto
})
# Option 2: Docker client with automatic conversion (Recommended)
client = Crawl4aiDockerClient(base_url="http://localhost:11235")
results = await client.crawl(
urls=["https://httpbin.org/html"],
hooks={
"on_page_context_created": on_page_context_created,
"before_goto": before_goto
}
)
# ✓ Full IDE support, type checking, and reusability!
```
- **🤖 Enhanced LLM Integration**: Custom providers with temperature control and base_url configuration
- **🔒 HTTPS Preservation**: Secure internal link handling with `preserve_https_for_internal_links=True`
- **🐍 Python 3.10+ Support**: Modern language features and enhanced performance
- **🛠️ Bug Fixes**: Resolved multiple community-reported issues including URL processing, JWT authentication, and proxy configuration
[Full v0.7.5 Release Notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.5.md)
</details>
<details>
<summary><strong>Version 0.7.4 Release Highlights - The Intelligent Table Extraction & Performance Update</strong></summary>

View File

@@ -1,7 +1,7 @@
# crawl4ai/__version__.py
# This is the version that will be used for stable releases
__version__ = "0.7.5"
__version__ = "0.7.4"
# For nightly builds, this gets set during build process
__nightly_version__ = None

View File

@@ -10,6 +10,7 @@ Today I'm releasing Crawl4AI v0.7.4—the Intelligent Table Extraction & Perform
- **🚀 LLMTableExtraction**: Revolutionary table extraction with intelligent chunking for massive tables
- **⚡ Enhanced Concurrency**: True concurrency improvements for fast-completing tasks in batch operations
- **🧹 Memory Management Refactor**: Streamlined memory utilities and better resource management
- **🔧 Browser Manager Fixes**: Resolved race conditions in concurrent page creation
- **⌨️ Cross-Platform Browser Profiler**: Improved keyboard handling and quit mechanisms
- **🔗 Advanced URL Processing**: Better handling of raw URLs and base tag link resolution
@@ -157,6 +158,40 @@ async with AsyncWebCrawler() as crawler:
- **Monitoring Systems**: Faster health checks and status page monitoring
- **Data Aggregation**: Improved performance for real-time data collection
## 🧹 Memory Management Refactor: Cleaner Architecture
**The Problem:** Memory utilities were scattered and difficult to maintain, with potential import conflicts and unclear organization.
**My Solution:** I consolidated all memory-related utilities into the main `utils.py` module, creating a cleaner, more maintainable architecture.
### Improved Memory Handling
```python
# All memory utilities now consolidated
from crawl4ai.utils import get_true_memory_usage_percent, MemoryMonitor
# Enhanced memory monitoring
monitor = MemoryMonitor()
monitor.start_monitoring()
async with AsyncWebCrawler() as crawler:
# Memory-efficient batch processing
results = await crawler.arun_many(large_url_list)
# Get accurate memory metrics
memory_usage = get_true_memory_usage_percent()
memory_report = monitor.get_report()
print(f"Memory efficiency: {memory_report['efficiency']:.1f}%")
print(f"Peak usage: {memory_report['peak_mb']:.1f} MB")
```
**Expected Real-World Impact:**
- **Production Stability**: More reliable memory tracking and management
- **Code Maintainability**: Cleaner architecture for easier debugging
- **Import Clarity**: Resolved potential conflicts and import issues
- **Developer Experience**: Simpler API for memory monitoring
## 🔧 Critical Stability Fixes
### Browser Manager Race Condition Resolution

View File

@@ -1,318 +0,0 @@
# 🚀 Crawl4AI v0.7.5: The Docker Hooks & Security Update
*September 29, 2025 • 8 min read*
---
Today I'm releasing Crawl4AI v0.7.5—focused on extensibility and security. This update introduces the Docker Hooks System for pipeline customization, enhanced LLM integration, and important security improvements.
## 🎯 What's New at a Glance
- **Docker Hooks System**: Custom Python functions at key pipeline points with function-based API
- **Function-Based Hooks**: New `hooks_to_string()` utility with Docker client auto-conversion
- **Enhanced LLM Integration**: Custom providers with temperature control
- **HTTPS Preservation**: Secure internal link handling
- **Bug Fixes**: Resolved multiple community-reported issues
- **Improved Docker Error Handling**: Better debugging and reliability
## 🔧 Docker Hooks System: Pipeline Customization
Every scraping project needs custom logic—authentication, performance optimization, content processing. Traditional solutions require forking or complex workarounds. Docker Hooks let you inject custom Python functions at 8 key points in the crawling pipeline.
### Real Example: Authentication & Performance
```python
import requests
# Real working hooks for httpbin.org
hooks_config = {
"on_page_context_created": """
async def hook(page, context, **kwargs):
print("Hook: Setting up page context")
# Block images to speed up crawling
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
print("Hook: Images blocked")
return page
""",
"before_retrieve_html": """
async def hook(page, context, **kwargs):
print("Hook: Before retrieving HTML")
# Scroll to bottom to load lazy content
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(1000)
print("Hook: Scrolled to bottom")
return page
""",
"before_goto": """
async def hook(page, context, url, **kwargs):
print(f"Hook: About to navigate to {url}")
# Add custom headers
await page.set_extra_http_headers({
'X-Test-Header': 'crawl4ai-hooks-test'
})
return page
"""
}
# Test with Docker API
payload = {
"urls": ["https://httpbin.org/html"],
"hooks": {
"code": hooks_config,
"timeout": 30
}
}
response = requests.post("http://localhost:11235/crawl", json=payload)
result = response.json()
if result.get('success'):
print("✅ Hooks executed successfully!")
print(f"Content length: {len(result.get('markdown', ''))} characters")
```
**Available Hook Points:**
- `on_browser_created`: Browser setup
- `on_page_context_created`: Page context configuration
- `before_goto`: Pre-navigation setup
- `after_goto`: Post-navigation processing
- `on_user_agent_updated`: User agent changes
- `on_execution_started`: Crawl initialization
- `before_retrieve_html`: Pre-extraction processing
- `before_return_html`: Final HTML processing
### Function-Based Hooks API
Writing hooks as strings works, but lacks IDE support and type checking. v0.7.5 introduces a function-based approach with automatic conversion!
**Option 1: Using the `hooks_to_string()` Utility**
```python
from crawl4ai import hooks_to_string
import requests
# Define hooks as regular Python functions (with full IDE support!)
async def on_page_context_created(page, context, **kwargs):
"""Block images to speed up crawling"""
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
await page.set_viewport_size({"width": 1920, "height": 1080})
return page
async def before_goto(page, context, url, **kwargs):
"""Add custom headers"""
await page.set_extra_http_headers({
'X-Crawl4AI': 'v0.7.5',
'X-Custom-Header': 'my-value'
})
return page
# Convert functions to strings
hooks_code = hooks_to_string({
"on_page_context_created": on_page_context_created,
"before_goto": before_goto
})
# Use with REST API
payload = {
"urls": ["https://httpbin.org/html"],
"hooks": {"code": hooks_code, "timeout": 30}
}
response = requests.post("http://localhost:11235/crawl", json=payload)
```
**Option 2: Docker Client with Automatic Conversion (Recommended!)**
```python
from crawl4ai.docker_client import Crawl4aiDockerClient
# Define hooks as functions (same as above)
async def on_page_context_created(page, context, **kwargs):
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
return page
async def before_retrieve_html(page, context, **kwargs):
# Scroll to load lazy content
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(1000)
return page
# Use Docker client - conversion happens automatically!
client = Crawl4aiDockerClient(base_url="http://localhost:11235")
results = await client.crawl(
urls=["https://httpbin.org/html"],
hooks={
"on_page_context_created": on_page_context_created,
"before_retrieve_html": before_retrieve_html
},
hooks_timeout=30
)
if results and results.success:
print(f"✅ Hooks executed! HTML length: {len(results.html)}")
```
**Benefits of Function-Based Hooks:**
- ✅ Full IDE support (autocomplete, syntax highlighting)
- ✅ Type checking and linting
- ✅ Easier to test and debug
- ✅ Reusable across projects
- ✅ Automatic conversion in Docker client
- ✅ No breaking changes - string hooks still work!
## 🤖 Enhanced LLM Integration
Enhanced LLM integration with custom providers, temperature control, and base URL configuration.
### Multi-Provider Support
```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy
# Test with different providers
async def test_llm_providers():
# OpenAI with custom temperature
openai_strategy = LLMExtractionStrategy(
provider="gemini/gemini-2.5-flash-lite",
api_token="your-api-token",
temperature=0.7, # New in v0.7.5
instruction="Summarize this page in one sentence"
)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
"https://example.com",
config=CrawlerRunConfig(extraction_strategy=openai_strategy)
)
if result.success:
print("✅ LLM extraction completed")
print(result.extracted_content)
# Docker API with enhanced LLM config
llm_payload = {
"url": "https://example.com",
"f": "llm",
"q": "Summarize this page in one sentence.",
"provider": "gemini/gemini-2.5-flash-lite",
"temperature": 0.7
}
response = requests.post("http://localhost:11235/md", json=llm_payload)
```
**New Features:**
- Custom `temperature` parameter for creativity control
- `base_url` for custom API endpoints
- Multi-provider environment variable support
- Docker API integration
## 🔒 HTTPS Preservation
**The Problem:** Modern web apps require HTTPS everywhere. When crawlers downgrade internal links from HTTPS to HTTP, authentication breaks and security warnings appear.
**Solution:** HTTPS preservation maintains secure protocols throughout crawling.
```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, FilterChain, URLPatternFilter, BFSDeepCrawlStrategy
async def test_https_preservation():
# Enable HTTPS preservation
url_filter = URLPatternFilter(
patterns=["^(https:\/\/)?quotes\.toscrape\.com(\/.*)?$"]
)
config = CrawlerRunConfig(
exclude_external_links=True,
preserve_https_for_internal_links=True, # New in v0.7.5
deep_crawl_strategy=BFSDeepCrawlStrategy(
max_depth=2,
max_pages=5,
filter_chain=FilterChain([url_filter])
)
)
async with AsyncWebCrawler() as crawler:
async for result in await crawler.arun(
url="https://quotes.toscrape.com",
config=config
):
# All internal links maintain HTTPS
internal_links = [link['href'] for link in result.links['internal']]
https_links = [link for link in internal_links if link.startswith('https://')]
print(f"HTTPS links preserved: {len(https_links)}/{len(internal_links)}")
for link in https_links[:3]:
print(f"{link}")
```
## 🛠️ Bug Fixes and Improvements
### Major Fixes
- **URL Processing**: Fixed '+' sign preservation in query parameters (#1332)
- **Proxy Configuration**: Enhanced proxy string parsing (old `proxy` parameter deprecated)
- **Docker Error Handling**: Comprehensive error messages with status codes
- **Memory Management**: Fixed leaks in long-running sessions
- **JWT Authentication**: Fixed Docker JWT validation issues (#1442)
- **Playwright Stealth**: Fixed stealth features for Playwright integration (#1481)
- **API Configuration**: Fixed config handling to prevent overriding user-provided settings (#1505)
- **Docker Filter Serialization**: Resolved JSON encoding errors in deep crawl strategy (#1419)
- **LLM Provider Support**: Fixed custom LLM provider integration for adaptive crawler (#1291)
- **Performance Issues**: Resolved backoff strategy failures and timeout handling (#989)
### Community-Reported Issues Fixed
This release addresses multiple issues reported by the community through GitHub issues and Discord discussions:
- Fixed browser configuration reference errors
- Resolved dependency conflicts with cssselect
- Improved error messaging for failed authentications
- Enhanced compatibility with various proxy configurations
- Fixed edge cases in URL normalization
### Configuration Updates
```python
# Old proxy config (deprecated)
# browser_config = BrowserConfig(proxy="http://proxy:8080")
# New enhanced proxy config
browser_config = BrowserConfig(
proxy_config={
"server": "http://proxy:8080",
"username": "optional-user",
"password": "optional-pass"
}
)
```
## 🔄 Breaking Changes
1. **Python 3.10+ Required**: Upgrade from Python 3.9
2. **Proxy Parameter Deprecated**: Use new `proxy_config` structure
3. **New Dependency**: Added `cssselect` for better CSS handling
## 🚀 Get Started
```bash
# Install latest version
pip install crawl4ai==0.7.5
# Docker deployment
docker pull unclecode/crawl4ai:latest
docker run -p 11235:11235 unclecode/crawl4ai:latest
```
**Try the Demo:**
```bash
# Run working examples
python docs/releases_review/demo_v0.7.5.py
```
**Resources:**
- 📖 Documentation: [docs.crawl4ai.com](https://docs.crawl4ai.com)
- 🐙 GitHub: [github.com/unclecode/crawl4ai](https://github.com/unclecode/crawl4ai)
- 💬 Discord: [discord.gg/crawl4ai](https://discord.gg/jP8KfhDhyN)
- 🐦 Twitter: [@unclecode](https://x.com/unclecode)
Happy crawling! 🕷️

View File

@@ -20,26 +20,17 @@ Ever wondered why your AI coding assistant struggles with your library despite c
## Latest Release
### [Crawl4AI v0.7.5 The Docker Hooks & Security Update](../blog/release-v0.7.5.md)
*September 29, 2025*
Crawl4AI v0.7.5 introduces the powerful Docker Hooks System for complete pipeline customization, enhanced LLM integration with custom providers, HTTPS preservation for modern web security, and resolves multiple community-reported issues.
Key highlights:
- **🔧 Docker Hooks System**: Custom Python functions at 8 key pipeline points for unprecedented customization
- **🤖 Enhanced LLM Integration**: Custom providers with temperature control and base_url configuration
- **🔒 HTTPS Preservation**: Secure internal link handling for modern web applications
- **🐍 Python 3.10+ Support**: Modern language features and enhanced performance
- **🛠️ Bug Fixes**: Resolved multiple community-reported issues including URL processing, JWT authentication, and proxy configuration
[Read full release notes →](../blog/release-v0.7.5.md)
## Recent Releases
### [Crawl4AI v0.7.4 The Intelligent Table Extraction & Performance Update](../blog/release-v0.7.4.md)
*August 17, 2025*
Revolutionary LLM-powered table extraction with intelligent chunking, performance improvements for concurrent crawling, enhanced browser management, and critical stability fixes.
Crawl4AI v0.7.4 introduces revolutionary LLM-powered table extraction with intelligent chunking, performance improvements for concurrent crawling, enhanced browser management, and critical stability fixes that make Crawl4AI more robust for production workloads.
Key highlights:
- **🚀 LLMTableExtraction**: Revolutionary table extraction with intelligent chunking for massive tables
- **⚡ Dispatcher Bug Fix**: Fixed sequential processing issue in arun_many for fast-completing tasks
- **🧹 Memory Management Refactor**: Streamlined memory utilities and better resource management
- **🔧 Browser Manager Fixes**: Resolved race conditions in concurrent page creation
- **🔗 Advanced URL Processing**: Better handling of raw URLs and base tag link resolution
[Read full release notes →](../blog/release-v0.7.4.md)

View File

@@ -1,318 +0,0 @@
# 🚀 Crawl4AI v0.7.5: The Docker Hooks & Security Update
*September 29, 2025 • 8 min read*
---
Today I'm releasing Crawl4AI v0.7.5—focused on extensibility and security. This update introduces the Docker Hooks System for pipeline customization, enhanced LLM integration, and important security improvements.
## 🎯 What's New at a Glance
- **Docker Hooks System**: Custom Python functions at key pipeline points with function-based API
- **Function-Based Hooks**: New `hooks_to_string()` utility with Docker client auto-conversion
- **Enhanced LLM Integration**: Custom providers with temperature control
- **HTTPS Preservation**: Secure internal link handling
- **Bug Fixes**: Resolved multiple community-reported issues
- **Improved Docker Error Handling**: Better debugging and reliability
## 🔧 Docker Hooks System: Pipeline Customization
Every scraping project needs custom logic—authentication, performance optimization, content processing. Traditional solutions require forking or complex workarounds. Docker Hooks let you inject custom Python functions at 8 key points in the crawling pipeline.
### Real Example: Authentication & Performance
```python
import requests
# Real working hooks for httpbin.org
hooks_config = {
"on_page_context_created": """
async def hook(page, context, **kwargs):
print("Hook: Setting up page context")
# Block images to speed up crawling
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
print("Hook: Images blocked")
return page
""",
"before_retrieve_html": """
async def hook(page, context, **kwargs):
print("Hook: Before retrieving HTML")
# Scroll to bottom to load lazy content
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(1000)
print("Hook: Scrolled to bottom")
return page
""",
"before_goto": """
async def hook(page, context, url, **kwargs):
print(f"Hook: About to navigate to {url}")
# Add custom headers
await page.set_extra_http_headers({
'X-Test-Header': 'crawl4ai-hooks-test'
})
return page
"""
}
# Test with Docker API
payload = {
"urls": ["https://httpbin.org/html"],
"hooks": {
"code": hooks_config,
"timeout": 30
}
}
response = requests.post("http://localhost:11235/crawl", json=payload)
result = response.json()
if result.get('success'):
print("✅ Hooks executed successfully!")
print(f"Content length: {len(result.get('markdown', ''))} characters")
```
**Available Hook Points:**
- `on_browser_created`: Browser setup
- `on_page_context_created`: Page context configuration
- `before_goto`: Pre-navigation setup
- `after_goto`: Post-navigation processing
- `on_user_agent_updated`: User agent changes
- `on_execution_started`: Crawl initialization
- `before_retrieve_html`: Pre-extraction processing
- `before_return_html`: Final HTML processing
### Function-Based Hooks API
Writing hooks as strings works, but lacks IDE support and type checking. v0.7.5 introduces a function-based approach with automatic conversion!
**Option 1: Using the `hooks_to_string()` Utility**
```python
from crawl4ai import hooks_to_string
import requests
# Define hooks as regular Python functions (with full IDE support!)
async def on_page_context_created(page, context, **kwargs):
"""Block images to speed up crawling"""
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
await page.set_viewport_size({"width": 1920, "height": 1080})
return page
async def before_goto(page, context, url, **kwargs):
"""Add custom headers"""
await page.set_extra_http_headers({
'X-Crawl4AI': 'v0.7.5',
'X-Custom-Header': 'my-value'
})
return page
# Convert functions to strings
hooks_code = hooks_to_string({
"on_page_context_created": on_page_context_created,
"before_goto": before_goto
})
# Use with REST API
payload = {
"urls": ["https://httpbin.org/html"],
"hooks": {"code": hooks_code, "timeout": 30}
}
response = requests.post("http://localhost:11235/crawl", json=payload)
```
**Option 2: Docker Client with Automatic Conversion (Recommended!)**
```python
from crawl4ai.docker_client import Crawl4aiDockerClient
# Define hooks as functions (same as above)
async def on_page_context_created(page, context, **kwargs):
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
return page
async def before_retrieve_html(page, context, **kwargs):
# Scroll to load lazy content
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(1000)
return page
# Use Docker client - conversion happens automatically!
client = Crawl4aiDockerClient(base_url="http://localhost:11235")
results = await client.crawl(
urls=["https://httpbin.org/html"],
hooks={
"on_page_context_created": on_page_context_created,
"before_retrieve_html": before_retrieve_html
},
hooks_timeout=30
)
if results and results.success:
print(f"✅ Hooks executed! HTML length: {len(results.html)}")
```
**Benefits of Function-Based Hooks:**
- ✅ Full IDE support (autocomplete, syntax highlighting)
- ✅ Type checking and linting
- ✅ Easier to test and debug
- ✅ Reusable across projects
- ✅ Automatic conversion in Docker client
- ✅ No breaking changes - string hooks still work!
## 🤖 Enhanced LLM Integration
Enhanced LLM integration with custom providers, temperature control, and base URL configuration.
### Multi-Provider Support
```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy
# Test with different providers
async def test_llm_providers():
# OpenAI with custom temperature
openai_strategy = LLMExtractionStrategy(
provider="gemini/gemini-2.5-flash-lite",
api_token="your-api-token",
temperature=0.7, # New in v0.7.5
instruction="Summarize this page in one sentence"
)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
"https://example.com",
config=CrawlerRunConfig(extraction_strategy=openai_strategy)
)
if result.success:
print("✅ LLM extraction completed")
print(result.extracted_content)
# Docker API with enhanced LLM config
llm_payload = {
"url": "https://example.com",
"f": "llm",
"q": "Summarize this page in one sentence.",
"provider": "gemini/gemini-2.5-flash-lite",
"temperature": 0.7
}
response = requests.post("http://localhost:11235/md", json=llm_payload)
```
**New Features:**
- Custom `temperature` parameter for creativity control
- `base_url` for custom API endpoints
- Multi-provider environment variable support
- Docker API integration
## 🔒 HTTPS Preservation
**The Problem:** Modern web apps require HTTPS everywhere. When crawlers downgrade internal links from HTTPS to HTTP, authentication breaks and security warnings appear.
**Solution:** HTTPS preservation maintains secure protocols throughout crawling.
```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, FilterChain, URLPatternFilter, BFSDeepCrawlStrategy
async def test_https_preservation():
# Enable HTTPS preservation
url_filter = URLPatternFilter(
patterns=["^(https:\/\/)?quotes\.toscrape\.com(\/.*)?$"]
)
config = CrawlerRunConfig(
exclude_external_links=True,
preserve_https_for_internal_links=True, # New in v0.7.5
deep_crawl_strategy=BFSDeepCrawlStrategy(
max_depth=2,
max_pages=5,
filter_chain=FilterChain([url_filter])
)
)
async with AsyncWebCrawler() as crawler:
async for result in await crawler.arun(
url="https://quotes.toscrape.com",
config=config
):
# All internal links maintain HTTPS
internal_links = [link['href'] for link in result.links['internal']]
https_links = [link for link in internal_links if link.startswith('https://')]
print(f"HTTPS links preserved: {len(https_links)}/{len(internal_links)}")
for link in https_links[:3]:
print(f"{link}")
```
## 🛠️ Bug Fixes and Improvements
### Major Fixes
- **URL Processing**: Fixed '+' sign preservation in query parameters (#1332)
- **Proxy Configuration**: Enhanced proxy string parsing (old `proxy` parameter deprecated)
- **Docker Error Handling**: Comprehensive error messages with status codes
- **Memory Management**: Fixed leaks in long-running sessions
- **JWT Authentication**: Fixed Docker JWT validation issues (#1442)
- **Playwright Stealth**: Fixed stealth features for Playwright integration (#1481)
- **API Configuration**: Fixed config handling to prevent overriding user-provided settings (#1505)
- **Docker Filter Serialization**: Resolved JSON encoding errors in deep crawl strategy (#1419)
- **LLM Provider Support**: Fixed custom LLM provider integration for adaptive crawler (#1291)
- **Performance Issues**: Resolved backoff strategy failures and timeout handling (#989)
### Community-Reported Issues Fixed
This release addresses multiple issues reported by the community through GitHub issues and Discord discussions:
- Fixed browser configuration reference errors
- Resolved dependency conflicts with cssselect
- Improved error messaging for failed authentications
- Enhanced compatibility with various proxy configurations
- Fixed edge cases in URL normalization
### Configuration Updates
```python
# Old proxy config (deprecated)
# browser_config = BrowserConfig(proxy="http://proxy:8080")
# New enhanced proxy config
browser_config = BrowserConfig(
proxy_config={
"server": "http://proxy:8080",
"username": "optional-user",
"password": "optional-pass"
}
)
```
## 🔄 Breaking Changes
1. **Python 3.10+ Required**: Upgrade from Python 3.9
2. **Proxy Parameter Deprecated**: Use new `proxy_config` structure
3. **New Dependency**: Added `cssselect` for better CSS handling
## 🚀 Get Started
```bash
# Install latest version
pip install crawl4ai==0.7.5
# Docker deployment
docker pull unclecode/crawl4ai:latest
docker run -p 11235:11235 unclecode/crawl4ai:latest
```
**Try the Demo:**
```bash
# Run working examples
python docs/releases_review/demo_v0.7.5.py
```
**Resources:**
- 📖 Documentation: [docs.crawl4ai.com](https://docs.crawl4ai.com)
- 🐙 GitHub: [github.com/unclecode/crawl4ai](https://github.com/unclecode/crawl4ai)
- 💬 Discord: [discord.gg/crawl4ai](https://discord.gg/jP8KfhDhyN)
- 🐦 Twitter: [@unclecode](https://x.com/unclecode)
Happy crawling! 🕷️

View File

@@ -529,8 +529,19 @@ class AdminDashboard {
</label>
</div>
<div class="form-group full-width">
<label>Integration Guide</label>
<textarea id="form-integration" rows="10">${app?.integration_guide || ''}</textarea>
<label>Long Description (Markdown - Overview tab)</label>
<textarea id="form-long-description" rows="10" placeholder="Enter detailed description with markdown formatting...">${app?.long_description || ''}</textarea>
<small>Markdown support: **bold**, *italic*, [links](url), # headers, code blocks, lists</small>
</div>
<div class="form-group full-width">
<label>Integration Guide (Markdown - Integration tab)</label>
<textarea id="form-integration" rows="20" placeholder="Enter integration guide with installation, examples, and code snippets using markdown...">${app?.integration_guide || ''}</textarea>
<small>Single markdown field with installation, examples, and complete guide. Code blocks get auto copy buttons.</small>
</div>
<div class="form-group full-width">
<label>Documentation (Markdown - Documentation tab)</label>
<textarea id="form-documentation" rows="20" placeholder="Enter documentation with API reference, examples, and best practices using markdown...">${app?.documentation || ''}</textarea>
<small>Full documentation with API reference, examples, best practices, etc.</small>
</div>
</div>
`;
@@ -712,7 +723,9 @@ class AdminDashboard {
data.contact_email = document.getElementById('form-email').value;
data.featured = document.getElementById('form-featured').checked ? 1 : 0;
data.sponsored = document.getElementById('form-sponsored').checked ? 1 : 0;
data.long_description = document.getElementById('form-long-description').value;
data.integration_guide = document.getElementById('form-integration').value;
data.documentation = document.getElementById('form-documentation').value;
} else if (type === 'articles') {
data.title = document.getElementById('form-title').value;
data.slug = this.generateSlug(data.title);

View File

@@ -510,6 +510,31 @@
line-height: 1.5;
}
/* Markdown rendered code blocks */
.integration-content pre,
.docs-content pre {
background: var(--bg-dark);
border: 1px solid var(--border-color);
margin: 1rem 0;
padding: 1rem;
padding-top: 2.5rem; /* Space for copy button */
overflow-x: auto;
position: relative;
max-height: none; /* Remove any height restrictions */
height: auto; /* Allow content to expand */
}
.integration-content pre code,
.docs-content pre code {
background: transparent;
padding: 0;
color: var(--text-secondary);
font-size: 0.875rem;
line-height: 1.5;
white-space: pre; /* Preserve whitespace and line breaks */
display: block;
}
/* Feature Grid */
.feature-grid {
display: grid;

View File

@@ -80,20 +80,7 @@
<section id="overview-tab" class="tab-content active">
<div class="overview-columns">
<div class="overview-main">
<h2>Overview</h2>
<div id="app-overview">Overview content goes here.</div>
<h3>Key Features</h3>
<ul id="app-features" class="features-list">
<li>Feature 1</li>
<li>Feature 2</li>
<li>Feature 3</li>
</ul>
<h3>Use Cases</h3>
<div id="app-use-cases" class="use-cases">
<p>Describe how this app can help your workflow.</p>
</div>
</div>
<aside class="sidebar">
@@ -142,33 +129,14 @@
</section>
<section id="integration-tab" class="tab-content">
<div class="integration-content">
<h2>Integration Guide</h2>
<h3>Installation</h3>
<div class="code-block">
<pre><code id="install-code"># Installation instructions will appear here</code></pre>
</div>
<h3>Basic Usage</h3>
<div class="code-block">
<pre><code id="usage-code"># Usage example will appear here</code></pre>
</div>
<h3>Complete Integration Example</h3>
<div class="code-block">
<button class="copy-btn" id="copy-integration">Copy</button>
<pre><code id="integration-code"># Complete integration guide will appear here</code></pre>
</div>
<div class="integration-content" id="app-integration">
<!-- Integration guide markdown content will be rendered here -->
</div>
</section>
<section id="docs-tab" class="tab-content">
<div class="docs-content">
<h2>Documentation</h2>
<div id="app-docs" class="doc-sections">
<p>Documentation coming soon.</p>
</div>
<div class="docs-content" id="app-docs">
<!-- Documentation markdown content will be rendered here -->
</div>
</section>

View File

@@ -123,144 +123,132 @@ class AppDetailPage {
document.getElementById('sidebar-pricing').textContent = this.appData.pricing || 'Free';
document.getElementById('sidebar-contact').textContent = this.appData.contact_email || 'contact@example.com';
// Integration guide
this.renderIntegrationGuide();
// Render tab contents from database fields
this.renderTabContents();
}
renderIntegrationGuide() {
// Installation code
const installCode = document.getElementById('install-code');
if (installCode) {
if (this.appData.type === 'Open Source' && this.appData.github_url) {
installCode.textContent = `# Clone from GitHub
git clone ${this.appData.github_url}
# Install dependencies
pip install -r requirements.txt`;
} else if (this.appData.name.toLowerCase().includes('api')) {
installCode.textContent = `# Install via pip
pip install ${this.appData.slug}
# Or install from source
pip install git+${this.appData.github_url || 'https://github.com/example/repo'}`;
renderTabContents() {
// Overview tab - use long_description from database
const overviewDiv = document.getElementById('app-overview');
if (overviewDiv) {
if (this.appData.long_description) {
overviewDiv.innerHTML = this.renderMarkdown(this.appData.long_description);
} else {
overviewDiv.innerHTML = `<p>${this.appData.description || 'No overview available.'}</p>`;
}
}
// Usage code - customize based on category
const usageCode = document.getElementById('usage-code');
if (usageCode) {
if (this.appData.category === 'Browser Automation') {
usageCode.textContent = `from crawl4ai import AsyncWebCrawler
from ${this.appData.slug.replace(/-/g, '_')} import ${this.appData.name.replace(/\s+/g, '')}
async def main():
# Initialize ${this.appData.name}
automation = ${this.appData.name.replace(/\s+/g, '')}()
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com",
browser_config=automation.config,
wait_for="css:body"
)
print(result.markdown)`;
} else if (this.appData.category === 'Proxy Services') {
usageCode.textContent = `from crawl4ai import AsyncWebCrawler
import ${this.appData.slug.replace(/-/g, '_')}
# Configure proxy
proxy_config = {
"server": "${this.appData.website_url || 'https://proxy.example.com'}",
"username": "your_username",
"password": "your_password"
}
async with AsyncWebCrawler(proxy=proxy_config) as crawler:
result = await crawler.arun(
url="https://example.com",
bypass_cache=True
)
print(result.status_code)`;
} else if (this.appData.category === 'LLM Integration') {
usageCode.textContent = `from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import LLMExtractionStrategy
# Configure LLM extraction
strategy = LLMExtractionStrategy(
provider="${this.appData.name.toLowerCase().includes('gpt') ? 'openai' : 'anthropic'}",
api_key="your-api-key",
model="${this.appData.name.toLowerCase().includes('gpt') ? 'gpt-4' : 'claude-3'}",
instruction="Extract structured data"
)
async with AsyncWebCrawler() as crawler:
result = await crawler.arun(
url="https://example.com",
extraction_strategy=strategy
)
print(result.extracted_content)`;
// Integration tab - use integration_guide field from database
const integrationDiv = document.getElementById('app-integration');
if (integrationDiv) {
if (this.appData.integration_guide) {
integrationDiv.innerHTML = this.renderMarkdown(this.appData.integration_guide);
// Add copy buttons to all code blocks
this.addCopyButtonsToCodeBlocks(integrationDiv);
} else {
integrationDiv.innerHTML = '<p>Integration guide not yet available. Please check the official website for details.</p>';
}
}
// Integration example
const integrationCode = document.getElementById('integration-code');
if (integrationCode) {
integrationCode.textContent = this.appData.integration_guide ||
`# Complete ${this.appData.name} Integration Example
from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
import json
async def crawl_with_${this.appData.slug.replace(/-/g, '_')}():
"""
Complete example showing how to use ${this.appData.name}
with Crawl4AI for production web scraping
"""
# Define extraction schema
schema = {
"name": "ProductList",
"baseSelector": "div.product",
"fields": [
{"name": "title", "selector": "h2", "type": "text"},
{"name": "price", "selector": ".price", "type": "text"},
{"name": "image", "selector": "img", "type": "attribute", "attribute": "src"},
{"name": "link", "selector": "a", "type": "attribute", "attribute": "href"}
]
// Documentation tab - use documentation field from database
const docsDiv = document.getElementById('app-docs');
if (docsDiv) {
if (this.appData.documentation) {
docsDiv.innerHTML = this.renderMarkdown(this.appData.documentation);
// Add copy buttons to all code blocks
this.addCopyButtonsToCodeBlocks(docsDiv);
} else {
docsDiv.innerHTML = '<p>Documentation coming soon.</p>';
}
}
}
# Initialize crawler with ${this.appData.name}
async with AsyncWebCrawler(
browser_type="chromium",
headless=True,
verbose=True
) as crawler:
addCopyButtonsToCodeBlocks(container) {
// Find all code blocks and add copy buttons
const codeBlocks = container.querySelectorAll('pre code');
codeBlocks.forEach(codeBlock => {
const pre = codeBlock.parentElement;
# Crawl with extraction
result = await crawler.arun(
url="https://example.com/products",
extraction_strategy=JsonCssExtractionStrategy(schema),
cache_mode="bypass",
wait_for="css:.product",
screenshot=True
)
// Skip if already has a copy button
if (pre.querySelector('.copy-btn')) return;
# Process results
if result.success:
products = json.loads(result.extracted_content)
print(f"Found {len(products)} products")
// Create copy button
const copyBtn = document.createElement('button');
copyBtn.className = 'copy-btn';
copyBtn.textContent = 'Copy';
copyBtn.onclick = () => {
navigator.clipboard.writeText(codeBlock.textContent).then(() => {
copyBtn.textContent = '✓ Copied!';
setTimeout(() => {
copyBtn.textContent = 'Copy';
}, 2000);
});
};
for product in products[:5]:
print(f"- {product['title']}: {product['price']}")
// Add button to pre element
pre.style.position = 'relative';
pre.insertBefore(copyBtn, codeBlock);
});
}
return products
renderMarkdown(text) {
if (!text) return '';
# Run the crawler
if __name__ == "__main__":
import asyncio
asyncio.run(crawl_with_${this.appData.slug.replace(/-/g, '_')}())`;
}
// Store code blocks temporarily to protect them from processing
const codeBlocks = [];
let processed = text.replace(/```(\w+)?\n([\s\S]*?)```/g, (match, lang, code) => {
const placeholder = `___CODE_BLOCK_${codeBlocks.length}___`;
codeBlocks.push(`<pre><code class="language-${lang || ''}">${this.escapeHtml(code)}</code></pre>`);
return placeholder;
});
// Store inline code temporarily
const inlineCodes = [];
processed = processed.replace(/`([^`]+)`/g, (match, code) => {
const placeholder = `___INLINE_CODE_${inlineCodes.length}___`;
inlineCodes.push(`<code>${this.escapeHtml(code)}</code>`);
return placeholder;
});
// Now process the rest of the markdown
processed = processed
// Headers
.replace(/^### (.*$)/gim, '<h3>$1</h3>')
.replace(/^## (.*$)/gim, '<h2>$1</h2>')
.replace(/^# (.*$)/gim, '<h1>$1</h1>')
// Bold
.replace(/\*\*(.*?)\*\*/g, '<strong>$1</strong>')
// Italic
.replace(/\*(.*?)\*/g, '<em>$1</em>')
// Links
.replace(/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2" target="_blank">$1</a>')
// Line breaks
.replace(/\n\n/g, '</p><p>')
.replace(/\n/g, '<br>')
// Lists
.replace(/^\* (.*)$/gim, '<li>$1</li>')
.replace(/^- (.*)$/gim, '<li>$1</li>')
// Wrap in paragraphs
.replace(/^(?!<[h|p|pre|ul|ol|li])/gim, '<p>')
.replace(/(?<![>])$/gim, '</p>');
// Restore inline code
inlineCodes.forEach((code, i) => {
processed = processed.replace(`___INLINE_CODE_${i}___`, code);
});
// Restore code blocks
codeBlocks.forEach((block, i) => {
processed = processed.replace(`___CODE_BLOCK_${i}___`, block);
});
return processed;
}
escapeHtml(text) {
const div = document.createElement('div');
div.textContent = text;
return div.innerHTML;
}
formatNumber(num) {
@@ -289,33 +277,6 @@ if __name__ == "__main__":
document.getElementById(`${tabName}-tab`).classList.add('active');
});
});
// Copy integration code
document.getElementById('copy-integration').addEventListener('click', () => {
const code = document.getElementById('integration-code').textContent;
navigator.clipboard.writeText(code).then(() => {
const btn = document.getElementById('copy-integration');
const originalText = btn.innerHTML;
btn.innerHTML = '<span>✓</span> Copied!';
setTimeout(() => {
btn.innerHTML = originalText;
}, 2000);
});
});
// Copy code buttons
document.querySelectorAll('.copy-btn').forEach(btn => {
btn.addEventListener('click', (e) => {
const codeBlock = e.target.closest('.code-block');
const code = codeBlock.querySelector('code').textContent;
navigator.clipboard.writeText(code).then(() => {
btn.textContent = 'Copied!';
setTimeout(() => {
btn.textContent = 'Copy';
}, 2000);
});
});
});
}
async loadRelatedApps() {

View File

@@ -1,338 +0,0 @@
"""
🚀 Crawl4AI v0.7.5 Release Demo - Working Examples
==================================================
This demo showcases key features introduced in v0.7.5 with real, executable examples.
Featured Demos:
1. ✅ Docker Hooks System - Real API calls with custom hooks (string & function-based)
2. ✅ Enhanced LLM Integration - Working LLM configurations
3. ✅ HTTPS Preservation - Live crawling with HTTPS maintenance
Requirements:
- crawl4ai v0.7.5 installed
- Docker running with crawl4ai image (optional for Docker demos)
- Valid API keys for LLM demos (optional)
"""
import asyncio
import requests
import time
import sys
from crawl4ai import (AsyncWebCrawler, CrawlerRunConfig, BrowserConfig,
CacheMode, FilterChain, URLPatternFilter, BFSDeepCrawlStrategy,
hooks_to_string)
from crawl4ai.docker_client import Crawl4aiDockerClient
def print_section(title: str, description: str = ""):
"""Print a section header"""
print(f"\n{'=' * 60}")
print(f"{title}")
if description:
print(f"{description}")
print(f"{'=' * 60}\n")
async def demo_1_docker_hooks_system():
"""Demo 1: Docker Hooks System - Real API calls with custom hooks"""
print_section(
"Demo 1: Docker Hooks System",
"Testing both string-based and function-based hooks (NEW in v0.7.5!)"
)
# Check Docker service availability
def check_docker_service():
try:
response = requests.get("http://localhost:11235/", timeout=3)
return response.status_code == 200
except:
return False
print("Checking Docker service...")
docker_running = check_docker_service()
if not docker_running:
print("⚠️ Docker service not running on localhost:11235")
print("To test Docker hooks:")
print("1. Run: docker run -p 11235:11235 unclecode/crawl4ai:latest")
print("2. Wait for service to start")
print("3. Re-run this demo\n")
return
print("✓ Docker service detected!")
# ============================================================================
# PART 1: Traditional String-Based Hooks (Works with REST API)
# ============================================================================
print("\n" + "" * 60)
print("Part 1: String-Based Hooks (REST API)")
print("" * 60)
hooks_config_string = {
"on_page_context_created": """
async def hook(page, context, **kwargs):
print("[String Hook] Setting up page context")
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
return page
""",
"before_retrieve_html": """
async def hook(page, context, **kwargs):
print("[String Hook] Before retrieving HTML")
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(1000)
return page
"""
}
payload = {
"urls": ["https://httpbin.org/html"],
"hooks": {
"code": hooks_config_string,
"timeout": 30
}
}
print("🔧 Using string-based hooks for REST API...")
try:
start_time = time.time()
response = requests.post("http://localhost:11235/crawl", json=payload, timeout=60)
execution_time = time.time() - start_time
if response.status_code == 200:
result = response.json()
print(f"✅ String-based hooks executed in {execution_time:.2f}s")
if result.get('results') and result['results'][0].get('success'):
html_length = len(result['results'][0].get('html', ''))
print(f" 📄 HTML length: {html_length} characters")
else:
print(f"❌ Request failed: {response.status_code}")
except Exception as e:
print(f"❌ Error: {str(e)}")
# ============================================================================
# PART 2: NEW Function-Based Hooks with Docker Client (v0.7.5)
# ============================================================================
print("\n" + "" * 60)
print("Part 2: Function-Based Hooks with Docker Client (✨ NEW!)")
print("" * 60)
# Define hooks as regular Python functions
async def on_page_context_created_func(page, context, **kwargs):
"""Block images to speed up crawling"""
print("[Function Hook] Setting up page context")
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
await page.set_viewport_size({"width": 1920, "height": 1080})
return page
async def before_goto_func(page, context, url, **kwargs):
"""Add custom headers before navigation"""
print(f"[Function Hook] About to navigate to {url}")
await page.set_extra_http_headers({
'X-Crawl4AI': 'v0.7.5-function-hooks',
'X-Test-Header': 'demo'
})
return page
async def before_retrieve_html_func(page, context, **kwargs):
"""Scroll to load lazy content"""
print("[Function Hook] Scrolling page for lazy-loaded content")
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(500)
await page.evaluate("window.scrollTo(0, 0)")
return page
# Use the hooks_to_string utility (can be used standalone)
print("\n📦 Converting functions to strings with hooks_to_string()...")
hooks_as_strings = hooks_to_string({
"on_page_context_created": on_page_context_created_func,
"before_goto": before_goto_func,
"before_retrieve_html": before_retrieve_html_func
})
print(f" ✓ Converted {len(hooks_as_strings)} hooks to string format")
# OR use Docker Client which does conversion automatically!
print("\n🐳 Using Docker Client with automatic conversion...")
try:
client = Crawl4aiDockerClient(base_url="http://localhost:11235")
# Pass function objects directly - conversion happens automatically!
results = await client.crawl(
urls=["https://httpbin.org/html"],
hooks={
"on_page_context_created": on_page_context_created_func,
"before_goto": before_goto_func,
"before_retrieve_html": before_retrieve_html_func
},
hooks_timeout=30
)
if results and results.success:
print(f"✅ Function-based hooks executed successfully!")
print(f" 📄 HTML length: {len(results.html)} characters")
print(f" 🎯 URL: {results.url}")
else:
print("⚠️ Crawl completed but may have warnings")
except Exception as e:
print(f"❌ Docker client error: {str(e)}")
# Show the benefits
print("\n" + "=" * 60)
print("✨ Benefits of Function-Based Hooks:")
print("=" * 60)
print("✓ Full IDE support (autocomplete, syntax highlighting)")
print("✓ Type checking and linting")
print("✓ Easier to test and debug")
print("✓ Reusable across projects")
print("✓ Automatic conversion in Docker client")
print("=" * 60)
async def demo_2_enhanced_llm_integration():
"""Demo 2: Enhanced LLM Integration - Working LLM configurations"""
print_section(
"Demo 2: Enhanced LLM Integration",
"Testing custom LLM providers and configurations"
)
print("🤖 Testing Enhanced LLM Integration Features")
provider = "gemini/gemini-2.5-flash-lite"
payload = {
"url": "https://example.com",
"f": "llm",
"q": "Summarize this page in one sentence.",
"provider": provider, # Explicitly set provider
"temperature": 0.7
}
try:
response = requests.post(
"http://localhost:11235/md",
json=payload,
timeout=60
)
if response.status_code == 200:
result = response.json()
print(f"✓ Request successful with provider: {provider}")
print(f" - Response keys: {list(result.keys())}")
print(f" - Content length: {len(result.get('markdown', ''))} characters")
print(f" - Note: Actual LLM call may fail without valid API key")
else:
print(f"❌ Request failed: {response.status_code}")
print(f" - Response: {response.text[:500]}")
except Exception as e:
print(f"[red]Error: {e}[/]")
async def demo_3_https_preservation():
"""Demo 3: HTTPS Preservation - Live crawling with HTTPS maintenance"""
print_section(
"Demo 3: HTTPS Preservation",
"Testing HTTPS preservation for internal links"
)
print("🔒 Testing HTTPS Preservation Feature")
# Test with HTTPS preservation enabled
print("\nTest 1: HTTPS Preservation ENABLED")
url_filter = URLPatternFilter(
patterns=["^(https:\/\/)?quotes\.toscrape\.com(\/.*)?$"]
)
config = CrawlerRunConfig(
exclude_external_links=True,
stream=True,
verbose=False,
preserve_https_for_internal_links=True,
deep_crawl_strategy=BFSDeepCrawlStrategy(
max_depth=2,
max_pages=5,
filter_chain=FilterChain([url_filter])
)
)
test_url = "https://quotes.toscrape.com"
print(f"🎯 Testing URL: {test_url}")
async with AsyncWebCrawler() as crawler:
async for result in await crawler.arun(url=test_url, config=config):
print("✓ HTTPS Preservation Test Completed")
internal_links = [i['href'] for i in result.links['internal']]
for link in internal_links:
print(f"{link}")
async def main():
"""Run all demos"""
print("\n" + "=" * 60)
print("🚀 Crawl4AI v0.7.5 Working Demo")
print("=" * 60)
# Check system requirements
print("🔍 System Requirements Check:")
print(f" - Python version: {sys.version.split()[0]} {'' if sys.version_info >= (3, 10) else '❌ (3.10+ required)'}")
try:
import requests
print(f" - Requests library: ✓")
except ImportError:
print(f" - Requests library: ❌")
print()
demos = [
("Docker Hooks System", demo_1_docker_hooks_system),
("Enhanced LLM Integration", demo_2_enhanced_llm_integration),
("HTTPS Preservation", demo_3_https_preservation),
]
for i, (name, demo_func) in enumerate(demos, 1):
try:
print(f"\n📍 Starting Demo {i}/{len(demos)}: {name}")
await demo_func()
if i < len(demos):
print(f"\n✨ Demo {i} complete! Press Enter for next demo...")
input()
except KeyboardInterrupt:
print(f"\n⏹️ Demo interrupted by user")
break
except Exception as e:
print(f"❌ Demo {i} error: {str(e)}")
print("Continuing to next demo...")
continue
print("\n" + "=" * 60)
print("🎉 Demo Complete!")
print("=" * 60)
print("You've experienced the power of Crawl4AI v0.7.5!")
print("")
print("Key Features Demonstrated:")
print("🔧 Docker Hooks - String-based & function-based (NEW!)")
print(" • hooks_to_string() utility for function conversion")
print(" • Docker client with automatic conversion")
print(" • Full IDE support and type checking")
print("🤖 Enhanced LLM - Better AI integration")
print("🔒 HTTPS Preservation - Secure link handling")
print("")
print("Ready to build something amazing? 🚀")
print("")
print("📖 Docs: https://docs.crawl4ai.com/")
print("🐙 GitHub: https://github.com/unclecode/crawl4ai")
print("=" * 60)
if __name__ == "__main__":
print("🚀 Crawl4AI v0.7.5 Live Demo Starting...")
print("Press Ctrl+C anytime to exit\n")
try:
asyncio.run(main())
except KeyboardInterrupt:
print("\n👋 Demo stopped by user. Thanks for trying Crawl4AI v0.7.5!")
except Exception as e:
print(f"\n❌ Demo error: {str(e)}")
print("Make sure you have the required dependencies installed.")

View File

@@ -1,655 +0,0 @@
#!/usr/bin/env python3
"""
🚀 Crawl4AI v0.7.5 - Docker Hooks System Complete Demonstration
================================================================
This file demonstrates the NEW Docker Hooks System introduced in v0.7.5.
The Docker Hooks System is a completely NEW feature that provides pipeline
customization through user-provided Python functions. It offers three approaches:
1. String-based hooks for REST API
2. hooks_to_string() utility to convert functions
3. Docker Client with automatic conversion (most convenient)
All three approaches are part of this NEW v0.7.5 feature!
Perfect for video recording and demonstration purposes.
Requirements:
- Docker container running: docker run -p 11235:11235 unclecode/crawl4ai:latest
- crawl4ai v0.7.5 installed: pip install crawl4ai==0.7.5
"""
import asyncio
import requests
import json
import time
from typing import Dict, Any
# Import Crawl4AI components
from crawl4ai import hooks_to_string
from crawl4ai.docker_client import Crawl4aiDockerClient
# Configuration
DOCKER_URL = "http://localhost:11235"
# DOCKER_URL = "http://localhost:11234"
TEST_URLS = [
# "https://httpbin.org/html",
"https://www.kidocode.com",
"https://quotes.toscrape.com",
]
def print_section(title: str, description: str = ""):
"""Print a formatted section header"""
print("\n" + "=" * 70)
print(f" {title}")
if description:
print(f" {description}")
print("=" * 70 + "\n")
def check_docker_service() -> bool:
"""Check if Docker service is running"""
try:
response = requests.get(f"{DOCKER_URL}/health", timeout=3)
return response.status_code == 200
except:
return False
# ============================================================================
# REUSABLE HOOK LIBRARY (NEW in v0.7.5)
# ============================================================================
async def performance_optimization_hook(page, context, **kwargs):
"""
Performance Hook: Block unnecessary resources to speed up crawling
"""
print(" [Hook] 🚀 Optimizing performance - blocking images and ads...")
# Block images
await context.route(
"**/*.{png,jpg,jpeg,gif,webp,svg,ico}",
lambda route: route.abort()
)
# Block ads and analytics
await context.route("**/analytics/*", lambda route: route.abort())
await context.route("**/ads/*", lambda route: route.abort())
await context.route("**/google-analytics.com/*", lambda route: route.abort())
print(" [Hook] ✓ Performance optimization applied")
return page
async def viewport_setup_hook(page, context, **kwargs):
"""
Viewport Hook: Set consistent viewport size for rendering
"""
print(" [Hook] 🖥️ Setting viewport to 1920x1080...")
await page.set_viewport_size({"width": 1920, "height": 1080})
print(" [Hook] ✓ Viewport configured")
return page
async def authentication_headers_hook(page, context, url, **kwargs):
"""
Headers Hook: Add custom authentication and tracking headers
"""
print(f" [Hook] 🔐 Adding custom headers for {url[:50]}...")
await page.set_extra_http_headers({
'X-Crawl4AI-Version': '0.7.5',
'X-Custom-Hook': 'function-based-demo',
'Accept-Language': 'en-US,en;q=0.9',
'User-Agent': 'Crawl4AI/0.7.5 (Educational Demo)'
})
print(" [Hook] ✓ Custom headers added")
return page
async def lazy_loading_handler_hook(page, context, **kwargs):
"""
Content Hook: Handle lazy-loaded content by scrolling
"""
print(" [Hook] 📜 Scrolling to load lazy content...")
# Scroll to bottom
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(1000)
# Scroll to middle
await page.evaluate("window.scrollTo(0, document.body.scrollHeight / 2)")
await page.wait_for_timeout(500)
# Scroll back to top
await page.evaluate("window.scrollTo(0, 0)")
await page.wait_for_timeout(500)
print(" [Hook] ✓ Lazy content loaded")
return page
async def page_analytics_hook(page, context, **kwargs):
"""
Analytics Hook: Log page metrics before extraction
"""
print(" [Hook] 📊 Collecting page analytics...")
metrics = await page.evaluate('''
() => ({
title: document.title,
images: document.images.length,
links: document.links.length,
scripts: document.scripts.length,
headings: document.querySelectorAll('h1, h2, h3').length,
paragraphs: document.querySelectorAll('p').length
})
''')
print(f" [Hook] 📈 Page: {metrics['title'][:50]}...")
print(f" Links: {metrics['links']}, Images: {metrics['images']}, "
f"Headings: {metrics['headings']}, Paragraphs: {metrics['paragraphs']}")
return page
# ============================================================================
# DEMO 1: String-Based Hooks (NEW Docker Hooks System)
# ============================================================================
def demo_1_string_based_hooks():
"""
Demonstrate string-based hooks with REST API (part of NEW Docker Hooks System)
"""
print_section(
"DEMO 1: String-Based Hooks (REST API)",
"Part of the NEW Docker Hooks System - hooks as strings"
)
# Define hooks as strings
hooks_config = {
"on_page_context_created": """
async def hook(page, context, **kwargs):
print(" [String Hook] Setting up page context...")
# Block images for performance
await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
await page.set_viewport_size({"width": 1920, "height": 1080})
return page
""",
"before_goto": """
async def hook(page, context, url, **kwargs):
print(f" [String Hook] Navigating to {url[:50]}...")
await page.set_extra_http_headers({
'X-Crawl4AI': 'string-based-hooks',
'X-Demo': 'v0.7.5'
})
return page
""",
"before_retrieve_html": """
async def hook(page, context, **kwargs):
print(" [String Hook] Scrolling page...")
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(1000)
return page
"""
}
# Prepare request payload
payload = {
"urls": [TEST_URLS[0]],
"hooks": {
"code": hooks_config,
"timeout": 30
},
"crawler_config": {
"cache_mode": "bypass"
}
}
print(f"🎯 Target URL: {TEST_URLS[0]}")
print(f"🔧 Configured {len(hooks_config)} string-based hooks")
print(f"📡 Sending request to Docker API...\n")
try:
start_time = time.time()
response = requests.post(f"{DOCKER_URL}/crawl", json=payload, timeout=60)
execution_time = time.time() - start_time
if response.status_code == 200:
result = response.json()
print(f"\n✅ Request successful! (took {execution_time:.2f}s)")
# Display results
if result.get('results') and result['results'][0].get('success'):
crawl_result = result['results'][0]
html_length = len(crawl_result.get('html', ''))
markdown_length = len(crawl_result.get('markdown', ''))
print(f"\n📊 Results:")
print(f" • HTML length: {html_length:,} characters")
print(f" • Markdown length: {markdown_length:,} characters")
print(f" • URL: {crawl_result.get('url')}")
# Check hooks execution
if 'hooks' in result:
hooks_info = result['hooks']
print(f"\n🎣 Hooks Execution:")
print(f" • Status: {hooks_info['status']['status']}")
print(f" • Attached hooks: {len(hooks_info['status']['attached_hooks'])}")
if 'summary' in hooks_info:
summary = hooks_info['summary']
print(f" • Total executions: {summary['total_executions']}")
print(f" • Successful: {summary['successful']}")
print(f" • Success rate: {summary['success_rate']:.1f}%")
else:
print(f"⚠️ Crawl completed but no results")
else:
print(f"❌ Request failed with status {response.status_code}")
print(f" Error: {response.text[:200]}")
except requests.exceptions.Timeout:
print("⏰ Request timed out after 60 seconds")
except Exception as e:
print(f"❌ Error: {str(e)}")
print("\n" + "" * 70)
print("✓ String-based hooks demo complete\n")
# ============================================================================
# DEMO 2: Function-Based Hooks with hooks_to_string() Utility
# ============================================================================
def demo_2_hooks_to_string_utility():
"""
Demonstrate the new hooks_to_string() utility for converting functions
"""
print_section(
"DEMO 2: hooks_to_string() Utility (NEW! ✨)",
"Convert Python functions to strings for REST API"
)
print("📦 Creating hook functions...")
print(" • performance_optimization_hook")
print(" • viewport_setup_hook")
print(" • authentication_headers_hook")
print(" • lazy_loading_handler_hook")
# Convert function objects to strings using the NEW utility
print("\n🔄 Converting functions to strings with hooks_to_string()...")
hooks_dict = {
"on_page_context_created": performance_optimization_hook,
"before_goto": authentication_headers_hook,
"before_retrieve_html": lazy_loading_handler_hook,
}
hooks_as_strings = hooks_to_string(hooks_dict)
print(f"✅ Successfully converted {len(hooks_as_strings)} functions to strings")
# Show a preview
print("\n📝 Sample converted hook (first 250 characters):")
print("" * 70)
sample_hook = list(hooks_as_strings.values())[0]
print(sample_hook[:250] + "...")
print("" * 70)
# Use the converted hooks with REST API
print("\n📡 Using converted hooks with REST API...")
payload = {
"urls": [TEST_URLS[0]],
"hooks": {
"code": hooks_as_strings,
"timeout": 30
}
}
try:
start_time = time.time()
response = requests.post(f"{DOCKER_URL}/crawl", json=payload, timeout=60)
execution_time = time.time() - start_time
if response.status_code == 200:
result = response.json()
print(f"\n✅ Request successful! (took {execution_time:.2f}s)")
if result.get('results') and result['results'][0].get('success'):
crawl_result = result['results'][0]
print(f" • HTML length: {len(crawl_result.get('html', '')):,} characters")
print(f" • Hooks executed successfully!")
else:
print(f"❌ Request failed: {response.status_code}")
except Exception as e:
print(f"❌ Error: {str(e)}")
print("\n💡 Benefits of hooks_to_string():")
print(" ✓ Write hooks as regular Python functions")
print(" ✓ Full IDE support (autocomplete, syntax highlighting)")
print(" ✓ Type checking and linting")
print(" ✓ Easy to test and debug")
print(" ✓ Reusable across projects")
print(" ✓ Works with any REST API client")
print("\n" + "" * 70)
print("✓ hooks_to_string() utility demo complete\n")
# ============================================================================
# DEMO 3: Docker Client with Automatic Conversion (RECOMMENDED! 🌟)
# ============================================================================
async def demo_3_docker_client_auto_conversion():
"""
Demonstrate Docker Client with automatic hook conversion (RECOMMENDED)
"""
print_section(
"DEMO 3: Docker Client with Auto-Conversion (RECOMMENDED! 🌟)",
"Pass function objects directly - conversion happens automatically!"
)
print("🐳 Initializing Crawl4AI Docker Client...")
client = Crawl4aiDockerClient(base_url=DOCKER_URL)
print("✅ Client ready!\n")
# Use our reusable hook library - just pass the function objects!
print("📚 Using reusable hook library:")
print(" • performance_optimization_hook")
print(" • viewport_setup_hook")
print(" • authentication_headers_hook")
print(" • lazy_loading_handler_hook")
print(" • page_analytics_hook")
print("\n🎯 Target URL: " + TEST_URLS[1])
print("🚀 Starting crawl with automatic hook conversion...\n")
try:
start_time = time.time()
# Pass function objects directly - NO manual conversion needed! ✨
results = await client.crawl(
urls=[TEST_URLS[0]],
hooks={
"on_page_context_created": performance_optimization_hook,
"before_goto": authentication_headers_hook,
"before_retrieve_html": lazy_loading_handler_hook,
"before_return_html": page_analytics_hook,
},
hooks_timeout=30
)
execution_time = time.time() - start_time
print(f"\n✅ Crawl completed! (took {execution_time:.2f}s)\n")
# Display results
if results and results.success:
result = results
print(f"📊 Results:")
print(f" • URL: {result.url}")
print(f" • Success: {result.success}")
print(f" • HTML length: {len(result.html):,} characters")
print(f" • Markdown length: {len(result.markdown):,} characters")
# Show metadata
if result.metadata:
print(f"\n📋 Metadata:")
print(f" • Title: {result.metadata.get('title', 'N/A')}")
print(f" • Description: {result.metadata.get('description', 'N/A')}")
# Show links
if result.links:
internal_count = len(result.links.get('internal', []))
external_count = len(result.links.get('external', []))
print(f"\n🔗 Links Found:")
print(f" • Internal: {internal_count}")
print(f" • External: {external_count}")
else:
print(f"⚠️ Crawl completed but no successful results")
if results:
print(f" Error: {results.error_message}")
except Exception as e:
print(f"❌ Error: {str(e)}")
import traceback
traceback.print_exc()
print("\n🌟 Why Docker Client is RECOMMENDED:")
print(" ✓ Automatic function-to-string conversion")
print(" ✓ No manual hooks_to_string() calls needed")
print(" ✓ Cleaner, more Pythonic code")
print(" ✓ Full type hints and IDE support")
print(" ✓ Built-in error handling")
print(" ✓ Async/await support")
print("\n" + "" * 70)
print("✓ Docker Client auto-conversion demo complete\n")
# ============================================================================
# DEMO 4: Advanced Use Case - Complete Hook Pipeline
# ============================================================================
async def demo_4_complete_hook_pipeline():
"""
Demonstrate a complete hook pipeline using all 8 hook points
"""
print_section(
"DEMO 4: Complete Hook Pipeline",
"Using all 8 available hook points for comprehensive control"
)
# Define all 8 hooks
async def on_browser_created_hook(browser, **kwargs):
"""Hook 1: Called after browser is created"""
print(" [Pipeline] 1/8 Browser created")
return browser
async def on_page_context_created_hook(page, context, **kwargs):
"""Hook 2: Called after page context is created"""
print(" [Pipeline] 2/8 Page context created - setting up...")
await page.set_viewport_size({"width": 1920, "height": 1080})
return page
async def on_user_agent_updated_hook(page, context, user_agent, **kwargs):
"""Hook 3: Called when user agent is updated"""
print(f" [Pipeline] 3/8 User agent updated: {user_agent[:50]}...")
return page
async def before_goto_hook(page, context, url, **kwargs):
"""Hook 4: Called before navigating to URL"""
print(f" [Pipeline] 4/8 Before navigation to: {url[:60]}...")
return page
async def after_goto_hook(page, context, url, response, **kwargs):
"""Hook 5: Called after navigation completes"""
print(f" [Pipeline] 5/8 After navigation - Status: {response.status if response else 'N/A'}")
await page.wait_for_timeout(1000)
return page
async def on_execution_started_hook(page, context, **kwargs):
"""Hook 6: Called when JavaScript execution starts"""
print(" [Pipeline] 6/8 JavaScript execution started")
return page
async def before_retrieve_html_hook(page, context, **kwargs):
"""Hook 7: Called before retrieving HTML"""
print(" [Pipeline] 7/8 Before HTML retrieval - scrolling...")
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
return page
async def before_return_html_hook(page, context, html, **kwargs):
"""Hook 8: Called before returning HTML"""
print(f" [Pipeline] 8/8 Before return - HTML length: {len(html):,} chars")
return page
print("🎯 Target URL: " + TEST_URLS[0])
print("🔧 Configured ALL 8 hook points for complete pipeline control\n")
client = Crawl4aiDockerClient(base_url=DOCKER_URL)
try:
print("🚀 Starting complete pipeline crawl...\n")
start_time = time.time()
results = await client.crawl(
urls=[TEST_URLS[0]],
hooks={
"on_browser_created": on_browser_created_hook,
"on_page_context_created": on_page_context_created_hook,
"on_user_agent_updated": on_user_agent_updated_hook,
"before_goto": before_goto_hook,
"after_goto": after_goto_hook,
"on_execution_started": on_execution_started_hook,
"before_retrieve_html": before_retrieve_html_hook,
"before_return_html": before_return_html_hook,
},
hooks_timeout=45
)
execution_time = time.time() - start_time
if results and results.success:
print(f"\n✅ Complete pipeline executed successfully! (took {execution_time:.2f}s)")
print(f" • All 8 hooks executed in sequence")
print(f" • HTML length: {len(results.html):,} characters")
else:
print(f"⚠️ Pipeline completed with warnings")
except Exception as e:
print(f"❌ Error: {str(e)}")
print("\n📚 Available Hook Points:")
print(" 1. on_browser_created - Browser initialization")
print(" 2. on_page_context_created - Page context setup")
print(" 3. on_user_agent_updated - User agent configuration")
print(" 4. before_goto - Pre-navigation setup")
print(" 5. after_goto - Post-navigation processing")
print(" 6. on_execution_started - JavaScript execution start")
print(" 7. before_retrieve_html - Pre-extraction processing")
print(" 8. before_return_html - Final HTML processing")
print("\n" + "" * 70)
print("✓ Complete hook pipeline demo complete\n")
# ============================================================================
# MAIN EXECUTION
# ============================================================================
async def main():
"""
Run all demonstrations
"""
print("\n" + "=" * 70)
print(" 🚀 Crawl4AI v0.7.5 - Docker Hooks Complete Demonstration")
print("=" * 70)
# Check Docker service
print("\n🔍 Checking Docker service status...")
if not check_docker_service():
print("❌ Docker service is not running!")
print("\n📋 To start the Docker service:")
print(" docker run -p 11235:11235 unclecode/crawl4ai:latest")
print("\nPlease start the service and run this demo again.")
return
print("✅ Docker service is running!\n")
# Run all demos
demos = [
("String-Based Hooks (REST API)", demo_1_string_based_hooks, False),
("hooks_to_string() Utility", demo_2_hooks_to_string_utility, False),
("Docker Client Auto-Conversion", demo_3_docker_client_auto_conversion, True),
# ("Complete Hook Pipeline", demo_4_complete_hook_pipeline, True),
]
for i, (name, demo_func, is_async) in enumerate(demos, 1):
print(f"\n{'🔷' * 35}")
print(f"Starting Demo {i}/{len(demos)}: {name}")
print(f"{'🔷' * 35}\n")
try:
if is_async:
await demo_func()
else:
demo_func()
print(f"✅ Demo {i} completed successfully!")
# Pause between demos (except the last one)
if i < len(demos):
print("\n⏸️ Press Enter to continue to next demo...")
# input()
except KeyboardInterrupt:
print(f"\n⏹️ Demo interrupted by user")
break
except Exception as e:
print(f"\n❌ Demo {i} failed: {str(e)}")
import traceback
traceback.print_exc()
print("\nContinuing to next demo...\n")
continue
# Final summary
print("\n" + "=" * 70)
print(" 🎉 All Demonstrations Complete!")
print("=" * 70)
print("\n📊 Summary of v0.7.5 Docker Hooks System:")
print("\n🆕 COMPLETELY NEW FEATURE in v0.7.5:")
print(" The Docker Hooks System lets you customize the crawling pipeline")
print(" with user-provided Python functions at 8 strategic points.")
print("\n✨ Three Ways to Use Docker Hooks (All NEW!):")
print(" 1. String-based - Write hooks as strings for REST API")
print(" 2. hooks_to_string() - Convert Python functions to strings")
print(" 3. Docker Client - Automatic conversion (RECOMMENDED)")
print("\n💡 Key Benefits:")
print(" ✓ Full IDE support (autocomplete, syntax highlighting)")
print(" ✓ Type checking and linting")
print(" ✓ Easy to test and debug")
print(" ✓ Reusable across projects")
print(" ✓ Complete pipeline control")
print("\n🎯 8 Hook Points Available:")
print(" • on_browser_created, on_page_context_created")
print(" • on_user_agent_updated, before_goto, after_goto")
print(" • on_execution_started, before_retrieve_html, before_return_html")
print("\n📚 Resources:")
print(" • Docs: https://docs.crawl4ai.com")
print(" • GitHub: https://github.com/unclecode/crawl4ai")
print(" • Discord: https://discord.gg/jP8KfhDhyN")
print("\n" + "=" * 70)
print(" Happy Crawling with v0.7.5! 🕷️")
print("=" * 70 + "\n")
if __name__ == "__main__":
print("\n🎬 Starting Crawl4AI v0.7.5 Docker Hooks Demonstration...")
print("Press Ctrl+C anytime to exit\n")
try:
asyncio.run(main())
except KeyboardInterrupt:
print("\n\n👋 Demo stopped by user. Thanks for exploring Crawl4AI v0.7.5!")
except Exception as e:
print(f"\n\n❌ Demo error: {str(e)}")
import traceback
traceback.print_exc()

File diff suppressed because it is too large Load Diff