Compare commits


4 Commits

Author SHA1 Message Date
UncleCode
146f9d415f Update README 2025-04-23 19:50:33 +08:00
UncleCode
37fd80e4b9 feat(docs): add mobile-friendly navigation menu
Implements a responsive hamburger menu for mobile devices with the following changes:
- Add new mobile_menu.js for handling mobile navigation
- Update layout.css with mobile-specific styles and animations
- Enhance README with updated geolocation example
- Register mobile_menu.js in mkdocs.yml

The mobile menu includes:
- Hamburger button animation
- Slide-out sidebar
- Backdrop overlay
- Touch-friendly navigation
- Proper event handling
2025-04-23 19:44:25 +08:00
UncleCode
949a93982e feat(docs): update documentation and disable Ask AI feature
Major documentation updates including:
- Add comprehensive code examples page
- Add video tutorial to homepage
- Update Docker deployment instructions for v0.6.0
- Temporarily disable Ask AI feature
- Add table border styling
- Update site version to v0.6.x

BREAKING CHANGE: Ask AI feature temporarily disabled pending launch
2025-04-23 19:02:39 +08:00
UncleCode
c4f5651199 chore(deps): upgrade to Python 3.12 and prepare for 0.6.0 release
- Update Docker base image to Python 3.12-slim-bookworm
- Bump version from 0.6.0rc1 to 0.6.0
- Update documentation to reflect release version changes
- Fix license specification in pyproject.toml and setup.py
- Clean up code formatting in demo_docker_api.py

BREAKING CHANGE: Base Python version upgraded from 3.10 to 3.12
2025-04-23 16:35:15 +08:00
17 changed files with 432 additions and 53 deletions

View File

@@ -1,4 +1,4 @@
-FROM python:3.10-slim
+FROM python:3.12-slim-bookworm AS build
# C4ai version
ARG C4AI_VER=0.6.0
@@ -22,7 +22,7 @@ ENV PYTHONFAULTHANDLER=1 \
REDIS_HOST=localhost \
REDIS_PORT=6379
-ARG PYTHON_VERSION=3.10
+ARG PYTHON_VERSION=3.12
ARG INSTALL_TYPE=default
ARG ENABLE_GPU=false
ARG TARGETARCH
@@ -71,6 +71,9 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get dist-upgrade -y \
&& rm -rf /var/lib/apt/lists/*
RUN if [ "$ENABLE_GPU" = "true" ] && [ "$TARGETARCH" = "amd64" ] ; then \
apt-get update && apt-get install -y --no-install-recommends \
nvidia-cuda-toolkit \

View File

@@ -269,8 +269,8 @@ The new Docker implementation includes:
```bash
# Pull and run the latest release candidate
-docker pull unclecode/crawl4ai:0.6.0rc1-r1
-docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:0.6.0rc1-r1
+docker pull unclecode/crawl4ai:0.6.0-rN  # Use your favorite revision number
+docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:0.6.0-rN  # Use your favorite revision number
# Visit the playground at http://localhost:11235/playground
```
@@ -509,15 +509,47 @@ async def test_news_crawl():
- **🌎 World-aware Crawling**: Set geolocation, language, and timezone for authentic locale-specific content:
```python
-crawler_config = CrawlerRunConfig(
-    geo_locale={"city": "Tokyo", "lang": "ja", "timezone": "Asia/Tokyo"}
-)
+crun_cfg = CrawlerRunConfig(
+    url="https://browserleaks.com/geo",  # test page that shows your location
+    locale="en-US",                      # Accept-Language & UI locale
+    timezone_id="America/Los_Angeles",   # JS Date()/Intl timezone
+    geolocation=GeolocationConfig(       # override GPS coords
+        latitude=34.0522,
+        longitude=-118.2437,
+        accuracy=10.0,
+    )
+)
```
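As a side note on the new example above: `timezone_id` must be a valid IANA zone name and the coordinates plain floats in range. A quick sanity check with only the standard library (`zoneinfo`, Python 3.9+; nothing here depends on crawl4ai itself):

```python
# Validate the locale values used in the README example above.
# Stdlib-only sketch; the names mirror the diff, not a crawl4ai API.
from datetime import datetime
from zoneinfo import ZoneInfo

# An invalid zone name would raise ZoneInfoNotFoundError here.
tz = ZoneInfo("America/Los_Angeles")
now_la = datetime.now(tz)
print(now_la.tzname())  # e.g. "PST" or "PDT" depending on the date

# Coordinates should fall within valid GPS ranges.
lat, lon = 34.0522, -118.2437
assert -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```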
- **📊 Table-to-DataFrame Extraction**: Extract HTML tables directly to CSV or pandas DataFrames:
```python
-crawler_config = CrawlerRunConfig(extract_tables=True)
-# Access tables via result.tables or result.tables_as_dataframe
+crawler = AsyncWebCrawler(config=browser_config)
+await crawler.start()
+try:
+    # Set up scraping parameters
+    crawl_config = CrawlerRunConfig(
+        table_score_threshold=8,  # Strict table detection
+    )
+    # Execute market data extraction
+    results: List[CrawlResult] = await crawler.arun(
+        url="https://coinmarketcap.com/?page=1", config=crawl_config
+    )
+    # Process results
+    raw_df = pd.DataFrame()
+    for result in results:
+        if result.success and result.media["tables"]:
+            raw_df = pd.DataFrame(
+                result.media["tables"][0]["rows"],
+                columns=result.media["tables"][0]["headers"],
+            )
+            break
+    print(raw_df.head())
+finally:
+    await crawler.stop()
```
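For readers who want to post-process the table payload without pandas, the `headers`/`rows` shape shown in the example above can be handled with the standard library alone. A minimal sketch (the sample data is illustrative, not real CoinMarketCap output):

```python
# Turn a table payload shaped like result.media["tables"][0]
# into a list of dicts and a CSV string, using only the stdlib.
import csv
import io

table = {
    "headers": ["Name", "Price"],
    "rows": [["Bitcoin", "93000"], ["Ethereum", "1800"]],
}

# One dict per row, keyed by column header.
records = [dict(zip(table["headers"], row)) for row in table["rows"]]

# Serialize to CSV in memory.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=table["headers"])
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()
```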
- **🚀 Browser Pooling**: Pages launch hot with pre-warmed browser instances for lower latency and memory usage
@@ -537,7 +569,7 @@ async def test_news_crawl():
claude mcp add --transport sse c4ai-sse http://localhost:11235/mcp/sse
```
-- **🖥️ Interactive Playground**: Test configurations and generate API requests with the built-in web interface at `/playground`
+- **🖥️ Interactive Playground**: Test configurations and generate API requests with the built-in web interface at `http://localhost:11235/playground`
- **🐳 Revamped Docker Deployment**: Streamlined multi-architecture Docker image with improved resource efficiency

View File

@@ -1,3 +1,3 @@
# crawl4ai/_version.py
-__version__ = "0.6.0rc1"
+__version__ = "0.6.0"

View File

@@ -62,7 +62,7 @@ Our latest release candidate is `0.6.0rc1-r1`. Images are built with multi-arch
```bash
# Pull the release candidate (recommended for latest features)
-docker pull unclecode/crawl4ai:0.6.0rc1-r1
+docker pull unclecode/crawl4ai:0.6.0-rN  # Use your favorite revision number
# Or pull the latest stable version
docker pull unclecode/crawl4ai:latest
@@ -99,7 +99,7 @@ EOL
-p 11235:11235 \
--name crawl4ai \
--shm-size=1g \
-unclecode/crawl4ai:0.6.0rc1-r1
+unclecode/crawl4ai:0.6.0-rN  # Use your favorite revision number
```
* **With LLM support:**
@@ -110,7 +110,7 @@ EOL
--name crawl4ai \
--env-file .llm.env \
--shm-size=1g \
-unclecode/crawl4ai:0.6.0rc1-r1
+unclecode/crawl4ai:0.6.0-rN  # Use your favorite revision number
```
> The server will be available at `http://localhost:11235`. Visit `/playground` to access the interactive testing interface.
@@ -160,7 +160,7 @@ The `docker-compose.yml` file in the project root provides a simplified approach
```bash
# Pulls and runs the release candidate from Docker Hub
# Automatically selects the correct architecture
-IMAGE=unclecode/crawl4ai:0.6.0rc1-r1 docker compose up -d
+IMAGE=unclecode/crawl4ai:0.6.0-rN docker compose up -d  # Use your favorite revision number
```
* **Build and Run Locally:**

View File

@@ -383,29 +383,29 @@ async def main():
scroll_delay=0.2,
)
-# # Execute market data extraction
-# results: List[CrawlResult] = await crawler.arun(
-#     url="https://coinmarketcap.com/?page=1", config=crawl_config
-# )
+# Execute market data extraction
+results: List[CrawlResult] = await crawler.arun(
+    url="https://coinmarketcap.com/?page=1", config=crawl_config
+)
-# # Process results
-# raw_df = pd.DataFrame()
-# for result in results:
-#     if result.success and result.media["tables"]:
-#         # Extract primary market table
-#         # DataFrame
-#         raw_df = pd.DataFrame(
-#             result.media["tables"][0]["rows"],
-#             columns=result.media["tables"][0]["headers"],
-#         )
-#         break
+# Process results
+raw_df = pd.DataFrame()
+for result in results:
+    if result.success and result.media["tables"]:
+        # Extract primary market table
+        # DataFrame
+        raw_df = pd.DataFrame(
+            result.media["tables"][0]["rows"],
+            columns=result.media["tables"][0]["headers"],
+        )
+        break
# This is for debugging only
# ////// Remove this in production from here..
# Save raw data for debugging
-# raw_df.to_csv(f"{__current_dir__}/tmp/raw_crypto_data.csv", index=False)
-# print("🔍 Raw data saved to 'raw_crypto_data.csv'")
+raw_df.to_csv(f"{__current_dir__}/tmp/raw_crypto_data.csv", index=False)
+print("🔍 Raw data saved to 'raw_crypto_data.csv'")
# Read from file for debugging
raw_df = pd.read_csv(f"{__current_dir__}/tmp/raw_crypto_data.csv")

View File

@@ -398,7 +398,6 @@ async def demo_param_js_execution(client: httpx.AsyncClient):
elif results:
console.print("[yellow]JS Execution Result not found in response.[/]")
async def demo_param_screenshot(client: httpx.AsyncClient):
payload = {
"urls": [SIMPLE_URL],
@@ -430,8 +429,6 @@ async def demo_param_ssl_fetch(client: httpx.AsyncClient):
elif results:
console.print("[yellow]SSL Certificate data not found in response.[/]")
async def demo_param_proxy(client: httpx.AsyncClient):
proxy_params_list = load_proxies_from_env() # Get the list of parameter dicts
if not proxy_params_list:

View File

@@ -361,8 +361,10 @@ A code snippet: \`crawler.run()\`. Check the [quickstart](/core/quickstart).`;
chatMessages.innerHTML = ""; // Start with clean slate for query
if (!isFromQuery) {
// Show welcome only if manually started
+// chatMessages.innerHTML =
+//     '<div class="message ai-message welcome-message">Started a new chat! Ask me anything about Crawl4AI.</div>';
chatMessages.innerHTML =
-    '<div class="message ai-message welcome-message">Started a new chat! Ask me anything about Crawl4AI.</div>';
+    '<div class="message ai-message welcome-message">We will launch this feature very soon.</div>';
}
addCitations([]); // Clear citations
updateCitationsDisplay(); // Clear UI
@@ -504,8 +506,10 @@ A code snippet: \`crawler.run()\`. Check the [quickstart](/core/quickstart).`;
addMessageToChat(message, false);
});
if (messages.length === 0) {
+// chatMessages.innerHTML =
+//     '<div class="message ai-message welcome-message">Chat history loaded. Ask a question!</div>';
chatMessages.innerHTML =
-    '<div class="message ai-message welcome-message">Chat history loaded. Ask a question!</div>';
+    '<div class="message ai-message welcome-message">We will launch this feature very soon.</div>';
}
// Scroll to bottom after loading messages
scrollToBottom();

View File

@@ -36,7 +36,7 @@
<div id="chat-input-area">
<!-- Loading indicator for general waiting (optional) -->
<!-- <div class="loading-indicator" style="display: none;">Thinking...</div> -->
-<textarea id="chat-input" placeholder="Ask about Crawl4AI..." rows="2"></textarea>
+<textarea id="chat-input" placeholder="We will roll out this feature very soon." rows="2" disabled></textarea>
<button id="send-button">Send</button>
</div>
</main>

View File

@@ -64,7 +64,7 @@ body {
/* Apply side padding within the centered block */
padding-left: calc(var(--global-space) * 2);
padding-right: calc(var(--global-space) * 2);
-/* Add margin-left to clear the fixed sidebar */
+/* Add margin-left to clear the fixed sidebar - ONLY ON DESKTOP */
margin-left: var(--sidebar-width);
}
@@ -81,7 +81,7 @@ body {
z-index: 900;
padding: 1em calc(var(--global-space) * 2);
padding-bottom: 2em;
-/* transition: left var(--layout-transition-speed) ease-in-out; */
+transition: left var(--layout-transition-speed) ease-in-out;
}
/* --- 2. Main Content Area (Within Centered Grid) --- */
@@ -188,21 +188,133 @@ footer {
}
}
/* --- Mobile Menu Styles --- */
.mobile-menu-toggle {
display: none; /* Hidden by default, shown in mobile */
background: none;
border: none;
padding: 10px;
cursor: pointer;
z-index: 1200;
margin-right: 10px;
position: absolute;
left: 10px;
top: 50%;
transform: translateY(-50%);
/* Make sure it doesn't get moved */
min-width: 30px;
min-height: 30px;
}
.hamburger-line {
display: block;
width: 22px;
height: 2px;
margin: 5px 0;
background-color: var(--font-color);
transition: transform 0.3s, opacity 0.3s;
}
/* Hamburger animation */
.mobile-menu-toggle.is-active .hamburger-line:nth-child(1) {
transform: translateY(7px) rotate(45deg);
}
.mobile-menu-toggle.is-active .hamburger-line:nth-child(2) {
opacity: 0;
}
.mobile-menu-toggle.is-active .hamburger-line:nth-child(3) {
transform: translateY(-7px) rotate(-45deg);
}
.mobile-menu-close {
display: none; /* Hidden by default, shown in mobile */
position: absolute;
top: 10px;
right: 10px;
background: none;
border: none;
color: var(--font-color);
font-size: 24px;
cursor: pointer;
z-index: 1200;
padding: 5px 10px;
}
.mobile-menu-backdrop {
position: fixed;
top: 0;
left: 0;
right: 0;
bottom: 0;
background-color: rgba(0, 0, 0, 0.7);
z-index: 1050;
}
/* --- Small screens: Hide left sidebar, full width content & footer --- */
@media screen and (max-width: 768px) {
/* Hide the terminal-menu from theme */
.terminal-menu {
display: none !important;
}
/* Add padding to site name to prevent hamburger overlap */
.terminal-mkdocs-site-name,
.terminal-logo a,
.terminal-nav .logo {
padding-left: 40px !important;
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
/* Show mobile menu toggle button */
.mobile-menu-toggle {
display: block;
}
/* Show mobile menu close button */
.mobile-menu-close {
display: block;
}
#terminal-mkdocs-side-panel {
-left: calc(-1 * var(--sidebar-width));
+left: -100%; /* Hide completely off-screen */
z-index: 1100;
box-shadow: 2px 0 10px rgba(0,0,0,0.3);
top: 0; /* Start from top edge */
height: 100%; /* Full height */
transition: left 0.3s ease-in-out;
padding-top: 50px; /* Space for close button */
overflow-y: auto;
width: 85%; /* Wider on mobile */
max-width: 320px; /* Maximum width */
background-color: var(--background-color); /* Ensure solid background */
}
#terminal-mkdocs-side-panel.sidebar-visible {
left: 0;
}
/* Make navigation links more touch-friendly */
#terminal-mkdocs-side-panel a {
padding: 6px 15px;
display: block;
/* No border as requested */
}
#terminal-mkdocs-side-panel ul {
padding-left: 0;
}
#terminal-mkdocs-side-panel ul ul a {
padding-left: 10px;
}
.terminal-mkdocs-main-grid {
/* Grid now takes full width (minus body padding) */
-margin-left: 0; /* Override sidebar margin */
+margin-left: 0 !important; /* Override sidebar margin with !important */
margin-right: 0; /* Override auto margin */
max-width: 100%; /* Allow full width */
padding-left: var(--global-space); /* Reduce padding */
@@ -224,7 +336,6 @@ footer {
text-align: center;
gap: 0.5em;
}
-/* Remember JS for toggle button & overlay */
}

View File

@@ -0,0 +1,106 @@
// mobile_menu.js - Hamburger menu for mobile view
document.addEventListener('DOMContentLoaded', () => {
// Get references to key elements
const sidePanel = document.getElementById('terminal-mkdocs-side-panel');
const mainHeader = document.querySelector('.terminal .container:first-child');
if (!sidePanel || !mainHeader) {
console.warn('Mobile menu: Required elements not found');
return;
}
// Force hide sidebar on mobile
const checkMobile = () => {
if (window.innerWidth <= 768) {
// Force with !important-like priority
sidePanel.style.setProperty('left', '-100%', 'important');
// Also hide terminal-menu from the theme
const terminalMenu = document.querySelector('.terminal-menu');
if (terminalMenu) {
terminalMenu.style.setProperty('display', 'none', 'important');
}
} else {
sidePanel.style.removeProperty('left');
// Restore terminal-menu if it exists
const terminalMenu = document.querySelector('.terminal-menu');
if (terminalMenu) {
terminalMenu.style.removeProperty('display');
}
}
};
// Run on initial load
checkMobile();
// Also run on resize
window.addEventListener('resize', checkMobile);
// Create hamburger button
const hamburgerBtn = document.createElement('button');
hamburgerBtn.className = 'mobile-menu-toggle';
hamburgerBtn.setAttribute('aria-label', 'Toggle navigation menu');
hamburgerBtn.innerHTML = `
<span class="hamburger-line"></span>
<span class="hamburger-line"></span>
<span class="hamburger-line"></span>
`;
// Create backdrop overlay
const menuBackdrop = document.createElement('div');
menuBackdrop.className = 'mobile-menu-backdrop';
menuBackdrop.style.display = 'none';
document.body.appendChild(menuBackdrop);
// Make sure it's properly hidden on page load
if (window.innerWidth <= 768) {
menuBackdrop.style.display = 'none';
}
// Insert hamburger button into header
mainHeader.insertBefore(hamburgerBtn, mainHeader.firstChild);
// Add menu close button to side panel
const closeBtn = document.createElement('button');
closeBtn.className = 'mobile-menu-close';
closeBtn.setAttribute('aria-label', 'Close navigation menu');
closeBtn.innerHTML = `&times;`;
sidePanel.insertBefore(closeBtn, sidePanel.firstChild);
// Toggle function
function toggleMobileMenu() {
const isOpen = sidePanel.classList.toggle('sidebar-visible');
// Toggle backdrop
menuBackdrop.style.display = isOpen ? 'block' : 'none';
// Toggle aria-expanded
hamburgerBtn.setAttribute('aria-expanded', isOpen ? 'true' : 'false');
// Toggle hamburger animation class
hamburgerBtn.classList.toggle('is-active');
// Force sidebar visibility setting
if (isOpen) {
sidePanel.style.setProperty('left', '0', 'important');
} else {
sidePanel.style.setProperty('left', '-100%', 'important');
}
// Prevent body scrolling when menu is open
document.body.style.overflow = isOpen ? 'hidden' : '';
}
// Event listeners
hamburgerBtn.addEventListener('click', toggleMobileMenu);
closeBtn.addEventListener('click', toggleMobileMenu);
menuBackdrop.addEventListener('click', toggleMobileMenu);
// Close menu on window resize to desktop
window.addEventListener('resize', () => {
if (window.innerWidth > 768 && sidePanel.classList.contains('sidebar-visible')) {
toggleMobileMenu();
}
});
console.log('Mobile menu initialized');
});

View File

@@ -268,3 +268,6 @@ div.badges a > img {
}
table td, table th {
border: 1px solid var(--code-bg-color) !important;
}

View File

@@ -62,7 +62,7 @@ Our latest release candidate is `0.6.0rc1-r2`. Images are built with multi-arch
```bash
# Pull the release candidate (recommended for latest features)
-docker pull unclecode/crawl4ai:0.6.0rc1-r2
+docker pull unclecode/crawl4ai:0.6.0-r1
# Or pull the latest stable version
docker pull unclecode/crawl4ai:latest
@@ -99,7 +99,7 @@ EOL
-p 11235:11235 \
--name crawl4ai \
--shm-size=1g \
-unclecode/crawl4ai:0.6.0rc1-r2
+unclecode/crawl4ai:latest
```
* **With LLM support:**
@@ -110,7 +110,7 @@ EOL
--name crawl4ai \
--env-file .llm.env \
--shm-size=1g \
-unclecode/crawl4ai:0.6.0rc1-r2
+unclecode/crawl4ai:latest
```
> The server will be available at `http://localhost:11235`. Visit `/playground` to access the interactive testing interface.
@@ -160,7 +160,7 @@ The `docker-compose.yml` file in the project root provides a simplified approach
```bash
# Pulls and runs the release candidate from Docker Hub
# Automatically selects the correct architecture
-IMAGE=unclecode/crawl4ai:0.6.0rc1-r2 docker compose up -d
+IMAGE=unclecode/crawl4ai:latest docker compose up -d
```
* **Build and Run Locally:**

docs/md_v2/core/examples.md (new file, 115 lines)
View File

@@ -0,0 +1,115 @@
# Code Examples
This page provides a comprehensive list of example scripts that demonstrate various features and capabilities of Crawl4AI. Each example is designed to showcase specific functionality, making it easier for you to understand how to implement these features in your own projects.
## Getting Started Examples
| Example | Description | Link |
|---------|-------------|------|
| Hello World | A simple introductory example demonstrating basic usage of AsyncWebCrawler with JavaScript execution and content filtering. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/hello_world.py) |
| Quickstart | A comprehensive collection of examples showcasing various features including basic crawling, content cleaning, link analysis, JavaScript execution, CSS selectors, media handling, custom hooks, proxy configuration, screenshots, and multiple extraction strategies. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart.py) |
| Quickstart Set 1 | Basic examples for getting started with Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart_examples_set_1.py) |
| Quickstart Set 2 | More advanced examples for working with Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart_examples_set_2.py) |
## Browser & Crawling Features
| Example | Description | Link |
|---------|-------------|------|
| Built-in Browser | Demonstrates how to use the built-in browser capabilities. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/builtin_browser_example.py) |
| Browser Optimization | Focuses on browser performance optimization techniques. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/browser_optimization_example.py) |
| arun vs arun_many | Compares the `arun` and `arun_many` methods for single vs. multiple URL crawling. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/arun_vs_arun_many.py) |
| Multiple URLs | Shows how to crawl multiple URLs asynchronously. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/async_webcrawler_multiple_urls_example.py) |
| Page Interaction | Guide on interacting with dynamic elements through clicks. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/tutorial_dynamic_clicks.md) |
| Crawler Monitor | Shows how to monitor the crawler's activities and status. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/crawler_monitor_example.py) |
| Full Page Screenshot & PDF | Guide on capturing full-page screenshots and PDFs from massive webpages. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/full_page_screenshot_and_pdf_export.md) |
## Advanced Crawling & Deep Crawling
| Example | Description | Link |
|---------|-------------|------|
| Deep Crawling | An extensive tutorial on deep crawling capabilities, demonstrating BFS and BestFirst strategies, stream vs. non-stream execution, filters, scorers, and advanced configurations. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/deepcrawl_example.py) |
| Dispatcher | Shows how to use the crawl dispatcher for advanced workload management. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/dispatcher_example.py) |
| Storage State | Tutorial on managing browser storage state for persistence. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/storage_state_tutorial.md) |
| Network Console Capture | Demonstrates how to capture and analyze network requests and console logs. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/network_console_capture_example.py) |
## Extraction Strategies
| Example | Description | Link |
|---------|-------------|------|
| Extraction Strategies | Demonstrates different extraction strategies with various input formats (markdown, HTML, fit_markdown) and JSON-based extractors (CSS and XPath). | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/extraction_strategies_examples.py) |
| Scraping Strategies | Compares the performance of different scraping strategies. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/scraping_strategies_performance.py) |
| LLM Extraction | Demonstrates LLM-based extraction specifically for OpenAI pricing data. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/llm_extraction_openai_pricing.py) |
| LLM Markdown | Shows how to use LLMs to generate markdown from crawled content. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/llm_markdown_generator.py) |
| Summarize Page | Shows how to summarize web page content. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/summarize_page.py) |
## E-commerce & Specialized Crawling
| Example | Description | Link |
|---------|-------------|------|
| Amazon Product Extraction | Demonstrates how to extract structured product data from Amazon search results using CSS selectors. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_product_extraction_direct_url.py) |
| Amazon with Hooks | Shows how to use hooks with Amazon product extraction. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_product_extraction_using_hooks.py) |
| Amazon with JavaScript | Demonstrates using custom JavaScript for Amazon product extraction. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_product_extraction_using_use_javascript.py) |
| Crypto Analysis | Demonstrates how to crawl and analyze cryptocurrency data. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/crypto_analysis_example.py) |
| SERP API | Demonstrates using Crawl4AI with search engine result pages. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/serp_api_project_11_feb.py) |
## Customization & Security
| Example | Description | Link |
|---------|-------------|------|
| Hooks | Illustrates how to use hooks at different stages of the crawling process for advanced customization. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/hooks_example.py) |
| Identity-Based Browsing | Illustrates identity-based browsing configurations for authentic browsing experiences. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/identity_based_browsing.py) |
| Proxy Rotation | Shows how to use proxy rotation for web scraping and avoiding IP blocks. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/proxy_rotation_demo.py) |
| SSL Certificate | Illustrates SSL certificate handling and verification. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/ssl_example.py) |
| Language Support | Shows how to handle different languages during crawling. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/language_support_example.py) |
| Geolocation | Demonstrates how to use geolocation features. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/use_geo_location.py) |
## Docker & Deployment
| Example | Description | Link |
|---------|-------------|------|
| Docker Config | Demonstrates how to create and use Docker configuration objects. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_config_obj.py) |
| Docker Basic | A test suite for Docker deployment, showcasing various functionalities through the Docker API. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_example.py) |
| Docker REST API | Shows how to interact with Crawl4AI Docker using REST API calls. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_python_rest_api.py) |
| Docker SDK | Demonstrates using the Python SDK for Crawl4AI Docker. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_python_sdk.py) |
## Application Examples
| Example | Description | Link |
|---------|-------------|------|
| Research Assistant | Demonstrates how to build a research assistant using Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/research_assistant.py) |
| REST Call | Shows how to make REST API calls with Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/rest_call.py) |
| Chainlit Integration | Shows how to integrate Crawl4AI with Chainlit. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/chainlit.md) |
| Crawl4AI vs FireCrawl | Compares Crawl4AI with the FireCrawl library. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/crawlai_vs_firecrawl.py) |
## Content Generation & Markdown
| Example | Description | Link |
|---------|-------------|------|
| Content Source | Demonstrates how to work with different content sources in markdown generation. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/markdown/content_source_example.py) |
| Content Source (Short) | A simplified version of content source usage. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/markdown/content_source_short_example.py) |
| Built-in Browser Guide | Guide for using the built-in browser capabilities. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/README_BUILTIN_BROWSER.md) |
## Running the Examples
To run any of these examples, you'll need to have Crawl4AI installed:
```bash
pip install crawl4ai
```
Then, you can run an example script like this:
```bash
python -m docs.examples.hello_world
```
For examples that require additional dependencies or environment variables, refer to the comments at the top of each file.
Some examples may require:
- API keys (for LLM-based examples)
- Docker setup (for Docker-related examples)
- Additional dependencies (specified in the example files)
## Contributing New Examples
If you've created an interesting example that demonstrates a unique use case or feature of Crawl4AI, we encourage you to contribute it to our examples collection. Please see our [contribution guidelines](https://github.com/unclecode/crawl4ai/blob/main/CONTRIBUTORS.md) for more information.

View File

@@ -72,6 +72,14 @@ asyncio.run(main())
---
## Video Tutorial
<div align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/xo3qK6Hg9AA?start=15" title="Crawl4AI Tutorial" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
---
## What Does Crawl4AI Do?
Crawl4AI is a feature-rich crawler and scraper that aims to:

View File

@@ -1,4 +1,4 @@
-site_name: Crawl4AI Documentation (v0.5.x)
+site_name: Crawl4AI Documentation (v0.6.x)
site_description: 🚀🤖 Crawl4AI, Open-source LLM-Friendly Web Crawler & Scraper
site_url: https://docs.crawl4ai.com
repo_url: https://github.com/unclecode/crawl4ai
@@ -9,6 +9,7 @@ nav:
- Home: 'index.md'
- "Ask AI": "core/ask-ai.md"
- "Quick Start": "core/quickstart.md"
- "Code Examples": "core/examples.md"
- Setup & Installation:
- "Installation": "core/installation.md"
- "Docker Deployment": "core/docker-deployment.md"
@@ -90,4 +91,5 @@ extra_javascript:
- assets/github_stats.js
- assets/selection_ask_ai.js
- assets/copy_code.js
-- assets/floating_ask_ai_button.js
+- assets/floating_ask_ai_button.js
+- assets/mobile_menu.js

View File

@@ -8,7 +8,7 @@ dynamic = ["version"]
description = "🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & scraper"
readme = "README.md"
requires-python = ">=3.9"
-license = {text = "Apache-2.0"}
+license = "Apache-2.0"
authors = [
{name = "Unclecode", email = "unclecode@kidocode.com"}
]
@@ -48,7 +48,6 @@ dependencies = [
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
-"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",

View File

@@ -49,13 +49,12 @@ setup(
url="https://github.com/unclecode/crawl4ai",
author="Unclecode",
author_email="unclecode@kidocode.com",
-license="MIT",
+license="Apache-2.0",
packages=find_packages(),
package_data={"crawl4ai": ["js_snippet/*.js"]},
classifiers=[
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
-"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",