Compare commits: v0.6.0rc1...v0.6.0
4 commits
| Author | SHA1 | Date |
|---|---|---|
|  | 146f9d415f |  |
|  | 37fd80e4b9 |  |
|  | 949a93982e |  |
|  | c4f5651199 |  |
@@ -1,4 +1,4 @@
-FROM python:3.10-slim
+FROM python:3.12-slim-bookworm AS build

 # C4ai version
 ARG C4AI_VER=0.6.0
@@ -22,7 +22,7 @@ ENV PYTHONFAULTHANDLER=1 \
     REDIS_HOST=localhost \
     REDIS_PORT=6379

-ARG PYTHON_VERSION=3.10
+ARG PYTHON_VERSION=3.12
 ARG INSTALL_TYPE=default
 ARG ENABLE_GPU=false
 ARG TARGETARCH
@@ -71,6 +71,9 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*

+RUN apt-get update && apt-get dist-upgrade -y \
+    && rm -rf /var/lib/apt/lists/*
+
 RUN if [ "$ENABLE_GPU" = "true" ] && [ "$TARGETARCH" = "amd64" ] ; then \
     apt-get update && apt-get install -y --no-install-recommends \
         nvidia-cuda-toolkit \
README.md (48 changes)
@@ -269,8 +269,8 @@ The new Docker implementation includes:

 ```bash
 # Pull and run the latest release candidate
-docker pull unclecode/crawl4ai:0.6.0rc1-r1
-docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:0.6.0rc1-r1
+docker pull unclecode/crawl4ai:0.6.0-rN # Use your favorite revision number
+docker run -d -p 11235:11235 --name crawl4ai --shm-size=1g unclecode/crawl4ai:0.6.0-rN # Use your favorite revision number

 # Visit the playground at http://localhost:11235/playground
 ```
@@ -509,15 +509,47 @@ async def test_news_crawl():

 - **🌎 World-aware Crawling**: Set geolocation, language, and timezone for authentic locale-specific content:
   ```python
-  crawler_config = CrawlerRunConfig(
-      geo_locale={"city": "Tokyo", "lang": "ja", "timezone": "Asia/Tokyo"}
-  )
+  crun_cfg = CrawlerRunConfig(
+      url="https://browserleaks.com/geo",  # test page that shows your location
+      locale="en-US",  # Accept-Language & UI locale
+      timezone_id="America/Los_Angeles",  # JS Date()/Intl timezone
+      geolocation=GeolocationConfig(  # override GPS coords
+          latitude=34.0522,
+          longitude=-118.2437,
+          accuracy=10.0,
+      )
+  )
   ```

 - **📊 Table-to-DataFrame Extraction**: Extract HTML tables directly to CSV or pandas DataFrames:
   ```python
-  crawler_config = CrawlerRunConfig(extract_tables=True)
-  # Access tables via result.tables or result.tables_as_dataframe
+  crawler = AsyncWebCrawler(config=browser_config)
+  await crawler.start()
+
+  try:
+      # Set up scraping parameters
+      crawl_config = CrawlerRunConfig(
+          table_score_threshold=8,  # Strict table detection
+      )
+
+      # Execute market data extraction
+      results: List[CrawlResult] = await crawler.arun(
+          url="https://coinmarketcap.com/?page=1", config=crawl_config
+      )
+
+      # Process results
+      raw_df = pd.DataFrame()
+      for result in results:
+          if result.success and result.media["tables"]:
+              raw_df = pd.DataFrame(
+                  result.media["tables"][0]["rows"],
+                  columns=result.media["tables"][0]["headers"],
+              )
+              break
+      print(raw_df.head())
+
+  finally:
+      await crawler.stop()
   ```

 - **🚀 Browser Pooling**: Pages launch hot with pre-warmed browser instances for lower latency and memory usage
@@ -537,7 +569,7 @@ async def test_news_crawl():

 claude mcp add --transport sse c4ai-sse http://localhost:11235/mcp/sse
 ```

-- **🖥️ Interactive Playground**: Test configurations and generate API requests with the built-in web interface at `/playground`
+- **🖥️ Interactive Playground**: Test configurations and generate API requests with the built-in web interface at `http://localhost:11235/playground`

 - **🐳 Revamped Docker Deployment**: Streamlined multi-architecture Docker image with improved resource efficiency
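The Table-to-DataFrame hunk above reads extracted tables as `result.media["tables"][0]["rows"]` and `["headers"]`. For readers without pandas, here is a minimal stdlib-only sketch of dumping that structure to CSV text; the helper name and the sample payload are ours, invented for illustration, not part of crawl4ai:

```python
import csv
import io

def tables_to_csv(tables: list[dict]) -> str:
    """Render the first extracted table (headers + rows) as CSV text."""
    if not tables:
        return ""
    table = tables[0]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(table["headers"])  # header row first
    writer.writerows(table["rows"])    # then data rows
    return buf.getvalue()

# Hypothetical payload mirroring the shape of result.media["tables"]
sample_tables = [
    {"headers": ["Name", "Price"], "rows": [["BTC", "67000"], ["ETH", "3500"]]}
]
print(tables_to_csv(sample_tables))
```

The same dict-of-headers-and-rows shape feeds `pd.DataFrame(rows, columns=headers)` in the README snippet, so either path consumes the crawl result unchanged.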
@@ -1,3 +1,3 @@
 # crawl4ai/_version.py
-__version__ = "0.6.0rc1"
+__version__ = "0.6.0"
@@ -62,7 +62,7 @@ Our latest release candidate is `0.6.0rc1-r1`. Images are built with multi-arch

 ```bash
 # Pull the release candidate (recommended for latest features)
-docker pull unclecode/crawl4ai:0.6.0rc1-r1
+docker pull unclecode/crawl4ai:0.6.0-rN # Use your favorite revision number

 # Or pull the latest stable version
 docker pull unclecode/crawl4ai:latest
@@ -99,7 +99,7 @@ EOL
   -p 11235:11235 \
   --name crawl4ai \
   --shm-size=1g \
-  unclecode/crawl4ai:0.6.0rc1-r1
+  unclecode/crawl4ai:0.6.0-rN # Use your favorite revision number
 ```

 * **With LLM support:**
@@ -110,7 +110,7 @@ EOL
   --name crawl4ai \
   --env-file .llm.env \
   --shm-size=1g \
-  unclecode/crawl4ai:0.6.0rc1-r1
+  unclecode/crawl4ai:0.6.0-rN # Use your favorite revision number
 ```

 > The server will be available at `http://localhost:11235`. Visit `/playground` to access the interactive testing interface.
@@ -160,7 +160,7 @@ The `docker-compose.yml` file in the project root provides a simplified approach

 ```bash
 # Pulls and runs the release candidate from Docker Hub
 # Automatically selects the correct architecture
-IMAGE=unclecode/crawl4ai:0.6.0rc1-r1 docker compose up -d
+IMAGE=unclecode/crawl4ai:0.6.0-rN docker compose up -d # Use your favorite revision number
 ```

 * **Build and Run Locally:**
@@ -383,29 +383,29 @@ async def main():
         scroll_delay=0.2,
     )

-    # # Execute market data extraction
-    # results: List[CrawlResult] = await crawler.arun(
-    #     url="https://coinmarketcap.com/?page=1", config=crawl_config
-    # )
+    # Execute market data extraction
+    results: List[CrawlResult] = await crawler.arun(
+        url="https://coinmarketcap.com/?page=1", config=crawl_config
+    )

-    # # Process results
-    # raw_df = pd.DataFrame()
-    # for result in results:
-    #     if result.success and result.media["tables"]:
-    #         # Extract primary market table
-    #         # DataFrame
-    #         raw_df = pd.DataFrame(
-    #             result.media["tables"][0]["rows"],
-    #             columns=result.media["tables"][0]["headers"],
-    #         )
-    #         break
+    # Process results
+    raw_df = pd.DataFrame()
+    for result in results:
+        if result.success and result.media["tables"]:
+            # Extract primary market table DataFrame
+            raw_df = pd.DataFrame(
+                result.media["tables"][0]["rows"],
+                columns=result.media["tables"][0]["headers"],
+            )
+            break

     # This is for debugging only
     # ////// Remove this in production from here..
-    # Save raw data for debugging
-    # raw_df.to_csv(f"{__current_dir__}/tmp/raw_crypto_data.csv", index=False)
-    # print("🔍 Raw data saved to 'raw_crypto_data.csv'")
+    raw_df.to_csv(f"{__current_dir__}/tmp/raw_crypto_data.csv", index=False)
+    print("🔍 Raw data saved to 'raw_crypto_data.csv'")

     # Read from file for debugging
     raw_df = pd.read_csv(f"{__current_dir__}/tmp/raw_crypto_data.csv")
@@ -398,7 +398,6 @@ async def demo_param_js_execution(client: httpx.AsyncClient):
|
||||
elif results:
|
||||
console.print("[yellow]JS Execution Result not found in response.[/]")
|
||||
|
||||
|
||||
async def demo_param_screenshot(client: httpx.AsyncClient):
|
||||
payload = {
|
||||
"urls": [SIMPLE_URL],
|
||||
@@ -430,8 +429,6 @@ async def demo_param_ssl_fetch(client: httpx.AsyncClient):
|
||||
elif results:
|
||||
console.print("[yellow]SSL Certificate data not found in response.[/]")
|
||||
|
||||
|
||||
|
||||
async def demo_param_proxy(client: httpx.AsyncClient):
|
||||
proxy_params_list = load_proxies_from_env() # Get the list of parameter dicts
|
||||
if not proxy_params_list:
|
||||
|
||||
@@ -361,8 +361,10 @@ A code snippet: \`crawler.run()\`. Check the [quickstart](/core/quickstart).`;
         chatMessages.innerHTML = ""; // Start with clean slate for query
         if (!isFromQuery) {
             // Show welcome only if manually started
-            chatMessages.innerHTML =
-                '<div class="message ai-message welcome-message">Started a new chat! Ask me anything about Crawl4AI.</div>';
+            // chatMessages.innerHTML =
+            //     '<div class="message ai-message welcome-message">Started a new chat! Ask me anything about Crawl4AI.</div>';
+            chatMessages.innerHTML =
+                '<div class="message ai-message welcome-message">We will launch this feature very soon.</div>';
         }
         addCitations([]); // Clear citations
         updateCitationsDisplay(); // Clear UI
@@ -504,8 +506,10 @@ A code snippet: \`crawler.run()\`. Check the [quickstart](/core/quickstart).`;
             addMessageToChat(message, false);
         });
         if (messages.length === 0) {
-            chatMessages.innerHTML =
-                '<div class="message ai-message welcome-message">Chat history loaded. Ask a question!</div>';
+            // chatMessages.innerHTML =
+            //     '<div class="message ai-message welcome-message">Chat history loaded. Ask a question!</div>';
+            chatMessages.innerHTML =
+                '<div class="message ai-message welcome-message">We will launch this feature very soon.</div>';
         }
         // Scroll to bottom after loading messages
         scrollToBottom();
@@ -36,7 +36,7 @@
         <div id="chat-input-area">
             <!-- Loading indicator for general waiting (optional) -->
             <!-- <div class="loading-indicator" style="display: none;">Thinking...</div> -->
-            <textarea id="chat-input" placeholder="Ask about Crawl4AI..." rows="2"></textarea>
+            <textarea id="chat-input" placeholder="We will roll out this feature very soon." rows="2" disabled></textarea>
             <button id="send-button">Send</button>
         </div>
     </main>
@@ -64,7 +64,7 @@ body {
     /* Apply side padding within the centered block */
    padding-left: calc(var(--global-space) * 2);
    padding-right: calc(var(--global-space) * 2);
-    /* Add margin-left to clear the fixed sidebar */
+    /* Add margin-left to clear the fixed sidebar - ONLY ON DESKTOP */
    margin-left: var(--sidebar-width);
 }

@@ -81,7 +81,7 @@ body {
    z-index: 900;
    padding: 1em calc(var(--global-space) * 2);
    padding-bottom: 2em;
-    /* transition: left var(--layout-transition-speed) ease-in-out; */
+    transition: left var(--layout-transition-speed) ease-in-out;
 }

 /* --- 2. Main Content Area (Within Centered Grid) --- */
@@ -188,21 +188,133 @@ footer {
     }
 }

+/* --- Mobile Menu Styles --- */
+.mobile-menu-toggle {
+    display: none; /* Hidden by default, shown in mobile */
+    background: none;
+    border: none;
+    padding: 10px;
+    cursor: pointer;
+    z-index: 1200;
+    margin-right: 10px;
+    position: absolute;
+    left: 10px;
+    top: 50%;
+    transform: translateY(-50%);
+    /* Make sure it doesn't get moved */
+    min-width: 30px;
+    min-height: 30px;
+}
+
+.hamburger-line {
+    display: block;
+    width: 22px;
+    height: 2px;
+    margin: 5px 0;
+    background-color: var(--font-color);
+    transition: transform 0.3s, opacity 0.3s;
+}
+
+/* Hamburger animation */
+.mobile-menu-toggle.is-active .hamburger-line:nth-child(1) {
+    transform: translateY(7px) rotate(45deg);
+}
+
+.mobile-menu-toggle.is-active .hamburger-line:nth-child(2) {
+    opacity: 0;
+}
+
+.mobile-menu-toggle.is-active .hamburger-line:nth-child(3) {
+    transform: translateY(-7px) rotate(-45deg);
+}
+
+.mobile-menu-close {
+    display: none; /* Hidden by default, shown in mobile */
+    position: absolute;
+    top: 10px;
+    right: 10px;
+    background: none;
+    border: none;
+    color: var(--font-color);
+    font-size: 24px;
+    cursor: pointer;
+    z-index: 1200;
+    padding: 5px 10px;
+}
+
+.mobile-menu-backdrop {
+    position: fixed;
+    top: 0;
+    left: 0;
+    right: 0;
+    bottom: 0;
+    background-color: rgba(0, 0, 0, 0.7);
+    z-index: 1050;
+}
+
 /* --- Small screens: Hide left sidebar, full width content & footer --- */
 @media screen and (max-width: 768px) {
+    /* Hide the terminal-menu from theme */
+    .terminal-menu {
+        display: none !important;
+    }
+
+    /* Add padding to site name to prevent hamburger overlap */
+    .terminal-mkdocs-site-name,
+    .terminal-logo a,
+    .terminal-nav .logo {
+        padding-left: 40px !important;
+        white-space: nowrap;
+        overflow: hidden;
+        text-overflow: ellipsis;
+    }
+
+    /* Show mobile menu toggle button */
+    .mobile-menu-toggle {
+        display: block;
+    }
+
+    /* Show mobile menu close button */
+    .mobile-menu-close {
+        display: block;
+    }
+
     #terminal-mkdocs-side-panel {
-        left: calc(-1 * var(--sidebar-width));
+        left: -100%; /* Hide completely off-screen */
         z-index: 1100;
         box-shadow: 2px 0 10px rgba(0,0,0,0.3);
+        top: 0; /* Start from top edge */
+        height: 100%; /* Full height */
+        transition: left 0.3s ease-in-out;
+        padding-top: 50px; /* Space for close button */
+        overflow-y: auto;
+        width: 85%; /* Wider on mobile */
+        max-width: 320px; /* Maximum width */
+        background-color: var(--background-color); /* Ensure solid background */
     }
+
+    #terminal-mkdocs-side-panel.sidebar-visible {
+        left: 0;
+    }
+
+    /* Make navigation links more touch-friendly */
+    #terminal-mkdocs-side-panel a {
+        padding: 6px 15px;
+        display: block;
+        /* No border as requested */
+    }
+
+    #terminal-mkdocs-side-panel ul {
+        padding-left: 0;
+    }
+
+    #terminal-mkdocs-side-panel ul ul a {
+        padding-left: 10px;
+    }
+
     .terminal-mkdocs-main-grid {
         /* Grid now takes full width (minus body padding) */
-        margin-left: 0; /* Override sidebar margin */
+        margin-left: 0 !important; /* Override sidebar margin with !important */
         margin-right: 0; /* Override auto margin */
         max-width: 100%; /* Allow full width */
         padding-left: var(--global-space); /* Reduce padding */
@@ -224,7 +336,6 @@ footer {
         text-align: center;
         gap: 0.5em;
     }
-    /* Remember JS for toggle button & overlay */
 }
docs/md_v2/assets/mobile_menu.js (new file, 106 lines)

@@ -0,0 +1,106 @@
// mobile_menu.js - Hamburger menu for mobile view
document.addEventListener('DOMContentLoaded', () => {
    // Get references to key elements
    const sidePanel = document.getElementById('terminal-mkdocs-side-panel');
    const mainHeader = document.querySelector('.terminal .container:first-child');

    if (!sidePanel || !mainHeader) {
        console.warn('Mobile menu: Required elements not found');
        return;
    }

    // Force hide sidebar on mobile
    const checkMobile = () => {
        if (window.innerWidth <= 768) {
            // Force with !important-like priority
            sidePanel.style.setProperty('left', '-100%', 'important');
            // Also hide terminal-menu from the theme
            const terminalMenu = document.querySelector('.terminal-menu');
            if (terminalMenu) {
                terminalMenu.style.setProperty('display', 'none', 'important');
            }
        } else {
            sidePanel.style.removeProperty('left');
            // Restore terminal-menu if it exists
            const terminalMenu = document.querySelector('.terminal-menu');
            if (terminalMenu) {
                terminalMenu.style.removeProperty('display');
            }
        }
    };

    // Run on initial load
    checkMobile();

    // Also run on resize
    window.addEventListener('resize', checkMobile);

    // Create hamburger button
    const hamburgerBtn = document.createElement('button');
    hamburgerBtn.className = 'mobile-menu-toggle';
    hamburgerBtn.setAttribute('aria-label', 'Toggle navigation menu');
    hamburgerBtn.innerHTML = `
        <span class="hamburger-line"></span>
        <span class="hamburger-line"></span>
        <span class="hamburger-line"></span>
    `;

    // Create backdrop overlay
    const menuBackdrop = document.createElement('div');
    menuBackdrop.className = 'mobile-menu-backdrop';
    menuBackdrop.style.display = 'none';
    document.body.appendChild(menuBackdrop);

    // Make sure it's properly hidden on page load
    if (window.innerWidth <= 768) {
        menuBackdrop.style.display = 'none';
    }

    // Insert hamburger button into header
    mainHeader.insertBefore(hamburgerBtn, mainHeader.firstChild);

    // Add menu close button to side panel
    const closeBtn = document.createElement('button');
    closeBtn.className = 'mobile-menu-close';
    closeBtn.setAttribute('aria-label', 'Close navigation menu');
    closeBtn.innerHTML = `×`;
    sidePanel.insertBefore(closeBtn, sidePanel.firstChild);

    // Toggle function
    function toggleMobileMenu() {
        const isOpen = sidePanel.classList.toggle('sidebar-visible');

        // Toggle backdrop
        menuBackdrop.style.display = isOpen ? 'block' : 'none';

        // Toggle aria-expanded
        hamburgerBtn.setAttribute('aria-expanded', isOpen ? 'true' : 'false');

        // Toggle hamburger animation class
        hamburgerBtn.classList.toggle('is-active');

        // Force sidebar visibility setting
        if (isOpen) {
            sidePanel.style.setProperty('left', '0', 'important');
        } else {
            sidePanel.style.setProperty('left', '-100%', 'important');
        }

        // Prevent body scrolling when menu is open
        document.body.style.overflow = isOpen ? 'hidden' : '';
    }

    // Event listeners
    hamburgerBtn.addEventListener('click', toggleMobileMenu);
    closeBtn.addEventListener('click', toggleMobileMenu);
    menuBackdrop.addEventListener('click', toggleMobileMenu);

    // Close menu on window resize to desktop
    window.addEventListener('resize', () => {
        if (window.innerWidth > 768 && sidePanel.classList.contains('sidebar-visible')) {
            toggleMobileMenu();
        }
    });

    console.log('Mobile menu initialized');
});
@@ -268,3 +268,6 @@ div.badges a > img {
 }

+table td, table th {
+    border: 1px solid var(--code-bg-color) !important;
+}
@@ -62,7 +62,7 @@ Our latest release candidate is `0.6.0rc1-r2`. Images are built with multi-arch

 ```bash
 # Pull the release candidate (recommended for latest features)
-docker pull unclecode/crawl4ai:0.6.0rc1-r2
+docker pull unclecode/crawl4ai:0.6.0-r1

 # Or pull the latest stable version
 docker pull unclecode/crawl4ai:latest
@@ -99,7 +99,7 @@ EOL
   -p 11235:11235 \
   --name crawl4ai \
   --shm-size=1g \
-  unclecode/crawl4ai:0.6.0rc1-r2
+  unclecode/crawl4ai:latest
 ```

 * **With LLM support:**
@@ -110,7 +110,7 @@ EOL
   --name crawl4ai \
   --env-file .llm.env \
   --shm-size=1g \
-  unclecode/crawl4ai:0.6.0rc1-r2
+  unclecode/crawl4ai:latest
 ```

 > The server will be available at `http://localhost:11235`. Visit `/playground` to access the interactive testing interface.
@@ -160,7 +160,7 @@ The `docker-compose.yml` file in the project root provides a simplified approach

 ```bash
 # Pulls and runs the release candidate from Docker Hub
 # Automatically selects the correct architecture
-IMAGE=unclecode/crawl4ai:0.6.0rc1-r2 docker compose up -d
+IMAGE=unclecode/crawl4ai:latest docker compose up -d
 ```

 * **Build and Run Locally:**
docs/md_v2/core/examples.md (new file, 115 lines)

@@ -0,0 +1,115 @@
# Code Examples

This page provides a comprehensive list of example scripts that demonstrate various features and capabilities of Crawl4AI. Each example is designed to showcase specific functionality, making it easier for you to understand how to implement these features in your own projects.

## Getting Started Examples

| Example | Description | Link |
|---------|-------------|------|
| Hello World | A simple introductory example demonstrating basic usage of AsyncWebCrawler with JavaScript execution and content filtering. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/hello_world.py) |
| Quickstart | A comprehensive collection of examples showcasing various features including basic crawling, content cleaning, link analysis, JavaScript execution, CSS selectors, media handling, custom hooks, proxy configuration, screenshots, and multiple extraction strategies. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart.py) |
| Quickstart Set 1 | Basic examples for getting started with Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart_examples_set_1.py) |
| Quickstart Set 2 | More advanced examples for working with Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart_examples_set_2.py) |

## Browser & Crawling Features

| Example | Description | Link |
|---------|-------------|------|
| Built-in Browser | Demonstrates how to use the built-in browser capabilities. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/builtin_browser_example.py) |
| Browser Optimization | Focuses on browser performance optimization techniques. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/browser_optimization_example.py) |
| arun vs arun_many | Compares the `arun` and `arun_many` methods for single vs. multiple URL crawling. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/arun_vs_arun_many.py) |
| Multiple URLs | Shows how to crawl multiple URLs asynchronously. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/async_webcrawler_multiple_urls_example.py) |
| Page Interaction | Guide on interacting with dynamic elements through clicks. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/tutorial_dynamic_clicks.md) |
| Crawler Monitor | Shows how to monitor the crawler's activities and status. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/crawler_monitor_example.py) |
| Full Page Screenshot & PDF | Guide on capturing full-page screenshots and PDFs from massive webpages. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/full_page_screenshot_and_pdf_export.md) |

## Advanced Crawling & Deep Crawling

| Example | Description | Link |
|---------|-------------|------|
| Deep Crawling | An extensive tutorial on deep crawling capabilities, demonstrating BFS and BestFirst strategies, stream vs. non-stream execution, filters, scorers, and advanced configurations. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/deepcrawl_example.py) |
| Dispatcher | Shows how to use the crawl dispatcher for advanced workload management. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/dispatcher_example.py) |
| Storage State | Tutorial on managing browser storage state for persistence. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/storage_state_tutorial.md) |
| Network Console Capture | Demonstrates how to capture and analyze network requests and console logs. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/network_console_capture_example.py) |

## Extraction Strategies

| Example | Description | Link |
|---------|-------------|------|
| Extraction Strategies | Demonstrates different extraction strategies with various input formats (markdown, HTML, fit_markdown) and JSON-based extractors (CSS and XPath). | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/extraction_strategies_examples.py) |
| Scraping Strategies | Compares the performance of different scraping strategies. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/scraping_strategies_performance.py) |
| LLM Extraction | Demonstrates LLM-based extraction specifically for OpenAI pricing data. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/llm_extraction_openai_pricing.py) |
| LLM Markdown | Shows how to use LLMs to generate markdown from crawled content. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/llm_markdown_generator.py) |
| Summarize Page | Shows how to summarize web page content. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/summarize_page.py) |

## E-commerce & Specialized Crawling

| Example | Description | Link |
|---------|-------------|------|
| Amazon Product Extraction | Demonstrates how to extract structured product data from Amazon search results using CSS selectors. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_product_extraction_direct_url.py) |
| Amazon with Hooks | Shows how to use hooks with Amazon product extraction. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_product_extraction_using_hooks.py) |
| Amazon with JavaScript | Demonstrates using custom JavaScript for Amazon product extraction. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_product_extraction_using_use_javascript.py) |
| Crypto Analysis | Demonstrates how to crawl and analyze cryptocurrency data. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/crypto_analysis_example.py) |
| SERP API | Demonstrates using Crawl4AI with search engine result pages. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/serp_api_project_11_feb.py) |

## Customization & Security

| Example | Description | Link |
|---------|-------------|------|
| Hooks | Illustrates how to use hooks at different stages of the crawling process for advanced customization. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/hooks_example.py) |
| Identity-Based Browsing | Illustrates identity-based browsing configurations for authentic browsing experiences. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/identity_based_browsing.py) |
| Proxy Rotation | Shows how to use proxy rotation for web scraping and avoiding IP blocks. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/proxy_rotation_demo.py) |
| SSL Certificate | Illustrates SSL certificate handling and verification. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/ssl_example.py) |
| Language Support | Shows how to handle different languages during crawling. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/language_support_example.py) |
| Geolocation | Demonstrates how to use geolocation features. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/use_geo_location.py) |

## Docker & Deployment

| Example | Description | Link |
|---------|-------------|------|
| Docker Config | Demonstrates how to create and use Docker configuration objects. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_config_obj.py) |
| Docker Basic | A test suite for Docker deployment, showcasing various functionalities through the Docker API. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_example.py) |
| Docker REST API | Shows how to interact with Crawl4AI Docker using REST API calls. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_python_rest_api.py) |
| Docker SDK | Demonstrates using the Python SDK for Crawl4AI Docker. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_python_sdk.py) |

## Application Examples

| Example | Description | Link |
|---------|-------------|------|
| Research Assistant | Demonstrates how to build a research assistant using Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/research_assistant.py) |
| REST Call | Shows how to make REST API calls with Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/rest_call.py) |
| Chainlit Integration | Shows how to integrate Crawl4AI with Chainlit. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/chainlit.md) |
| Crawl4AI vs FireCrawl | Compares Crawl4AI with the FireCrawl library. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/crawlai_vs_firecrawl.py) |

## Content Generation & Markdown

| Example | Description | Link |
|---------|-------------|------|
| Content Source | Demonstrates how to work with different content sources in markdown generation. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/markdown/content_source_example.py) |
| Content Source (Short) | A simplified version of content source usage. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/markdown/content_source_short_example.py) |
| Built-in Browser Guide | Guide for using the built-in browser capabilities. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/README_BUILTIN_BROWSER.md) |

## Running the Examples

To run any of these examples, you'll need to have Crawl4AI installed:

```bash
pip install crawl4ai
```

Then, you can run an example script like this:

```bash
python -m docs.examples.hello_world
```

For examples that require additional dependencies or environment variables, refer to the comments at the top of each file.

Some examples may require:
- API keys (for LLM-based examples)
- Docker setup (for Docker-related examples)
- Additional dependencies (specified in the example files)

## Contributing New Examples

If you've created an interesting example that demonstrates a unique use case or feature of Crawl4AI, we encourage you to contribute it to our examples collection. Please see our [contribution guidelines](https://github.com/unclecode/crawl4ai/blob/main/CONTRIBUTORS.md) for more information.
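The examples page above invokes scripts as modules (`python -m docs.examples.hello_world`) rather than by file path. A minimal sketch (the helper name is ours, for illustration only) of deriving that dotted module path from a repository-relative file path:

```python
from pathlib import PurePosixPath

def script_to_module(path: str) -> str:
    """Convert a repo-relative .py path to a dotted module path for python -m."""
    p = PurePosixPath(path)
    if p.suffix != ".py":
        raise ValueError(f"not a Python file: {path}")
    # Drop the .py suffix, then join the remaining path parts with dots
    return ".".join(p.with_suffix("").parts)

print(script_to_module("docs/examples/hello_world.py"))  # → docs.examples.hello_world
```

Running via `python -m` from the repository root keeps relative imports inside the examples working, which a direct `python docs/examples/hello_world.py` invocation may not.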
@@ -72,6 +72,14 @@ asyncio.run(main())

 ---

+## Video Tutorial
+
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/xo3qK6Hg9AA?start=15" title="Crawl4AI Tutorial" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
+
+---
+
 ## What Does Crawl4AI Do?

 Crawl4AI is a feature-rich crawler and scraper that aims to:
@@ -1,4 +1,4 @@
-site_name: Crawl4AI Documentation (v0.5.x)
+site_name: Crawl4AI Documentation (v0.6.x)
 site_description: 🚀🤖 Crawl4AI, Open-source LLM-Friendly Web Crawler & Scraper
 site_url: https://docs.crawl4ai.com
 repo_url: https://github.com/unclecode/crawl4ai
@@ -9,6 +9,7 @@ nav:
   - Home: 'index.md'
   - "Ask AI": "core/ask-ai.md"
   - "Quick Start": "core/quickstart.md"
+  - "Code Examples": "core/examples.md"
   - Setup & Installation:
     - "Installation": "core/installation.md"
     - "Docker Deployment": "core/docker-deployment.md"
@@ -90,4 +91,5 @@ extra_javascript:
   - assets/github_stats.js
   - assets/selection_ask_ai.js
   - assets/copy_code.js
   - assets/floating_ask_ai_button.js
+  - assets/mobile_menu.js
@@ -8,7 +8,7 @@ dynamic = ["version"]
 description = "🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & scraper"
 readme = "README.md"
 requires-python = ">=3.9"
-license = {text = "Apache-2.0"}
+license = "Apache-2.0"
 authors = [
     {name = "Unclecode", email = "unclecode@kidocode.com"}
 ]
@@ -48,7 +48,6 @@ dependencies = [
 classifiers = [
     "Development Status :: 4 - Beta",
     "Intended Audience :: Developers",
-    "License :: OSI Approved :: Apache Software License",
     "Programming Language :: Python :: 3",
     "Programming Language :: Python :: 3.9",
     "Programming Language :: Python :: 3.10",
setup.py (3 changes)
@@ -49,13 +49,12 @@ setup(
     url="https://github.com/unclecode/crawl4ai",
     author="Unclecode",
     author_email="unclecode@kidocode.com",
-    license="MIT",
+    license="Apache-2.0",
     packages=find_packages(),
     package_data={"crawl4ai": ["js_snippet/*.js"]},
     classifiers=[
         "Development Status :: 3 - Alpha",
         "Intended Audience :: Developers",
-        "License :: OSI Approved :: Apache Software License",
         "Programming Language :: Python :: 3",
         "Programming Language :: Python :: 3.9",
         "Programming Language :: Python :: 3.10",