Add C4A-Script support and documentation
- Generate OneShot js code geenrator - Introduced a new C4A-Script tutorial example for login flow using Blockly. - Updated index.html to include Blockly theme and event editor modal for script editing. - Created a test HTML file for testing Blockly integration. - Added comprehensive C4A-Script API reference documentation covering commands, syntax, and examples. - Developed core documentation for C4A-Script, detailing its features, commands, and real-world examples. - Updated mkdocs.yml to include new C4A-Script documentation in navigation.
This commit is contained in:
171
docs/examples/c4a_script/amazon_example/README.md
Normal file
171
docs/examples/c4a_script/amazon_example/README.md
Normal file
@@ -0,0 +1,171 @@
|
||||
# Amazon R2D2 Product Search Example
|
||||
|
||||
A real-world demonstration of Crawl4AI's multi-step crawling with LLM-generated automation scripts.
|
||||
|
||||
## 🎯 What This Example Shows
|
||||
|
||||
This example demonstrates advanced Crawl4AI features:
|
||||
- **LLM-Generated Scripts**: Automatically create C4A-Script from HTML snippets
|
||||
- **Multi-Step Crawling**: Navigate through multiple pages using session persistence
|
||||
- **Structured Data Extraction**: Extract product data using JSON CSS schemas
|
||||
- **Visual Automation**: Watch the browser perform the search (headless=False)
|
||||
|
||||
## 🚀 How It Works
|
||||
|
||||
### 1. **Script Generation Phase**
|
||||
The example uses `C4ACompiler.generate_script()` to analyze Amazon's HTML and create:
|
||||
- **Search Script**: Automates filling the search box and clicking search
|
||||
- **Extraction Schema**: Defines how to extract product information
|
||||
|
||||
### 2. **Crawling Workflow**
|
||||
```
|
||||
Homepage → Execute Search Script → Extract Products → Save Results
|
||||
```
|
||||
|
||||
All steps use the same `session_id` to maintain browser state.
|
||||
|
||||
### 3. **Data Extraction**
|
||||
Products are extracted with:
|
||||
- Title, price, rating, reviews
|
||||
- Delivery information
|
||||
- Sponsored/Small Business badges
|
||||
- Direct product URLs
|
||||
|
||||
## 📁 Files
|
||||
|
||||
- `amazon_r2d2_search.py` - Main example script
|
||||
- `header.html` - Amazon search bar HTML (provided)
|
||||
- `product.html` - Product card HTML (provided)
|
||||
- **Generated files:**
|
||||
- `generated_search_script.c4a` - Auto-generated search automation
|
||||
- `generated_product_schema.json` - Auto-generated extraction rules
|
||||
- `extracted_products.json` - Final scraped data
|
||||
- `search_results_screenshot.png` - Visual proof of results
|
||||
|
||||
## 🏃 Running the Example
|
||||
|
||||
1. **Prerequisites**
|
||||
```bash
|
||||
# Ensure Crawl4AI is installed
|
||||
pip install crawl4ai
|
||||
|
||||
# Set up LLM API key (for script generation)
|
||||
export OPENAI_API_KEY="your-key-here"
|
||||
```
|
||||
|
||||
2. **Run the scraper**
|
||||
```bash
|
||||
python amazon_r2d2_search.py
|
||||
```
|
||||
|
||||
3. **Watch the magic!**
|
||||
- Browser window opens (not headless)
|
||||
- Navigates to Amazon.com
|
||||
- Searches for "r2d2"
|
||||
- Extracts all products
|
||||
- Saves results to JSON
|
||||
|
||||
## 📊 Sample Output
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"title": "Death Star BB8 R2D2 Golf Balls with 20 Printed tees",
|
||||
"price": "29.95",
|
||||
"rating": "4.7",
|
||||
"reviews_count": "184",
|
||||
"delivery": "FREE delivery Thu, Jun 19",
|
||||
"url": "https://www.amazon.com/Death-Star-R2D2-Balls-Printed/dp/B081XSYZMS",
|
||||
"is_sponsored": true,
|
||||
"small_business": true
|
||||
},
|
||||
...
|
||||
]
|
||||
```
|
||||
|
||||
## 🔍 Key Features Demonstrated
|
||||
|
||||
### Session Persistence
|
||||
```python
|
||||
# Same session_id across multiple arun() calls
|
||||
config = CrawlerRunConfig(
|
||||
session_id="amazon_r2d2_session",
|
||||
# ... other settings
|
||||
)
|
||||
```
|
||||
|
||||
### LLM Script Generation
|
||||
```python
|
||||
# Generate automation from natural language + HTML
|
||||
script = C4ACompiler.generate_script(
|
||||
html=header_html,
|
||||
query="Find search box, type 'r2d2', click search",
|
||||
mode="c4a"
|
||||
)
|
||||
```
|
||||
|
||||
### JSON CSS Extraction
|
||||
```python
|
||||
# Structured data extraction with CSS selectors
|
||||
schema = {
|
||||
"baseSelector": "[data-component-type='s-search-result']",
|
||||
"fields": [
|
||||
{"name": "title", "selector": "h2 a span", "type": "text"},
|
||||
{"name": "price", "selector": ".a-price-whole", "type": "text"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## 🛠️ Customization
|
||||
|
||||
### Search Different Products
|
||||
Change the search term in the script generation:
|
||||
```python
|
||||
search_goal = """
|
||||
...
|
||||
3. Type "star wars lego" into the search box
|
||||
...
|
||||
"""
|
||||
```
|
||||
|
||||
### Extract More Data
|
||||
Add fields to the extraction schema:
|
||||
```python
|
||||
"fields": [
|
||||
# ... existing fields
|
||||
{"name": "prime", "selector": ".s-prime", "type": "exists"},
|
||||
{"name": "image_url", "selector": "img.s-image", "type": "attribute", "attribute": "src"}
|
||||
]
|
||||
```
|
||||
|
||||
### Use Different Sites
|
||||
Adapt the approach for other e-commerce sites by:
|
||||
1. Providing their HTML snippets
|
||||
2. Adjusting the search goals
|
||||
3. Updating the extraction schema
|
||||
|
||||
## 🎓 Learning Points
|
||||
|
||||
1. **No Manual Scripting**: LLM generates all automation code
|
||||
2. **Session Management**: Maintain state across page navigations
|
||||
3. **Robust Extraction**: Handle dynamic content and multiple products
|
||||
4. **Error Handling**: Graceful fallbacks if generation fails
|
||||
|
||||
## 🐛 Troubleshooting
|
||||
|
||||
- **"No products found"**: Check if Amazon's HTML structure changed
|
||||
- **"Script generation failed"**: Ensure LLM API key is configured
|
||||
- **"Page timeout"**: Increase wait times in the config
|
||||
- **"Session lost"**: Ensure same session_id is used consistently
|
||||
|
||||
## 📚 Next Steps
|
||||
|
||||
- Try searching for different products
|
||||
- Add pagination to get more results
|
||||
- Extract product details pages
|
||||
- Compare prices across different sellers
|
||||
- Build a price monitoring system
|
||||
|
||||
---
|
||||
|
||||
This example shows the power of combining LLM intelligence with web automation. The scripts adapt to HTML changes and natural language instructions make automation accessible to everyone!
|
||||
Reference in New Issue
Block a user