UncleCode 4eb90b41b6 Refactor Crawl4AI Assistant: Rename Schema Builder to Click2Crawl, update UI elements, and remove deprecated files
- Updated overlay.css to add gap in titlebar.
- Deleted schemaBuilder_v1.js and associated zip files (v1.0.0 to v1.2.0).
- Modified index.html to reflect new Click2Crawl feature and updated descriptions.
- Updated manifest.json to include new JavaScript files for Click2Crawl and markdown extraction.
- Refined popup styles and HTML to align with new feature names and functionalities.
- Enhanced user instructions and tooltips to guide users on the new Click2Crawl and Markdown Extraction features.
2025-06-10 15:40:26 +08:00

# Crawl4AI Chrome Extension
Visual extraction tools for Crawl4AI - Click to extract data and content from any webpage!
## 🚀 Features
- **Click2Crawl**: Click on elements to build data extraction schemas instantly
- **Markdown Extraction**: Select elements and export as clean markdown
- **Script Builder (Alpha)**: Record browser actions to create automation scripts
- **Smart Element Selection**: Container and field selection with visual feedback
- **Code Generation**: Generates complete Python code for Crawl4AI
- **Beautiful Dark UI**: Consistent with Crawl4AI's design language
## 📦 Installation
### Method 1: Load Unpacked Extension (Recommended for Development)
1. Open Chrome and navigate to `chrome://extensions/`
2. Enable "Developer mode" in the top right corner
3. Click "Load unpacked"
4. Select the `crawl4ai-assistant` folder
5. The extension icon (🚀🤖) will appear in your toolbar
### Method 2: Generate Icons First
If you want proper icons:
1. Open `icons/generate_icons.html` in your browser
2. Right-click each canvas and save as:
   - `icon-16.png`
   - `icon-48.png`
   - `icon-128.png`
3. Then follow Method 1 above
## 🎯 How to Use
### Using Click2Crawl
1. **Navigate to any website** you want to extract data from
2. **Click the Crawl4AI extension icon** in your toolbar
3. **Click "Click2Crawl"** to start the capture mode
4. **Select a container element**:
   - Hover over elements (they'll highlight in blue)
   - Click on a repeating container (e.g., product card, article block)
5. **Select fields within the container**:
   - Elements will now highlight in green
   - Click on each piece of data you want to extract
   - Name each field (e.g., "title", "price", "description")
6. **Test and Export**:
   - Click "Test Schema" to see extracted data instantly
   - Export as Python code, JSON schema, or markdown
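
The container and field selections above correspond to a schema with one base selector and a list of per-field selectors. A minimal sketch of that shape, assuming the `name` / `baseSelector` / `fields` layout used by Crawl4AI's CSS extraction schemas. The helper `build_schema` and every selector shown are hypothetical illustrations, not the extension's actual output:

```python
import json


def build_schema(name, base_selector, fields):
    """Assemble a schema dict from a container selector and named fields.

    `fields` maps a field name to a CSS selector relative to the container.
    This helper is hypothetical; it only illustrates the shape of the data.
    """
    return {
        "name": name,
        "baseSelector": base_selector,  # the repeating container (blue outline)
        "fields": [
            # each selected field (green outline) becomes one entry
            {"name": field_name, "selector": selector, "type": "text"}
            for field_name, selector in fields.items()
        ],
    }


schema = build_schema(
    "Products",
    "div.product-card",  # hypothetical container selector
    {"title": "h2", "price": ".price", "description": "p.summary"},
)
print(json.dumps(schema, indent=2))
```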
### Running the Generated Code
The downloaded Python file contains:
```python
# 1. The HTML snippet of your selected container
HTML_SNIPPET = """..."""

# 2. The extraction query based on your selections
EXTRACTION_QUERY = """..."""

# 3. Functions to generate and test the schema (bodies elided in this outline)
async def generate_schema():
    # Generates the extraction schema using LLM
    ...

async def test_extraction():
    # Tests the schema on the actual website
    ...
```
To use it:
1. Install Crawl4AI: `pip install crawl4ai`
2. Run the script: `python crawl4ai_schema_*.py`
3. The script will generate a `generated_schema.json` file
4. Use this schema in your Crawl4AI projects!
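
Step 4 can be sketched as follows. This assumes `generated_schema.json` sits in the working directory and holds a CSS-based schema with `baseSelector` and `fields` keys. `load_schema` and `extract` are hypothetical helper names, and the Crawl4AI imports are deferred so the loader can be used on its own:

```python
import json
from pathlib import Path


def load_schema(path="generated_schema.json"):
    """Load the schema file and check for the keys this sketch relies on."""
    schema = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in ("baseSelector", "fields") if key not in schema]
    if missing:
        raise ValueError(f"schema is missing keys: {missing}")
    return schema


async def extract(url, schema):
    """Crawl `url` and return the records extracted with `schema`."""
    # Imported here so load_schema() works even without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
    from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

    config = CrawlerRunConfig(extraction_strategy=JsonCssExtractionStrategy(schema))
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, config=config)
    return json.loads(result.extracted_content)
```

To try it, call `asyncio.run(extract(url, load_schema()))` with the URL of the page you built the schema on.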
## 🎨 Visual Feedback
- **Blue dashed outline**: Container selection mode
- **Green dashed outline**: Field selection mode
- **Solid blue outline**: Selected container
- **Solid green outline**: Selected fields
- **Floating toolbar**: Shows current mode and selection status
## ⌨️ Keyboard Shortcuts
- **ESC**: Cancel current capture session
## 🔧 Technical Details
- Built with Manifest V3 for security and performance
- Pure client-side: the extension itself sends no data to external servers
- Generates code that uses Crawl4AI's LLM integration
- Smart selector generation prioritizes stable attributes
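
As an illustration of what "prioritizes stable attributes" can mean, the sketch below picks a selector by trying attributes in a fixed priority order. The order, the attribute list, and the function `pick_selector` are assumptions made for this example, not the extension's actual algorithm, and values are not CSS-escaped:

```python
# Hypothetical priority order: attributes that rarely change come first.
STABLE_ATTRIBUTES = ("id", "data-testid", "name", "aria-label")


def pick_selector(tag, attributes):
    """Return a CSS selector for an element described by tag and attributes."""
    for attr in STABLE_ATTRIBUTES:
        value = attributes.get(attr)
        if value:
            if attr == "id":
                return f"#{value}"
            return f'{tag}[{attr}="{value}"]'
    # Fall back to class names, then to the bare tag.
    classes = attributes.get("class", "").split()
    if classes:
        return tag + "".join(f".{c}" for c in classes)
    return tag


print(pick_selector("button", {"class": "btn primary", "data-testid": "buy"}))
print(pick_selector("div", {"class": "card"}))
```

Here the `data-testid` attribute wins over the class names for the button, while the `div` falls back to its class because it has no attribute from the priority list.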
## 🐛 Troubleshooting
### Extension doesn't load
- Make sure you're in Developer Mode
- Check for errors on the `chrome://extensions/` page (the "Errors" button on the extension card) and in the browser console
- Ensure all files are in the correct directories
### Can't select elements
- Some websites may block extensions
- Try refreshing the page
- Make sure you clicked "Click2Crawl" first
### Generated code doesn't work
- Ensure you have Crawl4AI installed
- Check that you have an LLM API key configured
- Make sure the website structure hasn't changed
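
The first two checks can be scripted. A minimal sketch, assuming the key is supplied through an environment variable. The variable name `OPENAI_API_KEY` is only an example and depends on the LLM provider you configured, and `preflight` is a hypothetical helper:

```python
import importlib.util
import os


def preflight(package="crawl4ai", key_var="OPENAI_API_KEY", env=None):
    """Return a list of problems found; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []
    # find_spec returns None when a top-level package cannot be imported.
    if importlib.util.find_spec(package) is None:
        problems.append(f"{package} is not installed (try: pip install {package})")
    # An unset or empty variable both count as "not configured".
    if not env.get(key_var):
        problems.append(f"environment variable {key_var} is not set")
    return problems


for problem in preflight():
    print("Problem:", problem)
```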
## 🤝 Contributing
This extension is part of the Crawl4AI project. Contributions are welcome!
- Report issues: [GitHub Issues](https://github.com/unclecode/crawl4ai/issues)
- Join discussion: [Discord](https://discord.gg/crawl4ai)
## 📄 License
Same as Crawl4AI - see main project for details.