- Updated overlay.css to add gap in titlebar. - Deleted schemaBuilder_v1.js and associated zip files (v1.0.0 to v1.2.0). - Modified index.html to reflect new Click2Crawl feature and updated descriptions. - Updated manifest.json to include new JavaScript files for Click2Crawl and markdown extraction. - Refined popup styles and HTML to align with new feature names and functionalities. - Enhanced user instructions and tooltips to guide users on the new Click2Crawl and Markdown Extraction features.
124 lines
3.8 KiB
Markdown
124 lines
3.8 KiB
Markdown
# Crawl4AI Chrome Extension
|
|
|
|
Visual extraction tools for Crawl4AI - Click to extract data and content from any webpage!
|
|
|
|
## 🚀 Features
|
|
|
|
- **Click2Crawl**: Click on elements to build data extraction schemas instantly
|
|
- **Markdown Extraction**: Select elements and export as clean markdown
|
|
- **Script Builder (Alpha)**: Record browser actions to create automation scripts
|
|
- **Smart Element Selection**: Container and field selection with visual feedback
|
|
- **Code Generation**: Generates complete Python code for Crawl4AI
|
|
- **Beautiful Dark UI**: Consistent with Crawl4AI's design language
|
|
|
|
## 📦 Installation
|
|
|
|
### Method 1: Load Unpacked Extension (Recommended for Development)
|
|
|
|
1. Open Chrome and navigate to `chrome://extensions/`
|
|
2. Enable "Developer mode" in the top right corner
|
|
3. Click "Load unpacked"
|
|
4. Select the `crawl4ai-assistant` folder
|
|
5. The extension icon (🚀🤖) will appear in your toolbar
|
|
|
|
### Method 2: Generate Icons First
|
|
|
|
If you want proper icons:
|
|
|
|
1. Open `icons/generate_icons.html` in your browser
|
|
2. Right-click each canvas and save as:
|
|
- `icon-16.png`
|
|
- `icon-48.png`
|
|
- `icon-128.png`
|
|
3. Then follow Method 1 above
|
|
|
|
## 🎯 How to Use
|
|
|
|
### Using Click2Crawl
|
|
|
|
1. **Navigate to any website** you want to extract data from
|
|
2. **Click the Crawl4AI extension icon** in your toolbar
|
|
3. **Click "Click2Crawl"** to start the capture mode
|
|
4. **Select a container element**:
|
|
- Hover over elements (they'll highlight in blue)
|
|
- Click on a repeating container (e.g., product card, article block)
|
|
5. **Select fields within the container**:
|
|
- Elements will now highlight in green
|
|
- Click on each piece of data you want to extract
|
|
- Name each field (e.g., "title", "price", "description")
|
|
6. **Test and Export**:
|
|
- Click "Test Schema" to see extracted data instantly
|
|
- Export as Python code, JSON schema, or markdown
|
|
|
|
### Running the Generated Code
|
|
|
|
The downloaded Python file contains:
|
|
|
|
```python
|
|
# 1. The HTML snippet of your selected container
|
|
HTML_SNIPPET = """..."""
|
|
|
|
# 2. The extraction query based on your selections
|
|
EXTRACTION_QUERY = """..."""
|
|
|
|
# 3. Functions to generate and test the schema
|
|
async def generate_schema():
|
|
# Generates the extraction schema using LLM
|
|
|
|
async def test_extraction():
|
|
# Tests the schema on the actual website
|
|
```
|
|
|
|
To use it:
|
|
|
|
1. Install Crawl4AI: `pip install crawl4ai`
|
|
2. Run the script: `python crawl4ai_schema_*.py`
|
|
3. The script will generate a `generated_schema.json` file
|
|
4. Use this schema in your Crawl4AI projects!
|
|
|
|
## 🎨 Visual Feedback
|
|
|
|
- **Blue dashed outline**: Container selection mode
|
|
- **Green dashed outline**: Field selection mode
|
|
- **Solid blue outline**: Selected container
|
|
- **Solid green outline**: Selected fields
|
|
- **Floating toolbar**: Shows current mode and selection status
|
|
|
|
## ⌨️ Keyboard Shortcuts
|
|
|
|
- **ESC**: Cancel current capture session
|
|
|
|
## 🔧 Technical Details
|
|
|
|
- Built with Manifest V3 for security and performance
|
|
- Pure client-side - no data sent to external servers
|
|
- Generates code that uses Crawl4AI's LLM integration
|
|
- Smart selector generation prioritizes stable attributes
|
|
|
|
## 🐛 Troubleshooting
|
|
|
|
### Extension doesn't load
|
|
- Make sure you're in Developer Mode
|
|
- Check the console for any errors
|
|
- Ensure all files are in the correct directories
|
|
|
|
### Can't select elements
|
|
- Some websites may block extensions
|
|
- Try refreshing the page
|
|
- Make sure you clicked "Schema Builder" first
|
|
|
|
### Generated code doesn't work
|
|
- Ensure you have Crawl4AI installed
|
|
- Check that you have an LLM API key configured
|
|
- Make sure the website structure hasn't changed
|
|
|
|
## 🤝 Contributing
|
|
|
|
This extension is part of the Crawl4AI project. Contributions are welcome!
|
|
|
|
- Report issues: [GitHub Issues](https://github.com/unclecode/crawl4ai/issues)
|
|
- Join discussion: [Discord](https://discord.gg/crawl4ai)
|
|
|
|
## 📄 License
|
|
|
|
Same as Crawl4AI - see main project for details. |