This commit adds a complete web scraping API example that shows how to extract structured data from any website and consume it like an API, using the crawl4ai library together with a minimalist frontend interface.

Core Functionality
- AI-powered web scraping with plain-English queries
- Two scraping approaches: schema-based (faster) and LLM-based (more flexible)
- Schema caching for improved performance
- Custom LLM model support with API key management
- Automatic duplicate request prevention

Modern Frontend Interface
- Minimalist black-and-white design inspired by modern web apps
- Responsive layout with smooth animations and transitions
- Three main pages: Scrape Data, Models Management, API Request History
- Real-time results display with JSON formatting
- Copy-to-clipboard functionality for extracted data
- Toast notifications for user feedback
- Auto-scroll to results when scraping starts

Model Management System
- Web-based model configuration interface
- Support for any LLM provider (OpenAI, Gemini, Anthropic, etc.)
- Simplified configuration requiring only a provider and an API token
- Add, list, and delete model configurations
- Local storage of API keys in JSON files

API Request History
- Automatic saving of all API requests and responses
- Display of request history with URL, query, and cURL command
- Duplicate prevention (same URL + query combination)
- Request deletion
- Clean, simplified display focused on essential information

Technical Implementation

Backend (FastAPI)
- RESTful API with comprehensive endpoints
- Pydantic models for request/response validation
- Async web scraping with the crawl4ai library
- Error handling with detailed error messages
- File-based storage for models and request history

Frontend (Vanilla JS/CSS/HTML)
- No framework dependencies: pure HTML, CSS, and JavaScript
- Modern CSS Grid and Flexbox layouts
- Custom dropdown styling with SVG arrows
- Responsive design for mobile and desktop
- Smooth scrolling and animations

Core Library Integration
- WebScraperAgent class for orchestration
- ModelConfig class for LLM configuration management
- Schema generation and caching system
- LLM extraction strategy support
- Browser configuration with headless mode
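The commit message says every request is saved to a local JSON file and that repeated URL + query combinations are skipped, but the storage code itself is not shown on this page. Below is a minimal sketch of that behaviour. It assumes a single JSON list on disk and exact-match deduplication on the `(url, query)` pair. The `RequestHistory` class, its methods, and the `saved_requests.json` file name are hypothetical and are not the example's actual API.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


class RequestHistory:
    """Hypothetical JSON-file store that skips duplicate (url, query) pairs."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> list[dict[str, Any]]:
        # A missing file just means nothing has been saved yet.
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _write(self, entries: list[dict[str, Any]]) -> None:
        self.path.write_text(json.dumps(entries, indent=2), encoding="utf-8")

    def save(self, url: str, query: str, response: Any) -> bool:
        """Store a request/response pair; return False if url + query already exists."""
        entries = self.load()
        if any(e["url"] == url and e["query"] == query for e in entries):
            return False
        entries.append({"url": url, "query": query, "response": response})
        self._write(entries)
        return True

    def delete(self, url: str, query: str) -> bool:
        """Remove the matching entry; return False if nothing matched."""
        entries = self.load()
        kept = [e for e in entries if (e["url"], e["query"]) != (url, query)]
        if len(kept) == len(entries):
            return False
        self._write(kept)
        return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        history = RequestHistory(Path(tmp) / "saved_requests.json")
        print(history.save("https://example.com", "list all product names", {"products": []}))  # → True
        print(history.save("https://example.com", "list all product names", {"products": []}))  # → False
        print(len(history.load()))  # → 1
```

Comparing the exact strings is the simplest reading of "same URL + query". Normalising URLs (trailing slashes, query-string order) before comparing would be a further assumption.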
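The schema-based approach is listed as the faster one, presumably because a generated schema can be reused for later extractions without another LLM call. The sketch below shows one way to cache schemas under stated assumptions:

- The cache key is a hash of the URL and the plain-English query.
- Each schema is stored as a JSON file in a cache directory.
- `generate` stands in for whatever LLM-backed generator the example actually uses.

`cache_key` and `get_or_create_schema` are hypothetical names, and the sketch does not call crawl4ai.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Callable

Schema = dict[str, Any]


def cache_key(url: str, query: str) -> str:
    """Derive a filesystem-safe key from the URL and the plain-English query."""
    digest = hashlib.sha256(f"{url}\n{query}".encode("utf-8"))
    return digest.hexdigest()[:16]


def get_or_create_schema(
    url: str,
    query: str,
    generate: Callable[[str, str], Schema],
    cache_dir: Path,
) -> Schema:
    """Return a cached schema for (url, query), calling `generate` only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(url, query)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    schema = generate(url, query)  # the expensive step, e.g. an LLM call
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return schema


if __name__ == "__main__":
    calls: list[str] = []

    def fake_generate(url: str, query: str) -> Schema:
        # Stand-in for an LLM-backed generator; records each invocation.
        calls.append(query)
        return {"name": "products", "fields": []}

    with tempfile.TemporaryDirectory() as tmp:
        for _ in range(3):
            get_or_create_schema("https://example.com", "product names", fake_generate, Path(tmp))
    print(len(calls))  # → 1
```

Keying on the full URL is the conservative choice. Keying on the domain instead would let pages that share a layout reuse one schema, but the commit does not say which the example does.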
#!/usr/bin/env python3
"""
Startup script for the Web Scraper API with frontend interface.
"""

import sys
from pathlib import Path

import uvicorn


def main():
    # Check that the static directory exists (relative to the current working directory)
    static_dir = Path("static")
    if not static_dir.exists():
        print("❌ Static directory not found!")
        print("Please make sure the 'static' directory exists with the frontend files.")
        sys.exit(1)

    # Check that all required frontend files exist
    required_files = ["index.html", "styles.css", "script.js"]
    missing_files = []

    for file in required_files:
        if not (static_dir / file).exists():
            missing_files.append(file)

    if missing_files:
        print(f"❌ Missing frontend files: {', '.join(missing_files)}")
        print("Please make sure all frontend files are present in the static directory.")
        sys.exit(1)

    print("🚀 Starting Web Scraper API with Frontend Interface")
    print("=" * 50)
    print("📁 Static files found and ready to serve")
    print("🌐 Frontend will be available at: http://localhost:8000")
    print("🔌 API endpoints available at: http://localhost:8000/docs")
    print("=" * 50)

    # Start the server. reload=True requires the app to be passed as an
    # import string ("module:attribute") rather than as an app object.
    uvicorn.run(
        "api_server:app",
        host="0.0.0.0",
        port=8000,
        reload=True,
        log_level="info",
    )


if __name__ == "__main__":
    main()
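The pre-flight checks in `main()` call `sys.exit` directly, which makes them awkward to unit-test. The sketch below is a hypothetical refactor, not part of this commit. It moves the checks into a pure function that returns the missing files and keeps a thin wrapper that exits with status 1 like the original. `find_missing_static_files`, `ensure_static_files`, and `REQUIRED_FILES` are illustrative names.

```python
import sys
import tempfile
from pathlib import Path

# Frontend files the startup script above expects.
REQUIRED_FILES = ("index.html", "styles.css", "script.js")


def find_missing_static_files(
    static_dir: Path, required: tuple[str, ...] = REQUIRED_FILES
) -> list[str]:
    """Return the required files absent from static_dir, in the order listed.

    A missing (or non-directory) static_dir reports every required file.
    """
    if not static_dir.is_dir():
        return list(required)
    return [name for name in required if not (static_dir / name).is_file()]


def ensure_static_files(static_dir: Path) -> None:
    """Exit with status 1, like the original script, if any file is missing."""
    missing = find_missing_static_files(static_dir)
    if missing:
        print(f"❌ Missing frontend files: {', '.join(missing)}")
        sys.exit(1)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        static = Path(tmp) / "static"
        print(find_missing_static_files(static))  # → ['index.html', 'styles.css', 'script.js']
        static.mkdir()
        (static / "index.html").write_text("<!doctype html>", encoding="utf-8")
        print(find_missing_static_files(static))  # → ['styles.css', 'script.js']
```

This folds the original's separate "Static directory not found!" message into the missing-files report. Because `static_dir` is a parameter, a caller could also pass a path resolved from the script's own location instead of the working directory. However, `"api_server:app"` would still need to be importable from wherever uvicorn is launched.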