feat(extraction): add LLM-powered schema generation utility

Adds a new static method, generate_schema(), to the JsonElementExtractionStrategy
classes. It automatically generates extraction schemas using an LLM (OpenAI or
Ollama), which gives a convenient way to bootstrap a schema while keeping the
performance benefits of selector-based extraction.

Key changes:
- Added generate_schema() static method to base extraction strategy
- Added support for both CSS and XPath schema generation
- Updated documentation with examples and best practices
- Added new prompt templates for schema generation
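
The workflow described above (generate a schema once, then reuse it for
selector-based extraction with no further LLM calls) can be sketched roughly as
follows. This is only an illustration under stated assumptions: the schema
literal is a hypothetical example of what a generated XPath schema might look
like, its keys (baseSelector, fields, selector, type, attribute) are assumed
rather than taken from this commit, and apply_schema() is a stand-in written
with the standard library, not the extraction code in Crawl4AI. The call to
generate_schema() itself is omitted because its signature is not given in this
message.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema, standing in for what an LLM might return for this page.
# The key names are assumptions made for illustration.
SCHEMA = {
    "name": "Products",
    "baseSelector": ".//div[@class='product']",
    "fields": [
        {"name": "title", "selector": ".//h2", "type": "text"},
        {"name": "price", "selector": ".//span[@class='price']", "type": "text"},
        {"name": "url", "selector": ".//a", "type": "attribute", "attribute": "href"},
    ],
}

# Small, well-formed sample page so the standard-library XML parser can read it.
HTML = """<html><body>
<div class="product"><h2>Widget</h2><span class="price">$9.99</span><a href="/widget">Details</a></div>
<div class="product"><h2>Gadget</h2><span class="price">$19.50</span><a href="/gadget">Details</a></div>
</body></html>"""


def apply_schema(markup, schema):
    """Apply a selector schema to markup; no LLM is involved at this stage."""
    root = ET.fromstring(markup)
    rows = []
    # Each match of the base selector yields one output record.
    for element in root.findall(schema["baseSelector"]):
        row = {}
        for field in schema["fields"]:
            node = element.find(field["selector"])
            if node is None:
                continue  # skip fields whose selector matches nothing
            if field["type"] == "attribute":
                row[field["name"]] = node.get(field["attribute"])
            else:
                row[field["name"]] = "".join(node.itertext()).strip()
        rows.append(row)
    return rows


rows = apply_schema(HTML, SCHEMA)
for row in rows:
    print(row)
```

Because the schema is plain data, it can be stored and reused across pages that
share the same layout, which is where the performance benefit comes from.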
UncleCode
2025-01-20 17:28:00 +08:00
parent 4b1309cbf2
commit 2cec527a22
6 changed files with 1052 additions and 3 deletions

@@ -1,3 +1,9 @@
### [Added] 2025-01-20
- New LLM-powered schema generation utility for JsonElementExtractionStrategy
- Support for automatic CSS and XPath schema generation using OpenAI or Ollama
- Comprehensive documentation and examples for schema generation
- New prompt templates optimized for HTML schema analysis
# Changelog
All notable changes to Crawl4AI will be documented in this file.