- Change embedding_llm_config from Dict to Union[LLMConfig, Dict] for type safety
- Add backward-compatible conversion property _embedding_llm_config_dict
- Replace all hardcoded OpenAI embedding configs with configurable options
- Fix LLMConfig object attribute access in query expansion logic
- Add comprehensive example demonstrating multiple provider configurations
- Update documentation with both LLMConfig object and dictionary usage patterns
Users can now specify any LLM provider for query expansion in the embedding strategy:
- New: embedding_llm_config=LLMConfig(provider='anthropic/claude-3', api_token='key')
- Old: embedding_llm_config={'provider': 'openai/gpt-4', 'api_token': 'key'} (still works)
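The backward-compatible conversion described above could look roughly like the sketch below. It is not the actual Crawl4AI code: the `LLMConfig` dataclass is a hypothetical stand-in with only the two fields shown in this message, and `to_config_dict` is an illustrative helper name, not the library's `_embedding_llm_config_dict` property.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class LLMConfig:
    """Hypothetical stand-in for the library's LLMConfig (real class may differ)."""
    provider: str
    api_token: Optional[str] = None


def to_config_dict(
    config: Union[LLMConfig, Dict[str, Any], None],
) -> Optional[Dict[str, Any]]:
    """Normalize an LLMConfig object or a legacy dict into a plain dict."""
    if config is None:
        return None
    if isinstance(config, dict):
        # Legacy dictionary form: copy it so later changes do not leak back.
        return dict(config)
    # Object form: convert the dataclass fields to a dictionary.
    return asdict(config)


# Both call styles end up with the same dictionary shape.
new_style = to_config_dict(LLMConfig(provider="anthropic/claude-3", api_token="key"))
old_style = to_config_dict({"provider": "openai/gpt-4", "api_token": "key"})
```

Normalizing once at the boundary lets downstream code use dictionary access only, regardless of which form the caller passed.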
Adaptive Crawling Examples
This directory contains examples demonstrating various aspects of Crawl4AI's Adaptive Crawling feature.
Examples Overview
1. basic_usage.py
- Simple introduction to adaptive crawling
- Uses default statistical strategy
- Shows how to get crawl statistics and relevant content
2. embedding_strategy.py ⭐ NEW
- Demonstrates the embedding-based strategy for semantic understanding
- Shows query expansion and irrelevance detection
- Includes configuration for both local and API-based embeddings
3. embedding_vs_statistical.py ⭐ NEW
- Direct comparison between statistical and embedding strategies
- Helps you choose the right strategy for your use case
- Shows performance and accuracy trade-offs
4. embedding_configuration.py ⭐ NEW
- Advanced configuration options for embedding strategy
- Parameter tuning guide for different scenarios
- Examples for research, exploration, and quality-focused crawling
5. advanced_configuration.py
- Shows various configuration options for both strategies
- Demonstrates threshold tuning and performance optimization
6. custom_strategies.py
- How to implement your own crawling strategy
- Extends the base CrawlStrategy class
- Advanced use case for specialized requirements
7. export_import_kb.py
- Export crawled knowledge base to JSONL
- Import and continue crawling from saved state
- Useful for building persistent knowledge bases
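As a rough illustration of the JSONL round trip behind export_import_kb.py, the sketch below writes one JSON object per line and reads the objects back. It uses only the standard library; the record fields (`url`, `content`) and the helper names are assumptions made for this example, not the format or API that Crawl4AI actually uses.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def export_jsonl(records: List[Dict[str, Any]], path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def import_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read the records back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical knowledge-base records, for illustration only.
records = [
    {"url": "https://example.com/a", "content": "first page"},
    {"url": "https://example.com/b", "content": "second page"},
]

with tempfile.TemporaryDirectory() as tmp:
    kb_path = os.path.join(tmp, "kb.jsonl")
    export_jsonl(records, kb_path)
    restored = import_jsonl(kb_path)
```

Because each line is an independent JSON document, a saved file can be appended to or streamed without loading everything into memory.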
Quick Start
For your first adaptive crawling experience, run:
python basic_usage.py
To try the new embedding strategy with semantic understanding:
python embedding_strategy.py
To compare strategies and see which works best for your use case:
python embedding_vs_statistical.py
Strategy Selection Guide
Use Statistical Strategy (Default) When:
- Working with technical documentation
- Queries contain specific terms or code
- Speed is critical
- No API access available
Use Embedding Strategy When:
- Queries are conceptual or ambiguous
- Need semantic understanding beyond exact matches
- Want to detect irrelevant content
- Working with diverse content sources
Requirements
- Crawl4AI installed
- For embedding strategy with local models: sentence-transformers
- For embedding strategy with OpenAI: Set OPENAI_API_KEY environment variable
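Before running the embedding examples, it may help to check which of these requirements are met. The sketch below is one way to do that with the standard library; `embedding_backend_status` is a hypothetical helper written for this README, not part of Crawl4AI.

```python
import importlib.util
import os
from typing import Dict, Mapping, Optional


def embedding_backend_status(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Report which embedding backends look usable in this environment."""
    env = os.environ if env is None else env
    return {
        # The sentence-transformers package is imported as sentence_transformers.
        "local": importlib.util.find_spec("sentence_transformers") is not None,
        # The OpenAI path needs a non-empty OPENAI_API_KEY.
        "openai": bool(env.get("OPENAI_API_KEY")),
    }


status = embedding_backend_status()
print(status)
```

If both values are false, the default statistical strategy remains available, since it needs neither a local model nor an API key.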