
UncleCode 1a73fb60db feat(crawl4ai): Implement adaptive crawling feature

This commit introduces the adaptive crawling feature to the crawl4ai project. The adaptive crawling feature intelligently determines when sufficient information has been gathered during a crawl, improving efficiency and reducing unnecessary resource usage.

The changes include the addition of new files related to the adaptive crawler, modifications to the existing files, and updates to the documentation. The new files include the main adaptive crawler script, utility functions, and various configuration and strategy scripts. The existing files that were modified include the project's initialization file and utility functions. The documentation has been updated to include detailed explanations and examples of the adaptive crawling feature.

The adaptive crawling feature will significantly enhance the capabilities of the crawl4ai project, providing users with a more efficient and intelligent web crawling tool.

Significant modifications:
- Added adaptive_crawler.py and related scripts
- Modified __init__.py and utils.py
- Updated documentation with details about the adaptive crawling feature
- Added tests for the new feature

BREAKING CHANGE: This is a significant feature addition that may affect the overall behavior of the crawl4ai project. Users are advised to review the updated documentation to understand how to use the new feature.

Refs: #123, #456

2025-07-04 15:16:53 +08:00


Adaptive Crawling Examples

This directory contains examples demonstrating various aspects of Crawl4AI's Adaptive Crawling feature.

Examples Overview

1. `basic_usage.py`

- Simple introduction to adaptive crawling
- Uses the default statistical strategy
- Shows how to get crawl statistics and relevant content
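
To make the idea of "stopping once enough information has been gathered" concrete, here is a minimal standard-library sketch. It is not Crawl4AI's statistical strategy: the `term_coverage` and `crawl_until_confident` functions, the coverage metric, and the `threshold` and `max_pages` parameters are all assumptions chosen for illustration.

```python
import re


def term_coverage(query: str, pages: list[str]) -> float:
    """Fraction of distinct query terms found in at least one page (toy metric)."""
    terms = set(re.findall(r"\w+", query.lower()))
    if not terms:
        return 0.0
    seen = set()
    for text in pages:
        seen |= terms & set(re.findall(r"\w+", text.lower()))
    return len(seen) / len(terms)


def crawl_until_confident(query, page_stream, threshold=0.8, max_pages=20):
    """Consume pages until coverage reaches `threshold` or `max_pages` is hit.

    `page_stream` stands in for a real crawler: any iterable of page texts.
    """
    collected = []
    confidence = 0.0
    for text in page_stream:
        if len(collected) >= max_pages:
            break
        collected.append(text)
        confidence = term_coverage(query, collected)
        if confidence >= threshold:
            break  # enough information gathered; skip the remaining pages
    return collected, confidence


# Hypothetical page texts; a real run would fetch these from the web.
pages = [
    "Async functions are declared with async def.",
    "Context managers use the with statement.",
    "An async context manager combines both ideas.",
    "Unrelated page about cooking pasta.",
]
collected, confidence = crawl_until_confident("async context managers", iter(pages))
print(f"pages used: {len(collected)}, confidence: {confidence:.2f}")
```

In this toy run the loop stops after the second page, because by then every query term has been seen at least once.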

2. `embedding_strategy.py` ⭐ NEW

- Demonstrates the embedding-based strategy for semantic understanding
- Shows query expansion and irrelevance detection
- Includes configuration for both local and API-based embeddings

3. `embedding_vs_statistical.py` ⭐ NEW

- Direct comparison between statistical and embedding strategies
- Helps you choose the right strategy for your use case
- Shows performance and accuracy trade-offs

4. `embedding_configuration.py` ⭐ NEW

- Advanced configuration options for the embedding strategy
- Parameter tuning guide for different scenarios
- Examples for research, exploration, and quality-focused crawling

5. `advanced_configuration.py`

- Shows various configuration options for both strategies
- Demonstrates threshold tuning and performance optimization

6. `custom_strategies.py`

- Shows how to implement your own crawling strategy
- Extends the base `CrawlStrategy` class
- Advanced use case for specialized requirements
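
The general shape of a custom strategy can be sketched without the library. The `StrategyBase` class below is a locally defined, hypothetical stand-in: the real `CrawlStrategy` interface is not shown in this README, so its method names and signatures may differ. `KeywordStrategy` and its scoring rules are likewise assumptions for illustration.

```python
from abc import ABC, abstractmethod


class StrategyBase(ABC):
    """Hypothetical stand-in for the library's CrawlStrategy base class."""

    @abstractmethod
    def confidence(self, query: str, pages: list[str]) -> float:
        """Return a 0..1 estimate of how well `pages` cover `query`."""

    @abstractmethod
    def rank_links(self, query: str, links: list[str]) -> list[str]:
        """Return `links` ordered from most to least promising."""


class KeywordStrategy(StrategyBase):
    """Toy strategy: scores by plain substring matches on query terms."""

    def confidence(self, query, pages):
        terms = query.lower().split()
        if not terms or not pages:
            return 0.0
        # A term counts as covered if any collected page mentions it.
        hits = sum(any(t in p.lower() for p in pages) for t in terms)
        return hits / len(terms)

    def rank_links(self, query, links):
        terms = query.lower().split()
        # sorted() is stable, so links with equal scores keep their order.
        return sorted(links, key=lambda url: -sum(t in url.lower() for t in terms))


strategy = KeywordStrategy()
ranked = strategy.rank_links(
    "async context",
    [
        "https://example.com/about",
        "https://example.com/docs/async-context",
        "https://example.com/async",
    ],
)
score = strategy.confidence("async context", ["Async IO basics"])
print(ranked[0], score)
```

Keeping the scoring and the link ranking behind one small interface is what lets a crawler swap strategies without changing its main loop.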

7. `export_import_kb.py`

- Export the crawled knowledge base to JSONL
- Import and continue crawling from saved state
- Useful for building persistent knowledge bases
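
The JSONL round trip itself needs only the standard library. The sketch below is not the code in `export_import_kb.py`: the `export_kb` and `import_kb` helpers and the `url`/`content` record fields are assumptions for illustration, and the library's own export format may carry different fields.

```python
import json
import tempfile
from pathlib import Path


def export_kb(records, path):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def import_kb(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical knowledge-base records.
kb = [
    {"url": "https://example.com/a", "content": "First page"},
    {"url": "https://example.com/b", "content": "Second page"},
]

with tempfile.TemporaryDirectory() as tmp:
    kb_path = Path(tmp) / "kb.jsonl"
    export_kb(kb, kb_path)

    # Later session: reload the saved state and keep adding to it.
    restored = import_kb(kb_path)
    restored.append({"url": "https://example.com/c", "content": "Third page"})
    export_kb(restored, kb_path)
    final = import_kb(kb_path)

print(len(final), "records after resuming")
```

One record per line means a saved knowledge base can be appended to or streamed without parsing the whole file at once.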

Quick Start

For your first adaptive crawling experience, run:

python basic_usage.py

To try the new embedding strategy with semantic understanding:

python embedding_strategy.py

To compare strategies and see which works best for your use case:

python embedding_vs_statistical.py


Strategy Selection Guide

Use Statistical Strategy (Default) When:

Use Embedding Strategy When:

Requirements

Learn More
