
# Code Examples

This page provides a comprehensive list of example scripts that demonstrate various features and capabilities of Crawl4AI. Each example is designed to showcase specific functionality, making it easier for you to understand how to implement these features in your own projects.

## Getting Started Examples

| Example | Description | Link |
| --- | --- | --- |
| Hello World | A simple introductory example demonstrating basic usage of AsyncWebCrawler with JavaScript execution and content filtering. | View Code |
| Quickstart | A comprehensive collection of examples showcasing various features including basic crawling, content cleaning, link analysis, JavaScript execution, CSS selectors, media handling, custom hooks, proxy configuration, screenshots, and multiple extraction strategies. | View Code |
| Quickstart Set 1 | Basic examples for getting started with Crawl4AI. | View Code |
| Quickstart Set 2 | More advanced examples for working with Crawl4AI. | View Code |
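The Hello World example above comes down to a few lines. Here is a minimal sketch of that pattern (the URL and the printed slice are placeholders, and a working Crawl4AI install with a browser is assumed):

```python
import asyncio

async def main():
    # Imported inside the coroutine so the sketch can be read without crawl4ai installed
    from crawl4ai import AsyncWebCrawler

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com")  # placeholder URL
        print(result.markdown[:300])  # preview the generated markdown
```

Run it with `asyncio.run(main())`; the linked examples add JavaScript execution and content filtering on top of this skeleton.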

## Browser & Crawling Features

| Example | Description | Link |
| --- | --- | --- |
| Built-in Browser | Demonstrates how to use the built-in browser capabilities. | View Code |
| Browser Optimization | Focuses on browser performance optimization techniques. | View Code |
| arun vs arun_many | Compares the arun and arun_many methods for single vs. multiple URL crawling. | View Code |
| Multiple URLs | Shows how to crawl multiple URLs asynchronously. | View Code |
| Page Interaction | Guide on interacting with dynamic elements through clicks. | View Guide |
| Crawler Monitor | Shows how to monitor the crawler's activities and status. | View Code |
| Full Page Screenshot & PDF | Guide on capturing full-page screenshots and PDFs from massive webpages. | View Guide |
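The arun vs. arun_many comparison above can be sketched in a few lines (the URLs are placeholders; a working Crawl4AI install with a browser is assumed):

```python
import asyncio

URLS = ["https://example.com", "https://example.org"]  # placeholder URLs

async def main():
    from crawl4ai import AsyncWebCrawler

    async with AsyncWebCrawler() as crawler:
        # arun: crawl a single URL per call
        first = await crawler.arun(url=URLS[0])
        # arun_many: crawl a batch of URLs concurrently in one call
        results = await crawler.arun_many(urls=URLS)
        print(first.success, len(results))
```

Run it with `asyncio.run(main())`. The linked example goes further, covering dispatch and concurrency settings for large batches.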

## Advanced Crawling & Deep Crawling

| Example | Description | Link |
| --- | --- | --- |
| Deep Crawling | An extensive tutorial on deep crawling capabilities, demonstrating BFS and BestFirst strategies, stream vs. non-stream execution, filters, scorers, and advanced configurations. | View Code |
| Virtual Scroll | Comprehensive examples for handling virtualized scrolling on sites like Twitter and Instagram, demonstrating different scrolling scenarios with a local test server. | View Code |
| Adaptive Crawling | Demonstrates intelligent crawling that automatically determines when sufficient information has been gathered. | View Code |
| Dispatcher | Shows how to use the crawl dispatcher for advanced workload management. | View Code |
| Storage State | Tutorial on managing browser storage state for persistence. | View Guide |
| Network Console Capture | Demonstrates how to capture and analyze network requests and console logs. | View Code |

## Extraction Strategies

| Example | Description | Link |
| --- | --- | --- |
| Extraction Strategies | Demonstrates different extraction strategies with various input formats (markdown, HTML, fit_markdown) and JSON-based extractors (CSS and XPath). | View Code |
| Scraping Strategies | Compares the performance of different scraping strategies. | View Code |
| LLM Extraction | Demonstrates LLM-based extraction specifically for OpenAI pricing data. | View Code |
| LLM Markdown | Shows how to use LLMs to generate markdown from crawled content. | View Code |
| Summarize Page | Shows how to summarize web page content. | View Code |
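For the JSON/CSS extractors, the core idea is a declarative schema handed to the strategy. Here is a hedged sketch (the selectors and field names are invented for illustration; a working Crawl4AI install with a browser is assumed for the crawl itself):

```python
import asyncio

# Illustrative schema: baseSelector matches each repeated element,
# and each field is extracted relative to that match.
schema = {
    "name": "articles",
    "baseSelector": "div.article",
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
        {"name": "url", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}

async def main():
    from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
    from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

    config = CrawlerRunConfig(extraction_strategy=JsonCssExtractionStrategy(schema))
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", config=config)
        print(result.extracted_content)  # JSON string of extracted items
```

Run it with `asyncio.run(main())`. Because the schema is plain data, it needs no LLM calls at crawl time, which is why the CSS/XPath extractors are the cheap, fast option among the strategies listed above.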

## E-commerce & Specialized Crawling

| Example | Description | Link |
| --- | --- | --- |
| Amazon Product Extraction | Demonstrates how to extract structured product data from Amazon search results using CSS selectors. | View Code |
| Amazon with Hooks | Shows how to use hooks with Amazon product extraction. | View Code |
| Amazon with JavaScript | Demonstrates using custom JavaScript for Amazon product extraction. | View Code |
| Crypto Analysis | Demonstrates how to crawl and analyze cryptocurrency data. | View Code |
| SERP API | Demonstrates using Crawl4AI with search engine result pages. | View Code |

## Customization & Security

| Example | Description | Link |
| --- | --- | --- |
| Hooks | Illustrates how to use hooks at different stages of the crawling process for advanced customization. | View Code |
| Identity-Based Browsing | Illustrates identity-based browsing configurations for authentic browsing experiences. | View Code |
| Proxy Rotation | Shows how to use proxy rotation for web scraping and avoiding IP blocks. | View Code |
| SSL Certificate | Illustrates SSL certificate handling and verification. | View Code |
| Language Support | Shows how to handle different languages during crawling. | View Code |
| Geolocation | Demonstrates how to use geolocation features. | View Code |

## Docker & Deployment

| Example | Description | Link |
| --- | --- | --- |
| Docker Config | Demonstrates how to create and use Docker configuration objects. | View Code |
| Docker Basic | A test suite for Docker deployment, showcasing various functionalities through the Docker API. | View Code |
| Docker REST API | Shows how to interact with Crawl4AI Docker using REST API calls. | View Code |
| Docker SDK | Demonstrates using the Python SDK for Crawl4AI Docker. | View Code |

## Application Examples

| Example | Description | Link |
| --- | --- | --- |
| Research Assistant | Demonstrates how to build a research assistant using Crawl4AI. | View Code |
| REST Call | Shows how to make REST API calls with Crawl4AI. | View Code |
| Chainlit Integration | Shows how to integrate Crawl4AI with Chainlit. | View Guide |
| Crawl4AI vs FireCrawl | Compares Crawl4AI with the FireCrawl library. | View Code |

## Content Generation & Markdown

| Example | Description | Link |
| --- | --- | --- |
| Content Source | Demonstrates how to work with different content sources in markdown generation. | View Code |
| Content Source (Short) | A simplified version of content source usage. | View Code |
| Built-in Browser Guide | Guide for using the built-in browser capabilities. | View Guide |

## Running the Examples

To run any of these examples, you'll need to have Crawl4AI installed:

```bash
pip install crawl4ai
```

Then, you can run an example script like this:

```bash
python -m docs.examples.hello_world
```

For examples that require additional dependencies or environment variables, refer to the comments at the top of each file.

Some examples may require:

- API keys (for LLM-based examples)
- Docker setup (for Docker-related examples)
- Additional dependencies (specified in the example files)

## Contributing New Examples

If you've created an interesting example that demonstrates a unique use case or feature of Crawl4AI, we encourage you to contribute it to our examples collection. Please see our contribution guidelines for more information.