From 949a93982eff2e5beb94f33ff8abe47131dbcf1d Mon Sep 17 00:00:00 2001
From: UncleCode
Date: Wed, 23 Apr 2025 19:02:39 +0800
Subject: [PATCH] feat(docs): update documentation and disable Ask AI feature

Major documentation updates including:
- Add comprehensive code examples page
- Add video tutorial to homepage
- Update Docker deployment instructions for v0.6.0
- Temporarily disable Ask AI feature
- Add table border styling
- Update site version to v0.6.x

BREAKING CHANGE: Ask AI feature temporarily disabled pending launch
---
 docs/md_v2/ask_ai/ask-ai.js          |   8 +-
 docs/md_v2/ask_ai/index.html         |   2 +-
 docs/md_v2/assets/styles.css         |   3 +
 docs/md_v2/core/docker-deployment.md |   8 +-
 docs/md_v2/core/examples.md          | 115 +++++++++++++++++++++++++++
 docs/md_v2/index.md                  |   8 ++
 mkdocs.yml                           |   3 +-
 7 files changed, 139 insertions(+), 8 deletions(-)
 create mode 100644 docs/md_v2/core/examples.md

diff --git a/docs/md_v2/ask_ai/ask-ai.js b/docs/md_v2/ask_ai/ask-ai.js
index 2710923e..bb1b370c 100644
--- a/docs/md_v2/ask_ai/ask-ai.js
+++ b/docs/md_v2/ask_ai/ask-ai.js
@@ -361,8 +361,10 @@ A code snippet: \`crawler.run()\`. Check the [quickstart](/core/quickstart).`;
         chatMessages.innerHTML = ""; // Start with clean slate for query
         if (!isFromQuery) {
             // Show welcome only if manually started
+            // chatMessages.innerHTML =
+            //     'Started a new chat! Ask me anything about Crawl4AI.';
             chatMessages.innerHTML =
-                'Started a new chat! Ask me anything about Crawl4AI.';
+                'We will launch this feature very soon.';
         }
         addCitations([]); // Clear citations
         updateCitationsDisplay(); // Clear UI
@@ -504,8 +506,10 @@ A code snippet: \`crawler.run()\`. Check the [quickstart](/core/quickstart).`;
             addMessageToChat(message, false);
         });
         if (messages.length === 0) {
+            // chatMessages.innerHTML =
+            //     'Chat history loaded. Ask a question!';
             chatMessages.innerHTML =
-                'Chat history loaded. Ask a question!';
+                'We will launch this feature very soon.';
         }
         // Scroll to bottom after loading messages
         scrollToBottom();
diff --git a/docs/md_v2/ask_ai/index.html b/docs/md_v2/ask_ai/index.html
index 5fe79b12..ccb7faa4 100644
--- a/docs/md_v2/ask_ai/index.html
+++ b/docs/md_v2/ask_ai/index.html
@@ -36,7 +36,7 @@
- +
diff --git a/docs/md_v2/assets/styles.css b/docs/md_v2/assets/styles.css
index 92e01f85..46b90ab0 100644
--- a/docs/md_v2/assets/styles.css
+++ b/docs/md_v2/assets/styles.css
@@ -268,3 +268,6 @@ div.badges a > img {
 }
 
 
+table td, table th {
+    border: 1px solid var(--code-bg-color) !important;
+}
\ No newline at end of file
diff --git a/docs/md_v2/core/docker-deployment.md b/docs/md_v2/core/docker-deployment.md
index ddebeaeb..2a2f75eb 100644
--- a/docs/md_v2/core/docker-deployment.md
+++ b/docs/md_v2/core/docker-deployment.md
@@ -62,7 +62,7 @@ Our latest release candidate is `0.6.0rc1-r2`. Images are built with multi-arch
 
 ```bash
 # Pull the release candidate (recommended for latest features)
-docker pull unclecode/crawl4ai:0.6.0rc1-r2
+docker pull unclecode/crawl4ai:0.6.0-r1
 
 # Or pull the latest stable version
 docker pull unclecode/crawl4ai:latest
@@ -99,7 +99,7 @@ EOL
       -p 11235:11235 \
       --name crawl4ai \
       --shm-size=1g \
-      unclecode/crawl4ai:0.6.0rc1-r2
+      unclecode/crawl4ai:latest
     ```
 
 *   **With LLM support:**
@@ -110,7 +110,7 @@ EOL
       --name crawl4ai \
       --env-file .llm.env \
       --shm-size=1g \
-      unclecode/crawl4ai:0.6.0rc1-r2
+      unclecode/crawl4ai:latest
     ```
 
 > The server will be available at `http://localhost:11235`. Visit `/playground` to access the interactive testing interface.
@@ -160,7 +160,7 @@ The `docker-compose.yml` file in the project root provides a simplified approach
     ```bash
     # Pulls and runs the release candidate from Docker Hub
     # Automatically selects the correct architecture
-    IMAGE=unclecode/crawl4ai:0.6.0rc1-r2 docker compose up -d
+    IMAGE=unclecode/crawl4ai:latest docker compose up -d
     ```
 
 *   **Build and Run Locally:**
diff --git a/docs/md_v2/core/examples.md b/docs/md_v2/core/examples.md
new file mode 100644
index 00000000..93989552
--- /dev/null
+++ b/docs/md_v2/core/examples.md
@@ -0,0 +1,115 @@
+# Code Examples
+
+This page provides a comprehensive list of example scripts that demonstrate various features and capabilities of Crawl4AI. Each example is designed to showcase specific functionality, making it easier for you to understand how to implement these features in your own projects.
+
+## Getting Started Examples
+
+| Example | Description | Link |
+|---------|-------------|------|
+| Hello World | A simple introductory example demonstrating basic usage of AsyncWebCrawler with JavaScript execution and content filtering. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/hello_world.py) |
+| Quickstart | A comprehensive collection of examples showcasing various features including basic crawling, content cleaning, link analysis, JavaScript execution, CSS selectors, media handling, custom hooks, proxy configuration, screenshots, and multiple extraction strategies. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart.py) |
+| Quickstart Set 1 | Basic examples for getting started with Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart_examples_set_1.py) |
+| Quickstart Set 2 | More advanced examples for working with Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart_examples_set_2.py) |
+
+## Browser & Crawling Features
+
+| Example | Description | Link |
+|---------|-------------|------|
+| Built-in Browser | Demonstrates how to use the built-in browser capabilities. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/builtin_browser_example.py) |
+| Browser Optimization | Focuses on browser performance optimization techniques. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/browser_optimization_example.py) |
+| arun vs arun_many | Compares the `arun` and `arun_many` methods for single vs. multiple URL crawling. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/arun_vs_arun_many.py) |
+| Multiple URLs | Shows how to crawl multiple URLs asynchronously. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/async_webcrawler_multiple_urls_example.py) |
+| Page Interaction | Guide on interacting with dynamic elements through clicks. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/tutorial_dynamic_clicks.md) |
+| Crawler Monitor | Shows how to monitor the crawler's activities and status. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/crawler_monitor_example.py) |
+| Full Page Screenshot & PDF | Guide on capturing full-page screenshots and PDFs from massive webpages. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/full_page_screenshot_and_pdf_export.md) |
+
+## Advanced Crawling & Deep Crawling
+
+| Example | Description | Link |
+|---------|-------------|------|
+| Deep Crawling | An extensive tutorial on deep crawling capabilities, demonstrating BFS and BestFirst strategies, stream vs. non-stream execution, filters, scorers, and advanced configurations. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/deepcrawl_example.py) |
+| Dispatcher | Shows how to use the crawl dispatcher for advanced workload management. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/dispatcher_example.py) |
+| Storage State | Tutorial on managing browser storage state for persistence. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/storage_state_tutorial.md) |
+| Network Console Capture | Demonstrates how to capture and analyze network requests and console logs. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/network_console_capture_example.py) |
+
+## Extraction Strategies
+
+| Example | Description | Link |
+|---------|-------------|------|
+| Extraction Strategies | Demonstrates different extraction strategies with various input formats (markdown, HTML, fit_markdown) and JSON-based extractors (CSS and XPath). | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/extraction_strategies_examples.py) |
+| Scraping Strategies | Compares the performance of different scraping strategies. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/scraping_strategies_performance.py) |
+| LLM Extraction | Demonstrates LLM-based extraction specifically for OpenAI pricing data. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/llm_extraction_openai_pricing.py) |
+| LLM Markdown | Shows how to use LLMs to generate markdown from crawled content. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/llm_markdown_generator.py) |
+| Summarize Page | Shows how to summarize web page content. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/summarize_page.py) |
+
+## E-commerce & Specialized Crawling
+
+| Example | Description | Link |
+|---------|-------------|------|
+| Amazon Product Extraction | Demonstrates how to extract structured product data from Amazon search results using CSS selectors. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_product_extraction_direct_url.py) |
+| Amazon with Hooks | Shows how to use hooks with Amazon product extraction. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_product_extraction_using_hooks.py) |
+| Amazon with JavaScript | Demonstrates using custom JavaScript for Amazon product extraction. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_product_extraction_using_use_javascript.py) |
+| Crypto Analysis | Demonstrates how to crawl and analyze cryptocurrency data. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/crypto_analysis_example.py) |
+| SERP API | Demonstrates using Crawl4AI with search engine result pages. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/serp_api_project_11_feb.py) |
+
+## Customization & Security
+
+| Example | Description | Link |
+|---------|-------------|------|
+| Hooks | Illustrates how to use hooks at different stages of the crawling process for advanced customization. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/hooks_example.py) |
+| Identity-Based Browsing | Illustrates identity-based browsing configurations for authentic browsing experiences. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/identity_based_browsing.py) |
+| Proxy Rotation | Shows how to use proxy rotation for web scraping and avoiding IP blocks. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/proxy_rotation_demo.py) |
+| SSL Certificate | Illustrates SSL certificate handling and verification. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/ssl_example.py) |
+| Language Support | Shows how to handle different languages during crawling. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/language_support_example.py) |
+| Geolocation | Demonstrates how to use geolocation features. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/use_geo_location.py) |
+
+## Docker & Deployment
+
+| Example | Description | Link |
+|---------|-------------|------|
+| Docker Config | Demonstrates how to create and use Docker configuration objects. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_config_obj.py) |
+| Docker Basic | A test suite for Docker deployment, showcasing various functionalities through the Docker API. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_example.py) |
+| Docker REST API | Shows how to interact with Crawl4AI Docker using REST API calls. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_python_rest_api.py) |
+| Docker SDK | Demonstrates using the Python SDK for Crawl4AI Docker. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/docker_python_sdk.py) |
+
+## Application Examples
+
+| Example | Description | Link |
+|---------|-------------|------|
+| Research Assistant | Demonstrates how to build a research assistant using Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/research_assistant.py) |
+| REST Call | Shows how to make REST API calls with Crawl4AI. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/rest_call.py) |
+| Chainlit Integration | Shows how to integrate Crawl4AI with Chainlit. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/chainlit.md) |
+| Crawl4AI vs FireCrawl | Compares Crawl4AI with the FireCrawl library. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/crawlai_vs_firecrawl.py) |
+
+## Content Generation & Markdown
+
+| Example | Description | Link |
+|---------|-------------|------|
+| Content Source | Demonstrates how to work with different content sources in markdown generation. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/markdown/content_source_example.py) |
+| Content Source (Short) | A simplified version of content source usage. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/markdown/content_source_short_example.py) |
+| Built-in Browser Guide | Guide for using the built-in browser capabilities. | [View Guide](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/README_BUILTIN_BROWSER.md) |
+
+## Running the Examples
+
+To run any of these examples, you'll need to have Crawl4AI installed:
+
+```bash
+pip install crawl4ai
+```
+
+Then, you can run an example script like this:
+
+```bash
+python -m docs.examples.hello_world
+```
+
+For examples that require additional dependencies or environment variables, refer to the comments at the top of each file.
+
+Some examples may require:
+- API keys (for LLM-based examples)
+- Docker setup (for Docker-related examples)
+- Additional dependencies (specified in the example files)
+
+## Contributing New Examples
+
+If you've created an interesting example that demonstrates a unique use case or feature of Crawl4AI, we encourage you to contribute it to our examples collection. Please see our [contribution guidelines](https://github.com/unclecode/crawl4ai/blob/main/CONTRIBUTORS.md) for more information.
\ No newline at end of file
diff --git a/docs/md_v2/index.md b/docs/md_v2/index.md
index 7a230d5d..4e54da7d 100644
--- a/docs/md_v2/index.md
+++ b/docs/md_v2/index.md
@@ -72,6 +72,14 @@ asyncio.run(main())
 
 ---
 
+## Video Tutorial
+
+
+ +
+
+---
+
 ## What Does Crawl4AI Do?
 
 Crawl4AI is a feature-rich crawler and scraper that aims to:
diff --git a/mkdocs.yml b/mkdocs.yml
index 39e03a88..b7d44220 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,4 +1,4 @@
-site_name: Crawl4AI Documentation (v0.5.x)
+site_name: Crawl4AI Documentation (v0.6.x)
 site_description: 🚀🤖 Crawl4AI, Open-source LLM-Friendly Web Crawler & Scraper
 site_url: https://docs.crawl4ai.com
 repo_url: https://github.com/unclecode/crawl4ai
@@ -9,6 +9,7 @@ nav:
   - Home: 'index.md'
   - "Ask AI": "core/ask-ai.md"
   - "Quick Start": "core/quickstart.md"
+  - "Code Examples": "core/examples.md"
   - Setup & Installation:
     - "Installation": "core/installation.md"
     - "Docker Deployment": "core/docker-deployment.md"