Implement a privacy-first, provider-agnostic telemetry system to help improve Crawl4AI stability through anonymous crash reporting. The system is designed with user privacy as the top priority, collecting only exception information without any PII, URLs, or crawled content.

Architecture & Design:
- Provider-agnostic architecture with base TelemetryProvider interface
- Sentry as the initial provider implementation with easy extensibility
- Separate handling for sync and async code paths
- Environment-aware behavior (CLI, Docker, Jupyter/Colab)

Key Features:
- Opt-in by default for CLI/library usage with interactive consent prompt
- Opt-out by default for Docker/API server (enabled unless CRAWL4AI_TELEMETRY=0)
- Jupyter/Colab support with widget-based consent (fallback to code snippets)
- Persistent consent storage in ~/.crawl4ai/config.json
- Optional email collection for critical issue follow-up

CLI Integration:
- `crwl telemetry enable [--email <email>] [--once]` - Enable telemetry
- `crwl telemetry disable` - Disable telemetry
- `crwl telemetry status` - Check current status

Python API:
- Decorators: @telemetry_decorator, @async_telemetry_decorator
- Context managers: telemetry_context(), async_telemetry_context()
- Manual capture: capture_exception(exc, context)
- Control: telemetry.enable(), telemetry.disable(), telemetry.status()

Privacy Safeguards:
- No URL collection
- No request/response data
- No authentication tokens or cookies
- No crawled content
- Automatic sanitization of sensitive fields
- Local consent storage only

Testing:
- Comprehensive test suite with 15 test cases
- Coverage for all environments and consent flows
- Mock providers for testing without external dependencies

Documentation:
- Detailed documentation in docs/md_v2/core/telemetry.md
- Added to mkdocs navigation under Core section
- Privacy commitment and FAQ included
- Examples for all usage patterns

Installation:
- Optional dependency: pip install crawl4ai[telemetry]
- Graceful degradation if sentry-sdk not installed
- Added to pyproject.toml optional dependencies
- Docker requirements updated

Integration Points:
- AsyncWebCrawler: Automatic exception capture in arun() and aprocess_html()
- Docker server: Automatic initialization with environment control
- Global exception handler for uncaught exceptions (CLI only)

This implementation provides valuable error insights to improve Crawl4AI while maintaining complete transparency and user control over data collection.
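The provider-agnostic design, the environment-dependent defaults, and the sanitization step described in this commit message can be illustrated with a rough sketch. This is not the code added by the commit: apart from the `TelemetryProvider` name and the `CRAWL4AI_TELEMETRY` variable, every identifier below (`send_exception`, `InMemoryProvider`, `SENSITIVE_KEYS`, `sanitize`, `is_enabled`, `report_exception`) is a hypothetical name chosen for illustration, and the list of sensitive keys is an assumption.

```python
import os
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Mapping, Optional

# Assumed set of context keys to strip; the commit message does not list them.
SENSITIVE_KEYS = {"url", "cookies", "headers", "authorization", "token", "html", "content"}


def sanitize(context: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the context without any key that looks sensitive."""
    return {k: v for k, v in context.items() if k.lower() not in SENSITIVE_KEYS}


class TelemetryProvider(ABC):
    """Base interface; a Sentry-backed provider would be one subclass."""

    @abstractmethod
    def send_exception(self, exc: BaseException, context: Dict[str, Any]) -> None:
        """Deliver one sanitized exception report."""


class InMemoryProvider(TelemetryProvider):
    """Mock provider that records events locally, useful in tests."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def send_exception(self, exc: BaseException, context: Dict[str, Any]) -> None:
        # Only the exception type is kept; messages may embed URLs or content.
        self.events.append({"type": type(exc).__name__, "context": context})


def is_enabled(
    in_docker: bool,
    stored_consent: Optional[bool],
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Docker/API server: on unless CRAWL4AI_TELEMETRY=0. CLI/library: off unless consented."""
    env = os.environ if env is None else env
    if in_docker:
        return env.get("CRAWL4AI_TELEMETRY", "1") != "0"
    return bool(stored_consent)


def report_exception(
    provider: TelemetryProvider,
    exc: BaseException,
    context: Dict[str, Any],
    enabled: bool,
) -> None:
    """Forward the exception to the provider only when telemetry is enabled."""
    if not enabled:
        return
    provider.send_exception(exc, sanitize(context))
```

Keeping the consent check and the sanitization outside the provider means a new backend only has to implement `send_exception`, which is one plausible reading of "easy extensibility" here.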
114 lines
4.0 KiB
YAML
site_name: Crawl4AI Documentation (v0.7.x)
site_favicon: docs/md_v2/favicon.ico
site_description: 🚀🤖 Crawl4AI, Open-source LLM-Friendly Web Crawler & Scraper
site_url: https://docs.crawl4ai.com
repo_url: https://github.com/unclecode/crawl4ai
repo_name: unclecode/crawl4ai
docs_dir: docs/md_v2

nav:
  - Home: 'index.md'
  - "Ask AI": "core/ask-ai.md"
  - "Quick Start": "core/quickstart.md"
  - "Code Examples": "core/examples.md"
  - Apps:
    - "Demo Apps": "apps/index.md"
    - "C4A-Script Editor": "apps/c4a-script/index.html"
    - "LLM Context Builder": "apps/llmtxt/index.html"
  - Setup & Installation:
    - "Installation": "core/installation.md"
    - "Docker Deployment": "core/docker-deployment.md"
  - "Blog & Changelog":
    - "Blog Home": "blog/index.md"
    - "Changelog": "https://github.com/unclecode/crawl4ai/blob/main/CHANGELOG.md"
  - Core:
    - "Command Line Interface": "core/cli.md"
    - "Simple Crawling": "core/simple-crawling.md"
    - "Deep Crawling": "core/deep-crawling.md"
    - "Adaptive Crawling": "core/adaptive-crawling.md"
    - "URL Seeding": "core/url-seeding.md"
    - "C4A-Script": "core/c4a-script.md"
    - "Crawler Result": "core/crawler-result.md"
    - "Browser, Crawler & LLM Config": "core/browser-crawler-config.md"
    - "Markdown Generation": "core/markdown-generation.md"
    - "Fit Markdown": "core/fit-markdown.md"
    - "Page Interaction": "core/page-interaction.md"
    - "Content Selection": "core/content-selection.md"
    - "Cache Modes": "core/cache-modes.md"
    - "Telemetry": "core/telemetry.md"
    - "Local Files & Raw HTML": "core/local-files.md"
    - "Link & Media": "core/link-media.md"
  - Advanced:
    - "Overview": "advanced/advanced-features.md"
    - "Adaptive Strategies": "advanced/adaptive-strategies.md"
    - "Virtual Scroll": "advanced/virtual-scroll.md"
    - "File Downloading": "advanced/file-downloading.md"
    - "Lazy Loading": "advanced/lazy-loading.md"
    - "Hooks & Auth": "advanced/hooks-auth.md"
    - "Proxy & Security": "advanced/proxy-security.md"
    - "Undetected Browser": "advanced/undetected-browser.md"
    - "Session Management": "advanced/session-management.md"
    - "Multi-URL Crawling": "advanced/multi-url-crawling.md"
    - "Crawl Dispatcher": "advanced/crawl-dispatcher.md"
    - "Identity Based Crawling": "advanced/identity-based-crawling.md"
    - "SSL Certificate": "advanced/ssl-certificate.md"
    - "Network & Console Capture": "advanced/network-console-capture.md"
    - "PDF Parsing": "advanced/pdf-parsing.md"
  - Extraction:
    - "LLM-Free Strategies": "extraction/no-llm-strategies.md"
    - "LLM Strategies": "extraction/llm-strategies.md"
    - "Clustering Strategies": "extraction/clustring-strategies.md"
    - "Chunking": "extraction/chunking.md"
  - API Reference:
    - "AsyncWebCrawler": "api/async-webcrawler.md"
    - "arun()": "api/arun.md"
    - "arun_many()": "api/arun_many.md"
    - "Browser, Crawler & LLM Config": "api/parameters.md"
    - "CrawlResult": "api/crawl-result.md"
    - "Strategies": "api/strategies.md"
    - "C4A-Script Reference": "api/c4a-script-reference.md"

theme:
  name: 'terminal'
  palette: 'dark'
  custom_dir: docs/md_v2/overrides
  color_mode: 'dark'
  icon:
    repo: fontawesome/brands/github

plugins:
  - search

markdown_extensions:
  - pymdownx.highlight:
      anchor_linenums: true
  - pymdownx.inlinehilite
  - pymdownx.snippets
  - pymdownx.superfences
  - admonition
  - pymdownx.details
  - attr_list
  - tables

extra:
  version: !ENV [CRAWL4AI_VERSION, 'development']

extra_css:
  - assets/layout.css
  - assets/styles.css
  - assets/highlight.css
  - assets/dmvendor.css
  - assets/feedback-overrides.css

extra_javascript:
  - https://www.googletagmanager.com/gtag/js?id=G-58W0K2ZQ25
  - assets/gtag.js
  - assets/highlight.min.js
  - assets/highlight_init.js
  - https://buttons.github.io/buttons.js
  - assets/toc.js
  - assets/github_stats.js
  - assets/selection_ask_ai.js
  - assets/copy_code.js
  - assets/floating_ask_ai_button.js
  - assets/mobile_menu.js
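
The `extra.version` entry in this configuration uses the MkDocs `!ENV` tag with a list, where the last element acts as the fallback when the named variable is unset. A rough Python equivalent of that lookup is sketched below. `resolve_env_tag` is a hypothetical helper, not part of MkDocs, and it assumes the list has at least one variable name followed by a default. It also returns strings as-is, whereas the real tag additionally resolves YAML types.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_env_tag(
    names_and_default: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the value of the first variable that is set, else the trailing default."""
    env = os.environ if env is None else env
    *names, default = names_and_default
    for name in names:
        if name in env:
            return env[name]
    return default


# Mirrors: version: !ENV [CRAWL4AI_VERSION, 'development']
version = resolve_env_tag(["CRAWL4AI_VERSION", "development"])
```

With this pattern, a local docs build without `CRAWL4AI_VERSION` set would be labeled `development`, while a release build can export the variable before running MkDocs.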