Major changes:
- Add browser takeover feature using CDP for authentic browsing
- Implement Docker support with full API server documentation
- Enhance Markdown with tag preservation system
- Improve parallel crawling performance

This release focuses on authenticity and scalability, introducing the ability to use users' own browsers while providing containerized deployment options. Breaking changes include modified browser handling and API response structure. See CHANGELOG.md for a detailed migration guide.
85 lines
3.5 KiB
YAML
site_name: Crawl4AI Documentation
site_description: 🔥🕷️ Crawl4AI, Open-source LLM Friendly Web Crawler & Scraper
site_url: https://docs.crawl4ai.com
repo_url: https://github.com/unclecode/crawl4ai
repo_name: unclecode/crawl4ai
docs_dir: docs/md_v2

nav:
  - Home: 'index.md'
  - 'Installation': 'basic/installation.md'
  - 'Docker Deployment': 'basic/docker-deploymeny.md'
  - 'Quick Start': 'basic/quickstart.md'

  - Basic:
    - 'Simple Crawling': 'basic/simple-crawling.md'
    - 'Output Formats': 'basic/output-formats.md'
    - 'Browser Configuration': 'basic/browser-config.md'
    - 'Page Interaction': 'basic/page-interaction.md'
    - 'Content Selection': 'basic/content-selection.md'

  - Advanced:
    - 'Content Processing': 'advanced/content-processing.md'
    - 'Magic Mode': 'advanced/magic-mode.md'
    - 'Hooks & Auth': 'advanced/hooks-auth.md'
    - 'Proxy & Security': 'advanced/proxy-security.md'
    - 'Session Management': 'advanced/session-management.md'
    - 'Session Management (Advanced)': 'advanced/session-management-advanced.md'

  - Extraction:
    - 'Overview': 'extraction/overview.md'
    - 'LLM Strategy': 'extraction/llm.md'
    - 'Json-CSS Extractor Basic': 'extraction/css.md'
    - 'Json-CSS Extractor Advanced': 'extraction/css-advanced.md'
    - 'Cosine Strategy': 'extraction/cosine.md'
    - 'Chunking': 'extraction/chunking.md'

  - API Reference:
    - 'Parameters Table': 'api/parameters.md'
    - 'AsyncWebCrawler': 'api/async-webcrawler.md'
    - 'AsyncWebCrawler.arun()': 'api/arun.md'
    - 'CrawlResult': 'api/crawl-result.md'
    - 'Strategies': 'api/strategies.md'

  - Tutorial:
    - '1. Getting Started': 'tutorial/episode_01_Introduction_to_Crawl4AI_and_Basic_Installation.md'
    - '2. Advanced Features': 'tutorial/episode_02_Overview_of_Advanced_Features.md'
    - '3. Browser Setup': 'tutorial/episode_03_Browser_Configurations_&_Headless_Crawling.md'
    - '4. Proxy Settings': 'tutorial/episode_04_Advanced_Proxy_and_Security_Settings.md'
    - '5. Dynamic Content': 'tutorial/episode_05_JavaScript_Execution_and_Dynamic_Content_Handling.md'
    - '6. Magic Mode': 'tutorial/episode_06_Magic_Mode_and_Anti-Bot_Protection.md'
    - '7. Content Cleaning': 'tutorial/episode_07_Content_Cleaning_and_Fit_Markdown.md'
    - '8. Media Handling': 'tutorial/episode_08_Media_Handling:_Images,_Videos,_and_Audio.md'
    - '9. Link Analysis': 'tutorial/episode_09_Link_Analysis_and_Smart_Filtering.md'
    - '10. User Simulation': 'tutorial/episode_10_Custom_Headers,_Identity,_and_User_Simulation.md'
    - '11.1. JSON CSS': 'tutorial/episode_11_1_Extraction_Strategies:_JSON_CSS.md'
    - '11.2. LLM Strategy': 'tutorial/episode_11_2_Extraction_Strategies:_LLM.md'
    - '11.3. Cosine Strategy': 'tutorial/episode_11_3_Extraction_Strategies:_Cosine.md'
    - '12. Session Crawling': 'tutorial/episode_12_Session-Based_Crawling_for_Dynamic_Websites.md'
    - '13. Text Chunking': 'tutorial/episode_13_Chunking_Strategies_for_Large_Text_Processing.md'
    - '14. Custom Workflows': 'tutorial/episode_14_Hooks_and_Custom_Workflow_with_AsyncWebCrawler.md'

theme:
  name: terminal
  palette: dark

markdown_extensions:
  - pymdownx.highlight:
      anchor_linenums: true
  - pymdownx.inlinehilite
  - pymdownx.snippets
  - pymdownx.superfences
  - admonition
  - pymdownx.details
  - attr_list
  - tables

extra_css:
  - assets/styles.css
  - assets/highlight.css
  - assets/dmvendor.css

extra_javascript:
  - assets/highlight.min.js
  - assets/highlight_init.js