browser_creation: Create a standard browser instance with default configuration | browser initialization, basic setup, minimal config | AsyncWebCrawler(browser_config=BrowserConfig(browser_type="chromium", headless=True))

persistent_context: Use persistent browser contexts to maintain session data and cookies | user_data_dir, session storage, login state | BrowserConfig(user_data_dir="/path/to/user/data")

managed_browser: High-level browser management with resource optimization and debugging | browser process, stealth mode, debugging tools | BrowserConfig(headless=False, debug_port=9222)

context_config: Configure the browser context with custom headers and cookies | headers customization, session reuse | BrowserConfig(headers={"User-Agent": "Crawl4AI/1.0"})

page_creation: Create and customize browser pages with viewport settings | viewport size, iframe handling, lazy loading | BrowserConfig(viewport_width=1920, viewport_height=1080)

identity_preservation: Maintain an authentic digital identity using Managed Browsers | user profiles, CAPTCHA bypass, persistent login | BrowserConfig(use_managed_browser=True, user_data_dir="/path/to/profile")

magic_mode: Enable automated user-like behavior and detection bypass | automation masking, cookie handling | crawler.arun(url="https://example.com", magic=True)

session_management: Maintain state across multiple requests using session IDs | session reuse, sequential crawling | CrawlerRunConfig(session_id="my_session")

dynamic_content: Handle JavaScript-rendered content with custom execution hooks | content loading, pagination | js_code="document.querySelector('a.pagination-next').click()"

best_practices: Follow recommended patterns for efficient crawling | resource management, error handling | crawler.crawler_strategy.kill_session(session_id)
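
A minimal end-to-end sketch of the browser_creation entry above: it wires the `BrowserConfig` shown there into `AsyncWebCrawler` and prints the extracted markdown. The target URL is a placeholder; the `result.markdown` attribute is assumed from the library's crawl result object.

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig

async def main():
    # Standard browser instance, as in the browser_creation entry
    browser_config = BrowserConfig(browser_type="chromium", headless=True)
    async with AsyncWebCrawler(browser_config=browser_config) as crawler:
        result = await crawler.arun(url="https://example.com")
        print(result.markdown[:300])  # preview of the extracted markdown

asyncio.run(main())
```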
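The session_management, dynamic_content, and best_practices entries compose naturally: reuse one session_id across sequential `arun` calls, drive pagination with js_code, then kill the session when finished. A sketch under the assumptions that `CrawlerRunConfig` accepts `session_id`, `js_code`, and a `js_only` flag, and that `kill_session` is awaitable; URLs and the CSS selector are placeholders.

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def crawl_paginated():
    async with AsyncWebCrawler(browser_config=BrowserConfig(headless=True)) as crawler:
        session_id = "my_session"

        # First request creates the session; cookies and page state persist
        result = await crawler.arun(
            url="https://example.com/list",
            config=CrawlerRunConfig(session_id=session_id),
        )

        # Same session: click through to the next page instead of re-navigating
        result = await crawler.arun(
            url="https://example.com/list",
            config=CrawlerRunConfig(
                session_id=session_id,
                js_code="document.querySelector('a.pagination-next').click()",
                js_only=True,  # assumption: run JS in the live page, no fresh navigation
            ),
        )

        # best_practices: release the session's page and context when done
        await crawler.crawler_strategy.kill_session(session_id)

asyncio.run(crawl_paginated())
```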
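For the identity_preservation and magic_mode entries, a managed browser with a persistent profile can be combined with `magic=True`; a sketch assuming the parameter names shown in the entries above, with a placeholder profile path and URL.

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig

async def crawl_with_identity():
    # Managed browser reusing a real profile, so logins and cookies survive runs;
    # headless=False allows a one-time manual login or CAPTCHA solve
    browser_config = BrowserConfig(
        use_managed_browser=True,
        user_data_dir="/path/to/profile",  # placeholder profile directory
        headless=False,
    )
    async with AsyncWebCrawler(browser_config=browser_config) as crawler:
        # magic=True masks common automation fingerprints, per the magic_mode entry
        result = await crawler.arun(url="https://example.com/account", magic=True)
        print(result.success)

asyncio.run(crawl_with_identity())
```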