## CLI Workflows and Profile Management Visual representations of command-line interface operations, browser profile management, and identity-based crawling workflows. ### CLI Command Flow Architecture ```mermaid flowchart TD A[crwl command] --> B{Command Type?} B -->|URL Crawling| C[Parse URL & Options] B -->|Profile Management| D[profiles subcommand] B -->|CDP Browser| E[cdp subcommand] B -->|Browser Control| F[browser subcommand] B -->|Configuration| G[config subcommand] C --> C1{Output Format?} C1 -->|Default| C2[HTML/Markdown] C1 -->|JSON| C3[Structured Data] C1 -->|markdown| C4[Clean Markdown] C1 -->|markdown-fit| C5[Filtered Content] C --> C6{Authentication?} C6 -->|Profile Specified| C7[Load Browser Profile] C6 -->|No Profile| C8[Anonymous Session] C7 --> C9[Launch with User Data] C8 --> C10[Launch Clean Browser] C9 --> C11[Execute Crawl] C10 --> C11 C11 --> C12{Success?} C12 -->|Yes| C13[Return Results] C12 -->|No| C14[Error Handling] D --> D1[Interactive Profile Menu] D1 --> D2{Menu Choice?} D2 -->|Create| D3[Open Browser for Setup] D2 -->|List| D4[Show Existing Profiles] D2 -->|Delete| D5[Remove Profile] D2 -->|Use| D6[Crawl with Profile] E --> E1[Launch CDP Browser] E1 --> E2[Remote Debugging Active] F --> F1{Browser Action?} F1 -->|start| F2[Start Builtin Browser] F1 -->|stop| F3[Stop Builtin Browser] F1 -->|status| F4[Check Browser Status] F1 -->|view| F5[Open Browser Window] G --> G1{Config Action?} G1 -->|list| G2[Show All Settings] G1 -->|set| G3[Update Setting] G1 -->|get| G4[Read Setting] style A fill:#e1f5fe style C13 fill:#c8e6c9 style C14 fill:#ffcdd2 style D3 fill:#fff3e0 style E2 fill:#f3e5f5 ``` ### Profile Management Workflow ```mermaid sequenceDiagram participant User participant CLI participant ProfileManager participant Browser participant FileSystem User->>CLI: crwl profiles CLI->>ProfileManager: Initialize profile manager ProfileManager->>FileSystem: Scan for existing profiles FileSystem-->>ProfileManager: Profile list ProfileManager-->>CLI: Show interactive menu CLI-->>User: Display options Note over User: User selects "Create new profile" User->>CLI: Create profile "linkedin-auth" CLI->>ProfileManager: create_profile("linkedin-auth") ProfileManager->>FileSystem: Create profile directory ProfileManager->>Browser: Launch with new user data dir Browser-->>User: Opens browser window Note over User: User manually logs in to LinkedIn User->>Browser: Navigate and authenticate Browser->>FileSystem: Save cookies, session data User->>CLI: Press 'q' to save profile CLI->>ProfileManager: finalize_profile() ProfileManager->>FileSystem: Lock profile settings ProfileManager-->>CLI: Profile saved CLI-->>User: Profile "linkedin-auth" created Note over User: Later usage User->>CLI: crwl https://linkedin.com/feed -p linkedin-auth CLI->>ProfileManager: load_profile("linkedin-auth") ProfileManager->>FileSystem: Read profile data FileSystem-->>ProfileManager: User data directory ProfileManager-->>CLI: Profile configuration CLI->>Browser: Launch with existing profile Browser-->>CLI: Authenticated session ready CLI->>Browser: Navigate to target URL Browser-->>CLI: Crawl results with auth context CLI-->>User: Authenticated content ``` ### Browser Management State Machine ```mermaid stateDiagram-v2 [*] --> Stopped: Initial state Stopped --> Starting: crwl browser start Starting --> Running: Browser launched Running --> Viewing: crwl browser view Viewing --> Running: Close window Running --> Stopping: crwl browser stop Stopping --> Stopped: Cleanup complete Running --> Restarting: crwl browser restart Restarting --> Running: New browser instance Stopped --> CDP_Mode: crwl cdp CDP_Mode --> CDP_Running: Remote debugging active CDP_Running --> CDP_Mode: Manual close CDP_Mode --> Stopped: Exit CDP Running --> StatusCheck: crwl browser status StatusCheck --> Running: Return status note right of Running : Port 9222 active\nBuiltin browser available note right of CDP_Running : Remote debugging\nManual control enabled note right of Viewing : Visual browser window\nDirect interaction ``` ### Authentication Workflow for Protected Sites ```mermaid flowchart TD A[Protected Site Access Needed] --> B[Create Profile Strategy] B --> C{Existing Profile?} C -->|Yes| D[Test Profile Validity] C -->|No| E[Create New Profile] D --> D1{Profile Valid?} D1 -->|Yes| F[Use Existing Profile] D1 -->|No| E E --> E1[crwl profiles] E1 --> E2[Select Create New Profile] E2 --> E3[Enter Profile Name] E3 --> E4[Browser Opens for Auth] E4 --> E5{Authentication Method?} E5 -->|Login Form| E6[Fill Username/Password] E5 -->|OAuth| E7[OAuth Flow] E5 -->|2FA| E8[Handle 2FA] E5 -->|Session Cookie| E9[Import Cookies] E6 --> E10[Manual Login Process] E7 --> E10 E8 --> E10 E9 --> E10 E10 --> E11[Verify Authentication] E11 --> E12{Auth Successful?} E12 -->|Yes| E13[Save Profile - Press q] E12 -->|No| E10 E13 --> F F --> G[Execute Authenticated Crawl] G --> H[crwl URL -p profile-name] H --> I[Load Profile Data] I --> J[Launch Browser with Auth] J --> K[Navigate to Protected Content] K --> L[Extract Authenticated Data] L --> M[Return Results] style E4 fill:#fff3e0 style E10 fill:#e3f2fd style F fill:#e8f5e8 style M fill:#c8e6c9 ``` ### CDP Browser Architecture ```mermaid graph TB subgraph "CLI Layer" A[crwl cdp command] --> B[CDP Manager] B --> C[Port Configuration] B --> D[Profile Selection] end subgraph "Browser Process" E[Chromium/Firefox] --> F[Remote Debugging] F --> G[WebSocket Endpoint] G --> H[ws://localhost:9222] end subgraph "Client Connections" I[Manual Browser Control] --> H J[DevTools Interface] --> H K[External Automation] --> H L[Crawl4AI Crawler] --> H end subgraph "Profile Data" M[User Data Directory] --> E N[Cookies & Sessions] --> M O[Extensions] --> M P[Browser State] --> M end A --> E C --> H D --> M style H fill:#e3f2fd style E fill:#f3e5f5 style M fill:#e8f5e8 ``` ### Configuration Management Hierarchy ```mermaid graph TD subgraph "Global Configuration" A[~/.crawl4ai/config.yml] --> B[Default Settings] B --> C[LLM Providers] B --> D[Browser Defaults] B --> E[Output Preferences] end subgraph "Profile Configuration" F[Profile Directory] --> G[Browser State] F --> H[Authentication Data] F --> I[Site-Specific Settings] end subgraph "Command-Line Overrides" J[-b browser_config] --> K[Runtime Browser Settings] L[-c crawler_config] --> M[Runtime Crawler Settings] N[-o output_format] --> O[Runtime Output Format] end subgraph "Configuration Files" P[browser.yml] --> Q[Browser Config Template] R[crawler.yml] --> S[Crawler Config Template] T[extract.yml] --> U[Extraction Config] end subgraph "Resolution Order" V[Command Line Args] --> W[Config Files] W --> X[Profile Settings] X --> Y[Global Defaults] end J --> V L --> V N --> V P --> W R --> W T --> W F --> X A --> Y style V fill:#ffcdd2 style W fill:#fff3e0 style X fill:#e3f2fd style Y fill:#e8f5e8 ``` ### Identity-Based Crawling Decision Tree ```mermaid flowchart TD A[Target Website Assessment] --> B{Authentication Required?} B -->|No| C[Standard Anonymous Crawl] B -->|Yes| D{Authentication Type?} D -->|Login Form| E[Create Login Profile] D -->|OAuth/SSO| F[Create OAuth Profile] D -->|API Key/Token| G[Use Headers/Config] D -->|Session Cookies| H[Import Cookie Profile] E --> E1[crwl profiles → Manual login] F --> F1[crwl profiles → OAuth flow] G --> G1[Configure headers in crawler config] H --> H1[Import cookies to profile] E1 --> I[Test Authentication] F1 --> I G1 --> I H1 --> I I --> J{Auth Test Success?} J -->|Yes| K[Production Crawl Setup] J -->|No| L[Debug Authentication] L --> L1{Common Issues?} L1 -->|Rate Limiting| L2[Add delays/user simulation] L1 -->|Bot Detection| L3[Enable stealth mode] L1 -->|Session Expired| L4[Refresh authentication] L1 -->|CAPTCHA| L5[Manual intervention needed] L2 --> M[Retry with Adjustments] L3 --> M L4 --> E1 L5 --> N[Semi-automated approach] M --> I N --> O[Manual auth + automated crawl] K --> P[Automated Authenticated Crawling] O --> P C --> P P --> Q[Monitor & Maintain Profiles] style I fill:#fff3e0 style K fill:#e8f5e8 style P fill:#c8e6c9 style L fill:#ffcdd2 style N fill:#f3e5f5 ``` ### CLI Usage Patterns and Best Practices ```mermaid timeline title CLI Workflow Evolution section Setup Phase Installation : pip install crawl4ai : crawl4ai-setup Basic Test : crwl https://example.com Config Setup : crwl config set defaults section Profile Creation Site Analysis : Identify auth requirements Profile Creation : crwl profiles Manual Login : Authenticate in browser Profile Save : Press 'q' to save section Development Phase Test Crawls : crwl URL -p profile -v Config Tuning : Adjust browser/crawler settings Output Testing : Try different output formats Error Handling : Debug authentication issues section Production Phase Automated Crawls : crwl URL -p profile -o json Batch Processing : Multiple URLs with same profile Monitoring : Check profile validity Maintenance : Update profiles as needed ``` ### Multi-Profile Management Strategy ```mermaid graph LR subgraph "Profile Categories" A[Social Media Profiles] B[Work/Enterprise Profiles] C[E-commerce Profiles] D[Research Profiles] end subgraph "Social Media" A --> A1[linkedin-personal] A --> A2[twitter-monitor] A --> A3[facebook-research] A --> A4[instagram-brand] end subgraph "Enterprise" B --> B1[company-intranet] B --> B2[github-enterprise] B --> B3[confluence-docs] B --> B4[jira-tickets] end subgraph "E-commerce" C --> C1[amazon-seller] C --> C2[shopify-admin] C --> C3[ebay-monitor] C --> C4[marketplace-competitor] end subgraph "Research" D --> D1[academic-journals] D --> D2[data-platforms] D --> D3[survey-tools] D --> D4[government-portals] end subgraph "Usage Patterns" E[Daily Monitoring] --> A2 E --> B1 F[Weekly Reports] --> C3 F --> D2 G[On-Demand Research] --> D1 G --> D4 H[Competitive Analysis] --> C4 H --> A4 end style A1 fill:#e3f2fd style B1 fill:#f3e5f5 style C1 fill:#e8f5e8 style D1 fill:#fff3e0 ``` **📖 Learn more:** [CLI Reference](https://docs.crawl4ai.com/core/cli/), [Identity-Based Crawling](https://docs.crawl4ai.com/advanced/identity-based-crawling/), [Profile Management](https://docs.crawl4ai.com/advanced/session-management/), [Authentication Strategies](https://docs.crawl4ai.com/advanced/hooks-auth/)