feat: add Script Builder to Chrome Extension and reorganize LLM context files

This commit adds the Script Builder to the Chrome Extension, reorganizes the LLM context files, and updates the documentation:

  Chrome Extension - Script Builder (Alpha):
  - Add recording functionality to capture user interactions (clicks, typing, scrolling)
  - Implement smart event grouping for cleaner script generation
  - Support export to both JavaScript and C4A script formats
  - Add timeline view for visualizing and editing recorded actions
  - Include wait commands (time-based and element-based)
  - Add saved flows functionality for reusing automation scripts
  - Update UI with consistent dark terminal theme (Dank Mono font, green/pink accents)
  - Release new extension versions: v1.1.0, v1.2.0, v1.2.1

  LLM Context Builder Improvements:
  - Reorganize context files from llmtxt/ to llm.txt/ with better structure
  - Separate diagram templates from text content (diagrams/ and txt/ subdirectories)
  - Add comprehensive context files for all major Crawl4AI components
  - Improve file naming convention for better discoverability

  Documentation Updates:
  - Update apps index page to match main documentation theme
  - Standardize color scheme: "Available" tags use primary color (#50ffff)
  - Change "Coming Soon" tags to dark gray for better visual hierarchy
  - Add interactive two-column layout for extension landing page
  - Include code examples for both Schema Builder and Script Builder features

  Technical Improvements:
  - Enhance event capture mechanism with better element selection
  - Add support for contenteditable elements and complex form interactions
  - Implement proper scroll event handling for both window and element scrolling
  - Add meta key support for keyboard shortcuts
  - Improve selector generation for more reliable element targeting

  The Script Builder is released as Alpha: bugs are expected, and the release is intended
  to give early access to automation recording.
UncleCode
2025-06-08 22:02:12 +08:00
parent 926592649e
commit 40640badad
72 changed files with 28600 additions and 100986 deletions

@@ -0,0 +1,425 @@
## CLI Workflows and Profile Management
Visual representations of command-line interface operations, browser profile management, and identity-based crawling workflows.
### CLI Command Flow Architecture
```mermaid
flowchart TD
A[crwl command] --> B{Command Type?}
B -->|URL Crawling| C[Parse URL & Options]
B -->|Profile Management| D[profiles subcommand]
B -->|CDP Browser| E[cdp subcommand]
B -->|Browser Control| F[browser subcommand]
B -->|Configuration| G[config subcommand]
C --> C1{Output Format?}
C1 -->|Default| C2[HTML/Markdown]
C1 -->|JSON| C3[Structured Data]
C1 -->|markdown| C4[Clean Markdown]
C1 -->|markdown-fit| C5[Filtered Content]
C --> C6{Authentication?}
C6 -->|Profile Specified| C7[Load Browser Profile]
C6 -->|No Profile| C8[Anonymous Session]
C7 --> C9[Launch with User Data]
C8 --> C10[Launch Clean Browser]
C9 --> C11[Execute Crawl]
C10 --> C11
C11 --> C12{Success?}
C12 -->|Yes| C13[Return Results]
C12 -->|No| C14[Error Handling]
D --> D1[Interactive Profile Menu]
D1 --> D2{Menu Choice?}
D2 -->|Create| D3[Open Browser for Setup]
D2 -->|List| D4[Show Existing Profiles]
D2 -->|Delete| D5[Remove Profile]
D2 -->|Use| D6[Crawl with Profile]
E --> E1[Launch CDP Browser]
E1 --> E2[Remote Debugging Active]
F --> F1{Browser Action?}
F1 -->|start| F2[Start Builtin Browser]
F1 -->|stop| F3[Stop Builtin Browser]
F1 -->|status| F4[Check Browser Status]
F1 -->|view| F5[Open Browser Window]
G --> G1{Config Action?}
G1 -->|list| G2[Show All Settings]
G1 -->|set| G3[Update Setting]
G1 -->|get| G4[Read Setting]
style A fill:#e1f5fe
style C13 fill:#c8e6c9
style C14 fill:#ffcdd2
style D3 fill:#fff3e0
style E2 fill:#f3e5f5
```
### Profile Management Workflow
```mermaid
sequenceDiagram
participant User
participant CLI
participant ProfileManager
participant Browser
participant FileSystem
User->>CLI: crwl profiles
CLI->>ProfileManager: Initialize profile manager
ProfileManager->>FileSystem: Scan for existing profiles
FileSystem-->>ProfileManager: Profile list
ProfileManager-->>CLI: Show interactive menu
CLI-->>User: Display options
Note over User: User selects "Create new profile"
User->>CLI: Create profile "linkedin-auth"
CLI->>ProfileManager: create_profile("linkedin-auth")
ProfileManager->>FileSystem: Create profile directory
ProfileManager->>Browser: Launch with new user data dir
Browser-->>User: Opens browser window
Note over User: User manually logs in to LinkedIn
User->>Browser: Navigate and authenticate
Browser->>FileSystem: Save cookies, session data
User->>CLI: Press 'q' to save profile
CLI->>ProfileManager: finalize_profile()
ProfileManager->>FileSystem: Lock profile settings
ProfileManager-->>CLI: Profile saved
CLI-->>User: Profile "linkedin-auth" created
Note over User: Later usage
User->>CLI: crwl https://linkedin.com/feed -p linkedin-auth
CLI->>ProfileManager: load_profile("linkedin-auth")
ProfileManager->>FileSystem: Read profile data
FileSystem-->>ProfileManager: User data directory
ProfileManager-->>CLI: Profile configuration
CLI->>Browser: Launch with existing profile
Browser-->>CLI: Authenticated session ready
CLI->>Browser: Navigate to target URL
Browser-->>CLI: Crawl results with auth context
CLI-->>User: Authenticated content
```
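The "later usage" step in the sequence above comes down to a single command line. The following is a minimal Python sketch that assembles it, assuming only the `-p` (profile) and `-o` (output format) flags that appear elsewhere in this document. `build_crwl_command` is a hypothetical helper for illustration, not part of Crawl4AI.

```python
import shlex
from typing import List, Optional


def build_crwl_command(url: str,
                       profile: Optional[str] = None,
                       output: Optional[str] = None) -> List[str]:
    """Assemble an argv list for the crwl CLI (hypothetical helper)."""
    argv = ["crwl", url]
    if profile:
        # -p selects a saved browser profile, as in the diagram above.
        argv += ["-p", profile]
    if output:
        # -o selects the output format, e.g. "json".
        argv += ["-o", output]
    return argv


if __name__ == "__main__":
    cmd = build_crwl_command("https://linkedin.com/feed", profile="linkedin-auth")
    # shlex.join quotes each argument safely for display or logging.
    print(shlex.join(cmd))
```

Building the command as a list rather than a string means it can be passed to `subprocess.run` without shell quoting issues.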
### Browser Management State Machine
```mermaid
stateDiagram-v2
[*] --> Stopped: Initial state
Stopped --> Starting: crwl browser start
Starting --> Running: Browser launched
Running --> Viewing: crwl browser view
Viewing --> Running: Close window
Running --> Stopping: crwl browser stop
Stopping --> Stopped: Cleanup complete
Running --> Restarting: crwl browser restart
Restarting --> Running: New browser instance
Stopped --> CDP_Mode: crwl cdp
CDP_Mode --> CDP_Running: Remote debugging active
CDP_Running --> CDP_Mode: Manual close
CDP_Mode --> Stopped: Exit CDP
Running --> StatusCheck: crwl browser status
StatusCheck --> Running: Return status
note right of Running : Port 9222 active\nBuiltin browser available
note right of CDP_Running : Remote debugging\nManual control enabled
note right of Viewing : Visual browser window\nDirect interaction
```
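The state diagram above can also be read as a transition table. The sketch below encodes exactly the edges drawn in the diagram. It only models the states and does not start or stop any browser, and the names `TRANSITIONS`, `step`, and `run` are illustrative.

```python
from typing import Dict, Iterable, Tuple

# (current state, command or event) -> next state, copied from the diagram.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Stopped", "crwl browser start"): "Starting",
    ("Starting", "Browser launched"): "Running",
    ("Running", "crwl browser view"): "Viewing",
    ("Viewing", "Close window"): "Running",
    ("Running", "crwl browser stop"): "Stopping",
    ("Stopping", "Cleanup complete"): "Stopped",
    ("Running", "crwl browser restart"): "Restarting",
    ("Restarting", "New browser instance"): "Running",
    ("Stopped", "crwl cdp"): "CDP_Mode",
    ("CDP_Mode", "Remote debugging active"): "CDP_Running",
    ("CDP_Running", "Manual close"): "CDP_Mode",
    ("CDP_Mode", "Exit CDP"): "Stopped",
    ("Running", "crwl browser status"): "StatusCheck",
    ("StatusCheck", "Return status"): "Running",
}


def step(state: str, event: str) -> str:
    """Return the next state, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state!r}") from None


def run(events: Iterable[str], state: str = "Stopped") -> str:
    """Apply a sequence of events starting from the initial state."""
    for event in events:
        state = step(state, event)
    return state
```

For example, `run(["crwl browser start", "Browser launched"])` ends in `Running`, while `step("Stopped", "crwl browser stop")` raises `ValueError` because the diagram has no such edge.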
### Authentication Workflow for Protected Sites
```mermaid
flowchart TD
A[Protected Site Access Needed] --> B[Create Profile Strategy]
B --> C{Existing Profile?}
C -->|Yes| D[Test Profile Validity]
C -->|No| E[Create New Profile]
D --> D1{Profile Valid?}
D1 -->|Yes| F[Use Existing Profile]
D1 -->|No| E
E --> E1[crwl profiles]
E1 --> E2[Select Create New Profile]
E2 --> E3[Enter Profile Name]
E3 --> E4[Browser Opens for Auth]
E4 --> E5{Authentication Method?}
E5 -->|Login Form| E6[Fill Username/Password]
E5 -->|OAuth| E7[OAuth Flow]
E5 -->|2FA| E8[Handle 2FA]
E5 -->|Session Cookie| E9[Import Cookies]
E6 --> E10[Manual Login Process]
E7 --> E10
E8 --> E10
E9 --> E10
E10 --> E11[Verify Authentication]
E11 --> E12{Auth Successful?}
E12 -->|Yes| E13[Save Profile - Press q]
E12 -->|No| E10
E13 --> F
F --> G[Execute Authenticated Crawl]
G --> H[crwl URL -p profile-name]
H --> I[Load Profile Data]
I --> J[Launch Browser with Auth]
J --> K[Navigate to Protected Content]
K --> L[Extract Authenticated Data]
L --> M[Return Results]
style E4 fill:#fff3e0
style E10 fill:#e3f2fd
style F fill:#e8f5e8
style M fill:#c8e6c9
```
### CDP Browser Architecture
```mermaid
graph TB
subgraph "CLI Layer"
A[crwl cdp command] --> B[CDP Manager]
B --> C[Port Configuration]
B --> D[Profile Selection]
end
subgraph "Browser Process"
E[Chromium/Firefox] --> F[Remote Debugging]
F --> G[WebSocket Endpoint]
G --> H[ws://localhost:9222]
end
subgraph "Client Connections"
I[Manual Browser Control] --> H
J[DevTools Interface] --> H
K[External Automation] --> H
L[Crawl4AI Crawler] --> H
end
subgraph "Profile Data"
M[User Data Directory] --> E
N[Cookies & Sessions] --> M
O[Extensions] --> M
P[Browser State] --> M
end
A --> E
C --> H
D --> M
style H fill:#e3f2fd
style E fill:#f3e5f5
style M fill:#e8f5e8
```
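Clients in the diagram above connect through the WebSocket endpoint on port 9222. As a sketch, assuming a Chromium-based browser started with remote debugging enabled, the full WebSocket URL can be discovered from the browser's `/json/version` HTTP endpoint, which returns a JSON object containing `webSocketDebuggerUrl`. The function names below are illustrative and not part of Crawl4AI.

```python
import json
from urllib.request import urlopen


def version_url(host: str = "localhost", port: int = 9222) -> str:
    """URL of the DevTools discovery endpoint on the debugging port."""
    return f"http://{host}:{port}/json/version"


def extract_ws_endpoint(payload: str) -> str:
    """Pull the browser-level WebSocket URL out of a /json/version response."""
    info = json.loads(payload)
    return info["webSocketDebuggerUrl"]


def discover_ws_endpoint(host: str = "localhost", port: int = 9222,
                         timeout: float = 5.0) -> str:
    """Query a running browser; fails with URLError if nothing is listening."""
    with urlopen(version_url(host, port), timeout=timeout) as response:
        return extract_ws_endpoint(response.read().decode("utf-8"))
```

Keeping the parsing separate from the network call makes the parsing testable without a running browser.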
### Configuration Management Hierarchy
```mermaid
graph TD
subgraph "Global Configuration"
A[~/.crawl4ai/config.yml] --> B[Default Settings]
B --> C[LLM Providers]
B --> D[Browser Defaults]
B --> E[Output Preferences]
end
subgraph "Profile Configuration"
F[Profile Directory] --> G[Browser State]
F --> H[Authentication Data]
F --> I[Site-Specific Settings]
end
subgraph "Command-Line Overrides"
J[-b browser_config] --> K[Runtime Browser Settings]
L[-c crawler_config] --> M[Runtime Crawler Settings]
N[-o output_format] --> O[Runtime Output Format]
end
subgraph "Configuration Files"
P[browser.yml] --> Q[Browser Config Template]
R[crawler.yml] --> S[Crawler Config Template]
T[extract.yml] --> U[Extraction Config]
end
subgraph "Resolution Order"
V[Command Line Args] --> W[Config Files]
W --> X[Profile Settings]
X --> Y[Global Defaults]
end
J --> V
L --> V
N --> V
P --> W
R --> W
T --> W
F --> X
A --> Y
style V fill:#ffcdd2
style W fill:#fff3e0
style X fill:#e3f2fd
style Y fill:#e8f5e8
```
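Reading the "Resolution Order" subgraph above as highest precedence first (command-line arguments, then config files, then profile settings, then global defaults), the lookup can be sketched with `collections.ChainMap`. The setting names (`headless`, `output`, `viewport_width`) are hypothetical placeholders, and this is not how Crawl4AI necessarily implements resolution.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping


def resolve_settings(cli_args: Mapping[str, Any],
                     config_files: Mapping[str, Any],
                     profile_settings: Mapping[str, Any],
                     global_defaults: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge four layers; for each key, the earliest layer that defines it wins."""
    return dict(ChainMap(dict(cli_args), dict(config_files),
                         dict(profile_settings), dict(global_defaults)))


if __name__ == "__main__":
    resolved = resolve_settings(
        cli_args={"output": "json"},                       # e.g. from -o
        config_files={"headless": False, "output": "markdown"},
        profile_settings={"viewport_width": 1280},
        global_defaults={"headless": True, "output": "markdown",
                         "viewport_width": 1080},
    )
    print(resolved)
```

Here `output` comes from the command line, `headless` from the config file, and `viewport_width` from the profile. The global defaults fill in only keys that no other layer sets.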
### Identity-Based Crawling Decision Tree
```mermaid
flowchart TD
A[Target Website Assessment] --> B{Authentication Required?}
B -->|No| C[Standard Anonymous Crawl]
B -->|Yes| D{Authentication Type?}
D -->|Login Form| E[Create Login Profile]
D -->|OAuth/SSO| F[Create OAuth Profile]
D -->|API Key/Token| G[Use Headers/Config]
D -->|Session Cookies| H[Import Cookie Profile]
E --> E1[crwl profiles → Manual login]
F --> F1[crwl profiles → OAuth flow]
G --> G1[Configure headers in crawler config]
H --> H1[Import cookies to profile]
E1 --> I[Test Authentication]
F1 --> I
G1 --> I
H1 --> I
I --> J{Auth Test Success?}
J -->|Yes| K[Production Crawl Setup]
J -->|No| L[Debug Authentication]
L --> L1{Common Issues?}
L1 -->|Rate Limiting| L2[Add delays/user simulation]
L1 -->|Bot Detection| L3[Enable stealth mode]
L1 -->|Session Expired| L4[Refresh authentication]
L1 -->|CAPTCHA| L5[Manual intervention needed]
L2 --> M[Retry with Adjustments]
L3 --> M
L4 --> E1
L5 --> N[Semi-automated approach]
M --> I
N --> O[Manual auth + automated crawl]
K --> P[Automated Authenticated Crawling]
O --> P
C --> P
P --> Q[Monitor & Maintain Profiles]
style I fill:#fff3e0
style K fill:#e8f5e8
style P fill:#c8e6c9
style L fill:#ffcdd2
style N fill:#f3e5f5
```
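The first two branches of the decision tree above can be expressed as lookups. In this sketch the returned labels are copied from the diagram, while the dictionary keys (`login_form`, `oauth`, and so on) are hypothetical identifiers chosen for illustration.

```python
from typing import Optional

# Authentication type -> setup step, per the decision tree.
SETUP_STEPS = {
    "login_form": "crwl profiles, then manual login",
    "oauth": "crwl profiles, then OAuth flow",
    "api_key": "Configure headers in crawler config",
    "session_cookies": "Import cookies to profile",
}

# Common failure -> remediation, per the "Debug Authentication" branch.
DEBUG_STEPS = {
    "rate_limiting": "Add delays/user simulation",
    "bot_detection": "Enable stealth mode",
    "session_expired": "Refresh authentication",
    "captcha": "Manual intervention needed",
}


def plan_setup(auth_type: Optional[str]) -> str:
    """Pick a setup step; None means the site needs no authentication."""
    if auth_type is None:
        return "Standard anonymous crawl"
    if auth_type not in SETUP_STEPS:
        raise ValueError(f"unknown authentication type: {auth_type!r}")
    return SETUP_STEPS[auth_type]


def plan_debug(issue: str) -> str:
    """Pick a remediation for a failed authentication test."""
    if issue not in DEBUG_STEPS:
        raise ValueError(f"unknown issue: {issue!r}")
    return DEBUG_STEPS[issue]
```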
### CLI Usage Patterns and Best Practices
```mermaid
timeline
title CLI Workflow Evolution
section Setup Phase
Installation : pip install crawl4ai
: crawl4ai-setup
Basic Test : crwl https://example.com
Config Setup : crwl config set defaults
section Profile Creation
Site Analysis : Identify auth requirements
Profile Creation : crwl profiles
Manual Login : Authenticate in browser
Profile Save : Press 'q' to save
section Development Phase
Test Crawls : crwl URL -p profile -v
Config Tuning : Adjust browser/crawler settings
Output Testing : Try different output formats
Error Handling : Debug authentication issues
section Production Phase
Automated Crawls : crwl URL -p profile -o json
Batch Processing : Multiple URLs with same profile
Monitoring : Check profile validity
Maintenance : Update profiles as needed
```
### Multi-Profile Management Strategy
```mermaid
graph LR
subgraph "Profile Categories"
A[Social Media Profiles]
B[Work/Enterprise Profiles]
C[E-commerce Profiles]
D[Research Profiles]
end
subgraph "Social Media"
A --> A1[linkedin-personal]
A --> A2[twitter-monitor]
A --> A3[facebook-research]
A --> A4[instagram-brand]
end
subgraph "Enterprise"
B --> B1[company-intranet]
B --> B2[github-enterprise]
B --> B3[confluence-docs]
B --> B4[jira-tickets]
end
subgraph "E-commerce"
C --> C1[amazon-seller]
C --> C2[shopify-admin]
C --> C3[ebay-monitor]
C --> C4[marketplace-competitor]
end
subgraph "Research"
D --> D1[academic-journals]
D --> D2[data-platforms]
D --> D3[survey-tools]
D --> D4[government-portals]
end
subgraph "Usage Patterns"
E[Daily Monitoring] --> A2
E --> B1
F[Weekly Reports] --> C3
F --> D2
G[On-Demand Research] --> D1
G --> D4
H[Competitive Analysis] --> C4
H --> A4
end
style A1 fill:#e3f2fd
style B1 fill:#f3e5f5
style C1 fill:#e8f5e8
style D1 fill:#fff3e0
```
**📖 Learn more:** [CLI Reference](https://docs.crawl4ai.com/core/cli/), [Identity-Based Crawling](https://docs.crawl4ai.com/advanced/identity-based-crawling/), [Profile Management](https://docs.crawl4ai.com/advanced/session-management/), [Authentication Strategies](https://docs.crawl4ai.com/advanced/hooks-auth/)