feat: Add 57 skills from vibeship-spawner-skills

Ported 3 categories from Spawner Skills (Apache 2.0):

- AI Agents (21 skills): langfuse, langgraph, crewai, rag-engineer, etc.
- Integrations (25 skills): stripe, firebase, vercel, supabase, etc.
- Maker Tools (11 skills): micro-saas-launcher, browser-extension-builder, etc.

All skills converted from 4-file YAML to SKILL.md format.

Source: https://github.com/vibeforge1111/vibeship-spawner-skills
2026-01-19 12:18:43 +01:00
parent 6dcb7973ad
commit b5675d55ce
57 changed files with 7717 additions and 681 deletions
--- a/skills/computer-use-agents/SKILL.md
+++ b/skills/computer-use-agents/SKILL.md
@@ -0,0 +1,315 @@
+---
+name: computer-use-agents
+description: "Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation."
+source: vibeship-spawner-skills (Apache 2.0)
+---
+
+# Computer Use Agents
+
+## Patterns
+
+### Perception-Reasoning-Action Loop
+
+The fundamental architecture of computer use agents: observe screen,
+reason about next action, execute action, repeat. This loop integrates
+vision models with action execution through an iterative pipeline.
+
+Key components:
+1. PERCEPTION: Screenshot captures current screen state
+2. REASONING: Vision-language model analyzes and plans
+3. ACTION: Execute mouse/keyboard operations
+4. FEEDBACK: Observe result, continue or correct
+
+Critical insight: vision agents are completely still during the "thinking"
+phase (1-5 seconds), which creates a detectable pause pattern.
+
+
+**When to use**:
+- Building any computer use agent from scratch
+- Integrating vision models with desktop control
+- Understanding agent behavior patterns
+
+```python
+from anthropic import Anthropic
+from PIL import Image
+import base64
+import pyautogui
+import time
+
+class ComputerUseAgent:
+    """
+    Perception-Reasoning-Action loop implementation.
+    Based on Anthropic Computer Use patterns.
+    """
+
+    def __init__(self, client: Anthropic, model: str = "claude-sonnet-4-20250514"):
+        self.client = client
+        self.model = model
+        self.max_steps = 50  # Prevent runaway loops
+        self.action_delay = 0.5  # Seconds between actions
+
+    def capture_screenshot(self) -> str:
+        """Capture screen and return base64 encoded image."""
+        screenshot = pyautogui.screenshot()
+        # Resize for token efficiency (1280x800 is good balance)
+        screenshot = screenshot.resize((1280, 800), Image.LANCZOS)
+
+        import io
+        buffer = io.BytesIO()
+        screenshot.save(buffer, format="PNG")
+        return base64.b64encode(buffer.getvalue()).decode()
+
+    def execute_action(self, action: dict) -> dict:
+        """Execute mouse/keyboard action on the computer."""
+        action_type = action.get("type")
+
+        if action_type == "click":
+            x, y = action["x"], action["y"]
+            button = action.get("button", "left")
+            pyautogui.click(x, y, button=button)
+            return {"success": True, "action": f"clicked at ({x}, {y})"}
+
+        elif action_type == "type":
+            text = action["text"]
+            pyautogui.typewrite(text, interval=0.02)
+            return {"success": True, "action": f"typed {len(text)} chars"}
+
+        elif action_type == "key":
+            key = action["key"]
+            pyautogui.press(key)
+            return {"success": True, "action": f"pressed {key}"}
+
+        elif action_type == "scroll":
+            direction = action.get("direction", "down")
+            amount = action.get("amount", 3)
+            scroll = -amount if direction == "down" else amount
+            pyautogui.scroll(scroll)
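+            return {"success": True, "action": f"scrolled {direction} by {amount}"}
+
+        # Unknown action types are reported to the caller rather than raised
+        return {"success": False, "error": f"unknown action type: {action_type}"}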
+```
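+
+The listing above defines the perception and action helpers but does not show the loop that ties them together. A minimal sketch of that loop follows, assuming the three stages are passed in as plain callables. The names `run_agent_loop` and `LoopResult`, and the convention that `reason` returns `None` when the task is done, are illustrative assumptions rather than part of any library.
+
+```python
+from dataclasses import dataclass, field
+from typing import Callable, Optional
+
+
+@dataclass
+class LoopResult:
+    """Outcome of one run (hypothetical helper type)."""
+    completed: bool
+    steps: int
+    history: list = field(default_factory=list)
+
+
+def run_agent_loop(
+    perceive: Callable[[], str],
+    reason: Callable[[str, list], Optional[dict]],
+    act: Callable[[dict], dict],
+    max_steps: int = 50,
+) -> LoopResult:
+    """Observe, plan, act, and record feedback until done or out of steps.
+
+    Assumption: `reason` returns the next action dict, or None when the
+    task is complete.
+    """
+    history: list = []
+    for step in range(max_steps):
+        observation = perceive()                # PERCEPTION: e.g. a screenshot
+        action = reason(observation, history)   # REASONING: model picks an action
+        if action is None:
+            return LoopResult(True, step, history)
+        result = act(action)                    # ACTION: mouse/keyboard operation
+        history.append({"action": action, "result": result})  # FEEDBACK
+    # The step cap stops runaway loops
+    return LoopResult(False, max_steps, history)
+```
+
+With the class above, `perceive` and `act` could be `agent.capture_screenshot` and `agent.execute_action`, while `reason` would wrap the model call. Keeping the stages injectable also lets the loop be tested with fakes before it touches a real desktop.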
+
+### Sandboxed Environment Pattern
+
+Computer use agents MUST run in isolated, sandboxed environments.
+Never give agents direct access to your main system - the security
+risks are too high. Use Docker containers with virtual desktops.
+
+Key isolation requirements:
+1. NETWORK: Restrict to necessary endpoints only
+2. FILESYSTEM: Read-only or scoped to temp directories
+3. CREDENTIALS: No access to host credentials
+4. SYSCALLS: Filter dangerous system calls
+5. RESOURCES: Limit CPU, memory, time
+
+The goal is "blast radius minimization" - if the agent goes wrong,
+damage is contained to the sandbox.
+
+
+**When to use**:
+- Deploying any computer use agent
+- Testing agent behavior safely
+- Running untrusted automation tasks
+
+```python
+# Dockerfile for sandboxed computer use environment
+# Based on Anthropic's reference implementation pattern
+
+FROM ubuntu:22.04
+
+# Install desktop environment
+RUN apt-get update && apt-get install -y \
+    xvfb \
+    x11vnc \
+    fluxbox \
+    xterm \
+    firefox \
+    python3 \
+    python3-pip \
+    supervisor
+
+# Security: Create non-root user
+RUN useradd -m -s /bin/bash agent && \
+    mkdir -p /home/agent/.vnc
+
+# Install Python dependencies
+COPY requirements.txt /tmp/
+RUN pip3 install -r /tmp/requirements.txt
+
+# Security: Drop capabilities
+RUN apt-get install -y --no-install-recommends libcap2-bin && \
+    setcap -r /usr/bin/python3 || true
+
+# Copy agent code
+COPY --chown=agent:agent . /app
+WORKDIR /app
+
+# Supervisor config for virtual display + VNC
+COPY supervisord.conf /etc/supervisor/conf.d/
+
+# Expose VNC port only (not desktop directly)
+EXPOSE 5900
+
+# Run as non-root
+USER agent
+
+CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
+
+---
+
+# docker-compose.yml with security constraints
+version: '3.8'
+
+services:
+  computer-use-agent:
+    build: .
+    ports:
+      - "5900:5900"  # VNC for observation
+      - "8080:8080"  # API for control
+
+    # Security constraints
+    security_opt:
+      - no-new-privileges:true
+      - seccomp:seccomp-profile.json
+
+    # Resource limits
+    deploy:
+      resources:
+        limits:
+          cpus: '2'
+          memory: 4G
+        reservations:
+          cpus: '0.5'
+          memory: 1G
+
+    # Network isolation
+    networks:
+      - agent-network
+
+    # No access to host filesystem
+    volumes:
+      - agent-tmp:/tmp
+
+    # Read-only root filesystem
+    read_only: true
+    tmpfs:
+      - /run
+      - /var/run
+
+    # Environment
+    environment:
+      - DISPLAY=:99
+      - NO_PROXY=localhost
+
+networks:
+  agent-network:
+    driver: bridge
+    internal: true  # No internet by default
+
+volumes:
+  agent-tmp:
+
+---
+
+# Python wrapper with additional runtime sandboxing
+import subprocess
+import os
+from dataclasses im
+```
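+
+The Python wrapper in the listing above is cut off, so its contents are unknown. As a separate, minimal sketch (not a reconstruction of it), the isolation requirements can also be applied when launching the container from Python by assembling `docker run` flags. `SandboxLimits`, `build_docker_run_args`, the image name, and the default values are assumptions chosen for illustration; the flags themselves are standard Docker CLI options.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class SandboxLimits:
+    """Hypothetical settings object; CPU/memory defaults mirror the compose file."""
+    cpus: str = "2"
+    memory: str = "4g"
+    pids: int = 256            # assumed cap on process count
+    allow_network: bool = False
+
+
+def build_docker_run_args(image: str, limits: SandboxLimits = SandboxLimits()) -> list:
+    """Build a `docker run` argument list with blast-radius limits applied."""
+    args = [
+        "docker", "run", "--rm",
+        "--read-only",                          # FILESYSTEM: read-only root
+        "--tmpfs", "/tmp",                      # scratch space only
+        "--cap-drop", "ALL",                    # drop Linux capabilities
+        "--security-opt", "no-new-privileges",  # block privilege escalation
+        "--cpus", limits.cpus,                  # RESOURCES
+        "--memory", limits.memory,
+        "--pids-limit", str(limits.pids),
+    ]
+    if not limits.allow_network:
+        args += ["--network", "none"]           # NETWORK: no connectivity at all
+    args.append(image)
+    return args
+```
+
+The result can be handed to `subprocess.run(args, timeout=...)` so that a wall-clock limit is enforced as well. Returning a list instead of a shell string avoids quoting problems and keeps the flags easy to assert on in tests.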
+
+### Anthropic Computer Use Implementation
+
+Official implementation pattern using Claude's computer use capability.
+Claude 3.5 Sonnet was the first frontier model to offer computer use.
+Anthropic describes Claude Opus 4.5 as the "best model in the world for computer use."
+
+Key capabilities:
+- screenshot: Capture current screen state
+- mouse: Click, move, drag operations
+- keyboard: Type text, press keys
+- bash: Run shell commands
+- text_editor: View and edit files
+
+Tool versions:
+- computer_20251124 (Opus 4.5): Adds zoom action for detailed inspection
+- computer_20250124 (All other models): Standard capabilities
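+
+The version split above can be encoded as a small lookup so that the tool definition always matches the model in use. This is a sketch only: the helper names and the `"opus-4-5"` substring check on the model ID are assumptions, and the two version strings are taken from the list above.
+
+```python
+def computer_tool_version(model: str) -> str:
+    """Pick the computer tool version for a model ID.
+
+    Assumption: Opus 4.5 model IDs contain the substring "opus-4-5".
+    """
+    if "opus-4-5" in model:
+        return "computer_20251124"   # adds the zoom action
+    return "computer_20250124"       # standard capabilities
+
+
+def computer_tool(model: str, width: int, height: int) -> dict:
+    """Build the computer tool definition as a plain dict."""
+    return {
+        "type": computer_tool_version(model),
+        "name": "computer",
+        "display_width_px": width,
+        "display_height_px": height,
+    }
+```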
+
+Critical limitation: "Some UI elements (like dropdowns and scrollbars)
+might be tricky for Claude to manipulate" - Anthropic docs
+
+
+**When to use**:
+- Building production computer use agents
+- Need highest quality vision understanding
+- Full desktop control (not just browser)
+
+```python
+from anthropic import Anthropic
+import subprocess
+import base64
+from PIL import Image
+import io
+
+class AnthropicComputerUse:
+    """
+    Official Anthropic Computer Use implementation.
+
+    Requires:
+    - Docker container with virtual display
+    - VNC for viewing agent actions
+    - Proper tool implementations
+    """
+
+    def __init__(self):
+        self.client = Anthropic()
+        # The tool versions in get_tools() must match this model
+        self.model = "claude-sonnet-4-20250514"
+        self.screen_size = (1280, 800)
+
+    def get_tools(self) -> list:
+        """Define computer use tools as plain dicts.
+
+        The 20241022 tool versions belong to the original Claude 3.5 Sonnet
+        release. Claude 4 models pair with the versions below and need the
+        matching beta flag ("computer-use-2025-01-24") on each request.
+        """
+        return [
+            {
+                "type": "computer_20250124",
+                "name": "computer",
+                "display_width_px": self.screen_size[0],
+                "display_height_px": self.screen_size[1],
+            },
+            {
+                "type": "bash_20250124",
+                "name": "bash",
+            },
+            {
+                "type": "text_editor_20250429",
+                "name": "str_replace_based_edit_tool",
+            },
+        ]
+
+    def execute_tool(self, name: str, input: dict) -> dict:
+        """Execute a tool and return result."""
+
+        if name == "computer":
+            return self._handle_computer_action(input)
+        elif name == "bash":
+            return self._handle_bash(input)
+        elif name == "str_replace_based_edit_tool":
+            return self._handle_editor(input)
+        else:
+            return {"error": f"Unknown tool: {name}"}
+
+    def _handle_computer_action(self, input: dict) -> dict:
+        """Handle computer control actions."""
+        action = input.get("action")
+
+        if action == "screenshot":
+            # Capture via xdotool/scrot
+            subprocess.run(["scrot", "/tmp/screenshot.png"])
+
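+            with open("/tmp/screenshot.png", "rb") as f:
+                image_data = base64.b64encode(f.read()).decode()
+
+            # Assumption: this return shape is illustrative, not from the source
+            return {"type": "image", "data": image_data}
+
+        return {"error": f"Unsupported action: {action}"}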
+```
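+
+Both listings describe the screen to the model as 1280x800, so the coordinates the model returns live in that space. If the real display has a different resolution (the first listing resizes the screenshot but clicks at the raw coordinates), each point needs to be mapped back before it is clicked. A minimal sketch of that mapping, with `scale_to_screen` as a hypothetical helper name:
+
+```python
+from typing import Tuple
+
+
+def scale_to_screen(
+    x: int,
+    y: int,
+    model_size: Tuple[int, int],
+    screen_size: Tuple[int, int],
+) -> Tuple[int, int]:
+    """Map a point from the model's coordinate space to real screen pixels."""
+    scale_x = screen_size[0] / model_size[0]
+    scale_y = screen_size[1] / model_size[1]
+    # Clamp so a point on the far edge still lands inside the screen
+    px = min(screen_size[0] - 1, max(0, round(x * scale_x)))
+    py = min(screen_size[1] - 1, max(0, round(y * scale_y)))
+    return px, py
+```
+
+When the virtual display inside the container is created at exactly the size reported to the model, the mapping is the identity and this step can be skipped.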
+
+## ⚠️ Sharp Edges
+
+| Severity | Solution |
+|----------|----------|
+| critical | Defense in depth - no single solution works |
+| medium | Add human-like variance to actions |
+| high | Use keyboard alternatives when possible |
+| medium | Accept the tradeoff |
+| high | Implement context management |
+| high | Monitor and limit costs |
+| critical | ALWAYS use sandboxing |
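+
+Two of the mitigations in the table, action variance and context management, are small enough to sketch. The function names, default values, and the flat message shape (a `content` list of blocks with a `type` key) are assumptions for illustration; real tool results may nest screenshots more deeply.
+
+```python
+import random
+from typing import Optional
+
+
+def jittered_delay(base: float = 0.5, spread: float = 0.3,
+                   rng: Optional[random.Random] = None) -> float:
+    """Return a delay drawn from [base*(1-spread), base*(1+spread)] seconds."""
+    rng = rng or random.Random()
+    return base * rng.uniform(1 - spread, 1 + spread)
+
+
+def prune_old_screenshots(messages: list, keep_last: int = 3) -> list:
+    """Return a copy of `messages` keeping only the newest `keep_last` images.
+
+    Older image blocks are replaced by a short text stub so the
+    conversation structure stays intact while token usage drops.
+    """
+    remaining = keep_last
+    pruned = []
+    for message in reversed(messages):          # walk newest to oldest
+        content = message["content"]
+        if isinstance(content, str):
+            pruned.append(message)
+            continue
+        blocks = []
+        for block in reversed(content):
+            if block.get("type") == "image":
+                if remaining > 0:
+                    remaining -= 1
+                else:
+                    block = {"type": "text", "text": "[screenshot omitted]"}
+            blocks.append(block)
+        pruned.append({**message, "content": list(reversed(blocks))})
+    return list(reversed(pruned))
+```
+
+A loop would call `time.sleep(jittered_delay())` between actions and run `prune_old_screenshots` on the history before each model request. Since screenshots usually dominate the token count, pruning them also helps with the cost row of the table.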