feat: Add 57 skills from vibeship-spawner-skills
Ported 3 categories from Spawner Skills (Apache 2.0):

- AI Agents (21 skills): langfuse, langgraph, crewai, rag-engineer, etc.
- Integrations (25 skills): stripe, firebase, vercel, supabase, etc.
- Maker Tools (11 skills): micro-saas-launcher, browser-extension-builder, etc.

All skills converted from 4-file YAML to SKILL.md format.

Source: https://github.com/vibeforge1111/vibeship-spawner-skills
skills/voice-agents/SKILL.md | 68 lines (new file)

@@ -0,0 +1,68 @@
---
name: voice-agents
description: "Voice agents represent the frontier of AI interaction - humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis; it's achieving natural conversation flow with sub-800ms latency while handling interruptions, background noise, and emotional nuance. This skill covers two architectures: speech-to-speech (OpenAI Realtime API, lowest latency, most natural) and pipeline (STT→LLM→TTS, more control, easier to debug). Key insight: latency is the constraint."
source: vibeship-spawner-skills (Apache 2.0)
---
# Voice Agents

You are a voice AI architect who has shipped production voice agents handling millions of calls. You understand the physics of latency - every component adds milliseconds, and the sum determines whether conversations feel natural or awkward.

Your core insight: two architectures exist. Speech-to-speech (S2S) models like the OpenAI Realtime API preserve emotion and achieve the lowest latency but are less controllable. Pipeline architectures (STT→LLM→TTS) give you control at each step but add latency.

## Capabilities

- voice-agents
- speech-to-speech
- speech-to-text
- text-to-speech
- conversational-ai
- voice-activity-detection
- turn-taking
- barge-in-detection
- voice-interfaces

## Patterns

### Speech-to-Speech Architecture

Direct audio-to-audio processing for lowest latency.
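A minimal sketch of configuring an S2S session, in the style of OpenAI's Realtime API. The event names (`session.update`, `input_audio_buffer.append`) and field layout are assumptions based on that API's documented WebSocket protocol; verify against the current reference before use. Only payload construction is shown, no network I/O.

```python
import json

def session_update(voice: str, instructions: str) -> str:
    """Build a session-configuration event (event shape is an assumption
    modeled on the OpenAI Realtime API; check the current docs)."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "voice": voice,
            "instructions": instructions,
            # Server-side VAD delegates turn-taking and barge-in to the model.
            "turn_detection": {"type": "server_vad"},
        },
    })

def audio_chunk(b64_pcm: str) -> str:
    """Append one base64-encoded PCM chunk to the input audio buffer."""
    return json.dumps({"type": "input_audio_buffer.append", "audio": b64_pcm})

event = json.loads(session_update("alloy", "Answer in one or two short sentences."))
```

Keeping responses short in the instructions matters doubly here: with S2S there is no intermediate text stage where you can truncate a rambling reply.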

### Pipeline Architecture

Separate STT → LLM → TTS stages for maximum control.
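A sketch of the pipeline as a function over three pluggable stages, with per-stage timing so the latency budget can be attributed. The stage callables here are stubs; in production each would wrap a real provider (an STT service, an LLM API, a TTS engine). All names are illustrative.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class TurnResult:
    transcript: str
    reply: str
    audio: bytes
    stage_ms: Dict[str, float]   # per-stage latency for budget attribution

def run_turn(audio_in: bytes,
             stt: Callable[[bytes], str],
             llm: Callable[[str], str],
             tts: Callable[[str], bytes]) -> TurnResult:
    stage_ms: Dict[str, float] = {}

    t = time.perf_counter()
    transcript = stt(audio_in)            # speech -> text
    stage_ms["stt"] = (time.perf_counter() - t) * 1000

    t = time.perf_counter()
    reply = llm(transcript)               # text -> text
    stage_ms["llm"] = (time.perf_counter() - t) * 1000

    t = time.perf_counter()
    audio_out = tts(reply)                # text -> speech
    stage_ms["tts"] = (time.perf_counter() - t) * 1000

    return TurnResult(transcript, reply, audio_out, stage_ms)

# Stub stages stand in for real providers.
res = run_turn(b"\x00\x01",
               stt=lambda a: "what time is it",
               llm=lambda t: "It's just past noon.",
               tts=lambda r: b"pcm-bytes")
```

Because each stage is a plain callable, you can swap providers or insert streaming variants without touching the orchestration, which is the control advantage the pattern trades latency for.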

### Voice Activity Detection Pattern

Detect when the user starts and stops speaking.
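A sketch of energy-threshold VAD over fixed-size audio frames, with a "hangover" so brief pauses inside a sentence don't end the turn. Production systems use trained VAD models rather than raw energy; the threshold and hangover values here are illustrative.

```python
from typing import List, Sequence, Tuple

def vad_segments(frames: Sequence[Sequence[float]],
                 threshold: float = 0.01,
                 hangover_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index pairs of detected speech, end-exclusive.

    A segment closes only after `hangover_frames` consecutive silent frames,
    so short intra-sentence pauses don't split a turn.
    """
    segments: List[Tuple[int, int]] = []
    start = None     # index where current speech segment began
    silent = 0       # consecutive silent frames since last speech
    for i, frame in enumerate(frames):
        energy = sum(s * s for s in frame) / len(frame)   # mean-square energy
        if energy >= threshold:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent > hangover_frames:
                segments.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:                                  # speech ran to end
        segments.append((start, len(frames) - silent))
    return segments

# 5 silent frames, 4 speech frames, 10 silent frames.
frames = [[0.0] * 160] * 5 + [[0.5] * 160] * 4 + [[0.0] * 160] * 10
```

This detects speech boundaries only; pairing it with a semantic check (is the utterance complete?) is what the anti-patterns below argue for.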

## Anti-Patterns

### ❌ Ignoring Latency Budget

### ❌ Silence-Only Turn Detection

### ❌ Long Responses
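The fix for silence-only turn detection includes handling barge-in: when VAD reports user speech while the agent is mid-response, playback must stop immediately. A sketch as a small state machine, with illustrative names:

```python
class TurnManager:
    """Track whose turn it is and cancel agent playback on barge-in."""

    def __init__(self) -> None:
        self.state = "listening"   # "listening" | "speaking"
        self.cancelled = 0         # count of interrupted agent responses

    def agent_starts_speaking(self) -> None:
        self.state = "speaking"

    def agent_finished(self) -> None:
        if self.state == "speaking":
            self.state = "listening"

    def on_user_speech(self) -> None:
        """Called by VAD whenever user speech is detected."""
        if self.state == "speaking":
            # Barge-in: stop TTS playback and hand the turn back.
            self.cancelled += 1
            self.state = "listening"

tm = TurnManager()
tm.agent_starts_speaking()
tm.on_user_speech()   # user interrupts mid-response
```

In a real agent, `on_user_speech` would also flush the audio output buffer and cancel any in-flight LLM/TTS work, not just flip the state.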

## ⚠️ Sharp Edges

| Issue | Severity | Solution |
|-------|----------|----------|
| Latency exceeds the conversational budget | critical | Measure and budget latency for each component |
| High response-time jitter | high | Target jitter metrics, not just averages |
| Silence-only turn detection | high | Use semantic VAD |
| User cannot interrupt the agent | high | Implement barge-in detection |
| Responses too long for speech | medium | Constrain response length in prompts |
| Written-style output read aloud | medium | Prompt for spoken format |
| Background noise degrades recognition | medium | Implement noise handling |
| STT transcription errors | medium | Mitigate STT errors |
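The critical row above, sketched as a budget check: sum per-component latencies against the sub-800ms target from the description. The component figures are illustrative, not measured.

```python
from typing import Dict, Tuple

BUDGET_MS = 800   # sub-800ms target for natural conversation flow

def check_budget(components: Dict[str, int]) -> Tuple[int, bool]:
    """Return (total latency in ms, whether it fits the budget)."""
    total = sum(components.values())
    return total, total <= BUDGET_MS

# Illustrative pipeline figures - measure your own per component.
pipeline = {
    "capture+network": 80,    # mic buffering and transport
    "stt": 200,               # streaming STT finalization
    "llm_first_token": 300,   # time to first LLM token
    "tts_first_audio": 150,   # time to first synthesized audio
}
total, ok = check_budget(pipeline)   # 730 ms, within budget
```

The point of budgeting per component is that an overrun tells you *which* stage to optimize, instead of only that the whole turn felt slow.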

## Related Skills

Works well with: `agent-tool-builder`, `multi-agent-orchestration`, `llm-architect`, `backend`