feat: Add 57 skills from vibeship-spawner-skills
Ported 3 categories from Spawner Skills (Apache 2.0):
- AI Agents (21 skills): langfuse, langgraph, crewai, rag-engineer, etc.
- Integrations (25 skills): stripe, firebase, vercel, supabase, etc.
- Maker Tools (11 skills): micro-saas-launcher, browser-extension-builder, etc.

All skills converted from 4-file YAML to SKILL.md format.
Source: https://github.com/vibeforge1111/vibeship-spawner-skills
302 skills/voice-ai-development/SKILL.md Normal file

@@ -0,0 +1,302 @@
---
name: voice-ai-development
description: "Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for synthesis, LiveKit for real-time infrastructure, and WebRTC fundamentals. Knows how to build low-latency, production-ready voice experiences. Use when: voice ai, voice agent, speech to text, text to speech, realtime voice."
source: vibeship-spawner-skills (Apache 2.0)
---

# Voice AI Development

**Role**: Voice AI Architect

You are an expert in building real-time voice applications. You think in terms of latency budgets, audio quality, and user experience. You know that voice apps feel magical when fast and broken when slow. You choose the right combination of providers for each use case and optimize relentlessly for perceived responsiveness.
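For intuition, a per-turn latency budget can be written down and checked against a target. The figures below are illustrative assumptions for planning, not measurements.

```python
# Illustrative per-stage latency targets (ms) for one streamed voice turn.
# These numbers are assumptions, not benchmarks.
LATENCY_BUDGET_MS = {
    "vad_end_of_speech": 300,   # silence needed to decide the user stopped talking
    "stt_finalization": 150,    # finalizing the transcript after end of speech
    "llm_first_token": 350,     # time to the first streamed LLM token
    "tts_first_audio": 200,     # time to the first audio chunk from TTS
}

def perceived_delay_ms(budget: dict) -> int:
    """The user's perceived delay is roughly the sum of the serial stages."""
    return sum(budget.values())

assert perceived_delay_ms(LATENCY_BUDGET_MS) <= 1000, "aim for ~1s or less before speech starts"
```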

## Capabilities

- OpenAI Realtime API
- Vapi voice agents
- Deepgram STT/TTS
- ElevenLabs voice synthesis
- LiveKit real-time infrastructure
- WebRTC audio handling
- Voice agent design
- Latency optimization

## Requirements

- Python or Node.js
- API keys for providers
- Audio handling knowledge

## Patterns

### OpenAI Realtime API

Native voice-to-voice with GPT-4o.

**When to use**: When you want integrated voice AI without separate STT/TTS.

```python
import asyncio
import websockets
import json
import base64

OPENAI_API_KEY = "sk-..."

async def voice_session():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "OpenAI-Beta": "realtime=v1"
    }

    async with websockets.connect(url, extra_headers=headers) as ws:
        # Configure session
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "voice": "alloy",  # alloy, echo, fable, onyx, nova, shimmer
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {
                    "model": "whisper-1"
                },
                "turn_detection": {
                    "type": "server_vad",  # Voice activity detection
                    "threshold": 0.5,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": 500
                },
                "tools": [
                    {
                        "type": "function",
                        "name": "get_weather",
                        "description": "Get weather for a location",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "location": {"type": "string"}
                            }
                        }
                    }
                ]
            }
        }))

        # Send audio (PCM16, 24kHz, mono)
        async def send_audio(audio_bytes):
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(audio_bytes).decode()
            }))

        # Receive events
        async for message in ws:
            event = json.loads(message)

            if event["type"] == "response.audio.delta":
                # Base64-encoded PCM16 audio chunk from the model
                audio_chunk = base64.b64decode(event["delta"])
                # Play or buffer audio_chunk here
            elif event["type"] == "error":
                print(event["error"])

asyncio.run(voice_session())
```
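As a companion to the snippet above, here is a minimal sketch of capturing microphone audio in the format the session expects (PCM16, 24 kHz, mono) and feeding it to `send_audio`. It assumes the `sounddevice` package; the blocking read and fixed chunk size are simplifications to adapt to your event loop.

```python
import asyncio
import sounddevice as sd

async def stream_microphone(send_audio, chunk_seconds: float = 0.1):
    """Read raw PCM16 mono audio at 24 kHz and forward it to the session."""
    samplerate = 24000
    frames_per_chunk = int(samplerate * chunk_seconds)
    with sd.RawInputStream(samplerate=samplerate, channels=1, dtype="int16") as mic:
        while True:
            data, _overflowed = mic.read(frames_per_chunk)  # blocking read (sketch)
            await send_audio(bytes(data))
            await asyncio.sleep(0)  # yield control back to the event loop
```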

### Vapi Voice Agent

Build voice agents with the Vapi platform.

**When to use**: Phone-based agents, quick deployment.

```python
# Vapi provides hosted voice agents with webhooks

from flask import Flask, request, jsonify
import vapi

app = Flask(__name__)
client = vapi.Vapi(api_key="...")

# Create an assistant
assistant = client.assistants.create(
    name="Support Agent",
    model={
        "provider": "openai",
        "model": "gpt-4o",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful support agent..."
            }
        ]
    },
    voice={
        "provider": "11labs",
        "voiceId": "21m00Tcm4TlvDq8ikWAM"  # Rachel
    },
    firstMessage="Hi! How can I help you today?",
    transcriber={
        "provider": "deepgram",
        "model": "nova-2"
    }
)

# Webhook for conversation events
@app.route("/vapi/webhook", methods=["POST"])
def vapi_webhook():
    event = request.json

    if event["type"] == "function-call":
        # Handle tool call
        name = event["functionCall"]["name"]
        args = event["functionCall"]["parameters"]

        if name == "check_order":
            result = check_order(args["order_id"])
            return jsonify({"result": result})

    elif event["type"] == "end-of-call-report":
        # Call ended - save transcript
        transcript = event["transcript"]
        save_transcript(event["call"]["id"], transcript)

    return jsonify({"ok": True})

# Start outbound call
call = client.calls.create(
    assistant_id=assistant.id,
    customer={
        "number": "+1234567890"
    },
    phoneNumber={
        "twilioPhoneNumber": "+0987654321"
    }
)

# Or create a web call
web_call = client.calls.create(
    assistant_id=assistant.id,
    type="web"
)
# Returns a URL for the WebRTC connection
```
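The webhook above calls `check_order` and `save_transcript`, which are not defined in the snippet. Hypothetical stubs like the following, continuing the Flask app above, make it runnable end to end; swap in your own order lookup and storage.

```python
def check_order(order_id: str) -> str:
    # Hypothetical stand-in: look up the order and return a short, speakable summary.
    return f"Order {order_id} is out for delivery."

def save_transcript(call_id: str, transcript: str) -> None:
    # Hypothetical stand-in: persist wherever you keep call logs.
    with open(f"transcript-{call_id}.txt", "w") as f:
        f.write(transcript)

if __name__ == "__main__":
    # Vapi must reach the webhook over a public URL (use a tunnel in development).
    app.run(port=5000)
```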

### Deepgram STT + ElevenLabs TTS

Best-in-class transcription and synthesis.

**When to use**: High-quality voice, custom pipeline.

```python
import asyncio
from deepgram import DeepgramClient, LiveTranscriptionEvents
from elevenlabs import ElevenLabs

# Deepgram real-time transcription
deepgram = DeepgramClient(api_key="...")

async def transcribe_stream(audio_stream):
    connection = deepgram.listen.live.v("1")

    async def on_transcript(result):
        transcript = result.channel.alternatives[0].transcript
        if transcript:
            print(f"Heard: {transcript}")
            if result.is_final:
                # Process final transcript
                await handle_user_input(transcript)

    connection.on(LiveTranscriptionEvents.Transcript, on_transcript)

    await connection.start({
        "model": "nova-2",          # Best quality
        "language": "en",
        "smart_format": True,
        "interim_results": True,    # Get partial results
        "utterance_end_ms": 1000,
        "vad_events": True,         # Voice activity detection
        "encoding": "linear16",
        "sample_rate": 16000
    })

    # Stream audio
    async for chunk in audio_stream:
        await connection.send(chunk)

    await connection.finish()

# ElevenLabs streaming synthesis
eleven = ElevenLabs(api_key="...")

def text_to_speech_stream(text: str):
    """Stream TTS audio chunks."""
    audio_stream = eleven.text_to_speech.convert_as_stream(
        voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
        model_id="eleven_turbo_v2_5",     # Fastest
        text=text,
        output_format="pcm_24000"         # Raw PCM for low latency
    )

    for chunk in audio_stream:
        yield chunk

# Or with WebSocket for lowest latency
async def tts_websocket(text_stream):
    async with eleven.text_to_speech.stream_async(
        voice_id="21m00Tcm4TlvDq8ikWAM",
        model_id="eleven_turbo_v2_5"
    ) as tts:
        async for text_chunk in text_stream:
            audio = await tts.send(text_chunk)
            yield audio

        # Flush remaining audio
        final_audio = await tts.flush()
        yield final_audio
```

## Anti-Patterns

### ❌ Non-streaming Pipeline

**Why bad**: Adds seconds of latency. The user perceives the app as slow, and conversation flow is lost.

**Instead**: Stream everything:
- STT: interim results
- LLM: token streaming
- TTS: chunk streaming

Start TTS before the LLM finishes (see the sketch below).
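A minimal sketch of that shape, assuming the `openai` package and reusing `text_to_speech_stream` from the Deepgram + ElevenLabs pattern above; `play_audio` is whatever plays or enqueues PCM chunks in your app.

```python
from openai import OpenAI

client = OpenAI()

def speak_streaming_reply(user_text: str, play_audio):
    """Flush LLM output to TTS at sentence boundaries so speech starts early."""
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_text}],
        stream=True,
    )
    buffer = ""
    for chunk in stream:
        buffer += chunk.choices[0].delta.content or ""
        # Synthesize as soon as a sentence (or long clause) is complete
        if buffer.endswith((".", "!", "?")) or len(buffer) > 120:
            for audio_chunk in text_to_speech_stream(buffer):
                play_audio(audio_chunk)
            buffer = ""
    if buffer.strip():  # flush whatever is left
        for audio_chunk in text_to_speech_stream(buffer):
            play_audio(audio_chunk)
```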

### ❌ Ignoring Interruptions

**Why bad**: Frustrating user experience. It feels like talking to a machine and wastes the user's time.

**Instead**: Implement barge-in detection: use VAD to detect user speech, stop TTS immediately, and clear the audio queue (see the sketch below).
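One way to wire that up is a small playback controller whose interrupt flag is set from the VAD speech-start event; the names here are illustrative, not from any particular SDK.

```python
import asyncio

class PlaybackController:
    """Queues TTS audio and supports barge-in: interrupt, then drop queued audio."""

    def __init__(self):
        self.queue: "asyncio.Queue[bytes]" = asyncio.Queue()
        self.interrupted = asyncio.Event()

    async def play_loop(self, play_chunk):
        # play_chunk is whatever sends PCM to the speaker / WebRTC track
        while True:
            chunk = await self.queue.get()
            if self.interrupted.is_set():
                continue  # drain without playing stale audio
            await play_chunk(chunk)

    def on_user_speech_started(self):
        """Call from the VAD / STT speech-start event to stop TTS immediately."""
        self.interrupted.set()
        while not self.queue.empty():
            self.queue.get_nowait()  # clear the audio queue

    def on_agent_response_started(self):
        self.interrupted.clear()
```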

### ❌ Single Provider Lock-in

**Why bad**: A single provider may not give the best quality at every stage, becomes a single point of failure, and is harder to optimize.

**Instead**: Mix the best providers:
- Deepgram for STT (speed + accuracy)
- ElevenLabs for TTS (voice quality)
- OpenAI/Anthropic for the LLM

## Limitations

- Latency varies by provider
- Cost per minute adds up
- Quality depends on the network
- Debugging is complex

## Related Skills

Works well with: `langgraph`, `structured-output`, `langfuse`