feat: sync all 140 Microsoft skills with collision protection
- Add find_github_skills() to discover skills in .github/skills/ not reachable via the skills/ symlink tree (picks up 11 missing skills)
- Add collision protection: if a target directory exists and was not from a previous Microsoft sync, append -ms suffix instead of overwriting
  - Microsoft mcp-builder → mcp-builder-ms (community version preserved)
  - Microsoft skill-creator → skill-creator-ms (community version preserved)
- Total skills: 856 (was 845, +11 newly discovered)
skills/podcast-generation/SKILL.md (new file, 121 lines)
---
name: podcast-generation
description: Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.
---

# Podcast Generation with GPT Realtime Mini

Generate real audio narratives from text content using Azure OpenAI's Realtime API.

## Quick Start

1. Configure environment variables for the Realtime API
2. Connect via WebSocket to the Azure OpenAI Realtime endpoint
3. Send a text prompt; collect the PCM audio chunks and transcript
4. Convert the PCM audio to WAV format
5. Return base64-encoded audio to the frontend for playback

## Environment Configuration

```env
AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini
```

**Note**: The endpoint should NOT include `/openai/v1/` - just the base URL.
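
The note above is easy to trip over in practice. A minimal sketch of loading the endpoint and deriving the WebSocket URL from it, with a guard for an endpoint that already carries the path (the `realtime_ws_url` helper is illustrative, not part of this skill):

```python
import os

def realtime_ws_url(endpoint: str) -> str:
    """Derive the Realtime WebSocket URL from the base HTTPS endpoint."""
    # Guard against a common mistake: an endpoint that already ends in /openai/v1/
    base = endpoint.rstrip("/")
    if base.endswith("/openai/v1"):
        base = base[: -len("/openai/v1")]
    return base.replace("https://", "wss://") + "/openai/v1"

endpoint = os.environ.get(
    "AZURE_OPENAI_AUDIO_ENDPOINT",
    "https://your-resource.cognitiveservices.azure.com",
)
ws_url = realtime_ws_url(endpoint)
```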

## Core Workflow

### Backend Audio Generation

```python
from openai import AsyncOpenAI
import base64

# Convert the HTTPS endpoint to a WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"

client = AsyncOpenAI(
    websocket_base_url=ws_url,
    api_key=api_key
)

audio_chunks = []
transcript_parts = []

async with client.realtime.connect(model="gpt-realtime-mini") as conn:
    # Configure for audio-only output
    await conn.session.update(session={
        "output_modalities": ["audio"],
        "instructions": "You are a narrator. Speak naturally."
    })

    # Send the text to narrate
    await conn.conversation.item.create(item={
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": prompt}]
    })

    await conn.response.create()

    # Collect streaming events
    async for event in conn:
        if event.type == "response.output_audio.delta":
            audio_chunks.append(base64.b64decode(event.delta))
        elif event.type == "response.output_audio_transcript.delta":
            transcript_parts.append(event.delta)
        elif event.type == "response.done":
            break

# Convert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b''.join(audio_chunks)
wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
```
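
The `pcm_to_wav` helper above is provided by `scripts/pcm_to_wav.py`. As an illustrative sketch (not that script's actual code), the conversion can be done with the standard-library `wave` module, assuming the 16-bit mono PCM the Realtime API emits:

```python
import io
import wave

def pcm_to_wav(pcm_audio: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM bytes in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_audio)
    return buf.getvalue()
```

The WAV wrapper matters because browsers can play WAV through an `<audio>` element but cannot play raw PCM directly.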

### Frontend Audio Playback

```javascript
// Convert base64 WAV to a playable blob
const base64ToBlob = (base64, mimeType) => {
  const bytes = atob(base64);
  const arr = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
  return new Blob([arr], { type: mimeType });
};

const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();
```
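
The `response.audio_data` field consumed above has to come from the backend. A minimal sketch of serializing the generated audio and transcript into that payload (the `build_audio_response` helper and the exact field set are assumptions, not a fixed contract):

```python
import base64
import json

def build_audio_response(wav_audio: bytes, transcript_parts: list) -> str:
    """Serialize generated audio and transcript as JSON for the frontend."""
    return json.dumps({
        # Base64 so the binary WAV survives the JSON response intact
        "audio_data": base64.b64encode(wav_audio).decode("ascii"),
        "transcript": "".join(transcript_parts),
    })

payload = build_audio_response(b"RIFF....WAVE", ["Hello, ", "world."])
```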

## Voice Options

| Voice   | Character  |
|---------|------------|
| alloy   | Neutral    |
| echo    | Warm       |
| fable   | Expressive |
| onyx    | Deep       |
| nova    | Friendly   |
| shimmer | Clear      |
||||
## Realtime API Events
|
||||
|
||||
- `response.output_audio.delta` - Base64 audio chunk
|
||||
- `response.output_audio_transcript.delta` - Transcript text
|
||||
- `response.done` - Generation complete
|
||||
- `error` - Handle with `event.error.message`
|
||||
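
The backend example's event loop covers the first three; the `error` event is easy to forget. A sketch of a single dispatcher (the `handle_event` helper is illustrative) that also surfaces errors instead of silently dropping them:

```python
import base64

def handle_event(event, audio_chunks: list, transcript_parts: list) -> bool:
    """Process one Realtime event; return True once generation is complete."""
    if event.type == "response.output_audio.delta":
        audio_chunks.append(base64.b64decode(event.delta))
    elif event.type == "response.output_audio_transcript.delta":
        transcript_parts.append(event.delta)
    elif event.type == "error":
        # Surface API errors rather than hanging the event loop
        raise RuntimeError(f"Realtime API error: {event.error.message}")
    return event.type == "response.done"
```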

## Audio Format

- **Input**: Text prompt
- **Output**: PCM audio (24kHz, 16-bit, mono)
- **Storage**: Base64-encoded WAV
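
These numbers fix the size-to-duration relationship: 16-bit mono at 24kHz is 48,000 bytes per second, so duration can be computed directly from the raw PCM length:

```python
SAMPLE_RATE = 24000   # Hz
BYTES_PER_SAMPLE = 2  # 16-bit
CHANNELS = 1          # mono

def pcm_duration_seconds(pcm_audio: bytes) -> float:
    """Duration of raw PCM audio at the Realtime API's output format."""
    return len(pcm_audio) / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)
```

This is handy for progress indicators or for sanity-checking that a generation actually produced audio.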
|
||||
## References
|
||||
|
||||
- **Full architecture**: See [references/architecture.md](references/architecture.md) for complete stack design
|
||||
- **Code examples**: See [references/code-examples.md](references/code-examples.md) for production patterns
|
||||
- **PCM conversion**: Use [scripts/pcm_to_wav.py](scripts/pcm_to_wav.py) for audio format conversion
|
||||