feat: sync all 140 Microsoft skills with collision protection
- Add find_github_skills() to discover skills in .github/skills/ not reachable via the skills/ symlink tree (picks up 11 missing skills)
- Add collision protection: if a target directory exists and was not from a previous Microsoft sync, append -ms suffix instead of overwriting
  - Microsoft mcp-builder → mcp-builder-ms (community version preserved)
  - Microsoft skill-creator → skill-creator-ms (community version preserved)
- Total skills: 856 (was 845, +11 newly discovered)
skills/podcast-generation/SKILL.md (new file, 121 lines)
---
name: podcast-generation
description: Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.
---

# Podcast Generation with GPT Realtime Mini

Generate real audio narratives from text content using Azure OpenAI's Realtime API.

## Quick Start

1. Configure environment variables for the Realtime API
2. Connect via WebSocket to the Azure OpenAI Realtime endpoint
3. Send a text prompt; collect the PCM audio chunks and transcript
4. Convert the PCM audio to WAV format
5. Return base64-encoded audio to the frontend for playback

## Environment Configuration

```env
AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini
```

**Note**: The endpoint should NOT include `/openai/v1/` - just the base URL.
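
The note above is easy to trip over in practice. A minimal sketch of loading the endpoint and deriving the WebSocket URL from it, with a guard for an endpoint that already carries the path (the `realtime_ws_url` helper is illustrative, not part of this skill):

```python
import os

def realtime_ws_url(endpoint: str) -> str:
    """Derive the Realtime WebSocket URL from the base HTTPS endpoint."""
    # Guard against a common mistake: an endpoint that already ends in /openai/v1/
    base = endpoint.rstrip("/")
    if base.endswith("/openai/v1"):
        base = base[: -len("/openai/v1")]
    return base.replace("https://", "wss://") + "/openai/v1"

endpoint = os.environ.get(
    "AZURE_OPENAI_AUDIO_ENDPOINT",
    "https://your-resource.cognitiveservices.azure.com",
)
ws_url = realtime_ws_url(endpoint)
```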

## Core Workflow

### Backend Audio Generation

```python
from openai import AsyncOpenAI
import base64

# Convert the HTTPS endpoint to a WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"

client = AsyncOpenAI(
    websocket_base_url=ws_url,
    api_key=api_key
)

audio_chunks = []
transcript_parts = []

async with client.realtime.connect(model="gpt-realtime-mini") as conn:
    # Configure for audio-only output
    await conn.session.update(session={
        "output_modalities": ["audio"],
        "instructions": "You are a narrator. Speak naturally."
    })

    # Send the text to narrate
    await conn.conversation.item.create(item={
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": prompt}]
    })

    await conn.response.create()

    # Collect streaming events
    async for event in conn:
        if event.type == "response.output_audio.delta":
            audio_chunks.append(base64.b64decode(event.delta))
        elif event.type == "response.output_audio_transcript.delta":
            transcript_parts.append(event.delta)
        elif event.type == "response.done":
            break

# Convert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b''.join(audio_chunks)
wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
```
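
The `pcm_to_wav` helper above is provided by `scripts/pcm_to_wav.py`. As an illustrative sketch (not that script's actual code), the conversion can be done with the standard-library `wave` module, assuming the 16-bit mono PCM the Realtime API emits:

```python
import io
import wave

def pcm_to_wav(pcm_audio: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM bytes in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_audio)
    return buf.getvalue()
```

The WAV wrapper matters because browsers can play WAV through an `<audio>` element but cannot play raw PCM directly.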

### Frontend Audio Playback

```javascript
// Convert base64 WAV to a playable blob
const base64ToBlob = (base64, mimeType) => {
  const bytes = atob(base64);
  const arr = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
  return new Blob([arr], { type: mimeType });
};

const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();
```
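
The `response.audio_data` field consumed above has to come from the backend. A minimal sketch of serializing the generated audio and transcript into that payload (the `build_audio_response` helper and the exact field set are assumptions, not a fixed contract):

```python
import base64
import json

def build_audio_response(wav_audio: bytes, transcript_parts: list) -> str:
    """Serialize generated audio and transcript as JSON for the frontend."""
    return json.dumps({
        # Base64 so the binary WAV survives the JSON response intact
        "audio_data": base64.b64encode(wav_audio).decode("ascii"),
        "transcript": "".join(transcript_parts),
    })

payload = build_audio_response(b"RIFF....WAVE", ["Hello, ", "world."])
```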

## Voice Options

| Voice   | Character  |
|---------|------------|
| alloy   | Neutral    |
| echo    | Warm       |
| fable   | Expressive |
| onyx    | Deep       |
| nova    | Friendly   |
| shimmer | Clear      |
||||
## Realtime API Events
|
||||
|
||||
- `response.output_audio.delta` - Base64 audio chunk
|
||||
- `response.output_audio_transcript.delta` - Transcript text
|
||||
- `response.done` - Generation complete
|
||||
- `error` - Handle with `event.error.message`
|
||||
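
The backend example's event loop covers the first three; the `error` event is easy to forget. A sketch of a single dispatcher (the `handle_event` helper is illustrative) that also surfaces errors instead of silently dropping them:

```python
import base64

def handle_event(event, audio_chunks: list, transcript_parts: list) -> bool:
    """Process one Realtime event; return True once generation is complete."""
    if event.type == "response.output_audio.delta":
        audio_chunks.append(base64.b64decode(event.delta))
    elif event.type == "response.output_audio_transcript.delta":
        transcript_parts.append(event.delta)
    elif event.type == "error":
        # Surface API errors rather than hanging the event loop
        raise RuntimeError(f"Realtime API error: {event.error.message}")
    return event.type == "response.done"
```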

## Audio Format

- **Input**: Text prompt
- **Output**: PCM audio (24kHz, 16-bit, mono)
- **Storage**: Base64-encoded WAV
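
These numbers fix the size-to-duration relationship: 16-bit mono at 24kHz is 48,000 bytes per second, so duration can be computed directly from the raw PCM length:

```python
SAMPLE_RATE = 24000   # Hz
BYTES_PER_SAMPLE = 2  # 16-bit
CHANNELS = 1          # mono

def pcm_duration_seconds(pcm_audio: bytes) -> float:
    """Duration of raw PCM audio at the Realtime API's output format."""
    return len(pcm_audio) / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)
```

This is handy for progress indicators or for sanity-checking that a generation actually produced audio.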
|
||||
## References
|
||||
|
||||
- **Full architecture**: See [references/architecture.md](references/architecture.md) for complete stack design
|
||||
- **Code examples**: See [references/code-examples.md](references/code-examples.md) for production patterns
|
||||
- **PCM conversion**: Use [scripts/pcm_to_wav.py](scripts/pcm_to_wav.py) for audio format conversion
|
||||