---
name: azure-ai-voicelive-dotnet
description: |
  Azure AI Voice Live SDK for .NET. Build real-time voice AI applications with bidirectional WebSocket communication. Use for voice assistants, conversational AI, real-time speech-to-speech, and voice-enabled chatbots. Triggers: "voice live", "real-time voice", "VoiceLiveClient", "VoiceLiveSession", "voice assistant .NET", "bidirectional audio", "speech-to-speech".
package: Azure.AI.VoiceLive
---

# Azure.AI.VoiceLive (.NET)

Real-time voice AI SDK for building bidirectional voice assistants with Azure AI.

## Installation

```bash
dotnet add package Azure.AI.VoiceLive
dotnet add package Azure.Identity
dotnet add package NAudio  # For audio capture/playback
```

**Current Versions**: Stable v1.0.0, Preview v1.1.0-beta.1

## Environment Variables

```bash
AZURE_VOICELIVE_ENDPOINT=https://<resource>.services.ai.azure.com/
AZURE_VOICELIVE_MODEL=gpt-4o-realtime-preview
AZURE_VOICELIVE_VOICE=en-US-AvaNeural
# Optional: API key if not using Entra ID
AZURE_VOICELIVE_API_KEY=<your-api-key>
```

## Authentication

### Microsoft Entra ID (Recommended)

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

**Required Role**: `Cognitive Services User` (assign in Azure Portal → Access control)

### API Key

```csharp
using Azure;  // AzureKeyCredential
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
AzureKeyCredential credential = new AzureKeyCredential("your-api-key");
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

## Client Hierarchy

```
VoiceLiveClient
└── VoiceLiveSession (WebSocket connection)
    ├── ConfigureSessionAsync()
    ├── GetUpdatesAsync() → SessionUpdate events
    ├── AddItemAsync() → UserMessageItem, FunctionCallOutputItem
    ├── SendAudioAsync()
    └── StartResponseAsync()
```

## Core Workflow

### 1. Start Session and Configure

```csharp
using Azure.Identity;
using Azure.AI.VoiceLive;

var endpoint = new Uri(Environment.GetEnvironmentVariable("AZURE_VOICELIVE_ENDPOINT"));
var client = new VoiceLiveClient(endpoint, new DefaultAzureCredential());

var model = "gpt-4o-mini-realtime-preview";

// Start session
using VoiceLiveSession session = await client.StartSessionAsync(model);

// Configure session
VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a helpful AI assistant. Respond naturally.",
    Voice = new AzureStandardVoice("en-US-AvaNeural"),
    TurnDetection = new AzureSemanticVadTurnDetection()
    {
        Threshold = 0.5f,
        PrefixPadding = TimeSpan.FromMilliseconds(300),
        SilenceDuration = TimeSpan.FromMilliseconds(500)
    },
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Set modalities (both text and audio for voice assistants)
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);

await session.ConfigureSessionAsync(sessionOptions);
```

### 2. Process Events

```csharp
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync())
{
    switch (serverEvent)
    {
        case SessionUpdateResponseAudioDelta audioDelta:
            byte[] audioData = audioDelta.Delta.ToArray();
            // Play audio via NAudio or other audio library
            break;

        case SessionUpdateResponseTextDelta textDelta:
            Console.Write(textDelta.Delta);
            break;

        case SessionUpdateResponseFunctionCallArgumentsDone functionCall:
            // Handle function call (see Function Calling section)
            break;

        case SessionUpdateError error:
            Console.WriteLine($"Error: {error.Error.Message}");
            break;

        case SessionUpdateResponseDone:
            Console.WriteLine("\n--- Response complete ---");
            break;
    }
}
```

### 3. Send User Message

```csharp
await session.AddItemAsync(new UserMessageItem("Hello, can you help me?"));
await session.StartResponseAsync();
```

### 4. Function Calling

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Define function
var weatherFunction = new VoiceLiveFunctionDefinition("get_current_weather")
{
    Description = "Get the current weather for a given location",
    Parameters = BinaryData.FromString("""
    {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state or country"
            }
        },
        "required": ["location"]
    }
    """)
};

// Add to session options
sessionOptions.Tools.Add(weatherFunction);

// Handle function call in event loop
if (serverEvent is SessionUpdateResponseFunctionCallArgumentsDone functionCall)
{
    if (functionCall.Name == "get_current_weather")
    {
        var parameters = JsonSerializer.Deserialize<Dictionary<string, string>>(functionCall.Arguments);
        string location = parameters?["location"] ?? "";

        // Call external service
        string weatherInfo = $"The weather in {location} is sunny, 75°F.";

        // Send response
        await session.AddItemAsync(new FunctionCallOutputItem(functionCall.CallId, weatherInfo));
        await session.StartResponseAsync();
    }
}
```

## Voice Options

| Voice Type | Class | Example |
|------------|-------|---------|
| Azure Standard | `AzureStandardVoice` | `"en-US-AvaNeural"` |
| Azure HD | `AzureStandardVoice` | `"en-US-Ava:DragonHDLatestNeural"` |
| Azure Custom | `AzureCustomVoice` | Custom voice with endpoint ID |

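A minimal sketch of how these map onto `VoiceLiveSessionOptions.Voice`. Standard and HD voices come from the table above; the `AzureCustomVoice` constructor arguments shown are an assumption, not confirmed here — check the API reference for the exact signature:

```csharp
// HD voices reuse AzureStandardVoice with the "<base>:DragonHDLatestNeural" naming.
sessionOptions.Voice = new AzureStandardVoice("en-US-Ava:DragonHDLatestNeural");

// Custom voices are referenced by name plus a deployment/endpoint ID.
// NOTE: this constructor signature is an assumption for illustration.
sessionOptions.Voice = new AzureCustomVoice("my-custom-voice", "<endpoint-id>");
```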
## Supported Models

| Model | Description |
|-------|-------------|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio |
| `gpt-4o-mini-realtime-preview` | Lightweight, fast interactions |
| `phi4-mm-realtime` | Cost-effective multimodal |

## Key Types Reference

| Type | Purpose |
|------|---------|
| `VoiceLiveClient` | Main client for creating sessions |
| `VoiceLiveSession` | Active WebSocket session |
| `VoiceLiveSessionOptions` | Session configuration |
| `AzureStandardVoice` | Standard Azure voice provider |
| `AzureSemanticVadTurnDetection` | Voice activity detection |
| `VoiceLiveFunctionDefinition` | Function tool definition |
| `UserMessageItem` | User text message |
| `FunctionCallOutputItem` | Function call response |
| `SessionUpdateResponseAudioDelta` | Audio chunk event |
| `SessionUpdateResponseTextDelta` | Text chunk event |

## Best Practices
|
|
|
|
1. **Always set both modalities** — Include `Text` and `Audio` for voice assistants
|
|
2. **Use `AzureSemanticVadTurnDetection`** — Provides natural conversation flow
|
|
3. **Configure appropriate silence duration** — 500ms typical to avoid premature cutoffs
|
|
4. **Use `using` statement** — Ensures proper session disposal
|
|
5. **Handle all event types** — Check for errors, audio, text, and function calls
|
|
6. **Use DefaultAzureCredential** — Never hardcode API keys
|
|
|
|
## Error Handling

```csharp
if (serverEvent is SessionUpdateError error)
{
    if (error.Error.Message.Contains("Cancellation failed: no active response"))
    {
        // Benign error, can ignore
    }
    else
    {
        Console.WriteLine($"Error: {error.Error.Message}");
    }
}
```

## Audio Configuration

- **Input Format**: `InputAudioFormat.Pcm16` (16-bit PCM)
- **Output Format**: `OutputAudioFormat.Pcm16`
- **Sample Rate**: 24kHz recommended
- **Channels**: Mono

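A microphone-capture sketch with NAudio matching these settings, feeding an active `session` from the Core Workflow above. The `SendAudioAsync` overload taking a `byte[]` is an assumption; adapt to the actual signature:

```csharp
using NAudio.Wave;

// Capture 24 kHz, 16-bit, mono PCM from the default input device.
var waveIn = new WaveInEvent
{
    WaveFormat = new WaveFormat(24000, 16, 1)  // sample rate, bits, channels
};

waveIn.DataAvailable += async (sender, e) =>
{
    // Forward only the bytes actually recorded in this callback.
    byte[] chunk = new byte[e.BytesRecorded];
    Array.Copy(e.Buffer, chunk, e.BytesRecorded);
    await session.SendAudioAsync(chunk);  // assumed byte[] overload
};

waveIn.StartRecording();
```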
## Related SDKs

| SDK | Purpose | Install |
|-----|---------|---------|
| `Azure.AI.VoiceLive` | Real-time voice (this SDK) | `dotnet add package Azure.AI.VoiceLive` |
| `Microsoft.CognitiveServices.Speech` | Speech-to-text, text-to-speech | `dotnet add package Microsoft.CognitiveServices.Speech` |
| `NAudio` | Audio capture/playback | `dotnet add package NAudio` |

## Reference Links

| Resource | URL |
|----------|-----|
| NuGet Package | https://www.nuget.org/packages/Azure.AI.VoiceLive |
| API Reference | https://learn.microsoft.com/dotnet/api/azure.ai.voicelive |
| GitHub Source | https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/ai/Azure.AI.VoiceLive |
| Quickstart | https://learn.microsoft.com/azure/ai-services/speech-service/voice-live-quickstart |