feat: Add Official Microsoft & Gemini Skills (845+ Total)
## 🚀 Impact

Significantly expands the capabilities of **Antigravity Awesome Skills** by integrating official skill collections from **Microsoft** and **Google Gemini**. This update raises the total skill count to **845+**, making the library even more comprehensive for AI coding assistants.

## ✨ Key Changes

### 1. New Official Skills
- **Microsoft Skills**: Added a large collection of official skills from [microsoft/skills](https://github.com/microsoft/skills).
  - Includes Azure, .NET, Python, TypeScript, and Semantic Kernel skills.
  - Preserves the original directory structure under `skills/official/microsoft/`.
  - Includes plugin skills from the `.github/plugins` directory.
- **Gemini Skills**: Added official Gemini API development skills under `skills/gemini-api-dev/`.

### 2. New Scripts & Tooling
- **`scripts/sync_microsoft_skills.py`**: A robust synchronization script that:
  - Clones the official Microsoft repository.
  - Preserves the original directory hierarchy.
  - Handles symlinks and plugin locations.
  - Generates attribution metadata.
- **`scripts/tests/inspect_microsoft_repo.py`**: Debug tool for inspecting the remote repository structure.
- **`scripts/tests/test_comprehensive_coverage.py`**: Verification script that ensures 100% of skills are captured during sync.

### 3. Core Improvements
- **`scripts/generate_index.py`**: Enhanced frontmatter parsing to safely handle unquoted values containing `@` symbols and commas (fixing issues with some Microsoft skill descriptions).
- **`package.json`**: Added `sync:microsoft` and `sync:all-official` scripts for easier maintenance.

### 4. Documentation
- Updated `README.md` to reflect the new skill count (845+) and added Microsoft/Gemini to the provider list.
- Updated `CATALOG.md` and `skills_index.json` with the new skills.

## 🧪 Verification
- Ran `scripts/tests/test_comprehensive_coverage.py` to verify all Microsoft skills are detected.
- Validated the `generate_index.py` fixes by successfully indexing the new skills.
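The frontmatter fix described above can be illustrated with a minimal sketch. This is a hypothetical helper, not the actual `generate_index.py` code: the idea is that a strict YAML loader can choke on unquoted values containing `@` or commas, so the parser treats everything after the first `:` on a line as a raw string.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse simple 'key: value' frontmatter, tolerating unquoted
    values that contain '@' symbols or commas (no YAML library needed)."""
    fields = {}
    for line in text.strip().splitlines():
        if ":" not in line:
            continue  # skip lines without a key/value separator
        key, _, value = line.partition(":")  # split on the FIRST colon only
        fields[key.strip()] = value.strip()
    return fields

fm = parse_frontmatter("name: azure-skill\ndescription: Email me @ team, or file an issue")
# fm["description"] keeps the '@' and comma intact
```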
skills/official/microsoft/python/foundry/voicelive/SKILL.md (new file, 309 lines)
---
name: azure-ai-voicelive-py
description: Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Azure AI, including voice assistants, voice-enabled chatbots, real-time speech-to-speech translation, voice-driven avatars, or any WebSocket-based audio streaming with AI models. Supports Server VAD (Voice Activity Detection), turn-based conversation, function calling, MCP tools, avatar integration, and transcription.
package: azure-ai-voicelive
---
# Azure AI Voice Live SDK

Build real-time voice AI applications with bidirectional WebSocket communication.

## Installation

```bash
pip install azure-ai-voicelive aiohttp azure-identity
```

## Environment Variables

```bash
AZURE_COGNITIVE_SERVICES_ENDPOINT=https://<region>.api.cognitive.microsoft.com
# For API key auth (not recommended for production)
AZURE_COGNITIVE_SERVICES_KEY=<api-key>
```
## Authentication

**DefaultAzureCredential (preferred)**:
```python
import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=DefaultAzureCredential(),
    model="gpt-4o-realtime-preview",
    credential_scopes=["https://cognitiveservices.azure.com/.default"]
) as conn:
    ...
```

**API Key**:
```python
import os

from azure.ai.voicelive.aio import connect
from azure.core.credentials import AzureKeyCredential

async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_COGNITIVE_SERVICES_KEY"]),
    model="gpt-4o-realtime-preview"
) as conn:
    ...
```
## Quick Start

```python
import asyncio
import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async def main():
    async with connect(
        endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
        credential=DefaultAzureCredential(),
        model="gpt-4o-realtime-preview",
        credential_scopes=["https://cognitiveservices.azure.com/.default"]
    ) as conn:
        # Update session with instructions
        await conn.session.update(session={
            "instructions": "You are a helpful assistant.",
            "modalities": ["text", "audio"],
            "voice": "alloy"
        })

        # Listen for events
        async for event in conn:
            print(f"Event: {event.type}")
            if event.type == "response.audio_transcript.done":
                print(f"Transcript: {event.transcript}")
            elif event.type == "response.done":
                break

asyncio.run(main())
```
## Core Architecture

### Connection Resources

The `VoiceLiveConnection` exposes these resources:

| Resource | Purpose | Key Methods |
|----------|---------|-------------|
| `conn.session` | Session configuration | `update(session=...)` |
| `conn.response` | Model responses | `create()`, `cancel()` |
| `conn.input_audio_buffer` | Audio input | `append()`, `commit()`, `clear()` |
| `conn.output_audio_buffer` | Audio output | `clear()` |
| `conn.conversation` | Conversation state | `item.create()`, `item.delete()`, `item.truncate()` |
| `conn.transcription_session` | Transcription config | `update(session=...)` |
## Session Configuration

```python
from azure.ai.voicelive.models import RequestSession, FunctionTool

await conn.session.update(session=RequestSession(
    instructions="You are a helpful voice assistant.",
    modalities=["text", "audio"],
    voice="alloy",  # or "echo", "shimmer", "sage", etc.
    input_audio_format="pcm16",
    output_audio_format="pcm16",
    turn_detection={
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500
    },
    tools=[
        FunctionTool(
            type="function",
            name="get_weather",
            description="Get current weather",
            parameters={
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        )
    ]
))
```
## Audio Streaming

### Send Audio (Base64 PCM16)

```python
import base64

# Read audio chunk (16-bit PCM, 24kHz mono)
audio_chunk = await read_audio_from_microphone()
b64_audio = base64.b64encode(audio_chunk).decode()

await conn.input_audio_buffer.append(audio=b64_audio)
```

### Receive Audio

```python
async for event in conn:
    if event.type == "response.audio.delta":
        audio_bytes = base64.b64decode(event.delta)
        await play_audio(audio_bytes)
    elif event.type == "response.audio.done":
        print("Audio complete")
```
## Event Handling

```python
import base64
import json

async for event in conn:
    match event.type:
        # Session events
        case "session.created":
            print(f"Session: {event.session}")
        case "session.updated":
            print("Session updated")

        # Audio input events
        case "input_audio_buffer.speech_started":
            print(f"Speech started at {event.audio_start_ms}ms")
        case "input_audio_buffer.speech_stopped":
            print(f"Speech stopped at {event.audio_end_ms}ms")

        # Transcription events
        case "conversation.item.input_audio_transcription.completed":
            print(f"User said: {event.transcript}")
        case "conversation.item.input_audio_transcription.delta":
            print(f"Partial: {event.delta}")

        # Response events
        case "response.created":
            print(f"Response started: {event.response.id}")
        case "response.audio_transcript.delta":
            print(event.delta, end="", flush=True)
        case "response.audio.delta":
            audio = base64.b64decode(event.delta)
        case "response.done":
            print(f"Response complete: {event.response.status}")

        # Function calls
        case "response.function_call_arguments.done":
            result = handle_function(event.name, event.arguments)
            await conn.conversation.item.create(item={
                "type": "function_call_output",
                "call_id": event.call_id,
                "output": json.dumps(result)
            })
            await conn.response.create()

        # Errors
        case "error":
            print(f"Error: {event.error.message}")
```
## Common Patterns

### Manual Turn Mode (No VAD)

```python
await conn.session.update(session={"turn_detection": None})

# Manually control turns
await conn.input_audio_buffer.append(audio=b64_audio)
await conn.input_audio_buffer.commit()  # End of user turn
await conn.response.create()  # Trigger response
```

### Interrupt Handling

```python
async for event in conn:
    if event.type == "input_audio_buffer.speech_started":
        # User interrupted - cancel current response
        await conn.response.cancel()
        await conn.output_audio_buffer.clear()
```
### Conversation History

```python
# Add system message
await conn.conversation.item.create(item={
    "type": "message",
    "role": "system",
    "content": [{"type": "input_text", "text": "Be concise."}]
})

# Add user message
await conn.conversation.item.create(item={
    "type": "message",
    "role": "user",
    "content": [{"type": "input_text", "text": "Hello!"}]
})

await conn.response.create()
```
## Voice Options

| Voice | Description |
|-------|-------------|
| `alloy` | Neutral, balanced |
| `echo` | Warm, conversational |
| `shimmer` | Clear, professional |
| `sage` | Calm, authoritative |
| `coral` | Friendly, upbeat |
| `ash` | Deep, measured |
| `ballad` | Expressive |
| `verse` | Storytelling |

Azure voices: Use `AzureStandardVoice`, `AzureCustomVoice`, or `AzurePersonalVoice` models.
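As a sketch of selecting an Azure voice, the dict form of the session payload can carry a voice object instead of a bare OpenAI voice name. The discriminator string `"azure-standard"` and the voice name `en-US-AvaNeural` are assumed examples here, not verified values; check the SDK's voice models and the Azure voice gallery for the real ones.

```python
# Sketch: choosing an Azure standard voice via the dict session form.
# "azure-standard" is an assumed discriminator for AzureStandardVoice,
# and "en-US-AvaNeural" is an assumed example voice name.
session_config = {
    "voice": {
        "type": "azure-standard",
        "name": "en-US-AvaNeural",
    },
    "modalities": ["text", "audio"],
}

# Applied like any other session update:
#   await conn.session.update(session=session_config)
```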
## Audio Formats

| Format | Sample Rate | Use Case |
|--------|-------------|----------|
| `pcm16` | 24kHz | Default, high quality |
| `pcm16-8000hz` | 8kHz | Telephony |
| `pcm16-16000hz` | 16kHz | Voice assistants |
| `g711_ulaw` | 8kHz | Telephony (US) |
| `g711_alaw` | 8kHz | Telephony (EU) |
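For example, a telephony integration would pick the 8 kHz formats from the table. This is a minimal sketch of the session fields involved, using the format strings listed above:

```python
# Sketch: session fields for a US telephony integration, taking the
# 8 kHz format names from the table above.
telephony_session = {
    "input_audio_format": "pcm16-8000hz",  # 8 kHz PCM from the phone line
    "output_audio_format": "g711_ulaw",    # mu-law encoding (US telephony)
}

# Applied like any other session update:
#   await conn.session.update(session=telephony_session)
```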
## Turn Detection Options

```python
# Server VAD (default)
{"type": "server_vad", "threshold": 0.5, "silence_duration_ms": 500}

# Azure Semantic VAD (smarter detection)
{"type": "azure_semantic_vad"}
{"type": "azure_semantic_vad_en"}  # English optimized
{"type": "azure_semantic_vad_multilingual"}
```
## Error Handling

```python
from azure.ai.voicelive.aio import ConnectionError, ConnectionClosed

try:
    async with connect(...) as conn:
        async for event in conn:
            if event.type == "error":
                print(f"API Error: {event.error.code} - {event.error.message}")
except ConnectionClosed as e:
    print(f"Connection closed: {e.code} - {e.reason}")
except ConnectionError as e:
    print(f"Connection error: {e}")
```
## References

- **Detailed API Reference**: See [references/api-reference.md](references/api-reference.md)
- **Complete Examples**: See [references/examples.md](references/examples.md)
- **All Models & Types**: See [references/models.md](references/models.md)