refactor: flatten Microsoft skills from nested to flat directory structure

Rewrote sync_microsoft_skills.py (v4) to use each SKILL.md's frontmatter 'name' field as the flat directory name under skills/, replacing the nested skills/official/microsoft/<lang>/<category>/<service>/ hierarchy. This fixes CI failures caused by the indexing, validation, and catalog scripts expecting skills/<id>/SKILL.md (depth 1).

Changes:
- Rewrite scripts/sync_microsoft_skills.py for flat output with collision detection
- Update scripts/tests/inspect_microsoft_repo.py for flat name mapping
- Update scripts/tests/test_comprehensive_coverage.py for name uniqueness checks
- Delete skills/official/ nested directory
- Add 129 Microsoft skills as flat directories (e.g. skills/azure-mgmt-botservice-dotnet/)
- Move attribution files to docs/ (LICENSE-MICROSOFT, microsoft-skills-attribution.json)
- Rebuild skills_index.json, CATALOG.md, README.md (845 total skills)
2026-02-12 00:07:15 +05:00
parent e06454dafd
commit e7ae616385
142 changed files with 5683 additions and 6097 deletions
--- a/skills/azure-speech-to-text-rest-py/SKILL.md
+++ b/skills/azure-speech-to-text-rest-py/SKILL.md
@@ -0,0 +1,372 @@
+---
+name: azure-speech-to-text-rest-py
+description: |
+  Azure Speech to Text REST API for short audio (Python). Use for simple speech recognition of audio files up to 60 seconds without the Speech SDK.
+  Triggers: "speech to text REST", "short audio transcription", "speech recognition REST API", "STT REST", "recognize speech REST".
+  DO NOT USE FOR: Long audio (>60 seconds), real-time streaming, batch transcription, custom speech models, speech translation. Use Speech SDK or Batch Transcription API instead.
+---
+
+# Azure Speech to Text REST API for Short Audio
+
+Simple REST API for speech-to-text transcription of short audio files (up to 60 seconds). No SDK required - just HTTP requests.
+
+## Prerequisites
+
+1. **Azure subscription** - [Create one free](https://azure.microsoft.com/free/)
+2. **Speech resource** - Create in [Azure Portal](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices)
+3. **Get credentials** - After deployment, go to resource > Keys and Endpoint
+
+## Environment Variables
+
+```bash
+# Required
+AZURE_SPEECH_KEY=<your-speech-resource-key>
+AZURE_SPEECH_REGION=<region>  # e.g., eastus, westus2, westeurope
+
+# Alternative: Use endpoint directly
+AZURE_SPEECH_ENDPOINT=https://<region>.stt.speech.microsoft.com
+```
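+
+The variables above can be turned into a request URL in one place. The following is a minimal sketch, not part of the official API: `recognition_url` and `RECOGNITION_PATH` are hypothetical names, and the fallback order (endpoint first, then region) is an assumption.
+
+```python
+import os
+
+# Path of the short-audio recognition API, as used in the examples in this document.
+RECOGNITION_PATH = "/speech/recognition/conversation/cognitiveservices/v1"
+
+def recognition_url(env=None) -> str:
+    """Build the recognition URL from AZURE_SPEECH_ENDPOINT or AZURE_SPEECH_REGION."""
+    env = os.environ if env is None else env
+    endpoint = env.get("AZURE_SPEECH_ENDPOINT")
+    if not endpoint:
+        # Assumption: fall back to the regional host when no endpoint is set.
+        endpoint = f"https://{env['AZURE_SPEECH_REGION']}.stt.speech.microsoft.com"
+    return endpoint.rstrip("/") + RECOGNITION_PATH
+```
+
+Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.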
+
+## Installation
+
+```bash
+pip install requests
+```
+
+## Quick Start
+
+```python
+import os
+import requests
+
+def transcribe_audio(audio_file_path: str, language: str = "en-US") -> dict:
+    """Transcribe short audio file (max 60 seconds) using REST API."""
+    region = os.environ["AZURE_SPEECH_REGION"]
+    api_key = os.environ["AZURE_SPEECH_KEY"]
+    
+    url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
+    
+    headers = {
+        "Ocp-Apim-Subscription-Key": api_key,
+        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
+        "Accept": "application/json"
+    }
+    
+    params = {
+        "language": language,
+        "format": "detailed"  # or "simple"
+    }
+    
+    with open(audio_file_path, "rb") as audio_file:
+        response = requests.post(url, headers=headers, params=params, data=audio_file)
+    
+    response.raise_for_status()
+    return response.json()
+
+# Usage
+result = transcribe_audio("audio.wav", "en-US")
+# With format="detailed", the recognized text is in NBest[0]["Display"]
+print(result["NBest"][0]["Display"])
+```
+
+## Audio Requirements
+
+| Format | Codec | Sample Rate | Notes |
+|--------|-------|-------------|-------|
+| WAV | PCM | 16 kHz, mono | **Recommended** |
+| OGG | OPUS | 16 kHz, mono | Smaller file size |
+
+**Limitations:**
+- Maximum 60 seconds of audio
+- For pronunciation assessment: maximum 30 seconds
+- No partial/interim results (final only)
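+
+The limits above can be checked locally before uploading. This sketch uses the standard-library `wave` module, which only reads PCM WAV files; `check_wav` is a hypothetical helper, and rejecting anything other than 16 kHz mono is an assumption based on the recommended format in the table.
+
+```python
+import wave
+
+MAX_SECONDS = 60.0  # short-audio limit stated above
+
+def check_wav(path: str, max_seconds: float = MAX_SECONDS) -> float:
+    """Return the clip duration in seconds, or raise ValueError if it breaks the limits."""
+    with wave.open(path, "rb") as wav:
+        channels = wav.getnchannels()
+        rate = wav.getframerate()
+        duration = wav.getnframes() / rate
+    if channels != 1 or rate != 16000:
+        raise ValueError(f"expected 16 kHz mono, got {rate} Hz with {channels} channel(s)")
+    if duration > max_seconds:
+        raise ValueError(f"audio is {duration:.1f}s, limit is {max_seconds:.0f}s")
+    return duration
+```
+
+For pronunciation assessment, the same check can be called with `max_seconds=30`.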
+
+## Content-Type Headers
+
+```python
+# WAV PCM 16kHz
+"Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000"
+
+# OGG OPUS
+"Content-Type": "audio/ogg; codecs=opus"
+```
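+
+One way to avoid hard-coding the header is to derive it from the file extension. This is a sketch only: `content_type_for` is a hypothetical helper, and it assumes the extension reflects the actual encoding of the file.
+
+```python
+from pathlib import Path
+
+# Header values copied from the two formats listed above.
+CONTENT_TYPES = {
+    ".wav": "audio/wav; codecs=audio/pcm; samplerate=16000",
+    ".ogg": "audio/ogg; codecs=opus",
+}
+
+def content_type_for(path: str) -> str:
+    """Pick the Content-Type header value for a supported audio file."""
+    suffix = Path(path).suffix.lower()
+    if suffix not in CONTENT_TYPES:
+        raise ValueError(f"unsupported audio extension: {suffix!r}")
+    return CONTENT_TYPES[suffix]
+```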
+
+## Response Formats
+
+### Simple Format (default)
+
+```python
+params = {"language": "en-US", "format": "simple"}
+```
+
+```json
+{
+  "RecognitionStatus": "Success",
+  "DisplayText": "Remind me to buy 5 pencils.",
+  "Offset": "1236645672289",
+  "Duration": "1236645672289"
+}
+```
+
+### Detailed Format
+
+```python
+params = {"language": "en-US", "format": "detailed"}
+```
+
+```json
+{
+  "RecognitionStatus": "Success",
+  "Offset": "1236645672289",
+  "Duration": "1236645672289",
+  "NBest": [
+    {
+      "Confidence": 0.9052885,
+      "Display": "What's the weather like?",
+      "ITN": "what's the weather like",
+      "Lexical": "what's the weather like",
+      "MaskedITN": "what's the weather like"
+    }
+  ]
+}
+```
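+
+To read the text and confidence out of a detailed response like the one above, something along these lines may help. `best_hypothesis` is a hypothetical helper; choosing the entry with the highest `Confidence` is an assumption about how you want to rank candidates.
+
+```python
+from typing import Optional, Tuple
+
+def best_hypothesis(result: dict) -> Optional[Tuple[str, float]]:
+    """Return (display text, confidence) of the top NBest entry, or None."""
+    candidates = result.get("NBest") or []
+    if result.get("RecognitionStatus") != "Success" or not candidates:
+        return None
+    # Pick the candidate the service is most confident about.
+    best = max(candidates, key=lambda c: c["Confidence"])
+    return best["Display"], best["Confidence"]
+```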
+
+## Chunked Transfer (Recommended)
+
+For lower latency, stream audio in chunks:
+
+```python
+import os
+import requests
+
+def transcribe_chunked(audio_file_path: str, language: str = "en-US") -> dict:
+    """Stream audio in chunks for lower latency."""
+    region = os.environ["AZURE_SPEECH_REGION"]
+    api_key = os.environ["AZURE_SPEECH_KEY"]
+    
+    url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
+    
+    headers = {
+        "Ocp-Apim-Subscription-Key": api_key,
+        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
+        "Accept": "application/json",
+        "Transfer-Encoding": "chunked",  # requests also sets this itself for generator bodies
+        "Expect": "100-continue"
+    }
+    
+    params = {"language": language, "format": "detailed"}
+    
+    def generate_chunks(file_path: str, chunk_size: int = 1024):
+        with open(file_path, "rb") as f:
+            while chunk := f.read(chunk_size):
+                yield chunk
+    
+    response = requests.post(
+        url, 
+        headers=headers, 
+        params=params, 
+        data=generate_chunks(audio_file_path)
+    )
+    
+    response.raise_for_status()
+    return response.json()
+```
+
+## Authentication Options
+
+### Option 1: Subscription Key (Simple)
+
+```python
+import os
+
+headers = {
+    "Ocp-Apim-Subscription-Key": os.environ["AZURE_SPEECH_KEY"]
+}
+```
+
+### Option 2: Bearer Token
+
+```python
+import requests
+import os
+
+def get_access_token() -> str:
+    """Get access token from the token endpoint."""
+    region = os.environ["AZURE_SPEECH_REGION"]
+    api_key = os.environ["AZURE_SPEECH_KEY"]
+    
+    token_url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
+    
+    response = requests.post(
+        token_url,
+        headers={
+            "Ocp-Apim-Subscription-Key": api_key,
+            "Content-Type": "application/x-www-form-urlencoded",
+            "Content-Length": "0"
+        }
+    )
+    response.raise_for_status()
+    return response.text
+
+# Use token in requests (valid for 10 minutes)
+token = get_access_token()
+headers = {
+    "Authorization": f"Bearer {token}",
+    "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
+    "Accept": "application/json"
+}
+```
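+
+Because a token is valid for 10 minutes, it can be reused across requests instead of being fetched every time. The sketch below caches it for 9 minutes; `TokenCache` is a hypothetical class, and the `fetch` and `clock` parameters are illustrative injection points rather than part of any Azure library.
+
+```python
+import time
+from typing import Callable, Optional
+
+class TokenCache:
+    """Reuse a token until its time-to-live elapses, then fetch a new one."""
+
+    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 9 * 60,
+                 clock: Callable[[], float] = time.monotonic):
+        self._fetch = fetch
+        self._ttl = ttl_seconds
+        self._clock = clock
+        self._token: Optional[str] = None
+        self._expires_at = 0.0
+
+    def get(self) -> str:
+        now = self._clock()
+        if self._token is None or now >= self._expires_at:
+            # Refresh one minute before the 10-minute validity ends.
+            self._token = self._fetch()
+            self._expires_at = now + self._ttl
+        return self._token
+```
+
+With the function defined above, usage would look like `cache = TokenCache(get_access_token)` followed by `cache.get()` wherever a token is needed.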
+
+## Query Parameters
+
+| Parameter | Required | Values | Description |
+|-----------|----------|--------|-------------|
+| `language` | **Yes** | `en-US`, `de-DE`, etc. | Language of speech |
+| `format` | No | `simple`, `detailed` | Result format (default: simple) |
+| `profanity` | No | `masked`, `removed`, `raw` | Profanity handling (default: masked) |
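+
+The table above can be enforced before a request is sent. This is a minimal sketch: `build_params` is a hypothetical helper, and validating on the client side is a design choice, not something the service requires.
+
+```python
+ALLOWED_FORMATS = ("simple", "detailed")
+ALLOWED_PROFANITY = ("masked", "removed", "raw")
+
+def build_params(language: str, result_format: str = "simple",
+                 profanity: str = "masked") -> dict:
+    """Build the query parameters, rejecting values not listed in the table."""
+    if not language:
+        raise ValueError("language is required, e.g. 'en-US'")
+    if result_format not in ALLOWED_FORMATS:
+        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")
+    if profanity not in ALLOWED_PROFANITY:
+        raise ValueError(f"profanity must be one of {ALLOWED_PROFANITY}")
+    return {"language": language, "format": result_format, "profanity": profanity}
+```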
+
+## Recognition Status Values
+
+| Status | Description |
+|--------|-------------|
+| `Success` | Recognition succeeded |
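+| `NoMatch` | Speech detected, but no words in the target language were matched |
+| `InitialSilenceTimeout` | Audio started with only silence; the service timed out waiting for speech |
+| `BabbleTimeout` | Audio started with only noise; the service timed out waiting for speech |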
+| `Error` | Internal service error |
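+
+A response can be reduced to plain text by branching on these values. The policy in this sketch is an assumption: `extract_text` is a hypothetical helper that returns an empty string when no speech was recognized and raises for `Error` or any unknown status.
+
+```python
+# Statuses that mean "nothing recognizable was said" rather than a failure.
+EMPTY_STATUSES = {"NoMatch", "InitialSilenceTimeout", "BabbleTimeout"}
+
+def extract_text(result: dict) -> str:
+    """Return the recognized text, '' when nothing was recognized, or raise."""
+    status = result.get("RecognitionStatus")
+    if status == "Success":
+        if "DisplayText" in result:
+            return result["DisplayText"]
+        # Detailed responses carry the text inside the NBest list.
+        return result["NBest"][0]["Display"]
+    if status in EMPTY_STATUSES:
+        return ""
+    raise RuntimeError(f"Recognition failed with status {status!r}")
+```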
+
+## Profanity Handling
+
+```python
+# Mask profanity with asterisks (default)
+params = {"language": "en-US", "profanity": "masked"}
+
+# Remove profanity entirely
+params = {"language": "en-US", "profanity": "removed"}
+
+# Include profanity as-is
+params = {"language": "en-US", "profanity": "raw"}
+```
+
+## Error Handling
+
+```python
+import os
+import requests
+
+def transcribe_with_error_handling(audio_path: str, language: str = "en-US") -> dict | None:
+    """Transcribe with proper error handling."""
+    region = os.environ["AZURE_SPEECH_REGION"]
+    api_key = os.environ["AZURE_SPEECH_KEY"]
+    
+    url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
+    
+    try:
+        with open(audio_path, "rb") as audio_file:
+            response = requests.post(
+                url,
+                headers={
+                    "Ocp-Apim-Subscription-Key": api_key,
+                    "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
+                    "Accept": "application/json"
+                },
+                params={"language": language, "format": "detailed"},
+                data=audio_file
+            )
+        
+        if response.status_code == 200:
+            result = response.json()
+            if result.get("RecognitionStatus") == "Success":
+                return result
+            else:
+                print(f"Recognition failed: {result.get('RecognitionStatus')}")
+                return None
+        elif response.status_code == 400:
+            print("Bad request: Check language code or audio format")
+        elif response.status_code == 401:
+            print("Unauthorized: Check API key or token")
+        elif response.status_code == 403:
+            print("Forbidden: Missing authorization header")
+        else:
+            print(f"Error {response.status_code}: {response.text}")
+        
+        return None
+        
+    except requests.exceptions.RequestException as e:
+        print(f"Request failed: {e}")
+        return None
+```
+
+## Async Version
+
+```python
+import os
+import aiohttp
+import asyncio
+
+async def transcribe_async(audio_file_path: str, language: str = "en-US") -> dict:
+    """Async version using aiohttp."""
+    region = os.environ["AZURE_SPEECH_REGION"]
+    api_key = os.environ["AZURE_SPEECH_KEY"]
+    
+    url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
+    
+    headers = {
+        "Ocp-Apim-Subscription-Key": api_key,
+        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
+        "Accept": "application/json"
+    }
+    
+    params = {"language": language, "format": "detailed"}
+    
+    async with aiohttp.ClientSession() as session:
+        with open(audio_file_path, "rb") as f:
+            audio_data = f.read()
+        
+        async with session.post(url, headers=headers, params=params, data=audio_data) as response:
+            response.raise_for_status()
+            return await response.json()
+
+# Usage
+result = asyncio.run(transcribe_async("audio.wav", "en-US"))
+# With format="detailed", the recognized text is in NBest[0]["Display"]
+print(result["NBest"][0]["Display"])
+```
+
+## Supported Languages
+
+Common language codes (see [full list](https://learn.microsoft.com/azure/ai-services/speech-service/language-support)):
+
+| Code | Language |
+|------|----------|
+| `en-US` | English (US) |
+| `en-GB` | English (UK) |
+| `de-DE` | German |
+| `fr-FR` | French |
+| `es-ES` | Spanish (Spain) |
+| `es-MX` | Spanish (Mexico) |
+| `zh-CN` | Chinese (Mandarin) |
+| `ja-JP` | Japanese |
+| `ko-KR` | Korean |
+| `pt-BR` | Portuguese (Brazil) |
+
+## Best Practices
+
+1. **Use WAV PCM 16kHz mono** for best compatibility
+2. **Enable chunked transfer** for lower latency
+3. **Cache access tokens** for 9 minutes (valid for 10)
+4. **Specify the correct language** for accurate recognition
+5. **Use detailed format** when you need confidence scores
+6. **Handle all RecognitionStatus values** in production code
+
+## When NOT to Use This API
+
+Use the Speech SDK or Batch Transcription API instead when you need:
+
+- Audio longer than 60 seconds
+- Real-time streaming transcription
+- Partial/interim results
+- Speech translation
+- Custom speech models
+- Batch transcription of many files
+
+## Reference Files
+
+| File | Contents |
+|------|----------|
+| [references/pronunciation-assessment.md](references/pronunciation-assessment.md) | Pronunciation assessment parameters and scoring |