feat: add 4 universal skills from cli-ai-skills
- Add audio-transcriber skill (v1.2.0): Transform audio to Markdown with Whisper
- Add youtube-summarizer skill (v1.2.0): Generate summaries from YouTube videos
- Update prompt-engineer skill: Enhanced with 11 optimization frameworks
- Update skill-creator skill: Improved automation workflow

All skills are zero-config, cross-platform (Claude Code, Copilot CLI, Codex) and follow Quality Bar V4 standards.

Source: https://github.com/ericgandrade/cli-ai-skills
558
skills/audio-transcriber/SKILL.md
Normal file
@@ -0,0 +1,558 @@
---
name: audio-transcriber
description: "Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration"
version: 1.2.0
author: Eric Andrade
created: 2025-02-01
updated: 2026-02-04
platforms: [github-copilot-cli, claude-code, codex]
category: content
tags: [audio, transcription, whisper, meeting-minutes, speech-to-text]
risk: safe
---

## Purpose

This skill automates audio-to-text transcription with professional Markdown output, extracting rich technical metadata (speakers, timestamps, language, file size, duration) and generating structured meeting minutes and executive summaries. It uses Faster-Whisper or Whisper with zero configuration, working universally across projects without hardcoded paths or API keys.

Inspired by tools like Plaud, this skill transforms raw audio recordings into actionable documentation, making it ideal for meetings, interviews, lectures, and content analysis.

## When to Use

Invoke this skill when:

- User needs to transcribe audio/video files to text
- User wants meeting minutes automatically generated from recordings
- User requires speaker identification (diarization) in conversations
- User needs subtitles/captions (SRT, VTT formats)
- User wants executive summaries of long audio content
- User asks variations of "transcribe this audio", "convert audio to text", "generate meeting notes from recording"
- User has audio files in common formats (MP3, WAV, M4A, OGG, FLAC, WEBM)

## Workflow

### Step 0: Discovery (Auto-detect Transcription Tools)

**Objective:** Identify available transcription engines without user configuration.

**Actions:**

Run detection commands to find installed tools:

```bash
# Check for Faster-Whisper (preferred - 4-5x faster)
if python3 -c "import faster_whisper" 2>/dev/null; then
  TRANSCRIBER="faster-whisper"
  echo "✅ Faster-Whisper detected (optimized)"
# Fallback to original Whisper
elif python3 -c "import whisper" 2>/dev/null; then
  TRANSCRIBER="whisper"
  echo "✅ OpenAI Whisper detected"
else
  TRANSCRIBER="none"
  echo "⚠️ No transcription tool found"
fi

# Check for ffmpeg (audio format conversion)
if command -v ffmpeg &>/dev/null; then
  echo "✅ ffmpeg available (format conversion enabled)"
else
  echo "ℹ️ ffmpeg not found (limited format support)"
fi
```
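
For callers that drive the skill from Python rather than the shell, the same discovery can be sketched as below. This is a minimal illustration, not part of the skill: the function names `detect_transcriber` and `has_ffmpeg` are hypothetical, and `importlib.util.find_spec` only checks that a module can be located, so a broken installation would still pass.

```python
import importlib.util
import shutil


def detect_transcriber():
    """Return the preferred installed engine name, or None if neither is found."""
    # Same order as the shell check: Faster-Whisper first, then Whisper.
    candidates = (("faster_whisper", "faster-whisper"), ("whisper", "whisper"))
    for module, label in candidates:
        if importlib.util.find_spec(module) is not None:
            return label
    return None


def has_ffmpeg():
    """True when an ffmpeg executable is on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("transcriber:", detect_transcriber() or "none")
    print("ffmpeg:", "available" if has_ffmpeg() else "not found")
```

Unlike the shell version, this lookup does not import the module, which keeps the check fast but does not prove the package actually loads.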

**If no transcriber found:**

Offer automatic installation using the provided script:

```bash
echo "⚠️ No transcription tool found"
echo ""
echo "🔧 Auto-install dependencies? (Recommended)"
read -p "Run installation script? [Y/n]: " AUTO_INSTALL

if [[ ! "$AUTO_INSTALL" =~ ^[Nn] ]]; then
  # Get skill directory (works for both repo and symlinked installations)
  SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

  # Run installation script
  if [[ -f "$SKILL_DIR/scripts/install-requirements.sh" ]]; then
    bash "$SKILL_DIR/scripts/install-requirements.sh"
  else
    echo "❌ Installation script not found"
    echo ""
    echo "📦 Manual installation:"
    echo " pip install faster-whisper # Recommended"
    echo " pip install openai-whisper # Alternative"
    echo " brew install ffmpeg # Optional (macOS)"
    exit 1
  fi

  # Verify installation succeeded
  if python3 -c "import faster_whisper" 2>/dev/null || python3 -c "import whisper" 2>/dev/null; then
    echo "✅ Installation successful! Proceeding with transcription..."
  else
    echo "❌ Installation failed. Please install manually."
    exit 1
  fi
else
  echo ""
  echo "📦 Manual installation required:"
  echo ""
  echo "Recommended (fastest):"
  echo " pip install faster-whisper"
  echo ""
  echo "Alternative (original):"
  echo " pip install openai-whisper"
  echo ""
  echo "Optional (format conversion):"
  echo " brew install ffmpeg # macOS"
  echo " apt install ffmpeg # Linux"
  echo ""
  exit 1
fi
```

This ensures users can install dependencies with one confirmation, or opt for manual installation if preferred.

**If transcriber found:**

Proceed to Step 0b (CLI Detection).


### Step 1: Validate Audio File

**Objective:** Verify file exists, check format, and extract metadata.

**Actions:**

1. **Accept file path or URL** from user:
   - Local file: `meeting.mp3`
   - URL: `https://example.com/audio.mp3` (download to temp directory)

2. **Verify file exists:**

   ```bash
   if [[ ! -f "$AUDIO_FILE" ]]; then
     echo "❌ File not found: $AUDIO_FILE"
     exit 1
   fi
   ```

3. **Extract metadata** using ffprobe or file utilities:

   ```bash
   # Get file size
   FILE_SIZE=$(du -h "$AUDIO_FILE" | cut -f1)

   # Get duration and format using ffprobe
   DURATION=$(ffprobe -v error -show_entries format=duration \
     -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null)
   FORMAT=$(ffprobe -v error -select_streams a:0 -show_entries \
     stream=codec_name -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null)

   # Convert duration (fractional seconds) to HH:MM:SS with shell arithmetic;
   # `date -r` behaves differently on GNU and BSD systems
   if [[ "$DURATION" =~ ^[0-9]+ ]]; then
     SECS=${BASH_REMATCH[0]}
     DURATION_HMS=$(printf '%02d:%02d:%02d' $((SECS / 3600)) $((SECS % 3600 / 60)) $((SECS % 60)))
   else
     DURATION_HMS="Unknown"
   fi
   ```

4. **Check file size** (warn if large for cloud APIs):

   ```bash
   SIZE_MB=$(du -m "$AUDIO_FILE" | cut -f1)
   if [[ $SIZE_MB -gt 25 ]]; then
     echo "⚠️ Large file ($FILE_SIZE) - processing may take several minutes"
   fi
   ```

5. **Validate format** (supported: MP3, WAV, M4A, OGG, FLAC, WEBM, MP4):

   ```bash
   # Lowercase the extension with tr (works on Bash 3.2, unlike ${VAR,,})
   EXTENSION=$(printf '%s' "${AUDIO_FILE##*.}" | tr '[:upper:]' '[:lower:]')
   SUPPORTED_FORMATS=("mp3" "wav" "m4a" "ogg" "flac" "webm" "mp4")

   if [[ ! " ${SUPPORTED_FORMATS[*]} " =~ " ${EXTENSION} " ]]; then
     echo "⚠️ Unsupported format: $EXTENSION"
     if command -v ffmpeg &>/dev/null; then
       echo "🔄 Converting to WAV..."
       ffmpeg -i "$AUDIO_FILE" -ar 16000 "${AUDIO_FILE%.*}.wav" -y
       AUDIO_FILE="${AUDIO_FILE%.*}.wav"
     else
       echo "❌ Install ffmpeg to convert formats: brew install ffmpeg"
       exit 1
     fi
   fi
   ```
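
The duration conversion and the extension check from the steps above can also be expressed in Python. The sketch below is illustrative only: `format_duration` and `is_supported` are hypothetical helper names, and the supported set mirrors the `SUPPORTED_FORMATS` array in the shell snippet.

```python
from pathlib import Path

# Mirrors the SUPPORTED_FORMATS array used in the shell validation step.
SUPPORTED_FORMATS = {"mp3", "wav", "m4a", "ogg", "flac", "webm", "mp4"}


def format_duration(seconds):
    """Convert a duration in (possibly fractional) seconds to HH:MM:SS."""
    total = int(float(seconds))  # ffprobe prints fractional values as text
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def is_supported(path):
    """Case-insensitive check of the file extension against the supported set."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_FORMATS
```

For example, `format_duration("2732.48")` yields `00:45:32`, and `is_supported("Meeting.MP3")` is true because the extension is lowercased before the lookup.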


### Step 3: Generate Markdown Output

**Objective:** Create structured Markdown with metadata, transcription, meeting minutes, and summary.

**Output Template:**

```markdown
# Audio Transcription Report

## 📊 Metadata

| Field | Value |
|-------|-------|
| **File Name** | {filename} |
| **File Size** | {file_size} |
| **Duration** | {duration_hms} |
| **Language** | {language} ({language_code}) |
| **Processed Date** | {process_date} |
| **Speakers Identified** | {num_speakers} |
| **Transcription Engine** | {engine} (model: {model}) |


## 📋 Meeting Minutes

### Participants
- {speaker_1}
- {speaker_2}
- ...

### Topics Discussed
1. **{topic_1}** ({timestamp})
   - {key_point_1}
   - {key_point_2}

2. **{topic_2}** ({timestamp})
   - {key_point_1}

### Decisions Made
- ✅ {decision_1}
- ✅ {decision_2}

### Action Items
- [ ] **{action_1}** - Assigned to: {speaker} - Due: {date_if_mentioned}
- [ ] **{action_2}** - Assigned to: {speaker}


*Generated by audio-transcriber skill v1.2.0*
*Transcription engine: {engine} | Processing time: {elapsed_time}s*
```
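
As a rough illustration of how the metadata section of this template might be filled in, the sketch below renders a dictionary as the two-column table. The function name `render_metadata_table` is hypothetical, and the sample values in the usage note are made up for the example.

```python
def render_metadata_table(metadata):
    """Render field/value pairs as a Markdown table in the template's layout."""
    lines = ["| Field | Value |", "|-------|-------|"]
    for field, value in metadata.items():
        # Field names are bolded, as in the output template.
        lines.append(f"| **{field}** | {value} |")
    return "\n".join(lines)
```

Calling `render_metadata_table({"File Name": "meeting.mp3", "Duration": "00:45:32"})` produces a header, a separator, and one row per entry, in insertion order.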

**Implementation:**

Use Python or bash with AI model (Claude/GPT) for intelligent summarization:

```python
# NOTE: cluster_by_topic, extract_action_items, extract_decisions, and
# call_ai_model are placeholders to be supplied by the integrator.

def generate_meeting_minutes(segments):
    """Extract topics, decisions, action items from transcription."""

    # Group segments by topic (simple clustering by timestamps)
    topics = cluster_by_topic(segments)

    # Identify action items (keywords: "should", "will", "need to", "action")
    action_items = extract_action_items(segments)

    # Identify decisions (keywords: "decided", "agreed", "approved")
    decisions = extract_decisions(segments)

    return {
        "topics": topics,
        "decisions": decisions,
        "action_items": action_items
    }

def generate_summary(segments, max_paragraphs=5):
    """Create executive summary using AI (Claude/GPT via API or local model)."""

    full_text = " ".join([s["text"] for s in segments])

    # Use Chain of Density approach (from prompt-engineer frameworks)
    summary_prompt = f"""
    Summarize the following transcription in {max_paragraphs} concise paragraphs.
    Focus on key topics, decisions, and action items.

    Transcription:
    {full_text}
    """

    # Call AI model (placeholder - user can integrate Claude API or use local model)
    summary = call_ai_model(summary_prompt)

    return summary
```
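
The keyword-based helpers referenced in the comments above are not defined in this file. One possible way to write them is sketched here, assuming each segment is a dictionary with a `"text"` key (as `generate_summary` implies). The private helper `_matches` is hypothetical, and whole-word matching is a design choice of this sketch, not something the skill specifies.

```python
import re

# Keyword lists taken from the comments in generate_meeting_minutes.
ACTION_KEYWORDS = ("should", "will", "need to", "action")
DECISION_KEYWORDS = ("decided", "agreed", "approved")


def _matches(text, keywords):
    """True if any keyword appears in text as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(k)}\b", lowered) for k in keywords)


def extract_action_items(segments):
    """Return the text of segments that contain an action keyword."""
    return [s["text"].strip() for s in segments if _matches(s["text"], ACTION_KEYWORDS)]


def extract_decisions(segments):
    """Return the text of segments that contain a decision keyword."""
    return [s["text"].strip() for s in segments if _matches(s["text"], DECISION_KEYWORDS)]
```

Word boundaries keep "willing" from matching "will", but plain keyword matching will still produce false positives, so the results are best treated as candidates for review.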

**Output file naming:**

```bash
# v1.1.0: Use a timestamp to avoid overwriting existing files
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
TRANSCRIPT_FILE="transcript-${TIMESTAMP}.md"
ATA_FILE="ata-${TIMESTAMP}.md"

echo "$TRANSCRIPT_CONTENT" > "$TRANSCRIPT_FILE"
echo "✅ Transcript saved: $TRANSCRIPT_FILE"

if [[ -n "$ATA_CONTENT" ]]; then
  echo "$ATA_CONTENT" > "$ATA_FILE"
  echo "✅ Minutes saved: $ATA_FILE"
fi
```
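
The same naming scheme can be sketched in Python. The helper name `output_names` is hypothetical; the `transcript-` and `ata-` prefixes and the `%Y%m%d-%H%M%S` format are taken from the shell snippet above.

```python
from datetime import datetime


def output_names(now=None):
    """Return (transcript, minutes) file names that share one timestamp."""
    # A single timestamp is computed once so both names always match.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return f"transcript-{stamp}.md", f"ata-{stamp}.md"
```

Passing an explicit `now` makes the function deterministic, which is convenient in tests; with no argument it uses the current local time, like `date` in the shell version.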


#### **SCENARIO A: User Provided Custom Prompt**

**Workflow:**

1. **Display user's prompt:**

   ```
   📝 Prompt provided by the user:
   ┌──────────────────────────────────┐
   │ [User's prompt preview]          │
   └──────────────────────────────────┘
   ```

2. **Automatically improve with prompt-engineer (if available):**

   ```bash
   🔧 Improving prompt with prompt-engineer...
   [Invokes: gh copilot -p "improve this prompt: {user_prompt}"]
   ```

3. **Show both versions:**

   ```
   ✨ Improved version:
   ┌──────────────────────────────────┐
   │ Role: You are a documenter...    │
   │ Instructions: Transform...       │
   │ Steps: 1) ... 2) ...             │
   │ End Goal: ...                    │
   └──────────────────────────────────┘

   📝 Original version:
   ┌──────────────────────────────────┐
   │ [User's original prompt]         │
   └──────────────────────────────────┘
   ```

4. **Ask which to use:**

   ```bash
   💡 Use improved version? [y/n] (default: y):
   ```

5. **Process with selected prompt:**
   - If "y": use improved
   - If "n": use original
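
The selection rule in the last step, including the default when the user just presses Enter, might look like the sketch below. The function name `choose_prompt` is hypothetical; the sketch treats any answer beginning with "n" as a refusal and everything else, including empty input, as acceptance of the improved prompt.

```python
def choose_prompt(answer, improved, original):
    """Pick the improved prompt unless the user explicitly answers no."""
    normalized = answer.strip().lower()
    # Empty input falls through to the default (use the improved version).
    if normalized.startswith("n"):
        return original
    return improved
```

Because only a leading "n" is checked, the rule does not depend on which letter the interface uses for the affirmative answer.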


#### **LLM Processing (Both Scenarios)**

Once prompt is finalized:

```python
import subprocess

from rich.progress import Progress, SpinnerColumn, TextColumn

def process_with_llm(transcript, prompt, cli_tool='claude'):
    """Send the prompt plus transcript to the selected CLI; return its output or None."""
    full_prompt = f"{prompt}\n\n---\n\nTranscript:\n\n{transcript}"

    with Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        transient=True
    ) as progress:
        progress.add_task(
            description=f"🤖 Processing with {cli_tool}...",
            total=None
        )

        if cli_tool == 'claude':
            result = subprocess.run(
                ['claude', '-'],
                input=full_prompt,
                capture_output=True,
                text=True,
                timeout=300  # 5 minutes
            )
        elif cli_tool == 'gh-copilot':
            result = subprocess.run(
                ['gh', 'copilot', 'suggest', '-t', 'shell', full_prompt],
                capture_output=True,
                text=True,
                timeout=300
            )
        else:
            # Fail early instead of reaching an undefined `result` below
            raise ValueError(f"Unsupported cli_tool: {cli_tool}")

    if result.returncode == 0:
        return result.stdout.strip()
    else:
        return None
```
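
The function above does not handle a missing executable or a timeout, both of which `subprocess.run` reports as exceptions. A small wrapper along these lines could cover those cases. This is a sketch under assumptions: `run_cli` is a hypothetical helper, and the demo command is a stand-in that echoes its input so the example runs without any LLM CLI installed.

```python
import subprocess
import sys


def run_cli(command, prompt, timeout=300):
    """Run a CLI with the prompt on stdin; return stdout, or None on any failure."""
    try:
        result = subprocess.run(
            command, input=prompt, capture_output=True, text=True, timeout=timeout
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # A missing executable or a hung process is treated like a failed run.
        return None
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    # Stand-in command: uppercases whatever arrives on stdin.
    echo_upper = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    print(run_cli(echo_upper, "hello"))
```

Returning `None` for every failure keeps the caller's check identical to the one in `process_with_llm`, at the cost of hiding which failure occurred.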

**Progress output:**

```
🤖 Processing with claude... ⠋
[After completion:]
✅ Minutes generated successfully!
```


#### **Final Output**

**Success (both files):**

```bash
💾 Saving files...

✅ Files created:
- transcript-20260203-023045.md (raw transcript)
- ata-20260203-023045.md (processed with LLM)

🧹 Removed temporary files: metadata.json, transcription.json

✅ Done! Total time: 3m 45s
```

**Transcript only (user declined LLM):**

```bash
💾 Saving files...

✅ File created:
- transcript-20260203-023045.md

ℹ️ Minutes not generated (LLM processing declined by the user)

🧹 Removed temporary files: metadata.json, transcription.json

✅ Done!
```


### Step 5: Display Results Summary

**Objective:** Show completion status and next steps.

**Output:**

```bash
echo ""
echo "✅ Transcription Complete!"
echo ""
echo "📊 Results:"
echo " File: $OUTPUT_FILE"
echo " Language: $LANGUAGE"
echo " Duration: $DURATION_HMS"
echo " Speakers: $NUM_SPEAKERS"
echo " Words: $WORD_COUNT"
echo " Processing time: ${ELAPSED_TIME}s"
echo ""
echo "📝 Generated:"
echo " - $OUTPUT_FILE (Markdown report)"
# If alternative formats were generated:
echo " - ${OUTPUT_FILE%.*}.srt (Subtitles)"
echo " - ${OUTPUT_FILE%.*}.json (Structured data)"
echo ""
echo "🎯 Next steps:"
echo " 1. Review meeting minutes and action items"
echo " 2. Share report with participants"
echo " 3. Track action items to completion"
```
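
The summary prints a word count, but this section does not show how `$WORD_COUNT` is obtained. One way to compute it is sketched here, assuming segments are dictionaries with a `"text"` key; `count_words` and `format_count` are hypothetical names.

```python
def count_words(segments):
    """Total whitespace-separated words across all transcript segments."""
    return sum(len(s["text"].split()) for s in segments)


def format_count(n):
    """Format an integer with thousands separators for display."""
    return f"{n:,}"
```

Splitting on whitespace is a simple approximation; it counts hyphenated terms as one word and does not account for languages written without spaces.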


## Example Usage

### **Example 1: Basic Transcription**

**User Input:**

```bash
copilot> transcribe audio to markdown: meeting-2026-02-02.mp3
```

**Skill Output:**

```bash
✅ Faster-Whisper detected (optimized)
✅ ffmpeg available (format conversion enabled)

📂 File: meeting-2026-02-02.mp3
📊 Size: 12.3 MB
⏱️ Duration: 00:45:32

🎙️ Processing...
[████████████████████] 100%

✅ Language detected: Portuguese (pt-BR)
👥 Speakers identified: 4
📝 Generating Markdown output...

✅ Transcription Complete!

📊 Results:
File: meeting-2026-02-02.md
Language: pt-BR
Duration: 00:45:32
Speakers: 4
Words: 6,842
Processing time: 127s

📝 Generated:
- meeting-2026-02-02.md (Markdown report)

🎯 Next steps:
1. Review meeting minutes and action items
2. Share report with participants
3. Track action items to completion
```


### **Example 3: Batch Processing**

**User Input:**

```bash
copilot> transcribe these audio files: recordings/*.mp3
```

**Skill Output:**

```bash
📦 Batch mode: 5 files found
1. team-standup.mp3
2. client-call.mp3
3. brainstorm-session.mp3
4. product-demo.mp3
5. retrospective.mp3

🎙️ Processing batch...

[1/5] team-standup.mp3 ✅ (2m 34s)
[2/5] client-call.mp3 ✅ (15m 12s)
[3/5] brainstorm-session.mp3 ✅ (8m 47s)
[4/5] product-demo.mp3 ✅ (22m 03s)
[5/5] retrospective.mp3 ✅ (11m 28s)

✅ Batch Complete!
📝 Generated 5 Markdown reports
⏱️ Total processing time: 6m 15s
```
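
A batch loop that produces the `[i/n]` progress labels shown above could be structured roughly as follows. This is only a sketch: `find_audio` and `run_batch` are hypothetical names, and `transcribe` stands for whatever callable performs the actual transcription of one file.

```python
from pathlib import Path


def find_audio(directory, pattern="*.mp3"):
    """Return matching files in a stable, sorted order."""
    return sorted(Path(directory).glob(pattern))


def run_batch(files, transcribe):
    """Call transcribe(path) for each file and collect '[i/n] name' labels."""
    labels = []
    for index, path in enumerate(files, start=1):
        transcribe(path)  # placeholder for the real per-file work
        labels.append(f"[{index}/{len(files)}] {path.name} ✅")
    return labels
```

Sorting the glob results keeps the numbering reproducible between runs, since directory listing order is not guaranteed.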


### **Example 5: Large File Warning**

**User Input:**

```bash
copilot> transcribe audio to markdown: conference-keynote.mp3
```

**Skill Output:**

```bash
✅ Faster-Whisper detected (optimized)

📂 File: conference-keynote.mp3
📊 Size: 87.2 MB
⏱️ Duration: 02:15:47
⚠️ Large file (87.2 MB) - processing may take several minutes

Continue? [Y/n]:
```

**User:** `Y`

```bash
🎙️ Processing... (this may take 10-15 minutes)
[████░░░░░░░░░░░░░░░░] 20% - Estimated time remaining: 12m
```
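
The progress line in this example can be reproduced with a small helper. The sketch below assumes a 20-character bar, matching the output above, and a simple linear estimate of the remaining time; `render_progress` and `estimate_remaining` are hypothetical names, and the skill may compute its estimate differently.

```python
def render_progress(fraction, width=20):
    """Draw a text progress bar with a percentage, e.g. 4 of 20 cells at 20%."""
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to [0, 1]
    filled = int(fraction * width)
    return f"[{'█' * filled}{'░' * (width - filled)}] {round(fraction * 100)}%"


def estimate_remaining(elapsed_seconds, fraction):
    """Linear estimate of remaining seconds; None before any progress is made."""
    if fraction <= 0:
        return None
    return elapsed_seconds * (1 - fraction) / fraction
```

Under the linear assumption, 20% complete after three minutes of processing gives about twelve minutes remaining.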


This skill is **platform-agnostic**: it works in any terminal where one of the supported CLIs (GitHub Copilot CLI, Claude Code, or Codex) is available. It does not depend on project-specific configuration or external APIs, in keeping with the zero-configuration philosophy.