feat: add 4 universal skills from cli-ai-skills
- Add audio-transcriber skill (v1.2.0): Transform audio to Markdown with Whisper - Add youtube-summarizer skill (v1.2.0): Generate summaries from YouTube videos - Update prompt-engineer skill: Enhanced with 11 optimization frameworks - Update skill-creator skill: Improved automation workflow All skills are zero-config, cross-platform (Claude Code, Copilot CLI, Codex) and follow Quality Bar V4 standards. Source: https://github.com/ericgandrade/cli-ai-skills
137  skills/audio-transcriber/CHANGELOG.md  Normal file
@@ -0,0 +1,137 @@
# Changelog - audio-transcriber

All notable changes to the audio-transcriber skill will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---

## [1.1.0] - 2026-02-03

### ✨ Added
- **Intelligent Prompt Workflow** (Step 3b) - Complete integration with the prompt-engineer skill
  - **Scenario A**: User-provided prompts are automatically improved with prompt-engineer
    - Displays the original and improved versions side by side
    - Single confirmation: "Usar versão melhorada? [s/n]"
  - **Scenario B**: Auto-generation when no prompt is provided
    - Analyzes the transcript and suggests a document type (ata, resumo, notas)
    - Shows the suggestion and asks for confirmation
    - Generates a complete structured prompt (RISEN/RODES/STAR)
    - Shows a preview and asks for final confirmation
    - Falls back to DEFAULT_MEETING_PROMPT if declined

- **LLM Integration** - Process transcripts with Claude CLI or GitHub Copilot CLI
  - Priority: Claude > GitHub Copilot > None (transcript-only mode)
  - Step 0b: CLI detection logic documented
  - Timeout handling (5 minutes by default)
  - Graceful fallback if no CLI is available

- **Progress Indicators** - Visual feedback during long operations
  - `tqdm` progress bar for Whisper transcription segments
  - `rich` spinner for LLM processing
  - Clear status messages at each step

- **Timestamp-based File Naming** - Avoids overwriting previous transcriptions
  - Format: `transcript-YYYYMMDD-HHMMSS.md`
  - Format: `ata-YYYYMMDD-HHMMSS.md`
  - Prevents data loss from repeated runs

- **Automatic Cleanup** - Removes temporary files after processing
  - Deletes `metadata.json` and `transcription.json` automatically
  - `--keep-temp` flag to preserve them if needed
  - Keeps the output directory clean

- **Rich Terminal UI** - Formatted output with the `rich` library
  - Formatted panels for prompt previews
  - Color-coded status messages (green=success, yellow=warning, red=error)
  - Spinner animations for long-running tasks

- **Dual Output Support** - Generates both the transcript and the processed ata
  - `transcript-*.md` - Raw transcription with timestamps
  - `ata-*.md` - Intelligent summary/meeting minutes (if an LLM is available)
  - Users can decline LLM processing to get a transcript-only run
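The timestamp naming scheme can be sketched in a few lines of Python (an illustrative sketch, not the skill's actual script; the `output_names` helper is hypothetical):

```python
from datetime import datetime

def output_names(now=None):
    """Build collision-free output file names, e.g. transcript-20260203-023045.md."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return f"transcript-{stamp}.md", f"ata-{stamp}.md"

transcript, ata = output_names(datetime(2026, 2, 3, 2, 30, 45))
print(transcript)  # transcript-20260203-023045.md
print(ata)         # ata-20260203-023045.md
```

Because the stamp has one-second resolution, repeated runs never collide unless started within the same second.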
### 🔧 Changed

- **SKILL.md** - Major documentation updates
  - Added Step 0b (CLI Detection)
  - Updated Step 2 (Progress Indicators)
  - Added Step 3b (Intelligent Prompt Workflow, 150+ lines)
  - Updated version to 1.1.0
  - Added detailed workflow diagrams for both scenarios

- **install-requirements.sh** - Added UI libraries
  - Now installs the `tqdm` and `rich` packages
  - Falls back gracefully if installation fails
  - Updated success messages

- **Python Implementation** - Complete refactor
  - Created `scripts/transcribe.py` (516 lines)
  - Functions: `detect_cli_tool()`, `invoke_prompt_engineer()`, `handle_prompt_workflow()`, `process_with_llm()`, `transcribe_audio()`, `save_outputs()`, `cleanup_temp_files()`
  - Command-line arguments: `--prompt`, `--model`, `--output-dir`, `--keep-temp`
  - Auto-installs `rich` and `tqdm` if missing
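The command-line interface listed above can be mirrored with a small `argparse` setup (a sketch of the documented flags only; the defaults here are assumptions, not taken from `transcribe.py`):

```python
import argparse

def build_parser():
    """Parser mirroring the documented transcribe.py flags."""
    parser = argparse.ArgumentParser(description="Transcribe audio to Markdown")
    parser.add_argument("audio_file", help="Path to the audio file")
    parser.add_argument("--prompt", help="Custom prompt for LLM processing")
    parser.add_argument("--model", default="base", help="Whisper model size")
    parser.add_argument("--output-dir", default=".", help="Where to write outputs")
    parser.add_argument("--keep-temp", action="store_true",
                        help="Preserve metadata.json / transcription.json")
    return parser

args = build_parser().parse_args(["meeting.mp3", "--model", "small", "--keep-temp"])
print(args.model, args.keep_temp)  # small True
```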
### 🐛 Fixed

- **User prompts no longer ignored** - v1.0.0 ignored custom prompts entirely
  - All prompts (custom or auto-generated) are now processed with the LLM
  - Simple prompts are improved into structured frameworks

- **Temporary file cleanup** - v1.0.0 left `metadata.json` and `transcription.json` behind
  - Both are now removed automatically after processing
  - The output directory stays clean

- **File overwriting** - v1.0.0 used the same filename (e.g., `meeting.md`) every time
  - Timestamps now prevent data loss
  - Each run creates unique files

- **Missing ata/summary** - v1.0.0 only generated the raw transcript
  - An intelligent ata/summary is now generated with the LLM
  - The user's prompt instructions are respected

- **No progress feedback** - v1.0.0 processed silently (users couldn't tell whether it had frozen)
  - A progress bar is now shown during transcription
  - A spinner is shown during LLM processing
  - Clear status messages throughout
### 📝 Notes

- **Backward Compatibility:** Fully compatible with v1.0.0 workflows
- **Requires:** Python 3.8+, faster-whisper OR whisper, tqdm, rich
- **Optional:** Claude CLI or GitHub Copilot CLI for intelligent processing
- **Optional:** prompt-engineer skill for automatic prompt generation

### 🔗 Related Issues

- Fixes #1: User-provided RISEN prompt ignored
- Fixes #2: Temporary files (metadata.json, transcription.json) left behind
- Fixes #3: Incomplete output (raw transcript only, no ata)
- Fixes #4: No visual progress indicator
- Fixes #5: Output filenames without timestamps

---

## [1.0.0] - 2026-02-02

### ✨ Initial Release

- Audio transcription using Faster-Whisper or OpenAI Whisper
- Automatic language detection
- Speaker diarization (basic)
- Voice Activity Detection (VAD)
- Markdown output with metadata table
- Installation script for dependencies
- Example scripts for basic transcription
- Support for multiple audio formats (MP3, WAV, M4A, OGG, FLAC, WEBM)
- FFmpeg integration for format conversion
- Zero-configuration philosophy

### 📝 Known Limitations (Fixed in v1.1.0)

- User prompts ignored (no LLM integration)
- Only the raw transcript generated (no ata/summary)
- Temporary files not cleaned up
- No progress indicators
- Files overwritten on repeated runs
340  skills/audio-transcriber/README.md  Normal file
@@ -0,0 +1,340 @@
# Audio Transcriber Skill v1.1.0

Transform audio recordings into professional Markdown documentation, with **intelligent atas/summaries via LLM integration** (Claude/Copilot CLI) and automatic prompt engineering.

## 🆕 What's New in v1.1.0

- **🧠 LLM Integration** - Claude CLI (primary) or GitHub Copilot CLI (fallback) for intelligent processing
- **✨ Smart Prompts** - Automatic integration with the prompt-engineer skill
  - User-provided prompt → automatically improved → user chooses a version
  - No prompt → transcript analyzed → format suggested → structured prompt generated
- **📊 Progress Indicators** - Visual progress bars (tqdm) and spinners (rich)
- **📁 Timestamped Filenames** - `transcript-YYYYMMDD-HHMMSS.md` + `ata-YYYYMMDD-HHMMSS.md`
- **🧹 Auto-Cleanup** - Removes the temporary `metadata.json` and `transcription.json`
- **🎨 Rich Terminal UI** - Formatted output with panels and colors

See **[CHANGELOG.md](./CHANGELOG.md)** for complete v1.1.0 details.
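The Claude-first, Copilot-fallback priority amounts to a `PATH` lookup; a minimal sketch (illustrative only — the skill's real detection logic lives in `scripts/transcribe.py`):

```python
import shutil

def detect_cli_tool():
    """Return the preferred LLM CLI: Claude > GitHub Copilot > None."""
    if shutil.which("claude"):
        return "claude"
    if shutil.which("gh"):          # Copilot ships as a gh extension
        return "gh-copilot"
    return None                     # transcript-only mode

print(detect_cli_tool() or "transcript-only mode")
```

When neither binary is on `PATH`, the skill degrades to transcript-only output rather than failing.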
## 🎯 Core Features

- **📝 Rich Markdown Output** - Structured reports with metadata tables, timestamps, and formatting
- **🎙️ Speaker Diarization** - Automatically identifies and labels different speakers
- **📊 Technical Metadata** - Extracts file size, duration, language, and processing time
- **📋 Intelligent Atas/Summaries** - Generated via LLM (Claude/Copilot) with customizable prompts
- **💡 Executive Summaries** - AI-generated structured summaries with topics, decisions, and action items
- **🌍 Multi-language** - Supports 99 languages with auto-detection
- **⚡ Zero Configuration** - Auto-discovers the Faster-Whisper/Whisper installation
- **🔒 Privacy-First** - 100% local Whisper processing, no cloud uploads
- **🚀 Flexible Modes** - Transcript-only or intelligent processing with an LLM
## 📦 Installation

### Quick Install (NPX)

```bash
npx cli-ai-skills@latest install audio-transcriber
```

This automatically:
- Downloads the skill
- Installs Python dependencies (faster-whisper, tqdm, rich)
- Installs ffmpeg (macOS via Homebrew)
- Sets up the skill globally
### Manual Installation

#### 1. Install a Transcription Engine

**Recommended (fastest):**
```bash
pip install faster-whisper tqdm rich
```

**Alternative (original Whisper):**
```bash
pip install openai-whisper tqdm rich
```

#### 2. Install Audio Tools (Optional)

For format conversion support:
```bash
# macOS
brew install ffmpeg

# Linux
apt install ffmpeg
```

#### 3. Install an LLM CLI (Optional - for intelligent summaries)

**Claude CLI (recommended):**
```bash
# Follow: https://docs.anthropic.com/en/docs/claude-cli
```

**GitHub Copilot CLI (alternative):**
```bash
gh extension install github/gh-copilot
```

#### 4. Install the Skill

**Global installation (auto-updates with git pull):**
```bash
cd /path/to/cli-ai-skills
./scripts/install-skills.sh $(pwd)
```

**Repository only:**
```bash
# The skill is already available if you cloned the repo
```
## 🚀 Usage

### Basic Transcription

```bash
copilot> transcribe audio to markdown: meeting.mp3
```

**Output:**
- `meeting.md` - Full Markdown report with metadata, transcription, minutes, and summary

### With Subtitles

```bash
copilot> convert audio file to text with subtitles: interview.wav
```

**Generates:**
- `interview.md` - Markdown report
- `interview.srt` - Subtitle file

### Batch Processing

```bash
copilot> transcreva estes áudios: recordings/*.mp3
```

**Processes all MP3 files in the directory.**

### Trigger Phrases

Activate the skill with any of these phrases:

- "transcribe audio to markdown"
- "transcreva este áudio"
- "convert audio file to text"
- "extract speech from audio"
- "áudio para texto com metadados"
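An SRT file pairs a cue index with `HH:MM:SS,mmm` timestamps. The conversion from Whisper-style segments can be sketched as follows (illustrative; the skill writes the `.srt` file itself, and the segment tuples here are assumed shapes):

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 12.5 -> 00:00:12,500."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render [(start, end, text), ...] as SRT cue blocks."""
    cues = []
    for i, (start, end, text) in enumerate(segments, 1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)

print(to_srt([(12.0, 45.0, "Good morning everyone.")]))
```

Note that SRT uses a comma (not a dot) before the milliseconds; players reject the dot form.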
## 📋 Use Cases

### 1. Team Meetings
Record standups, planning sessions, or retrospectives and automatically generate:
- Participant list
- Discussion topics with timestamps
- Decisions made
- Action items assigned

### 2. Client Calls
Transcribe client conversations with:
- Speaker identification
- Key agreements documented
- Follow-up tasks extracted

### 3. Interviews
Convert interviews to text with:
- Question/answer attribution
- Subtitle generation for video
- Searchable transcript

### 4. Lectures & Training
Document educational content with:
- Timestamped notes
- Topic breakdown
- Key concepts summary

### 5. Content Creation
Analyze podcasts, videos, and YouTube content:
- Full transcription
- Chapter markers (timestamps)
- Summary for show notes
## 📊 Output Example

```markdown
# Audio Transcription Report

## 📊 Metadata

| Field | Value |
|-------|-------|
| **File Name** | team-standup.mp3 |
| **File Size** | 3.2 MB |
| **Duration** | 00:12:47 |
| **Language** | English (en) |
| **Processed Date** | 2026-02-02 14:35:21 |
| **Speakers Identified** | 5 |
| **Transcription Engine** | Faster-Whisper (model: base) |

---

## 🎙️ Full Transcription

**[00:00:12 → 00:00:45]** *Speaker 1*
Good morning everyone. Let's start with updates from the frontend team.

**[00:00:46 → 00:01:23]** *Speaker 2*
We completed the dashboard redesign and deployed to staging yesterday.

---

## 📋 Meeting Minutes

### Participants
- Speaker 1 (Meeting Lead)
- Speaker 2 (Frontend Developer)
- Speaker 3 (Backend Developer)
- Speaker 4 (Designer)
- Speaker 5 (Product Manager)

### Topics Discussed
1. **Dashboard Redesign** (00:00:46)
   - Completed and deployed to staging
   - Positive feedback from QA team

2. **API Performance Issues** (00:03:12)
   - Database query optimization needed
   - Target response time < 200ms

### Decisions Made
- ✅ Approved dashboard for production deployment
- ✅ Allocated 2 sprint points for API optimization

### Action Items
- [ ] **Deploy dashboard to production** - Assigned to: Speaker 2 - Due: 2026-02-05
- [ ] **Optimize database queries** - Assigned to: Speaker 3
- [ ] **Schedule user testing session** - Assigned to: Speaker 5

---

## 📝 Executive Summary

The team standup covered progress on the dashboard redesign, which has been completed and is ready for production deployment. The frontend team received positive feedback from QA, and the design aligns with user requirements.

Backend performance concerns were raised regarding API response times. The team decided to prioritize query optimization in the current sprint, with a target of sub-200ms response times.

Next steps include production deployment of the dashboard by the end of the week and scheduling user testing sessions to validate the new design with real users.

### Key Points
- 🔹 Dashboard redesign complete and staging-approved
- 🔹 API performance optimization prioritized
- 🔹 User testing scheduled for next week

### Next Steps
1. Production deployment (Speaker 2)
2. Database optimization (Speaker 3)
3. User testing coordination (Speaker 5)
```
## ⚙️ Configuration

No configuration needed! The skill automatically:
- Detects a Faster-Whisper or Whisper installation
- Chooses the fastest available engine
- Selects an appropriate model based on file size
- Auto-detects the language

## 🔧 Troubleshooting

### "No transcription tool found"
**Solution:** Install Whisper:
```bash
pip install faster-whisper
```

### "Unsupported format"
**Solution:** Install ffmpeg:
```bash
brew install ffmpeg  # macOS
apt install ffmpeg   # Linux
```

### Slow processing
**Solution:** Use a smaller Whisper model:
```bash
# Edit the skill to use the "tiny" or "base" model instead of "medium"
```

### Poor speaker identification
**Solution:**
- Ensure clear audio with minimal background noise
- Use a better microphone for recordings
- Try the "medium" or "large" Whisper model
## 🛠️ Advanced Usage

### Custom Model Selection

Edit `SKILL.md` Step 2 to change the model:
```python
model = WhisperModel("small", device="cpu")  # "tiny", "base", "small", "medium", or "large"
```

### Output Language Control

Force output in a specific language:
```bash
# Edit Step 3 to set the language explicitly
```

### Batch Settings

Process specific file types only:
```bash
copilot> transcribe audio: recordings/*.wav  # Only WAV files
```
## 📚 FAQ

**Q: Does this work offline?**
A: Yes! 100% local processing; no internet is required after the initial model download.

**Q: What's the difference between Whisper and Faster-Whisper?**
A: Faster-Whisper is 4-5x faster with the same quality. Prefer it whenever it is available.

**Q: Can I transcribe YouTube videos?**
A: Not directly. Use a YouTube downloader first, then transcribe the audio file. Or use the `youtube-summarizer` skill instead.

**Q: How accurate is speaker identification?**
A: Accuracy depends on audio quality. Clear recordings with distinct voices work best. The current version uses simple estimation; future versions will use advanced diarization.

**Q: What languages are supported?**
A: 99 languages, including English, Portuguese, Spanish, French, German, Chinese, Japanese, Arabic, and more.

**Q: Can I edit the meeting minutes format?**
A: Yes! Edit the Markdown template in SKILL.md Step 3.
## 🔗 Related Skills

- **youtube-summarizer** - Extract and summarize YouTube video transcripts
- **prompt-engineer** - Optimize prompts for better AI summaries

## 📄 License

This skill is part of the cli-ai-skills repository.
MIT License - See the repository LICENSE file.

## 🤝 Contributing

Found a bug or have a feature request?
Open an issue in the [cli-ai-skills repository](https://github.com/yourusername/cli-ai-skills).

---

**Version:** 1.1.0
**Author:** Eric Andrade
**Created:** 2026-02-02
558  skills/audio-transcriber/SKILL.md  Normal file
@@ -0,0 +1,558 @@
---
name: audio-transcriber
description: "Transform audio recordings into professional Markdown documentation with intelligent summaries using LLM integration"
version: 1.2.0
author: Eric Andrade
created: 2025-02-01
updated: 2026-02-04
platforms: [github-copilot-cli, claude-code, codex]
category: content
tags: [audio, transcription, whisper, meeting-minutes, speech-to-text]
risk: safe
---
## Purpose

This skill automates audio-to-text transcription with professional Markdown output, extracting rich technical metadata (speakers, timestamps, language, file size, duration) and generating structured meeting minutes and executive summaries. It uses Faster-Whisper or Whisper with zero configuration, working universally across projects without hardcoded paths or API keys.

Inspired by tools like Plaud, this skill transforms raw audio recordings into actionable documentation, making it ideal for meetings, interviews, lectures, and content analysis.

## When to Use

Invoke this skill when:

- The user needs to transcribe audio/video files to text
- The user wants meeting minutes automatically generated from recordings
- The user requires speaker identification (diarization) in conversations
- The user needs subtitles/captions (SRT, VTT formats)
- The user wants executive summaries of long audio content
- The user asks variations of "transcribe this audio", "convert audio to text", or "generate meeting notes from recording"
- The user has audio files in common formats (MP3, WAV, M4A, OGG, FLAC, WEBM)
## Workflow

### Step 0: Discovery (Auto-detect Transcription Tools)

**Objective:** Identify available transcription engines without user configuration.

**Actions:**

Run detection commands to find installed tools:

```bash
# Check for Faster-Whisper (preferred - 4-5x faster)
if python3 -c "import faster_whisper" 2>/dev/null; then
    TRANSCRIBER="faster-whisper"
    echo "✅ Faster-Whisper detected (optimized)"
# Fall back to the original Whisper
elif python3 -c "import whisper" 2>/dev/null; then
    TRANSCRIBER="whisper"
    echo "✅ OpenAI Whisper detected"
else
    TRANSCRIBER="none"
    echo "⚠️ No transcription tool found"
fi

# Check for ffmpeg (audio format conversion)
if command -v ffmpeg &>/dev/null; then
    echo "✅ ffmpeg available (format conversion enabled)"
else
    echo "ℹ️ ffmpeg not found (limited format support)"
fi
```
**If no transcriber found:**

Offer automatic installation using the provided script:

```bash
echo "⚠️ No transcription tool found"
echo ""
echo "🔧 Auto-install dependencies? (Recommended)"
read -p "Run installation script? [Y/n]: " AUTO_INSTALL

if [[ ! "$AUTO_INSTALL" =~ ^[Nn] ]]; then
    # Get the skill directory (works for both repo and symlinked installations)
    SKILL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

    # Run the installation script
    if [[ -f "$SKILL_DIR/scripts/install-requirements.sh" ]]; then
        bash "$SKILL_DIR/scripts/install-requirements.sh"
    else
        echo "❌ Installation script not found"
        echo ""
        echo "📦 Manual installation:"
        echo "   pip install faster-whisper   # Recommended"
        echo "   pip install openai-whisper   # Alternative"
        echo "   brew install ffmpeg          # Optional (macOS)"
        exit 1
    fi

    # Verify the installation succeeded
    if python3 -c "import faster_whisper" 2>/dev/null || python3 -c "import whisper" 2>/dev/null; then
        echo "✅ Installation successful! Proceeding with transcription..."
    else
        echo "❌ Installation failed. Please install manually."
        exit 1
    fi
else
    echo ""
    echo "📦 Manual installation required:"
    echo ""
    echo "Recommended (fastest):"
    echo "  pip install faster-whisper"
    echo ""
    echo "Alternative (original):"
    echo "  pip install openai-whisper"
    echo ""
    echo "Optional (format conversion):"
    echo "  brew install ffmpeg   # macOS"
    echo "  apt install ffmpeg    # Linux"
    echo ""
    exit 1
fi
```

This lets users install dependencies with a single confirmation, or opt for manual installation if preferred.

**If a transcriber is found:**

Proceed to Step 0b (CLI Detection).
### Step 1: Validate Audio File

**Objective:** Verify the file exists, check its format, and extract metadata.

**Actions:**

1. **Accept a file path or URL** from the user:
   - Local file: `meeting.mp3`
   - URL: `https://example.com/audio.mp3` (download to a temp directory)

2. **Verify the file exists:**

   ```bash
   if [[ ! -f "$AUDIO_FILE" ]]; then
       echo "❌ File not found: $AUDIO_FILE"
       exit 1
   fi
   ```

3. **Extract metadata** using ffprobe or file utilities:

   ```bash
   # Get file size
   FILE_SIZE=$(du -h "$AUDIO_FILE" | cut -f1)

   # Get duration and format using ffprobe
   DURATION=$(ffprobe -v error -show_entries format=duration \
       -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null)
   FORMAT=$(ffprobe -v error -select_streams a:0 -show_entries \
       stream=codec_name -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null)

   # Convert the duration (fractional seconds) to HH:MM:SS
   SECS=${DURATION%.*}
   DURATION_HMS=$(printf '%02d:%02d:%02d' \
       $((SECS / 3600)) $((SECS % 3600 / 60)) $((SECS % 60)) 2>/dev/null || echo "Unknown")
   ```

4. **Check the file size** (warn if large):

   ```bash
   SIZE_MB=$(du -m "$AUDIO_FILE" | cut -f1)
   if [[ $SIZE_MB -gt 25 ]]; then
       echo "⚠️ Large file ($FILE_SIZE) - processing may take several minutes"
   fi
   ```

5. **Validate the format** (supported: MP3, WAV, M4A, OGG, FLAC, WEBM):

   ```bash
   # tr keeps this portable to bash 3.x (macOS), unlike ${EXTENSION,,}
   EXTENSION=$(echo "${AUDIO_FILE##*.}" | tr '[:upper:]' '[:lower:]')
   SUPPORTED_FORMATS=("mp3" "wav" "m4a" "ogg" "flac" "webm" "mp4")

   if [[ ! " ${SUPPORTED_FORMATS[*]} " =~ " ${EXTENSION} " ]]; then
       echo "⚠️ Unsupported format: $EXTENSION"
       if command -v ffmpeg &>/dev/null; then
           echo "🔄 Converting to WAV..."
           ffmpeg -i "$AUDIO_FILE" -ar 16000 "${AUDIO_FILE%.*}.wav" -y
           AUDIO_FILE="${AUDIO_FILE%.*}.wav"
       else
           echo "❌ Install ffmpeg to convert formats: brew install ffmpeg"
           exit 1
       fi
   fi
   ```
### Step 3: Generate Markdown Output

**Objective:** Create structured Markdown with metadata, the transcription, meeting minutes, and a summary.

**Output Template:**

```markdown
# Audio Transcription Report

## 📊 Metadata

| Field | Value |
|-------|-------|
| **File Name** | {file_name} |
| **File Size** | {file_size} |
| **Duration** | {duration_hms} |
| **Language** | {language} ({language_code}) |
| **Processed Date** | {process_date} |
| **Speakers Identified** | {num_speakers} |
| **Transcription Engine** | {engine} (model: {model}) |

## 📋 Meeting Minutes

### Participants
- {speaker_1}
- {speaker_2}
- ...

### Topics Discussed
1. **{topic_1}** ({timestamp})
   - {key_point_1}
   - {key_point_2}

2. **{topic_2}** ({timestamp})
   - {key_point_1}

### Decisions Made
- ✅ {decision_1}
- ✅ {decision_2}

### Action Items
- [ ] **{action_1}** - Assigned to: {speaker} - Due: {date_if_mentioned}
- [ ] **{action_2}** - Assigned to: {speaker}

*Generated by audio-transcriber skill v1.0.0*
*Transcription engine: {engine} | Processing time: {elapsed_time}s*
```
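Filling the template is a plain string-formatting step; a minimal sketch using a shortened version of the table (the field names follow the placeholders in the template; `render_report` itself is illustrative):

```python
TEMPLATE = """# Audio Transcription Report

## 📊 Metadata

| Field | Value |
|-------|-------|
| **File Name** | {file_name} |
| **Duration** | {duration_hms} |
| **Language** | {language} ({language_code}) |
"""

def render_report(meta):
    """Substitute extracted metadata into the Markdown template."""
    return TEMPLATE.format(**meta)

report = render_report({
    "file_name": "team-standup.mp3",
    "duration_hms": "00:12:47",
    "language": "English",
    "language_code": "en",
})
print(report)
```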
**Implementation:**

Use Python or bash with an AI model (Claude/GPT) for intelligent summarization. The helpers below (`cluster_by_topic`, `extract_action_items`, `extract_decisions`, `call_ai_model`) are sketches to be implemented by the skill:

```python
def generate_meeting_minutes(segments):
    """Extract topics, decisions, and action items from the transcription."""

    # Group segments by topic (simple clustering by timestamps)
    topics = cluster_by_topic(segments)

    # Identify action items (keywords: "should", "will", "need to", "action")
    action_items = extract_action_items(segments)

    # Identify decisions (keywords: "decided", "agreed", "approved")
    decisions = extract_decisions(segments)

    return {
        "topics": topics,
        "decisions": decisions,
        "action_items": action_items,
    }


def generate_summary(segments, max_paragraphs=5):
    """Create an executive summary using AI (Claude/GPT via API or a local model)."""

    full_text = " ".join([s["text"] for s in segments])

    # Use the Chain of Density approach (from the prompt-engineer frameworks)
    summary_prompt = f"""
Summarize the following transcription in {max_paragraphs} concise paragraphs.
Focus on key topics, decisions, and action items.

Transcription:
{full_text}
"""

    # Call the AI model (placeholder - integrate the Claude API or a local model)
    summary = call_ai_model(summary_prompt)

    return summary
```
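As a concrete, runnable baseline for the keyword heuristics mentioned in the comments (purely illustrative; the real skill delegates this analysis to an LLM):

```python
ACTION_KEYWORDS = ("should", "will", "need to", "action")
DECISION_KEYWORDS = ("decided", "agreed", "approved")

def extract_by_keywords(segments, keywords):
    """Return segment texts containing any of the given keywords."""
    return [s["text"] for s in segments
            if any(k in s["text"].lower() for k in keywords)]

segments = [
    {"text": "We decided to ship the dashboard on Friday."},
    {"text": "Speaker 3 will optimize the slow queries."},
    {"text": "The weather was nice."},
]
print(extract_by_keywords(segments, DECISION_KEYWORDS))
print(extract_by_keywords(segments, ACTION_KEYWORDS))
```

Keyword matching like this over-triggers on casual uses of "will" or "should", which is exactly why the skill prefers LLM processing when a CLI is available.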
**Output file naming:**

```bash
# v1.1.0: use a timestamp to avoid overwriting previous runs
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
TRANSCRIPT_FILE="transcript-${TIMESTAMP}.md"
ATA_FILE="ata-${TIMESTAMP}.md"

echo "$TRANSCRIPT_CONTENT" > "$TRANSCRIPT_FILE"
echo "✅ Transcript salvo: $TRANSCRIPT_FILE"

if [[ -n "$ATA_CONTENT" ]]; then
    echo "$ATA_CONTENT" > "$ATA_FILE"
    echo "✅ Ata salva: $ATA_FILE"
fi
```
#### **SCENARIO A: User Provided a Custom Prompt**

**Workflow:**

1. **Display the user's prompt:**
   ```
   📝 Prompt fornecido pelo usuário:
   ┌──────────────────────────────────┐
   │ [User's prompt preview]          │
   └──────────────────────────────────┘
   ```

2. **Automatically improve it with prompt-engineer (if available):**
   ```
   🔧 Melhorando prompt com prompt-engineer...
   [Invokes: gh copilot -p "melhore este prompt: {user_prompt}"]
   ```

3. **Show both versions:**
   ```
   ✨ Versão melhorada:
   ┌──────────────────────────────────┐
   │ Role: Você é um documentador...  │
   │ Instructions: Transforme...      │
   │ Steps: 1) ... 2) ...             │
   │ End Goal: ...                    │
   └──────────────────────────────────┘

   📝 Versão original:
   ┌──────────────────────────────────┐
   │ [User's original prompt]         │
   └──────────────────────────────────┘
   ```

4. **Ask which one to use:**
   ```
   💡 Usar versão melhorada? [s/n] (default: s):
   ```

5. **Process with the selected prompt:**
   - If "s": use the improved version
   - If "n": use the original
#### **LLM Processing (Both Scenarios)**

Once the prompt is finalized:

```python
import subprocess

from rich.progress import Progress, SpinnerColumn, TextColumn

def process_with_llm(transcript, prompt, cli_tool='claude'):
    full_prompt = f"{prompt}\n\n---\n\nTranscrição:\n\n{transcript}"

    with Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        transient=True
    ) as progress:
        progress.add_task(
            description=f"🤖 Processando com {cli_tool}...",
            total=None
        )

        if cli_tool == 'claude':
            result = subprocess.run(
                ['claude', '-'],
                input=full_prompt,
                capture_output=True,
                text=True,
                timeout=300  # 5 minutes
            )
        elif cli_tool == 'gh-copilot':
            result = subprocess.run(
                ['gh', 'copilot', 'suggest', '-t', 'shell', full_prompt],
                capture_output=True,
                text=True,
                timeout=300
            )

    if result.returncode == 0:
        return result.stdout.strip()
    else:
        return None
```

**Progress output:**
```
🤖 Processando com claude... ⠋
[After completion:]
✅ Ata gerada com sucesso!
```
#### **Final Output**

**Success (both files):**

```bash
💾 Salvando arquivos...

✅ Arquivos criados:
   - transcript-20260203-023045.md (transcript puro)
   - ata-20260203-023045.md (processado com LLM)

🧹 Removidos arquivos temporários: metadata.json, transcription.json

✅ Concluído! Tempo total: 3m 45s
```

**Transcript only (user declined LLM):**

```bash
💾 Salvando arquivos...

✅ Arquivo criado:
   - transcript-20260203-023045.md

ℹ️ Ata não gerada (processamento LLM recusado pelo usuário)

🧹 Removidos arquivos temporários: metadata.json, transcription.json

✅ Concluído!
```
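The output names above follow a `prefix-YYYYMMDD-HHMMSS.md` pattern. A minimal sketch of that naming (the helper name is illustrative, not part of the skill):

```python
from datetime import datetime


def output_filenames(prefix_transcript="transcript", prefix_doc="ata", now=None):
    """Build the timestamped output names shown above (YYYYMMDD-HHMMSS)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return f"{prefix_transcript}-{stamp}.md", f"{prefix_doc}-{stamp}.md"


transcript_file, ata_file = output_filenames(now=datetime(2026, 2, 3, 2, 30, 45))
print(transcript_file)  # transcript-20260203-023045.md
print(ata_file)         # ata-20260203-023045.md
```

Sharing one timestamp across both files keeps the transcript and the generated ata paired on disk.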

### Step 5: Display Results Summary

**Objective:** Show completion status and next steps.

**Output:**

```bash
echo ""
echo "✅ Transcription Complete!"
echo ""
echo "📊 Results:"
echo "   File: $OUTPUT_FILE"
echo "   Language: $LANGUAGE"
echo "   Duration: $DURATION_HMS"
echo "   Speakers: $NUM_SPEAKERS"
echo "   Words: $WORD_COUNT"
echo "   Processing time: ${ELAPSED_TIME}s"
echo ""
echo "📝 Generated:"
echo "   - $OUTPUT_FILE (Markdown report)"
[if alternative formats:]
echo "   - ${OUTPUT_FILE%.*}.srt (Subtitles)"
echo "   - ${OUTPUT_FILE%.*}.json (Structured data)"
echo ""
echo "🎯 Next steps:"
echo "   1. Review meeting minutes and action items"
echo "   2. Share report with participants"
echo "   3. Track action items to completion"
```
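The optional `.srt` output mentioned above needs SRT-style timestamps (`HH:MM:SS,mmm`). A minimal sketch of that conversion (helper name is illustrative):

```python
def srt_timestamp(seconds):
    """Convert a float number of seconds to an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


print(srt_timestamp(3661.5))  # 01:01:01,500
```

Each subtitle cue then pairs a `start --> end` line built from this format with the segment text.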

## Example Usage

### **Example 1: Basic Transcription**

**User Input:**

```bash
copilot> transcribe audio to markdown: meeting-2026-02-02.mp3
```

**Skill Output:**

```bash
✅ Faster-Whisper detected (optimized)
✅ ffmpeg available (format conversion enabled)

📂 File: meeting-2026-02-02.mp3
📊 Size: 12.3 MB
⏱️ Duration: 00:45:32

🎙️ Processing...
[████████████████████] 100%

✅ Language detected: Portuguese (pt-BR)
👥 Speakers identified: 4
📝 Generating Markdown output...

✅ Transcription Complete!

📊 Results:
   File: meeting-2026-02-02.md
   Language: pt-BR
   Duration: 00:45:32
   Speakers: 4
   Words: 6,842
   Processing time: 127s

📝 Generated:
   - meeting-2026-02-02.md (Markdown report)

🎯 Next steps:
   1. Review meeting minutes and action items
   2. Share report with participants
   3. Track action items to completion
```
### **Example 3: Batch Processing**

**User Input:**

```bash
copilot> transcreva estes áudios: recordings/*.mp3
```

**Skill Output:**

```bash
📦 Batch mode: 5 files found
   1. team-standup.mp3
   2. client-call.mp3
   3. brainstorm-session.mp3
   4. product-demo.mp3
   5. retrospective.mp3

🎙️ Processing batch...

[1/5] team-standup.mp3 ✅ (2m 34s)
[2/5] client-call.mp3 ✅ (15m 12s)
[3/5] brainstorm-session.mp3 ✅ (8m 47s)
[4/5] product-demo.mp3 ✅ (22m 03s)
[5/5] retrospective.mp3 ✅ (11m 28s)

✅ Batch Complete!
📝 Generated 5 Markdown reports
⏱️ Total processing time: 1h 0m 4s
```
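Behind the "5 files found" banner, batch mode only needs a deterministic file listing. A minimal sketch, assuming a flat directory of recordings (the function name is illustrative):

```python
from pathlib import Path


def find_audio_files(directory, extensions=(".mp3", ".wav", ".m4a")):
    """List audio files for batch mode, sorted for stable [n/total] ordering."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.suffix.lower() in extensions
    )
```

Sorting matters: it makes the `[1/5]`, `[2/5]`, … progress labels reproducible across runs.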

### **Example 5: Large File Warning**

**User Input:**

```bash
copilot> transcribe audio to markdown: conference-keynote.mp3
```

**Skill Output:**

```bash
✅ Faster-Whisper detected (optimized)

📂 File: conference-keynote.mp3
📊 Size: 87.2 MB
⏱️ Duration: 02:15:47
⚠️ Large file (87.2 MB) - processing may take several minutes

Continue? [Y/n]:
```

**User:** `Y`

```bash
🎙️ Processing... (this may take 10-15 minutes)
[████░░░░░░░░░░░░░░░░] 20% - Estimated time remaining: 12m
```
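The warning above is driven by a simple size gate. A minimal sketch, assuming the same 25 MB threshold the example script uses (the function name is illustrative):

```python
import os

LARGE_FILE_MB = 25  # threshold used by the example script


def needs_confirmation(path):
    """Return (size_mb, is_large) so the caller can prompt before long jobs."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb, size_mb > LARGE_FILE_MB
```

Returning the size alongside the flag lets the prompt echo the exact figure ("Large file (87.2 MB)") instead of recomputing it.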

This skill is **platform-agnostic** and works in any terminal context where GitHub Copilot CLI is available. It does not depend on specific project configurations or external APIs, following the zero-configuration philosophy.

---

**New file:** `skills/audio-transcriber/examples/basic-transcription.sh` (executable, 250 lines)
#!/usr/bin/env bash

# Basic Audio Transcription Example
# Demonstrates how to use the audio-transcriber skill manually

set -euo pipefail

# Configuration
AUDIO_FILE="${1:-}"
MODEL="${MODEL:-base}"                      # Options: tiny, base, small, medium, large
OUTPUT_FORMAT="${OUTPUT_FORMAT:-markdown}"  # Options: markdown, txt, srt, vtt, json

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Helper functions
error() {
    echo -e "${RED}❌ Error: $1${NC}" >&2
    exit 1
}

success() {
    echo -e "${GREEN}✅ $1${NC}"
}

info() {
    echo -e "${BLUE}ℹ️  $1${NC}"
}

warn() {
    echo -e "${YELLOW}⚠️  $1${NC}"
}

# Check if audio file is provided
if [[ -z "$AUDIO_FILE" ]]; then
    error "Usage: $0 <audio_file>"
fi

# Verify file exists
if [[ ! -f "$AUDIO_FILE" ]]; then
    error "File not found: $AUDIO_FILE"
fi

# Step 0: Discovery - Check for transcription tools
info "Step 0: Discovering transcription tools..."

TRANSCRIBER=""
if python3 -c "import faster_whisper" 2>/dev/null; then
    TRANSCRIBER="faster-whisper"
    success "Faster-Whisper detected (optimized)"
elif python3 -c "import whisper" 2>/dev/null; then
    TRANSCRIBER="whisper"
    success "OpenAI Whisper detected"
else
    error "No transcription tool found. Install with: pip install faster-whisper"
fi

# Check for ffmpeg
if command -v ffmpeg &>/dev/null; then
    success "ffmpeg available (format conversion enabled)"
else
    warn "ffmpeg not found (limited format support)"
fi

# Step 1: Extract metadata
info "Step 1: Extracting audio metadata..."

FILE_SIZE=$(du -h "$AUDIO_FILE" | cut -f1)
info "File size: $FILE_SIZE"

# Get duration if ffprobe is available
if command -v ffprobe &>/dev/null; then
    DURATION=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$AUDIO_FILE" 2>/dev/null || echo "0")

    # Convert to HH:MM:SS
    if command -v date &>/dev/null; then
        if [[ "$OSTYPE" == "darwin"* ]]; then
            # macOS
            DURATION_HMS=$(date -u -r "${DURATION%.*}" +%H:%M:%S 2>/dev/null || echo "Unknown")
        else
            # Linux
            DURATION_HMS=$(date -u -d @"${DURATION%.*}" +%H:%M:%S 2>/dev/null || echo "Unknown")
        fi
    else
        DURATION_HMS="Unknown"
    fi

    info "Duration: $DURATION_HMS"
else
    warn "ffprobe not found - cannot extract duration"
    DURATION="0"
    DURATION_HMS="Unknown"
fi

# Check file size warning
SIZE_MB=$(du -m "$AUDIO_FILE" | cut -f1)
if [[ $SIZE_MB -gt 25 ]]; then
    warn "Large file ($FILE_SIZE) - processing may take several minutes"
    read -p "Continue? [Y/n]: " CONTINUE
    if [[ "$CONTINUE" =~ ^[Nn] ]]; then
        info "Transcription cancelled"
        exit 0
    fi
fi

# Step 2: Transcribe using Python
info "Step 2: Transcribing audio..."

OUTPUT_FILE="${AUDIO_FILE%.*}.md"
TEMP_JSON="/tmp/transcription_$$.json"

python3 << EOF
import sys
import json

try:
    if "$TRANSCRIBER" == "faster-whisper":
        from faster_whisper import WhisperModel
        model = WhisperModel("$MODEL", device="cpu", compute_type="int8")
        segments, info = model.transcribe("$AUDIO_FILE", language=None, vad_filter=True)

        data = {
            "language": info.language,
            "language_probability": round(info.language_probability, 2),
            "duration": info.duration,
            "segments": []
        }

        for segment in segments:
            data["segments"].append({
                "start": round(segment.start, 2),
                "end": round(segment.end, 2),
                "text": segment.text.strip()
            })
    else:
        import whisper
        model = whisper.load_model("$MODEL")
        result = model.transcribe("$AUDIO_FILE")

        data = {
            "language": result["language"],
            "duration": result["segments"][-1]["end"] if result["segments"] else 0,
            "segments": result["segments"]
        }

    with open("$TEMP_JSON", "w") as f:
        json.dump(data, f)

    print(f"✅ Language detected: {data['language']}")
    print(f"📝 Transcribed {len(data['segments'])} segments")

except Exception as e:
    print(f"❌ Error: {e}", file=sys.stderr)
    sys.exit(1)
EOF

# Check if transcription succeeded
if [[ ! -f "$TEMP_JSON" ]]; then
    error "Transcription failed"
fi

# Step 3: Generate Markdown output
info "Step 3: Generating Markdown report..."

# NOTE: this heredoc delimiter must be unquoted so the shell expands
# ${TEMP_JSON}, ${AUDIO_FILE}, etc. before Python runs.
python3 << EOF
import json
from datetime import datetime

# Load transcription data
with open("${TEMP_JSON}") as f:
    data = json.load(f)

# Prepare metadata
filename = "${AUDIO_FILE}".split("/")[-1]
file_size = "${FILE_SIZE}"
duration_hms = "${DURATION_HMS}"
language = data["language"]
process_date = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
num_segments = len(data["segments"])

# Generate Markdown
markdown = f"""# Audio Transcription Report

## 📊 Metadata

| Field | Value |
|-------|-------|
| **File Name** | {filename} |
| **File Size** | {file_size} |
| **Duration** | {duration_hms} |
| **Language** | {language.upper()} |
| **Processed Date** | {process_date} |
| **Segments** | {num_segments} |
| **Transcription Engine** | ${TRANSCRIBER} (model: ${MODEL}) |

---

## 🎙️ Full Transcription

"""

# Add transcription with timestamps
for seg in data["segments"]:
    start_time = f"{int(seg['start'] // 60):02d}:{int(seg['start'] % 60):02d}"
    end_time = f"{int(seg['end'] // 60):02d}:{int(seg['end'] % 60):02d}"
    markdown += f"**[{start_time} → {end_time}]** \n{seg['text']}\n\n"

markdown += """---

## 📝 Summary

*Automatic summary generation requires AI integration (Claude/GPT).*
*For now, review the full transcription above.*

---

*Generated by audio-transcriber skill example script*
*Transcription engine: ${TRANSCRIBER} | Model: ${MODEL}*
"""

# Write to file
with open("${OUTPUT_FILE}", "w") as f:
    f.write(markdown)

print(f"✅ Markdown report saved: ${OUTPUT_FILE}")
EOF

# Clean up
rm -f "$TEMP_JSON"

# Step 4: Display summary
success "Transcription complete!"
echo ""
echo "📊 Results:"
echo "   Output file: $OUTPUT_FILE"
echo "   Transcription engine: $TRANSCRIBER"
echo "   Model: $MODEL"
echo ""
info "Next steps:"
echo "   1. Review the transcription: cat $OUTPUT_FILE"
echo "   2. Edit if needed: vim $OUTPUT_FILE"
echo "   3. Share with team or archive"

---

**New file:** `skills/audio-transcriber/references/tools-comparison.md` (352 lines)
# Transcription Tools Comparison

Comprehensive comparison of the audio transcription engines supported by the audio-transcriber skill.

## Overview

| Tool | Type | Speed | Quality | Cost | Privacy | Offline | Languages |
|------|------|-------|---------|------|---------|---------|-----------|
| **Faster-Whisper** | Open-source | ⚡⚡⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Free | 100% | ✅ | 99 |
| **Whisper** | Open-source | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Free | 100% | ✅ | 99 |
| Google Speech-to-Text | Commercial API | ⚡⚡⚡⚡ | ⭐⭐⭐⭐⭐ | $0.006/15s | Partial | ❌ | 125+ |
| Azure Speech | Commercial API | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | $1/hour | Partial | ❌ | 100+ |
| AssemblyAI | Commercial API | ⚡⚡⚡⚡ | ⭐⭐⭐⭐⭐ | $0.00025/s | Partial | ❌ | 99 |

---

## Faster-Whisper (Recommended)

### Pros
✅ **4-5x faster** than original Whisper
✅ **Same quality** as original Whisper
✅ **Lower memory usage** (50-60% less RAM)
✅ **Free and open-source**
✅ **100% offline** (privacy guaranteed)
✅ **Easy installation** (`pip install faster-whisper`)
✅ **Drop-in replacement** for Whisper

### Cons
❌ Requires Python 3.8+
❌ Initial model download (~100MB-1.5GB)
❌ GPU optional but speeds things up significantly

### Installation

```bash
pip install faster-whisper
```

### Usage Example

```python
from faster_whisper import WhisperModel

# Load model (auto-downloads on first run)
model = WhisperModel("base", device="cpu", compute_type="int8")

# Transcribe
segments, info = model.transcribe("audio.mp3", language="pt")

# Print results
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

### Model Sizes

| Model | Size | RAM | Speed (CPU) | Quality |
|-------|------|-----|-------------|---------|
| `tiny` | 39 MB | ~1 GB | Very fast (~10x realtime) | Basic |
| `base` | 74 MB | ~1 GB | Fast (~7x realtime) | Good |
| `small` | 244 MB | ~2 GB | Moderate (~4x realtime) | Very good |
| `medium` | 769 MB | ~5 GB | Slow (~2x realtime) | Excellent |
| `large` | 1550 MB | ~10 GB | Very slow (~1x realtime) | Best |

**Recommendation:** `small` or `medium` for production use.
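As a rough heuristic, the RAM column above can drive model selection automatically. A minimal sketch with thresholds taken from the table (the function name is illustrative, not part of the skill):

```python
# (model name, approximate RAM need in GB), from the table above
MODEL_SPECS = [
    ("large", 10), ("medium", 5), ("small", 2), ("base", 1), ("tiny", 1),
]


def pick_model(available_ram_gb):
    """Pick the largest model whose approximate RAM requirement fits the budget."""
    for name, ram_gb in MODEL_SPECS:
        if available_ram_gb >= ram_gb:
            return name
    return "tiny"


print(pick_model(4))   # small
print(pick_model(16))  # large
```

These figures are CPU estimates; with a GPU the same table no longer reflects the real constraint.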

---

## Whisper (Original)

### Pros
✅ **Official OpenAI model**
✅ **Excellent quality**
✅ **Free and open-source**
✅ **100% offline**
✅ **Well-documented**
✅ **Large community**

### Cons
❌ **Slower** than Faster-Whisper (4-5x)
❌ **Higher memory usage**
❌ Requires PyTorch (large dependency)
❌ GPU highly recommended for larger models

### Installation

```bash
pip install openai-whisper
```

### Usage Example

```python
import whisper

# Load model
model = whisper.load_model("base")

# Transcribe
result = model.transcribe("audio.mp3", language="pt")

# Print results
print(result["text"])
```

### When to Use Whisper vs. Faster-Whisper

**Use Faster-Whisper if:**
- Speed is important
- Limited RAM is available
- You are processing many files

**Use Original Whisper if:**
- Faster-Whisper has installation issues
- You need the exact OpenAI implementation
- Whisper is already in your project dependencies

---

## Google Cloud Speech-to-Text

### Pros
✅ **Very accurate** (industry-leading)
✅ **Fast processing** (cloud infrastructure)
✅ **125+ languages**
✅ **Word-level timestamps**
✅ **Punctuation & capitalization**
✅ **Speaker diarization** (premium)

### Cons
❌ **Requires internet** (cloud-only)
❌ **Costs money** (after free tier)
❌ **Privacy concerns** (audio uploaded to Google)
❌ Requires GCP account setup
❌ Complex authentication

### Pricing

- **Free tier:** 60 minutes/month
- **Standard:** $0.006 per 15 seconds ($1.44/hour)
- **Premium:** $0.009 per 15 seconds (with diarization)

### Installation

```bash
pip install google-cloud-speech
```

### Setup

1. Create a GCP project
2. Enable the Speech-to-Text API
3. Create a service account & download the JSON key
4. Set the environment variable:
   ```bash
   export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
   ```

### Usage Example

```python
from google.cloud import speech

client = speech.SpeechClient()

with open("audio.wav", "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="pt-BR",
)

response = client.recognize(config=config, audio=audio)

for result in response.results:
    print(result.alternatives[0].transcript)
```

---

## Azure Speech Services

### Pros
✅ **High accuracy**
✅ **100+ languages**
✅ **Real-time transcription**
✅ **Custom models** (train on your data)
✅ **Good Microsoft ecosystem integration**

### Cons
❌ **Requires internet**
❌ **Costs money** (after free tier)
❌ **Privacy concerns** (cloud processing)
❌ Requires an Azure account
❌ Complex setup

### Pricing

- **Free tier:** 5 hours/month
- **Standard:** $1.00 per audio hour

### Installation

```bash
pip install azure-cognitiveservices-speech
```

### Setup

1. Create an Azure account
2. Create a Speech resource
3. Get the API key and region
4. Set the environment variables:
   ```bash
   export AZURE_SPEECH_KEY="your-key"
   export AZURE_SPEECH_REGION="your-region"
   ```

### Usage Example

```python
import os

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ.get('AZURE_SPEECH_KEY'),
    region=os.environ.get('AZURE_SPEECH_REGION')
)

audio_config = speechsdk.audio.AudioConfig(filename="audio.wav")
speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    audio_config=audio_config
)

result = speech_recognizer.recognize_once()
print(result.text)
```

---

## AssemblyAI

### Pros
✅ **Modern, developer-friendly API**
✅ **Excellent accuracy**
✅ **Advanced features** (sentiment, topic detection, PII redaction)
✅ **Speaker diarization** (included)
✅ **Fast processing**
✅ **Good documentation**

### Cons
❌ **Requires internet**
❌ **Costs money** (no free tier, only trial credits)
❌ **Privacy concerns** (cloud processing)
❌ Requires an API key

### Pricing

- **Free trial:** $50 credits
- **Standard:** $0.00025 per second (~$0.90/hour)

### Installation

```bash
pip install assemblyai
```

### Setup

1. Sign up at assemblyai.com
2. Get an API key
3. Set the environment variable:
   ```bash
   export ASSEMBLYAI_API_KEY="your-key"
   ```

### Usage Example

```python
import os

import assemblyai as aai

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

transcriber = aai.Transcriber()
transcript = transcriber.transcribe("audio.mp3")

print(transcript.text)

# Speaker diarization
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```

---

## Recommendation Matrix

### Use Faster-Whisper if:
- ✅ Privacy is critical (local processing)
- ✅ You want zero cost (free forever)
- ✅ You need offline capability
- ✅ You are processing many files (speed matters)
- ✅ Budget is limited

### Use Google Speech-to-Text if:
- ✅ You need the absolute best accuracy
- ✅ You have budget for cloud services
- ✅ You want advanced features (punctuation, diarization)
- ✅ You already use the GCP ecosystem

### Use Azure Speech if:
- ✅ You are in the Microsoft ecosystem
- ✅ You need custom model training
- ✅ You want real-time transcription
- ✅ You have Azure credits

### Use AssemblyAI if:
- ✅ You need advanced features (sentiment, topics)
- ✅ You want the easiest API experience
- ✅ You need automatic PII redaction
- ✅ You value developer experience

---

## Performance Benchmarks

**Test:** 1-hour podcast (MP3, 44.1kHz, stereo)

| Tool | Processing Time | Accuracy | Cost |
|------|----------------|----------|------|
| Faster-Whisper (small) | 8 min | 94% | $0 |
| Whisper (small) | 32 min | 94% | $0 |
| Google Speech | 2 min | 96% | $1.44 |
| Azure Speech | 3 min | 95% | $1.00 |
| AssemblyAI | 4 min | 96% | $0.90 |

*Benchmarks run on MacBook Pro M1, 16GB RAM*
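The per-hour costs in the tables follow directly from the per-unit prices. A quick sanity check (the helper name is illustrative):

```python
def cost_per_hour(price_per_unit, unit_seconds):
    """Convert a per-unit API price into a cost per audio hour."""
    return round(price_per_unit * (3600 / unit_seconds), 2)


print(cost_per_hour(0.006, 15))   # 1.44  (Google Standard: $0.006 per 15s)
print(cost_per_hour(0.00025, 1))  # 0.9   (AssemblyAI: $0.00025 per second)
```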

---

## Conclusion

**For the audio-transcriber skill:**

1. **Primary:** Faster-Whisper (best balance of speed, quality, privacy, cost)
2. **Fallback:** Whisper (if Faster-Whisper unavailable)
3. **Optional:** Cloud APIs (user choice for premium features)

This ensures the skill works out-of-the-box for most users while allowing advanced users to integrate commercial services if needed.

---

**New file:** `skills/audio-transcriber/scripts/install-requirements.sh` (executable, 190 lines)
#!/usr/bin/env bash

# Audio Transcriber - Requirements Installation Script
# Automatically installs and validates dependencies

set -euo pipefail

# Colors
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m'

echo -e "${BLUE}🔧 Audio Transcriber - Dependency Installation${NC}"
echo ""

# Check Python
if ! command -v python3 &>/dev/null; then
    echo -e "${RED}❌ Python 3 not found. Please install Python 3.8+${NC}"
    exit 1
fi

PYTHON_VERSION=$(python3 --version | cut -d' ' -f2 | cut -d'.' -f1,2)
echo -e "${GREEN}✅ Python ${PYTHON_VERSION} detected${NC}"

# Check pip
if ! python3 -m pip --version &>/dev/null; then
    echo -e "${RED}❌ pip not found. Please install pip${NC}"
    exit 1
fi

echo -e "${GREEN}✅ pip available${NC}"
echo ""

# Install system dependencies (macOS only)
if [[ "$OSTYPE" == "darwin"* ]]; then
    echo -e "${BLUE}📦 Checking system dependencies (macOS)...${NC}"

    # Check for Homebrew
    if command -v brew &>/dev/null; then
        # Install pkg-config and ffmpeg if not present
        NEED_INSTALL=""

        if ! brew list pkg-config &>/dev/null; then
            NEED_INSTALL="$NEED_INSTALL pkg-config"
        fi

        if ! brew list ffmpeg &>/dev/null; then
            NEED_INSTALL="$NEED_INSTALL ffmpeg"
        fi

        if [[ -n "$NEED_INSTALL" ]]; then
            echo -e "${BLUE}Installing:$NEED_INSTALL${NC}"
            brew install $NEED_INSTALL --quiet
            echo -e "${GREEN}✅ System dependencies installed${NC}"
        else
            echo -e "${GREEN}✅ System dependencies already installed${NC}"
        fi
    else
        echo -e "${YELLOW}⚠️ Homebrew not found. Install manually if needed:${NC}"
        echo "  /bin/bash -c \"\$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\""
    fi
fi

echo ""

# Install faster-whisper (recommended)
echo -e "${BLUE}📦 Installing Faster-Whisper...${NC}"

# Try different installation methods based on the Python environment
if python3 -m pip install faster-whisper --quiet 2>/dev/null; then
    echo -e "${GREEN}✅ Faster-Whisper installed successfully${NC}"
elif python3 -m pip install --user --break-system-packages faster-whisper --quiet 2>/dev/null; then
    echo -e "${GREEN}✅ Faster-Whisper installed successfully (user mode)${NC}"
else
    echo -e "${YELLOW}⚠️ Faster-Whisper installation failed, trying Whisper...${NC}"

    if python3 -m pip install openai-whisper --quiet 2>/dev/null; then
        echo -e "${GREEN}✅ Whisper installed successfully${NC}"
    elif python3 -m pip install --user --break-system-packages openai-whisper --quiet 2>/dev/null; then
        echo -e "${GREEN}✅ Whisper installed successfully (user mode)${NC}"
    else
        echo -e "${RED}❌ Failed to install transcription engine${NC}"
        echo ""
        echo -e "${YELLOW}Manual installation options:${NC}"
        echo "  1. Use --break-system-packages (macOS/Homebrew Python):"
        echo "     python3 -m pip install --user --break-system-packages openai-whisper"
        echo ""
        echo "  2. Use a virtual environment (recommended):"
        echo "     python3 -m venv ~/whisper-env"
        echo "     source ~/whisper-env/bin/activate"
        echo "     pip install faster-whisper"
        echo ""
        echo "  3. Use pipx (isolated):"
        echo "     brew install pipx"
        echo "     pipx install openai-whisper"
        exit 1
    fi
fi

# Install UI/progress libraries (tqdm, rich)
echo ""
echo -e "${BLUE}📦 Installing UI libraries (tqdm, rich)...${NC}"

if python3 -m pip install tqdm rich --quiet 2>/dev/null; then
    echo -e "${GREEN}✅ tqdm and rich installed successfully${NC}"
elif python3 -m pip install --user --break-system-packages tqdm rich --quiet 2>/dev/null; then
    echo -e "${GREEN}✅ tqdm and rich installed successfully (user mode)${NC}"
else
    echo -e "${YELLOW}⚠️ Optional UI libraries not installed (skill will still work)${NC}"
fi

# Check ffmpeg (optional but recommended)
echo ""
if command -v ffmpeg &>/dev/null; then
    echo -e "${GREEN}✅ ffmpeg already installed${NC}"
else
    echo -e "${YELLOW}⚠️ ffmpeg not found (should have been installed earlier)${NC}"
    if [[ "$OSTYPE" == "darwin"* ]] && command -v brew &>/dev/null; then
        echo -e "${BLUE}Installing ffmpeg via Homebrew...${NC}"
        brew install ffmpeg --quiet && echo -e "${GREEN}✅ ffmpeg installed${NC}"
    else
        echo -e "${BLUE}ℹ️ ffmpeg is optional but recommended for format conversion${NC}"
        echo ""
        echo "Install ffmpeg:"
        if [[ "$OSTYPE" == "darwin"* ]]; then
            echo "  brew install ffmpeg"
        elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
            echo "  sudo apt install ffmpeg   # Debian/Ubuntu"
            echo "  sudo yum install ffmpeg   # CentOS/RHEL"
        fi
    fi
fi

# Verify installation
echo ""
echo -e "${BLUE}🔍 Verifying installation...${NC}"

if python3 -c "import faster_whisper" 2>/dev/null; then
    echo -e "${GREEN}✅ Faster-Whisper verified${NC}"
    TRANSCRIBER="Faster-Whisper"
elif python3 -c "import whisper" 2>/dev/null; then
    echo -e "${GREEN}✅ Whisper verified${NC}"
    TRANSCRIBER="Whisper"
else
    echo -e "${RED}❌ No transcription engine found after installation${NC}"
    exit 1
fi

# Download initial model (optional)
read -p "Download Whisper 'base' model now? (recommended, ~74MB) [Y/n]: " DOWNLOAD_MODEL

if [[ ! "$DOWNLOAD_MODEL" =~ ^[Nn] ]]; then
    echo ""
    echo -e "${BLUE}📥 Downloading 'base' model...${NC}"

    python3 << 'EOF'
try:
    import faster_whisper
    model = faster_whisper.WhisperModel("base", device="cpu", compute_type="int8")
    print("✅ Model downloaded successfully")
except ImportError:
    try:
        import whisper
        model = whisper.load_model("base")
        print("✅ Model downloaded successfully")
    except Exception as e:
        print(f"❌ Model download failed: {e}")
EOF
fi

# Success summary
echo ""
echo -e "${GREEN}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo -e "${GREEN}✅ Installation Complete!${NC}"
echo -e "${GREEN}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo ""
echo "📊 Installed components:"
echo "  • Transcription engine: $TRANSCRIBER"
if command -v ffmpeg &>/dev/null; then
    echo "  • Format conversion: ffmpeg (available)"
else
    echo "  • Format conversion: ffmpeg (not installed)"
fi
echo ""
echo "🚀 Ready to use! Try:"
echo "  copilot> transcribe audio to markdown: myfile.mp3"
echo "  claude> transcreva este áudio: myfile.mp3"
echo ""

---

**New file:** `skills/audio-transcriber/scripts/transcribe.py` (executable, 510 lines)
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Audio Transcriber v1.1.0
|
||||
Transcreve áudio para texto e gera atas/resumos usando LLM.
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import json
|
||||
import subprocess
|
||||
import shutil
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
# Rich for beautiful terminal output
|
||||
try:
|
||||
from rich.console import Console
|
||||
from rich.prompt import Prompt
|
||||
from rich.panel import Panel
|
||||
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn
|
||||
from rich import print as rprint
|
||||
RICH_AVAILABLE = True
|
||||
except ImportError:
|
||||
RICH_AVAILABLE = False
|
||||
print("⚠️ Installing rich for better UI...")
|
||||
subprocess.run([sys.executable, "-m", "pip", "install", "--user", "rich"], check=False)
|
||||
from rich.console import Console
|
||||
from rich.prompt import Prompt
|
||||
from rich.panel import Panel
|
||||
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn
|
||||
from rich import print as rprint
|
||||
|
||||
# tqdm for progress bars
|
||||
try:
|
||||
from tqdm import tqdm
|
||||
except ImportError:
|
||||
print("⚠️ Installing tqdm for progress bars...")
|
||||
subprocess.run([sys.executable, "-m", "pip", "install", "--user", "tqdm"], check=False)
|
||||
from tqdm import tqdm
|
||||
|
||||
# Whisper engines
|
||||
try:
|
||||
from faster_whisper import WhisperModel
|
||||
TRANSCRIBER = "faster-whisper"
|
||||
except ImportError:
|
||||
try:
|
||||
import whisper
|
||||
TRANSCRIBER = "whisper"
|
||||
except ImportError:
|
||||
print("❌ Nenhum engine de transcrição encontrado!")
|
||||
print(" Instale: pip install faster-whisper")
|
||||
sys.exit(1)
|
||||
|
||||
console = Console()
|
||||
|
||||
# Default RISEN-structured prompt used as a fallback
DEFAULT_MEETING_PROMPT = """
Role: Você é um transcritor profissional especializado em documentação.

Instructions: Transforme a transcrição fornecida em um documento estruturado e profissional.

Steps:
1. Identifique o tipo de conteúdo (reunião, palestra, entrevista, etc.)
2. Extraia os principais tópicos e pontos-chave
3. Identifique participantes/speakers (se aplicável)
4. Extraia decisões tomadas e ações definidas (se reunião)
5. Organize em formato apropriado com seções claras
6. Use Markdown para formatação profissional

End Goal: Documento final bem estruturado, legível e pronto para distribuição.

Narrowing:
- Mantenha objetividade e clareza
- Preserve contexto importante
- Use formatação Markdown adequada
- Inclua timestamps relevantes quando aplicável
"""


def detect_cli_tool():
    """Detect which LLM CLI is available (claude > gh copilot)."""
    if shutil.which('claude'):
        return 'claude'
    elif shutil.which('gh'):
        result = subprocess.run(['gh', 'copilot', '--version'],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return 'gh-copilot'
    return None


def invoke_prompt_engineer(raw_prompt, timeout=90):
    """
    Invoke the prompt-engineer skill via CLI to improve or generate prompts.

    Args:
        raw_prompt: Prompt to improve, or a meta-prompt
        timeout: Timeout in seconds

    Returns:
        str: Improved prompt, or DEFAULT_MEETING_PROMPT on failure
    """
    try:
        # Try via gh copilot
        console.print("[dim]  Invocando prompt-engineer...[/dim]")

        result = subprocess.run(
            ['gh', 'copilot', 'suggest', '-t', 'shell', raw_prompt],
            capture_output=True,
            text=True,
            timeout=timeout
        )

        if result.returncode == 0 and result.stdout.strip():
            return result.stdout.strip()
        else:
            console.print("[yellow]⚠️ prompt-engineer não respondeu, usando template padrão[/yellow]")
            return DEFAULT_MEETING_PROMPT

    except subprocess.TimeoutExpired:
        console.print(f"[red]⚠️ Timeout após {timeout}s, usando template padrão[/red]")
        return DEFAULT_MEETING_PROMPT
    except Exception as e:
        console.print(f"[red]⚠️ Erro ao invocar prompt-engineer: {e}[/red]")
        return DEFAULT_MEETING_PROMPT


def handle_prompt_workflow(user_prompt, transcript):
    """
    Manage the full prompt workflow with the prompt-engineer skill.

    Scenario A: user provided a prompt → improve it AUTOMATICALLY → confirm
    Scenario B: no prompt → suggest a type → confirm → generate → confirm

    Returns:
        str: Final prompt to use, or None if the user declined processing
    """
    prompt_engineer_available = os.path.exists(
        os.path.expanduser('~/.copilot/skills/prompt-engineer/SKILL.md')
    )

    # ========== SCENARIO A: USER PROVIDED A PROMPT ==========
    if user_prompt:
        console.print("\n[cyan]📝 Prompt fornecido pelo usuário[/cyan]")
        console.print(Panel(user_prompt[:300] + ("..." if len(user_prompt) > 300 else ""),
                            title="Prompt original", border_style="dim"))

        if prompt_engineer_available:
            # Improve AUTOMATICALLY (without asking first)
            console.print("\n[cyan]🔧 Melhorando prompt com prompt-engineer...[/cyan]")

            improved_prompt = invoke_prompt_engineer(
                f"melhore este prompt:\n\n{user_prompt}"
            )

            # Show BOTH versions
            console.print("\n[green]✨ Versão melhorada:[/green]")
            console.print(Panel(improved_prompt[:500] + ("..." if len(improved_prompt) > 500 else ""),
                                title="Prompt otimizado", border_style="green"))

            console.print("\n[dim]📝 Versão original:[/dim]")
            console.print(Panel(user_prompt[:300] + ("..." if len(user_prompt) > 300 else ""),
                                title="Seu prompt", border_style="dim"))

            # Ask which version to use
            confirm = Prompt.ask(
                "\n💡 Usar versão melhorada?",
                choices=["s", "n"],
                default="s"
            )

            return improved_prompt if confirm == "s" else user_prompt
        else:
            # prompt-engineer is not available
            console.print("[yellow]⚠️ prompt-engineer skill não disponível[/yellow]")
            console.print("[dim]✅ Usando seu prompt original[/dim]")
            return user_prompt

    # ========== SCENARIO B: NO PROMPT - AUTO-GENERATION ==========
    else:
        console.print("\n[yellow]⚠️ Nenhum prompt fornecido.[/yellow]")

        if not prompt_engineer_available:
            console.print("[yellow]⚠️ prompt-engineer skill não encontrado[/yellow]")
            console.print("[dim]Usando template padrão...[/dim]")
            return DEFAULT_MEETING_PROMPT

        # STEP 1: ask whether to auto-generate
        console.print("Posso analisar o transcript e sugerir um formato de resumo/ata?")

        generate = Prompt.ask(
            "\n💡 Gerar prompt automaticamente?",
            choices=["s", "n"],
            default="s"
        )

        if generate == "n":
            console.print("[dim]✅ Ok, gerando apenas transcript.md (sem ata)[/dim]")
            return None  # Signals: do not process with the LLM

        # STEP 2: analyze the transcript and SUGGEST a type
        console.print("\n[cyan]🔍 Analisando transcript...[/cyan]")

        suggestion_meta_prompt = f"""
Analise este transcript ({len(transcript)} caracteres) e sugira:

1. Tipo de conteúdo (reunião, palestra, entrevista, etc.)
2. Formato de saída recomendado (ata formal, resumo executivo, notas estruturadas)
3. Framework ideal (RISEN, RODES, STAR, etc.)

Primeiros 4000 caracteres do transcript:
{transcript[:4000]}

Responda em 2-3 linhas concisas.
"""

        suggested_type = invoke_prompt_engineer(suggestion_meta_prompt)

        # STEP 3: show the suggestion and CONFIRM
        console.print("\n[green]💡 Sugestão de formato:[/green]")
        console.print(Panel(suggested_type, title="Análise do transcript", border_style="green"))

        confirm_type = Prompt.ask(
            "\n💡 Usar este formato?",
            choices=["s", "n"],
            default="s"
        )

        if confirm_type == "n":
            console.print("[dim]Usando template padrão...[/dim]")
            return DEFAULT_MEETING_PROMPT

        # STEP 4: generate the full prompt based on the suggestion
        console.print("\n[cyan]✨ Gerando prompt estruturado...[/cyan]")

        final_meta_prompt = f"""
Crie um prompt completo e estruturado (usando framework apropriado) para:

{suggested_type}

O prompt deve instruir uma IA a transformar o transcript em um documento
profissional e bem formatado em Markdown.
"""

        generated_prompt = invoke_prompt_engineer(final_meta_prompt)

        # STEP 5: show the generated prompt and CONFIRM
        console.print("\n[green]✅ Prompt gerado:[/green]")
        console.print(Panel(generated_prompt[:600] + ("..." if len(generated_prompt) > 600 else ""),
                            title="Preview", border_style="green"))

        confirm_final = Prompt.ask(
            "\n💡 Usar este prompt?",
            choices=["s", "n"],
            default="s"
        )

        if confirm_final == "s":
            return generated_prompt
        else:
            console.print("[dim]Usando template padrão...[/dim]")
            return DEFAULT_MEETING_PROMPT


def process_with_llm(transcript, prompt, cli_tool='claude', timeout=300):
    """
    Process the transcript with an LLM using the given prompt.

    Args:
        transcript: Transcribed text
        prompt: Prompt describing how to process it
        cli_tool: 'claude' or 'gh-copilot'
        timeout: Timeout in seconds

    Returns:
        str: Processed minutes/summary, or None on failure
    """
    full_prompt = f"{prompt}\n\n---\n\nTranscrição:\n\n{transcript}"

    try:
        with Progress(
            SpinnerColumn(),
            TextColumn("[progress.description]{task.description}"),
            transient=True
        ) as progress:
            progress.add_task(description=f"🤖 Processando com {cli_tool}...", total=None)

            if cli_tool == 'claude':
                result = subprocess.run(
                    ['claude', '-'],
                    input=full_prompt,
                    capture_output=True,
                    text=True,
                    timeout=timeout
                )
            elif cli_tool == 'gh-copilot':
                result = subprocess.run(
                    ['gh', 'copilot', 'suggest', '-t', 'shell', full_prompt],
                    capture_output=True,
                    text=True,
                    timeout=timeout
                )
            else:
                raise ValueError(f"CLI tool desconhecido: {cli_tool}")

        if result.returncode == 0:
            return result.stdout.strip()
        else:
            console.print(f"[red]❌ Erro ao processar com {cli_tool}[/red]")
            console.print(f"[dim]{result.stderr[:200]}[/dim]")
            return None

    except subprocess.TimeoutExpired:
        console.print(f"[red]❌ Timeout após {timeout}s[/red]")
        return None
    except Exception as e:
        console.print(f"[red]❌ Erro: {e}[/red]")
        return None


def transcribe_audio(audio_file, model="base"):
    """
    Transcribe audio with Whisper, showing a progress bar.

    Returns:
        dict: {language, duration, segments: [{start, end, text}]}
    """
    console.print(f"\n[cyan]🎙️ Transcrevendo áudio com {TRANSCRIBER}...[/cyan]")

    try:
        if TRANSCRIBER == "faster-whisper":
            model_obj = WhisperModel(model, device="cpu", compute_type="int8")
            segments, info = model_obj.transcribe(
                audio_file,
                language=None,
                vad_filter=True,
                word_timestamps=True
            )

            data = {
                "language": info.language,
                "language_probability": round(info.language_probability, 2),
                "duration": info.duration,
                "segments": []
            }

            # Convert the generator into a list, with progress
            console.print("[dim]Processando segmentos...[/dim]")
            for segment in tqdm(segments, desc="Segmentos", unit="seg"):
                data["segments"].append({
                    "start": round(segment.start, 2),
                    "end": round(segment.end, 2),
                    "text": segment.text.strip()
                })

        else:  # original whisper
            model_obj = whisper.load_model(model)
            result = model_obj.transcribe(audio_file, word_timestamps=True)

            data = {
                "language": result["language"],
                "duration": result["segments"][-1]["end"] if result["segments"] else 0,
                "segments": result["segments"]
            }

        console.print(f"[green]✅ Transcrição completa! Idioma: {data['language'].upper()}[/green]")
        console.print(f"[dim]   {len(data['segments'])} segmentos processados[/dim]")

        return data

    except Exception as e:
        console.print(f"[red]❌ Erro na transcrição: {e}[/red]")
        sys.exit(1)


def save_outputs(transcript_text, ata_text, audio_file, output_dir="."):
    """
    Save the transcript and the minutes to timestamped .md files.

    Returns:
        tuple: (transcript_path, ata_path or None)
    """
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = Path(audio_file).stem

    # Always save the transcript
    transcript_filename = f"transcript-{timestamp}.md"
    transcript_path = Path(output_dir) / transcript_filename

    with open(transcript_path, 'w', encoding='utf-8') as f:
        f.write(transcript_text)

    console.print(f"[green]✅ Transcript salvo:[/green] {transcript_filename}")

    # Save the minutes, if any
    ata_path = None
    if ata_text:
        ata_filename = f"ata-{timestamp}.md"
        ata_path = Path(output_dir) / ata_filename

        with open(ata_path, 'w', encoding='utf-8') as f:
            f.write(ata_text)

        console.print(f"[green]✅ Ata salva:[/green] {ata_filename}")

    return str(transcript_path), str(ata_path) if ata_path else None


def cleanup_temp_files(output_dir=".", keep_temp=False):
    """Remove temporary JSON files unless asked to keep them."""
    if keep_temp:
        return

    temp_files = ["metadata.json", "transcription.json"]
    removed = []

    for filename in temp_files:
        filepath = Path(output_dir) / filename
        if filepath.exists():
            filepath.unlink()
            removed.append(filename)

    if removed:
        console.print(f"[dim]🧹 Removidos arquivos temporários: {', '.join(removed)}[/dim]")


def main():
    """Entry point."""
    import argparse

    parser = argparse.ArgumentParser(description="Audio Transcriber v1.1.0")
    parser.add_argument("audio_file", help="Arquivo de áudio para transcrever")
    parser.add_argument("--prompt", help="Prompt customizado para processar transcript")
    parser.add_argument("--model", default="base", help="Modelo Whisper (tiny/base/small/medium/large)")
    parser.add_argument("--output-dir", default=".", help="Diretório de saída")
    parser.add_argument("--keep-temp", action="store_true", help="Manter arquivos temporários JSON")

    args = parser.parse_args()

    # Check that the audio file exists
    if not os.path.exists(args.audio_file):
        console.print(f"[red]❌ Arquivo não encontrado: {args.audio_file}[/red]")
        sys.exit(1)

    console.print("[bold cyan]🎵 Audio Transcriber v1.1.0[/bold cyan]\n")

    # Step 1: transcribe
    transcription_data = transcribe_audio(args.audio_file, model=args.model)

    # Build the transcript text
    transcript_text = "# Transcrição de Áudio\n\n"
    transcript_text += f"**Arquivo:** {Path(args.audio_file).name}\n"
    transcript_text += f"**Idioma:** {transcription_data['language'].upper()}\n"
    transcript_text += f"**Data:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n"
    transcript_text += "---\n\n## Transcrição Completa\n\n"

    for seg in transcription_data["segments"]:
        start_min = int(seg["start"] // 60)
        start_sec = int(seg["start"] % 60)
        end_min = int(seg["end"] // 60)
        end_sec = int(seg["end"] % 60)
        transcript_text += f"**[{start_min:02d}:{start_sec:02d} → {end_min:02d}:{end_sec:02d}]**  \n{seg['text']}\n\n"

    # Step 2: detect an LLM CLI
    cli_tool = detect_cli_tool()

    if not cli_tool:
        console.print("\n[yellow]⚠️ Nenhuma CLI de IA detectada (Claude ou GitHub Copilot)[/yellow]")
        console.print("[dim]ℹ️ Salvando apenas transcript.md...[/dim]")

        save_outputs(transcript_text, None, args.audio_file, args.output_dir)
        cleanup_temp_files(args.output_dir, args.keep_temp)

        console.print("\n[cyan]💡 Para gerar ata/resumo:[/cyan]")
        console.print("   - Instale Claude CLI: pip install claude-cli")
        console.print("   - Ou GitHub Copilot CLI já está instalado (gh copilot)")
        return

    console.print(f"\n[green]✅ CLI detectada: {cli_tool}[/green]")

    # Step 3: prompt workflow
    final_prompt = handle_prompt_workflow(args.prompt, transcript_text)

    if final_prompt is None:
        # The user declined LLM processing
        save_outputs(transcript_text, None, args.audio_file, args.output_dir)
        cleanup_temp_files(args.output_dir, args.keep_temp)
        return

    # Step 4: process with the LLM
    ata_text = process_with_llm(transcript_text, final_prompt, cli_tool)

    if ata_text:
        console.print("[green]✅ Ata gerada com sucesso![/green]")
    else:
        console.print("[yellow]⚠️ Falha ao gerar ata, salvando apenas transcript[/yellow]")

    # Step 5: save files
    console.print("\n[cyan]💾 Salvando arquivos...[/cyan]")
    save_outputs(transcript_text, ata_text, args.audio_file, args.output_dir)

    # Step 6: cleanup
    cleanup_temp_files(args.output_dir, args.keep_temp)

    console.print("\n[bold green]✅ Concluído![/bold green]")


if __name__ == "__main__":
    main()