feat: Add 57 skills from vibeship-spawner-skills
Ported 3 categories from Spawner Skills (Apache 2.0):
- AI Agents (21 skills): langfuse, langgraph, crewai, rag-engineer, etc.
- Integrations (25 skills): stripe, firebase, vercel, supabase, etc.
- Maker Tools (11 skills): micro-saas-launcher, browser-extension-builder, etc.

All skills converted from 4-file YAML to SKILL.md format.

Source: https://github.com/vibeforge1111/vibeship-spawner-skills
90
skills/rag-engineer/SKILL.md
Normal file
@@ -0,0 +1,90 @@
---
name: rag-engineer
description: "Expert in building Retrieval-Augmented Generation systems. Masters embedding models, vector databases, chunking strategies, and retrieval optimization for LLM applications. Use when: building RAG, vector search, embeddings, semantic search, document retrieval."
source: vibeship-spawner-skills (Apache 2.0)
---

# RAG Engineer

**Role**: RAG Systems Architect

I bridge the gap between raw documents and LLM understanding. I know that
retrieval quality determines generation quality - garbage in, garbage out.
I obsess over chunking boundaries, embedding dimensions, and similarity
metrics because they make the difference between helpful and hallucinating.

## Capabilities

- Vector embeddings and similarity search
- Document chunking and preprocessing
- Retrieval pipeline design
- Semantic search implementation
- Context window optimization
- Hybrid search (keyword + semantic)

## Requirements

- LLM fundamentals
- Understanding of embeddings
- Basic NLP concepts

## Patterns

### Semantic Chunking

Chunk by meaning, not by arbitrary token counts.

- Use sentence boundaries, not token limits
- Detect topic shifts with embedding similarity
- Preserve document structure (headers, paragraphs)
- Include overlap for context continuity
- Add metadata for filtering
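
The chunking steps above can be sketched roughly as follows. This is a minimal illustration, not a production chunker: `embed` is a toy bag-of-words stand-in for a real embedding model, and the names `semantic_chunks`, `threshold`, `overlap`, and `source` are assumptions introduced here. It splits on sentence boundaries, starts a new chunk when adjacent sentences fall below a similarity threshold, carries trailing sentences forward as overlap, and attaches metadata. Header and paragraph structure is not handled in this sketch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a placeholder for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text, threshold=0.1, overlap=1, source="doc"):
    # Split on sentence boundaries rather than at a token count.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []

    # Start a new group whenever adjacent sentences look unrelated (topic shift).
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            groups.append([])
        groups[-1].append(cur)

    chunks = []
    for i, group in enumerate(groups):
        # Carry the last `overlap` sentences of the previous group for continuity.
        context = groups[i - 1][-overlap:] if i > 0 and overlap > 0 else []
        chunks.append({
            "text": " ".join(context + group),
            "metadata": {"source": source, "chunk_index": i, "sentence_count": len(group)},
        })
    return chunks
```

The threshold is the main tuning knob here: with a real embedding model, similarity values are distributed very differently from this toy vector, so the cut-off would need to be calibrated on sample documents.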

### Hierarchical Retrieval

Multi-level retrieval for better precision.

- Index at multiple chunk sizes (paragraph, section, document)
- First pass: coarse retrieval for candidates
- Second pass: fine-grained retrieval for precision
- Use parent-child relationships for context
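
A two-pass search along these lines might look like the sketch below. It assumes a hypothetical in-memory corpus shaped as a mapping from section title to paragraphs, and it reuses a toy bag-of-words `embed` in place of a real embedding model; `hierarchical_search`, `top_sections`, and `top_paragraphs` are illustrative names, not part of any library.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy bag-of-words vector; a placeholder for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hierarchical_search(query, sections, top_sections=1, top_paragraphs=2):
    """sections maps a section title (parent) to its paragraphs (children)."""
    q = embed(query)

    # First pass: coarse retrieval over whole sections to pick candidates.
    def section_score(title):
        return cosine(q, embed(title + " " + " ".join(sections[title])))

    candidates = sorted(sections, key=section_score, reverse=True)[:top_sections]

    # Second pass: fine-grained scoring of paragraphs inside candidate sections.
    scored = [
        (cosine(q, embed(paragraph)), title, paragraph)
        for title in candidates
        for paragraph in sections[title]
    ]
    scored.sort(key=lambda item: item[0], reverse=True)

    # Keep the parent title so the caller can pull in surrounding context.
    return [
        {"parent": title, "text": paragraph, "score": score}
        for score, title, paragraph in scored[:top_paragraphs]
    ]
```

Returning the parent alongside each hit is what makes the parent-child relationship usable: the caller can expand a matched paragraph to its full section when the prompt needs more context.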

### Hybrid Search

Combine semantic and keyword search.

- BM25/TF-IDF for keyword matching
- Vector similarity for semantic matching
- Reciprocal Rank Fusion (RRF) for combining the two rankings
- Weight tuning based on query type
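
The fusion step can be illustrated with a short sketch of Reciprocal Rank Fusion, which works on rank positions rather than raw scores: each document receives `weight / (k + rank)` from every ranking it appears in. The function name, the per-ranking `weights` argument, and the tie-break by document ID are assumptions made for this example; `k=60` is the constant commonly used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first (e.g. BM25 and vector results).
    weights:  optional per-ranking weights, e.g. to favor keywords for ID-like queries.
    """
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]   # hypothetical BM25 ranking
vector_hits = ["b", "c", "d"]    # hypothetical vector-similarity ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Because only ranks are used, the keyword and vector scores never need to be normalized to a common scale, which is the main reason RRF is a convenient default for hybrid search.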
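
## Anti-Patterns

### ❌ Fixed Chunk Size

Splitting at a fixed token count breaks sentences and context. Chunk along sentence and structural boundaries instead.

### ❌ Embedding Everything

### ❌ Ignoring Evaluation

Retrieval quality needs to be measured separately from generation quality. Otherwise there is no way to tell whether a bad answer came from the retriever or from the model.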
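
One way to avoid this anti-pattern is to score the retriever on its own against a small labeled set, before any generation happens. The sketch below computes recall@k and mean reciprocal rank; the `retriever` callable, the shape of `labeled_queries`, and the function names are assumptions for illustration.

```python
def recall_at_k(retrieved, relevant, k):
    # Fraction of the relevant documents that appear in the top k results.
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    # 1 / rank of the first relevant result, or 0.0 if none was retrieved.
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retriever(retriever, labeled_queries, k=5):
    """labeled_queries maps each query to the set of relevant document IDs."""
    recalls, ranks = [], []
    for query, relevant in labeled_queries.items():
        retrieved = retriever(query)  # ordered list of document IDs, best first
        recalls.append(recall_at_k(retrieved, relevant, k))
        ranks.append(reciprocal_rank(retrieved, relevant))
    n = len(labeled_queries)
    return {
        "recall_at_k": sum(recalls) / n if n else 0.0,
        "mrr": sum(ranks) / n if n else 0.0,
    }
```

Tracking these numbers over time makes it possible to see whether a change to chunking, embeddings, or search weights helped retrieval, independently of how the LLM phrases its answer.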

## ⚠️ Sharp Edges

| Issue | Severity | Solution |
|-------|----------|----------|
| Fixed-size chunking breaks sentences and context | high | Use semantic chunking that respects document structure |
| Pure semantic search without metadata pre-filtering | medium | Implement hybrid filtering |
| Using same embedding model for different content types | medium | Evaluate embeddings per content type |
| Using first-stage retrieval results directly | medium | Add reranking step |
| Cramming maximum context into LLM prompt | medium | Use relevance thresholds |
| Not measuring retrieval quality separately from generation | high | Separate retrieval evaluation |
| Not updating embeddings when source documents change | medium | Implement embedding refresh |
| Same retrieval strategy for all query types | medium | Implement hybrid search |
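
As one illustration of the embedding-refresh row, stale entries can be detected by storing a content hash next to each embedding and comparing it with the current source text. This is a sketch under assumptions: `plan_refresh` and the two dictionaries are hypothetical, and a real system would keep the hashes in the vector store's metadata.

```python
import hashlib


def content_hash(text: str) -> str:
    # Stable fingerprint of the source text at the time it was embedded.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(documents, indexed_hashes):
    """Decide which embeddings to recompute and which to drop.

    documents:      doc ID -> current source text
    indexed_hashes: doc ID -> hash stored when the document was last embedded
    """
    # New or changed documents need a fresh embedding.
    to_embed = sorted(
        doc_id
        for doc_id, text in documents.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    # Documents that no longer exist should be removed from the index.
    to_delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in documents)
    return to_embed, to_delete
```

Hashing keeps the refresh incremental: unchanged documents are skipped, so the embedding model is only called for content that actually changed.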

## Related Skills

Works well with: `ai-agents-architect`, `prompt-engineer`, `database-architect`, `backend`