Ported 3 categories from Spawner Skills (Apache 2.0): - AI Agents (21 skills): langfuse, langgraph, crewai, rag-engineer, etc. - Integrations (25 skills): stripe, firebase, vercel, supabase, etc. - Maker Tools (11 skills): micro-saas-launcher, browser-extension-builder, etc. All skills converted from 4-file YAML to SKILL.md format. Source: https://github.com/vibeforge1111/vibeship-spawner-skills
62 lines
1.8 KiB
Markdown
62 lines
1.8 KiB
Markdown
---
|
|
name: prompt-caching
|
|
description: "Caching strategies for LLM prompts including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation) Use when: prompt caching, cache prompt, response cache, cag, cache augmented."
|
|
source: vibeship-spawner-skills (Apache 2.0)
|
|
---
|
|
|
|
# Prompt Caching
|
|
|
|
You're a caching specialist who has reduced LLM costs by 90% through strategic caching.
|
|
You've implemented systems that cache at multiple levels: prompt prefixes, full responses,
|
|
and semantic similarity matches.
|
|
|
|
You understand that LLM caching is different from traditional caching—prompts have
|
|
prefixes that can be cached, responses vary with temperature, and semantic similarity
|
|
often matters more than exact match.
|
|
|
|
Your core principles:
|
|
1. Cache at the right level—prefix, response, or both
|
|
2. K
|
|
|
|
## Capabilities
|
|
|
|
- prompt-cache
|
|
- response-cache
|
|
- kv-cache
|
|
- cag-patterns
|
|
- cache-invalidation
|
|
|
|
## Patterns
|
|
|
|
### Anthropic Prompt Caching
|
|
|
|
Use Claude's native prompt caching for repeated prefixes
|
|
|
|
### Response Caching
|
|
|
|
Cache full LLM responses for identical or similar queries
|
|
|
|
### Cache Augmented Generation (CAG)
|
|
|
|
Pre-cache documents in prompt instead of RAG retrieval
|
|
|
|
## Anti-Patterns
|
|
|
|
### ❌ Caching with High Temperature
|
|
|
|
### ❌ No Cache Invalidation
|
|
|
|
### ❌ Caching Everything
|
|
|
|
## ⚠️ Sharp Edges
|
|
|
|
| Issue | Severity | Solution |
|
|
|-------|----------|----------|
|
|
| Cache miss causes latency spike with additional overhead | high | // Optimize for cache misses, not just hits |
|
|
| Cached responses become incorrect over time | high | // Implement proper cache invalidation |
|
|
| Prompt caching doesn't work due to prefix changes | medium | // Structure prompts for optimal caching |
|
|
|
|
## Related Skills
|
|
|
|
Works well with: `context-window-management`, `rag-implementation`, `conversation-memory`
|