feat(writing-skills): refactor skill to gold standard architecture

Antigravity Agent
2026-01-29 15:12:41 -03:00
parent f2c3e783b5
commit 10ca106b79
16 changed files with 1966 additions and 698 deletions


@@ -0,0 +1,255 @@
# Anti-Rationalization Guide
Techniques for bulletproofing skills against agent rationalization.
## The Problem
Discipline-enforcing skills (like TDD) face a unique challenge: smart agents under pressure will find loopholes.
**Example**: Skill says "Write test first". Agent under deadline thinks:
- "This is too simple to test"
- "I'll test after, same result"
- "It's the spirit that matters, not ritual"
## Psychology Foundation
Understanding WHY persuasion works helps apply it systematically.
**Research basis**: Cialdini (2021), Meincke et al. (2025)
**Core principles**:
- **Authority**: "The TDD community agrees..."
- **Commitment**: "You already said you follow TDD..."
- **Scarcity**: "Missing tests now = bugs later"
- **Social Proof**: "All tested code passing CI proves value"
- **Unity**: "We're engineers who value quality"
## Technique 1: Close Every Loophole Explicitly
Don't just state the rule - forbid specific workarounds.
### Bad Example
```markdown
Write code before test? Delete it.
```
### Good Example
```markdown
Write code before test? Delete it. Start over.
**No exceptions**:
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
```
**Why it works**: Agents try specific workarounds. Counter each explicitly.
## Technique 2: Address "Spirit vs Letter" Arguments
Add foundational principle early:
```markdown
**Violating the letter of the rules is violating the spirit of the rules.**
```
**Why it works**: Cuts off entire class of "I'm following the spirit" rationalizations.
## Technique 3: Build Rationalization Table
Capture excuses from baseline testing. Every rationalization goes in the table:
```markdown
| Excuse | Reality |
| -------------------------------- | ----------------------------------------------------------------------- |
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "It's about spirit not ritual" | The letter IS the spirit. TDD's value comes from the specific sequence. |
```
**Why it works**: Agents read table, recognize their own thinking, see the counter-argument.
## Technique 4: Create Red Flags List
Make it easy for agents to self-check when rationalizing:
```markdown
## Red Flags - STOP and Start Over
- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."
**All of these mean**: Delete code. Start over with TDD.
```
**Why it works**: Simple checklist, clear action (delete & restart).
## Technique 5: Update Description for Violation Symptoms
In the description, include the symptoms that appear when you're ABOUT to violate:
```yaml
# ❌ BAD: Only describes what skill does
description: TDD methodology for writing code
# ✅ GOOD: Includes pre-violation symptoms
description: Use when implementing any feature or bugfix, before writing implementation code
metadata:
  triggers: new feature, bug fix, code change
```
**Why it works**: Triggers skill BEFORE violation, not after.
## Technique 6: Use Strong Language
Weak language invites rationalization:
```markdown
# Weak
You should write tests first.
Generally, test before code.
It's better to test first.
# Strong
ALWAYS write test first.
NEVER write code before test.
Test-first is MANDATORY.
```
**Why it works**: No ambiguity, no wiggle room.
## Technique 7: Invoke Commitment & Consistency
Reference agent's own standards:
```markdown
You claimed to follow TDD.
TDD means test-first.
Code-first is NOT TDD.
**Either**:
- Follow TDD (test-first), or
- Admit you're not doing TDD
Don't redefine TDD to fit what you already did.
```
**Why it works**: Agents resist cognitive dissonance (Festinger, 1957).
## Technique 8: Provide Escape Hatch for Legitimate Cases
If there ARE valid exceptions, state them explicitly:
```markdown
## When NOT to Use TDD
- Spike solutions (throwaway exploratory code)
- One-time scripts deleting in 1 hour
- Generated boilerplate (verified via other means)
**Everything else**: Use TDD. No exceptions.
```
**Why it works**: Removes "but this is different" argument for non-exception cases.
## Complete Bulletproofing Checklist
For discipline-enforcing skills:
**Loophole Closing**:
- [ ] Forbidden each specific workaround explicitly?
- [ ] Added "spirit vs letter" principle?
- [ ] Built rationalization table from baseline tests?
- [ ] Created red flags list?
**Strength**:
- [ ] Used strong language (ALWAYS/NEVER)?
- [ ] Invoked commitment & consistency?
- [ ] Provided explicit escape hatch?
**Discovery**:
- [ ] Description includes pre-violation symptoms?
- [ ] Keywords target moment BEFORE violation?
**Testing**:
- [ ] Tested with combined pressures?
- [ ] Agent complied under maximum pressure?
- [ ] No new rationalizations found?
## Real-World Example: TDD Skill
### Baseline Rationalizations Found
1. "Too simple to test"
2. "I'll test after"
3. "Spirit not ritual"
4. "Already manually tested"
5. "This is different because..."
### Counters Applied
**Rationalization table**:
```markdown
| Excuse | Reality |
| -------------------- | -------------------------------------------------------------- |
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Spirit not ritual" | The letter IS the spirit. TDD's value comes from the sequence. |
| "Manually tested" | Manual tests don't run automatically. They rot. |
```
**Red flags**:
```markdown
## Red Flags - STOP
- Code before test
- "I already tested manually"
- "Spirit not ritual"
- "This is different..."
All mean: Delete code. Start over.
```
**Result**: Agent compliance under combined time + sunk cost pressure.
## Common Mistakes
| Mistake | Fix |
| -------------------------------------- | -------------------------------------------------------------- |
| Trust agents will "get the spirit" | Close explicit loopholes. Agents are smart at rationalization. |
| Use weak language ("should", "better") | Use ALWAYS/NEVER for discipline rules. |
| Skip rationalization table | Every excuse needs explicit counter. |
| No red flags list | Make self-checking easy. |
| Generic description | Add pre-violation symptoms to trigger skill earlier. |
## Meta-Strategy
**For each new rationalization**:
1. Document it verbatim (from failed test)
2. Add to rationalization table
3. Update red flags list
4. Re-test
**Iterate until**: Agent can't find ANY rationalization that works.
That's bulletproof.


@@ -0,0 +1,268 @@
# CSO Guide - Claude Search Optimization
Advanced techniques for making skills discoverable by agents.
## The Discovery Problem
You have 100+ skills. Agent receives a task. How does it find the RIGHT skill?
**Answer**: The `description` field.
## Critical Rule: Description = Triggers, NOT Workflow
### The Trap
When description summarizes workflow, agents take a shortcut.
**Real example that failed**:
```yaml
# Agent did ONE review instead of TWO
description: Code review between tasks
# Skill body had flowchart showing TWO reviews:
# 1. Spec compliance
# 2. Code quality
```
**Why it failed**: Agent read description, thought "code review between tasks means one review", never read the flowchart.
**Fix**:
```yaml
# Agent now reads full skill and follows flowchart
description: Use when executing implementation plans with independent tasks
```
### The Pattern
```yaml
# ❌ BAD: Workflow summary
description: Analyzes git diff, generates commit message in conventional format
# ✅ GOOD: Trigger conditions only
description: Use when generating commit messages or reviewing staged changes
```
## Token Efficiency Critical for Skills
**Problem**: Frequently-loaded skills consume tokens in EVERY conversation.
**Target word counts**:
- Frequently-loaded skills: <200 words total
- Other skills: <500 words
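These budgets can be checked mechanically before deploying. A minimal sketch (the helper names are hypothetical, not part of any skill loader) that strips fenced code blocks so examples don't inflate the count:

```python
import re

def word_count(skill_md: str) -> int:
    """Count prose words in a SKILL.md, excluding fenced code blocks."""
    prose = re.sub(r"```.*?```", "", skill_md, flags=re.DOTALL)
    return len(prose.split())

def within_budget(skill_md: str, frequently_loaded: bool) -> bool:
    """Apply the targets above: 200 words if frequently loaded, 500 otherwise."""
    limit = 200 if frequently_loaded else 500
    return word_count(skill_md) <= limit
```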
### Techniques
**1. Move details to tool help**:
```bash
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N
# ✅ GOOD: Reference --help
search-conversations supports multiple modes. Run --help for details.
```
**2. Use cross-references**:
```markdown
# ❌ BAD: Repeat workflow
When searching, dispatch agent with template...
[20 lines of repeated instructions]
# ✅ GOOD: Reference other skill
Use subagents for searches. See [delegating-to-subagents] for workflow.
```
**3. Compress examples**:
```markdown
# ❌ BAD: Verbose (42 words)
Partner: "How did we handle auth errors in React Router?"
You: I'll search past conversations for patterns.
[Dispatch subagent with query: "React Router authentication error handling 401"]
# ✅ GOOD: Minimal (20 words)
Partner: "Auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```
## Keyword Strategy
### Error Messages
Include EXACT error text users will see:
- "Hook timed out after 5000ms"
- "ENOTEMPTY: directory not empty"
- "jest --watch is not responding"
### Symptoms
Use words users naturally say:
- "flaky", "hangs", "zombie process"
- "slow", "timeout", "race condition"
- "cleanup failed", "pollution"
### Tools & Commands
Actual names, not descriptions:
- "pytest", not "Python testing"
- "git rebase", not "rebasing"
- ".docx files", not "Word documents"
### Synonyms
Cover multiple ways to describe same thing:
- timeout/hang/freeze
- cleanup/teardown/afterEach
- mock/stub/fake
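Actual discovery happens via the description injected into the system prompt, but synonym coverage can still be sanity-checked offline. A hypothetical sketch that scores skills by how many of their trigger keywords appear in a task:

```python
def matching_skills(task: str, skills: dict[str, list[str]]) -> list[str]:
    """Rank skill names by how many trigger keywords appear in the task text."""
    task_lower = task.lower()
    scores = {
        name: sum(1 for trigger in triggers if trigger.lower() in task_lower)
        for name, triggers in skills.items()
    }
    # Keep only skills with at least one hit, best match first.
    return sorted((n for n, s in scores.items() if s > 0), key=lambda n: -scores[n])
```

If the skill only lists "timeout" but the user says "hangs", the score is zero — which is exactly why the synonym list above matters.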
## Naming Conventions
### Gerunds (-ing) for Processes
`creating-skills`, `debugging-with-logs`, `testing-async-code`
### Verb-first for Actions
`flatten-with-flags`, `reduce-complexity`, `trace-root-cause`
### ❌ Avoid
- `skill-creation` (passive, less searchable)
- `async-test-helpers` (too generic)
- `debugging-techniques` (vague)
## Description Template
```yaml
description: "Use when [SPECIFIC TRIGGER]."
metadata:
  triggers: [error1], [symptom2], [tool3]
```
**Examples**:
```yaml
# Technique skill
description: "Use when tests have race conditions, timing dependencies, or pass/fail inconsistently."
metadata:
  triggers: flaky tests, timeout, race condition
# Pattern skill
description: "Use when complex data structures make code hard to follow."
metadata:
  triggers: nested loops, multiple flags, confusing state
# Reference skill
description: "Use when working with React Router and authentication."
metadata:
  triggers: 401 redirect, login flow, protected routes
# Discipline skill
description: "Use when implementing any feature or bugfix, before writing implementation code."
metadata:
  triggers: new feature, bug fix, code change
```
## Third Person Rule
Description is injected into system prompt. Inconsistent POV breaks discovery.
```yaml
# ❌ BAD: First person
description: "I can help you with async tests"
# ❌ BAD: Second person
description: "You can use this for race conditions"
# ✅ GOOD: Third person
description: "Use when async tests have race conditions"
```
## Cross-Referencing Other Skills
**When documenting a skill that references other skills**:
Use skill name only, with explicit requirement markers:
```markdown
# ✅ GOOD: Clear requirement
**REQUIRED BACKGROUND**: You MUST understand test-driven-development before using this skill.
**REQUIRED SUB-SKILL**: Use defensive-programming for error handling.
# ❌ BAD: Unclear if required
See test-driven-development skill for context.
# ❌ NEVER: Force-loads (burns context)
@skills/testing/test-driven-development/SKILL.md
```
**Why no @ links**: `@` syntax force-loads files immediately, consuming tokens before needed.
## Verification Checklist
Before deploying:
- [ ] Description starts with "Use when..."?
- [ ] Description is <500 characters?
- [ ] Description lists ONLY triggers, not workflow?
- [ ] Includes 3+ keywords (errors/symptoms/tools)?
- [ ] Third person throughout?
- [ ] Name uses gerund or verb-first format?
- [ ] Name has only letters, numbers, hyphens?
- [ ] No @ syntax for cross-references?
- [ ] Word count <200 (frequent) or <500 (other)?
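Most of this checklist can be linted automatically. A minimal, hypothetical linter sketch covering the description rules (the third-person check is necessarily crude):

```python
def lint_description(description: str) -> list[str]:
    """Return CSO problems found in a skill description; empty list if none."""
    problems = []
    if not description.startswith("Use when"):
        problems.append('does not start with "Use when..."')
    if len(description) > 500:
        problems.append("exceeds 500 characters")
    # Crude first/second-person detection: look for bare "i" or "you" tokens.
    words = description.lower().split()
    if "i" in words or "you" in words:
        problems.append("not written in third person")
    return problems
```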
## Real-World Examples
### Before/After: TDD Skill
**Before** (workflow in description):
```yaml
description: Write test first, watch it fail, write minimal code, refactor
```
Result: Agents followed description, skipped reading full skill.
**After** (triggers only):
```yaml
description: Use when implementing any feature or bugfix, before writing implementation code
```
Result: Agents read full skill, followed complete TDD cycle.
### Before/After: BigQuery Skill
**Before** (too vague):
```yaml
description: Helps with database queries
```
Result: Never loaded (too generic, agents couldn't identify relevance).
**After** (specific triggers):
```yaml
description: Use when analyzing BigQuery data. Triggers: revenue metrics, pipeline data, API usage, campaign attribution.
```
Result: Loads for relevant queries, includes domain keywords.


@@ -0,0 +1,152 @@
---
description: Standards and naming rules for creating agent skills.
metadata:
  tags: [standards, naming, yaml, structure]
---
# Skill Development Guide
Comprehensive reference for creating effective agent skills.
## Directory Structure
```
~/.config/opencode/skills/
  {skill-name}/           # kebab-case, matches `name` field
    SKILL.md              # Required: main skill definition
    references/           # Optional: supporting documentation
      README.md           # Sub-topic entry point
      *.md                # Additional files
```
**Project-local alternative:**
```
.agent/skills/{skill-name}/SKILL.md
```
## Naming Rules
| Element | Rule | Example |
|---------|------|---------|
| Directory | kebab-case, 1-64 chars | `react-best-practices` |
| `SKILL.md` | ALL CAPS, exact filename | `SKILL.md` (not `skill.md`) |
| `name` field | Must match directory name | `name: react-best-practices` |
## SKILL.md Structure
```markdown
---
name: {skill-name}
description: >-
  Use when [trigger condition].
metadata:
  category: technique
  triggers: keyword1, keyword2, error-text
---
# Skill Title
Brief description of what this skill does.
## When to Use
- Symptom or situation A
- Symptom or situation B
## How It Works
Step-by-step instructions or reference content.
## Examples
Concrete usage examples.
## Common Mistakes
What to avoid and why.
```
## Description Best Practices
The `description` field is critical for skill discovery:
```yaml
# ❌ BAD: Workflow summary (agent skips reading full skill)
description: Analyzes code, finds bugs, suggests fixes
# ✅ GOOD: Trigger conditions only
description: Use when debugging errors or reviewing code quality.
metadata:
  triggers: bug, error, code review
```
**Rules:**
- Start with "Use when..."
- Put triggers under `metadata.triggers`
- Keep under 500 characters
- Use third person (not "I" or "You")
## Context Efficiency
Skills load into context on-demand. Optimize for token usage:
| Guideline | Reason |
|-----------|--------|
| Keep SKILL.md < 500 lines | Reduces context consumption |
| Put details in supporting files | Agent reads only what's needed |
| Use tables for reference data | More compact than prose |
| Link to `--help` for CLI tools | Avoids duplicating docs |
## Supporting Files
For complex skills, use additional files:
```
my-skill/
  SKILL.md              # Overview + navigation
  patterns.md           # Detailed patterns
  examples.md           # Code examples
  troubleshooting.md    # Common issues
```
**Supporting file frontmatter is required** (for any `.md` besides `SKILL.md`):
```markdown
---
description: >-
  Short summary used for search and retrieval.
metadata:
  tags: [pattern, troubleshooting, api]
  source: internal
---
```
This frontmatter helps the LLM locate the right file when referenced from `SKILL.md`.
Reference from SKILL.md:
```markdown
## Detailed Reference
- [Patterns](patterns.md) - Common usage patterns
- [Examples](examples.md) - Code samples
```
## Skill Types
| Type | Purpose | Example |
|------|---------|---------|
| **Reference** | Documentation, APIs | `bigquery-analysis` |
| **Technique** | How-to guides | `condition-based-waiting` |
| **Pattern** | Mental models | `flatten-with-flags` |
| **Discipline** | Rules to enforce | `test-driven-development` |
## Verification Checklist
Before deploying:
- [ ] `name` matches directory name?
- [ ] `SKILL.md` is ALL CAPS?
- [ ] Description starts with "Use when..."?
- [ ] Triggers listed under metadata?
- [ ] Under 500 lines?
- [ ] Tested with real scenarios?


@@ -0,0 +1,65 @@
# SKILL.md Metadata Standard
Official frontmatter fields recognized by OpenCode.
## Required Fields
```yaml
---
name: skill-name
description: >-
  Use when [trigger condition].
metadata:
  triggers: keyword1, keyword2, error-message
---
```
| Field | Rules |
|-------|-------|
| `name` | 1-64 chars, lowercase, hyphens only, must match directory name |
| `description` | 1-1024 chars, should describe when to use |
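For validation purposes these fields can be pulled out without a full YAML library. A deliberately minimal, stdlib-only sketch — it handles flat `key: value` lines only, not nested maps or `>-` block scalars:

```python
def parse_frontmatter(skill_md: str) -> dict:
    """Extract flat key: value pairs from the frontmatter block of a SKILL.md."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at all
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields
```

A real skill loader presumably uses a proper YAML parser; this sketch is only enough to check that `name` and `description` exist.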
## Optional Fields
```yaml
---
name: skill-name
description: Purpose and triggers.
metadata:
  license: MIT
  compatibility: opencode
  author: "your-name"
  version: "1.0.0"
  category: "reference"
  tags: "tag1, tag2"
---
```
| Field | Purpose |
|-------|---------|
| `license` | License identifier (e.g., MIT, Apache-2.0) |
| `compatibility` | Tool compatibility marker |
| `metadata` | String-to-string map for custom key-values |
## Name Validation
```regex
^[a-z0-9]+(-[a-z0-9]+)*$
```
**Valid**: `my-skill`, `git-release`, `tdd`
**Invalid**: `My-Skill`, `my_skill`, `-my-skill`, `my--skill`
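The rule is easy to enforce in code. A small sketch combining the regex with the 1-64 length limit from the table above (the helper name is illustrative):

```python
import re

# Same pattern as above: lowercase kebab-case, no leading/trailing or doubled hyphens.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def is_valid_skill_name(name: str) -> bool:
    """Check a skill name: kebab-case and at most 64 characters."""
    return len(name) <= 64 and bool(NAME_PATTERN.fullmatch(name))
```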
## Common Metadata Keys
Use these conventions for consistency across skills:
| Key | Example | Purpose |
|-----|---------|---------|
| `author` | `"your-name"` | Skill creator |
| `version` | `"1.0.0"` | Semantic version |
| `category` | `"reference"` | Type: reference, technique, discipline, pattern |
| `tags` | `"react, hooks"` | Searchable keywords |
> [!IMPORTANT]
> Any field not listed here is **ignored** by OpenCode's skill loader.


@@ -0,0 +1,54 @@
---
name: discipline-name
description: >-
  Use when [BEFORE violation].
metadata:
  category: discipline
  triggers: new feature, code change, implementation
---
# Rule Name
## Iron Law
**[SINGLE SENTENCE ABSOLUTE RULE]**
Violating the letter IS violating the spirit.
## The Rule
1. ALWAYS [step 1]
2. NEVER [step 2]
3. [Step 3]
## Violations
[Action before rule]? **Delete it. Start over.**
**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it
- Delete means delete
## Common Rationalizations
| Excuse | Reality |
|--------|---------|
| "Too simple" | Simple code breaks. Rule takes 30 seconds. |
| "I'll do it after" | After = never. Do it now. |
| "Spirit not ritual" | The ritual IS the spirit. |
## Red Flags - STOP
- [Flag 1]
- [Flag 2]
- "This is different because..."
**All mean:** Delete. Start over.
## Valid Exceptions
- [Exception 1]
- [Exception 2]
**Everything else:** Follow the rule.


@@ -0,0 +1,48 @@
---
name: pattern-name
description: >-
  Use when [recognizable symptom].
metadata:
  category: pattern
  triggers: complexity, hard-to-follow, nested
---
# Pattern Name
## The Pattern
[1-2 sentence core idea]
## Recognition Signs
- [Sign that pattern applies]
- [Another sign]
- [Code smell]
## Before
```typescript
// Complex/problematic
function before() {
  // nested, confusing
}
```
## After
```typescript
// Clean/improved
function after() {
  // flat, clear
}
```
## When NOT to Use
- [Over-engineering case]
- [Simple case that doesn't need it]
## Impact
**Before:** [Problem metric]
**After:** [Improved metric]


@@ -0,0 +1,35 @@
---
name: reference-name
description: >-
  Use when working with [domain].
metadata:
  category: reference
  triggers: tool, api, specific-terms
---
# Reference Name
## Quick Reference
| Command | Purpose |
|---------|---------|
| `cmd1` | Does X |
| `cmd2` | Does Y |
## Common Patterns
**Pattern A:**
```bash
example command
```
**Pattern B:**
```bash
another example
```
## Detailed Docs
For more options, run `--help` or see:
- [patterns.md](patterns.md)
- [examples.md](examples.md)


@@ -0,0 +1,59 @@
---
name: technique-name
description: Use when [specific symptom].
metadata:
  category: technique
  triggers: error-text, symptom, tool-name
---
# Technique Name
## Overview
[1-2 sentence core principle]
## When to Use
- [Symptom A]
- [Symptom B]
- [Error message text]
**NOT for:**
- [When to avoid]
## The Problem
```javascript
// Bad example
function badCode() {
  // problematic pattern
}
```
## The Solution
```javascript
// Good example
function goodCode() {
  // improved pattern
}
```
## Step-by-Step
1. [First step]
2. [Second step]
3. [Final step]
## Quick Reference
| Scenario | Approach |
|----------|----------|
| Case A | Solution A |
| Case B | Solution B |
## Common Mistakes
**Mistake 1:** [Description]
- Wrong: `bad code`
- Right: `good code`


@@ -0,0 +1,19 @@
# Platform Name Skill
Template for complex Tier 3 skills.
## Structure
```
skill/
├── SKILL.md # Dispatcher
├── commands/
│ └── skill.md # Orchestrator
└── references/
└── topic/
├── README.md # Overview
├── api.md # API Reference
├── config.md # Configuration
├── patterns.md # Recipes
└── gotchas.md # Critical Errors
```


@@ -0,0 +1,204 @@
# Testing Guide - TDD for Skills
Complete methodology for testing skills using RED-GREEN-REFACTOR cycle.
## Testing All Skill Types
Different skill types need different test approaches.
### Discipline-Enforcing Skills (rules/requirements)
**Examples**: TDD, verification-before-completion, designing-before-coding
**Test with**:
- Academic questions: Do they understand the rules?
- Pressure scenarios: Do they comply under stress?
- Multiple pressures combined: time + sunk cost + exhaustion
- Identify rationalizations and add explicit counters
**Success criteria**: Agent follows rule under maximum pressure
### Technique Skills (how-to guides)
**Examples**: condition-based-waiting, root-cause-tracing, defensive-programming
**Test with**:
- Application scenarios: Can they apply the technique correctly?
- Variation scenarios: Do they handle edge cases?
- Missing information tests: Do instructions have gaps?
**Success criteria**: Agent successfully applies technique to new scenario
### Pattern Skills (mental models)
**Examples**: reducing-complexity, information-hiding concepts
**Test with**:
- Recognition scenarios: Do they recognize when pattern applies?
- Application scenarios: Can they use the mental model?
- Counter-examples: Do they know when NOT to apply?
**Success criteria**: Agent correctly identifies when/how to apply pattern
### Reference Skills (documentation/APIs)
**Examples**: API documentation, command references, library guides
**Test with**:
- Retrieval scenarios: Can they find the right information?
- Application scenarios: Can they use what they found correctly?
- Gap testing: Are common use cases covered?
**Success criteria**: Agent finds and correctly applies reference information
## Pressure Types for Testing
### Time Pressure
"You have 5 minutes to complete this task"
### Sunk Cost Pressure
"You already spent 2 hours on this, just finish it quickly"
### Authority Pressure
"The senior developer said to skip tests for this quick bug fix"
### Exhaustion Pressure
"This is the 10th task today, let's wrap it up"
## RED Phase: Baseline Testing
**Goal**: Watch the agent fail WITHOUT the skill.
**Steps**:
1. Design pressure scenario (combine 2-3 pressures)
2. Give agent the task WITHOUT the skill loaded
3. Document EXACT behavior:
- What rationalization did they use?
- Which pressure triggered the violation?
- How did they justify the shortcut?
**Critical**: Copy exact quotes. You'll need them for GREEN phase.
**Example Baseline**:
```
Scenario: Implement feature under time pressure
Pressure: "You have 10 minutes"
Agent response: "Since we're short on time, I'll implement the feature first
and add tests after. Testing later achieves the same goal."
```
## GREEN Phase: Minimal Implementation
**Goal**: Write skill that addresses SPECIFIC baseline failures.
**Steps**:
1. Review baseline rationalizations
2. Write skill sections that counter THOSE EXACT arguments
3. Re-run scenario WITH skill
4. Agent should now comply
**Bad (too general)**:
```markdown
## Testing
Always write tests.
```
**Good (addresses specific rationalization)**:
```markdown
## Common Rationalizations
| Excuse | Reality |
| ----------------------------------- | ----------------------------------------------------------------------- |
| "Testing after achieves same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
```
## REFACTOR Phase: Loophole Closing
**Goal**: Find and plug new rationalizations.
**Steps**:
1. Agent found new workaround? Document it.
2. Add explicit counter to skill
3. Re-test same scenario
4. Repeat until bulletproof
**Pattern**:
```markdown
## Red Flags - STOP and Start Over
- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."
**All of these mean**: Delete code. Start over with TDD.
```
## Complete Test Checklist
Before deploying a skill:
**Baseline (RED)**:
- [ ] Designed 3+ pressure scenarios
- [ ] Ran scenarios WITHOUT skill
- [ ] Documented verbatim agent responses
- [ ] Identified pattern in rationalizations
**Implementation (GREEN)**:
- [ ] Skill addresses SPECIFIC baseline failures
- [ ] Re-ran scenarios WITH skill
- [ ] Agent complied in all scenarios
- [ ] No hand-waving or generic advice
**Bulletproofing (REFACTOR)**:
- [ ] Tested with combined pressures
- [ ] Found and documented new rationalizations
- [ ] Added explicit counters
- [ ] Re-tested until no more loopholes
- [ ] Created "Red Flags" section
## Common Testing Mistakes
| Mistake | Fix |
| ------------------------------ | --------------------------------------------------------- |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Skill is obviously clear" | Clear to you ≠ clear to agents. Test it. |
| "Testing is overkill" | Untested skills have issues. Always. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
## Meta-Testing
**Test the test**: If agent passes too easily, your test is weak.
**Good test indicators**:
- Agent fails WITHOUT skill (proves skill is needed)
- Agent passes WITH skill (proves skill works)
- Multiple pressures needed to trigger failure (proves realistic)
**Bad test indicators**:
- Agent passes even without skill (test is irrelevant)
- Agent fails even with skill (skill is unclear)
- Single obvious scenario (test is too simple)


@@ -0,0 +1,75 @@
---
description: When to use Tier 1 (Simple) skill architecture.
metadata:
  tags: [tier-1, simple, single-file]
---
# Tier 1: Simple Skills
Single-file skills for focused, specific purposes.
## When to Use
- **Single concept**: One technique, one pattern, one reference
- **Under 200 lines**: Can fit comfortably in one file
- **No complex decision logic**: User knows exactly what they need
- **Frequently loaded**: Needs minimal token footprint
## Structure
```
my-skill/
└── SKILL.md # Everything in one file
```
## Example
````markdown
---
name: flatten-with-flags
description: Use when simplifying deeply nested conditionals.
metadata:
  category: pattern
  triggers: nested if, complex conditionals, early return
---
# Flatten with Flags
## When to Use
- Code has 3+ levels of nesting
- Conditions are hard to follow
## The Pattern
Replace nested conditions with early returns and flag variables.
## Before
```javascript
function process(data) {
  if (data) {
    if (data.valid) {
      if (data.ready) {
        return doWork(data);
      }
    }
  }
  return null;
}
```
## After
```javascript
function process(data) {
  if (!data) return null;
  if (!data.valid) return null;
  if (!data.ready) return null;
  return doWork(data);
}
```
````
## Checklist
- [ ] Fits in <200 lines
- [ ] Single focused purpose
- [ ] No need for `references/` directory
- [ ] Description uses "Use when..." pattern


@@ -0,0 +1,69 @@
---
description: When to use Tier 2 (Expanded) skill architecture.
metadata:
  tags: [tier-2, expanded, multi-file]
---
# Tier 2: Expanded Skills
Multi-file skills for complex topics with multiple sub-concepts.
## When to Use
- **Multiple related concepts**: Needs separation of concerns
- **200-1000 lines total**: Too big for one file
- **Needs reference files**: Patterns, examples, troubleshooting
- **Cross-linking**: Users need to navigate between sub-topics
## Structure
```
my-skill/
├── SKILL.md # Overview + navigation
└── references/
├── core/
│ ├── README.md # Main concept
│ └── api.md # API reference
├── patterns/
│ └── README.md # Usage patterns
└── troubleshooting/
└── README.md # Common issues
```
## Example
The `writing-skills` skill itself is Tier 2:
```
writing-skills/
├── SKILL.md # Decision tree + navigation
├── gotchas.md # Tribal knowledge
└── references/
├── anti-rationalization/
├── cso/
├── standards/
├── templates/
└── testing/
```
## Progressive Disclosure
1. **Metadata** (~100 tokens): Name + description loaded at startup
2. **SKILL.md** (<500 lines): Decision tree + index
3. **References** (as needed): Loaded only when user navigates
## Key Differences from Tier 1
| Aspect | Tier 1 | Tier 2 |
|--------|--------|--------|
| Files | 1 | 5-20 |
| Total lines | <200 | 200-1000 |
| Decision logic | None | Simple tree |
| Token cost | Minimal | Medium (progressive) |
## Checklist
- [ ] SKILL.md has clear navigation links
- [ ] Each `references/` subdir has README.md
- [ ] No circular references between files
- [ ] Decision tree points to specific files


@@ -0,0 +1,98 @@
---
description: When to use Tier 3 (Platform) skill architecture for large platforms.
metadata:
  tags: [tier-3, platform, enterprise, cloudflare-pattern]
---
# Tier 3: Platform Skills
Enterprise-grade skills for entire platforms (AWS, Cloudflare, Convex, etc).
## When to Use
- **Entire platform**: 10+ products/services
- **1000+ lines total**: Would overwhelm context if monolithic
- **Complex decision logic**: Users start with "I need X" not "I want product Y"
- **Undocumented gotchas**: Tribal knowledge is critical
## The Cloudflare Pattern
Based on `cloudflare-skill` by Dillon Mulroy.
### Structure
```
my-platform/
├── SKILL.md # Decision trees only
└── references/
└── <product>/
├── README.md # Overview, when to use
├── api.md # Runtime API reference
├── configuration.md # Config options
├── patterns.md # Usage patterns
└── gotchas.md # Pitfalls, limits
```
### The 5-File Pattern
Each product directory has exactly 5 files:
| File | Purpose | When to Load |
|------|---------|--------------|
| `README.md` | Overview, when to use | Always first |
| `api.md` | Runtime APIs, methods | Implementing features |
| `configuration.md` | Config, environment | Setting up |
| `patterns.md` | Common workflows | Best practices |
| `gotchas.md` | Pitfalls, limits | Debugging |
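A convention this rigid is worth verifying in CI. A hypothetical sketch that reports which of the five files a product directory is missing:

```python
from pathlib import Path

# The 5-file pattern described above.
EXPECTED_FILES = {"README.md", "api.md", "configuration.md", "patterns.md", "gotchas.md"}

def missing_product_files(product_dir: Path) -> set[str]:
    """Return the expected reference files absent from a product directory."""
    present = {p.name for p in product_dir.iterdir() if p.is_file()}
    return EXPECTED_FILES - present
```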
## Decision Trees
The power of Tier 3 is decision trees that help the AI **choose**:
```markdown
Need to store data?
├─ Simple key-value → kv/
├─ Relational queries → d1/
├─ Large files/blobs → r2/
├─ Per-user state → durable-objects/
└─ Vector embeddings → vectorize/
```
## Slash Command Integration
Create a slash command to orchestrate:
```markdown
---
description: Load platform skill and get contextual guidance
---
## Workflow
1. Load skill: `skill({ name: 'my-platform' })`
2. Identify product from decision tree
3. Load relevant reference files based on task
| Task | Files |
|------|-------|
| New setup | README.md + configuration.md |
| Implement feature | api.md + patterns.md |
| Debug issue | gotchas.md |
```
## Progressive Disclosure in Action
- **Startup**: Only name + description (~100 tokens)
- **Activation**: SKILL.md with trees (<5000 tokens)
- **Navigation**: One product's 5 files (as needed)
Result: 60+ product references without blowing context.
## Checklist
- [ ] SKILL.md contains ONLY decision trees + index
- [ ] Each product has exactly 5 files
- [ ] Decision trees cover all "I need X" scenarios
- [ ] Cross-references stay one level deep
- [ ] Slash command created for orchestration
- [ ] Every product has `gotchas.md`