feat: integrate PR #28 and #29 (multi-agent brainstorming, design orchestration)

This commit is contained in:
sck_0
2026-01-25 17:53:35 +01:00
parent af57b96721
commit ae3d038711
6 changed files with 317 additions and 267 deletions


@@ -1,6 +1,32 @@
---
name: ab-test-setup
description: Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
---
# A/B Test Setup
## 1⃣ Purpose & Scope
Ensure every A/B test is **valid, rigorous, and safe** before a single line of code is written.
- Prevents "peeking" (checking results early and stopping when they look good)
- Enforces statistical power
- Blocks invalid hypotheses
---
## 2⃣ Prerequisites
You must have:
- A clear user problem
- Access to an analytics source
- Roughly estimated traffic volume
#### Hypothesis Quality Checklist
A valid hypothesis includes:
- Observation or evidence
- Single, specific change
- Directional expectation
@@ -39,6 +65,7 @@ Explicitly list assumptions about:
- External factors (seasonality, campaigns, releases)
If assumptions are weak or violated:
- Warn the user
- Recommend delaying or redesigning the test
@@ -60,16 +87,19 @@ Default to **A/B** unless there is a clear reason otherwise.
### 6⃣ Metrics Definition
#### Primary Metric (Mandatory)
- Single metric used to evaluate success
- Directly tied to the hypothesis
- Pre-defined and frozen before launch
#### Secondary Metrics
- Provide context
- Explain *why* results occurred
- Must not override the primary metric
#### Guardrail Metrics
- Metrics that must not degrade
- Used to prevent harmful wins
- Trigger test stop if significantly negative
@@ -79,12 +109,14 @@ Default to **A/B** unless there is a clear reason otherwise.
### 7⃣ Sample Size & Duration
Define upfront:
- Baseline rate
- MDE (minimum detectable effect)
- Significance level (typically 5%, i.e. 95% confidence)
- Statistical power (typically 80%)
Estimate:
- Required sample size per variant
- Expected test duration
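The estimate above can be sketched with the standard two-proportion z-test approximation. This is a minimal sketch: the 5% baseline, 10% relative MDE, and 10,000 visitors/day are illustrative assumptions, not defaults of this skill.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline: float, mde_rel: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)          # expected rate under the variant
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    n = ((z_alpha + z_beta) ** 2 * 2 * p_bar * (1 - p_bar)) / (p2 - p1) ** 2
    return ceil(n)

# Assumed inputs: 5% baseline conversion, 10% relative MDE
n = sample_size_per_variant(0.05, 0.10)    # roughly 31k users per variant
days = ceil(2 * n / 10_000)                # assumed 10,000 visitors/day, split 50/50
```

A smaller MDE or a lower baseline rate inflates `n` quickly, which is why both must be fixed before launch rather than tuned afterwards.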
@@ -112,10 +144,12 @@ If any item is missing, stop and resolve it.
### During the Test
**DO:**
- Monitor technical health
- Document external factors
**DO NOT:**
- Stop early due to “good-looking” results
- Change variants mid-test
- Add new traffic sources
@@ -136,12 +170,12 @@ When interpreting results:
### Interpretation Outcomes
| Result | Action |
|------|-------|
| Significant positive | Consider rollout |
| Significant negative | Reject variant, document learning |
| Inconclusive | Consider more traffic or bolder change |
| Guardrail failure | Do not ship, even if primary wins |
---
@@ -150,6 +184,7 @@ When interpreting results:
### Test Record (Mandatory)
Document:
- Hypothesis
- Variants
- Metrics
@@ -166,6 +201,7 @@ Store records in a shared, searchable location to avoid repeated failures.
## Refusal Conditions (Safety)
Refuse to proceed if:
- Baseline rate is unknown and cannot be estimated
- Traffic is insufficient to detect the MDE
- Primary metric is undefined


@@ -0,0 +1,167 @@
---
name: design-orchestration
description: >
Orchestrates design workflows by routing work through
brainstorming, multi-agent review, and execution readiness
in the correct order. Prevents premature implementation,
skipped validation, and unreviewed high-risk designs.
---
# Design Orchestration (Meta-Skill)
## Purpose
Ensure that **ideas become designs**, **designs are reviewed**, and
**only validated designs reach implementation**.
This skill does not generate designs.
It **controls the flow between other skills**.
---
## Operating Model
This is a **routing and enforcement skill**, not a creative one.
It decides:
- which skill must run next
- whether escalation is required
- whether execution is permitted
---
## Controlled Skills
This meta-skill coordinates the following:
- `brainstorming` — design generation
- `multi-agent-brainstorming` — design validation
- downstream implementation or planning skills
---
## Entry Conditions
Invoke this skill when:
- a user proposes a new feature, system, or change
- a design decision carries meaningful risk
- correctness matters more than speed
---
## Routing Logic
### Step 1 — Brainstorming (Mandatory)
If no validated design exists:
- Invoke `brainstorming`
- Require:
- Understanding Lock
- Initial Design
- Decision Log started
You may NOT proceed without these artifacts.
---
### Step 2 — Risk Assessment
After brainstorming completes, classify the design as:
- **Low risk**
- **Moderate risk**
- **High risk**
Use factors such as:
- user impact
- irreversibility
- operational cost
- complexity
- uncertainty
- novelty
---
### Step 3 — Conditional Escalation
- **Low risk**
→ Proceed to implementation planning
- **Moderate risk**
→ Recommend `multi-agent-brainstorming`
- **High risk**
→ REQUIRE `multi-agent-brainstorming`
Skipping escalation when required is prohibited.
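Steps 2 and 3 together can be sketched as a routing function. The 0–2 factor scoring and the "worst factor dominates" rule are illustrative assumptions; only the factor names and the escalation mapping come from the skill itself.

```python
RISK_FACTORS = ("user_impact", "irreversibility", "operational_cost",
                "complexity", "uncertainty", "novelty")

def classify_risk(scores: dict[str, int]) -> str:
    """Each factor scored 0 (low) to 2 (high); the worst factor dominates."""
    worst = max(scores.get(f, 0) for f in RISK_FACTORS)
    return ("low", "moderate", "high")[worst]

def next_step(risk: str) -> str:
    # Escalation mapping from Step 3; skipping it when required is prohibited.
    return {
        "low": "proceed to implementation planning",
        "moderate": "recommend multi-agent-brainstorming",
        "high": "REQUIRE multi-agent-brainstorming",
    }[risk]
```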
---
### Step 4 — Multi-Agent Review (If Invoked)
If `multi-agent-brainstorming` is run:
Require:
- completed Understanding Lock
- current Design
- Decision Log
Do NOT allow:
- new ideation
- scope expansion
- reopening problem definition
Only critique, revision, and decision resolution are allowed.
---
### Step 5 — Execution Readiness Check
Before allowing implementation:
Confirm:
- design is approved (single-agent or multi-agent)
- Decision Log is complete
- major assumptions are documented
- known risks are acknowledged
If any condition fails:
- block execution
- return to the appropriate skill
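The readiness check above amounts to an all-or-nothing gate. A minimal sketch, assuming a flat dictionary of state flags (the field names are hypothetical):

```python
READINESS_CHECKS = {
    "design_approved":        lambda s: s.get("approved", False),
    "decision_log_complete":  lambda s: s.get("log_complete", False),
    "assumptions_documented": lambda s: s.get("assumptions", False),
    "risks_acknowledged":     lambda s: s.get("risks_ack", False),
}

def execution_gate(state: dict) -> tuple[bool, list[str]]:
    """Return (allowed, failed_checks); any failure blocks execution."""
    failures = [name for name, check in READINESS_CHECKS.items()
                if not check(state)]
    return (not failures, failures)
```

Returning the list of failed checks, rather than a bare boolean, tells the router which skill to return to.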
---
## Enforcement Rules
- Do NOT allow implementation without a validated design
- Do NOT allow skipping required review
- Do NOT allow silent escalation or de-escalation
- Do NOT merge design and implementation phases
---
## Exit Conditions
This meta-skill exits ONLY when:
- the next step is explicitly identified, AND
- all required prior steps are complete
Possible exits:
- “Proceed to implementation planning”
- “Run multi-agent-brainstorming”
- “Return to brainstorming for clarification”
If a reviewed design reports a final disposition of APPROVED, REVISE, or REJECT, route the workflow accordingly and state the chosen next step explicitly.
---
## Design Philosophy
This skill exists to:
- slow down the right decisions
- speed up the right execution
- prevent costly mistakes
Good systems fail early.
Bad systems fail in production.
This meta-skill exists to enforce the former.


@@ -0,0 +1,256 @@
---
name: multi-agent-brainstorming
description: >
Use this skill when a design or idea requires higher confidence,
risk reduction, or formal review. This skill orchestrates a
structured, sequential multi-agent design review where each agent
has a strict, non-overlapping role. It prevents blind spots,
false confidence, and premature convergence.
---
# Multi-Agent Brainstorming (Structured Design Review)
## Purpose
Transform a single-agent design into a **robust, review-validated design**
by simulating a formal peer-review process using multiple constrained agents.
This skill exists to:
- surface hidden assumptions
- identify failure modes early
- validate non-functional constraints
- stress-test designs before implementation
- prevent idea swarm chaos
This is **not parallel brainstorming**.
It is **sequential design review with enforced roles**.
---
## Operating Model
- One agent designs.
- Other agents review.
- No agent may exceed its mandate.
- Creativity is centralized; critique is distributed.
- Decisions are explicit and logged.
The process is **gated** and **terminates by design**.
---
## Agent Roles (Non-Negotiable)
Each agent operates under a **hard scope limit**.
### 1⃣ Primary Designer (Lead Agent)
**Role:**
- Owns the design
- Runs the standard `brainstorming` skill
- Maintains the Decision Log
**May:**
- Ask clarification questions
- Propose designs and alternatives
- Revise designs based on feedback
**May NOT:**
- Self-approve the final design
- Ignore reviewer objections
- Invent requirements post-lock
---
### 2⃣ Skeptic / Challenger Agent
**Role:**
- Assume the design will fail
- Identify weaknesses and risks
**May:**
- Question assumptions
- Identify edge cases
- Highlight ambiguity or overconfidence
- Flag YAGNI violations
**May NOT:**
- Propose new features
- Redesign the system
- Offer alternative architectures
Prompting guidance:
> “Assume this design fails in production. Why?”
---
### 3⃣ Constraint Guardian Agent
**Role:**
- Enforce non-functional and real-world constraints
Focus areas:
- performance
- scalability
- reliability
- security & privacy
- maintainability
- operational cost
**May:**
- Reject designs that violate constraints
- Request clarification of limits
**May NOT:**
- Debate product goals
- Suggest feature changes
- Optimize beyond stated requirements
---
### 4⃣ User Advocate Agent
**Role:**
- Represent the end user
Focus areas:
- cognitive load
- usability
- clarity of flows
- error handling from user perspective
- mismatch between intent and experience
**May:**
- Identify confusing or misleading aspects
- Flag poor defaults or unclear behavior
**May NOT:**
- Redesign architecture
- Add features
- Override stated user goals
---
### 5⃣ Integrator / Arbiter Agent
**Role:**
- Resolve conflicts
- Finalize decisions
- Enforce exit criteria
**May:**
- Accept or reject objections
- Require design revisions
- Declare the design complete
**May NOT:**
- Invent new ideas
- Add requirements
- Reopen locked decisions without cause
---
## The Process
### Phase 1 — Single-Agent Design
1. Primary Designer runs the **standard `brainstorming` skill**
2. Understanding Lock is completed and confirmed
3. Initial design is produced
4. Decision Log is started
No other agents participate yet.
---
### Phase 2 — Structured Review Loop
Agents are invoked **one at a time**, in the following order:
1. Skeptic / Challenger
2. Constraint Guardian
3. User Advocate
For each reviewer:
- Feedback must be explicit and scoped
- Objections must reference assumptions or decisions
- No new features may be introduced
Primary Designer must:
- Respond to each objection
- Revise the design if required
- Update the Decision Log
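The review loop above can be sketched as follows. This is a hypothetical harness, not part of the skill: `reviewers` maps each role to a critique function, and `designer` answers one objection at a time, returning the (possibly revised) design and a resolution for the log.

```python
# Reviewers run strictly one at a time, in this fixed order.
REVIEW_ORDER = ["skeptic", "constraint_guardian", "user_advocate"]

def run_review_loop(design, reviewers, designer, decision_log):
    for role in REVIEW_ORDER:
        objections = reviewers[role](design)          # scoped critique only
        for objection in objections:
            # Designer must respond to every objection and log the resolution.
            design, resolution = designer(design, objection)
            decision_log.append({"role": role,
                                 "objection": objection,
                                 "resolution": resolution})
    return design, decision_log
```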
---
### Phase 3 — Integration & Arbitration
The Integrator / Arbiter reviews:
- the final design
- the Decision Log
- unresolved objections
The Arbiter must explicitly decide:
- which objections are accepted
- which are rejected (with rationale)
---
## Decision Log (Mandatory Artifact)
The Decision Log must record:
- Decision made
- Alternatives considered
- Objections raised
- Resolution and rationale
No design is considered valid without a completed log.
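One possible shape for a log entry, sketched as a dataclass (the field names mirror the list above; the completeness rule shown is an assumption about what "completed" means):

```python
from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    decision: str                                   # decision made
    alternatives: list[str]                         # alternatives considered
    objections: list[str] = field(default_factory=list)
    resolution: str = ""                            # resolution and rationale

def log_is_complete(log: list[DecisionRecord]) -> bool:
    # No design is valid without a non-empty log whose entries all carry a resolution.
    return bool(log) and all(r.resolution for r in log)
```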
---
## Exit Criteria (Hard Stop)
You may exit multi-agent brainstorming **only when all are true**:
- Understanding Lock was completed
- All reviewer agents have been invoked
- All objections are resolved or explicitly rejected
- Decision Log is complete
- Arbiter has declared the design acceptable
If any criterion is unmet:
- Continue review
- Do NOT proceed to implementation
If this skill was invoked by a routing or orchestration layer, you MUST report the final disposition explicitly as one of: APPROVED, REVISE, or REJECT, with a brief rationale.
---
## Failure Modes This Skill Prevents
- Idea swarm chaos
- Hallucinated consensus
- Overconfident single-agent designs
- Hidden assumptions
- Premature implementation
- Endless debate
---
## Key Principles
- One designer, many reviewers
- Creativity is centralized
- Critique is constrained
- Decisions are explicit
- Process must terminate
---
## Final Reminder
This skill exists to answer one question with confidence:
> “If this design fails, did we do everything reasonable to catch it early?”
If the answer is unclear, **do not exit this skill**.