feat: integrate PR #28 and #29 (multi-agent brainstorming, design orchestration)

This commit is contained in:
sck_0
2026-01-25 17:53:35 +01:00
parent af57b96721
commit ae3d038711
6 changed files with 317 additions and 267 deletions


@@ -1,6 +1,32 @@
---
name: ab-test-setup
description: Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
---
# A/B Test Setup
## 1⃣ Purpose & Scope
Ensure every A/B test is **valid, rigorous, and safe** before a single line of code is written.
- Prevents "peeking" (checking results early and stopping when they look good)
- Enforces statistical power
- Blocks invalid hypotheses
---
## 2⃣ Prerequisites
You must have:
- A clear user problem
- Access to an analytics source
- Roughly estimated traffic volume
#### Hypothesis Quality Checklist
A valid hypothesis includes:
- Observation or evidence
- Single, specific change
- Directional expectation
@@ -39,6 +65,7 @@ Explicitly list assumptions about:
- External factors (seasonality, campaigns, releases)
If assumptions are weak or violated:
- Warn the user
- Recommend delaying or redesigning the test
@@ -60,16 +87,19 @@ Default to **A/B** unless there is a clear reason otherwise.
### 6⃣ Metrics Definition
#### Primary Metric (Mandatory)
- Single metric used to evaluate success
- Directly tied to the hypothesis
- Pre-defined and frozen before launch
#### Secondary Metrics
- Provide context
- Explain *why* results occurred
- Must not override the primary metric
#### Guardrail Metrics
- Metrics that must not degrade
- Used to prevent harmful wins
- Trigger test stop if significantly negative
@@ -79,12 +109,14 @@ Default to **A/B** unless there is a clear reason otherwise.
### 7⃣ Sample Size & Duration
Define upfront:
- Baseline rate
- MDE (minimum detectable effect)
- Significance level (typically 5%, i.e. 95% confidence)
- Statistical power (typically 80%)
Estimate:
- Required sample size per variant
- Expected test duration
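The estimate above can be sketched with the standard two-proportion z-test approximation. This is a minimal sketch: the 5% baseline, 10% relative MDE, and 10,000 visitors/day are illustrative assumptions, not defaults of this skill.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline: float, mde_rel: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)          # expected rate under the variant
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    n = ((z_alpha + z_beta) ** 2 * 2 * p_bar * (1 - p_bar)) / (p2 - p1) ** 2
    return ceil(n)

# Assumed inputs: 5% baseline conversion, 10% relative MDE
n = sample_size_per_variant(0.05, 0.10)    # roughly 31k users per variant
days = ceil(2 * n / 10_000)                # assumed 10,000 visitors/day, split 50/50
```

A smaller MDE or a lower baseline rate inflates `n` quickly, which is why both must be fixed before launch rather than tuned afterwards.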
@@ -112,10 +144,12 @@ If any item is missing, stop and resolve it.
### During the Test
**DO:**
- Monitor technical health
- Document external factors
**DO NOT:**
- Stop early due to “good-looking” results
- Change variants mid-test
- Add new traffic sources
@@ -136,12 +170,12 @@ When interpreting results:
### Interpretation Outcomes
| Result | Action |
|------|-------|
| Significant positive | Consider rollout |
| Significant negative | Reject variant, document learning |
| Inconclusive | Consider more traffic or bolder change |
| Guardrail failure | Do not ship, even if primary wins |
---
@@ -150,6 +184,7 @@ When interpreting results:
### Test Record (Mandatory)
Document:
- Hypothesis
- Variants
- Metrics
@@ -166,6 +201,7 @@ Store records in a shared, searchable location to avoid repeated failures.
## Refusal Conditions (Safety)
Refuse to proceed if:
- Baseline rate is unknown and cannot be estimated
- Traffic is insufficient to detect the MDE
- Primary metric is undefined


@@ -0,0 +1,167 @@
---
name: design-orchestration
description: >
Orchestrates design workflows by routing work through
brainstorming, multi-agent review, and execution readiness
in the correct order. Prevents premature implementation,
skipped validation, and unreviewed high-risk designs.
---
# Design Orchestration (Meta-Skill)
## Purpose
Ensure that **ideas become designs**, **designs are reviewed**, and
**only validated designs reach implementation**.
This skill does not generate designs.
It **controls the flow between other skills**.
---
## Operating Model
This is a **routing and enforcement skill**, not a creative one.
It decides:
- which skill must run next
- whether escalation is required
- whether execution is permitted
---
## Controlled Skills
This meta-skill coordinates the following:
- `brainstorming` — design generation
- `multi-agent-brainstorming` — design validation
- downstream implementation or planning skills
---
## Entry Conditions
Invoke this skill when:
- a user proposes a new feature, system, or change
- a design decision carries meaningful risk
- correctness matters more than speed
---
## Routing Logic
### Step 1 — Brainstorming (Mandatory)
If no validated design exists:
- Invoke `brainstorming`
- Require:
- Understanding Lock
- Initial Design
- Decision Log started
You may NOT proceed without these artifacts.
---
### Step 2 — Risk Assessment
After brainstorming completes, classify the design as:
- **Low risk**
- **Moderate risk**
- **High risk**
Use factors such as:
- user impact
- irreversibility
- operational cost
- complexity
- uncertainty
- novelty
---
### Step 3 — Conditional Escalation
- **Low risk**
→ Proceed to implementation planning
- **Moderate risk**
→ Recommend `multi-agent-brainstorming`
- **High risk**
→ REQUIRE `multi-agent-brainstorming`
Skipping escalation when required is prohibited.
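Steps 2 and 3 together can be sketched as a routing function. The 0–2 factor scoring and the "worst factor dominates" rule are illustrative assumptions; only the factor names and the escalation mapping come from the skill itself.

```python
RISK_FACTORS = ("user_impact", "irreversibility", "operational_cost",
                "complexity", "uncertainty", "novelty")

def classify_risk(scores: dict[str, int]) -> str:
    """Each factor scored 0 (low) to 2 (high); the worst factor dominates."""
    worst = max(scores.get(f, 0) for f in RISK_FACTORS)
    return ("low", "moderate", "high")[worst]

def next_step(risk: str) -> str:
    # Escalation mapping from Step 3; skipping it when required is prohibited.
    return {
        "low": "proceed to implementation planning",
        "moderate": "recommend multi-agent-brainstorming",
        "high": "REQUIRE multi-agent-brainstorming",
    }[risk]
```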
---
### Step 4 — Multi-Agent Review (If Invoked)
If `multi-agent-brainstorming` is run:
Require:
- completed Understanding Lock
- current Design
- Decision Log
Do NOT allow:
- new ideation
- scope expansion
- reopening problem definition
Only critique, revision, and decision resolution are allowed.
---
### Step 5 — Execution Readiness Check
Before allowing implementation:
Confirm:
- design is approved (single-agent or multi-agent)
- Decision Log is complete
- major assumptions are documented
- known risks are acknowledged
If any condition fails:
- block execution
- return to the appropriate skill
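The readiness check above amounts to an all-or-nothing gate. A minimal sketch, assuming a flat dictionary of state flags (the field names are hypothetical):

```python
READINESS_CHECKS = {
    "design_approved":        lambda s: s.get("approved", False),
    "decision_log_complete":  lambda s: s.get("log_complete", False),
    "assumptions_documented": lambda s: s.get("assumptions", False),
    "risks_acknowledged":     lambda s: s.get("risks_ack", False),
}

def execution_gate(state: dict) -> tuple[bool, list[str]]:
    """Return (allowed, failed_checks); any failure blocks execution."""
    failures = [name for name, check in READINESS_CHECKS.items()
                if not check(state)]
    return (not failures, failures)
```

Returning the list of failed checks, rather than a bare boolean, tells the router which skill to return to.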
---
## Enforcement Rules
- Do NOT allow implementation without a validated design
- Do NOT allow skipping required review
- Do NOT allow silent escalation or de-escalation
- Do NOT merge design and implementation phases
---
## Exit Conditions
This meta-skill exits ONLY when:
- the next step is explicitly identified, AND
- all required prior steps are complete
Possible exits:
- “Proceed to implementation planning”
- “Run multi-agent-brainstorming”
- “Return to brainstorming for clarification”
If a reviewed design reports a final disposition of APPROVED, REVISE, or REJECT, route the workflow accordingly and state the chosen next step explicitly.
---
## Design Philosophy
This skill exists to:
- slow down the right decisions
- speed up the right execution
- prevent costly mistakes
Good systems fail early.
Bad systems fail in production.
This meta-skill exists to enforce the former.


@@ -0,0 +1,256 @@
---
name: multi-agent-brainstorming
description: >
Use this skill when a design or idea requires higher confidence,
risk reduction, or formal review. This skill orchestrates a
structured, sequential multi-agent design review where each agent
has a strict, non-overlapping role. It prevents blind spots,
false confidence, and premature convergence.
---
# Multi-Agent Brainstorming (Structured Design Review)
## Purpose
Transform a single-agent design into a **robust, review-validated design**
by simulating a formal peer-review process using multiple constrained agents.
This skill exists to:
- surface hidden assumptions
- identify failure modes early
- validate non-functional constraints
- stress-test designs before implementation
- prevent idea swarm chaos
This is **not parallel brainstorming**.
It is **sequential design review with enforced roles**.
---
## Operating Model
- One agent designs.
- Other agents review.
- No agent may exceed its mandate.
- Creativity is centralized; critique is distributed.
- Decisions are explicit and logged.
The process is **gated** and **terminates by design**.
---
## Agent Roles (Non-Negotiable)
Each agent operates under a **hard scope limit**.
### 1⃣ Primary Designer (Lead Agent)
**Role:**
- Owns the design
- Runs the standard `brainstorming` skill
- Maintains the Decision Log
**May:**
- Ask clarification questions
- Propose designs and alternatives
- Revise designs based on feedback
**May NOT:**
- Self-approve the final design
- Ignore reviewer objections
- Invent requirements post-lock
---
### 2⃣ Skeptic / Challenger Agent
**Role:**
- Assume the design will fail
- Identify weaknesses and risks
**May:**
- Question assumptions
- Identify edge cases
- Highlight ambiguity or overconfidence
- Flag YAGNI violations
**May NOT:**
- Propose new features
- Redesign the system
- Offer alternative architectures
Prompting guidance:
> “Assume this design fails in production. Why?”
---
### 3⃣ Constraint Guardian Agent
**Role:**
- Enforce non-functional and real-world constraints
Focus areas:
- performance
- scalability
- reliability
- security & privacy
- maintainability
- operational cost
**May:**
- Reject designs that violate constraints
- Request clarification of limits
**May NOT:**
- Debate product goals
- Suggest feature changes
- Optimize beyond stated requirements
---
### 4⃣ User Advocate Agent
**Role:**
- Represent the end user
Focus areas:
- cognitive load
- usability
- clarity of flows
- error handling from user perspective
- mismatch between intent and experience
**May:**
- Identify confusing or misleading aspects
- Flag poor defaults or unclear behavior
**May NOT:**
- Redesign architecture
- Add features
- Override stated user goals
---
### 5⃣ Integrator / Arbiter Agent
**Role:**
- Resolve conflicts
- Finalize decisions
- Enforce exit criteria
**May:**
- Accept or reject objections
- Require design revisions
- Declare the design complete
**May NOT:**
- Invent new ideas
- Add requirements
- Reopen locked decisions without cause
---
## The Process
### Phase 1 — Single-Agent Design
1. Primary Designer runs the **standard `brainstorming` skill**
2. Understanding Lock is completed and confirmed
3. Initial design is produced
4. Decision Log is started
No other agents participate yet.
---
### Phase 2 — Structured Review Loop
Agents are invoked **one at a time**, in the following order:
1. Skeptic / Challenger
2. Constraint Guardian
3. User Advocate
For each reviewer:
- Feedback must be explicit and scoped
- Objections must reference assumptions or decisions
- No new features may be introduced
Primary Designer must:
- Respond to each objection
- Revise the design if required
- Update the Decision Log
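The review loop above can be sketched as follows. This is a hypothetical harness, not part of the skill: `reviewers` maps each role to a critique function, and `designer` answers one objection at a time, returning the (possibly revised) design and a resolution for the log.

```python
# Reviewers run strictly one at a time, in this fixed order.
REVIEW_ORDER = ["skeptic", "constraint_guardian", "user_advocate"]

def run_review_loop(design, reviewers, designer, decision_log):
    for role in REVIEW_ORDER:
        objections = reviewers[role](design)          # scoped critique only
        for objection in objections:
            # Designer must respond to every objection and log the resolution.
            design, resolution = designer(design, objection)
            decision_log.append({"role": role,
                                 "objection": objection,
                                 "resolution": resolution})
    return design, decision_log
```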
---
### Phase 3 — Integration & Arbitration
The Integrator / Arbiter reviews:
- the final design
- the Decision Log
- unresolved objections
The Arbiter must explicitly decide:
- which objections are accepted
- which are rejected (with rationale)
---
## Decision Log (Mandatory Artifact)
The Decision Log must record:
- Decision made
- Alternatives considered
- Objections raised
- Resolution and rationale
No design is considered valid without a completed log.
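One possible shape for a log entry, sketched as a dataclass (the field names mirror the list above; the completeness rule shown is an assumption about what "completed" means):

```python
from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    decision: str                                   # decision made
    alternatives: list[str]                         # alternatives considered
    objections: list[str] = field(default_factory=list)
    resolution: str = ""                            # resolution and rationale

def log_is_complete(log: list[DecisionRecord]) -> bool:
    # No design is valid without a non-empty log whose entries all carry a resolution.
    return bool(log) and all(r.resolution for r in log)
```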
---
## Exit Criteria (Hard Stop)
You may exit multi-agent brainstorming **only when all are true**:
- Understanding Lock was completed
- All reviewer agents have been invoked
- All objections are resolved or explicitly rejected
- Decision Log is complete
- Arbiter has declared the design acceptable
If any criterion is unmet:
- Continue review
- Do NOT proceed to implementation
If this skill was invoked by a routing or orchestration layer, you MUST report the final disposition explicitly as one of: APPROVED, REVISE, or REJECT, with a brief rationale.
---
## Failure Modes This Skill Prevents
- Idea swarm chaos
- Hallucinated consensus
- Overconfident single-agent designs
- Hidden assumptions
- Premature implementation
- Endless debate
---
## Key Principles
- One designer, many reviewers
- Creativity is centralized
- Critique is constrained
- Decisions are explicit
- Process must terminate
---
## Final Reminder
This skill exists to answer one question with confidence:
> “If this design fails, did we do everything reasonable to catch it early?”
If the answer is unclear, **do not exit this skill**.