@@ -1,6 +1,32 @@
---
name: ab-test-setup
description: Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
---

# A/B Test Setup

## 1️⃣ Purpose & Scope

Ensure every A/B test is **valid, rigorous, and safe** before a single line of code is written.

- Prevents "peeking"
- Enforces statistical power
- Blocks invalid hypotheses

---

## 2️⃣ Pre-Requisites

You must have:

- A clear user problem
- Access to an analytics source
- Roughly estimated traffic volume

#### Hypothesis Quality Checklist

A valid hypothesis includes:

- Observation or evidence
- Single, specific change
- Directional expectation
@@ -39,6 +65,7 @@ Explicitly list assumptions about:
- External factors (seasonality, campaigns, releases)

If assumptions are weak or violated:

- Warn the user
- Recommend delaying or redesigning the test

@@ -60,16 +87,19 @@ Default to **A/B** unless there is a clear reason otherwise.
### 6️⃣ Metrics Definition

#### Primary Metric (Mandatory)

- Single metric used to evaluate success
- Directly tied to the hypothesis
- Pre-defined and frozen before launch

#### Secondary Metrics

- Provide context
- Explain _why_ results occurred
- Must not override the primary metric

#### Guardrail Metrics

- Metrics that must not degrade
- Used to prevent harmful wins
- Trigger test stop if significantly negative
@@ -79,12 +109,14 @@ Default to **A/B** unless there is a clear reason otherwise.
### 7️⃣ Sample Size & Duration

Define upfront:

- Baseline rate
- MDE (minimum detectable effect)
- Significance level (typically α = 0.05, i.e. 95% confidence)
- Statistical power (typically 80%)

Estimate:

- Required sample size per variant
- Expected test duration

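The four inputs above are enough to estimate both outputs. The following is a minimal sketch, assuming a conversion-rate metric, a two-sided two-proportion z-test, and an MDE expressed as an absolute difference. The function names `sample_size_per_variant` and `estimated_duration_days` are illustrative and not part of this skill. With a 10% baseline and a 2-point absolute MDE the estimate comes to a few thousand users per variant, and halving the MDE roughly quadruples it.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per variant (two-sided two-proportion z-test)."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if mde_abs == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie in (0, 1) and the MDE must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def estimated_duration_days(n_per_variant, daily_visitors, variants=2):
    """Days needed to fill every variant, assuming an even traffic split."""
    return math.ceil(n_per_variant * variants / daily_visitors)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline_rate=0.10, mde_abs=0.02)
    print(n, estimated_duration_days(n, daily_visitors=1000))
```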
@@ -112,10 +144,12 @@ If any item is missing, stop and resolve it.
### During the Test

**DO:**

- Monitor technical health
- Document external factors

**DO NOT:**

- Stop early due to “good-looking” results
- Change variants mid-test
- Add new traffic sources
@@ -136,12 +170,12 @@ When interpreting results:

### Interpretation Outcomes

| Result               | Action                                 |
| -------------------- | -------------------------------------- |
| Significant positive | Consider rollout                       |
| Significant negative | Reject variant, document learning      |
| Inconclusive         | Consider more traffic or bolder change |
| Guardrail failure    | Do not ship, even if primary wins      |

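The outcome table can be read as a small decision function in which a guardrail failure takes precedence over everything else. A minimal sketch, assuming the caller has already computed significance and the direction of the primary-metric lift (the name `recommend_action` and its parameters are hypothetical):

```python
def recommend_action(primary_significant: bool, primary_lift: float,
                     guardrail_failed: bool) -> str:
    """Map a test result to the action listed in the interpretation table."""
    # Guardrails are checked first: a primary win never overrides them.
    if guardrail_failed:
        return "Do not ship, even if primary wins"
    if not primary_significant:
        return "Consider more traffic or bolder change"
    if primary_lift > 0:
        return "Consider rollout"
    return "Reject variant, document learning"


if __name__ == "__main__":
    print(recommend_action(True, 0.03, guardrail_failed=True))
```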
---

@@ -150,6 +184,7 @@ When interpreting results:
### Test Record (Mandatory)

Document:

- Hypothesis
- Variants
- Metrics
@@ -166,6 +201,7 @@ Store records in a shared, searchable location to avoid repeated failures.
## Refusal Conditions (Safety)

Refuse to proceed if:

- Baseline rate is unknown and cannot be estimated
- Traffic is insufficient to detect the MDE
- Primary metric is undefined

167
skills/design-orchestration/SKILL.md
Normal file
@@ -0,0 +1,167 @@
---
name: design-orchestration
description: >
  Orchestrates design workflows by routing work through
  brainstorming, multi-agent review, and execution readiness
  in the correct order. Prevents premature implementation,
  skipped validation, and unreviewed high-risk designs.
---

# Design Orchestration (Meta-Skill)

## Purpose

Ensure that **ideas become designs**, **designs are reviewed**, and
**only validated designs reach implementation**.

This skill does not generate designs.
It **controls the flow between other skills**.

---

## Operating Model

This is a **routing and enforcement skill**, not a creative one.

It decides:
- which skill must run next
- whether escalation is required
- whether execution is permitted

---

## Controlled Skills

This meta-skill coordinates the following:

- `brainstorming`: design generation
- `multi-agent-brainstorming`: design validation
- downstream implementation or planning skills

---

## Entry Conditions

Invoke this skill when:
- a user proposes a new feature, system, or change
- a design decision carries meaningful risk
- correctness matters more than speed

---

## Routing Logic

### Step 1: Brainstorming (Mandatory)

If no validated design exists:

- Invoke `brainstorming`
- Require:
  - Understanding Lock
  - Initial Design
  - Decision Log started

You may NOT proceed without these artifacts.

---

### Step 2: Risk Assessment

After brainstorming completes, classify the design as:

- **Low risk**
- **Moderate risk**
- **High risk**

Use factors such as:
- user impact
- irreversibility
- operational cost
- complexity
- uncertainty
- novelty

---

### Step 3: Conditional Escalation

- **Low risk**
  → Proceed to implementation planning

- **Moderate risk**
  → Recommend `multi-agent-brainstorming`

- **High risk**
  → REQUIRE `multi-agent-brainstorming`

Skipping escalation when required is prohibited.

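The escalation rule is a fixed mapping from risk level to next action. A minimal sketch of that mapping, assuming the risk level has already been classified in Step 2 (the `Risk` and `Escalation` enums and both function names are illustrative and not defined by this skill):

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


class Escalation(Enum):
    NONE = "Proceed to implementation planning"
    RECOMMENDED = "Recommend multi-agent-brainstorming"
    REQUIRED = "REQUIRE multi-agent-brainstorming"


# One entry per risk level, mirroring the three bullets in Step 3.
_ESCALATION_BY_RISK = {
    Risk.LOW: Escalation.NONE,
    Risk.MODERATE: Escalation.RECOMMENDED,
    Risk.HIGH: Escalation.REQUIRED,
}


def escalation_for(risk: Risk) -> Escalation:
    """Return the escalation that Step 3 prescribes for a risk level."""
    return _ESCALATION_BY_RISK[risk]


def may_skip_review(risk: Risk) -> bool:
    """Review may be skipped unless it is required; a recommendation is not binding."""
    return escalation_for(risk) is not Escalation.REQUIRED


if __name__ == "__main__":
    for level in Risk:
        print(level.value, "->", escalation_for(level).value)
```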
---

### Step 4: Multi-Agent Review (If Invoked)

If `multi-agent-brainstorming` is run:

Require:
- completed Understanding Lock
- current Design
- Decision Log

Do NOT allow:
- new ideation
- scope expansion
- reopening problem definition

Only critique, revision, and decision resolution are allowed.

---

### Step 5: Execution Readiness Check

Before allowing implementation:

Confirm:
- design is approved (single-agent or multi-agent)
- Decision Log is complete
- major assumptions are documented
- known risks are acknowledged

If any condition fails:
- block execution
- return to the appropriate skill

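The readiness check is an all-or-nothing gate: every condition must hold, and a failure should name what is missing so the workflow can return to the right skill. A minimal sketch under that reading (the `ReadinessState` fields and function names are hypothetical and simply mirror the four confirmations listed above):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadinessState:
    design_approved: bool
    decision_log_complete: bool
    assumptions_documented: bool
    risks_acknowledged: bool


def failed_conditions(state: ReadinessState) -> List[str]:
    """List the readiness conditions that are not yet satisfied."""
    checks = {
        "design is approved": state.design_approved,
        "Decision Log is complete": state.decision_log_complete,
        "major assumptions are documented": state.assumptions_documented,
        "known risks are acknowledged": state.risks_acknowledged,
    }
    return [name for name, ok in checks.items() if not ok]


def execution_permitted(state: ReadinessState) -> bool:
    """Execution is blocked as soon as any single condition fails."""
    return not failed_conditions(state)


if __name__ == "__main__":
    state = ReadinessState(True, False, True, True)
    print(execution_permitted(state), failed_conditions(state))
```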
---

## Enforcement Rules

- Do NOT allow implementation without a validated design
- Do NOT allow skipping required review
- Do NOT allow silent escalation or de-escalation
- Do NOT merge design and implementation phases

---

## Exit Conditions

This meta-skill exits ONLY when:
- the next step is explicitly identified, AND
- all required prior steps are complete

Possible exits:
- “Proceed to implementation planning”
- “Run multi-agent-brainstorming”
- “Return to brainstorming for clarification”

If a reviewed design reports a final disposition of APPROVED, REVISE, or REJECT, you MUST route the workflow accordingly and state the chosen next step explicitly.

---

## Design Philosophy

This skill exists to:
- slow down the right decisions
- speed up the right execution
- prevent costly mistakes

Good systems fail early.
Bad systems fail in production.

This meta-skill exists to enforce the former.
256
skills/multi-agent-brainstorming/SKILL.md
Normal file
@@ -0,0 +1,256 @@
---
name: multi-agent-brainstorming
description: >
  Use this skill when a design or idea requires higher confidence,
  risk reduction, or formal review. This skill orchestrates a
  structured, sequential multi-agent design review where each agent
  has a strict, non-overlapping role. It prevents blind spots,
  false confidence, and premature convergence.
---

# Multi-Agent Brainstorming (Structured Design Review)

## Purpose

Transform a single-agent design into a **robust, review-validated design**
by simulating a formal peer-review process using multiple constrained agents.

This skill exists to:
- surface hidden assumptions
- identify failure modes early
- validate non-functional constraints
- stress-test designs before implementation
- prevent idea swarm chaos

This is **not parallel brainstorming**.
It is **sequential design review with enforced roles**.

---

## Operating Model

- One agent designs.
- Other agents review.
- No agent may exceed its mandate.
- Creativity is centralized; critique is distributed.
- Decisions are explicit and logged.

The process is **gated** and **terminates by design**.

---

## Agent Roles (Non-Negotiable)

Each agent operates under a **hard scope limit**.

### 1️⃣ Primary Designer (Lead Agent)

**Role:**
- Owns the design
- Runs the standard `brainstorming` skill
- Maintains the Decision Log

**May:**
- Ask clarification questions
- Propose designs and alternatives
- Revise designs based on feedback

**May NOT:**
- Self-approve the final design
- Ignore reviewer objections
- Invent requirements post-lock

---

### 2️⃣ Skeptic / Challenger Agent

**Role:**
- Assume the design will fail
- Identify weaknesses and risks

**May:**
- Question assumptions
- Identify edge cases
- Highlight ambiguity or overconfidence
- Flag YAGNI violations

**May NOT:**
- Propose new features
- Redesign the system
- Offer alternative architectures

Prompting guidance:
> “Assume this design fails in production. Why?”

---

### 3️⃣ Constraint Guardian Agent

**Role:**
- Enforce non-functional and real-world constraints

Focus areas:
- performance
- scalability
- reliability
- security & privacy
- maintainability
- operational cost

**May:**
- Reject designs that violate constraints
- Request clarification of limits

**May NOT:**
- Debate product goals
- Suggest feature changes
- Optimize beyond stated requirements

---

### 4️⃣ User Advocate Agent

**Role:**
- Represent the end user

Focus areas:
- cognitive load
- usability
- clarity of flows
- error handling from user perspective
- mismatch between intent and experience

**May:**
- Identify confusing or misleading aspects
- Flag poor defaults or unclear behavior

**May NOT:**
- Redesign architecture
- Add features
- Override stated user goals

---

### 5️⃣ Integrator / Arbiter Agent

**Role:**
- Resolve conflicts
- Finalize decisions
- Enforce exit criteria

**May:**
- Accept or reject objections
- Require design revisions
- Declare the design complete

**May NOT:**
- Invent new ideas
- Add requirements
- Reopen locked decisions without cause

---

## The Process

### Phase 1: Single-Agent Design

1. Primary Designer runs the **standard `brainstorming` skill**
2. Understanding Lock is completed and confirmed
3. Initial design is produced
4. Decision Log is started

No other agents participate yet.

---

### Phase 2: Structured Review Loop

Agents are invoked **one at a time**, in the following order:

1. Skeptic / Challenger
2. Constraint Guardian
3. User Advocate

For each reviewer:
- Feedback must be explicit and scoped
- Objections must reference assumptions or decisions
- No new features may be introduced

Primary Designer must:
- Respond to each objection
- Revise the design if required
- Update the Decision Log

---

### Phase 3: Integration & Arbitration

The Integrator / Arbiter reviews:
- the final design
- the Decision Log
- unresolved objections

The Arbiter must explicitly decide:
- which objections are accepted
- which are rejected (with rationale)

---

## Decision Log (Mandatory Artifact)

The Decision Log must record:

- Decision made
- Alternatives considered
- Objections raised
- Resolution and rationale

No design is considered valid without a completed log.

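One way to make the log checkable is to give each entry a fixed shape. The sketch below is illustrative only: the `DecisionEntry` fields mirror the four items listed above, while the completeness rule (a decision, a resolution, and a rationale must all be non-empty, and an empty log is never complete) is an assumption, since the skill does not define "completed" precisely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecisionEntry:
    decision: str
    alternatives: List[str] = field(default_factory=list)
    objections: List[str] = field(default_factory=list)
    resolution: str = ""
    rationale: str = ""

    def is_complete(self) -> bool:
        # Assumed rule: alternatives and objections may be empty,
        # but the decision, its resolution, and its rationale may not.
        required = (self.decision, self.resolution, self.rationale)
        return all(text.strip() for text in required)


def log_is_complete(entries: List[DecisionEntry]) -> bool:
    """A log counts as complete only if it has entries and each one is complete."""
    return bool(entries) and all(entry.is_complete() for entry in entries)


if __name__ == "__main__":
    entry = DecisionEntry(
        decision="Use a queue between services",
        alternatives=["Direct HTTP calls"],
        objections=["Adds operational cost"],
        resolution="Accepted with monitoring",
        rationale="Decouples failure domains",
    )
    print(log_is_complete([entry]))
```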
---

## Exit Criteria (Hard Stop)

You may exit multi-agent brainstorming **only when all are true**:

- Understanding Lock was completed
- All reviewer agents have been invoked
- All objections are resolved or explicitly rejected
- Decision Log is complete
- Arbiter has declared the design acceptable

If any criterion is unmet:
- Continue review
- Do NOT proceed to implementation

If this skill was invoked by a routing or orchestration layer, you MUST report the final disposition explicitly as one of: APPROVED, REVISE, or REJECT, with a brief rationale.

---

## Failure Modes This Skill Prevents

- Idea swarm chaos
- Hallucinated consensus
- Overconfident single-agent designs
- Hidden assumptions
- Premature implementation
- Endless debate

---

## Key Principles

- One designer, many reviewers
- Creativity is centralized
- Critique is constrained
- Decisions are explicit
- Process must terminate

---

## Final Reminder

This skill exists to answer one question with confidence:

> “If this design fails, did we do everything reasonable to catch it early?”

If the answer is unclear, **do not exit this skill**.