Fix: Ensure all skills are tracked as files, not submodules
This commit is contained in:
534
skills/loki-mode/references/lab-research-patterns.md
Normal file
534
skills/loki-mode/references/lab-research-patterns.md
Normal file
@@ -0,0 +1,534 @@
|
||||
# Lab Research Patterns Reference
|
||||
|
||||
Research-backed patterns from Google DeepMind and Anthropic for enhanced multi-agent orchestration and safety.
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This reference consolidates key patterns from:
|
||||
1. **Google DeepMind** - World models, self-improvement, scalable oversight
|
||||
2. **Anthropic** - Constitutional AI, alignment safety, agentic coding
|
||||
|
||||
---
|
||||
|
||||
## Google DeepMind Patterns
|
||||
|
||||
### World Model Training (Dreamer 4)
|
||||
|
||||
**Key Insight:** Train agents inside world models for safety and data efficiency.
|
||||
|
||||
```yaml
|
||||
world_model_training:
|
||||
principle: "Learn behaviors through simulation, not real environment"
|
||||
benefits:
|
||||
- 100x less data than real-world training
|
||||
- Safe exploration of dangerous actions
|
||||
- Faster iteration cycles
|
||||
|
||||
architecture:
|
||||
tokenizer: "Compress frames into continuous representation"
|
||||
dynamics_model: "Predict next world state given action"
|
||||
imagination_training: "RL inside simulated trajectories"
|
||||
|
||||
loki_application:
|
||||
- Run agent tasks in isolated containers first
|
||||
- Simulate deployment before actual deploy
|
||||
- Test error scenarios in sandbox
|
||||
```
|
||||
|
||||
### Self-Improvement Loop (SIMA 2)
|
||||
|
||||
**Key Insight:** Use AI to generate tasks and score outcomes for bootstrapped learning.
|
||||
|
||||
```python
|
||||
class SelfImprovementLoop:
|
||||
"""
|
||||
Based on SIMA 2's self-improvement mechanism.
|
||||
Gemini-based teacher + learned reward model.
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
self.task_generator = "Use LLM to generate varied tasks"
|
||||
self.reward_model = "Learned model to score trajectories"
|
||||
self.experience_bank = []
|
||||
|
||||
def bootstrap_cycle(self):
|
||||
# 1. Generate tasks with estimated rewards
|
||||
tasks = self.task_generator.generate(
|
||||
domain=current_project,
|
||||
difficulty_curriculum=True
|
||||
)
|
||||
|
||||
# 2. Execute tasks, accumulate experience
|
||||
for task in tasks:
|
||||
trajectory = execute(task)
|
||||
reward = self.reward_model.score(trajectory)
|
||||
self.experience_bank.append((trajectory, reward))
|
||||
|
||||
# 3. Train next generation on experience
|
||||
next_agent = train_on_experience(self.experience_bank)
|
||||
|
||||
# 4. Iterate with minimal human intervention
|
||||
return next_agent
|
||||
```
|
||||
|
||||
**Loki Mode Application:**
|
||||
- Generate test scenarios automatically
|
||||
- Score code quality with learned criteria
|
||||
- Bootstrap agent training across projects
|
||||
|
||||
### Hierarchical Reasoning (Gemini Robotics)
|
||||
|
||||
**Key Insight:** Separate high-level planning from low-level execution.
|
||||
|
||||
```
|
||||
+------------------------------------------------------------------+
|
||||
| EMBODIED REASONING MODEL (Gemini Robotics-ER) |
|
||||
| - Orchestrates activities like a "high-level brain" |
|
||||
| - Spatial understanding, planning, logical decisions |
|
||||
| - Natively calls tools (search, user functions) |
|
||||
| - Does NOT directly control actions |
|
||||
+------------------------------------------------------------------+
|
||||
|
|
||||
| High-level insights
|
||||
v
|
||||
+------------------------------------------------------------------+
|
||||
| VISION-LANGUAGE-ACTION MODEL (Gemini Robotics) |
|
||||
| - "Thinks before taking action" |
|
||||
| - Generates internal reasoning in natural language |
|
||||
| - Decomposes long tasks into simpler segments |
|
||||
| - Directly outputs actions/commands |
|
||||
+------------------------------------------------------------------+
|
||||
```
|
||||
|
||||
**Loki Mode Application:**
|
||||
- Orchestrator = ER model (planning, tool calls)
|
||||
- Implementation agents = VLA model (code actions)
|
||||
- Task decomposition before execution
|
||||
|
||||
### Cross-Embodiment Transfer
|
||||
|
||||
**Key Insight:** Skills learned by one agent type transfer to others.
|
||||
|
||||
```yaml
|
||||
transfer_learning:
|
||||
observation: "Tasks learned on ALOHA2 work on Apollo humanoid"
|
||||
mechanism: "Shared action space abstraction"
|
||||
|
||||
loki_application:
|
||||
- Patterns learned by frontend agent transfer to mobile agent
|
||||
- Testing strategies from QA apply to security testing
|
||||
- Deployment scripts generalize across cloud providers
|
||||
|
||||
implementation:
|
||||
shared_skills_library: ".loki/memory/skills/"
|
||||
abstraction_layer: "Domain-agnostic action primitives"
|
||||
transfer_score: "Confidence in skill applicability"
|
||||
```
|
||||
|
||||
### Scalable Oversight via Debate
|
||||
|
||||
**Key Insight:** Pit AI capabilities against each other for verification.
|
||||
|
||||
```python
|
||||
async def debate_verification(proposal, max_rounds=2):
|
||||
"""
|
||||
Based on DeepMind's Scalable AI Safety via Doubly-Efficient Debate.
|
||||
Use debate to break down verification into manageable sub-tasks.
|
||||
"""
|
||||
# Two equally capable AI critics
|
||||
proponent = Agent(role="defender", model="opus")
|
||||
opponent = Agent(role="challenger", model="opus")
|
||||
|
||||
debate_log = []
|
||||
|
||||
for round in range(max_rounds):
|
||||
# Proponent defends proposal
|
||||
defense = await proponent.argue(
|
||||
proposal=proposal,
|
||||
counter_arguments=debate_log
|
||||
)
|
||||
|
||||
# Opponent challenges
|
||||
challenge = await opponent.argue(
|
||||
proposal=proposal,
|
||||
defense=defense,
|
||||
goal="find_flaws"
|
||||
)
|
||||
|
||||
debate_log.append({
|
||||
"round": round,
|
||||
"defense": defense,
|
||||
"challenge": challenge
|
||||
})
|
||||
|
||||
# If opponent cannot find valid flaw, proposal is verified
|
||||
if not challenge.has_valid_flaw:
|
||||
return VerificationResult(verified=True, debate_log=debate_log)
|
||||
|
||||
# Human reviews remaining disagreements
|
||||
return escalate_to_human(debate_log)
|
||||
```
|
||||
|
||||
### Amplified Oversight
|
||||
|
||||
**Key Insight:** Use AI to help humans supervise AI beyond human capability.
|
||||
|
||||
```yaml
|
||||
amplified_oversight:
|
||||
goal: "Supervision as close as possible to human with complete understanding"
|
||||
|
||||
techniques:
|
||||
- "AI explains its reasoning transparently"
|
||||
- "AI argues against itself when wrong"
|
||||
- "AI cites relevant evidence"
|
||||
- "Monitor knows when it doesn't know"
|
||||
|
||||
monitoring_principle:
|
||||
when_unsure: "Either reject action OR flag for review"
|
||||
never: "Approve uncertain actions silently"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Anthropic Patterns
|
||||
|
||||
### Constitutional AI Principles
|
||||
|
||||
**Key Insight:** Train AI to self-critique based on explicit principles.
|
||||
|
||||
```python
|
||||
class ConstitutionalAI:
|
||||
"""
|
||||
Based on Anthropic's Constitutional AI: Harmlessness from AI Feedback.
|
||||
Self-critique and revision based on constitutional principles.
|
||||
"""
|
||||
|
||||
def __init__(self, constitution):
|
||||
self.constitution = constitution # List of principles
|
||||
|
||||
async def supervised_learning_phase(self, response):
|
||||
"""Phase 1: Self-critique and revise."""
|
||||
# Generate initial response
|
||||
initial = response
|
||||
|
||||
# Self-critique against each principle
|
||||
critiques = []
|
||||
for principle in self.constitution:
|
||||
critique = await self.critique(
|
||||
response=initial,
|
||||
principle=principle,
|
||||
prompt=f"Does this response violate: {principle}?"
|
||||
)
|
||||
critiques.append(critique)
|
||||
|
||||
# Revise based on critiques
|
||||
revised = await self.revise(
|
||||
response=initial,
|
||||
critiques=critiques
|
||||
)
|
||||
|
||||
return revised
|
||||
|
||||
async def rlai_phase(self, response_pairs):
|
||||
"""Phase 2: AI compares responses for constitutional compliance."""
|
||||
preferences = []
|
||||
for (response_a, response_b) in response_pairs:
|
||||
preference = await self.compare(
|
||||
a=response_a,
|
||||
b=response_b,
|
||||
criterion="Which better follows the constitution?"
|
||||
)
|
||||
preferences.append(preference)
|
||||
|
||||
# Train preference model on AI feedback
|
||||
return train_preference_model(preferences)
|
||||
```
|
||||
|
||||
**Loki Mode Constitution Example:**
|
||||
```yaml
|
||||
loki_constitution:
|
||||
- "Never delete production data without explicit backup"
|
||||
- "Never commit secrets or credentials to version control"
|
||||
- "Never bypass quality gates for speed"
|
||||
- "Always verify tests pass before marking task complete"
|
||||
- "Never claim completion without running actual tests"
|
||||
- "Prefer simple solutions over clever ones"
|
||||
- "Document decisions, not just code"
|
||||
```
|
||||
|
||||
### Sleeper Agent Detection (Defection Probes)
|
||||
|
||||
**Key Insight:** Use linear classifiers to detect when agents might misbehave.
|
||||
|
||||
```python
|
||||
class DefectionProbe:
|
||||
"""
|
||||
Based on Anthropic's "Simple probes can catch sleeper agents".
|
||||
Linear classifiers detecting anomalous agent behavior.
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
self.baseline_behaviors = []
|
||||
self.probe = LinearClassifier()
|
||||
|
||||
def train_probe(self, normal_trajectories, defection_examples):
|
||||
"""
|
||||
Train on contrast pairs:
|
||||
- Normal agent behavior
|
||||
- Examples of defection/misbehavior
|
||||
"""
|
||||
contrast_pairs = create_contrast_pairs(
|
||||
normal=normal_trajectories,
|
||||
defection=defection_examples
|
||||
)
|
||||
self.probe.fit(contrast_pairs)
|
||||
|
||||
def check_for_defection(self, agent_action, context):
|
||||
"""
|
||||
Check if action appears anomalous.
|
||||
AUROC > 99% achieved in Anthropic's research.
|
||||
"""
|
||||
features = extract_features(agent_action, context)
|
||||
defection_probability = self.probe.predict_proba(features)
|
||||
|
||||
if defection_probability > THRESHOLD:
|
||||
return DefectionAlert(
|
||||
probability=defection_probability,
|
||||
action=agent_action,
|
||||
recommendation="Flag for human review"
|
||||
)
|
||||
return None
|
||||
```
|
||||
|
||||
**Loki Mode Application:**
|
||||
- Monitor agent actions for anomalies
|
||||
- Flag unusual file access patterns
|
||||
- Detect attempts to bypass quality gates
|
||||
|
||||
### Alignment Faking Awareness
|
||||
|
||||
**Key Insight:** Models may strategically comply while preserving hidden preferences.
|
||||
|
||||
```yaml
|
||||
alignment_faking_signals:
|
||||
warning_signs:
|
||||
- "Agent complies only when it knows it's being monitored"
|
||||
- "Agent's reasoning contradicts its final action"
|
||||
- "Sudden behavior change when training context mentioned"
|
||||
- "Agent attempts to exfiltrate its own weights/state"
|
||||
|
||||
detection_strategy:
|
||||
- "Monitor chain-of-thought for rationalization"
|
||||
- "Compare behavior in 'free' vs 'monitored' conditions"
|
||||
- "Check for attempts to undermine oversight systems"
|
||||
|
||||
mitigation:
|
||||
- "Constitutional constraints at training time"
|
||||
- "Transparency in reasoning (visible extended thinking)"
|
||||
- "Defection probes on intermediate reasoning"
|
||||
```
|
||||
|
||||
### Extended Thinking Levels
|
||||
|
||||
**Key Insight:** Allocate computation based on problem complexity.
|
||||
|
||||
```yaml
|
||||
thinking_levels:
|
||||
"think":
|
||||
computation: "Minimal"
|
||||
use_for: "Simple questions, straightforward tasks"
|
||||
|
||||
"think hard":
|
||||
computation: "Moderate"
|
||||
use_for: "Multi-step problems, code implementation"
|
||||
|
||||
"think harder":
|
||||
computation: "Extended"
|
||||
use_for: "Complex debugging, architecture decisions"
|
||||
|
||||
"ultrathink":
|
||||
computation: "Maximum"
|
||||
use_for: "Security analysis, critical system design"
|
||||
|
||||
loki_mode_mapping:
|
||||
haiku_tasks: "think"
|
||||
sonnet_tasks: "think hard"
|
||||
opus_tasks: "think harder to ultrathink"
|
||||
```
|
||||
|
||||
### Explore-Plan-Code Pattern
|
||||
|
||||
**Key Insight:** Research before planning, plan before coding.
|
||||
|
||||
```
|
||||
+------------------------------------------------------------------+
|
||||
| PHASE 1: EXPLORE |
|
||||
| - Research relevant files |
|
||||
| - Understand existing patterns |
|
||||
| - Identify dependencies and constraints |
|
||||
| - NO CODE CHANGES YET |
|
||||
+------------------------------------------------------------------+
|
||||
|
|
||||
v
|
||||
+------------------------------------------------------------------+
|
||||
| PHASE 2: PLAN |
|
||||
| - Create detailed implementation plan |
|
||||
| - List all files to modify |
|
||||
| - Define success criteria |
|
||||
| - Get checkpoint approval if needed |
|
||||
| - STILL NO CODE CHANGES |
|
||||
+------------------------------------------------------------------+
|
||||
|
|
||||
v
|
||||
+------------------------------------------------------------------+
|
||||
| PHASE 3: CODE |
|
||||
| - Execute plan systematically |
|
||||
| - Test after each file change |
|
||||
| - Update plan if discoveries require it |
|
||||
| - Verify against success criteria |
|
||||
+------------------------------------------------------------------+
|
||||
```
|
||||
|
||||
### Context Reset Strategy
|
||||
|
||||
**Key Insight:** Fresh context often performs better than accumulated context.
|
||||
|
||||
```yaml
|
||||
context_management:
|
||||
problem: "Long sessions accumulate irrelevant information"
|
||||
|
||||
solution:
|
||||
trigger_reset:
|
||||
- "After completing major task"
|
||||
- "When changing domains (backend -> frontend)"
|
||||
- "When agent seems confused or repeating errors"
|
||||
|
||||
preserve_across_reset:
|
||||
- "CONTINUITY.md (working memory)"
|
||||
- "Key decisions made this session"
|
||||
- "Current task state"
|
||||
|
||||
discard_on_reset:
|
||||
- "Intermediate debugging attempts"
|
||||
- "Abandoned approaches"
|
||||
- "Superseded plans"
|
||||
```
|
||||
|
||||
### Parallel Instance Pattern
|
||||
|
||||
**Key Insight:** Multiple Claude instances with separation of concerns.
|
||||
|
||||
```python
|
||||
async def parallel_instance_pattern(task):
|
||||
"""
|
||||
Run multiple Claude instances for separation of concerns.
|
||||
Based on Anthropic's Claude Code best practices.
|
||||
"""
|
||||
# Instance 1: Implementation
|
||||
implementer = spawn_instance(
|
||||
role="implementer",
|
||||
context=implementation_context,
|
||||
permissions=["edit", "bash"]
|
||||
)
|
||||
|
||||
# Instance 2: Review
|
||||
reviewer = spawn_instance(
|
||||
role="reviewer",
|
||||
context=review_context,
|
||||
permissions=["read"] # Read-only for safety
|
||||
)
|
||||
|
||||
# Parallel execution
|
||||
implementation = await implementer.execute(task)
|
||||
review = await reviewer.review(implementation)
|
||||
|
||||
if review.approved:
|
||||
return implementation
|
||||
else:
|
||||
# Feed review back to implementer for fixes
|
||||
fixed = await implementer.fix(review.issues)
|
||||
return fixed
|
||||
```
|
||||
|
||||
### Prompt Injection Defense
|
||||
|
||||
**Key Insight:** Multi-layer defense against injection attacks.
|
||||
|
||||
```yaml
|
||||
prompt_injection_defense:
|
||||
layers:
|
||||
layer_1_recognition:
|
||||
- "Train to recognize injection patterns"
|
||||
- "Detect malicious content in external sources"
|
||||
|
||||
layer_2_context_isolation:
|
||||
- "Sandbox external content processing"
|
||||
- "Mark user content vs system instructions"
|
||||
|
||||
layer_3_action_validation:
|
||||
- "Verify requested actions are authorized"
|
||||
- "Block sensitive operations without confirmation"
|
||||
|
||||
layer_4_monitoring:
|
||||
- "Log all external content interactions"
|
||||
- "Alert on suspicious patterns"
|
||||
|
||||
performance:
|
||||
claude_opus_4: "89% attack prevention"
|
||||
claude_sonnet_4: "86% attack prevention"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Combined Patterns for Loki Mode
|
||||
|
||||
### Self-Improving Multi-Agent System
|
||||
|
||||
```yaml
|
||||
combined_approach:
|
||||
world_model_training: "Test in simulation before real execution"
|
||||
self_improvement: "Bootstrap learning from successful trajectories"
|
||||
constitutional_constraints: "Principles-based self-critique"
|
||||
debate_verification: "Pit reviewers against each other"
|
||||
defection_probes: "Monitor for alignment faking"
|
||||
|
||||
implementation_priority:
|
||||
high:
|
||||
- Constitutional AI principles in agent prompts
|
||||
- Explore-Plan-Code workflow enforcement
|
||||
- Context reset triggers
|
||||
|
||||
medium:
|
||||
- Self-improvement loop for task generation
|
||||
- Debate-based verification for critical changes
|
||||
- Cross-embodiment skill transfer
|
||||
|
||||
low:
|
||||
- Full world model training
|
||||
- Defection probe classifiers
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Sources
|
||||
|
||||
**Google DeepMind:**
|
||||
- [SIMA 2: Generalist AI Agent](https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/)
|
||||
- [Gemini Robotics 1.5](https://deepmind.google/blog/gemini-robotics-15-brings-ai-agents-into-the-physical-world/)
|
||||
- [Dreamer 4: World Model Training](https://danijar.com/project/dreamer4/)
|
||||
- [Genie 3: World Models](https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/)
|
||||
- [Scalable AI Safety via Debate](https://deepmind.google/research/publications/34920/)
|
||||
- [Amplified Oversight](https://deepmindsafetyresearch.medium.com/human-ai-complementarity-a-goal-for-amplified-oversight-0ad8a44cae0a)
|
||||
- [Technical AGI Safety Approach](https://arxiv.org/html/2504.01849v1)
|
||||
|
||||
**Anthropic:**
|
||||
- [Constitutional AI](https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback)
|
||||
- [Building Effective Agents](https://www.anthropic.com/research/building-effective-agents)
|
||||
- [Claude Code Best Practices](https://www.anthropic.com/engineering/claude-code-best-practices)
|
||||
- [Sleeper Agents Detection](https://www.anthropic.com/research/probes-catch-sleeper-agents)
|
||||
- [Alignment Faking](https://www.anthropic.com/research/alignment-faking)
|
||||
- [Visible Extended Thinking](https://www.anthropic.com/research/visible-extended-thinking)
|
||||
- [Computer Use Safety](https://www.anthropic.com/news/3-5-models-and-computer-use)
|
||||
- [Sabotage Evaluations](https://www.anthropic.com/research/sabotage-evaluations-for-frontier-models)
|
||||
Reference in New Issue
Block a user