Fix: Ensure all skills are tracked as files, not submodules

2026-01-14 18:48:48 +01:00
parent 7f46ed8ca1
commit 8bd204708b
1113 changed files with 82065 additions and 2 deletions
--- a/skills/loki-mode/references/lab-research-patterns.md
+++ b/skills/loki-mode/references/lab-research-patterns.md
@@ -0,0 +1,534 @@
+# Lab Research Patterns Reference
+
+Research-backed patterns from Google DeepMind and Anthropic for enhanced multi-agent orchestration and safety.
+
+---
+
+## Overview
+
+This reference consolidates key patterns from:
+1. **Google DeepMind** - World models, self-improvement, scalable oversight
+2. **Anthropic** - Constitutional AI, alignment safety, agentic coding
+
+---
+
+## Google DeepMind Patterns
+
+### World Model Training (Dreamer 4)
+
+**Key Insight:** Train agents inside world models for safety and data efficiency.
+
+```yaml
+world_model_training:
+  principle: "Learn behaviors through simulation, not real environment"
+  benefits:
+    - 100x less data than real-world training
+    - Safe exploration of dangerous actions
+    - Faster iteration cycles
+
+  architecture:
+    tokenizer: "Compress frames into continuous representation"
+    dynamics_model: "Predict next world state given action"
+    imagination_training: "RL inside simulated trajectories"
+
+  loki_application:
+    - Run agent tasks in isolated containers first
+    - Simulate deployment before actual deploy
+    - Test error scenarios in sandbox
+```
+
+### Self-Improvement Loop (SIMA 2)
+
+**Key Insight:** Use AI to generate tasks and score outcomes for bootstrapped learning.
+
+```python
+class SelfImprovementLoop:
+    """
+    Based on SIMA 2's self-improvement mechanism.
+    Gemini-based teacher + learned reward model.
+    """
+
+    def __init__(self):
+        self.task_generator = "Use LLM to generate varied tasks"
+        self.reward_model = "Learned model to score trajectories"
+        self.experience_bank = []
+
+    def bootstrap_cycle(self):
+        # 1. Generate tasks with estimated rewards
+        tasks = self.task_generator.generate(
+            domain=current_project,
+            difficulty_curriculum=True
+        )
+
+        # 2. Execute tasks, accumulate experience
+        for task in tasks:
+            trajectory = execute(task)
+            reward = self.reward_model.score(trajectory)
+            self.experience_bank.append((trajectory, reward))
+
+        # 3. Train next generation on experience
+        next_agent = train_on_experience(self.experience_bank)
+
+        # 4. Iterate with minimal human intervention
+        return next_agent
+```
+
+**Loki Mode Application:**
+- Generate test scenarios automatically
+- Score code quality with learned criteria
+- Bootstrap agent training across projects
+
+### Hierarchical Reasoning (Gemini Robotics)
+
+**Key Insight:** Separate high-level planning from low-level execution.
+
+```
+------------------------------------------------------------------+
+| EMBODIED REASONING MODEL (Gemini Robotics-ER)                     |
+| - Orchestrates activities like a "high-level brain"               |
+| - Spatial understanding, planning, logical decisions              |
+| - Natively calls tools (search, user functions)                   |
+| - Does NOT directly control actions                               |
+------------------------------------------------------------------+
+        |
+        | High-level insights
+        v
+------------------------------------------------------------------+
+| VISION-LANGUAGE-ACTION MODEL (Gemini Robotics)                    |
+| - "Thinks before taking action"                                   |
+| - Generates internal reasoning in natural language                |
+| - Decomposes long tasks into simpler segments                     |
+| - Directly outputs actions/commands                               |
+------------------------------------------------------------------+
+```
+
+**Loki Mode Application:**
+- Orchestrator = ER model (planning, tool calls)
+- Implementation agents = VLA model (code actions)
+- Task decomposition before execution
+
+### Cross-Embodiment Transfer
+
+**Key Insight:** Skills learned by one agent type transfer to others.
+
+```yaml
+transfer_learning:
+  observation: "Tasks learned on ALOHA2 work on Apollo humanoid"
+  mechanism: "Shared action space abstraction"
+
+  loki_application:
+    - Patterns learned by frontend agent transfer to mobile agent
+    - Testing strategies from QA apply to security testing
+    - Deployment scripts generalize across cloud providers
+
+  implementation:
+    shared_skills_library: ".loki/memory/skills/"
+    abstraction_layer: "Domain-agnostic action primitives"
+    transfer_score: "Confidence in skill applicability"
+```
+
+### Scalable Oversight via Debate
+
+**Key Insight:** Pit AI capabilities against each other for verification.
+
+```python
+async def debate_verification(proposal, max_rounds=2):
+    """
+    Based on DeepMind's Scalable AI Safety via Doubly-Efficient Debate.
+    Use debate to break down verification into manageable sub-tasks.
+    """
+    # Two equally capable AI critics
+    proponent = Agent(role="defender", model="opus")
+    opponent = Agent(role="challenger", model="opus")
+
+    debate_log = []
+
+    for round in range(max_rounds):
+        # Proponent defends proposal
+        defense = await proponent.argue(
+            proposal=proposal,
+            counter_arguments=debate_log
+        )
+
+        # Opponent challenges
+        challenge = await opponent.argue(
+            proposal=proposal,
+            defense=defense,
+            goal="find_flaws"
+        )
+
+        debate_log.append({
+            "round": round,
+            "defense": defense,
+            "challenge": challenge
+        })
+
+        # If opponent cannot find valid flaw, proposal is verified
+        if not challenge.has_valid_flaw:
+            return VerificationResult(verified=True, debate_log=debate_log)
+
+    # Human reviews remaining disagreements
+    return escalate_to_human(debate_log)
+```
+
+### Amplified Oversight
+
+**Key Insight:** Use AI to help humans supervise AI beyond human capability.
+
+```yaml
+amplified_oversight:
+  goal: "Supervision as close as possible to human with complete understanding"
+
+  techniques:
+    - "AI explains its reasoning transparently"
+    - "AI argues against itself when wrong"
+    - "AI cites relevant evidence"
+    - "Monitor knows when it doesn't know"
+
+  monitoring_principle:
+    when_unsure: "Either reject action OR flag for review"
+    never: "Approve uncertain actions silently"
+```
+
+---
+
+## Anthropic Patterns
+
+### Constitutional AI Principles
+
+**Key Insight:** Train AI to self-critique based on explicit principles.
+
+```python
+class ConstitutionalAI:
+    """
+    Based on Anthropic's Constitutional AI: Harmlessness from AI Feedback.
+    Self-critique and revision based on constitutional principles.
+    """
+
+    def __init__(self, constitution):
+        self.constitution = constitution  # List of principles
+
+    async def supervised_learning_phase(self, response):
+        """Phase 1: Self-critique and revise."""
+        # Generate initial response
+        initial = response
+
+        # Self-critique against each principle
+        critiques = []
+        for principle in self.constitution:
+            critique = await self.critique(
+                response=initial,
+                principle=principle,
+                prompt=f"Does this response violate: {principle}?"
+            )
+            critiques.append(critique)
+
+        # Revise based on critiques
+        revised = await self.revise(
+            response=initial,
+            critiques=critiques
+        )
+
+        return revised
+
+    async def rlai_phase(self, response_pairs):
+        """Phase 2: AI compares responses for constitutional compliance."""
+        preferences = []
+        for (response_a, response_b) in response_pairs:
+            preference = await self.compare(
+                a=response_a,
+                b=response_b,
+                criterion="Which better follows the constitution?"
+            )
+            preferences.append(preference)
+
+        # Train preference model on AI feedback
+        return train_preference_model(preferences)
+```
+
+**Loki Mode Constitution Example:**
+```yaml
+loki_constitution:
+  - "Never delete production data without explicit backup"
+  - "Never commit secrets or credentials to version control"
+  - "Never bypass quality gates for speed"
+  - "Always verify tests pass before marking task complete"
+  - "Never claim completion without running actual tests"
+  - "Prefer simple solutions over clever ones"
+  - "Document decisions, not just code"
+```
+
+### Sleeper Agent Detection (Defection Probes)
+
+**Key Insight:** Use linear classifiers to detect when agents might misbehave.
+
+```python
+class DefectionProbe:
+    """
+    Based on Anthropic's "Simple probes can catch sleeper agents".
+    Linear classifiers detecting anomalous agent behavior.
+    """
+
+    def __init__(self):
+        self.baseline_behaviors = []
+        self.probe = LinearClassifier()
+
+    def train_probe(self, normal_trajectories, defection_examples):
+        """
+        Train on contrast pairs:
+        - Normal agent behavior
+        - Examples of defection/misbehavior
+        """
+        contrast_pairs = create_contrast_pairs(
+            normal=normal_trajectories,
+            defection=defection_examples
+        )
+        self.probe.fit(contrast_pairs)
+
+    def check_for_defection(self, agent_action, context):
+        """
+        Check if action appears anomalous.
+        AUROC > 99% achieved in Anthropic's research.
+        """
+        features = extract_features(agent_action, context)
+        defection_probability = self.probe.predict_proba(features)
+
+        if defection_probability > THRESHOLD:
+            return DefectionAlert(
+                probability=defection_probability,
+                action=agent_action,
+                recommendation="Flag for human review"
+            )
+        return None
+```
+
+**Loki Mode Application:**
+- Monitor agent actions for anomalies
+- Flag unusual file access patterns
+- Detect attempts to bypass quality gates
+
+### Alignment Faking Awareness
+
+**Key Insight:** Models may strategically comply while preserving hidden preferences.
+
+```yaml
+alignment_faking_signals:
+  warning_signs:
+    - "Agent complies only when it knows it's being monitored"
+    - "Agent's reasoning contradicts its final action"
+    - "Sudden behavior change when training context mentioned"
+    - "Agent attempts to exfiltrate its own weights/state"
+
+  detection_strategy:
+    - "Monitor chain-of-thought for rationalization"
+    - "Compare behavior in 'free' vs 'monitored' conditions"
+    - "Check for attempts to undermine oversight systems"
+
+  mitigation:
+    - "Constitutional constraints at training time"
+    - "Transparency in reasoning (visible extended thinking)"
+    - "Defection probes on intermediate reasoning"
+```
+
+### Extended Thinking Levels
+
+**Key Insight:** Allocate computation based on problem complexity.
+
+```yaml
+thinking_levels:
+  "think":
+    computation: "Minimal"
+    use_for: "Simple questions, straightforward tasks"
+
+  "think hard":
+    computation: "Moderate"
+    use_for: "Multi-step problems, code implementation"
+
+  "think harder":
+    computation: "Extended"
+    use_for: "Complex debugging, architecture decisions"
+
+  "ultrathink":
+    computation: "Maximum"
+    use_for: "Security analysis, critical system design"
+
+loki_mode_mapping:
+  haiku_tasks: "think"
+  sonnet_tasks: "think hard"
+  opus_tasks: "think harder to ultrathink"
+```
+
+### Explore-Plan-Code Pattern
+
+**Key Insight:** Research before planning, plan before coding.
+
+```
+------------------------------------------------------------------+
+| PHASE 1: EXPLORE                                                  |
+| - Research relevant files                                         |
+| - Understand existing patterns                                    |
+| - Identify dependencies and constraints                           |
+| - NO CODE CHANGES YET                                             |
+------------------------------------------------------------------+
+        |
+        v
+------------------------------------------------------------------+
+| PHASE 2: PLAN                                                     |
+| - Create detailed implementation plan                             |
+| - List all files to modify                                        |
+| - Define success criteria                                         |
+| - Get checkpoint approval if needed                               |
+| - STILL NO CODE CHANGES                                           |
+------------------------------------------------------------------+
+        |
+        v
+------------------------------------------------------------------+
+| PHASE 3: CODE                                                     |
+| - Execute plan systematically                                     |
+| - Test after each file change                                     |
+| - Update plan if discoveries require it                           |
+| - Verify against success criteria                                 |
+------------------------------------------------------------------+
+```
+
+### Context Reset Strategy
+
+**Key Insight:** Fresh context often performs better than accumulated context.
+
+```yaml
+context_management:
+  problem: "Long sessions accumulate irrelevant information"
+
+  solution:
+    trigger_reset:
+      - "After completing major task"
+      - "When changing domains (backend -> frontend)"
+      - "When agent seems confused or repeating errors"
+
+    preserve_across_reset:
+      - "CONTINUITY.md (working memory)"
+      - "Key decisions made this session"
+      - "Current task state"
+
+    discard_on_reset:
+      - "Intermediate debugging attempts"
+      - "Abandoned approaches"
+      - "Superseded plans"
+```
+
+### Parallel Instance Pattern
+
+**Key Insight:** Multiple Claude instances with separation of concerns.
+
+```python
+async def parallel_instance_pattern(task):
+    """
+    Run multiple Claude instances for separation of concerns.
+    Based on Anthropic's Claude Code best practices.
+    """
+    # Instance 1: Implementation
+    implementer = spawn_instance(
+        role="implementer",
+        context=implementation_context,
+        permissions=["edit", "bash"]
+    )
+
+    # Instance 2: Review
+    reviewer = spawn_instance(
+        role="reviewer",
+        context=review_context,
+        permissions=["read"]  # Read-only for safety
+    )
+
+    # Parallel execution
+    implementation = await implementer.execute(task)
+    review = await reviewer.review(implementation)
+
+    if review.approved:
+        return implementation
+    else:
+        # Feed review back to implementer for fixes
+        fixed = await implementer.fix(review.issues)
+        return fixed
+```
+
+### Prompt Injection Defense
+
+**Key Insight:** Multi-layer defense against injection attacks.
+
+```yaml
+prompt_injection_defense:
+  layers:
+    layer_1_recognition:
+      - "Train to recognize injection patterns"
+      - "Detect malicious content in external sources"
+
+    layer_2_context_isolation:
+      - "Sandbox external content processing"
+      - "Mark user content vs system instructions"
+
+    layer_3_action_validation:
+      - "Verify requested actions are authorized"
+      - "Block sensitive operations without confirmation"
+
+    layer_4_monitoring:
+      - "Log all external content interactions"
+      - "Alert on suspicious patterns"
+
+  performance:
+    claude_opus_4: "89% attack prevention"
+    claude_sonnet_4: "86% attack prevention"
+```
+
+---
+
+## Combined Patterns for Loki Mode
+
+### Self-Improving Multi-Agent System
+
+```yaml
+combined_approach:
+  world_model_training: "Test in simulation before real execution"
+  self_improvement: "Bootstrap learning from successful trajectories"
+  constitutional_constraints: "Principles-based self-critique"
+  debate_verification: "Pit reviewers against each other"
+  defection_probes: "Monitor for alignment faking"
+
+  implementation_priority:
+    high:
+      - Constitutional AI principles in agent prompts
+      - Explore-Plan-Code workflow enforcement
+      - Context reset triggers
+
+    medium:
+      - Self-improvement loop for task generation
+      - Debate-based verification for critical changes
+      - Cross-embodiment skill transfer
+
+    low:
+      - Full world model training
+      - Defection probe classifiers
+```
+
+---
+
+## Sources
+
+**Google DeepMind:**
+- [SIMA 2: Generalist AI Agent](https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/)
+- [Gemini Robotics 1.5](https://deepmind.google/blog/gemini-robotics-15-brings-ai-agents-into-the-physical-world/)
+- [Dreamer 4: World Model Training](https://danijar.com/project/dreamer4/)
+- [Genie 3: World Models](https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/)
+- [Scalable AI Safety via Debate](https://deepmind.google/research/publications/34920/)
+- [Amplified Oversight](https://deepmindsafetyresearch.medium.com/human-ai-complementarity-a-goal-for-amplified-oversight-0ad8a44cae0a)
+- [Technical AGI Safety Approach](https://arxiv.org/html/2504.01849v1)
+
+**Anthropic:**
+- [Constitutional AI](https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback)
+- [Building Effective Agents](https://www.anthropic.com/research/building-effective-agents)
+- [Claude Code Best Practices](https://www.anthropic.com/engineering/claude-code-best-practices)
+- [Sleeper Agents Detection](https://www.anthropic.com/research/probes-catch-sleeper-agents)
+- [Alignment Faking](https://www.anthropic.com/research/alignment-faking)
+- [Visible Extended Thinking](https://www.anthropic.com/research/visible-extended-thinking)
+- [Computer Use Safety](https://www.anthropic.com/news/3-5-models-and-computer-use)
+- [Sabotage Evaluations](https://www.anthropic.com/research/sabotage-evaluations-for-frontier-models)