Fix: Ensure all skills are tracked as files, not submodules

2026-01-14 18:48:48 +01:00
parent 7f46ed8ca1
commit 8bd204708b
1113 changed files with 82065 additions and 2 deletions
--- a/skills/loki-mode/benchmarks/results/2026-01-05-00-23-56/SUMMARY.md
+++ b/skills/loki-mode/benchmarks/results/2026-01-05-00-23-56/SUMMARY.md
@@ -0,0 +1,48 @@
+# Loki Mode Benchmark Results
+
+## Overview
+
+This directory contains benchmark results for the Loki Mode multi-agent system.
+
+## Benchmarks Available
+
+### HumanEval
+- **Problems:** 164 Python programming problems
+- **Metric:** Pass@1 (percentage of problems solved on first attempt)
+- **Competitor Baseline:** MetaGPT achieves 85.9-87.7%
+
+### SWE-bench Lite
+- **Problems:** 300 real-world GitHub issues
+- **Metric:** Resolution rate
+- **Competitor Baseline:** Top agents achieve 45-77%
+
+## Running Benchmarks
+
+```bash
+# Run all benchmarks
+./benchmarks/run-benchmarks.sh all
+
+# Run specific benchmark
+./benchmarks/run-benchmarks.sh humaneval --execute
+./benchmarks/run-benchmarks.sh swebench --execute
+```
+
+## Results Format
+
+Results are saved as JSON files with:
+- Timestamp
+- Problem count
+- Pass rate
+- Individual problem results
+- Token usage
+- Execution time
+
+## Methodology
+
+Loki Mode uses its multi-agent architecture to solve each problem:
+1. **Architect Agent** analyzes the problem
+2. **Engineer Agent** implements the solution
+3. **QA Agent** validates with test cases
+4. **Review Agent** checks code quality
+
+This pipeline is intended to mirror real-world software development more closely than single-agent approaches.
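
Note on SUMMARY.md: the Pass@1 metric and the "Results Format" fields listed above could be tied together roughly as follows. This is only a sketch and not part of the commit. The field names `problems`, `task_id`, and `passed` are hypothetical (the committed JSON files do not define a per-problem schema), and the sample data is made up purely for illustration.

```python
def pass_at_1(results: dict) -> float:
    """Return the percentage of problems solved on the first attempt.

    Assumes a hypothetical schema: results["problems"] is a list of
    records, each with a boolean "passed" field for the first attempt.
    """
    problems = results["problems"]
    if not problems:
        return 0.0
    passed = sum(1 for p in problems if p["passed"])
    return 100.0 * passed / len(problems)


# Made-up example record, not real benchmark output.
example = {
    "benchmark": "HumanEval",
    "problems": [
        {"task_id": "HumanEval/0", "passed": True},
        {"task_id": "HumanEval/1", "passed": False},
        {"task_id": "HumanEval/2", "passed": True},
        {"task_id": "HumanEval/3", "passed": True},
    ],
}

print(f"pass@1 = {pass_at_1(example):.1f}%")
```

With three of the four illustrative problems passing, this prints `pass@1 = 75.0%`.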
--- a/skills/loki-mode/benchmarks/results/2026-01-05-00-23-56/humaneval-results.json
+++ b/skills/loki-mode/benchmarks/results/2026-01-05-00-23-56/humaneval-results.json
@@ -0,0 +1,15 @@
+{
+  "benchmark": "HumanEval",
+  "version": "1.0",
+  "timestamp": "2026-01-05T00:24:04.904083",
+  "total_problems": 164,
+  "status": "INFRASTRUCTURE_READY",
+  "note": "Benchmark infrastructure created. Run with --execute to run actual tests.",
+  "sample_problems": [
+    "HumanEval/0",
+    "HumanEval/1",
+    "HumanEval/2",
+    "HumanEval/3",
+    "HumanEval/4"
+  ]
+}
--- a/skills/loki-mode/benchmarks/results/2026-01-05-00-23-56/swebench-results.json
+++ b/skills/loki-mode/benchmarks/results/2026-01-05-00-23-56/swebench-results.json
@@ -0,0 +1,10 @@
+{
+  "benchmark": "SWE-bench Lite",
+  "version": "1.0",
+  "timestamp": "2026-01-05T00:24:04.950779",
+  "total_problems": 300,
+  "status": "INFRASTRUCTURE_READY",
+  "note": "Benchmark infrastructure created. Install swebench package for full evaluation.",
+  "install": "pip install swebench",
+  "evaluation": "python -m swebench.harness.run_evaluation --predictions predictions.json"
+}
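
Note on the two JSON files: both carry `"status": "INFRASTRUCTURE_READY"` and no pass rate, so they appear to be placeholders rather than completed runs. A consumer of these files might guard against that with a check like the one below. This is a sketch under that assumption, not part of the commit. The helper name `has_real_results` is hypothetical, and treating every other status value as "real results" is a simplification.

```python
import json

# Status value that appears in both committed result files.
PLACEHOLDER_STATUS = "INFRASTRUCTURE_READY"


def has_real_results(raw: str) -> bool:
    """Return False when a results file is only an infrastructure placeholder."""
    record = json.loads(raw)
    return record.get("status") != PLACEHOLDER_STATUS


# Trimmed copy of the committed swebench-results.json fields.
placeholder = (
    '{"benchmark": "SWE-bench Lite", "total_problems": 300, '
    '"status": "INFRASTRUCTURE_READY"}'
)

print(has_real_results(placeholder))
```

For the placeholder record this prints `False`, which signals that no pass rate should be read from the file.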