Fix: Ensure all skills are tracked as files, not submodules

2026-01-14 18:48:48 +01:00
parent 7f46ed8ca1
commit 8bd204708b
1113 changed files with 82065 additions and 2 deletions
--- a/skills/loki-mode/benchmarks/results/2026-01-05-01-35-39/SUMMARY.md
+++ b/skills/loki-mode/benchmarks/results/2026-01-05-01-35-39/SUMMARY.md
@@ -0,0 +1,48 @@
+# Loki Mode Benchmark Results
+
+**Generated:** 2026-01-05 02:32:40
+
+## Overview
+
+This directory contains benchmark results for the Loki Mode multi-agent system.
+
+## SWE-bench Lite Results
+
+| Metric | Value |
+|--------|-------|
+| Problems | 50 |
+| Patches Generated | 50 |
+| Errors | 0 |
+| Model | opus |
+| Time | 3413.75s |
+
+**Next Step:** Run the SWE-bench evaluator to validate patches:
+
+```bash
+python -m swebench.harness.run_evaluation --predictions /Users/lokesh/git/loki-mode/benchmarks/results/2026-01-05-01-35-39/swebench-predictions.json --max_workers 4
+```
+
+## Methodology
+
+Loki Mode uses its multi-agent architecture to solve each problem:
+1. **Architect Agent** analyzes the problem
+2. **Engineer Agent** implements the solution
+3. **QA Agent** validates with test cases
+4. **Review Agent** checks code quality
+
+This staged workflow (design, implementation, testing, review) is intended to mirror real-world software development more closely than a single-agent approach.
+
+## Running Benchmarks
+
+```bash
+# Setup only (download datasets)
+./benchmarks/run-benchmarks.sh all
+
+# Execute with Claude
+./benchmarks/run-benchmarks.sh humaneval --execute
+./benchmarks/run-benchmarks.sh humaneval --execute --limit 10  # First 10 only
+./benchmarks/run-benchmarks.sh swebench --execute --limit 5    # First 5 only
+
+# Use a different model
+./benchmarks/run-benchmarks.sh humaneval --execute --model opus
+```