Fix: Ensure all skills are tracked as files, not submodules

sck_0
2026-01-14 18:48:48 +01:00
parent 7f46ed8ca1
commit 8bd204708b
1113 changed files with 82065 additions and 2 deletions


@@ -0,0 +1,50 @@
# Loki Mode Benchmark Results
**Generated:** 2026-01-05 01:10:21
## Overview
This directory contains benchmark results for the Loki Mode multi-agent system.
## HumanEval Results
| Metric | Value |
|--------|-------|
| Problems | 164 |
| Passed | 161 |
| Failed | 3 |
| **Pass Rate** | **98.17%** |
| Model | opus |
| Time | 1263.46s |
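
The pass rate in the table follows directly from the problem and pass counts. A minimal sketch of that arithmetic, using only the figures reported above (the variable names are illustrative and are not taken from the benchmark scripts):

```python
# Figures copied from the HumanEval results table above.
problems = 164
passed = 161
total_seconds = 1263.46

# Derived values: failures, pass rate in percent, mean wall time per problem.
failed = problems - passed
pass_rate = 100 * passed / problems
seconds_per_problem = total_seconds / problems

print(f"failed={failed} pass_rate={pass_rate:.2f}% avg={seconds_per_problem:.2f}s/problem")
```

The per-problem average is a derived figure, assuming the reported time covers all 164 problems.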
### Competitor Comparison
| System | Pass@1 |
|--------|--------|
| MetaGPT | 85.9-87.7% |
| **Loki Mode** | **98.17%** |
## Methodology
Loki Mode uses its multi-agent architecture to solve each problem:
1. **Architect Agent** analyzes the problem
2. **Engineer Agent** implements the solution
3. **QA Agent** validates with test cases
4. **Review Agent** checks code quality
This staged workflow is meant to mirror real-world software development more closely than a single-agent approach.
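
The four stages above can be pictured as a sequential pipeline in which each agent receives the previous agent's output. The sketch below is purely illustrative: the `Task` structure, the function names, and the placeholder logic are assumptions made for this example and do not reflect Loki Mode's actual implementation, where each stage would be backed by a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """State handed from one stage to the next (hypothetical structure)."""
    prompt: str
    plan: str = ""
    code: str = ""
    tests_passed: bool = False
    review_notes: List[str] = field(default_factory=list)


def architect(task: Task) -> Task:
    # Placeholder: a real agent would analyze the problem statement.
    task.plan = f"plan for: {task.prompt}"
    return task


def engineer(task: Task) -> Task:
    # Placeholder: a real agent would generate code from the plan.
    task.code = "def add(a, b):\n    return a + b\n"
    return task


def qa(task: Task) -> Task:
    # Load the candidate code into a fresh namespace and run one test case.
    namespace: dict = {}
    exec(task.code, namespace)
    task.tests_passed = namespace["add"](2, 3) == 5
    return task


def review(task: Task) -> Task:
    # Placeholder quality check: flag code that did not pass QA.
    if not task.tests_passed:
        task.review_notes.append("tests failed")
    return task


# Stage order matches the numbered list above.
STAGES: List[Callable[[Task], Task]] = [architect, engineer, qa, review]


def run_pipeline(prompt: str) -> Task:
    task = Task(prompt=prompt)
    for stage in STAGES:
        task = stage(task)
    return task


result = run_pipeline("add two integers")
print(result.tests_passed, result.review_notes)
```

Keeping the stages in a plain list makes the hand-off order explicit and lets a stage be swapped or skipped without touching the others.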
## Running Benchmarks
```bash
# Setup only (download datasets)
./benchmarks/run-benchmarks.sh all
# Execute with Claude
./benchmarks/run-benchmarks.sh humaneval --execute
./benchmarks/run-benchmarks.sh humaneval --execute --limit 10 # First 10 only
./benchmarks/run-benchmarks.sh swebench --execute --limit 5 # First 5 only
# Use a different model
./benchmarks/run-benchmarks.sh humaneval --execute --model opus
```
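
When the runs are scripted, the same invocations can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository; it uses only the script path and the `--execute`, `--limit`, and `--model` flags shown in the commands above, and it assumes the flags can be combined in that order.

```python
from typing import List, Optional

# Script path as it appears in the commands above.
SCRIPT = "./benchmarks/run-benchmarks.sh"


def build_command(benchmark: str, execute: bool = False,
                  limit: Optional[int] = None,
                  model: Optional[str] = None) -> List[str]:
    """Assemble an argv list from the documented flags (hypothetical helper)."""
    cmd = [SCRIPT, benchmark]
    if execute:
        cmd.append("--execute")
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if model is not None:
        cmd += ["--model", model]
    return cmd


print(" ".join(build_command("humaneval", execute=True, limit=10)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.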