Fix: Ensure all skills are tracked as files, not submodules

sck_0
2026-01-14 18:48:48 +01:00
parent 7f46ed8ca1
commit 8bd204708b
1113 changed files with 82065 additions and 2 deletions

@@ -0,0 +1,48 @@
# Loki Mode Benchmark Results
## Overview
This directory contains benchmark results for the Loki Mode multi-agent system.
## Benchmarks Available
### HumanEval
- **Problems:** 164 Python programming problems
- **Metric:** Pass@1 (percentage of problems solved on first attempt)
- **Competitor Baseline:** MetaGPT reports 85.9% Pass@1 on HumanEval (its 87.7% figure is for MBPP)
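
With a single attempt per problem, Pass@1 reduces to the share of problems whose first attempt passed. A minimal sketch of that arithmetic follows; the function name `pass_at_1` and the sample outcomes are illustrative assumptions, not part of the Loki Mode benchmark runner.

```python
def pass_at_1(results: dict[str, bool]) -> float:
    """Return the percentage of problems whose first attempt passed."""
    if not results:
        raise ValueError("no results to score")
    solved = sum(1 for passed in results.values() if passed)
    return 100.0 * solved / len(results)


# Hypothetical outcomes: problem id -> whether the first attempt passed.
sample = {
    "HumanEval/0": True,
    "HumanEval/1": False,
    "HumanEval/2": True,
    "HumanEval/3": True,
}

print(f"Pass@1: {pass_at_1(sample):.1f}%")  # prints "Pass@1: 75.0%"
```

On the full benchmark the denominator would be all 164 problems, so each problem moves the score by roughly 0.61 percentage points.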
### SWE-bench Lite
- **Problems:** 300 real-world GitHub issues
- **Metric:** Resolution rate
- **Competitor Baseline:** Top agents achieve 45-77%
## Running Benchmarks
```bash
# Run all benchmarks
./benchmarks/run-benchmarks.sh all
# Run specific benchmark
./benchmarks/run-benchmarks.sh humaneval --execute
./benchmarks/run-benchmarks.sh swebench --execute
```
## Results Format
Results are saved as JSON files with:
- Timestamp
- Problem count
- Pass rate
- Individual problem results
- Token usage
- Execution time
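
As a rough illustration of how the fields listed above could fit together, the sketch below assembles one results record and serializes it. Every key name (`problem_count`, `pass_rate`, `token_usage`, and so on) and the helper `build_results` are assumptions made for this example; the actual schema written by the benchmark scripts may differ.

```python
import json
from datetime import datetime, timezone


def build_results(benchmark, problems, total_tokens, elapsed_seconds):
    """Assemble a results record; all field names here are hypothetical."""
    passed = sum(1 for p in problems if p["passed"])
    return {
        "benchmark": benchmark,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "problem_count": len(problems),
        "pass_rate": passed / len(problems) if problems else 0.0,
        "problems": problems,  # individual problem results
        "token_usage": total_tokens,
        "execution_time_seconds": elapsed_seconds,
    }


record = build_results(
    "humaneval",
    [
        {"id": "HumanEval/0", "passed": True},
        {"id": "HumanEval/1", "passed": False},
    ],
    total_tokens=1200,
    elapsed_seconds=3.5,
)

print(json.dumps(record, indent=2))
```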
## Methodology
Loki Mode uses its multi-agent architecture to solve each problem:
1. **Architect Agent** analyzes the problem
2. **Engineer Agent** implements the solution
3. **QA Agent** validates with test cases
4. **Review Agent** checks code quality
This staged workflow (design, implementation, testing, review) is intended to mirror a real-world software team more closely than a single-agent approach.
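
The four stages can be pictured as a sequential pipeline in which each agent enriches a shared work item. The sketch below is only a toy model of that flow: the `WorkItem` type, the stage functions, and their stubbed behaviour are all hypothetical and stand in for what would be model calls in Loki Mode itself.

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    problem: str
    plan: str = ""
    code: str = ""
    tests_passed: bool = False
    review_notes: list[str] = field(default_factory=list)


def architect(item: WorkItem) -> WorkItem:
    # Stub: a real agent would analyze the problem and produce a design.
    item.plan = f"Plan for: {item.problem}"
    return item


def engineer(item: WorkItem) -> WorkItem:
    # Stub: a real agent would generate code from the plan.
    item.code = "def add(a, b):\n    return a + b\n"
    return item


def qa(item: WorkItem) -> WorkItem:
    # Run the candidate code against one fixed test case.
    namespace: dict = {}
    exec(item.code, namespace)
    item.tests_passed = namespace["add"](2, 3) == 5
    return item


def review(item: WorkItem) -> WorkItem:
    # Toy quality check: flag code that has no docstring.
    if '"""' not in item.code:
        item.review_notes.append("missing docstring")
    return item


PIPELINE = [architect, engineer, qa, review]


def run_pipeline(problem: str) -> WorkItem:
    item = WorkItem(problem)
    for stage in PIPELINE:
        item = stage(item)
    return item


result = run_pipeline("Write add(a, b) that returns the sum")
print(result.tests_passed, result.review_notes)  # prints: True ['missing docstring']
```

Keeping the stages in a plain list makes the order explicit and lets a stage be swapped or skipped without touching the others.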