Fix: Ensure all skills are tracked as files, not submodules

sck_0
2026-01-14 18:48:48 +01:00
parent 7f46ed8ca1
commit 8bd204708b
1113 changed files with 82065 additions and 2 deletions


@@ -0,0 +1,48 @@
# Loki Mode Benchmark Results
## Overview
This directory contains benchmark results for the Loki Mode multi-agent system.
## Benchmarks Available
### HumanEval
- **Problems:** 164 Python programming problems
- **Metric:** Pass@1 (percentage of problems solved on first attempt)
- **Competitor Baseline:** MetaGPT reports 85.9% Pass@1 on HumanEval (its 87.7% figure is for MBPP)
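As a minimal sketch of the Pass@1 metric described above, assuming each problem gets exactly one attempt that is recorded as a pass/fail boolean (the function name `pass_at_1` and the sample numbers are illustrative and are not taken from the benchmark scripts):

```python
def pass_at_1(first_attempt_passed: list[bool]) -> float:
    """Return the percentage of problems solved on the first attempt."""
    if not first_attempt_passed:
        raise ValueError("no problems were evaluated")
    solved = sum(first_attempt_passed)
    return 100.0 * solved / len(first_attempt_passed)


# Hypothetical run: 123 of the 164 HumanEval problems pass on the first try.
outcomes = [True] * 123 + [False] * 41
print(f"Pass@1: {pass_at_1(outcomes):.1f}%")  # prints "Pass@1: 75.0%"
```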
### SWE-bench Lite
- **Problems:** 300 real-world GitHub issues
- **Metric:** Resolution rate
- **Competitor Baseline:** Top agents achieve 45-77%
## Running Benchmarks
```bash
# Run all benchmarks
./benchmarks/run-benchmarks.sh all
# Run specific benchmark
./benchmarks/run-benchmarks.sh humaneval --execute
./benchmarks/run-benchmarks.sh swebench --execute
```
## Results Format
Results are saved as JSON files with:
- Timestamp
- Problem count
- Pass rate
- Individual problem results
- Token usage
- Execution time
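The fields listed above could be assembled into a record along these lines. This is only a sketch: the keys `benchmark`, `timestamp`, and `total_problems` mirror the placeholder JSON files in this commit, while `pass_rate`, `problems`, `token_usage`, and `execution_time_seconds` are assumed names, and the actual files written by `run-benchmarks.sh` may differ.

```python
import json
from datetime import datetime, timezone


def build_results(benchmark: str, problems: list[dict],
                  token_usage: int, execution_time_seconds: float) -> dict:
    """Assemble one results record; the key names are illustrative."""
    passed = sum(1 for p in problems if p["passed"])
    total = len(problems)
    return {
        "benchmark": benchmark,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "total_problems": total,
        # Pass rate as a percentage; 0.0 when nothing was evaluated.
        "pass_rate": 100.0 * passed / total if total else 0.0,
        "problems": problems,
        "token_usage": token_usage,
        "execution_time_seconds": execution_time_seconds,
    }


# Made-up values, only to show the shape of the record.
record = build_results(
    "HumanEval",
    [{"id": "HumanEval/0", "passed": True},
     {"id": "HumanEval/1", "passed": False}],
    token_usage=2048,
    execution_time_seconds=12.5,
)
print(json.dumps(record, indent=2))
```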
## Methodology
Loki Mode uses its multi-agent architecture to solve each problem:
1. **Architect Agent** analyzes the problem
2. **Engineer Agent** implements the solution
3. **QA Agent** validates with test cases
4. **Review Agent** checks code quality
This staged workflow is intended to mirror real-world software development more closely than a single-agent approach.
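The four stages above can be sketched as a sequential pipeline. This is a hypothetical illustration, not Loki Mode's implementation: `WorkItem`, the stage functions, and `run_pipeline` are invented names, and each stage is a stub where a real agent would call a model.

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """State handed from one stage to the next."""
    problem: str
    plan: str = ""
    solution: str = ""
    tests_passed: bool = False
    review_notes: list[str] = field(default_factory=list)


def architect(item: WorkItem) -> WorkItem:
    # Stub: analyze the problem and record a plan.
    item.plan = f"plan for {item.problem}"
    return item


def engineer(item: WorkItem) -> WorkItem:
    # Stub: turn the plan into a solution.
    item.solution = f"solution based on {item.plan}"
    return item


def qa(item: WorkItem) -> WorkItem:
    # Stub: a real QA stage would run the benchmark's test cases.
    item.tests_passed = bool(item.solution)
    return item


def review(item: WorkItem) -> WorkItem:
    # Stub: record a code-quality note.
    item.review_notes.append("no issues found (stub)")
    return item


# Stages run in the order given in the methodology list.
PIPELINE = [architect, engineer, qa, review]


def run_pipeline(problem: str) -> WorkItem:
    item = WorkItem(problem=problem)
    for stage in PIPELINE:
        item = stage(item)
    return item


result = run_pipeline("HumanEval/0")
print(result.tests_passed, result.review_notes)
```

Keeping the stages in a plain list makes the order explicit and lets a single stage be swapped out without touching the others.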


@@ -0,0 +1,15 @@
{
"benchmark": "HumanEval",
"version": "1.0",
"timestamp": "2026-01-05T00:24:04.904083",
"total_problems": 164,
"status": "INFRASTRUCTURE_READY",
"note": "Benchmark infrastructure created. Run with --execute to run actual tests.",
"sample_problems": [
"HumanEval/0",
"HumanEval/1",
"HumanEval/2",
"HumanEval/3",
"HumanEval/4"
]
}


@@ -0,0 +1,10 @@
{
"benchmark": "SWE-bench Lite",
"version": "1.0",
"timestamp": "2026-01-05T00:24:04.950779",
"total_problems": 300,
"status": "INFRASTRUCTURE_READY",
"note": "Benchmark infrastructure created. Install swebench package for full evaluation.",
"install": "pip install swebench",
"evaluation": "python -m swebench.harness.run_evaluation --predictions predictions.json"
}