Supergoal

An agent operating system for coding runs: turn a rough request into a structured run.json, evidence-backed phases, one ready-to-paste /goal, mechanical gates, and a report an engineer can inspect.

Current: v1.0.0
Runs on: Claude Code + Codex
Status: Run kernel ready
$ /supergoal redesign billing settings

Stage 0  memory + tools loaded
Stage 2  repo recon completed
Stage 4  sliced into 5 phases
Stage 5  compiled run.json
Stage 6  kernel validated

PREFLIGHT_GREEN

/goal "Execute .supergoal/billing-settings-H7x3 until AUDIT_COMPLETE, RUN_REPORT_WRITTEN, and SUPERGOAL_RUN_COMPLETE."
Phase gate: Evidence, commands, scope drift
Telemetry: Events, command logs, proof files
Report: Trust debt and run history
What it is

A small runtime for the work agents usually only promise.

Supergoal is not a framework, server, or UI library. It is a skill package that gives Claude Code and Codex a disciplined run kernel for non-trivial software work: a manifest, scoped phases, command evidence, failure events, mechanical gates, and a final audit.

The key move is simple: /goal carries the short end-state, while the real contract lives on disk. The executor keeps returning to run.json, phase specs, evidence files, and the protocol instead of trusting chat memory.

Run Kernel

The source of truth is no longer a transcript.

v1 gives every run a small operating system: manifest, event stream, evidence vault, gates, audit, and report.

01

run.json

Canonical state for phases, command ids, allowed paths, criteria classes, deliverables, and status.

02

events.jsonl

Append-only black box recorder for starts, gates, failures, audits, and report generation.
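
To make "append-only" concrete, here is a minimal sketch of how an event could be written to and read back from events.jsonl. It is not sg.py's actual code: the function names `append_event` and `read_events` and the `ts` field are assumptions made for illustration. Only the `type`, `phase`, `status`, and `data` keys appear in this page's own examples.

```python
import json
import time
from pathlib import Path


def append_event(log_path: Path, event_type: str, **fields) -> dict:
    """Append one event as a single JSON line; earlier lines are never rewritten."""
    # "ts" is a hypothetical timestamp field, not confirmed by the source.
    event = {"type": event_type, "ts": time.time(), **fields}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
    return event


def read_events(log_path: Path) -> list:
    """Replay the log in order, skipping blank lines."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

One JSON object per line keeps the log easy to tail, diff, and replay without parsing the whole file as a single document.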

03

evidence/

Command logs, diffs, screenshots, and proof files the phase gate can inspect.

04

sg.py

Standard-library kernel for validation, event recording, phase gates, audit, resume, and reports.

Presentation Mode

The v1 run behaves like an operating system.

Each layer removes one class of agent failure: vague plans, invisible work, unbounded edits, weak recovery, and unverifiable completion.

01

Manifest before motion

The planner writes run.json before any execution. Phases, dependencies, allowed paths, command ids, deliverables, and trust debt become a contract, not a suggestion.

{
  "schema_version": "1.0",
  "phase": {
    "id": 1,
    "allowed_paths": ["src/auth/"],
    "commands": ["test"]
  }
}
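
A rough sketch of what validating that fragment might look like. It checks only the keys visible in the example above; the real run.json schema is larger, and the helper name `validate_manifest` and the error strings are hypothetical, not sg.py's API.

```python
# Keys taken from the manifest fragment shown on this page.
REQUIRED_PHASE_KEYS = {"id", "allowed_paths", "commands"}


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if manifest.get("schema_version") != "1.0":
        errors.append("unsupported schema_version")
    phase = manifest.get("phase")
    if not isinstance(phase, dict):
        errors.append("phase must be an object")
        return errors
    for key in sorted(REQUIRED_PHASE_KEYS - phase.keys()):
        errors.append(f"phase missing key: {key}")
    paths = phase.get("allowed_paths", [])
    if not isinstance(paths, list) or not all(isinstance(p, str) and p for p in paths):
        errors.append("allowed_paths must be a list of non-empty strings")
    return errors


if __name__ == "__main__":
    sample = {
        "schema_version": "1.0",
        "phase": {"id": 1, "allowed_paths": ["src/auth/"], "commands": ["test"]},
    }
    print(validate_manifest(sample) or "manifest ok")
```

Returning a list of problems rather than raising on the first one lets a preflight step report every gap in a single pass.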
02

Evidence becomes filesystem state

Command output, diff summaries, screenshots, and audit notes live in evidence/phase-N/. The transcript can summarize proof; the run keeps the proof.

evidence/
`-- phase-3/
    |-- commands/test.log
    |-- diffs/summary.txt
    `-- screenshots/mobile.png
03

Gates make success expensive to fake

sg.py gate-phase checks required evidence, command exit markers, changed files, and trust debt before SUPERGOAL_PHASE_DONE.

PHASE_GATE_VERIFY pass
TRUST_DEBT phase 3: 1/8 trust-prior (12%)
SCOPE_DRIFT: none
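
The evidence and exit-marker part of such a gate could be sketched as follows. This is an illustration under stated assumptions, not the gate-phase implementation: the function `gate_phase` is hypothetical, and the log format (a final line reading `exit 0`) is a guess, since the page says only that command logs carry an explicit exit 0.

```python
from pathlib import Path


def gate_phase(evidence_dir: Path, required: list, command_ids: list) -> list:
    """Return the gaps that should block a phase; an empty list means pass."""
    gaps = []
    # Every required proof file must exist on disk.
    for rel in required:
        if not (evidence_dir / rel).is_file():
            gaps.append(f"missing evidence: {rel}")
    # Every declared command must have a log ending in the assumed marker.
    for cid in command_ids:
        log = evidence_dir / "commands" / f"{cid}.log"
        if not log.is_file():
            gaps.append(f"missing command log: {cid}")
            continue
        lines = log.read_text(encoding="utf-8").splitlines()
        if not lines or lines[-1].strip() != "exit 0":
            gaps.append(f"no exit 0 marker: {cid}")
    return gaps
```

The point of the design is that the gate reads files, not the transcript: a claimed success with no log on disk still fails.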
04

Recovery becomes replayable

Failures write events. Resume does not ask the next agent to infer where the run died; it prints the exact next phase, gap, or blocked reason.

{"type":"failure.probe","phase":2,"status":"fail"}
{"type":"audit.fail","data":{"gaps":["missing deliverable"]}}
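
A minimal sketch of how a resume step might turn those events into a next action. It looks only at the most recent event and handles just the two event shapes shown above; the function name `resume_hint` and the wording of its messages are assumptions, and the real resume logic may weigh more history than this.

```python
import json


def resume_hint(lines) -> str:
    """Derive a next step from the last recorded event in a JSONL stream."""
    last = None
    for line in lines:
        if line.strip():
            last = json.loads(line)
    if last is None:
        return "no events recorded"
    if last.get("type") == "audit.fail":
        gaps = last.get("data", {}).get("gaps", [])
        return "audit gaps: " + ", ".join(gaps)
    if last.get("status") == "fail":
        return f"retry phase {last.get('phase')} ({last.get('type')})"
    return "no failure on record"


if __name__ == "__main__":
    sample = [
        '{"type":"failure.probe","phase":2,"status":"fail"}',
        '{"type":"audit.fail","data":{"gaps":["missing deliverable"]}}',
    ]
    print(resume_hint(sample))
```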
05

The report is the product surface

report.html turns a long autonomous session into a review artifact: phase status, event history, evidence counts, and the boundary between mechanical proof and human judgment.

AUDIT_COMPLETE
RUN_REPORT_WRITTEN .supergoal/run/report.html
SUPERGOAL_RUN_COMPLETE
Workflow

From vague task to audited completion.

Supergoal makes the planning explicit before execution starts, then keeps execution bound to a structured run contract.

  1. Load context

    Detect memory, tools, repo state, active runs, and whether the work is greenfield or brownfield.

  2. Recon before promises

    Scan the codebase and environment so the plan reflects the project instead of guessing from the prompt.

  3. Compile the run

    Create run.json, markdown mirrors, phase specs, and the evidence vault.

  4. Paste once

    Supergoal prints one /goal line. Slash commands remain user-triggered, so the handoff is honest.

  5. Gate with proof

    Every phase must pass evidence, command, scope, and trust-debt checks before done.

Protocol

Built for the failure modes that usually derail agents.

The protocol is intentionally blunt: phases must be measurable, evidence must exist on disk, and the final audit checks the working tree rather than trusting the conversation.

Evidence gate

Required files and command logs must exist before a phase can print SUPERGOAL_PHASE_DONE.

Scope firewall

Changed files are checked against each phase's allowed_paths; drift prints SCOPE_DRIFT.
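
The scope check can be sketched as a prefix test on path components. This is an illustrative approximation, not the shipped check: `scope_drift` is a hypothetical helper, and it assumes paths are repo-relative, use forward slashes, and contain no `..` segments.

```python
from pathlib import PurePosixPath


def scope_drift(changed_files, allowed_paths) -> list:
    """Return the changed files that fall outside every allowed path."""
    roots = [PurePosixPath(root) for root in allowed_paths]

    def is_allowed(path: str) -> bool:
        p = PurePosixPath(path)
        # Compare whole path components so "src/auth" does not match "src/authz".
        return any(p == root or root in p.parents for root in roots)

    return sorted(f for f in changed_files if not is_allowed(f))


if __name__ == "__main__":
    drift = scope_drift(
        ["src/auth/login.py", "src/billing/plan.py"],
        ["src/auth/"],
    )
    print("SCOPE_DRIFT:", ", ".join(drift) if drift else "none")
```

Matching on components rather than raw string prefixes avoids treating a sibling directory with a similar name as in scope.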

Trust debt

Criteria are labeled mechanical, human, or trust-prior; weak proof is visible.
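
As a sketch, the trust-debt figure is the share of a phase's criteria labeled trust-prior. The helper `trust_debt_line` below is hypothetical, and flooring the percentage is an assumption; it happens to reproduce the "1/8 trust-prior (12%)" sample shown earlier on this page.

```python
from collections import Counter

# The three criteria classes named on this page.
VALID_CLASSES = {"mechanical", "human", "trust-prior"}


def trust_debt_line(phase_id: int, criteria_classes: list) -> str:
    """Format a TRUST_DEBT line from the class label of each criterion."""
    unknown = set(criteria_classes) - VALID_CLASSES
    if unknown:
        raise ValueError(f"unknown criteria class: {sorted(unknown)}")
    counts = Counter(criteria_classes)
    total = len(criteria_classes)
    weak = counts["trust-prior"]
    # Integer floor division; the rounding rule is an assumption.
    pct = (100 * weak) // total if total else 0
    return f"TRUST_DEBT phase {phase_id}: {weak}/{total} trust-prior ({pct}%)"


if __name__ == "__main__":
    classes = ["mechanical"] * 6 + ["human", "trust-prior"]
    print(trust_debt_line(3, classes))
```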

Final report

After audit, Supergoal writes report.html with phases, events, evidence counts, and trust debt.

Why it matters

Less babysitting, fewer fake finishes.

Ordinary agent planning

  • Plan lives only in chat or loose markdown.
  • User must keep prompting each phase.
  • Failure recovery depends on ad hoc follow-up.
  • "Done" can mean tests passed, not that deliverables shipped.

With Supergoal

  • run.json, state, protocol, evidence, and phase specs live on disk.
  • One pasted /goal drives the whole run.
  • Phase gates stop missing evidence and out-of-scope edits.
  • AUDIT_COMPLETE and RUN_REPORT_WRITTEN must appear before completion.
Report Surface

Every run leaves an artifact you can inspect.

The report is generated under the run root, not hosted externally. It turns a long agent session into a reviewable state summary.

Supergoal v1 run report COMPLETE

Billing Settings Redesign

5 phases
19 events
2/17 trust-prior
41 evidence files

Phase            Status    Gate  Evidence
Foundation       complete  pass  9 files
States & edges   complete  pass  12 files
Polish & Harden  complete  pass  8 files

Mechanically verified

  • Manifest validity and dependency graph.
  • Required evidence files.
  • Command logs with explicit exit 0.
  • Changed files inside allowed_paths.
  • Deliverables present in the working tree.

Human judgment

  • Subjective UI taste and copy quality.
  • Screenshot interpretation.
  • Whether scope was broad enough.
  • Whether the plan aimed at the right product outcome.
  • Anything labeled trust-prior.
Install

GitHub Pages ready, command-line friendly.

This site is static. The repo can publish it from site/ with GitHub Actions and no frontend build step.

Claude Code marketplace install
/plugin marketplace add https://github.com/robzilla1738/supergoal.git
/plugin install supergoal@supergoal
/reload-plugins
Repository anatomy

Small package, load-bearing files.

Supergoal ships the skill and its runtime assets. Tests, docs, and this website stay in the repository.

supergoal/
├── skills/supergoal/
│   ├── SKILL.md
│   ├── scripts/
│   │   ├── claim-run.sh
│   │   ├── sg.py
│   │   └── repo-state.sh
│   ├── templates/
│   │   ├── ROADMAP.md
│   │   ├── STATE.md
│   │   └── PROTOCOL.md
│   └── references/
├── tests/
│   ├── sg-run-kernel.test.sh
│   ├── claim-run.test.sh
│   └── repo-state.test.sh
├── site/
└── .github/workflows/pages.yml

SKILL.md

The main instruction surface that defines stages, intake, recon, plan review, and handoff behavior.

sg.py

Validates manifests, records events, gates phases, audits deliverables, resumes runs, and writes reports.

repo-state.sh

Checks deliverables against the complete working tree, including untracked files.

PROTOCOL.md

Defines the autonomous phase loop, evidence vault, gate commands, recovery blocks, audit, and report markers.

Transcript markers

The run says exactly where it is.

SUPERGOAL_PHASE_START
SUPERGOAL_PHASE_VERIFY
PHASE_GATE_VERIFY
SCOPE_DRIFT
TRUST_DEBT
MEMORY_SAVED
AUDIT_COMPLETE
RUN_REPORT_WRITTEN
SUPERGOAL_RUN_COMPLETE