Supergoal

An agent operating system for coding runs: turn a rough request into a structured run.json, evidence-backed phases, one ready-to-paste /goal, mechanical gates, and a report an engineer can inspect.


Current: v1.0.0
Runs on: Claude Code + Codex
Status: Run kernel ready

$ /supergoal redesign billing settings

Stage 0  memory + tools loaded
Stage 2  repo recon completed
Stage 4  sliced into 5 phases
Stage 5  compiled run.json
Stage 6  kernel validated

PREFLIGHT_GREEN

/goal "Execute .supergoal/billing-settings-H7x3 until AUDIT_COMPLETE, RUN_REPORT_WRITTEN, and SUPERGOAL_RUN_COMPLETE."

Phase gate: evidence, commands, scope drift

Telemetry: events, command logs, proof files

Report: trust debt and run history

What it is

A small runtime for the work agents usually only promise.

Supergoal is not a framework, server, or UI library. It is a skill package that gives Claude Code and Codex a disciplined run kernel for non-trivial software work: a manifest, scoped phases, command evidence, failure events, mechanical gates, and a final audit.

The key move is simple: /goal carries the short end-state, while the real contract lives on disk. The executor keeps returning to run.json, phase specs, evidence files, and the protocol instead of trusting chat memory.

Run Kernel

The source of truth is no longer a transcript.

v1 gives every run a small operating system: manifest, event stream, evidence vault, gates, audit, and report.

`run.json`

Canonical state for phases, command ids, allowed paths, criteria classes, deliverables, and status.

`events.jsonl`

Append-only black-box recorder for starts, gates, failures, audits, and report generation.

`evidence/`

Command logs, diffs, screenshots, and proof files the phase gate can inspect.

`sg.py`

Python kernel, standard library only, for validation, event recording, phase gates, audit, resume, and reports.
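The event stream can be pictured as a JSON-lines file that is only ever appended to. The sketch below assumes the field names `ts`, `type`, `phase`, and `data`, modeled on the sample events further down this page. `record_event` and `read_events` are hypothetical helpers, not sg.py's actual interface.

```python
# Hypothetical sketch of append-only event recording for events.jsonl.
# Field names ("ts", "type", "phase", "data") are assumptions modeled on the
# sample events on this page, not sg.py's confirmed schema.
import json
import os
import time
from pathlib import Path
from typing import Optional


def record_event(events_path: Path, event_type: str,
                 phase: Optional[int] = None,
                 data: Optional[dict] = None) -> dict:
    """Append one JSON object per line; earlier lines are never rewritten."""
    event = {"ts": time.time(), "type": event_type}
    if phase is not None:
        event["phase"] = phase
    if data:
        event["data"] = data
    with open(events_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # push the line to disk before the agent moves on
    return event


def read_events(events_path: Path) -> list:
    """Replay the stream in write order; a missing file means no events yet."""
    if not events_path.exists():
        return []
    with open(events_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

The file is opened in append mode, so recording an event never rewrites earlier history. That property is what makes a later replay trustworthy.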

Presentation Mode

The v1 run behaves like an operating system.

Each layer removes one class of agent failure: vague plans, invisible work, unbounded edits, weak recovery, and unverifiable completion.

Manifest before motion

The planner writes run.json before any execution. Phases, dependencies, allowed paths, command ids, deliverables, and trust debt become a contract, not a suggestion.

{
  "schema_version": "1.0",
  "phase": {
    "id": 1,
    "allowed_paths": ["src/auth/"],
    "commands": ["test"]
  }
}
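Because the manifest exists before any execution, it can be checked mechanically. This is a minimal sketch that assumes a `phases` list whose entries carry `depends_on`, plus a top-level `commands` map from command id to shell command. Both are assumptions; the excerpt above shows only a single phase. `validate_manifest` is a hypothetical name, and the dependency check uses Kahn's topological sort.

```python
# Hypothetical sketch of manifest validation before execution starts.
# Assumed shape (not confirmed by sg.py): a top-level "commands" map and a
# "phases" list whose entries have "id", "depends_on", "allowed_paths", "commands".
from collections import deque


def validate_manifest(manifest: dict) -> list:
    """Return human-readable errors; an empty list means the manifest is valid."""
    errors = []
    if manifest.get("schema_version") != "1.0":
        errors.append("unsupported schema_version")
    commands = manifest.get("commands", {})
    phases = manifest.get("phases", [])
    ids = [p["id"] for p in phases]
    if len(ids) != len(set(ids)):
        errors.append("duplicate phase id")
    known = set(ids)
    for p in phases:
        if not p.get("allowed_paths"):
            errors.append(f"phase {p['id']}: empty allowed_paths")
        for cmd in p.get("commands", []):
            if cmd not in commands:
                errors.append(f"phase {p['id']}: undefined command id {cmd!r}")
        for dep in p.get("depends_on", []):
            if dep not in known:
                errors.append(f"phase {p['id']}: unknown dependency {dep}")

    # Kahn's algorithm: phases that never reach in-degree 0 sit on a cycle.
    indegree = {pid: 0 for pid in known}
    children = {pid: [] for pid in known}
    for p in phases:
        for dep in p.get("depends_on", []):
            if dep in known:
                indegree[p["id"]] += 1
                children[dep].append(p["id"])
    queue = deque(pid for pid, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        pid = queue.popleft()
        visited += 1
        for child in children[pid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if visited != len(known):
        errors.append("dependency cycle between phases")
    return errors
```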

Evidence becomes filesystem state

Command output, diff summaries, screenshots, and audit notes live in evidence/phase-N/. The transcript can summarize proof; the run keeps the proof.

evidence/
`-- phase-3/
    |-- commands/test.log
    |-- diffs/summary.txt
    `-- screenshots/mobile.png
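Since proof is just files, evidence counts like the ones in the run report can come from walking the vault. This sketch assumes the `evidence/phase-N/` layout shown above. `evidence_counts` is a hypothetical helper.

```python
# Hypothetical sketch: count evidence files per phase, assuming the
# evidence/phase-N/ layout shown above. Files outside phase-N folders are ignored.
import re
from pathlib import Path

PHASE_DIR = re.compile(r"phase-(\d+)")


def evidence_counts(evidence_root: Path) -> dict:
    """Map phase number -> number of files anywhere under evidence/phase-N/."""
    counts = {}
    if not evidence_root.is_dir():
        return counts
    for child in sorted(evidence_root.iterdir()):
        match = PHASE_DIR.fullmatch(child.name)
        if child.is_dir() and match:
            counts[int(match.group(1))] = sum(1 for f in child.rglob("*") if f.is_file())
    return counts
```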

Gates make success expensive to fake

sg.py gate-phase checks required evidence, command exit markers, changed files, and trust debt before SUPERGOAL_PHASE_DONE.

PHASE_GATE_VERIFY pass
TRUST_DEBT phase 3: 1/8 trust-prior (12%)
SCOPE_DRIFT: none
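The evidence and exit-code half of a gate like this can be sketched in a few lines. This assumes each command log ends with a marker line such as `EXIT 0`. That is an invented convention standing in for whatever marker sg.py actually checks. `gate_phase` and `gate_line` are hypothetical, and the scope and trust-debt checks are left out.

```python
# Hypothetical sketch of the evidence and exit-code checks in a phase gate.
# Assumption: each command log records a marker line like "EXIT 0";
# the real marker format used by sg.py may differ.
import re
from pathlib import Path
from typing import Optional

EXIT_MARKER = re.compile(r"^EXIT (-?\d+)[ \t]*$", re.MULTILINE)


def last_exit_code(log_text: str) -> Optional[int]:
    """Return the final explicit exit code in a log, or None if none is recorded."""
    codes = EXIT_MARKER.findall(log_text)
    return int(codes[-1]) if codes else None


def gate_phase(phase_dir: Path, required: list, command_ids: list) -> list:
    """Collect every gap instead of stopping at the first one."""
    gaps = []
    for rel in required:
        if not (phase_dir / rel).is_file():
            gaps.append(f"missing evidence: {rel}")
    for cmd in command_ids:
        log = phase_dir / "commands" / f"{cmd}.log"
        if not log.is_file():
            gaps.append(f"missing command log: {cmd}")
            continue
        code = last_exit_code(log.read_text(encoding="utf-8"))
        if code != 0:
            # A missing marker (None) also fails: silence is not proof.
            gaps.append(f"command {cmd}: no explicit exit 0 (last marker: {code})")
    return gaps


def gate_line(gaps: list) -> str:
    if not gaps:
        return "PHASE_GATE_VERIFY pass"
    return "PHASE_GATE_VERIFY fail: " + "; ".join(gaps)
```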

Recovery becomes replayable

Failures write events. Resume does not ask the next agent to infer where the run died; it prints the exact next phase, gap, or blocked reason.

{"type":"failure.probe","phase":2,"status":"fail"}
{"type":"audit.fail","data":{"gaps":["missing deliverable"]}}
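Resume can be framed as a replay of the event stream. In this sketch, `failure.probe` and `audit.fail` come from the sample above. `phase.done`, `audit.pass`, `report.written`, and the three-probe retry budget are assumptions. `resume` is a hypothetical function.

```python
# Hypothetical sketch of resume-by-replay. Event types other than
# "failure.probe" and "audit.fail" are assumptions, as is max_probes=3.
from collections import Counter


def resume(events: list, phase_ids: list, max_probes: int = 3) -> str:
    """Replay events in order and name the exact next action."""
    done = {e.get("phase") for e in events if e["type"] == "phase.done"}
    failed_probes = Counter(
        e.get("phase") for e in events
        if e["type"] == "failure.probe" and e.get("status") == "fail"
    )
    for pid in phase_ids:  # phases in manifest order
        if pid in done:
            continue
        if failed_probes[pid] >= max_probes:
            return f"BLOCKED phase {pid}: {failed_probes[pid]} failed probes"
        return f"NEXT phase {pid}"
    audits = [e for e in events if e["type"] in ("audit.pass", "audit.fail")]
    if not audits:
        return "NEXT audit"
    if audits[-1]["type"] == "audit.fail":
        gaps = audits[-1].get("data", {}).get("gaps", [])
        return "FIX " + "; ".join(gaps)
    if not any(e["type"] == "report.written" for e in events):
        return "NEXT report"
    return "COMPLETE"
```

Only the latest audit event counts, so an `audit.fail` followed by an `audit.pass` resumes at the report step rather than at the old gap.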

The report is the product surface

report.html turns a long autonomous session into a review artifact: phase status, event history, evidence counts, and the boundary between mechanical proof and human judgment.

AUDIT_COMPLETE
RUN_REPORT_WRITTEN .supergoal/run/report.html
SUPERGOAL_RUN_COMPLETE
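A static report of this kind needs nothing beyond the standard library. The sketch below uses hypothetical `render_report` and `write_report` helpers and an assumed row shape. It escapes every value with `html.escape` before writing `report.html` under the run root.

```python
# Hypothetical sketch of writing a static report.html under the run root.
# Row fields and markup are assumptions; only the idea of a local,
# inspectable HTML file comes from the page.
import html
from pathlib import Path


def render_report(title: str, phases: list, event_count: int) -> str:
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td><td>{} files</td></tr>".format(
            html.escape(p["name"]), html.escape(p["status"]),
            html.escape(p["gate"]), int(p["evidence"]))
        for p in phases
    )
    safe_title = html.escape(title)
    return (
        "<!doctype html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{safe_title}</title></head><body>\n"
        f"<h1>{safe_title}</h1>\n"
        f"<p>{len(phases)} phases, {event_count} events</p>\n"
        "<table>\n<tr><th>Phase</th><th>Status</th><th>Gate</th><th>Evidence</th></tr>\n"
        f"{rows}\n</table>\n</body></html>\n"
    )


def write_report(run_root: Path, page: str) -> Path:
    out = run_root / "report.html"
    out.write_text(page, encoding="utf-8")
    return out
```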

Workflow

From vague task to audited completion.

Supergoal makes the planning explicit before execution starts, then keeps execution bound to a structured run contract.

Plan

/supergoal: Intake

Memory, tools, repo state, risks.

run kernel: Compiled contract

run.json, events, evidence vault.

Execute

/goal: Paste once

Executor resumes from disk, not chat memory.

phase loop: Gate every phase

Commands, scope, evidence, trust debt.

Recover

recovery: Failures become state

Retry, fix spec, or blocked handoff.

audit: Report the proof

report.html exposes what passed and what still needs judgment.

01. Load context

Detect memory, tools, repo state, active runs, and whether the work is greenfield or brownfield.

02. Recon before promises

Scan the codebase and environment so the plan reflects the project instead of guessing from the prompt.

03. Compile the run

Create run.json, markdown mirrors, phase specs, and the evidence vault.

04. Paste once

Supergoal prints one /goal line for you to paste. Slash commands can only be triggered by the user, so Supergoal hands off explicitly instead of pretending to start the run itself.

05. Gate with proof

Every phase must pass evidence, command, scope, and trust-debt checks before it can be marked done.
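The paste-once handoff amounts to formatting one string. The hypothetical `goal_line` helper below assumes the run directory and end markers are passed in. With the defaults it reproduces the /goal line from the example at the top of the page.

```python
# Hypothetical helper that formats the single /goal line printed at handoff.
# Marker names come from this page; the function itself is illustrative.
REQUIRED_END_MARKERS = ("AUDIT_COMPLETE", "RUN_REPORT_WRITTEN", "SUPERGOAL_RUN_COMPLETE")


def _join(markers: list) -> str:
    """English list join: "A", "A and B", "A, B, and C"."""
    if not markers:
        raise ValueError("at least one end marker is required")
    if len(markers) == 1:
        return markers[0]
    if len(markers) == 2:
        return f"{markers[0]} and {markers[1]}"
    return ", ".join(markers[:-1]) + ", and " + markers[-1]


def goal_line(run_dir: str, markers=REQUIRED_END_MARKERS) -> str:
    # The whole instruction sits inside double quotes, so reject paths containing one.
    if '"' in run_dir:
        raise ValueError("run_dir must not contain a double quote")
    return f'/goal "Execute {run_dir} until {_join(list(markers))}."'
```

The helper only prints the line and does not invoke it. That matches the constraint that slash commands stay user-triggered.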

Protocol

Built for the failure modes that usually derail agents.

The protocol is intentionally blunt: phases must be measurable, evidence must exist on disk, and the final audit checks the working tree rather than trusting the conversation.

Evidence gate

Required files and command logs must exist before a phase can print SUPERGOAL_PHASE_DONE.

Scope firewall

Changed files are checked against each phase's allowed_paths; drift prints SCOPE_DRIFT.

Trust debt

Criteria are labeled mechanical, human, or trust-prior, so weak proof stays visible.

Final report

After audit, Supergoal writes report.html with phases, events, evidence counts, and trust debt.
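The scope firewall and trust-debt checks are both small enough to sketch directly. Function names here are hypothetical. The output lines mimic the sample gate output earlier on the page, and the floor rounding (1/8 shown as 12%) is an assumption chosen to match it.

```python
# Hypothetical sketch of the scope firewall and trust-debt summary.
# Output formats mirror the sample gate lines on this page; names are illustrative.
from collections import Counter
from pathlib import PurePosixPath

CRITERIA_CLASSES = ("mechanical", "human", "trust-prior")


def in_scope(path: str, allowed_paths: list) -> bool:
    """Compare path components, so "src/auth/" allows src/auth/x.ts but not src/authz/x.ts."""
    target = PurePosixPath(path)
    for allowed in allowed_paths:
        root = PurePosixPath(allowed)
        if target == root or root in target.parents:
            return True
    return False


def scope_drift_line(changed: list, allowed_paths: list) -> str:
    drift = sorted(f for f in changed if not in_scope(f, allowed_paths))
    return "SCOPE_DRIFT: " + (", ".join(drift) if drift else "none")


def trust_debt_line(phase: int, criteria: list) -> str:
    unknown = set(criteria) - set(CRITERIA_CLASSES)
    if unknown:
        raise ValueError(f"unlabeled criteria classes: {sorted(unknown)}")
    prior = Counter(criteria)["trust-prior"]
    # Integer division floors the percentage: 1/8 -> 12%.
    pct = 100 * prior // len(criteria) if criteria else 0
    return f"TRUST_DEBT phase {phase}: {prior}/{len(criteria)} trust-prior ({pct}%)"
```

Matching on path components rather than string prefixes avoids the classic false pass where `src/auth` would also admit `src/authz`.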

Why it matters

Less babysitting, fewer fake finishes.

Ordinary agent planning

Plan lives only in chat or loose markdown.
User must keep prompting each phase.
Failure recovery depends on ad hoc follow-up.
"Done" can mean tests passed, not that deliverables shipped.

With Supergoal

run.json, state, protocol, evidence, and phase specs live on disk.
One pasted /goal drives the whole run.
Phase gates stop missing evidence and out-of-scope edits.
AUDIT_COMPLETE and RUN_REPORT_WRITTEN must appear before completion.

Report Surface

Every run leaves an artifact you can inspect.

The report is generated under the run root, not hosted externally. It turns a long agent session into a reviewable state summary.

Supergoal v1 run report: COMPLETE

Billing Settings Redesign

5 phases

19 events

2/17 trust-prior

41 evidence files

Phase	Status	Gate	Evidence
Foundation	complete	pass	9 files
States & edges	complete	pass	12 files
Polish & Harden	complete	pass	8 files

Mechanically verified

Manifest validity and dependency graph.
Required evidence files.
Command logs with explicit exit 0.
Changed files inside allowed_paths.
Deliverables present in the working tree.

Human judgment

Subjective UI taste and copy quality.
Screenshot interpretation.
Whether scope was broad enough.
Whether the plan aimed at the right product outcome.
Anything labeled trust-prior.

Example Gallery

Four run states worth showing, not hiding.

Supergoal should make clean success obvious, but it should make imperfect runs even more inspectable.

Success

All gates passed

Every phase has evidence, every command exited cleanly, the audit found no gaps, and the report was written.

AUDIT_COMPLETE

Audit fixed

Gap found, then repaired

Audit identified a missing deliverable, wrote a focused fix spec, reran, and completed cleanly.

audit.fail -> audit.pass

Blocked

Recovery exhausted

The run stops with probe history and exact next action rather than pretending the task is done.

FAILURE_HANDOFF

Scope drift

Out-of-scope edit caught

The phase tried to touch a file outside allowed_paths; the gate flagged it before completion.

SCOPE_DRIFT

Install

GitHub Pages ready, command-line friendly.

This site is static. The repo can publish it from site/ with GitHub Actions and no frontend build step.

Claude Code marketplace install

/plugin marketplace add https://github.com/robzilla1738/supergoal.git
/plugin install supergoal@supergoal
/reload-plugins

Codex manual skill install

mkdir -p ~/.codex/skills
git clone https://github.com/robzilla1738/supergoal /tmp/supergoal-clone
cp -R /tmp/supergoal-clone/skills/supergoal ~/.codex/skills/
rm -rf /tmp/supergoal-clone

Pages deployment included in this branch

git add site .github/workflows/pages.yml
git commit -m "Add GitHub Pages site"
git push origin main

# In GitHub: Settings -> Pages -> Source -> GitHub Actions

Repository anatomy

Small package, load-bearing files.

Supergoal ships the skill and its runtime assets. Tests, docs, and this website stay in the repository.

supergoal/
├── skills/supergoal/
│   ├── SKILL.md
│   ├── scripts/
│   │   ├── claim-run.sh
│   │   ├── sg.py
│   │   └── repo-state.sh
│   ├── templates/
│   │   ├── ROADMAP.md
│   │   ├── STATE.md
│   │   └── PROTOCOL.md
│   └── references/
├── tests/
│   ├── sg-run-kernel.test.sh
│   ├── claim-run.test.sh
│   └── repo-state.test.sh
├── site/
└── .github/workflows/pages.yml

`SKILL.md`

The main instruction surface that defines stages, intake, recon, plan review, and handoff behavior.

`sg.py`

Validates manifests, records events, gates phases, audits deliverables, resumes runs, and writes reports.

`repo-state.sh`

Checks deliverables against the complete working tree, including untracked files.

`PROTOCOL.md`

Defines the autonomous phase loop, evidence vault, gate commands, recovery blocks, audit, and report markers.
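repo-state.sh is a shell script whose logic is not shown here, so the following is a Python stand-in, not a port. Checking each deliverable on the filesystem naturally includes untracked files, which a plain `git ls-files` (tracked files only by default) would miss. It also counts ignored files, which a stricter check might exclude. `missing_deliverables` is a hypothetical name.

```python
# Hypothetical Python stand-in for a working-tree deliverables check.
# Not repo-state.sh's actual logic. Filesystem checks see untracked
# (and ignored) files, unlike a plain `git ls-files`.
from pathlib import Path


def missing_deliverables(repo_root: Path, deliverables: list) -> list:
    """Return deliverables that are absent or resolve outside the repository."""
    root = repo_root.resolve()
    missing = []
    for rel in deliverables:
        target = (root / rel).resolve()
        # Treat paths escaping the repo (e.g. "../elsewhere") as missing.
        inside = target == root or root in target.parents
        if not inside or not target.exists():
            missing.append(rel)
    return missing
```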

Transcript markers

The run says exactly where it is.

SUPERGOAL_PHASE_START SUPERGOAL_PHASE_VERIFY PHASE_GATE_VERIFY SCOPE_DRIFT TRUST_DEBT MEMORY_SAVED AUDIT_COMPLETE RUN_REPORT_WRITTEN SUPERGOAL_RUN_COMPLETE
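Fixed markers make a transcript machine-checkable. This sketch assumes a marker is the first token on its line, optionally followed by a colon. It also adds SUPERGOAL_PHASE_DONE from the gate description. The hypothetical `completion_is_backed` accepts SUPERGOAL_RUN_COMPLETE only when AUDIT_COMPLETE and RUN_REPORT_WRITTEN appear before it, as the comparison section requires.

```python
# Hypothetical transcript check. Marker names come from this page; the
# parsing rule (marker = first token on a line, optional trailing colon) is assumed.
MARKERS = {
    "SUPERGOAL_PHASE_START", "SUPERGOAL_PHASE_VERIFY", "SUPERGOAL_PHASE_DONE",
    "PHASE_GATE_VERIFY", "SCOPE_DRIFT", "TRUST_DEBT", "MEMORY_SAVED",
    "AUDIT_COMPLETE", "RUN_REPORT_WRITTEN", "SUPERGOAL_RUN_COMPLETE",
}


def markers_in(transcript: str) -> list:
    """List known markers in the order they appear."""
    found = []
    for line in transcript.splitlines():
        parts = line.split(maxsplit=1)
        if parts and parts[0].rstrip(":") in MARKERS:
            found.append(parts[0].rstrip(":"))
    return found


def completion_is_backed(transcript: str) -> bool:
    """True only if audit and report markers precede the completion marker."""
    seen = markers_in(transcript)
    if "SUPERGOAL_RUN_COMPLETE" not in seen:
        return False
    before = seen[: seen.index("SUPERGOAL_RUN_COMPLETE")]
    return "AUDIT_COMPLETE" in before and "RUN_REPORT_WRITTEN" in before
```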
