Phases & Runs

Agent Smith tracks its own development through a structured workflow of phases and runs. This is the meta-workflow — how the project itself is planned, executed, and documented.

The .agentsmith/ Directory

Every project that Agent Smith works on gets an .agentsmith/ directory:

.agentsmith/
├── context.yaml          # Project description + state tracking
├── coding-principles.md  # Detected coding conventions
├── decisions.md          # Why the agent made each decision
├── code-map.yaml         # LLM-generated code map
├── phases/
│   ├── done/             # Completed phase documents
│   ├── active/           # Currently executing (max 1)
│   └── planned/          # Upcoming phases
└── runs/
    ├── r01-fix-login-bug/
    │   ├── plan.md       # Execution plan
    │   └── result.md     # Outcome with cost data
    └── r02-add-search/
        ├── plan.md
        └── result.md

Phases

A phase is a unit of planned work — a feature, refactor, or capability addition. Each phase has its own Markdown document describing the goal, motivation, approach, files to create/modify, and definition of done.

Phase Lifecycle

planned/ → active/ → done/

planned/ — documented but not started. Includes requirements, approach, and acceptance criteria.
active/ — currently being worked on. Only one phase can be active at a time.
done/ — completed. The document stays as historical reference.
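The lifecycle can be sketched as a file move between the three directories. This is a minimal illustration, not Agent Smith's own code: the function name `advance_phase` and the choice to raise an error when a second phase would become active are assumptions made for the example.

```python
from pathlib import Path

# Directory names taken from the lifecycle above, in order.
STAGES = ("planned", "active", "done")


def advance_phase(phases_dir: Path, filename: str) -> Path:
    """Move a phase document one stage forward (hypothetical helper).

    phases_dir is the .agentsmith/phases directory; filename is the
    phase document, for example "p23-multi-repo.md".
    """
    for current, following in zip(STAGES, STAGES[1:]):
        source = phases_dir / current / filename
        if not source.exists():
            continue
        target_dir = phases_dir / following
        target_dir.mkdir(parents=True, exist_ok=True)
        # Enforce the "only one active phase" rule before activating.
        if following == "active" and any(target_dir.glob("*.md")):
            raise RuntimeError("another phase is already active")
        target = target_dir / filename
        source.rename(target)
        return target
    raise FileNotFoundError(f"{filename} is not in planned/ or active/")
```

Calling `advance_phase` twice on the same document takes it from planned/ to active/ and then to done/. A document already in done/ is left alone and reported as not found.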

Phase Document Structure

# Phase 52: Single Executable Release

## Goal
What we're building and why.

## Motivation
The problem this solves.

## Approach
Technical details of the implementation.

## Files to Create
- list of new files

## Files to Modify
- list of existing files to change

## Definition of Done
- [ ] Checklist of acceptance criteria
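A phase document with this structure is easy to check mechanically. The sketch below is an illustration only: `check_phase_document` is a hypothetical helper, and treating all six sections as required is an assumption rather than a documented rule.

```python
import re

# Section headings from the template above (assumed to be mandatory).
REQUIRED_SECTIONS = (
    "Goal",
    "Motivation",
    "Approach",
    "Files to Create",
    "Files to Modify",
    "Definition of Done",
)


def check_phase_document(text: str) -> dict:
    """Report missing sections and checklist progress (hypothetical helper)."""
    headings = re.findall(r"^## (.+?)[ \t]*$", text, flags=re.MULTILINE)
    missing = [name for name in REQUIRED_SECTIONS if name not in headings]
    # Checklist items are matched anywhere in the document, not only
    # under "Definition of Done".
    open_items = re.findall(r"^- \[ \] (.+)$", text, flags=re.MULTILINE)
    done_items = re.findall(r"^- \[[xX]\] (.+)$", text, flags=re.MULTILINE)
    return {"missing": missing, "open": open_items, "done": done_items}
```

A phase could reasonably be treated as ready to move to done/ when `missing` and `open` are both empty, though that policy is also an assumption.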

Phase Tracking in context.yaml

The state section in context.yaml tracks all phases:

state:
  done:
    p01: "Initial pipeline: fetch ticket, checkout, plan, execute, commit"
    p02: "Retry and resilience: Polly policies, test retry loop"
    # ...
    p52: "Single executable release: binaries for 5 platforms, GitHub Releases"
  active: {}
  planned:
    p23: "Multi-repo support → .agentsmith/phases/planned/p23-multi-repo.md"
    p25: "PR review iteration → .agentsmith/phases/planned/p25-pr-review.md"

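Phases are numbered sequentially (p01, p02, ..., p52), and each entry maps the phase number to a one-line summary. A number does not imply completion order: in the example above, p23 and p25 are still planned while p52 is done. Entries under planned also link to the full phase document.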
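Once context.yaml has been loaded with a YAML parser, the state section is a plain mapping that can be sanity-checked. The following sketch works on the already-parsed dictionary. The helper names `validate_state` and `planned_document`, and the rule that a phase ID may appear in only one section, are assumptions for illustration.

```python
from typing import List, Optional


def validate_state(state: dict) -> List[str]:
    """Return a list of problems found in the state mapping (hypothetical)."""
    problems = []
    active = state.get("active") or {}
    if len(active) > 1:
        problems.append(f"{len(active)} phases are active; at most one is allowed")
    seen = {}
    for section in ("done", "active", "planned"):
        for phase_id in state.get(section) or {}:
            if phase_id in seen:
                problems.append(
                    f"{phase_id} appears in both {seen[phase_id]} and {section}"
                )
            seen[phase_id] = section
    return problems


def planned_document(description: str) -> Optional[str]:
    """Return the path after the arrow in a planned entry, or None."""
    _summary, arrow, path = description.partition("→")
    return path.strip() if arrow else None
```

For the p23 entry shown above, `planned_document` returns `.agentsmith/phases/planned/p23-multi-repo.md`.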

Runs

A run is a single execution of a pipeline against a ticket or task. Each run produces two artifacts in its own directory under runs/:

plan.md

The execution plan generated by the AI before writing code. Contains:

Analysis of the ticket and relevant code
Step-by-step implementation plan
Files to modify and why
Test strategy

result.md

The outcome of the run. It opens with YAML frontmatter holding machine-readable data, followed by Markdown sections:

---
ticket: "#57 — GET /todos returns 500 when database is empty"
project: todo-list
date: 2026-02-24
result: success
branch: fix/57
pr_url: https://github.com/org/repo/pull/42
duration_seconds: 50
cost:
  total_usd: 0.0682
  phases:
    scout:
      model: claude-haiku-4-5-20251001
      input_tokens: 12450
      output_tokens: 890
      turns: 3
      usd: 0.0062
    primary:
      model: claude-sonnet-4-20250514
      input_tokens: 45200
      output_tokens: 8900
      turns: 7
      usd: 0.0620
---

## Summary
What was done and why.

## Changes
Files modified with explanations.

## Decisions
Architectural choices made during execution.

## Test Results
Pass/fail status of the test suite.
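Because the frontmatter is machine-readable, a script can separate it from the Markdown body and cross-check the cost figures. The sketch below is one possible approach, not Agent Smith's own tooling: `split_frontmatter` and `cost_is_consistent` are hypothetical helpers, and the expectation that total_usd equals the sum of the per-phase usd values is an assumption (it does hold for the example above, where 0.0062 + 0.0620 = 0.0682). Parsing the YAML text itself is left to a YAML library.

```python
import math
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split result.md into (frontmatter YAML text, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("result.md does not start with a frontmatter fence")
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return "\n".join(lines[1:index]), "\n".join(lines[index + 1:])
    raise ValueError("frontmatter fence is never closed")


def cost_is_consistent(cost: dict, tolerance: float = 1e-6) -> bool:
    """Check that per-phase costs add up to total_usd (assumed invariant).

    cost is the already-parsed `cost` mapping from the frontmatter.
    """
    per_phase = sum(phase["usd"] for phase in cost["phases"].values())
    return math.isclose(per_phase, cost["total_usd"], abs_tol=tolerance)
```

The comparison uses a small tolerance rather than `==` because the dollar amounts are floating-point values.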

Run Numbering

Runs use a global counter (r01, r02, ...) tracked in context.yaml. Each run gets a directory named r{NN}-slug:

runs/
├── r01-fix-login-bug/
├── r02-add-search/
├── r03-security-scan-api/
└── r04-fix-null-reference/
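The r{NN}-slug pattern can be sketched as a small formatting function. The slug rule used here (lowercase, with runs of non-alphanumeric characters collapsed to a hyphen) is an assumption, as is the name `run_directory_name`. The counter value is expected to come from context.yaml.

```python
import re


def run_directory_name(counter: int, title: str) -> str:
    """Build a run directory name such as r01-fix-login-bug (hypothetical)."""
    # Lowercase the title and collapse anything that is not a-z or 0-9
    # into single hyphens, then trim hyphens from both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Zero-pad to two digits; larger counters simply use more digits.
    return f"r{counter:02d}-{slug}"
```

With this rule, `run_directory_name(1, "Fix login bug")` yields `r01-fix-login-bug`.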

The Complete Workflow

When Agent Smith processes a ticket:

Plan — generates plan.md with the approach
Execute — writes code, guided by the plan
Test — runs the test suite
Result — writes result.md with cost and decision data
PR — commits everything, opens PR, includes result.md

The PR reviewer sees not just the code changes but also:

What the agent planned to do
What it actually did (and why it deviated, if applicable)
How much it cost
What decisions it made and why

This transparency is the point. When the agent's code breaks six months later, you'll know what it was thinking.