Phases & Runs
Agent Smith tracks its own development through a structured workflow of phases and runs. This is the meta-workflow — how the project itself is planned, executed, and documented.
The .agentsmith/ Directory
Every project that Agent Smith works on gets an .agentsmith/ directory:
```text
.agentsmith/
├── context.yaml            # Project description + state tracking
├── coding-principles.md    # Detected coding conventions
├── decisions.md            # Why the agent made each decision
├── code-map.yaml           # LLM-generated code map
├── phases/
│   ├── done/               # Completed phase documents
│   ├── active/             # Currently executing (max 1)
│   └── planned/            # Upcoming phases
├── wiki/                   # LLM-compiled knowledge base
│   ├── index.md            # Master index with backlinks
│   ├── decisions.md        # Compiled decisions across runs
│   ├── known-issues.md     # Recurring problems and solutions
│   ├── patterns.md         # Detected code patterns
│   └── concepts/           # Domain-specific articles
├── security/               # SARIF snapshots for trend analysis
└── runs/
    ├── 2026-05-20T22-27-43-8a3f-fix-login-bug/
    │   ├── plan.md         # Execution plan
    │   └── result.md       # Outcome with cost data
    └── 2026-05-20T22-29-11-4c19-add-search-endpoint/
        ├── plan.md
        └── result.md
```
The wiki/ directory is maintained by the Project Knowledge Base, an LLM-compiled wiki that accumulates knowledge from all runs, decisions, and security scans.
Phases
A phase is a unit of planned work — a feature, refactor, or capability addition. Each phase has its own Markdown document describing the goal, motivation, approach, files to create/modify, and definition of done.
Phase Lifecycle
- planned/ — documented but not started. Includes requirements, approach, and acceptance criteria.
- active/ — currently being worked on. Only one phase can be active at a time.
- done/ — completed. The document stays as historical reference.
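The lifecycle amounts to moving a phase document between three folders. A minimal sketch, assuming phase documents are plain .md files under .agentsmith/phases/ and that a transition is nothing more than a file move; activate_phase and complete_phase are hypothetical helpers, not part of Agent Smith. The "only one active phase" rule becomes a guard that runs before the move:

```python
from pathlib import Path


def activate_phase(phases_dir: Path, filename: str) -> Path:
    """Move a phase document from planned/ to active/ (hypothetical helper)."""
    active = phases_dir / "active"
    active.mkdir(parents=True, exist_ok=True)
    # Enforce "only one phase can be active" before touching any file.
    if any(active.glob("*.md")):
        raise RuntimeError("another phase is already active")
    return (phases_dir / "planned" / filename).rename(active / filename)


def complete_phase(phases_dir: Path, filename: str) -> Path:
    """Move the active phase document to done/, where it stays as history."""
    done = phases_dir / "done"
    done.mkdir(parents=True, exist_ok=True)
    return (phases_dir / "active" / filename).rename(done / filename)
```

Usage would look like activate_phase(Path(".agentsmith/phases"), "p0023-multi-repo.md"), followed by complete_phase with the same arguments once the Definition of Done is met.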
Phase Document Structure
```markdown
# Phase 52: Single Executable Release

## Goal
What we're building and why.

## Motivation
The problem this solves.

## Approach
Technical details of the implementation.

## Files to Create
- list of new files

## Files to Modify
- list of existing files to change

## Definition of Done
- [ ] Checklist of acceptance criteria
```
Phase Tracking in context.yaml
The state section in context.yaml tracks all phases:
```yaml
state:
  done:
    p0001: "Initial pipeline: fetch ticket, checkout, plan, execute, commit"
    p0002: "Retry and resilience: Polly policies, test retry loop"
    # ...
    p0052: "Single executable release: binaries for 5 platforms, GitHub Releases"
  active: {}
  planned:
    p0023: "Multi-repo support → .agentsmith/phases/planned/p0023-multi-repo.md"
    p0025: "PR review iteration → .agentsmith/phases/planned/p0025-pr-review.md"
```
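Phases are numbered sequentially with a zero-padded four-digit ID (p0001, p0002, ..., p0052), and the description after the ID is a one-line summary. The number does not reflect completion order: as the example shows, a planned phase (p0023) can carry a lower number than phases that are already done. Planned entries also point to their full document.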
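Sequential numbering means the next free ID can be derived from the state section. A sketch, assuming the state has already been loaded into a dict and that a lettered ID such as p0043b (it appears in the timeline further down) shares its base number; next_phase_id is hypothetical, and Agent Smith may allocate numbers differently:

```python
import re

# p + four digits, optionally followed by a single letter (e.g. p0043b).
PHASE_ID = re.compile(r"^p(\d{4})([a-z]?)$")


def next_phase_id(state: dict) -> str:
    """Suggest the next sequential ID across done, active and planned phases."""
    numbers = []
    for bucket in ("done", "active", "planned"):
        # "active: {}" (or a missing/null bucket) simply contributes nothing.
        for key in state.get(bucket) or {}:
            match = PHASE_ID.match(key)
            if match:
                numbers.append(int(match.group(1)))
    return f"p{max(numbers, default=0) + 1:04d}"
```

With the state shown above, the highest number is 52, so the suggestion is p0053 even though p0023 and p0025 are still planned.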
Runs
A run is a single execution of a pipeline against a ticket or task. Each run produces artifacts:
plan.md
The execution plan generated by the AI before writing code. Contains:
- Analysis of the ticket and relevant code
- Step-by-step implementation plan
- Files to modify and why
- Test strategy
result.md
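The outcome of the run. It opens with YAML frontmatter holding machine-readable data. Note that cost.phases breaks the cost down by stage within the run (here scout and primary), not by project phase: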
```markdown
---
ticket: "#57: GET /todos returns 500 when database is empty"
project: todo-list
date: 2026-02-24
result: success
branch: fix/57
pr_url: https://github.com/org/repo/pull/42
duration_seconds: 50
cost:
  total_usd: 0.0682
  phases:
    scout:
      model: claude-haiku-4-5-20251001
      input_tokens: 12450
      output_tokens: 890
      turns: 3
      usd: 0.0062
    primary:
      model: claude-sonnet-4-20250514
      input_tokens: 45200
      output_tokens: 8900
      turns: 7
      usd: 0.0620
---

## Summary
What was done and why.

## Changes
Files modified with explanations.

## Decisions
Architectural choices made during execution.

## Test Results
Pass/fail status of the test suite.
```
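Since the frontmatter is machine-readable, the per-stage numbers can be rolled up and checked against the reported total. A sketch, assuming the frontmatter has already been loaded into a dict (for example with a YAML library); summarize_cost is a hypothetical helper. In the example, 0.0062 + 0.0620 = 0.0682 USD, matching total_usd, and the two stages consumed 57,650 input and 9,790 output tokens over 10 turns:

```python
import math

# Frontmatter of the example above, as a YAML loader would return it.
# Loading result.md itself is assumed to happen elsewhere.
FRONTMATTER = {
    "result": "success",
    "duration_seconds": 50,
    "cost": {
        "total_usd": 0.0682,
        "phases": {
            "scout": {"model": "claude-haiku-4-5-20251001", "input_tokens": 12450,
                      "output_tokens": 890, "turns": 3, "usd": 0.0062},
            "primary": {"model": "claude-sonnet-4-20250514", "input_tokens": 45200,
                        "output_tokens": 8900, "turns": 7, "usd": 0.0620},
        },
    },
}


def summarize_cost(fm: dict) -> dict:
    """Aggregate per-stage usage and check it against the reported total."""
    stages = list(fm["cost"]["phases"].values())
    summed = sum(stage["usd"] for stage in stages)
    return {
        "input_tokens": sum(stage["input_tokens"] for stage in stages),
        "output_tokens": sum(stage["output_tokens"] for stage in stages),
        "turns": sum(stage["turns"] for stage in stages),
        "usd": round(summed, 4),
        # Stored values are rounded to 4 decimals, so compare with a tolerance.
        "consistent": math.isclose(summed, fm["cost"]["total_usd"], abs_tol=1e-4),
    }


if __name__ == "__main__":
    print(summarize_cost(FRONTMATTER))
```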
Run Numbering
Run identifiers combine a UTC timestamp in ISO-8601 field order (with hyphens instead of colons in the time part), a 4-hex-digit collision suffix, and a slug: {yyyy-MM-ddTHH-mm-ss}-{4hex}-{slug}. The result is lexicographically sortable, safe as a directory name on every OS, readable in ls, and collision-resistant when several tickets are enqueued in the same second. The 16-bit suffix removes the race that hit the old r{NN} counter when two tickets were queued in the same second.
```text
runs/
├── 2026-05-20T22-27-43-8a3f-fix-login-bug/
├── 2026-05-20T22-29-11-4c19-add-search-endpoint/
├── 2026-05-20T23-04-02-b7d1-security-scan-api/
└── 2026-05-21T08-17-55-2e09-fix-null-reference/
```
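The identifier format can be produced with the standard library alone. A sketch, assuming the 4-hex suffix is random and the slug is a lowercased, hyphen-joined, length-capped form of the ticket title; new_run_id, slugify and the 40-character cap are hypothetical. Fixed-width fields ordered from most to least significant are what make lexicographic order match chronological order, and two runs started in the same second share a timestamp-plus-suffix key with probability 1/65,536:

```python
import re
import secrets
from datetime import datetime, timezone
from typing import Optional


def slugify(title: str, max_len: int = 40) -> str:
    """Lowercase ASCII slug; every run of other characters becomes one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "run"


def new_run_id(title: str, now: Optional[datetime] = None) -> str:
    """Build {yyyy-MM-ddTHH-mm-ss}-{4hex}-{slug} from the current UTC time."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Hyphens instead of colons keep the name valid on Windows file systems.
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    suffix = secrets.token_hex(2)  # 2 random bytes -> 4 hex digits -> 16 bits
    return f"{stamp}-{suffix}-{slugify(title)}"


if __name__ == "__main__":
    # Suffix is random, e.g. 2026-05-20T22-27-43-<4 hex>-fix-login-bug
    print(new_run_id("Fix login bug"))
```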
Each run also appends an entry under a top-level runs: key in context.yaml:
```yaml
runs:
  "2026-05-20T22-27-43-8a3f": "fix #54: null ref in UserService"
  "2026-05-20T22-29-11-4c19": "feat #61: add /search endpoint"
```
Wherever a run is displayed, its timestamp is rendered with seconds plus the suffix, for example Run 2026-05-20 22:27:43 UTC (8a3f), so the operator can map a header back to its directory at a glance.
Run directories created before p0156 with the old r{NN} format are ignored by wiki compilation; if you still have any, document them in CHANGELOG.md.
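Going the other way, a directory name can be parsed back into its parts to render the display header, derive the key used under runs: in context.yaml (timestamp plus suffix, no slug), and skip anything in the old r{NN} format. A sketch with hypothetical names (RunId, parse_run_dir); treating every non-matching name as legacy is an assumption:

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

RUN_DIR = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2})-(?P<suffix>[0-9a-f]{4})-(?P<slug>.+)$"
)


class RunId(NamedTuple):
    started: datetime
    suffix: str
    slug: str

    @property
    def key(self) -> str:
        """Key under runs: in context.yaml: timestamp plus suffix, no slug."""
        return f"{self.started:%Y-%m-%dT%H-%M-%S}-{self.suffix}"

    def display(self) -> str:
        """Human-readable header, e.g. for logs or PR descriptions."""
        return f"Run {self.started:%Y-%m-%d %H:%M:%S} UTC ({self.suffix})"


def parse_run_dir(name: str) -> Optional[RunId]:
    """Return None for names that don't match, such as legacy r{NN} directories."""
    match = RUN_DIR.match(name)
    if match is None:
        return None
    started = datetime.strptime(match["stamp"], "%Y-%m-%dT%H-%M-%S")
    return RunId(started.replace(tzinfo=timezone.utc), match["suffix"], match["slug"])
```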
Agent Smith Evolution
Agent Smith itself is built using this phase workflow. The timeline below shows how the project grew from core infrastructure to a full multi-agent orchestration system:
```mermaid
timeline
    title Agent Smith Evolution
    section Foundation
        p0001 : Core Infrastructure
        p0006 : Retry & Resilience
        p0011 : Multi-Provider (Claude, OpenAI, Ollama)
    section Skill System
        p0034 : Multi-Skill Architecture
        p0037 : Strategy Pattern
        p0038 : MAD Discussion Pipeline
    section Security
        p0043b : Security Pipeline
        p0054 : 91 Pattern Scanner
        p0055 : Findings Compression
        p0064 : Typed Skill Orchestration
```
Every one of these phases has a document in .agentsmith/phases/done/ that explains the why, the how, and the definition of done. See Self-Documentation for the full picture.
Pipeline Cost Reference
Typical numbers per pipeline. Costs depend on model choice, codebase size, and finding density.
| Pipeline | LLM Calls | Typical Cost | Tokens (in / out) | Output |
|---|---|---|---|---|
| security-scan | 9 | $0.35 | 52k / 13k | 16 confirmed findings |
| api-scan | ~12 | ~$0.45 | ~60k / 15k | SARIF + findings report |
| fix-bug | 8–12 | $0.07–0.40 | varies | PR with code change |
| legal-analysis | ~6 | ~$0.25 | ~35k / 10k | German legal analysis |
| mad-discussion | ~15 | ~$0.55 | ~80k / 20k | Discussion document + PR |
Real vs. estimated
The security-scan row reflects a verified run (2026-04-09, agent-smith repo, claude-sonnet-4). Other rows are estimates based on typical runs. See Cost Tracking for querying your own cost history.
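For a rough look at your own history without extra dependencies, total_usd can be pulled out of each run's result.md frontmatter. A sketch under stated assumptions: runs live in one directory (such as .agentsmith/runs/), total_usd is indented under cost: as in the result.md example, and a regex is good enough (use a real YAML parser for anything richer). cost_history is hypothetical; Cost Tracking remains the reference for querying cost history.

```python
import re
from pathlib import Path

# Indented "total_usd: <number>" line, as nested under "cost:" in result.md.
TOTAL_USD = re.compile(r"^[ \t]+total_usd:[ \t]*([0-9]+(?:\.[0-9]+)?)[ \t]*$", re.MULTILINE)


def split_frontmatter(text: str) -> str:
    """Return the YAML between the leading '---' fences, or '' if absent."""
    if not text.startswith("---"):
        return ""
    end = text.find("\n---", 3)
    return text[3:end] if end != -1 else ""


def cost_history(runs_dir: Path) -> list[tuple[str, float]]:
    """(run directory, total_usd) for each run with a result.md, oldest first."""
    history = []
    # Timestamp-prefixed names sort chronologically, so sorted() orders by time.
    for result in sorted(runs_dir.glob("*/result.md")):
        match = TOTAL_USD.search(split_frontmatter(result.read_text(encoding="utf-8")))
        if match:
            history.append((result.parent.name, float(match.group(1))))
    return history
```

Summing the second element of each tuple gives total spend; runs without a result.md yet (still in progress) are skipped.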
The Complete Workflow
When Agent Smith processes a ticket:
- Plan: generates plan.md with the approach
- Execute: writes code, guided by the plan
- Test: runs the test suite
- Result: writes result.md with cost and decision data
- PR: commits everything, opens a PR, includes result.md
The PR reviewer sees not just the code changes but also:
- What the agent planned to do
- What it actually did (and why it deviated, if applicable)
- How much it cost
- What decisions it made and why
This transparency is the point. When the agent's code breaks six months later, you'll know what it was thinking.
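Since plan.md and result.md travel with the PR, a PR description can be assembled straight from the run directory. A minimal sketch; pr_body and the section headings it emits are hypothetical and not Agent Smith's actual PR template:

```python
from pathlib import Path


def pr_body(run_dir: Path, summary: str) -> str:
    """Assemble a PR description that carries the run's plan and result."""
    parts = [summary.strip()]
    # Plan first, then result, so reviewers can compare intent with outcome.
    for name, heading in (("plan.md", "Plan"), ("result.md", "Result")):
        path = run_dir / name
        if path.exists():
            parts.append(f"## {heading}\n\n{path.read_text(encoding='utf-8').strip()}")
    return "\n\n".join(parts) + "\n"
```

Missing files are skipped rather than treated as errors, so the same helper works for a run that failed before writing result.md.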