Phases & Runs
Agent Smith tracks its own development through a structured workflow of phases and runs. This is the meta-workflow — how the project itself is planned, executed, and documented.
The .agentsmith/ Directory
Every project that Agent Smith works on gets an .agentsmith/ directory:
.agentsmith/
├── context.yaml # Project description + state tracking
├── coding-principles.md # Detected coding conventions
├── decisions.md # Why the agent made each decision
├── code-map.yaml # LLM-generated code map
├── phases/
│   ├── done/ # Completed phase documents
│   ├── active/ # Currently executing (max 1)
│   └── planned/ # Upcoming phases
├── wiki/ # LLM-compiled knowledge base
│   ├── index.md # Master index with backlinks
│   ├── decisions.md # Compiled decisions across runs
│   ├── known-issues.md # Recurring problems and solutions
│   ├── patterns.md # Detected code patterns
│   └── concepts/ # Domain-specific articles
├── security/ # SARIF snapshots for trend analysis
└── runs/
    ├── r01-fix-login-bug/
    │   ├── plan.md # Execution plan
    │   └── result.md # Outcome with cost data
    └── r02-add-search/
        ├── plan.md
        └── result.md
The wiki/ directory is maintained by the Project Knowledge Base, an LLM-compiled wiki that accumulates knowledge from all runs, decisions, and security scans.
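The layout can also be checked mechanically. The following is a minimal sketch and not part of Agent Smith: `check_layout`, `EXPECTED_FILES`, and `EXPECTED_DIRS` are hypothetical names, and treating every entry in the tree as required is an assumption made for illustration.

```python
from pathlib import Path

# Entries copied from the directory tree; assuming all of them are
# mandatory is a simplification for this sketch.
EXPECTED_FILES = ["context.yaml", "coding-principles.md", "decisions.md", "code-map.yaml"]
EXPECTED_DIRS = ["phases/done", "phases/active", "phases/planned", "wiki", "security", "runs"]


def check_layout(project_root: Path) -> list[str]:
    """Return a list of problems found in <project_root>/.agentsmith/."""
    base = project_root / ".agentsmith"
    problems = []
    for name in EXPECTED_FILES:
        if not (base / name).is_file():
            problems.append(f"missing file: {name}")
    for name in EXPECTED_DIRS:
        if not (base / name).is_dir():
            problems.append(f"missing directory: {name}")
    # The tree notes that active/ holds at most one phase document.
    active = base / "phases" / "active"
    if active.is_dir() and len(list(active.glob("*.md"))) > 1:
        problems.append("more than one active phase")
    return problems
```

An empty list means the directory matches the tree; anything else names the first things to repair.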
Phases
A phase is a unit of planned work — a feature, refactor, or capability addition. Each phase has its own Markdown document describing the goal, motivation, approach, files to create/modify, and definition of done.
Phase Lifecycle
- planned/ — documented but not started. Includes requirements, approach, and acceptance criteria.
- active/ — currently being worked on. Only one phase can be active at a time.
- done/ — completed. The document stays as historical reference.
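The lifecycle amounts to moving a phase document between three directories while keeping at most one document in active/. A minimal sketch under that reading follows; `advance_phase` is a hypothetical helper, not an Agent Smith command, and it assumes phase documents are `.md` files directly inside each stage directory.

```python
from pathlib import Path

STAGES = ("planned", "active", "done")


def advance_phase(phases_dir: Path, filename: str) -> Path:
    """Move a phase document one stage forward: planned -> active -> done."""
    for current, following in zip(STAGES, STAGES[1:]):
        source = phases_dir / current / filename
        if not source.exists():
            continue
        target_dir = phases_dir / following
        target_dir.mkdir(parents=True, exist_ok=True)
        # Enforce the "only one active phase" rule before activating.
        if following == "active" and any(target_dir.glob("*.md")):
            raise RuntimeError("another phase is already active")
        return source.rename(target_dir / filename)
    raise FileNotFoundError(f"{filename} is not in planned/ or active/")
```

Calling `advance_phase(phases_dir, "p0023-multi-repo.md")` twice would take that document from planned/ to active/ and then to done/, where it stays as historical reference.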
Phase Document Structure
# Phase 52: Single Executable Release
## Goal
What we're building and why.
## Motivation
The problem this solves.
## Approach
Technical details of the implementation.
## Files to Create
- list of new files
## Files to Modify
- list of existing files to change
## Definition of Done
- [ ] Checklist of acceptance criteria
Phase Tracking in context.yaml
The state section in context.yaml tracks all phases:
state:
  done:
    p0001: "Initial pipeline: fetch ticket, checkout, plan, execute, commit"
    p0002: "Retry and resilience: Polly policies, test retry loop"
    # ...
    p0052: "Single executable release: binaries for 5 platforms, GitHub Releases"
  active: {}
  planned:
    p0023: "Multi-repo support → .agentsmith/phases/planned/p0023-multi-repo.md"
    p0025: "PR review iteration → .agentsmith/phases/planned/p0025-pr-review.md"
Phases are numbered sequentially (p0001, p0002, ..., p0052). The description after the number is a one-line summary. Planned phases link to their full document.
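To illustrate how the state section can be consumed, here is a small sketch that works on the already-parsed mapping (YAML parsing is left out to keep it dependency-free). `next_phase_id` and `validate_state` are hypothetical helpers; the rules they check (one active phase, no ID in two buckets) are inferred from this page rather than taken from Agent Smith's code.

```python
import re

BUCKETS = ("done", "active", "planned")
PHASE_ID = re.compile(r"^p(\d{4})")  # also matches suffixed IDs such as p0043b

# Abbreviated stand-in for the parsed `state` section of context.yaml.
state = {
    "done": {"p0001": "Initial pipeline", "p0002": "Retry and resilience",
             "p0052": "Single executable release"},
    "active": {},
    "planned": {"p0023": "Multi-repo support", "p0025": "PR review iteration"},
}


def next_phase_id(state: dict) -> str:
    """Return the ID that follows the highest phase number in any bucket."""
    numbers = [
        int(match.group(1))
        for bucket in BUCKETS
        for key in (state.get(bucket) or {})
        if (match := PHASE_ID.match(key))
    ]
    return f"p{max(numbers, default=0) + 1:04d}"


def validate_state(state: dict) -> list[str]:
    """Check the single-active rule and that no ID appears in two buckets."""
    problems = []
    if len(state.get("active") or {}) > 1:
        problems.append("more than one active phase")
    seen = {}
    for bucket in BUCKETS:
        for key in (state.get(bucket) or {}):
            if key in seen:
                problems.append(f"{key} is listed in both {seen[key]} and {bucket}")
            seen[key] = bucket
    return problems
```

With the mapping above, `next_phase_id(state)` yields `p0053` and `validate_state(state)` returns an empty list.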
Runs
A run is a single execution of a pipeline against a ticket or task. Each run produces artifacts:
plan.md
The execution plan generated by the AI before writing code. Contains:
- Analysis of the ticket and relevant code
- Step-by-step implementation plan
- Files to modify and why
- Test strategy
result.md
The outcome of the run. Contains YAML frontmatter with machine-readable data:
---
ticket: "#57 — GET /todos returns 500 when database is empty"
project: todo-list
date: 2026-02-24
result: success
branch: fix/57
pr_url: https://github.com/org/repo/pull/42
duration_seconds: 50
cost:
  total_usd: 0.0682
  phases:
    scout:
      model: claude-haiku-4-5-20251001
      input_tokens: 12450
      output_tokens: 890
      turns: 3
      usd: 0.0062
    primary:
      model: claude-sonnet-4-20250514
      input_tokens: 45200
      output_tokens: 8900
      turns: 7
      usd: 0.0620
---
## Summary
What was done and why.
## Changes
Files modified with explanations.
## Decisions
Architectural choices made during execution.
## Test Results
Pass/fail status of the test suite.
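Because the frontmatter is machine-readable, a consumer can separate it from the Markdown body and sanity-check the numbers. The sketch below is an illustration only: `split_frontmatter` and `cost_is_consistent` are hypothetical helpers, and the `cost` mapping stands in for the frontmatter after it has been parsed by a YAML library of your choice.

```python
import math


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a result.md file into (frontmatter, body) at the '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("result.md must start with '---'")
    end = lines.index("---", 1)  # raises ValueError if the closing line is missing
    return "\n".join(lines[1:end]), "\n".join(lines[end + 1:])


def cost_is_consistent(cost: dict) -> bool:
    """Check that the per-phase costs add up to total_usd."""
    phase_sum = sum(phase["usd"] for phase in cost["phases"].values())
    return math.isclose(phase_sum, cost["total_usd"], abs_tol=1e-6)


# Values taken from the example frontmatter: 0.0062 + 0.0620 = 0.0682.
cost = {
    "total_usd": 0.0682,
    "phases": {
        "scout": {"input_tokens": 12450, "output_tokens": 890, "usd": 0.0062},
        "primary": {"input_tokens": 45200, "output_tokens": 8900, "usd": 0.0620},
    },
}
```

The tolerance in `cost_is_consistent` absorbs floating-point rounding; a stricter check could use `decimal.Decimal` instead.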
Run Numbering
Runs use a global counter (r01, r02, ...) tracked in context.yaml. Each run gets a directory named r{NN}-slug:
runs/
├── r01-fix-login-bug/
├── r02-add-search-endpoint/
├── r03-security-scan-api/
└── r04-fix-null-reference/
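The r{NN}-slug pattern is easy to reproduce. This sketch assumes the slug is a lowercased, hyphen-separated form of the task title, which this page does not specify; `slugify` and `run_dir_name` are hypothetical names.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def run_dir_name(counter: int, title: str) -> str:
    """Build a run directory name from the global counter and a task title."""
    # Two digits minimum, matching r01, r02, ...; larger counters simply widen.
    return f"r{counter:02d}-{slugify(title)}"
```

For example, `run_dir_name(1, "Fix login bug")` gives `r01-fix-login-bug`, matching the first directory in the listing.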
Agent Smith Evolution
Agent Smith itself is built using this phase workflow. The timeline below shows how the project grew from core infrastructure to a full multi-agent orchestration system:
timeline
    title Agent Smith Evolution
    section Foundation
        p0001 : Core Infrastructure
        p0006 : Retry & Resilience
        p0011 : Multi-Provider (Claude, OpenAI, Ollama)
    section Skill System
        p0034 : Multi-Skill Architecture
        p0037 : Strategy Pattern
        p0038 : MAD Discussion Pipeline
    section Security
        p0043b : Security Pipeline
        p0054 : 91 Pattern Scanner
        p0055 : Findings Compression
        p0064 : Typed Skill Orchestration
Every one of these phases has a document in .agentsmith/phases/done/ that explains the why, the how, and the definition of done. See Self-Documentation for the full picture.
Pipeline Cost Reference
Per-run cost figures for each pipeline: one row is measured and the rest are estimates, as the note below the table explains. Costs depend on model choice, codebase size, and finding density.
| Pipeline | LLM Calls | Avg. Cost | Tokens (in/out) | Output |
|---|---|---|---|---|
| security-scan | 9 | $0.35 | 52k / 13k | 16 confirmed findings |
| api-scan | ~12 | ~$0.45 | ~60k / 15k | SARIF + findings report |
| fix-bug | 8-12 | $0.07-$0.40 | varies | PR with code change |
| legal-analysis | ~6 | ~$0.25 | ~35k / 10k | German legal analysis |
| mad-discussion | ~15 | ~$0.55 | ~80k / 20k | Discussion document + PR |
Real vs. estimated
The security-scan row reflects a verified run (2026-04-09, agent-smith repo, claude-sonnet-4). Other rows are estimates based on typical runs. See Cost Tracking for querying your own cost history.
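Since cost varies with finding density and codebase size, it can help to normalize a row before comparing runs. The following sketch applies two simple ratios to the verified security-scan row; both helper names are hypothetical, and the per-token figure is a blended rate that ignores the price difference between input and output tokens.

```python
def cost_per_finding(total_usd: float, findings: int) -> float:
    """Average spend per confirmed finding."""
    return total_usd / findings


def usd_per_1k_tokens(total_usd: float, tokens_in: int, tokens_out: int) -> float:
    """Blended cost per 1,000 tokens, input and output combined."""
    return total_usd / ((tokens_in + tokens_out) / 1000)


# security-scan row: $0.35, 52k in / 13k out, 16 confirmed findings
per_finding = cost_per_finding(0.35, 16)           # about $0.022 per finding
blended = usd_per_1k_tokens(0.35, 52_000, 13_000)  # about $0.0054 per 1k tokens
```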
The Complete Workflow
When Agent Smith processes a ticket:
- Plan: generates plan.md with the approach
- Execute: writes code, guided by the plan
- Test: runs the test suite
- Result: writes result.md with cost and decision data
- PR: commits everything, opens PR, includes result.md
The PR reviewer sees not just the code changes but also:
- What the agent planned to do
- What it actually did (and why it deviated, if applicable)
- How much it cost
- What decisions it made and why
This transparency is the point. When the agent's code breaks six months later, you'll know what it was thinking.