
Phases & Runs

Agent Smith tracks its own development through a structured workflow of phases and runs. This is the meta-workflow — how the project itself is planned, executed, and documented.

The .agentsmith/ Directory

Every project that Agent Smith works on gets an .agentsmith/ directory:

.agentsmith/
├── context.yaml          # Project description + state tracking
├── coding-principles.md  # Detected coding conventions
├── decisions.md          # Why the agent made each decision
├── code-map.yaml         # LLM-generated code map
├── phases/
│   ├── done/             # Completed phase documents
│   ├── active/           # Currently executing (max 1)
│   └── planned/          # Upcoming phases
├── wiki/                 # LLM-compiled knowledge base
│   ├── index.md          # Master index with backlinks
│   ├── decisions.md      # Compiled decisions across runs
│   ├── known-issues.md   # Recurring problems and solutions
│   ├── patterns.md       # Detected code patterns
│   └── concepts/         # Domain-specific articles
├── security/             # SARIF snapshots for trend analysis
└── runs/
    ├── r01-fix-login-bug/
    │   ├── plan.md       # Execution plan
    │   └── result.md     # Outcome with cost data
    └── r02-add-search/
        ├── plan.md
        └── result.md

The wiki/ directory is maintained by the Project Knowledge Base, an LLM-compiled wiki that accumulates knowledge from all runs, decisions, and security scans.
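To make the layout concrete, here is a minimal Python sketch that creates an empty skeleton of this tree. The helper name `scaffold_agentsmith` and the idea of pre-creating empty files are assumptions for illustration. Agent Smith fills these files itself, and its own initialization code may look nothing like this.

```python
import tempfile
from pathlib import Path

# Hypothetical helper: builds an empty .agentsmith/ tree matching the layout
# above. Agent Smith's own initialization logic may differ.
DIRS = [
    "phases/done",
    "phases/active",
    "phases/planned",
    "wiki/concepts",
    "security",
    "runs",
]
FILES = [
    "context.yaml",
    "coding-principles.md",
    "decisions.md",
    "code-map.yaml",
    "wiki/index.md",
    "wiki/decisions.md",
    "wiki/known-issues.md",
    "wiki/patterns.md",
]


def scaffold_agentsmith(project_root: Path) -> Path:
    """Create the skeleton under project_root and return the .agentsmith path."""
    root = project_root / ".agentsmith"
    for d in DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        # touch() creates missing files and leaves existing content untouched,
        # so running the helper twice is safe.
        (root / f).touch(exist_ok=True)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = scaffold_agentsmith(Path(tmp))
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root).as_posix())
```

Because every call uses `exist_ok=True`, the helper can be re-run on a project that already has some of these files without clobbering them.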

Phases

A phase is a unit of planned work — a feature, refactor, or capability addition. Each phase has its own Markdown document describing the goal, motivation, approach, files to create/modify, and definition of done.

Phase Lifecycle

planned/ → active/ → done/
  • planned/ — documented but not started. Includes requirements, approach, and acceptance criteria.
  • active/ — currently being worked on. Only one phase can be active at a time.
  • done/ — completed. The document stays as historical reference.
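The lifecycle rules (three directories, at most one active phase) can be expressed as two file moves with a guard. This is a minimal sketch, assuming each phase is a single Markdown file under .agentsmith/phases/; `start_phase`, `finish_phase`, and `PhaseError` are hypothetical names, not part of Agent Smith.

```python
import tempfile
from pathlib import Path

# Hypothetical sketch of the planned/ -> active/ -> done/ lifecycle. Assumes
# each phase is a single Markdown file moved between the three directories.


class PhaseError(RuntimeError):
    """Raised when a transition would break the lifecycle rules."""


def start_phase(phases_dir: Path, filename: str) -> Path:
    """Move a planned phase to active/; at most one phase may be active."""
    active_dir = phases_dir / "active"
    if any(active_dir.glob("*.md")):
        raise PhaseError("another phase is already active")
    src = phases_dir / "planned" / filename
    if not src.exists():
        raise PhaseError(f"{filename} is not in planned/")
    return src.rename(active_dir / filename)


def finish_phase(phases_dir: Path, filename: str) -> Path:
    """Move the active phase to done/, where it stays as a historical record."""
    src = phases_dir / "active" / filename
    if not src.exists():
        raise PhaseError(f"{filename} is not active")
    return src.rename(phases_dir / "done" / filename)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        phases = Path(tmp)
        for name in ("planned", "active", "done"):
            (phases / name).mkdir()
        (phases / "planned" / "p0023-multi-repo.md").write_text("# Phase 23\n")
        print(start_phase(phases, "p0023-multi-repo.md"))
        print(finish_phase(phases, "p0023-multi-repo.md"))
```

Checking active/ before the move is what enforces the "only one active phase" rule; finishing a phase frees the slot for the next one.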

Phase Document Structure

# Phase 52: Single Executable Release

## Goal
What we're building and why.

## Motivation
The problem this solves.

## Approach
Technical details of the implementation.

## Files to Create
- list of new files

## Files to Modify
- list of existing files to change

## Definition of Done
- [ ] Checklist of acceptance criteria

Phase Tracking in context.yaml

The state section in context.yaml tracks all phases:

state:
  done:
    p0001: "Initial pipeline: fetch ticket, checkout, plan, execute, commit"
    p0002: "Retry and resilience: Polly policies, test retry loop"
    # ...
    p0052: "Single executable release: binaries for 5 platforms, GitHub Releases"
  active: {}
  planned:
    p0023: "Multi-repo support  .agentsmith/phases/planned/p0023-multi-repo.md"
    p0025: "PR review iteration  .agentsmith/phases/planned/p0025-pr-review.md"

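Phases are numbered sequentially (p0001, p0002, ...). The value after each ID is a one-line summary, and planned entries also point to their full phase document. Numbers do not reflect completion order: in this example p0023 and p0025 are still planned while p0052 is already done.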
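Here is a sketch of how the next phase ID could be derived from the state section. It assumes context.yaml has already been parsed into a dict and that a new phase simply takes the highest existing number plus one. Both the helper names and that numbering rule are assumptions; the document only states that IDs are sequential.

```python
import re

# Hypothetical helper: derives the next phase ID from the state section of
# context.yaml, given here as an already-parsed dict. Assumes a new phase
# takes the highest existing number + 1; letter suffixes such as p0043b are
# ignored when comparing numbers.
PHASE_ID = re.compile(r"^p(\d{4})([a-z]?)$")


def phase_number(phase_id: str) -> int:
    match = PHASE_ID.match(phase_id)
    if not match:
        raise ValueError(f"not a phase id: {phase_id!r}")
    return int(match.group(1))


def next_phase_id(state: dict) -> str:
    # A YAML key with no value (e.g. "active:") parses as None, hence `or {}`.
    ids = [
        pid
        for bucket in ("done", "active", "planned")
        for pid in (state.get(bucket) or {})
    ]
    highest = max((phase_number(pid) for pid in ids), default=0)
    return f"p{highest + 1:04d}"


state = {
    "done": {"p0001": "Initial pipeline", "p0052": "Single executable release"},
    "active": {},
    "planned": {"p0023": "Multi-repo support", "p0025": "PR review iteration"},
}
print(next_phase_id(state))  # p0053
```

The scan covers all three buckets, so a low-numbered planned phase such as p0023 never causes an ID to be reused.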

Runs

A run is a single execution of a pipeline against a ticket or task. Each run produces two artifacts:

plan.md

The execution plan generated by the AI before writing code. Contains:

  • Analysis of the ticket and relevant code
  • Step-by-step implementation plan
  • Files to modify and why
  • Test strategy

result.md

The outcome of the run. Contains YAML frontmatter with machine-readable data:

---
ticket: "#57  GET /todos returns 500 when database is empty"
project: todo-list
date: 2026-02-24
result: success
branch: fix/57
pr_url: https://github.com/org/repo/pull/42
duration_seconds: 50
cost:
  total_usd: 0.0682
  phases:
    scout:
      model: claude-haiku-4-5-20251001
      input_tokens: 12450
      output_tokens: 890
      turns: 3
      usd: 0.0062
    primary:
      model: claude-sonnet-4-20250514
      input_tokens: 45200
      output_tokens: 8900
      turns: 7
      usd: 0.0620
---

## Summary
What was done and why.

## Changes
Files modified with explanations.

## Decisions
Architectural choices made during execution.

## Test Results
Pass/fail status of the test suite.
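The frontmatter makes result.md easy to process with scripts. Below is a minimal sketch with hypothetical helper names. `split_frontmatter` separates the YAML block from the Markdown body, and `cost_is_consistent` checks that `cost.total_usd` matches the sum of the per-phase `usd` values. Turning the YAML text into a dict needs a YAML parser, such as PyYAML's `yaml.safe_load`. To stay dependency-free, the example hard-codes the parsed cost from the sample above.

```python
import math

# Hypothetical helpers for result.md. Parsing the YAML text itself needs a
# YAML library (for example PyYAML's yaml.safe_load); the cost dict below is
# the sample frontmatter from above, already parsed.


def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (frontmatter, body); frontmatter is '' if the file has none."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    raise ValueError("frontmatter has no closing '---'")


def cost_is_consistent(cost: dict, tol: float = 2e-4) -> bool:
    """Check total_usd against the per-phase sum.

    The sample values have 4 decimal places, so a small slack for rounding
    is allowed.
    """
    per_phase = sum(phase["usd"] for phase in cost["phases"].values())
    return math.isclose(per_phase, cost["total_usd"], abs_tol=tol)


cost = {
    "total_usd": 0.0682,
    "phases": {
        "scout": {"model": "claude-haiku-4-5-20251001", "usd": 0.0062},
        "primary": {"model": "claude-sonnet-4-20250514", "usd": 0.0620},
    },
}
print(cost_is_consistent(cost))  # True
```

For the sample run, 0.0062 (scout) + 0.0620 (primary) = 0.0682, which matches `total_usd`.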

Run Numbering

Runs use a global counter (r01, r02, ...) tracked in context.yaml. Each run gets a directory named r{NN}-slug:

runs/
├── r01-fix-login-bug/
├── r02-add-search-endpoint/
├── r03-security-scan-api/
└── r04-fix-null-reference/
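The naming convention can be sketched as a small formatting function. The counter would be read from context.yaml. The slug rules used here are assumptions, because the document does not specify how slugs are derived: lowercase ASCII letters and digits, hyphen-separated, capped at 40 characters.

```python
import re

# Hypothetical helper for the r{NN}-slug convention. The counter would come
# from context.yaml; the slug rules below are assumptions for illustration.


def slugify(title: str, max_len: int = 40) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', cap the length."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def run_dir_name(counter: int, title: str) -> str:
    # :02d pads to two digits (r01..r99) and widens automatically after r99.
    return f"r{counter:02d}-{slugify(title)}"


print(run_dir_name(1, "Fix login bug"))  # r01-fix-login-bug
print(run_dir_name(4, "Fix null reference!"))  # r04-fix-null-reference
```

Because the counter is global, run directories sort chronologically even when two runs share a slug.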

Agent Smith Evolution

Agent Smith itself is built using this phase workflow. The timeline below shows how the project grew from core infrastructure to a full multi-agent orchestration system:

timeline
    title Agent Smith Evolution
    section Foundation
        p0001 : Core Infrastructure
        p0006 : Retry & Resilience
        p0011 : Multi-Provider (Claude, OpenAI, Ollama)
    section Skill System
        p0034 : Multi-Skill Architecture
        p0037 : Strategy Pattern
        p0038 : MAD Discussion Pipeline
    section Security
        p0043b : Security Pipeline
        p0054  : 91 Pattern Scanner
        p0055  : Findings Compression
        p0064  : Typed Skill Orchestration

Every one of these phases has a document in .agentsmith/phases/done/ that explains the why, the how, and the definition of done. See Self-Documentation for the full picture.

Pipeline Cost Reference

Typical figures for the main pipelines. Costs depend on model choice, codebase size, and finding density.

| Pipeline | LLM calls | Avg. cost | Tokens (in/out) | Output |
|---|---|---|---|---|
| security-scan | 9 | $0.35 | 52k / 13k | 16 confirmed findings |
| api-scan | ~12 | ~$0.45 | ~60k / 15k | SARIF + findings report |
| fix-bug | 8-12 | $0.07-0.40 | varies | PR with code change |
| legal-analysis | ~6 | ~$0.25 | ~35k / 10k | German legal analysis |
| mad-discussion | ~15 | ~$0.55 | ~80k / 20k | Discussion document + PR |

Real vs. estimated

The security-scan row reflects a verified run (2026-04-09, agent-smith repo, claude-sonnet-4). Other rows are estimates based on typical runs. See Cost Tracking for querying your own cost history.
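For a quick local total outside the Cost Tracking feature, you can sum the `cost.total_usd` field in each result.md directly. This is a rough sketch that assumes the frontmatter layout shown in the result.md example. `total_spend` is a hypothetical helper, and a line-based regex is no substitute for a real YAML parser.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical sketch (not the Cost Tracking feature): sums cost.total_usd
# across runs/*/result.md. The regex only understands the frontmatter layout
# shown in the result.md example; a real tool should use a YAML parser.
TOTAL_USD = re.compile(r"^[ \t]*total_usd:[ \t]*([0-9.]+)[ \t]*$", re.MULTILINE)


def total_spend(runs_dir: Path) -> float:
    total = 0.0
    for result in sorted(runs_dir.glob("*/result.md")):
        text = result.read_text(encoding="utf-8")
        if not text.startswith("---"):
            continue  # no frontmatter, nothing to count
        frontmatter = text.split("---", 2)[1]
        match = TOTAL_USD.search(frontmatter)
        if match:
            total += float(match.group(1))
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run = Path(tmp) / "r01-fix-login-bug"
        run.mkdir()
        (run / "result.md").write_text(
            "---\nresult: success\ncost:\n  total_usd: 0.0682\n---\n", encoding="utf-8"
        )
        print(f"${total_spend(Path(tmp)):.4f}")
```

Runs without a result.md, such as one that is still in progress, are skipped by the glob.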

The Complete Workflow

When Agent Smith processes a ticket:

  1. Plan — generates plan.md with the approach
  2. Execute — writes code, guided by the plan
  3. Test — runs the test suite
  4. Result — writes result.md with cost and decision data
  5. PR — commits everything, opens PR, includes result.md
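The five steps can be sketched as an ordered list of step functions. This is a simplified illustration, not Agent Smith's pipeline code. Every step is a stub, and the names (`Run`, `process`, `open_pr`, and so on) are hypothetical. The sketch shows the ordering constraint the workflow implies: result.md is written before the PR is opened, so it ends up in the commit next to plan.md.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of the ticket workflow. Step names follow the list
# above; each body is a stub that records what the real step would produce.


@dataclass
class Run:
    ticket: str
    artifacts: dict[str, str] = field(default_factory=dict)
    tests_passed: bool = False
    committed: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


def plan(run: Run) -> None:
    run.artifacts["plan.md"] = f"# Plan for {run.ticket}\n"


def execute(run: Run) -> None:
    # Code is written against the plan, so the plan must exist first.
    if "plan.md" not in run.artifacts:
        raise RuntimeError("execute called before plan")


def run_tests(run: Run) -> None:
    run.tests_passed = True  # stub: a real step would invoke the test suite


def write_result(run: Run) -> None:
    # "failed" is an assumed value; the document only shows "success".
    outcome = "success" if run.tests_passed else "failed"
    run.artifacts["result.md"] = f"---\nticket: \"{run.ticket}\"\nresult: {outcome}\n---\n"


def open_pr(run: Run) -> None:
    # plan.md and result.md are committed with the code change, so the
    # reviewer sees them in the PR.
    run.committed = sorted(run.artifacts)


STEPS: list[tuple[str, Callable[[Run], None]]] = [
    ("plan", plan),
    ("execute", execute),
    ("test", run_tests),
    ("result", write_result),
    ("pr", open_pr),
]


def process(ticket: str) -> Run:
    run = Run(ticket)
    for name, step in STEPS:
        step(run)
        run.log.append(name)
    return run
```

Keeping the steps in a single ordered list makes the dependency explicit: moving "pr" ahead of "result" would leave result.md out of the commit.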

The PR reviewer sees not just the code changes but also:

  • What the agent planned to do
  • What it actually did (and why it deviated, if applicable)
  • How much it cost
  • What decisions it made and why

This transparency is the point. When the agent's code breaks six months later, you'll know what it was thinking.