Connect your AI provider¶

Agent Smith calls the AI provider directly from your infrastructure — no SaaS in between, no proxy. Pick the one you have an API key for (or run Ollama for fully local).

Every provider config goes under agents: in agentsmith.yml and gets a catalog key you reference from projects.X.agent. You can have more than one agent registered (a Claude one and a local Ollama one for cost-sensitive runs, for example) and pick per project.

Model roles¶

Agent Smith uses different models at different points in a run. Each agent block has a models: map with one entry per role:

Role	What it does	Model size that makes sense
`scout`	Walks the codebase, picks relevant files. High call volume.	Cheap. `gpt-4.1-mini`, `claude-haiku`, `gemini-flash`.
`primary`	Writes the code, reviews, makes the actual changes.	The good one. `gpt-4.1`, `claude-sonnet`, `gemini-pro`.
`planning`	Generates the plan from the ticket + scout output.	Usually same as `primary`.
`summarization`	Context compaction when conversation gets long.	Cheap. `-mini` / `-haiku`.
`code_map_generation`	Builds a navigable code map (optional).	Cheap.
`context_generation`	Generates the per-repo `.agentsmith/context.yaml` during `init-project`.	Cheap.

Roles you don't configure fall back to primary. If primary isn't set either, the agent block is invalid.

Anthropic Claude¶

Direct Anthropic API. First-class support with prompt caching on by default. Subscription OAuth tokens (sk-ant-oat01-…) work as the key too — handy if your Claude usage is subscription-funded rather than API-billed; note the conservative default rate limits below.

agents:
  default-claude:
    type: claude
    cache: { is_enabled: true, strategy: automatic }   # default — leave on
    retry: { max_retries: 5, initial_delay_ms: 2000, backoff_multiplier: 2.0 }
    compaction: { is_enabled: true, threshold_iterations: 8, keep_recent_iterations: 3 }
    models:
      scout:   { model: claude-haiku-4-5-20251001 }
      primary: { model: claude-sonnet-4-6 }
      planning:      { model: claude-sonnet-4-6 }
      summarization: { model: claude-haiku-4-5-20251001 }

secrets:
  claude_api_key: ${ANTHROPIC_API_KEY}

Prompt caching reuses the system prompt + tool definitions + coding principles between iterations of the same run. Typical cost saving on a fix-bug run is 40-60%. The cached share is recorded per LLM call and visible in the dashboard's cost breakdown — a caching-enabled run showing 0% cached means the cache is dead, treat that as an alarm. See Cost tracking.

Rate limits and timeouts (all providers)¶

Two knobs on the agent block you'll want to know exist before they bite:

agents:
  default-claude:
    # ...
    rate_limit:
      requests_per_minute: 50
      input_tokens_per_minute: 80000
    network_timeout_seconds: 300   # default; per-call HTTP timeout

The framework rate-limits itself per (provider, model) with token buckets on requests and estimated input tokens, so a parallel scan doesn't slam into the provider's 429s. When rate_limit is unset, a conservative default is picked based on the agent type — subscription OAuth tokens get a much tighter budget (think 5 requests / 20k tokens per minute) than API keys, so if runs feel throttled on a subscription token, set rate_limit to your actual tier. agent-smith doctor calls this out.

OpenAI¶

Direct OpenAI API. Supports the reasoning models (o3, o4, etc.) — set them under primary if you want them.

agents:
  default-openai:
    type: openai
    retry: { max_retries: 5, initial_delay_ms: 2000, backoff_multiplier: 2.0 }
    models:
      scout:   { model: gpt-4.1-mini, max_tokens: 4096 }
      primary: { model: gpt-4.1,      max_tokens: 8192 }
      planning:      { model: gpt-4.1,      max_tokens: 4096 }
      summarization: { model: gpt-4.1-mini, max_tokens: 2048 }

secrets:
  openai_api_key: ${OPENAI_API_KEY}

Azure OpenAI¶

OpenAI through your Azure subscription. Same models, different routing. Per-model deployment names are required (those map to your Azure deployment slots).

agents:
  azure-openai-default:
    type: azure_openai
    endpoint: https://oai-acme-dev.openai.azure.com
    api_version: 2025-01-01-preview
    cache: { is_enabled: true, strategy: automatic }
    models:
      scout:   { model: gpt-4.1-mini, deployment: gpt-4o-mini-deployment, max_tokens: 4096 }
      primary: { model: gpt-4.1,      deployment: gpt4-1-deployment,     max_tokens: 8192 }
      planning:      { model: gpt-4.1,      deployment: gpt4-1-deployment,     max_tokens: 4096 }
      summarization: { model: gpt-4.1-mini, deployment: gpt-4o-mini-deployment, max_tokens: 2048 }
    pricing:
      models:
        gpt-4.1:      { input_per_million: 2.0,  output_per_million: 8.0,  cache_read_per_million: 0.50 }
        gpt-4.1-mini: { input_per_million: 0.40, output_per_million: 1.60, cache_read_per_million: 0.10 }

secrets:
  azure_openai_api_key: ${AZURE_OPENAI_API_KEY}

The pricing block is optional but recommended — it lets Agent Smith report dollar cost per run. Without it, only token counts are tracked.

Google Gemini¶

Direct Gemini API via Google AI Studio or Vertex AI.

agents:
  default-gemini:
    type: gemini
    models:
      scout:   { model: gemini-2.5-flash }
      primary: { model: gemini-2.5-pro }
      planning:      { model: gemini-2.5-pro }
      summarization: { model: gemini-2.5-flash }

secrets:
  gemini_api_key: ${GEMINI_API_KEY}

Ollama (local)¶

Local models running on your machine. No API key, no internet egress, no cloud cost.

agents:
  local-ollama:
    type: ollama
    base_url: http://localhost:11434
    models:
      scout:   { model: llama3.3:8b }
      primary: { model: llama3.3:70b }
      planning:      { model: llama3.3:70b }
      summarization: { model: llama3.3:8b }

No secrets needed — Ollama is unauthenticated by default. Bring up the Ollama daemon (ollama serve) and pull the models you reference (ollama pull llama3.3:70b). For Docker / k8s hosts, point base_url at the Ollama service (e.g. http://ollama.default.svc.cluster.local:11434).

A 70B model on a 24GB GPU does the job. Smaller models (8B) work for scout / summarization but tend to produce shaky code on the primary role.

OpenAI-compatible (Groq, LM Studio, vLLM, your own endpoint)¶

Any service that speaks the OpenAI Chat Completions API.

agents:
  groq-default:
    type: openai_compatible
    base_url: https://api.groq.com/openai/v1
    models:
      scout:   { model: llama-3.3-70b-versatile }
      primary: { model: llama-3.3-70b-versatile }

secrets:
  groq_api_key: ${GROQ_API_KEY}

The same shape covers LM Studio (http://localhost:1234/v1), vLLM (http://your-vllm-host:8000/v1), and any other endpoint that returns OpenAI-shaped responses.

Picking models per project¶

You don't have to pick one provider for everything. Two patterns work:

One agent per environment. Cheap models for dev, good models for prod:

agents:
  cheap-ollama:   { type: ollama, base_url: http://localhost:11434, models: { primary: { model: llama3.3:70b } } }
  premium-claude: { type: claude, models: { primary: { model: claude-sonnet-4-6 } } }

projects:
  todolist-dev:    { agent: cheap-ollama,   tracker: acme-issues, repos: [todolist] }
  todolist-prod:   { agent: premium-claude, tracker: acme-issues, repos: [todolist] }

Same agent, different model per role. The default. Scout runs on the cheap model, primary on the good one — already shown in every example above.

Cost transparency¶

Whichever provider you pick, every run records token usage and (if pricing is configured) dollar cost into .agentsmith/runs/{run-id}/result.md. Six months later you can answer "what did the auth refactor actually cost?" without guessing. See Cost tracking in Reference for the detail of what gets recorded.

Next¶

First run — once one provider is wired up, agent-smith demo proves it in minutes.
Skills catalog — the role definitions the agent runs come embedded with every release.
Host it.