Skip to content

Connect your AI provider

Agent Smith calls the AI provider directly from your infrastructure — no SaaS in between, no proxy. Pick the one you have an API key for (or run Ollama for fully local).

Every provider config goes under agents: in agentsmith.yml and gets a catalog key you reference from projects.X.agent. You can have more than one agent registered (a Claude one and a local Ollama one for cost-sensitive runs, for example) and pick per project.

Model roles

Agent Smith uses different models at different points in a run. Each agent block has a models: map with one entry per role:

Role What it does Model size that makes sense
scout Walks the codebase, picks relevant files. High call volume. Cheap. gpt-4.1-mini, claude-haiku, gemini-flash.
primary Writes the code, reviews, makes the actual changes. The good one. gpt-4.1, claude-sonnet, gemini-pro.
planning Generates the plan from the ticket + scout output. Usually same as primary.
summarization Context compaction when conversation gets long. Cheap. *-mini / *-haiku.
code_map_generation Builds a navigable code map (optional). Cheap.
context_generation Generates the per-repo .agentsmith/context.yaml during init-project. Cheap.

Roles you don't configure fall back to primary. If primary isn't set either, the agent block is invalid.

Anthropic Claude

Direct Anthropic API. First-class support with prompt caching on by default.

agents:
  default-claude:
    type: claude
    cache: { is_enabled: true, strategy: automatic }   # default — leave on
    retry: { max_retries: 5, initial_delay_ms: 2000, backoff_multiplier: 2.0 }
    compaction: { is_enabled: true, threshold_iterations: 8, keep_recent_iterations: 3 }
    models:
      scout:   { model: claude-haiku-4-5-20251001 }
      primary: { model: claude-sonnet-4-6 }
      planning:      { model: claude-sonnet-4-6 }
      summarization: { model: claude-haiku-4-5-20251001 }

secrets:
  claude_api_key: ${ANTHROPIC_API_KEY}

Prompt caching reuses the system prompt + tool definitions + coding principles between iterations of the same run. Typical cost saving on a fix-bug run is 40-60%.

OpenAI

Direct OpenAI API. Supports the reasoning models (o3, o4, etc.) — set them under primary if you want them.

agents:
  default-openai:
    type: openai
    retry: { max_retries: 5, initial_delay_ms: 2000, backoff_multiplier: 2.0 }
    models:
      scout:   { model: gpt-4.1-mini, max_tokens: 4096 }
      primary: { model: gpt-4.1,      max_tokens: 8192 }
      planning:      { model: gpt-4.1,      max_tokens: 4096 }
      summarization: { model: gpt-4.1-mini, max_tokens: 2048 }

secrets:
  openai_api_key: ${OPENAI_API_KEY}

Azure OpenAI

OpenAI through your Azure subscription. Same models, different routing. Per-model deployment names are required (those map to your Azure deployment slots).

agents:
  azure-openai-default:
    type: azure_openai
    endpoint: https://oai-acme-dev.openai.azure.com
    api_version: 2025-01-01-preview
    cache: { is_enabled: true, strategy: automatic }
    models:
      scout:   { model: gpt-4.1-mini, deployment: gpt-4o-mini-deployment, max_tokens: 4096 }
      primary: { model: gpt-4.1,      deployment: gpt4-1-deployment,     max_tokens: 8192 }
      planning:      { model: gpt-4.1,      deployment: gpt4-1-deployment,     max_tokens: 4096 }
      summarization: { model: gpt-4.1-mini, deployment: gpt-4o-mini-deployment, max_tokens: 2048 }
    pricing:
      models:
        gpt-4.1:      { input_per_million: 2.0,  output_per_million: 8.0,  cache_read_per_million: 0.50 }
        gpt-4.1-mini: { input_per_million: 0.40, output_per_million: 1.60, cache_read_per_million: 0.10 }

secrets:
  azure_openai_api_key: ${AZURE_OPENAI_API_KEY}

The pricing block is optional but recommended — it lets Agent Smith report dollar cost per run. Without it, only token counts are tracked.

Google Gemini

Direct Gemini API via Google AI Studio or Vertex AI.

agents:
  default-gemini:
    type: gemini
    models:
      scout:   { model: gemini-2.5-flash }
      primary: { model: gemini-2.5-pro }
      planning:      { model: gemini-2.5-pro }
      summarization: { model: gemini-2.5-flash }

secrets:
  gemini_api_key: ${GEMINI_API_KEY}

Ollama (local)

Local models running on your machine. No API key, no internet egress, no cloud cost.

agents:
  local-ollama:
    type: ollama
    base_url: http://localhost:11434
    models:
      scout:   { model: llama3.3:8b }
      primary: { model: llama3.3:70b }
      planning:      { model: llama3.3:70b }
      summarization: { model: llama3.3:8b }

No secrets needed — Ollama is unauthenticated by default. Bring up the Ollama daemon (ollama serve) and pull the models you reference (ollama pull llama3.3:70b). For Docker / k8s hosts, point base_url at the Ollama service (e.g. http://ollama.default.svc.cluster.local:11434).

A 70B model on a 24GB GPU does the job. Smaller models (8B) work for scout / summarization but tend to produce shaky code on the primary role.

OpenAI-compatible (Groq, LM Studio, vLLM, your own endpoint)

Any service that speaks the OpenAI Chat Completions API.

agents:
  groq-default:
    type: openai_compatible
    base_url: https://api.groq.com/openai/v1
    models:
      scout:   { model: llama-3.3-70b-versatile }
      primary: { model: llama-3.3-70b-versatile }

secrets:
  groq_api_key: ${GROQ_API_KEY}

The same shape covers LM Studio (http://localhost:1234/v1), vLLM (http://your-vllm-host:8000/v1), and any other endpoint that returns OpenAI-shaped responses.

Picking models per project

You don't have to pick one provider for everything. Two patterns work:

One agent per environment. Cheap models for dev, good models for prod:

agents:
  cheap-ollama:   { type: ollama, base_url: http://localhost:11434, models: { primary: { model: llama3.3:70b } } }
  premium-claude: { type: claude, models: { primary: { model: claude-sonnet-4-6 } } }

projects:
  todolist-dev:    { agent: cheap-ollama,   tracker: acme-issues, repos: [todolist] }
  todolist-prod:   { agent: premium-claude, tracker: acme-issues, repos: [todolist] }

Same agent, different model per role. The default. Scout runs on the cheap model, primary on the good one — already shown in every example above.

Cost transparency

Whichever provider you pick, every run records token usage and (if pricing is configured) dollar cost into .agentsmith/runs/{run-id}/result.md. Six months later you can answer "what did the auth refactor actually cost?" without guessing. See Cost tracking in Reference for the detail of what gets recorded.

Next

  • First run — once one provider is wired up, do a run.
  • Skills catalog — the role definitions the agent uses come from a separately-versioned repo.
  • Host it.