Skip to content

Polling Setup

Polling is an alternative ingress path to webhooks. Instead of the platform pushing events to Agent Smith, Agent Smith pulls eligible tickets on an interval. Both paths feed the same TicketClaimService — downstream behaviour is identical.

Use polling when your Agent Smith deployment cannot accept inbound HTTP from the platform (private Kubernetes, no public ingress, restrictive firewall). Use webhooks when you need low-latency triggering and have a reachable endpoint.

For the deeper comparison: Polling vs Webhooks.

Prerequisites

  • Agent Smith running in server mode (agent-smith server --port 8081).
  • Redis reachable via REDIS_URL env var. Polling needs the same Redis as the claim/queue infrastructure. If REDIS_URL is unset or Redis is unreachable, the poller reports Disabled/Degraded on /health and /health/ready returns 503 — see Server Resilience.
  • The configured ticket auth (e.g. GITHUB_TOKEN) has read access to the project AND write access to issue labels — pollers and the lifecycle transitioner both write agent-smith:* labels.

Current Platform Coverage

All four platforms support polling. Each cycle does two listings per project:

  1. Discovery — finds new tickets carrying any trigger label from pipeline_from_label (e.g. fix, feature, security-review) that have no agent-smith:* lifecycle label yet. Equivalent to what the webhook handler does on receipt of a label event.
  2. Catchup — finds tickets already tagged agent-smith:pending, for any case where a previous cycle claimed but didn't transition (crash, restart, transient API failure).

Both result sets are deduped by ticket id and filtered to lifecycle ∈ {none, Pending} before producing claim requests, so a ticket already enqueued/in-progress is never reclaimed.

Platform Discovery API Catchup API
GitHub GET /issues?labels={triggerLabel}&state=open (one call per trigger label, deduped) GET /issues?labels=agent-smith:pending&state=all
GitLab GET /issues?labels={triggerLabel}&state=opened (one call per trigger label, deduped) GET /issues?labels=agent-smith:pending&state=opened
Azure DevOps WIQL [System.Tags] CONTAINS '{l1}' OR [System.Tags] CONTAINS '{l2}' AND [System.State] IN (openStates) WIQL [System.Tags] CONTAINS 'agent-smith:pending'
Jira JQL project = "{key}" AND labels in ({trigger labels}) AND statusCategory != Done (single call) JQL project = "{key}" AND labels = "agent-smith:pending"

Discovery is off when a project has no pipeline_from_label entries (e.g. webhook-only deployments) — only the catchup query runs, preserving legacy behavior.

Required token scopes

Platform Scope / permission
GitHub Token with repo (read access to issues + write access to labels)
GitLab Personal access token with api scope
Azure DevOps PAT with Work Items: Read & Write
Jira API token + email; user must have Browse Projects + Edit Issues on the project

Platform-specific notes

  • Azure DevOps: WIQL also filters by the project's configured open_states (default New/Active/Committed). A Pending-tagged work item already in Closed is not picked up.
  • Jira: Label-mode only in the current implementation. If tickets.project is set in agentsmith.yml, the JQL is scoped to that project key; otherwise the search is instance-wide and matches any issue with the lifecycle label. Native-status-mode polling (probing JiraWorkflowCatalog for transitions) is deferred.
  • GitLab: Listing returns at most 100 issues per cycle (per-page max). Backlog drains naturally over multiple cycles.

Minimal Configuration

projects:
  my-api:
    source:
      type: GitHub
      url: https://github.com/org/my-api
    tickets:
      type: GitHub
      url: https://github.com/org/my-api
      auth: token
    pipeline: fix-bug

    polling:
      enabled: true            # default: false

That's the minimum. Defaults are sensible:

Key Default Description
enabled false Whether to poll this project
interval_seconds 60 Base sleep between poll cycles
jitter_percent 10 Random ±% applied to the interval

Full Configuration

projects:
  my-api:
    # ... source, tickets, agent ...

    pipeline: fix-bug

    # Trigger config decides which pipeline a polled ticket runs.
    # Polling and webhook share this section — pipeline_from_label
    # maps a user-facing label on the ticket to a pipeline; lifecycle
    # labels (agent-smith:*) are filtered before matching. First key
    # in the map whose value appears on the ticket wins; default_pipeline
    # is the fallback when nothing matches. Same semantics on both paths
    # since p0099a.
    github_trigger:
      default_pipeline: fix-bug
      pipeline_from_label:
        agent-smith: fix-bug
        security-review: security-scan

    polling:
      enabled: true
      interval_seconds: 30
      jitter_percent: 15

agent:
  queue:
    max_parallel_jobs: 4       # consumer-side backpressure
    consume_block_seconds: 5
    shutdown_grace_seconds: 30

Coexistence with Webhooks

Both paths can be active for the same project. Whichever fires first wins the SETNX claim-lock; the second sees the ticket already in Enqueued status and returns AlreadyClaimed cleanly. No duplicate pipeline runs.

Use this for redundancy: webhook for low-latency in normal operation, polling as a safety net during webhook outages or platform delivery delays.

How Polling Runs

A single replica per process holds the agentsmith:leader:poller Redis lease (30s TTL, renewed every 10s). The leader runs PollerHostedService, which:

  1. Loops over every project with polling.enabled: true.
  2. Calls each platform poller's PollAsync in parallel via Task.WhenAll with a 20s per-poller timeout.
  3. For every returned ClaimRequest, calls TicketClaimService.ClaimAsync sequentially.
  4. Sleeps for min(interval_seconds across pollers) ± jitter, then loops.

If the leader pod crashes, another pod acquires the lease within ~30s. Followers don't poll but still process queue items as workers (the consumer is per-pod, not leader-only).

Pipeline routing per ticket

Each candidate is routed individually via pipeline_from_label — the same map webhooks use. The shared PipelineResolver (in Application/Services/Polling):

  1. Strips lifecycle labels (agent-smith:*) from the ticket's labels — they never satisfy a pipeline_from_label key.
  2. Iterates pipeline_from_label in YAML insertion order; first key whose value is a label on the ticket wins.
  3. Falls back to default_pipeline if no key matches and the map is empty; returns null otherwise (caller's last-resort default applies — "fix-bug").

Same semantics as the per-platform webhook resolvers. Operators with an existing webhook config get label-aware polling automatically — no migration.

Operator Tasks

Bootstrap a project for polling

  1. Add polling: { enabled: true } to the project in agentsmith.yml.
  2. Restart the Agent Smith server (or wait — config is reloaded each cycle, but a restart guarantees clean DI registration of the poller).
  3. On the next cycle, eligible Pending-labelled tickets begin claiming.

Add a Pending-labelled ticket without a webhook

Add the configured trigger label (e.g. agent-smith) to a GitHub issue. The poller picks it up on its next cycle. Lifecycle proceeds: Pending → Enqueued → InProgress → Done/Failed.

Reset a stuck ticket

If a ticket is stuck in agent-smith:in-progress and you've confirmed no pipeline is actually running:

  • Wait up to 1 minute — StaleJobDetector reverts InProgress without a heartbeat to Pending automatically.
  • Or manually: remove the agent-smith:in-progress label. The next claim attempt treats it as Pending.

Disable polling for a project

Set polling.enabled: false (or remove the section). Pollers re-register on the next config load.

Troubleshooting

Stuck leader

Symptom: no poll cycles for >1 minute even though config has enabled: true.

Check: redis-cli GET agentsmith:leader:poller returns a non-empty token but no replica is logging poll cycles. The likely cause is a leader pod that died without releasing — wait for the 30s TTL, then verify a new pod acquires.

Rate-limit breaches

Symptom: 403 / 429 from the platform.

Mitigation: increase interval_seconds (60 → 120 or 180) and ensure jitter_percent is non-zero. The poll uses one listing request per cycle per project, plus one ClaimAsync (which itself reads the ticket once before transitioning), so cycle cost scales with project count, not ticket count.

Orphaned Enqueued tickets

Symptom: tickets are stuck in agent-smith:enqueued but never run.

Check: is a pod with IRedisJobQueue consumer running? agent-smith server starts both the consumer and the poller leader; without it, queue items just accumulate. EnqueuedReconciler will eventually re-push within 10 minutes, but the consumer must run to drain the queue.

Polling enabled but no claims

Symptom: polling.enabled: true, ticket has the trigger label, but nothing happens.

Checks (in order):

  • The trigger label on the ticket is one of the keys in pipeline_from_label (e.g. fix, feature) — discovery picks it up automatically. Manually adding agent-smith:pending is no longer necessary.
  • The token has the listing scope from the table above.
  • For Jira, tickets.project is set if you have multiple projects on the same instance — otherwise the JQL search may match issues you didn't expect.
  • For Azure DevOps, the work item is in one of tickets.open_states (default New/Active/Committed).
  • Check the leader log: agentsmith:leader:poller is held by exactly one pod, and it logs poll cycles.