Polling vs Webhooks¶

Both ingress paths feed the same TicketClaimService — once a claim succeeds, the pipeline runs identically. The choice is operational: latency, network topology, cost, and platform coverage.

Quick Decision¶

You have...	Use
A reachable HTTPS endpoint and want minimum latency	Webhooks
Network egress only (private K8s, no inbound)	Polling
A flaky webhook delivery history and want self-healing	Both (coexistence is safe)
A cost-sensitive deployment with rare ticket activity	Webhooks (polling does API calls even when nothing happens)

Decision Matrix¶

Dimension	Webhooks	Polling
Latency from label to claim	Sub-second (delivery time)	`interval_seconds ± jitter` (default 60s)
Network requirement	Public HTTPS endpoint reachable from the platform	Outbound only — Agent Smith calls the platform's REST API
Setup per platform	Webhook config in platform UI + secret in env	One YAML stanza, no platform-side changes
API call cost (steady state)	Zero unless events fire	Two listings per cycle per project (discovery + catchup) — bounded, project-scoped
API call cost (active)	One per event	Same listing cost — events ride within the existing cycle
Ingress coverage	Tickets entering with a trigger label fire immediately	Tickets entering with a trigger label are discovered next cycle (≤ `interval_seconds`)
Resilience to receiver pod restart	Events delivered while pod down are lost (platform-dependent retry)	Next cycle catches up; no events to miss
Resilience to platform delivery delays	Subject to platform's webhook queue health	Independent — pulls from current state
Multi-replica behaviour	Every replica receives webhooks; whoever wins SETNX claims	Single leader polls; followers run consumer + housekeeping
Authentication	Webhook signature (HMAC for GitHub, token for GitLab, etc.)	Same auth as the ticket provider (PAT, API token)
Platform coverage today	All four (GitHub, GitLab, AzDO, Jira)	All four (GitHub, GitLab, AzDO, Jira; Jira is label-mode only)

Common Scenarios¶

"We run in private Kubernetes with no public ingress"¶

Polling is the only option. Set up:

projects:
  internal-api:
    polling:
      enabled: true
      interval_seconds: 60

Works for all four platforms — see polling.md for per-platform listing details and required token scopes.

"We have webhooks but want a safety net"¶

Enable both. They coexist via the SETNX claim-lock — first claimer wins, second sees AlreadyClaimed. Polling at a relaxed interval (180–300s) acts as a periodic reconcile against missed webhook deliveries:

projects:
  my-api:
    github_trigger:
      pipeline_from_label:
        agent-smith: fix-bug
    polling:
      enabled: true
      interval_seconds: 180

In normal operation: the webhook fires within seconds and the poller's next cycle finds nothing. When a webhook delivery fails: the poller picks up the orphaned Pending ticket within ~3 minutes.

"We need <5 second latency"¶

Webhooks. Polling's floor is interval_seconds minus jitter (so ~54s with defaults). Setting interval_seconds: 5 would meet the latency requirement but burns rate limit — and your platform may throttle you.

"We're cost-sensitive and tickets are rare"¶

Webhooks. A project with one triggered ticket per week, polled every 60s, makes 10,080 listing calls per week to find one ticket. Webhooks make exactly one inbound delivery for that ticket. The platform's free tier likely covers both, but the math swings hard for high project counts.

"We don't trust our webhook signature secret rotation"¶

Polling has a smaller secret blast radius — it uses your existing ticket-provider auth (GITHUB_TOKEN etc.) which already exists for the rest of Agent Smith. No webhook secret to manage, rotate, or leak.

Hybrid: Webhook for Triggers, Poll for Recovery¶

A pattern that doesn't appear elsewhere in this comparison: use webhooks for normal triggering (low latency, cost-effective) and polling at a long interval purely for recovery against occasional webhook misses.

polling:
  enabled: true
  interval_seconds: 300    # 5 min — pure safety net
  jitter_percent: 20

A 5-minute reconcile interval costs ~288 listings per project per day. For most teams that's negligible against API quota and vastly preferable to a Friday-evening missed webhook becoming a Monday-morning surprise.

What Polling Does Not Do¶

No replay of missed events. Polling looks at current ticket state, not event history. If a ticket was Pending → Enqueued → Done in five seconds and the polling interval is 60s, the poller sees Done and does nothing — which is correct.
No comment/PR triggers. Polling drives the ticket-lifecycle path only. PR comments, dialogue answers, and other free-form triggers still need the webhook path. PR-comment webhooks are independent of the lifecycle and don't go through TicketClaimService.
No real-time ack. A webhook responds with HTTP 202 (or 200) in milliseconds, giving the platform a delivery confirmation. Polling has no equivalent — there's no caller to acknowledge. That's a feature when the platform is unreachable, but a downside if your audit trail expects platform-side delivery records.

Polling Setup — config and operations
Webhook Configuration — the alternative ingress
Ticket Lifecycle — what happens after a claim
Setup Guides — overview of all ingress options