AgentStatus × Decagon

Independent assurance for production customer AI, with ~10 days of evidence on a 19-brand cohort.

AgentStatus is how platform and GTM teams prove behaviour in the wild: continuous, controlled validate traffic, gold-based expectations, drift-aware alerting, and optional adversarial conformance, run from 800+ nodes across 30 countries. We sit next to Decagon's runtime (HTTP and streaming chat paths your customers already hit in the open). We do not replace Decagon's orchestration, integrations, or safety stack.

• 800+ validation nodes
• 30 countries
• ~10 days Decagon monitoring window
• 19 brand monitors

agentstatus.dev | partner brief

What we understand about Decagon

Embedded AI that has to work when traffic, content, and policy pressure are real.

Decagon's public product story centers on Agent Operating Procedures (AOPs), natural-language workflow specs that teams can iterate on quickly, plus omnichannel execution (chat, email, voice) and always-on scale. That combination is exactly where outside-in drift shows up first: policy tweaks, retrieval changes, routing edges, and model swaps that never ship as a press release but still move customer-visible behaviour.

Decagon powers concierge-grade support automation for a long tail of recognizable enterprises and consumer brands. In practice, buyers are judging not only tone and resolution rate, but whether the assistant stays reachable, coherent, and policy-aligned as traffic mixes, locales, and abuse patterns change.

For a platform like Decagon, the adjacent question an independent layer can answer is: from consumer-like networks, on the same paths users use, does the assistant still do the right thing this week? And can that be shown with evidence that survives a security review, not a screenshot deck?

What AgentStatus is

Continuous, controlled validate traffic against production agent surfaces.

AgentStatus runs scheduled, repeatable validations against customer-visible agent endpoints (including multi-turn and streaming paths where applicable), classifies outcomes into defensible verdict tiers, and retains evidence-grade previews and aggregates for post-mortems.

Where enabled, we run conformance-style stress validations and keep structured pass/fail detail, useful when security, trust, or procurement asks for something stronger than a dashboard chart.

For support and commerce agents, the expensive failure mode is rarely "hard down." It is silent quality decay: wrong retrieval, brittle refusals, policy misses under pressure, or regional breakage you only see from outside your own VPC.

Where we fit

Complement, not overlap.

01

Inside-out quality vs outside-in behaviour

Decagon ships orchestration, integrations, and the product primitives customers rely on. AgentStatus answers the adjacent question: from consumer-like networks, on the same paths users hit, does the assistant still do the right thing this week?

02

Eval-time demos vs sustained production telemetry

Launch metrics and curated demos are necessary; they are not sufficient when the surface area is multi-tenant and the world changes daily. Distributed validations give you a time series of behaviour, not a one-off scorecard.

03

Global execution footprint

800+ nodes across 30 countries is the proof we are not a synthetic check from a single cloud region. For embedded support on global brands, it matters that assurance traffic originates where real users originate, not only where CI runners live.

04

Partner-friendly integration posture

For a partnership, we assume credential-gated surfaces, customer-approved monitoring, and conservative rate limits. The goal is evidence risk teams can accept, not drive-by scraping. The monitoring window described in the evidence section ran under a narrower posture, stated there explicitly: public surfaces only, conservative limits, minimal retention.

The split

Two truths, one story.

Decagon, inside-out

• Embedded customer AI & routing
• Integrations & workflow logic
• Tenant-specific configuration
• Support outcomes & analytics
• Platform scale & roadmap

AgentStatus, outside-in

• Scheduled validate traffic
• Gold libraries & drift signals
• Multi-turn / streaming paths where enabled
• Verdict-tier evidence + optional conformance detail
• 800+ nodes across 30 countries

Proof of scale

Plain definitions, no inflation.

A. Posture first. How we operated: all activity targeted publicly reachable, customer-visible chat surfaces that use Decagon-shaped HTTP and WebSocket flows, with conservative rate limits and no attempt to bypass authentication. We did not harvest tenant back-office data. What we retained: verdict metadata, latency and pass-rate aggregates, short response previews, and conformance outcomes. That is the minimum needed to prove behaviour, not enough to reconstruct customer records.

B. What we measured (Decagon-only). Over a ~10-day window ending late April 2026, we ran 19 independent monitors against customer-visible, production-shaped Decagon HTTP endpoints on a 12-hour cadence. That produced 417 aggregate rora_results snapshots and ~3,980 underlying validate executions in our telemetry, plus ~900 structured conformance rows where guardrail-style validations were enabled. In our taxonomy, UP means the run met configured health expectations. DEGRADED means transport succeeded but a configured semantic or policy check (gold, expectation, or conformance) failed: the surface was reachable but wrong or unsafe under test, which is not the same as "server down." AUTH_ERROR means we hit an auth, entitlement, or quota gate on the path we used (including cases where the surface expects credentials we did not possess).
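The three verdict tiers defined above can be illustrated with a minimal sketch. This is not AgentStatus's implementation: the `RunResult` fields, the `classify` function, and the choice of HTTP 401/403/429 as stand-ins for "auth, entitlement, or quota gate" are all assumptions made for illustration. Outcomes the brief does not define (for example, a transport failure) are deliberately left unclassified.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    UP = "UP"
    DEGRADED = "DEGRADED"
    AUTH_ERROR = "AUTH_ERROR"


@dataclass
class RunResult:
    # Hypothetical fields, chosen only for this illustration.
    http_status: int      # status code returned by the transport
    checks_passed: bool   # True if every configured gold/expectation/conformance check passed


# Assumption: these status codes stand in for an auth, entitlement, or quota gate.
AUTH_GATE_STATUSES = {401, 403, 429}


def classify(run: RunResult) -> Optional[Verdict]:
    """Map one run to a verdict tier, or None if the brief defines no tier for it."""
    if run.http_status in AUTH_GATE_STATUSES:
        return Verdict.AUTH_ERROR
    if 200 <= run.http_status < 300:
        # Transport succeeded: the verdict now depends on the semantic/policy checks.
        return Verdict.UP if run.checks_passed else Verdict.DEGRADED
    # Transport-level failures are outside the three tiers described in the text.
    return None
```

The point of the ordering is that DEGRADED is only reachable once transport has succeeded, which is what separates "reachable but wrong" from an outage or an access gate.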

C. Honest mix and confidentiality. The snapshot distribution is mostly UP, with a non-trivial DEGRADED tail and a small AUTH_ERROR set. We show the honest mix, not a cherry-picked win rate. Per-brand verdict mix, degradation categories, and example validate rows are available on request under mutual confidentiality (we do not put customer brand names on a first-touch web page).

D. Definitions. A rora_results row is one scheduled aggregate snapshot for a monitored configuration (not "ARR customers"). Validate executions are the underlying checks that roll up into pass rate, latency, and verdict. Conformance rows are outcomes from adversarial-style validations where that program was enabled. These metrics are not Decagon SLAs, revenue, or customer counts unless separately agreed in writing.
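To make the relationship between executions and snapshots concrete, here is a small sketch of how several underlying checks might roll up into one aggregate row. The `Execution` and `Snapshot` types, the `roll_up` function, and the use of a median for latency are assumptions for illustration; the brief does not specify the actual schema or aggregation method.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Execution:
    # Hypothetical shape of one underlying check.
    passed: bool
    latency_ms: float


@dataclass
class Snapshot:
    # Hypothetical shape of one aggregate row for a monitored configuration.
    executions: int
    pass_rate: float
    median_latency_ms: float


def roll_up(executions: List[Execution]) -> Snapshot:
    """Aggregate the executions from one scheduled run into a single snapshot."""
    if not executions:
        raise ValueError("a snapshot needs at least one execution")
    passed = sum(1 for e in executions if e.passed)
    latencies = [e.latency_ms for e in executions]
    return Snapshot(
        executions=len(executions),
        pass_rate=passed / len(executions),
        median_latency_ms=median(latencies),
    )


# Example: four checks in one scheduled run, one of which failed slowly.
snapshot = roll_up([
    Execution(True, 120.0),
    Execution(True, 180.0),
    Execution(False, 900.0),
    Execution(True, 150.0),
])
print(snapshot.pass_rate, snapshot.median_latency_ms)  # 0.75 165.0
```

A median is used here only because it is less sensitive to a single slow failure than a mean; any percentile would fit the same structure.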

What we are not claiming

An independent layer that coexists.

We are not a replacement for Decagon's platform, routing, retrieval, or customer-specific policy. We are an independent layer that produces repeatable, externally executed evidence about how customer-visible agents behave over time, and where they start to drift.

What we'd like from this conversation

Asks.

01

Validate the fit

Where would Decagon want independent assurance artifacts surfaced: partner GTM, enterprise security reviews, or joint customer success? Where should everything remain native to Decagon's own analytics?

02

A practical next step

A small, named cohort (sandbox or live-with-consent) where we align on gold prompts, streaming expectations, and rate limits, then agree on a shared definition of healthy that both sides can defend in front of a CIO.

03

Partner path

If there's a path to work together, we'd want a practical conversation about credentialed monitoring, tenant scoping, and customer approval. Those three decisions determine whether outside-in evidence is useful to your enterprise customers or just noise on the roadmap.

Closing

Decagon helps enterprises ship and scale customer AI that actually runs in production. AgentStatus helps those same enterprises prove, continuously, that it still behaves the way security, procurement, and brand teams require, with evidence that survives scrutiny outside the demo room.

Figures above reflect a time-bounded monitoring window in production (April 2026) on 19 Decagon HTTP monitors with 12-hour scheduling. Metrics are stated with explicit definitions: a rora_results row is one scheduled aggregate snapshot for a monitored configuration; validate executions are underlying checks contributing to aggregates; conformance rows reflect adversarial-style validations where enabled. These metrics are not revenue, customer counts, or Decagon-specific SLAs unless separately agreed in writing.