AgentStatus × Nextiva

Outside-in monitoring for Nextiva's AI infrastructure and the customer deployments running on it.

We ran a short-period validation sweep across three Nextiva-powered customer agents to ground the conversation in real data. Two findings worth flagging, plus a proposal to extend.

11M+ validations run | 6,000+ agents tracked | 800+ devices | 30 countries

agentstatus.dev | partner brief

What we ran

We sent realistic customer-style questions to three Nextiva-powered customer-facing agents from real consumer devices across the US. Every validation asked the agent something it should know how to answer in its business domain — logistics questions to a logistics agent, real estate questions to a real estate agent, and so on.

310 total validations across three anonymized tenants over a short period.

Finding 1

The agents are technically working. They're not doing their actual job.

The headline number: 51% of validations came back degraded. But the more important story is what kind of degradation.

Tenant	Vertical	Validations	Worked correctly	Did the job well (domain checks passed / run)
Tenant A	Logistics	99	37%	0 / 28
Tenant B	Nonprofit	118	53%	0 / 48
Tenant C	Real estate	93	57%	0 / 24

The "did the job well" column is the one that matters.

When we asked these agents domain-specific questions (questions that an agent built for that business should be able to handle), zero out of 100 responses passed. Every single one failed, across all three tenants and all three verticals.
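The 51% headline and the 0 / 100 domain result both follow from the table. Here is a minimal Python sketch of that arithmetic. It assumes "degraded" is simply the complement of "Worked correctly", and because the published percentages are rounded, the aggregate is approximate. The variable names are ours, for illustration only.

```python
# Figures copied from the table: (tenant, validations, share that worked
# correctly, domain checks passed, domain checks run).
ROWS = [
    ("Tenant A", 99, 0.37, 0, 28),
    ("Tenant B", 118, 0.53, 0, 48),
    ("Tenant C", 93, 0.57, 0, 24),
]

total_validations = sum(n for _name, n, _ok, _p, _r in ROWS)

# Assumption: degraded = did not work correctly. Weight each tenant's
# degraded share by its validation count to get the overall share.
degraded = sum(n * (1 - ok) for _name, n, ok, _p, _r in ROWS)
degraded_share = degraded / total_validations

domain_passed = sum(p for _name, _n, _ok, p, _r in ROWS)
domain_run = sum(r for _name, _n, _ok, _p, r in ROWS)

print(f"{total_validations} validations, {degraded_share:.0%} degraded")
# 310 validations, 51% degraded
print(f"{domain_passed} / {domain_run} domain checks passed")
# 0 / 100 domain checks passed
```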

The agents are responding. The transport layer is fine. The latency is fine. But the answers are generic, ungrounded, or off-topic for the business they're meant to serve. A logistics agent giving non-logistics answers to logistics questions. A real estate agent giving non-real-estate answers to real estate questions.

This is the failure mode platform-level monitoring doesn't catch. Internal uptime checks look at "did the agent respond?" Outside-in checks look at "did the agent respond correctly for its domain?" An agent can be 100% up and 0% useful at the same time.
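To make the difference between the two checks concrete, here is a hedged Python sketch. The methodology note at the end of this brief says domain checks are scored against expected domain terms. Everything else here is an assumption for illustration, not AgentStatus's actual scorer: the function names, the `min_hits` threshold, the term list, and the sample replies.

```python
from typing import List, Optional


def responded(reply: Optional[str]) -> bool:
    """Uptime-style check: did the agent return any non-empty text?"""
    return bool(reply and reply.strip())


def on_domain(reply: Optional[str], expected_terms: List[str], min_hits: int = 2) -> bool:
    """Domain-style check: does the reply mention enough expected terms?

    Hypothetical scoring rule: simple case-insensitive substring matching.
    """
    if not responded(reply):
        return False
    text = reply.lower()
    hits = sum(1 for term in expected_terms if term.lower() in text)
    return hits >= min_hits


# Illustrative gold-prompt terms for a logistics agent (made up for this sketch).
expected = ["shipment", "tracking", "delivery", "carrier"]
generic = "Thanks for reaching out! I'm happy to help with any questions you have."
grounded = "Your shipment is with the carrier now; the tracking page shows delivery on Friday."

print(responded(generic), on_domain(generic, expected))    # True False
print(responded(grounded), on_domain(grounded, expected))  # True True
```

The generic reply passes the uptime-style check and fails the domain-style check, which is the "100% up and 0% useful" case described above.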

Finding 2

Same platform, very different results per tenant.

Tenant A is degraded on 63% of validations. Tenant C is degraded on 43% of validations. Same platform, same short period, same kind of validations — but a 20-point spread in baseline reliability.

That spread is invisible from inside the platform, because platform health looks at the platform as a whole. It only surfaces when you compare tenants head-to-head from outside, which is what AgentStatus does by default.

For enterprise sales conversations, this matters: it's the question CISOs at Nextiva's prospects will ask. "How do we know our deployment will perform like Tenant C and not like Tenant A?" Outside-in evidence is the answer.

Tenant names are anonymized here; the real names are available to Nextiva under mutual confidentiality.

Two ways in

Two ways AgentStatus shows up for Nextiva.

Direct

Nextiva builds and ships NextivaOne and Nextiva AI Agent. Outside-in evidence on those products tells your team how they actually behave in customer hands — across regions, across capabilities, beyond what internal QA can simulate.

ISV partnership

Nextiva's customers deploy agents on the platform. Their CISOs and procurement teams want independent evidence those agents are doing the right thing. AgentStatus surfaces in the alliance program as the reliability layer that answers that question — co-sell, marketplace listing, revenue share.

Most enterprise CCaaS deployments will eventually want both.

Honest framing

What this is, and what it isn't.

This is a short-period snapshot with enough volume to make the findings real, not anecdotal.

This isn't a long-term trend study. We paused the validations after the window pending Nextiva's input on scope — the right next step is a longer pilot with Nextiva's product team confirming which domain questions are fair validations for each tenant's vertical.

A two-week extension would tell us whether the domain-accuracy issue is structural (the agents really aren't grounded in their verticals) or methodological (our domain questions need refinement against how Nextiva defines correctness). Either answer is useful.

The ask

What we're proposing.

A two-week pilot. Direct on Nextiva's AI infrastructure, ISV-shaped on three to five customer tenants, or both. We align scope with your product team upfront. Weekly reports. Honest finding at the end.

Closing

Three tenants. A short period. Two findings. We'd love to hop on a call and walk through what two weeks looks like — direct, ISV, or both.

Chat with Dulra & Roman | Why AgentStatus

Contact

dulra@carmel.so | roman@carmel.so

A short-window study concluding May 2026, on publicly reachable Nextiva-powered customer agent chat surfaces. Validations ran at conservative rate limits (15-min intervals) with no auth bypass; no tenant data collected beyond verdict metadata, latency aggregates, gold-prompt outcomes, and short previews. Domain accuracy checks use gold prompts grounded in each agent's actual business context, scored against expected domain terms. Tenant names anonymized; available to Nextiva under mutual confidentiality. AgentStatus is independent outside-in production monitoring for AI agents and is not affiliated with Nextiva.