AgentStatus × Cognigy partner brief
Outside-in validation for Cognigy AI agents.
Seven groups of checks, from status through alerts, run from real home networks and sit alongside Cognigy Insights, OData, and Live Agent. We complement inside-out visibility; we do not replace it. No instrumentation on your stack. Synthetic validations only, no production transcripts.
What we understand about Cognigy
Cognigy already gives enterprises strong inside-out visibility across voice, chat, and messaging.
Cognigy.AI spans the full agent lifecycle: build, deploy, analyze, and optimize. Cognigy Insights delivers conversation analytics and NLU performance reporting. The OData API feeds raw conversation data to BI teams. Live Agent handles escalation, and Agent Copilot supports human agents in real time.
For outside-in validation, Cognigy's REST Endpoint is the natural surface: a POST request carrying a userId, a sessionId, and a Bearer token, the same path a channel integration would use.
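A minimal sketch of one validation message against that surface, using only the Python standard library. The endpoint URL, the token, and the `text` field carrying the prompt are placeholders and assumptions for illustration; the exact body a given Cognigy flow expects should be checked against Cognigy's REST Endpoint documentation.

```python
import json
import urllib.request

# Placeholders only: a pilot would use the sandbox URL and the
# validation token your team provisions.
ENDPOINT_URL = "https://example.invalid/cognigy-rest-endpoint"
BEARER_TOKEN = "validation-token"


def build_validation_request(url: str, token: str, user_id: str,
                             session_id: str, text: str) -> urllib.request.Request:
    """Build, but do not send, one synthetic validation message."""
    body = {
        "userId": user_id,        # synthetic user, never a real customer ID
        "sessionId": session_id,  # reused across turns of one conversation
        "text": text,             # assumed field for the synthetic prompt
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_validation_request(ENDPOINT_URL, BEARER_TOKEN,
                               "synthetic-user-1", "session-001",
                               "How do I request a refund?")
```

Sending it would be `urllib.request.urlopen(req, timeout=10)`. Building the request separately keeps the headers and body inspectable before any traffic leaves.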
What we validate
Uptime is table stakes. We watch the rest too.
Reliability, consistency, and robustness. From home networks, on a schedule. Plain verdicts you can explain to your boss.
Reliability
Does it keep working, stay fast, and behave the same over time?
Can people reach it?
Not just 200 OK from your office. We hit it from home networks where your users actually are.
Fast enough to feel alive
Slow first replies feel broken even when the answer eventually shows up. We watch time-to-first-byte, not just total wait.
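To make the distinction concrete, here is a sketch that scores one response from its chunk arrival times. The budgets (2 s to first byte, 15 s total) and the pass/degraded/fail labels are hypothetical, not AgentStatus defaults.

```python
from typing import List, Optional, Tuple

# One arrival = (seconds since the request was sent, bytes in that chunk).
Arrival = Tuple[float, bytes]


def first_and_last_byte(arrivals: List[Arrival]) -> Tuple[Optional[float], Optional[float]]:
    """Return (time to first byte, time to last byte), or (None, None) if nothing arrived."""
    times = [t for t, chunk in arrivals if chunk]
    if not times:
        return None, None
    return min(times), max(times)


def latency_verdict(arrivals: List[Arrival],
                    ttfb_budget_s: float = 2.0,
                    total_budget_s: float = 15.0) -> str:
    """Score one reply on both the first byte and the total wait."""
    ttfb, total = first_and_last_byte(arrivals)
    if ttfb is None:
        return "fail"      # no reply at all is a reachability problem
    if ttfb > ttfb_budget_s or total > total_budget_s:
        return "degraded"  # it answered, but it felt broken while waiting
    return "pass"


quick_start = [(0.4, b"Sure, "), (6.0, b"here is how refunds work.")]
silent_start = [(5.9, b"Sure, "), (6.0, b"here is how refunds work.")]
```

Both example replies finish at 6 s, yet they score differently: the one that starts talking at 0.4 s passes, while the one that stays silent until 5.9 s is degraded.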
From home networks
Same agent, different country, different result. Geo blocks and CDN quirks show up here first.
Keeps working, run after run
One green check means nothing. We track whether it stays healthy across dozens of scheduled runs.
When behavior starts to drift
Slow leaks matter. Pass rates, latency, and answer patterns shifting week over week get flagged before users notice.
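One way to decide whether a week-over-week pass-rate drop is real or just noise is a one-sided two-proportion z-test. This sketch covers pass rates only, and the 2.0 threshold is an assumption; it is not a description of how AgentStatus builds its drift baselines.

```python
import math


def pass_rate_dropped(base_pass: int, base_runs: int,
                      cur_pass: int, cur_runs: int,
                      z_threshold: float = 2.0) -> bool:
    """One-sided two-proportion z-test: did the pass rate fall significantly?"""
    p_base = base_pass / base_runs
    p_cur = cur_pass / cur_runs
    pooled = (base_pass + cur_pass) / (base_runs + cur_runs)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_runs + 1 / cur_runs))
    if se == 0:
        return False  # both periods all-pass or all-fail: no change to test
    z = (p_base - p_cur) / se
    return z > z_threshold
```

A fall from 95/100 to 80/100 is flagged (z ≈ 3.2), while 95/100 to 93/100 is not (z ≈ 0.6), and an improvement is never flagged.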
Answer quality
Not just reachable. Actually helpful, shaped right, and checked more than once.
Are the answers good?
We read replies like a human would: pass, degraded, or inconclusive, with a short explanation you can act on.
Two reviewers had to agree
When there is no single right string, two independent reviewers weigh in. If they disagree, we report inconclusive rather than raise a false alarm.
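The agreement rule itself reduces to a few lines. This sketch assumes each reviewer already returns one of the three verdicts; how each reviewer scores a reply is out of scope here.

```python
from enum import Enum


class Verdict(str, Enum):
    PASS = "pass"
    DEGRADED = "degraded"
    INCONCLUSIVE = "inconclusive"


def combine_reviews(first: Verdict, second: Verdict) -> Verdict:
    """Report a verdict only when two independent reviewers agree on it."""
    if first == second:
        return first
    # Disagreement is reported as inconclusive instead of as a failure.
    return Verdict.INCONCLUSIVE
```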
Your example questions
Bring your own scenarios: refunds, escalations, edge cases you already worry about.
Questions that fit what it actually does
We watch how the agent behaves in the wild and keep testing with fresh example questions that match its real job.
Exact shape, every time
JSON APIs and structured replies should look like you promised. We check the shape, not just the vibe.
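A minimal shape check, assuming the contract is a flat mapping from field name to expected JSON type. The `CONTRACT` fields are hypothetical, and a production setup would more likely use a full JSON Schema validator.

```python
import json
from typing import Any, Dict, List

# Hypothetical contract: field name -> expected Python type after json.loads.
CONTRACT: Dict[str, type] = {"status": str, "order_id": str, "refund_amount": float}


def shape_errors(raw: str, contract: Dict[str, type]) -> List[str]:
    """Return contract violations; an empty list means the shape passes."""
    try:
        payload: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top level is not a JSON object"]
    errors = []
    for field, expected in contract.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            errors.append(f"{field}: expected {expected.__name__}, "
                          f"got {type(payload[field]).__name__}")
    return errors
```

Returning every violation, rather than stopping at the first, gives the alert enough context to fix the contract in one pass.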
Extra eyes when it looks fishy
Automated checks miss nuance. Flag samples for a human look, then turn repeat mistakes into a rule.
Conversations
Does it finish the job? Real goals, pursued over several messages, judged on the outcome.
A user with a real goal
We give a simulated user something real to want, like a refund, a booking, or a fix, and let them pursue it over several messages, clarifying and pushing back like a real person.
Did they get what they came for?
When the conversation ends, an independent reviewer reads the whole transcript and makes one call: goal achieved or not.
We don't take its word for it
If the agent claims success, we double-check the claim against outside evidence where it exists, such as a cited page, a structured field, or a second reviewer, before counting it.
Consistency
Same situation, same story. No flip-flopping when wording, mood, or follow-ups change.
Same question, different words
Rephrased prompts should not flip the answer. We flag replies that swing for no good reason.
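A sketch of this kind of metamorphic check: ask several paraphrases and flag any whose answer differs from the first. Comparing raw text would flag harmless wording changes, so the comparison runs on an extracted answer; the `normalize` helper is a deliberately crude, hypothetical stand-in for that extraction step, and the canned replies stand in for a live agent.

```python
from typing import Callable, Dict, List


def normalize(answer: str) -> str:
    """Hypothetical extraction: lowercase, collapse spaces, drop trailing . or !"""
    return " ".join(answer.lower().split()).rstrip(".!")


def flips_under_rephrasing(ask: Callable[[str], str],
                           paraphrases: List[str],
                           extract: Callable[[str], str] = normalize) -> Dict[str, str]:
    """Ask each paraphrase once; return those whose extracted answer differs from the first."""
    answers = {p: extract(ask(p)) for p in paraphrases}
    reference = answers[paraphrases[0]]
    return {p: a for p, a in answers.items() if a != reference}


# Canned replies standing in for a live agent.
replies = {
    "Can I get a refund after 30 days?": "Yes.",
    "Is a refund possible 30 days after purchase?": "yes",
    "30 days later, can I still be refunded?": "No.",
}
flips = flips_under_rephrasing(replies.__getitem__, list(replies))
```

Here only the third phrasing is flagged: "Yes." and "yes" normalize to the same answer, while "No." does not.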
Calm ask vs panicked ask
A billing FAQ and a locked-out account are not the same urgency. Good agents step up when stakes rise.
Follow-ups still line up
Ask A, then B, then C. Later answers should not contradict what the agent said two turns ago.
Robustness
Tools, streams, rules, and people trying to break it.
Do tools get called?
Send questions that should trigger an action. Catch broken integrations before customers hit them.
MCP servers
Discover tools and resources, run calls, and check the outputs for agents wired through MCP.
Streaming that finishes
Streaming endpoints can hang, stall, or never complete. We catch that, not just the final text blob.
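Completion can be checked from the same chunk timeline used for latency. This sketch assumes the caller already knows whether the transport signalled end-of-stream; the 5 s stall limit and 30 s deadline are hypothetical.

```python
from typing import List, Tuple

Arrival = Tuple[float, bytes]  # (seconds since request sent, chunk bytes), in arrival order


def stream_problems(arrivals: List[Arrival], completed: bool,
                    max_gap_s: float = 5.0, deadline_s: float = 30.0) -> List[str]:
    """Return what went wrong with one streamed reply; an empty list means it finished cleanly.

    `completed` is whether the transport signalled end-of-stream (for example,
    a final event or a normal close), which the caller has to detect.
    """
    problems: List[str] = []
    times = [t for t, _ in arrivals]
    if not times:
        problems.append("no chunks received")
    else:
        # Gaps between consecutive chunks; the wait before the first chunk is TTFB.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        worst_gap = max(gaps, default=0.0)
        if worst_gap > max_gap_s:
            problems.append(f"stalled {worst_gap:.1f}s between chunks")
        if times[-1] > deadline_s:
            problems.append(f"last chunk at {times[-1]:.1f}s, past the {deadline_s:.0f}s deadline")
    if not completed:
        problems.append("stream never signalled completion")
    return problems
```

A reply whose final text looks fine can still fail here, because a mid-stream stall or a missing end-of-stream signal is exactly what a final-text check cannot see.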
Your safety rules
Tell us what the agent must never do, and what it should still do. We check both sides.
When someone tries to trick it
Pushy, weird, or adversarial prompts. Does the agent stay on policy or fold?
When it breaks, you know
Alerts, deploy gates, and one place to see it all.
Alerts when it breaks
Slack, webhook, and PagerDuty alerts with enough context to fix the problem, not just a red dot.
Block bad deploys before ship
Wire us into CI so a broken agent does not merge just because the unit tests passed.
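At its simplest, a deploy gate is a CI step that turns validation verdicts into a non-zero exit code. The check names and the choice of which verdicts block a merge are assumptions for illustration.

```python
import sys
from typing import Dict, FrozenSet


def gate(verdicts: Dict[str, str],
         blocking: FrozenSet[str] = frozenset({"fail"})) -> int:
    """Return an exit code for CI: 0 lets the merge proceed, 1 blocks it."""
    failing = sorted(name for name, v in verdicts.items() if v in blocking)
    for name in failing:
        print(f"blocking check: {name} ({verdicts[name]})", file=sys.stderr)
    return 1 if failing else 0


# Hypothetical results for one release candidate.
results = {"reachability": "pass", "refund_flow": "fail", "json_shape": "pass"}
```

A CI job would end with `sys.exit(gate(results))`, so a `fail` on any blocking check stops the merge even when unit tests are green. Teams that want a stricter gate can add `"degraded"` to `blocking`.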
One dashboard
Every check rolls up into a simple grid: what is OK, what is still collecting, and what needs attention.
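A sketch of how one cell in such a grid could be derived from a check's recent run verdicts. The minimum run count and the 90% pass-rate bar are hypothetical thresholds.

```python
from typing import List


def grid_status(recent: List[str], min_runs: int = 10,
                min_pass_rate: float = 0.9) -> str:
    """Roll one check's recent run verdicts into a single dashboard cell."""
    if len(recent) < min_runs:
        return "collecting"  # too few runs to call it either way
    pass_rate = recent.count("pass") / len(recent)
    return "ok" if pass_rate >= min_pass_rate else "needs attention"
```

The "collecting" state keeps a brand-new check from showing green after a single lucky run.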
Architecture
Validations run from home networks, hit your REST endpoint, and return as dashboard evidence.
Fabric nodes execute transport validations: reachability, node-side gold checks, streaming latency, and tool or MCP conformance against your REST endpoint. Raw results return to the AgentStatus backend, which schedules cycles, runs portfolio validations and adaptive scenarios, and applies the instrument layer on top of node evidence.
That instrument layer is not a single semantic pass. It includes format contracts, dual-review semantic scoring, metamorphic and task checks, outcome verification where configured, and statistical rollups: pass^k stability, drift baselines, alerts, and explain traces in the dashboard. Your Cognigy platform (Insights, OData, transcripts) stays inside your boundary. We do not pull from it.
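pass^k asks how likely it is that k fresh runs of a scenario all pass, which is stricter than a single pass rate. A common unbiased estimate from n observed runs with c passes is C(c, k) / C(n, k); the sketch below uses that estimator, which is an assumption about how such a rollup could be computed rather than a statement of AgentStatus's formula.

```python
from math import comb


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate pass^k: the chance that k fresh runs of a scenario ALL pass.

    n = runs observed, c = runs that passed, 1 <= k <= n.
    Uses the unbiased estimator C(c, k) / C(n, k); comb(c, k) is 0 when k > c.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return comb(c, k) / comb(n, k)
```

With 9 passes in 10 runs, the single-run rate is 0.9 but pass^5 is 126/252 = 0.5: only a coin flip that five scheduled runs in a row all stay green.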
What runs on nodes vs. the backend
Nodes execute transport. The backend runs instruments and rollups. Fabric nodes sit on residential networks in 30+ countries and hit your REST Endpoint the way a real channel would. The backend receives those raw results and runs the multi-instrument scoring layer plus portfolio statistics. Semantic review is one instrument among several, not the whole system.
Results are verdict metadata, latency, region, and evidence snippets, not production conversation exports.
Data privacy
You do not need to pass customer conversation data to AgentStatus for this to work.
Most teams ask whether they must pass customer conversation data to AgentStatus. You do not. The pilot default is Cognigy sandbox only: synthetic prompts you define, validation credentials you provision, no end-customer PII, and no production transcript pipeline.
What we need
- REST Endpoint URL + auth for validation traffic
- Synthetic prompts / scenarios you approve
- Agent responses to those validation prompts
- Verdict metadata (pass/fail, latency, region)
What we do not need
- Cognigy Insights exports or OData bulk feeds
- Production conversation transcripts
- End-customer PII
- Access inside your VPC beyond the validation endpoint
Credential-based, customer-approved monitoring is the right model for enterprise trust. We can share data-handling detail, retention, and audit evidence under NDA for security review.
Where we fit
AgentStatus complements Cognigy Insights. It does not compete with it.
Cognigy sees inside the platform; we see what the channel actually delivered to a user.
Cognigy sees what happens inside the platform and what you export to your stack. AgentStatus answers a different question: what did the real channel actually deliver from a specific geography, network path, and latency profile?
Insights aggregates conversations; we independently run seven validation groups from outside your stack.
Insights gives you conversation truth. We verify reachability, answer correctness, multi-turn task completion, consistency under rephrasing, and robustness under stress, with evidence procurement can audit.
We execute from 800 residential nodes across 30+ countries, not a single cloud region.
Issues that reproduce only from specific locations or ISPs surface here first; in Cognigy's enterprise and telco customer base, such issues are common.
We only connect with credentials and scenarios your team approves.
No automatic discovery of Cognigy customers. Sandbox REST Endpoint, agreed synthetic scenarios, customer-approved credentials, aligned with procurement and data governance.
The split
Inside-out conversation truth and outside-in validation answer different questions.
Cognigy: inside-out
- Build, deploy, analyze, optimize
- Cognigy Insights dashboards
- OData API event exports
- Transcripts and session-level data
- Live Agent escalation and Copilot
AgentStatus: outside-in
- We check whether the agent is reachable from real home networks right now.
- We track whether it keeps working, stays fast, and holds its pass rate over time.
- We grade whether answers are actually good, not just HTTP 200.
- We run goal-driven conversations and verify whether the user got what they came for.
- We test whether replies stay consistent when wording, stakes, or follow-ups change.
- We stress tools, streams, safety rules, and adversarial inputs before customers do.
- We alert your team when something breaks, with evidence attached.
Proof of scale
We state scale metrics with plain definitions so procurement can audit the claims.
On the order of 20 million validation runs across the network. On the order of 8,000 agent records in our system: configurations we track, including evaluation and pipeline agents, not 8,000 paying customers.
Stricter, production-only definitions are available under NDA.
What we are not claiming
We are an independent evidence layer, not a replacement for Insights or OData.
We are not a replacement for Cognigy Insights, OData exports, or the Live Agent transcript layer. We help teams correlate outside-in validation outcomes with inside-out conversation truth when both matter to the buyer.
Suggested next steps
A sandbox pilot is the lowest-risk way to see if this fits your governance rules.
Start with a two-week sandbox pilot using synthetic scenarios only.
Sandbox REST Endpoint (URL, userId, sessionId, Bearer token), agreed synthetic scenarios, no production traffic, no end-customer data. Written report: what we validated, what passed, what drifted.
Walk through security and data governance before anything touches production.
How AgentStatus connects under enterprise privacy requirements: validation-traffic boundaries, least privilege, retention, audit evidence. We expect this before any production-adjacent work.
Decide whether internal QA, joint customer proof, or both is the right first use case.
Cognigy-internal QA, a joint customer who wants third-party evidence alongside Insights, or both.
Closing
Cognigy helps enterprises build and operate serious AI agents at scale. AgentStatus helps them continuously prove that those agents behave the way policy and customers require, globally, with evidence that holds up under scrutiny.
Metrics use explicit definitions: validation runs are scheduled executions; agent records are database rows, not revenue customers. Cognigy product references reflect public documentation as of this note.