AgentStatus × AIUC

AIUC-1 certifies. AgentStatus verifies, every day.

AIUC-1 sets the bar at certification. We make sure your agents are still clearing it on day fifty, with scheduled outside-in tests from 800+ real consumer devices across ~30 countries, checking that answers stay correct after the audit date.

17M+ tests

6,000+ agents

700+ residential devices

30 countries

agentstatus.dev | partner brief

Why we're reaching out

Certification is point-in-time. Reliability is continuous.

AIUC-1 is the first AI agent standard with real teeth: six pillars, quarterly updates, MITRE as a technical contributor, and Schellman as the independent auditor. UiPath's certification required 2,000+ technical evaluations at audit time.

The Reliability pillar (agents behaving predictably and consistently in production) is exactly what we measure continuously. Not at certification. Every day. From outside the customer's stack, on the same networks their users are on.

That's the gap we'd like to fit into: between the audit and the renewal, generating the evidence that the bar is still being cleared.

What AgentStatus is

We continuously test AI agents and check whether the answers are correct.

Scheduled tests run from an independent network of 800+ consumer devices in ~30 countries, with expected answers and drift detection when behaviour slips. Repeatable proof from where real users actually are, not from two AWS regions.

That includes multi-turn conversations and tool handoffs, so you can show what was exercised, from where, and what changed week over week.
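As a rough illustration of what "expected answers and drift detection" can look like, here is a minimal Python sketch. It is not AgentStatus's implementation: the `RunResult` record, the substring-based `passed` check, and the 5-point `max_drop` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One scheduled test run (hypothetical record shape)."""
    week: int       # week the run was scheduled in
    country: str    # where the testing device was located
    expected: str   # agreed expected answer
    actual: str     # what the agent actually replied


def normalise(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences pass.
    return " ".join(text.lower().split())


def passed(run: RunResult) -> bool:
    # Assumption: a run passes if the expected answer appears in the reply.
    return normalise(run.expected) in normalise(run.actual)


def weekly_pass_rates(runs):
    """Return {week: pass rate} for all weeks that have runs."""
    totals = {}
    for run in runs:
        ok, n = totals.get(run.week, (0, 0))
        totals[run.week] = (ok + passed(run), n + 1)
    return {week: ok / n for week, (ok, n) in sorted(totals.items())}


def drift_events(rates, max_drop=0.05):
    """Flag consecutive week pairs where the pass rate fell by more than max_drop."""
    weeks = sorted(rates)
    return [
        (prev, cur)
        for prev, cur in zip(weeks, weeks[1:])
        if rates[prev] - rates[cur] > max_drop
    ]
```

A real check would likely need more than substring matching (multi-turn state, tool handoffs, semantic equivalence), but the shape is the same: compare against an agreed answer, aggregate per period, and flag the drop.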

Where we fit

Two lanes, one trust story.

AIUC: their lane

• Defines the standard (AIUC-1, six pillars)
• Audit + certificate via Schellman
• Insurance backstop when things go wrong

AgentStatus: our lane

• Generates the evidence, continuously
• Outside-in tests between audit dates
• Prevention signal: catches drift before it becomes a claim

In one sentence: certification answers "did we meet the bar then?", and AgentStatus answers "is it still true this week, from real places on the internet?"

Proof of scale

What we've run so far.

~10M test runs in ~2 months across the network. ~6,000 agents being tracked, including ones from companies you'd recognise (specifics under NDA).

We've also caught node operators trying to game the network with datacenter VMs instead of real consumer devices, the same kind of adversarial behaviour AIUC-1 is designed to make harder. Detection is built into the product.

What we are not claiming

An independent layer that coexists.

We are not AIUC-1 auditors, not AIUC-1, and not an insurance company. We do not replace AIUC's standard or policies. We're the continuous evidence layer that sits between the audit and the renewal.

What we'd like from this conversation

A 2-week sandbox pilot.

01. One certified or candidate agent

A surface AIUC has worked with (Intercom, Ada, ElevenLabs, or another), with one agreed set of expected answers.

02. A fixed 2-week window

We run scheduled outside-in checks and share pass/fail rates, drift events, and geography/network-split results.

03. A 30-minute readout

Does this belong next to AIUC-1 as ongoing evidence between audits?
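To make the pilot output concrete, the geography/network split could be computed along these lines. This is a hedged sketch only: the result dictionaries and their `country`, `network`, and `passed` fields are hypothetical, not the actual report format.

```python
from collections import defaultdict


def split_pass_rates(results, key):
    """Group results by `key` (e.g. "country" or "network") and return pass rates."""
    counts = defaultdict(lambda: [0, 0])  # key value -> [passes, total]
    for result in results:
        bucket = counts[result[key]]
        bucket[0] += 1 if result["passed"] else 0
        bucket[1] += 1
    return {value: passes / total for value, (passes, total) in counts.items()}


# Hypothetical pilot results: one dictionary per scheduled check.
pilot_results = [
    {"country": "DE", "network": "mobile", "passed": True},
    {"country": "DE", "network": "wifi", "passed": False},
    {"country": "US", "network": "mobile", "passed": True},
]

by_country = split_pass_rates(pilot_results, "country")
by_network = split_pass_rates(pilot_results, "network")
```

Splitting the same runs by different keys is what lets a readout separate "the agent is wrong everywhere" from "the agent is wrong on one network or in one country".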

Closing

AIUC gives enterprises reason to sign. AgentStatus helps them keep the story true in production, every day, from where real users actually are.

Chat with Dulra & Roman | Why AgentStatus

Contact

dulra@carmel.so | roman@carmel.so

"Test runs" and "agent rows" mean what we said above. AIUC descriptions are from public pages and announcements, not an endorsement by AIUC.