Does AgentDiff need access to our repo?

It runs as a status check via GitHub Actions or your CI of choice. We never read your source.

Can we self-host validations?

Enterprise plans support customer-controlled validation runners for VPC-only agents.

How do you price validation volume?

Per validation. One validation = one evaluation prompt against one region. No seat licensing.
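As a worked example of the billing unit above, a suite's validation count per run is its number of evaluation prompts multiplied by the number of regions it runs against. The sketch below only does the counting, and the function names and suite sizes are hypothetical. It says nothing about the price per validation.

```python
def validations_per_run(num_prompts: int, num_regions: int) -> int:
    """One validation = one evaluation prompt against one region."""
    return num_prompts * num_regions


def validations_for_period(num_prompts: int, num_regions: int, runs: int) -> int:
    """Total validations across several runs (for example, one run per PR)."""
    return validations_per_run(num_prompts, num_regions) * runs


# Hypothetical suite: 20 prompts checked in 5 regions on 30 pull requests.
print(validations_per_run(20, 5))         # 100
print(validations_for_period(20, 5, 30))  # 3000
```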

What does the status badge look like?

A lightweight SVG with last-verdict, last-run timestamp, and a click-through to the hosted page. Brandable.

CI/CD-native validation for the AI features inside your product.

We run the same evaluation prompts your users would run, on every pull request and every provider update. And we block the deploy or page the on-call the moment your agent's answers regress.

User-side validation isn't theory. We've been running it.

Live infrastructure

8k+

Agents continuously monitored across the global network.

18M+

User-side validations

30+

Countries covered

What breaks today

The failure modes your current stack misses

A PR shipped a prompt change. Quality dropped 12%.

Your unit tests pass. Your eval suite passes. Your customers notice.

OpenAI silently updated a model.

No release notes. Your agent's answer distribution shifted overnight. Your churn dashboard will catch it next quarter.

Your status page says everything is fine.

It reports endpoint health, not answer quality. Your customers are reporting broken answers. Trust erodes faster than uptime falls.

AgentDiff

PR-time behavioral diff, blocking when quality regresses.

On every PR, AgentDiff runs the same evaluation prompts against the new version, diffs the answers against your baseline, and blocks the merge if the Quality Score drops below your threshold.

GitHub status check
Per-prompt diff
Approve-anyway with reason
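The merge gate described above can be sketched roughly as follows. This is an illustration under stated assumptions, not AgentDiff's implementation. It assumes each prompt's answer has already been scored in [0, 1]. It also treats the Quality Score as the mean per-prompt score scaled to 0 to 100, because the real formula isn't documented here. "Approve anyway" is modeled as an optional reason string. All names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class PromptResult:
    prompt_id: str
    baseline_score: float   # score of the baseline answer, assumed in [0, 1]
    candidate_score: float  # score of the PR's answer, assumed in [0, 1]


def quality_score(scores: list[float]) -> float:
    # Assumption: Quality Score = mean per-prompt score, scaled to 0-100.
    return 100.0 * mean(scores)


def per_prompt_diff(results: list[PromptResult]) -> dict[str, float]:
    # Positive = improved, negative = regressed, zero = unchanged.
    return {r.prompt_id: r.candidate_score - r.baseline_score for r in results}


def gate(results: list[PromptResult], threshold: float,
         override_reason: Optional[str] = None) -> tuple[str, str]:
    """Return (state, description) for a hypothetical PR status check."""
    score = quality_score([r.candidate_score for r in results])
    if score >= threshold:
        return "success", f"Quality Score {score:.1f} >= {threshold:.1f}"
    if override_reason:
        # "Approve anyway": the merge proceeds, but the reason is recorded.
        return "success", f"Approved anyway ({override_reason}); score {score:.1f}"
    regressed = [p for p, d in per_prompt_diff(results).items() if d < 0]
    return "failure", f"Quality Score {score:.1f} < {threshold:.1f}; regressed: {regressed}"
```

The returned state strings `"success"` and `"failure"` match values that GitHub's commit status API accepts, so a CI job could post them directly as a status check.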


Drift watch

Provider-side change detection, attributed and alerted.

We baseline your agent's behavior. When a provider ships a change that alters it, you get a diff showing the affected regions and prompts.

Hourly behavioral baseline
Provider attribution
PagerDuty and Slack routing
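Drift detection with provider attribution can be sketched like this. It is an illustration under stated assumptions, not the product's algorithm:

- Answers are compared to the hourly baseline with a crude text-similarity ratio from Python's `difflib`. A real system would more likely score meaning rather than characters.
- A change is attributed to the provider only when your own deployed version is unchanged since the baseline.
- The alert is shaped as a PagerDuty Events API v2 trigger payload but is not sent.

The function names, the 0.8 threshold, and the routing key are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical threshold: answers less than 80% similar to baseline count as drift.
SIMILARITY_THRESHOLD = 0.8


def find_drift(baseline: dict[tuple[str, str], str],
               current: dict[tuple[str, str], str]) -> dict[str, list[str]]:
    """Map region -> prompt ids whose answer drifted from the baseline.

    Both inputs map (region, prompt_id) -> answer text.
    """
    drifted: dict[str, list[str]] = defaultdict(list)
    for key, old_answer in baseline.items():
        new_answer = current.get(key)
        if new_answer is None:
            continue  # prompt was not re-run in this window
        ratio = SequenceMatcher(None, old_answer, new_answer).ratio()
        if ratio < SIMILARITY_THRESHOLD:
            region, prompt_id = key
            drifted[region].append(prompt_id)
    return dict(drifted)


def attribute(baseline_version: str, current_version: str) -> str:
    # Assumption: if you deployed nothing since the baseline, the provider changed.
    return "provider" if baseline_version == current_version else "deploy"


def pagerduty_event(routing_key: str, drifted: dict[str, list[str]], cause: str) -> dict:
    """Build (but do not send) a PagerDuty Events API v2 trigger event."""
    regions = ", ".join(sorted(drifted))
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": f"Agent behavior drift ({cause}) in: {regions}",
            "source": "drift-watch",
            "severity": "warning",
            "custom_details": drifted,
        },
    }
```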

Verified status

Status pages backed by outside-in evidence.

Embeddable status badges and a hosted page that report verdicts your customers can audit. No more 'all systems operational' when answers are broken.

Embeddable badge
Public or gated page
Per-region detail
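A badge that carries the last verdict, the last-run timestamp, and a click-through link can be rendered as a small SVG. The sketch below is illustrative only. The layout, the verdict names, the colors, and the example URL are assumptions, not the product's actual badge markup.

```python
from datetime import datetime, timezone
from html import escape

# Hypothetical verdict -> fill color mapping.
COLORS = {"pass": "#2ea44f", "degraded": "#dbab09", "fail": "#d73a49"}


def render_badge(verdict: str, last_run: datetime, href: str) -> str:
    """Return a minimal SVG badge (verdict + last-run time) linked to the hosted page."""
    color = COLORS.get(verdict, "#6a737d")  # grey for unknown verdicts
    stamp = last_run.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    label = escape(f"{verdict} | {stamp}")
    link = escape(href)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" width="260" height="20">'
        # Both href forms: SVG 2 uses href, older renderers expect xlink:href.
        f'<a xlink:href="{link}" href="{link}">'
        f'<rect width="260" height="20" rx="3" fill="{color}"/>'
        '<text x="8" y="14" fill="#fff" font-family="Verdana,sans-serif" '
        f'font-size="11">{label}</text>'
        "</a></svg>"
    )
```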


How a pilot runs

From first validation to signed report in two weeks

Step 01

Connect

Point Agent Status at the user-facing surface of your agent. No SDK, no instrumentation. Average setup is under five minutes.

Step 02

Watch

Live verdicts stream in from every region you serve. Drift and latency alerts route to PagerDuty or Slack, with a signed report on every run.
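The signing scheme behind these reports isn't described here. As one possible approach, the sketch below assumes HMAC-SHA256 over a canonical JSON serialization, using only the Python standard library, with hypothetical function names and report fields. An HMAC requires a shared secret, so a report meant for public audit would more plausibly use an asymmetric signature such as Ed25519.

```python
import hashlib
import hmac
import json


def canonical_bytes(report: dict) -> bytes:
    # Sorted keys + compact separators: the same report always serializes identically.
    return json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_report(report: dict, key: bytes) -> str:
    """Hex HMAC-SHA256 signature over the canonical report bytes."""
    return hmac.new(key, canonical_bytes(report), hashlib.sha256).hexdigest()


def verify_report(report: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking timing information during comparison.
    return hmac.compare_digest(sign_report(report, key), signature)
```

Canonicalizing before signing means a verifier can reorder or re-parse the JSON and still check the signature, while any change to a field invalidates it.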

Questions we hear most

Frequently asked

Ship faster. Catch regressions before your users do.

Spin up a validation in under five minutes. No credit card. First 100 runs free.