Back to website
2-min read

AgentStatus × Pydantic

Outside-in monitoring for Logfire, from real residential devices in 30 countries.

You mentioned Pydantic uses Checkly today and finds it a bit minimal, that you'd want an agent that says "what's the state of my system" with click-click-go simplicity. AgentStatus is that, built for AI agents specifically. Logfire instruments what your agents do inside. AgentStatus observes what they do outside.

20M+
Validations run
6,000+
Agents tracked
800+
Residential devices
30+
Countries
agentstatusagentstatus.dev | partner brief

What we understand about Logfire

OpenTelemetry-native observability is the right inside-out story.

Logfire's internal-trace coverage is excellent: prompt lifecycle, tool calls, distributed tracing. The adjacent question is what those agents look like from outside the application, on real consumer networks, across regions, in multi-turn conversations under semantic pressure.

Internal tracing sees what the agent did from inside its own process. It doesn't see what a user in Mumbai actually got back, or whether voice paths drift differently than chat paths.

AgentStatus dashboard showing fleet snapshot, active alerts, and recent validation activity
Live verdict mix and active alerts across a monitored fleet

What AgentStatus is

What AgentStatus does that Checkly architecturally doesn't.

Checkly is excellent at HTTP and browser synthetic monitoring. We're not pitching replacement. We're pitching the four things they don't cover, that matter for AI agents.

CapabilityAgentStatusCheckly
Multi-turn conversational validationsStateful conversations across turns, including under semantic pressure.Multistep API checks, independent HTTP requests, no conversation state.
Adversarial conformance validationsGold prompts, semantic-superposition attacks, boundary pressure, scored at per-validate-slot resolution.JSON path equality, status codes, latency thresholds.
Voice channelVoice agents (ElevenLabs, Retell, Vapi-shaped) on the same pipeline.HTTP and browser only.
Network origin800+ real residential devices on consumer ISPs, 30 countries.~20 cloud data center regions.

First three are categorical, not implementation, differences.

Agent detail showing latency, uptime, gold pass and run history
Per-agent verdict, latency distribution, and 24h uptime

Where we fit

Complement, not overlap.

01

What outside-in surfaces

Recent representative findings on production fleets we've operated:

AgentStatus drift signal view with 7-day run metrics heatmap and outlier runs
Drift signal, run metrics heatmap, and concentrated outliers in one view
  • High-single-digit to ~20% verdict-tier degradation on otherwise-healthy fleets, mostly latency SLA from real consumer networks, invisible from cloud-region validations.
  • Semantic-tail concentration: 246 gold-fail rows resolved to 227 failures on one boundary-awareness validate slot. Targeted, fixable. Only legible because evidence preserves per-slot resolution.
  • Region-specific failures invisible to centralized telemetry, agents that pass internal evals and cloud-region checks, but consistently fail from specific residential ISPs.

Whether these exist on Logfire is what two weeks would answer.

Per-test conformance results showing pass and fail concentration across validate slots
Per-validate-slot resolution, concentration becomes legible
02

The two-week parallel run

Point us at Logfire's public endpoints. We run multi-turn validations covering common Logfire flows, gold-prompt and conformance checks, from 5 to 10 residential locations, alongside your existing Checkly setup. Weekly report. No Checkly replacement. No call required to start.

What we need: Logfire endpoint(s), auth method if any, green light for conservative-rate residential validations.

Honest finding at the end: "this is useful, keep going" or "Checkly is sufficient." Either is fine.

The split

Two truths, one story.

Logfire, inside-out

  • Internal traces of agent execution
  • Span-level visibility into LLM, tool, DB calls
  • OpenTelemetry instrumentation in your code
  • Inside-the-VPC observability

AgentStatus, outside-in

  • External validations of agent behavior on public paths
  • Verdict-tier evidence + conformance outcomes per validate
  • Multi-turn, adversarial validations, no instrumentation required
  • 800+ residential devices across 30 countries

Proof of scale

Plain definitions, no inflation.

Validations target publicly reachable, customer-visible agent surfaces with conservative rate limits and no attempt to bypass authentication. No tenant data is collected. Retained artifacts are verdict metadata, latency and pass-rate aggregates, short response previews, and structured gold and conformance outcomes, enough to prove behavior, not reconstruct customer records.

What we are not claiming

An independent layer that coexists.

We are not a replacement for Checkly's general HTTP and browser synthetic monitoring, or for Logfire's inside-out tracing. We are the outside-in layer for AI agents specifically: multi-turn, adversarial, voice-capable, residential.

What we'd like from this conversation

Asks.

01

Endpoints

Logfire endpoint(s) to monitor, or 'public, no auth, you pick'.

02

Auth

Auth method if any (bearer / header / none).

03

Green light

Permission to run conservative-rate validations from residential nodes for two weeks.

Closing

You said you'd be open to having a look, this is the look. If interesting, reply "go"

and we set it up tomorrow.

Hypothetical two-week parallel monitoring offer on Logfire's publicly-reachable surfaces. Findings cited are anonymized representative patterns from multi-tenant agent fleets, not specific to Logfire. Validations target publicly-reachable surfaces with conservative rate limits; no tenant data is collected. AgentStatus is independent outside-in production monitoring for AI agents and is not affiliated with Pydantic.