
AgentStatus × Pydantic

Outside-in monitoring for Logfire, from real residential devices in 30 countries.

You mentioned that Pydantic uses Checkly today and finds it a bit minimal, and that you'd want an agent that says "what's the state of my system" with click-click-go simplicity. AgentStatus is that, built for AI agents specifically. Logfire instruments what your agents do inside. AgentStatus observes what they do outside.

20M+

Validations run

6,000+

Agents tracked

800+

Residential devices

30+

Countries

agentstatus.dev | partner brief

What we understand about Logfire

OpenTelemetry-native observability is the right inside-out story.

Logfire's internal-trace coverage is excellent: prompt lifecycle, tool calls, distributed tracing. The adjacent question is what those agents look like from outside the application, on real consumer networks, across regions, in multi-turn conversations under semantic pressure.

Internal tracing sees what the agent did from inside its own process. It doesn't see what a user in Mumbai actually got back, or whether voice paths drift differently than chat paths.

[Figure: AgentStatus dashboard showing fleet snapshot, active alerts, and recent validation activity. Caption: Live verdict mix and active alerts across a monitored fleet.]

What AgentStatus is

What AgentStatus does that Checkly architecturally doesn't.

Checkly is excellent at HTTP and browser synthetic monitoring. We're not pitching a replacement. We're pitching the four things it doesn't cover that matter for AI agents.

Capability	AgentStatus	Checkly
Multi-turn conversational validations	Stateful conversations across turns, including under semantic pressure.	Multistep API checks, independent HTTP requests, no conversation state.
Adversarial conformance validations	Gold prompts, semantic-superposition attacks, boundary pressure, scored at per-validate-slot resolution.	JSON path equality, status codes, latency thresholds.
Voice channel	Voice agents (ElevenLabs, Retell, Vapi-shaped) on the same pipeline.	HTTP and browser only.
Network origin	800+ real residential devices on consumer ISPs, 30 countries.	~20 cloud data center regions.

The first three are categorical differences, not implementation differences.
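To make the multi-turn row in the table above concrete, here is a minimal sketch of a stateful validation loop. It is an illustration under stated assumptions, not AgentStatus's implementation: `run_multi_turn`, the `Agent` callable type, and `toy_agent` are all hypothetical names, and the toy agent stands in for a real endpoint so the example runs offline.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]               # (role, text)
Agent = Callable[[List[Message]], str]  # hypothetical: full history in, reply out
Turn = Tuple[str, Callable[[str], bool]]  # (user prompt, check on the reply)


def run_multi_turn(agent: Agent, turns: List[Turn]) -> List[bool]:
    """Send each prompt with the accumulated history and check each reply."""
    history: List[Message] = []
    results: List[bool] = []
    for prompt, check in turns:
        history.append(("user", prompt))
        reply = agent(history)          # the agent sees every earlier turn
        history.append(("assistant", reply))
        results.append(check(reply))
    return results


def toy_agent(history: List[Message]) -> str:
    """Stand-in agent (assumption): echoes the first user message on later turns."""
    user_msgs = [text for role, text in history if role == "user"]
    if len(user_msgs) == 1:
        return "Noted."
    return f"You first said: {user_msgs[0]}"


results = run_multi_turn(
    toy_agent,
    [
        ("My order number is 1234.", lambda reply: len(reply) > 0),
        # This check can only pass if state from turn one survived.
        ("What was my order number?", lambda reply: "1234" in reply),
    ],
)
print(results)
```

The second check is the point of the sketch: a set of independent HTTP requests has no way to pass it, because the answer depends on what was said in the first turn.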

[Figure: Agent detail showing latency, uptime, gold pass, and run history. Caption: Per-agent verdict, latency distribution, and 24h uptime.]

Where we fit

Complement, not overlap.

01

What outside-in surfaces

Recent representative findings on production fleets we've operated:

[Figure: AgentStatus drift signal view with 7-day run metrics heatmap and outlier runs. Caption: Drift signal, run metrics heatmap, and concentrated outliers in one view.]

• High-single-digit to ~20% verdict-tier degradation on otherwise-healthy fleets, mostly latency-SLA misses measured from real consumer networks and invisible from cloud-region validations.
• Semantic-tail concentration: of 246 gold-fail rows, 227 resolved to a single boundary-awareness validate slot. Targeted, fixable. Only legible because evidence preserves per-slot resolution.
• Region-specific failures invisible to centralized telemetry: agents that pass internal evals and cloud-region checks, but consistently fail from specific residential ISPs.

Whether these exist on Logfire is what two weeks would answer.

[Figure: Per-test conformance results showing pass and fail concentration across validate slots. Caption: Per-validate-slot resolution, concentration becomes legible.]
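The concentration figure in the semantic-tail finding above is simple arithmetic once failures carry a slot label. A minimal sketch, assuming each gold-fail row is tagged with the validate slot it failed on: the slot names, the `slot_concentration` helper, and the lumping of the remaining 19 rows into one `other_slots` label are hypothetical; only the 246 and 227 counts come from the finding.

```python
from collections import Counter
from typing import List, Tuple


def slot_concentration(fail_slots: List[str]) -> Tuple[str, int, float]:
    """Return (slot, count, share) for the slot with the most failures."""
    counts = Counter(fail_slots)
    slot, count = counts.most_common(1)[0]
    return slot, count, count / len(fail_slots)


# Hypothetical rows shaped like the finding: 246 gold-fail rows, 227 on one slot.
# How the other 19 rows are distributed is not stated, so they are lumped together.
rows = ["boundary_awareness"] * 227 + ["other_slots"] * 19

slot, count, share = slot_concentration(rows)
print(f"{slot}: {count}/{len(rows)} = {share:.1%}")  # boundary_awareness: 227/246 = 92.3%
```

Without the per-slot label, the same data reads as 246 scattered failures; with it, roughly 92% of them point at one fixable behavior.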

02

The two-week parallel run

Point us at Logfire's public endpoints. We run multi-turn validations covering common Logfire flows, plus gold-prompt and conformance checks, from 5 to 10 residential locations, alongside your existing Checkly setup. You get a weekly report. No Checkly replacement. No call required to start.

What we need: Logfire endpoint(s), auth method if any, green light for conservative-rate residential validations.

Honest finding at the end: "this is useful, keep going" or "Checkly is sufficient." Either is fine.

The split

Two truths, one story.

Logfire, inside-out

• Internal traces of agent execution
• Span-level visibility into LLM, tool, DB calls
• OpenTelemetry instrumentation in your code
• Inside-the-VPC observability

AgentStatus, outside-in

• External validations of agent behavior on public paths
• Verdict-tier evidence + conformance outcomes per validate
• Multi-turn, adversarial validations, no instrumentation required
• 800+ residential devices across 30 countries

Proof of scale

Plain definitions, no inflation.

Validations target publicly reachable, customer-visible agent surfaces with conservative rate limits and no attempt to bypass authentication. No tenant data is collected. Retained artifacts are verdict metadata, latency and pass-rate aggregates, short response previews, and structured gold and conformance outcomes: enough to prove behavior, not to reconstruct customer records.

What we are not claiming

An independent layer that coexists.

We are not a replacement for Checkly's general HTTP and browser synthetic monitoring, or for Logfire's inside-out tracing. We are the outside-in layer for AI agents specifically: multi-turn, adversarial, voice-capable, residential.

What we'd like from this conversation

Asks.

01

Endpoints

Logfire endpoint(s) to monitor, or 'public, no auth, you pick'.

02

Auth

Auth method if any (bearer / header / none).

03

Green light

Permission to run conservative-rate validations from residential nodes for two weeks.

Closing

You said you'd be open to having a look; this is the look. If it's interesting, reply "go" and we'll set it up tomorrow.


Hypothetical two-week parallel monitoring offer on Logfire's publicly reachable surfaces. Findings cited are anonymized representative patterns from multi-tenant agent fleets, not specific to Logfire. Validations target publicly reachable surfaces with conservative rate limits; no tenant data is collected. AgentStatus is independent outside-in production monitoring for AI agents and is not affiliated with Pydantic.