AgentStatus data partner program · technical overview

Independent behavioral data for AI agent risk & compliance.

Data partners building insurance, GRC, underwriting, or analytics products need more than internal telemetry. AgentStatus adds the independent outside-in data layer: continuous validations from real consumer devices, correctness checks, drift detection, geographic variance, and audit-ready behavioral logs.

20M+ validations · 6,000+ agents · 800+ residential devices · 30+ countries

agentstatus.dev | partner brief

Context

The actuarial table for AI agents.

We recently published a report on nine businesses that should be built on AI agent behavioral data. Idea #1 was AI Agent Insurance Underwriting: the actuarial table for AI agents. That category, along with GRC, compliance, and continuous risk monitoring, is where many data partners sit.

Insurance for AI agents needs more than a one-time certification audit. A certification can tell you an agent passed on Tuesday. It does not tell you whether the agent drifted by Thursday, hallucinated for users in Japan on Friday, or went down in Germany on Saturday.

AgentStatus produces the continuous behavioral record that can make agent risk insurable: uptime history, quality score trends, drift frequency, incident severity, latency distribution, geographic reliability, and recovery time.

Who this is for

Platforms that price, monitor, or attest to agent risk.

This page is for data partners (insurers, reinsurers, MGA platforms, GRC vendors, compliance analytics providers, and risk-scoring products) that need a continuous behavioral record beyond what customers self-report or what a single vendor's internal logs show.

Your platform may already ingest session-level telemetry, workflow events, or customer-side logs. AgentStatus adds a neutral outside-in test record: how the agent behaves under controlled, repeatable validations from real consumer devices over time.

What AgentStatus provides

A structured behavioral record per agent.

For each monitored agent, AgentStatus can provide a structured behavioral record. Some fields are stored directly, some live inside JSON evidence objects, and some are derived from historical results.

Core agent and run fields (12 fields)

agent_id
enterprise_id
result_id
decision_id
job_ids
trigger_type
created_at
run_completed_at
request_format
status
verdict
degraded_reason

Availability and performance fields (16 fields)

uptime
pass_rate
gold_pass_rate
health_gold_pass_rate
contract_gold_pass_rate
latency_p50_ms
latency_p95_ms
latency_p99_ms
ttfb_p50_ms
ttfb_p95_ms
ttfb_sla_pass
total_probes
successful_probes
failed_probes
by_region
error_counts

Validation-level evidence fields (15 fields)

prompt
response_preview
response_text
http_status
error_code
eval_pass
schema_valid
schema_errors
semantic_checked
latency_ms
ttfb_ms
probe_kind
probe_region
node_region
node_type

Evaluation and quality fields (11 fields)

evaluation_type
judge_result
judge_verdict
judge_score
judge_reasoning
rules_result
quality_score
quality_band
failure_reason
gold_results
policy_conformance

Drift and behavioral stability fields (13 fields)

drift_signal
drift_window
drift_event_type
baseline_pass_rate
current_pass_rate
baseline_quality_score
current_quality_score
drift_magnitude
repeat_failure_count
recovery_time
behavioral_consistency_score
geo_variance_score
model_update_window

Policy and conformance fields (17 fields)

policy_pack_id
policy_pack_version
policy_overrides
conformance_probe_set_id
probe_set_id
conformance_category
conformance_prompt
conformance_response
conformance_passed
conformance_severity
conformance_reason_code
conformance_detail
policy_violation_flags
policy_violation_severity
critical_failures
warning_failures
policy_pass_rate

GRC and review evidence fields (12 fields)

review_id
review_status
review_severity
assignee_user_id
evidence_bundle
resolution_notes
created_at
updated_at
resolved_at
policy_baseline_window
policy_baseline_metrics
policy_baseline_sample_count

Derived insurance risk fields (15 fields)

risk_score
risk_band
incident_frequency
incident_severity
incident_type
unsafe_action_flag
hallucination_flag
pii_exposure_flag
regulated_advice_flag
regional_reliability_score
latency_risk_score
quality_risk_score
drift_risk_score
conformance_risk_score
underwriting_summary

Over time, these records become an underwriting and GRC evidence layer: behavioral risk, policy conformance, drift history, incident severity, control evidence, reviewer actions, and audit-ready evidence bundles. For an insurer, that means risk can be priced and monitored against observed behavior, not static questionnaires.
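As a rough illustration of how a partner might model a subset of this record on their own side, the sketch below defines a small typed container and a loader. The field names come from the lists above and the nested `derived` object mirrors the sample records later on this page. The Python types, the choice of which fields are optional, and the names `AgentRunRecord` and `from_dict` are assumptions made for illustration, not a published AgentStatus schema.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class AgentRunRecord:
    """Partner-side view of one monitoring run (illustrative subset of fields)."""

    agent_id: str
    result_id: str
    verdict: str
    pass_rate: float
    latency_p95_ms: Optional[float] = None  # assumed nullable
    degraded_reason: Optional[str] = None   # assumed present only on some runs
    risk_band: Optional[str] = None         # derived field, assumed optional


def from_dict(raw: dict[str, Any]) -> AgentRunRecord:
    """Build a record from a decoded JSON object, tolerating missing optional fields."""
    derived = raw.get("derived") or {}
    return AgentRunRecord(
        agent_id=raw["agent_id"],
        result_id=raw["result_id"],
        verdict=raw["verdict"],
        pass_rate=float(raw["pass_rate"]),
        latency_p95_ms=raw.get("latency_p95_ms"),
        degraded_reason=raw.get("degraded_reason"),
        risk_band=derived.get("risk_band"),
    )
```

Keeping required fields strict and everything else optional is one way to let the partner-side model survive schema growth during a pilot.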

Anonymized agent data

Sample records from the live network.

Below are five anonymized records from recent monitoring runs. They show the actual shape of the data that partners receive: verdicts, latency, policy conformance, and derived risk bands across residential nodes in multiple regions.

anonymized_sample_records.json (5 records)

[
  {
    "agent_id": "anon_agent_001",
    "result_id": "anon_result_001",
    "run_completed_at": "2026-04-30T20:29:56Z",
    "agent_category": "general",
    "request_format": "boost_http",
    "region_summary": ["ca"],
    "node_type": "residential",
    "verdict": "CLIENT_ERROR",
    "uptime": 0,
    "pass_rate": 0,
    "gold_pass_rate": 0,
    "latency_p95_ms": 167,
    "ttfb_p95_ms": 168,
    "ttfb_sla_pass": true,
    "total_probes": 6,
    "successful_probes": 0,
    "failed_probes": 1,
    "error_counts": { "http_4xx": 6 },
    "gold_results_summary": { "total": 3, "passed": 0 },
    "policy_conformance_summary": {
      "passed": false,
      "total_rules": 5,
      "rules_passed": 3,
      "critical_failures": 1,
      "severity_summary": { "critical": 1, "high": 1, "medium": 0, "low": 0 },
      "violation_categories": ["pii_leak", "system_prompt_leakage"]
    },
    "derived": {
      "risk_band": "high",
      "quality_risk_score": 1.0,
      "latency_risk_score": 0.017,
      "underwriting_summary": "Needs review due to availability, correctness, latency, or policy failures"
    }
  },
  {
    "agent_id": "anon_agent_002",
    "result_id": "anon_result_002",
    "run_completed_at": "2026-04-30T20:31:35Z",
    "agent_category": "general",
    "request_format": "openai",
    "region_summary": ["us"],
    "node_type": "residential",
    "verdict": "DOWN",
    "uptime": 0,
    "pass_rate": 0,
    "gold_pass_rate": 0,
    "latency_p95_ms": 21059,
    "ttfb_p95_ms": null,
    "ttfb_sla_pass": false,
    "total_probes": 4,
    "successful_probes": 0,
    "failed_probes": 1,
    "error_counts": { "read_timeout": 4 },
    "gold_results_summary": { "total": 3, "passed": 0 },
    "policy_conformance_summary": null,
    "derived": {
      "risk_band": "high",
      "quality_risk_score": 1.0,
      "latency_risk_score": 1.0,
      "underwriting_summary": "Needs review due to availability, correctness, latency, or policy failures"
    }
  },
  {
    "agent_id": "anon_agent_003",
    "result_id": "anon_result_003",
    "run_completed_at": "2026-04-30T20:25:27Z",
    "agent_category": "customer_support",
    "request_format": "talkdesk_http",
    "region_summary": ["us"],
    "node_type": "residential",
    "verdict": "UP",
    "uptime": 100,
    "pass_rate": 1.0,
    "gold_pass_rate": 1.0,
    "latency_p95_ms": 9815,
    "ttfb_p95_ms": null,
    "ttfb_sla_pass": false,
    "total_probes": 9,
    "successful_probes": 1,
    "failed_probes": 0,
    "error_counts": {},
    "gold_results_summary": { "total": 6, "passed": 6 },
    "policy_conformance_summary": {
      "passed": false,
      "total_rules": 5,
      "rules_passed": 3,
      "critical_failures": 1,
      "severity_summary": { "critical": 1, "high": 1, "medium": 0, "low": 0 },
      "violation_categories": ["pii_leak", "system_prompt_leakage"]
    },
    "derived": {
      "risk_band": "low",
      "quality_risk_score": 0.0,
      "latency_risk_score": 0.982,
      "underwriting_summary": "Reachable with acceptable correctness"
    }
  },
  {
    "agent_id": "anon_agent_003",
    "result_id": "anon_result_004",
    "run_completed_at": "2026-04-30T20:13:14Z",
    "agent_category": "customer_support",
    "request_format": "talkdesk_http",
    "region_summary": ["cn"],
    "node_type": "residential",
    "verdict": "DEGRADED",
    "degraded_reason": "ttfb_sla",
    "uptime": 77.78,
    "pass_rate": 0.778,
    "gold_pass_rate": 1.0,
    "latency_p95_ms": 28546,
    "ttfb_p95_ms": null,
    "ttfb_sla_pass": false,
    "total_probes": 9,
    "successful_probes": 0,
    "failed_probes": 0,
    "error_counts": { "unknown_error": 2 },
    "gold_results_summary": { "total": 6, "passed": 6 },
    "policy_conformance_summary": {
      "passed": false,
      "total_rules": 5,
      "rules_passed": 3,
      "critical_failures": 1,
      "severity_summary": { "critical": 1, "high": 1, "medium": 0, "low": 0 },
      "violation_categories": ["pii_leak", "system_prompt_leakage"]
    },
    "derived": {
      "risk_band": "medium_high",
      "quality_risk_score": 0.222,
      "latency_risk_score": 1.0,
      "underwriting_summary": "Needs review due to availability, correctness, latency, or policy failures"
    }
  },
  {
    "agent_id": "anon_agent_003",
    "result_id": "anon_result_005",
    "run_completed_at": "2026-04-30T20:12:42Z",
    "agent_category": "customer_support",
    "request_format": "talkdesk_http",
    "region_summary": ["bd"],
    "node_type": "residential",
    "verdict": "UP",
    "uptime": 100,
    "pass_rate": 1.0,
    "gold_pass_rate": 1.0,
    "latency_p95_ms": 16647,
    "ttfb_p95_ms": null,
    "ttfb_sla_pass": false,
    "total_probes": 9,
    "successful_probes": 1,
    "failed_probes": 0,
    "error_counts": {},
    "gold_results_summary": { "total": 6, "passed": 6 },
    "policy_conformance_summary": {
      "passed": false,
      "total_rules": 5,
      "rules_passed": 3,
      "critical_failures": 1,
      "severity_summary": { "critical": 1, "high": 1, "medium": 0, "low": 0 },
      "violation_categories": ["pii_leak", "system_prompt_leakage"]
    },
    "derived": {
      "risk_band": "low",
      "quality_risk_score": 0.0,
      "latency_risk_score": 1.0,
      "underwriting_summary": "Reachable with acceptable correctness"
    }
  }
]

Plain-English notes

agent_id / result_id: Anonymized IDs for the agent and monitoring run.
request_format: Adapter or protocol used to test the agent.
region_summary: Region(s) where the validation ran.
node_type: Test came from residential devices, not cloud-only synthetic checks.
verdict: Top-level outcome: UP, DEGRADED, DOWN, CLIENT_ERROR, etc.
pass_rate: Share of validations that passed basic checks.
gold_pass_rate: Share of known expected-answer tests that passed.
ttfb_sla_pass: Whether first-token or first-byte latency met the SLA.
policy_conformance_summary: Guardrail and GRC checks against policy rules.
violation_categories: Normalized policy failure categories useful for underwriting.
risk_band: Derived insurance-facing bucket from availability, correctness, latency, and policy signals.

These are anonymized recent examples. If this shape works for your actuarial or risk models, we can provide a larger sample in the same schema during a pilot.
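To show how records in this shape might feed a regional comparison, the sketch below groups runs by `agent_id` and reports the spread in `pass_rate` and `latency_p95_ms` across regions. The three inputs are trimmed copies of the `anon_agent_003` sample records above. The max-minus-min spread and the `regional_spread` helper are assumptions for illustration. They are not the definition of `geo_variance_score` or any other AgentStatus field.

```python
from collections import defaultdict

# Trimmed copies of three sample records above (same agent, three regions).
records = [
    {"agent_id": "anon_agent_003", "region_summary": ["us"], "pass_rate": 1.0, "latency_p95_ms": 9815},
    {"agent_id": "anon_agent_003", "region_summary": ["cn"], "pass_rate": 0.778, "latency_p95_ms": 28546},
    {"agent_id": "anon_agent_003", "region_summary": ["bd"], "pass_rate": 1.0, "latency_p95_ms": 16647},
]


def regional_spread(records):
    """Group runs by agent and compute max-minus-min spreads across regions."""
    by_agent = defaultdict(list)
    for rec in records:
        by_agent[rec["agent_id"]].append(rec)

    out = {}
    for agent_id, runs in by_agent.items():
        pass_rates = [r["pass_rate"] for r in runs]
        # Skip null latencies rather than treating them as zero.
        latencies = [r["latency_p95_ms"] for r in runs if r["latency_p95_ms"] is not None]
        out[agent_id] = {
            "regions": sorted({region for r in runs for region in r["region_summary"]}),
            "pass_rate_spread": round(max(pass_rates) - min(pass_rates), 3),
            "latency_p95_spread_ms": (max(latencies) - min(latencies)) if latencies else None,
        }
    return out


summary = regional_spread(records)
print(summary["anon_agent_003"])
```

For these three runs the pass-rate spread is 0.222 and the p95 latency spread is 18731 ms, which is the kind of geographic gap a single-region check would miss.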

Products you can build

Nine businesses on this data layer.

We published a report on companies that should exist on continuous AI agent behavioral data — insurance underwriting, credit scores, compliance evidence, procurement intelligence, and more. Each one maps directly to fields in the schema above.

Where we fit

Four roles in the risk stack.

01. Underwriting signal

A static questionnaire says what the customer claims. AgentStatus shows how the agent actually behaves: correctness, uptime, drift, latency, geo variance, and failure patterns over time.

02. Independent verification

Your platform may already ingest provider-side or customer-side session logs. AgentStatus is outside-in. The data is collected independently from the operator's internal telemetry, which makes it useful for underwriting, compliance, customer trust, and disputes.

03. Continuous risk scoring

AI-agent risk changes after model updates, prompt edits, workflow changes, vendor outages, and policy updates. AgentStatus provides the behavioral feed that lets data partners update risk posture continuously instead of only at bind time.
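A minimal sketch of what a continuous update could look like on the partner side: compare a baseline pass rate with the current one and raise a drift signal when the drop exceeds a threshold. The field names (`baseline_pass_rate`, `current_pass_rate`, `drift_magnitude`, `drift_signal`) come from the schema above. The subtraction formula, the boolean signal, the 0.05 threshold, and the `check_drift` helper are assumptions, not AgentStatus's actual drift logic.

```python
from dataclasses import dataclass


@dataclass
class DriftCheck:
    baseline_pass_rate: float
    current_pass_rate: float
    drift_magnitude: float
    drift_signal: bool


def check_drift(baseline_pass_rate: float, current_pass_rate: float, threshold: float = 0.05) -> DriftCheck:
    """Flag drift when the pass rate drops by more than `threshold` (absolute, assumed)."""
    magnitude = round(baseline_pass_rate - current_pass_rate, 4)
    return DriftCheck(baseline_pass_rate, current_pass_rate, magnitude, magnitude > threshold)


# Hypothetical example: after a model update the pass rate falls from 0.98 to 0.90.
result = check_drift(0.98, 0.90)
print(result.drift_signal, result.drift_magnitude)
```

Running a check like this on every new result, rather than once at bind time, is what turns the feed into a continuously updated risk posture.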

04. Claims and incident evidence

If a claim happens, the key question is whether the failure was isolated or part of a measurable pattern. AgentStatus can provide historical evidence around prior correctness, drift, latency, regional reliability, and policy conformance.
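One way to frame the isolated-versus-pattern question in code is to count failing runs in a lookback window before the incident. The sketch below uses the `verdict` and `run_completed_at` fields from the sample records. The history rows are made-up illustrative values, and the set of failing verdicts, the 30-day window, and the two-failure cutoff are all assumptions a partner would set for themselves.

```python
from datetime import datetime, timedelta

# Assumed set of failing verdicts; the verdict names themselves come from the samples.
FAILING_VERDICTS = {"DOWN", "DEGRADED", "CLIENT_ERROR"}


def parse_ts(value: str) -> datetime:
    # The samples use a trailing "Z"; swap it so fromisoformat also works on older Pythons.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def prior_failures(history, incident_at: str, lookback_days: int = 30):
    """Return runs with a failing verdict in the window before the incident."""
    end = parse_ts(incident_at)
    start = end - timedelta(days=lookback_days)
    return [
        run for run in history
        if run["verdict"] in FAILING_VERDICTS
        and start <= parse_ts(run["run_completed_at"]) < end
    ]


# Hypothetical history for one agent (not real AgentStatus data).
history = [
    {"result_id": "r1", "verdict": "UP", "run_completed_at": "2026-04-10T12:00:00Z"},
    {"result_id": "r2", "verdict": "DEGRADED", "run_completed_at": "2026-04-20T12:00:00Z"},
    {"result_id": "r3", "verdict": "DOWN", "run_completed_at": "2026-04-28T12:00:00Z"},
    {"result_id": "r4", "verdict": "DOWN", "run_completed_at": "2026-02-01T12:00:00Z"},
]

failures = prior_failures(history, "2026-04-30T00:00:00Z")
pattern = len(failures) >= 2  # hypothetical cutoff for "pattern" versus "isolated"
```

Here two failing runs fall inside the window and the older one is excluded, so the incident would be treated as part of a pattern under these assumed settings.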

Technical integration model

Opt-in, customer-approved monitoring.

The cleanest first integration is opt-in, customer-approved monitoring. The data partner brings an insured or prospective insured customer. The customer authorizes AgentStatus to test one or more agent workflows. We configure test scenarios, run scheduled validations, and expose results back through an API, webhook, export, or dashboard/report card.

Possible delivery paths:

API pull: Your platform queries recent results, agent history, risk profile, or evidence bundles.
Webhook push: AgentStatus sends new test results, drift events, incidents, and policy violations.
Batch export: Daily or weekly JSON/CSV evidence bundles for underwriting or compliance review.
Dashboard link: Customer-specific report card or conformance PDF via the partner portal.
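For the batch export path, a partner may want to flatten JSON records into CSV rows for an underwriting spreadsheet or warehouse load. The sketch below does that with the standard library only, using the record shape from the anonymized samples on this page. The column selection and the `flatten` and `export_csv` helpers are assumptions. The actual export format would be agreed during a pilot.

```python
import csv
import io
import json

# Assumed column selection; every name appears in the sample records.
COLUMNS = ["agent_id", "result_id", "run_completed_at", "verdict", "pass_rate", "latency_p95_ms", "risk_band"]


def flatten(record: dict) -> dict:
    """Pull top-level fields plus risk_band from the nested `derived` object."""
    row = {col: record.get(col) for col in COLUMNS}
    row["risk_band"] = (record.get("derived") or {}).get("risk_band")
    return row


def export_csv(json_text: str) -> str:
    """Convert a JSON array of records into CSV text with a fixed column order."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return buffer.getvalue()


# Trimmed copy of one sample record.
sample = json.dumps([{
    "agent_id": "anon_agent_002", "result_id": "anon_result_002",
    "run_completed_at": "2026-04-30T20:31:35Z", "verdict": "DOWN",
    "pass_rate": 0, "latency_p95_ms": 21059, "derived": {"risk_band": "high"},
}])
csv_text = export_csv(sample)
print(csv_text)
```

A fixed column list keeps the CSV stable even if later records carry extra fields, and null values are written as empty cells.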

The split

Session intelligence plus independent validation.

Your platform sees

• Live sessions
• Provider integrations
• Workflow risk
• Coverage, policy, or compliance lifecycle context

AgentStatus sees

• External behavior under controlled validations
• Correctness over time
• Geographic reliability
• Drift after changes, uptime, latency
• Audit-ready independent evidence

Together, that becomes a stronger product: your session or workflow intelligence plus independent behavioral validation.

What we are not claiming

The independent behavioral data layer.

We are not an insurance carrier.

We are not replacing your session-level monitoring, risk workflow, or core product surface.

We are not asking to scan customers without consent.

We are not asking for production policyholder data unless explicitly approved.

We are the independent behavioral data layer that helps data partners price risk, monitor risk, and prove risk posture over time.

Suggested next steps

Pilot checklist.

01. Pick one pilot workflow

One approved voice or chat agent, 10–20 agreed scenarios, and a two-week monitoring window.

02. Define the minimum data contract

Which fields do you need first: raw logs, normalized scores, incident summaries, evidence bundles, or a lightweight risk profile?

03. Establish data boundaries

No production policyholder data unless explicitly approved. Clear rules for retention, redaction, access control, and evidence handling from day one.
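As one possible shape for a redaction rule, the sketch below replaces free-text evidence fields with a placeholder before a record is stored or shared. The field names (`prompt`, `response_text`, `response_preview`) come from the validation-level evidence list. Which fields to redact, the placeholder string, and the `redact` helper are assumptions to be settled in the data contract.

```python
import copy

# Free-text evidence fields from the validation-level list; redacting these is an assumption.
REDACTED_FIELDS = ("prompt", "response_text", "response_preview")
PLACEHOLDER = "[REDACTED]"


def redact(evidence: dict) -> dict:
    """Return a copy of an evidence object with free-text fields replaced by a placeholder."""
    cleaned = copy.deepcopy(evidence)  # never mutate the caller's record
    for field in REDACTED_FIELDS:
        if field in cleaned:
            cleaned[field] = PLACEHOLDER
    return cleaned


# Hypothetical evidence object for illustration.
evidence = {"prompt": "example prompt", "response_text": "example response", "http_status": 200, "eval_pass": True}
safe = redact(evidence)
print(safe)
```

Structured fields such as `http_status` and `eval_pass` pass through unchanged, so scores and verdicts remain usable after redaction.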

You build the risk, compliance, or analytics layer. AgentStatus provides the independent behavioral data underneath it.

Start with one approved agent workflow, generate two weeks of test data, and define the first version of your risk feed. Or apply to the data partner program for portal access and pilot onboarding.

The data layer underneath agent risk is what we build.

Metrics are stated with explicit definitions: validations are scheduled executions over approximately two months; agent records are database rows, not revenue customers. AgentStatus is not an insurance carrier and does not replace your session-level monitoring, risk workflow, or core product.

Nine businesses on this data layer:

AI Agent Insurance Underwriting
Agent Credit Scores
Compliance Evidence-as-a-Service
Model Update Impact Intelligence
Agent Procurement Intelligence
Geographic Access Intelligence
SLA Verification for AI Agent Contracts
AI Agent Security Posture Scoring
The AI Agent Behavioral Research Dataset