Latency & TTFB SLA

Fast responses matter. Agent Status tracks latency and enforces SLA thresholds to ensure your agent provides a good user experience.

Why Latency Matters

A correct response that takes 30 seconds is still a failed response for your users. As a rough guide to how users perceive response time:

<1 second — Users feel the response is instant
1-3 seconds — Acceptable for most applications
3-10 seconds — Users get impatient, may retry
>10 seconds — Users abandon, assume it's broken

Agent Status helps you catch slow responses before users complain.

Metrics We Track

Time to First Byte (TTFB)

The time from when the request is sent to when the first byte of the response is received. This is the key UX metric: it marks the moment the user sees "typing..."

Request sent → [network] → Server processing → [network] → First byte
              └──────────────── TTFB ─────────────────────┘

Total Latency

The full response time, from sending the request to receiving the complete response.
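To make the two definitions concrete, here is a minimal Python sketch that times a streamed response on the client side. The `time_stream` helper and its `send_request` callable are hypothetical names used for illustration only; this is not how Agent Status itself measures timing, just one way to reproduce the two numbers.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_stream(
    send_request: Callable[[], Iterable[bytes]],
) -> Tuple[Optional[float], float]:
    """Return (ttfb_ms, total_ms) for a response delivered as byte chunks.

    `send_request` is a hypothetical callable that sends the request and
    returns an iterable of response chunks.
    """
    start = time.perf_counter()
    ttfb_ms: Optional[float] = None
    for chunk in send_request():
        # The first non-empty chunk marks the time to first byte.
        if ttfb_ms is None and chunk:
            ttfb_ms = (time.perf_counter() - start) * 1000
    # Total latency is measured once the stream is exhausted.
    total_ms = (time.perf_counter() - start) * 1000
    return ttfb_ms, total_ms
```

If the response contains no bytes at all, `ttfb_ms` is `None` while `total_ms` is still reported.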

Latency Percentiles

We track percentiles across all validations:

Metric	Meaning
`latency_p50_ms`	Median (50% of responses are faster than this)
`latency_p95_ms`	95th percentile (95% of responses are faster than this)
`latency_p99_ms`	99th percentile (tail latency; only 1% of responses are slower)
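As an illustration of what these numbers mean, the sketch below computes them from a list of latency samples using the nearest-rank method. The method is an assumption: the exact percentile calculation Agent Status uses is not specified here, and other methods (such as interpolation) give slightly different values on small samples. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Build the three latency percentile fields from raw samples."""
    return {
        "latency_p50_ms": percentile(samples_ms, 50),
        "latency_p95_ms": percentile(samples_ms, 95),
        "latency_p99_ms": percentile(samples_ms, 99),
    }
```

With only a handful of samples, p95 and p99 both collapse to the slowest observation, so high percentiles become meaningful only once enough validations have run.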

TTFB Percentiles

Similar statistics are tracked for time to first byte:

Metric	Meaning
`ttfb_p50_ms`	Median TTFB
`ttfb_p95_ms`	95th percentile TTFB
`ttfb_max_ms`	Slowest TTFB observed

TTFB SLA Threshold

You can set a TTFB SLA threshold (default: 5000ms / 5 seconds).

TTFB SLA: 5000ms

If TTFB exceeds your threshold:

The validation is marked as an SLA miss
The overall verdict may be DEGRADED
You'll see ttfb_sla_pass: false in the results

This catches "slow but technically working" scenarios.

Impact on Verdict

Latency affects your verdict:

Scenario	Verdict
All fast, all correct	UP
Fast, some correctness issues	DEGRADED
Slow (SLA miss), all correct	DEGRADED
Slow + incorrect	DOWN

You can have perfect correctness and still be DEGRADED if responses are too slow.
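The table can be read as a simple decision rule. The sketch below is one possible reading of it, not the actual Agent Status logic: it assumes the verdict is derived from two aggregate flags (all validations correct, all within the TTFB SLA), and the `ValidationResult` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ValidationResult:
    correct: bool   # did the validation pass its correctness checks?
    ttfb_ms: float  # measured time to first byte


def ttfb_sla_pass(ttfb_ms: float, threshold_ms: float = 5000) -> bool:
    """An SLA miss occurs when TTFB exceeds the threshold."""
    return ttfb_ms <= threshold_ms


def verdict(results: Sequence[ValidationResult], threshold_ms: float = 5000) -> str:
    """Map a run to UP / DEGRADED / DOWN following the table above."""
    if not results:
        raise ValueError("need at least one validation result")
    all_correct = all(r.correct for r in results)
    all_fast = all(ttfb_sla_pass(r.ttfb_ms, threshold_ms) for r in results)
    if all_correct and all_fast:
        return "UP"
    if all_correct or all_fast:
        return "DEGRADED"  # slow but correct, or fast with correctness issues
    return "DOWN"          # slow and incorrect
```

For example, `verdict([ValidationResult(True, 7200)])` yields `"DEGRADED"` under the default 5000 ms threshold, matching the "Slow (SLA miss), all correct" row.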

Latency Breakdown

Agent Status records timing components to help diagnose slow responses:

Component	What It Measures
`dns_ms`	DNS resolution time
`tcp_ms`	TCP connection time
`tls_ms`	TLS handshake time
`ttfb_ms`	Time to first byte
`total_ms`	Total response time

This helps identify where the slowness comes from:

Network — High DNS/TCP/TLS times
Server — High TTFB after connection established
Response size — Total >> TTFB (large response)
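A rough triage along these lines could be scripted as follows. This sketch rests on assumptions that should be checked against real data: it treats `ttfb_ms` as measured from the start of the request (so it includes DNS, TCP and TLS time), and the budget values are arbitrary illustrations, not Agent Status defaults. The `diagnose` function is hypothetical.

```python
from typing import Dict, List


def diagnose(
    timing: Dict[str, float],
    connect_budget_ms: float = 300,   # illustrative budget for DNS + TCP + TLS
    server_budget_ms: float = 2000,   # illustrative budget for server processing
    transfer_ratio: float = 2.0,      # total_ms / ttfb_ms ratio that counts as "Total >> TTFB"
) -> List[str]:
    """Return the likely sources of slowness for one timing record."""
    connect_ms = timing["dns_ms"] + timing["tcp_ms"] + timing["tls_ms"]
    # Assumption: ttfb_ms includes connection setup, so subtract it to
    # estimate the time spent waiting on the server.
    server_ms = max(timing["ttfb_ms"] - connect_ms, 0)

    findings: List[str] = []
    if connect_ms > connect_budget_ms:
        findings.append("network")
    if server_ms > server_budget_ms:
        findings.append("server")
    if timing["total_ms"] > transfer_ratio * timing["ttfb_ms"]:
        findings.append("response size")
    return findings
```

An empty list means none of the three patterns stood out under the chosen budgets.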

Configuring TTFB SLA

When creating or editing an agent:

TTFB SLA Threshold: 5000 (ms)

Recommendations:

Agent Type	Recommended TTFB SLA
Chatbot	3000-5000ms
Code assistant	5000-10000ms
Simple Q&A	2000-3000ms
Streaming responses	1000-2000ms (first chunk)

Viewing Latency Data

In Dashboard

Latency is shown on agent cards
A historical graph shows P50/P95 over time
Click a run to see per-validation latencies

Via API

GET /api/v1/agents/{agent_id}/status

Returns:

{
  "latency_p50_ms": 1234,
  "latency_p95_ms": 2456,
  "ttfb_p50_ms": 890,
  "ttfb_sla_pass": true
}
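A minimal Python sketch for polling this endpoint is shown below, using only the standard library. The base URL is a placeholder, and the optional `headers` argument stands in for whatever authentication your account requires, which is not covered on this page; both are assumptions. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, Optional

# Placeholder host: replace with your actual Agent Status base URL.
BASE_URL = "https://agent-status.example.com"


def status_url(agent_id: str) -> str:
    """Build the status endpoint URL for one agent."""
    return f"{BASE_URL}/api/v1/agents/{agent_id}/status"


def fetch_status(agent_id: str, headers: Optional[Dict[str, str]] = None) -> dict:
    """GET the status document and decode it as JSON."""
    request = urllib.request.Request(status_url(agent_id), headers=headers or {})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def describe(status: dict) -> str:
    """Summarize the latency fields shown in the example response."""
    sla = "pass" if status["ttfb_sla_pass"] else "MISS"
    return (
        f"latency p50={status['latency_p50_ms']}ms "
        f"p95={status['latency_p95_ms']}ms, "
        f"ttfb p50={status['ttfb_p50_ms']}ms, ttfb_sla={sla}"
    )
```

Calling `describe` on the example response above produces `latency p50=1234ms p95=2456ms, ttfb p50=890ms, ttfb_sla=pass`.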

Troubleshooting Slow Responses

High DNS time:

DNS server issues
Missing DNS caching
CDN not configured

High TCP/TLS time:

Server far from users
No edge caching
TLS misconfiguration

High TTFB (after connection):

Slow model inference
Database queries
Upstream API calls
Cold starts (serverless)

High total (after TTFB):

Large response payloads
Streaming taking too long
Network throttling