Latency & TTFB SLA
Fast responses matter. Agent Status tracks latency and enforces SLA thresholds to ensure your agent provides a good user experience.
Why Latency Matters
A correct response that takes 30 seconds is a failed response for your users. As a rough guide to how users perceive waiting time:
- <1 second — Users feel the response is instant
- 1-3 seconds — Acceptable for most applications
- 3-10 seconds — Users get impatient, may retry
- >10 seconds — Users abandon, assume it's broken
Agent Status helps you catch slow responses before users complain.
Metrics We Track
Time to First Byte (TTFB)
The time from request sent to first byte received. This is the key UX metric — it's when the user sees "typing..."
Request sent → [network] → Server processing → [network] → First byte
└──────────────── TTFB ─────────────────────┘
Total Latency
Full response time from request to complete response.
Latency Percentiles
We track percentiles across all validations:
| Metric | Meaning |
|---|---|
| `latency_p50_ms` | Median (50% of responses are faster than this) |
| `latency_p95_ms` | 95th percentile (95% of responses are faster than this) |
| `latency_p99_ms` | 99th percentile (only 1% of responses are slower; close to worst case) |
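To make the percentile fields concrete, here is a minimal sketch that derives them from a list of latency samples. It uses the nearest-rank method, which is an assumption: the article does not say which percentile method Agent Status uses, and the sample values are invented for illustration.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical total latencies (ms) from ten validations.
latencies_ms = [820, 910, 1040, 1180, 1234, 1300, 1450, 1900, 2456, 6100]

summary = {
    "latency_p50_ms": percentile(latencies_ms, 50),
    "latency_p95_ms": percentile(latencies_ms, 95),
    "latency_p99_ms": percentile(latencies_ms, 99),
}
print(summary)
```

With only ten samples, p95 and p99 both land on the single slowest response, so high percentiles are more informative once many validations have accumulated.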
TTFB Percentiles
Same percentiles for time-to-first-byte:
| Metric | Meaning |
|---|---|
| `ttfb_p50_ms` | Median TTFB |
| `ttfb_p95_ms` | 95th percentile TTFB |
| `ttfb_max_ms` | Slowest TTFB observed |
TTFB SLA Threshold
You can set a TTFB SLA threshold (default: 5000ms / 5 seconds).
TTFB SLA: 5000ms
If TTFB exceeds your threshold:
- The validation is marked as SLA miss
- Overall verdict may be DEGRADED
- You'll see `ttfb_sla_pass: false` in results
This catches "slow but technically working" scenarios.
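The threshold check itself can be sketched in a few lines. This is an illustration rather than Agent Status's actual logic: the function name is hypothetical, and it assumes "exceeds" means strictly greater than the threshold.

```python
DEFAULT_TTFB_SLA_MS = 5000  # default threshold quoted in the text


def ttfb_sla_pass(ttfb_ms, threshold_ms=DEFAULT_TTFB_SLA_MS):
    """Return True when TTFB is within the threshold.

    Assumption: a TTFB exactly equal to the threshold still passes,
    because only values that exceed it count as an SLA miss.
    """
    return ttfb_ms <= threshold_ms


# Hypothetical per-validation TTFB readings in milliseconds.
ttfb_readings_ms = [890, 1400, 5000, 7200]
sla_misses = [t for t in ttfb_readings_ms if not ttfb_sla_pass(t)]
print(sla_misses)  # [7200]
```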
Impact on Verdict
Latency affects your verdict:
| Scenario | Verdict |
|---|---|
| All fast, all correct | UP |
| Fast, some correctness issues | DEGRADED |
| Slow (SLA miss), all correct | DEGRADED |
| Slow + incorrect | DOWN |
You can have perfect correctness and still be DEGRADED if responses are too slow.
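The four rows of the table can be read as a function of two booleans. The sketch below is one such reading, not the product's real verdict logic: the function and parameter names are hypothetical, and cases the table does not cover (for example, fast but entirely incorrect) are folded into the nearest row.

```python
def verdict(all_correct: bool, within_sla: bool) -> str:
    """Map the table's scenarios to a verdict string."""
    if all_correct and within_sla:
        return "UP"        # all fast, all correct
    if not all_correct and not within_sla:
        return "DOWN"      # slow and incorrect
    return "DEGRADED"      # exactly one of the two checks failed


for correct, fast in [(True, True), (False, True), (True, False), (False, False)]:
    print(f"correct={correct} fast={fast} -> {verdict(correct, fast)}")
```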
Latency Breakdown
Agent Status records timing components to help diagnose slow responses:
| Component | What It Measures |
|---|---|
| `dns_ms` | DNS resolution time |
| `tcp_ms` | TCP connection time |
| `tls_ms` | TLS handshake time |
| `ttfb_ms` | Time to first byte |
| `total_ms` | Total response time |
This helps identify whether slowness is:
- Network — High DNS/TCP/TLS times
- Server — High TTFB after connection established
- Response size — Total >> TTFB (large response)
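The three categories above can be turned into a rough triage helper. This sketch rests on an assumption the article does not confirm: that `ttfb_ms` is measured from the start of the request and therefore includes DNS, TCP, and TLS time. The function name and the sample timings are hypothetical.

```python
def dominant_phase(timing):
    """Return which phase accounts for the most time in one validation.

    Assumption: ttfb_ms includes connection setup (dns + tcp + tls),
    so server wait is whatever remains of TTFB after connecting.
    """
    connect_ms = timing["dns_ms"] + timing["tcp_ms"] + timing["tls_ms"]
    phases = {
        "network": connect_ms,
        "server": max(0, timing["ttfb_ms"] - connect_ms),
        "response size": max(0, timing["total_ms"] - timing["ttfb_ms"]),
    }
    return max(phases, key=phases.get)


# Hypothetical breakdown: quick connection, long wait for the first byte.
sample = {"dns_ms": 20, "tcp_ms": 35, "tls_ms": 60, "ttfb_ms": 4200, "total_ms": 4500}
print(dominant_phase(sample))  # server
```

If `ttfb_ms` turns out to be measured only after the connection is established, the subtraction of `connect_ms` should be dropped.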
Configuring TTFB SLA
When creating or editing an agent:
TTFB SLA Threshold: 5000 (ms)
Recommendations:
| Agent Type | Recommended TTFB SLA |
|---|---|
| Chatbot | 3000-5000ms |
| Code assistant | 5000-10000ms |
| Simple Q&A | 2000-3000ms |
| Streaming responses | 1000-2000ms (first chunk) |
Viewing Latency Data
In Dashboard
- Latency shown on agent cards
- Historical graph shows P50/P95 over time
- Click a run to see per-validation latencies
Via API
GET /api/v1/agents/{agent_id}/status
Returns:
{
  "latency_p50_ms": 1234,
  "latency_p95_ms": 2456,
  "ttfb_p50_ms": 890,
  "ttfb_sla_pass": true
}
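A minimal sketch of calling this endpoint from Python with only the standard library follows. The base URL is a placeholder, and the article does not describe authentication, so none is shown; both are assumptions to replace with your real values. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://agentstatus.example"  # placeholder, not the real host


def status_url(agent_id, base_url=BASE_URL):
    """Build the status URL for one agent, escaping the ID for use in a path."""
    return f"{base_url}/api/v1/agents/{quote(agent_id, safe='')}/status"


def fetch_status(agent_id, base_url=BASE_URL, timeout=10):
    """GET the status document and decode it as JSON (no auth shown)."""
    with urllib.request.urlopen(status_url(agent_id, base_url), timeout=timeout) as resp:
        return json.load(resp)


def missed_ttfb_sla(status):
    """True only when the response explicitly reports an SLA miss."""
    return status.get("ttfb_sla_pass") is False


# Parsing the sample response shown above, without any network call.
sample = json.loads(
    '{"latency_p50_ms": 1234, "latency_p95_ms": 2456,'
    ' "ttfb_p50_ms": 890, "ttfb_sla_pass": true}'
)
print(missed_ttfb_sla(sample))  # False
```

In practice you would call `fetch_status("your-agent-id")` and pass the result to `missed_ttfb_sla`, for example from a scheduled job that alerts on SLA misses.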
Troubleshooting Slow Responses
High DNS time:
- DNS server issues
- Missing DNS caching
- CDN not configured
High TCP/TLS time:
- Server far from users
- No edge caching
- TLS misconfiguration
High TTFB (after connection):
- Slow model inference
- Database queries
- Upstream API calls
- Cold starts (serverless)
High total (after TTFB):
- Large response payloads
- Streaming taking too long
- Network throttling