Latency & TTFB SLA

Fast responses matter. Agent Status tracks latency and enforces SLA thresholds to ensure your agent provides a good user experience.

Why Latency Matters

A correct response that takes 30 seconds is still a failed response for your users. As a rough guide to how users perceive wait times:

  • <1 second — Users feel the response is instant
  • 1-3 seconds — Acceptable for most applications
  • 3-10 seconds — Users get impatient, may retry
  • >10 seconds — Users abandon, assume it's broken

Agent Status helps you catch slow responses before users complain.

Metrics We Track

Time to First Byte (TTFB)

The time from when the request is sent until the first byte of the response is received. This is the key UX metric: it marks the moment the user sees "typing..."

Request sent → [network] → Server processing → [network] → First byte
              └──────────────── TTFB ─────────────────────┘

Total Latency

The full response time, from when the request is sent until the complete response has been received.
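To make the two definitions concrete, here is a minimal Python sketch that times a single streamed response. It is an illustration only, not how Agent Status measures internally. The names `measure_timing` and `send` are hypothetical: `send` is assumed to be a callable that issues the request and returns an iterable of response chunks.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_timing(send: Callable[[], Iterable[bytes]]) -> Tuple[float, float]:
    """Return (ttfb_ms, total_ms) for one request.

    `send` is a hypothetical callable that issues the request and returns
    an iterable of response chunks (for example, a streaming HTTP body).
    """
    start = time.perf_counter()
    ttfb_ms = None
    for chunk in send():
        # The first non-empty chunk marks the time to first byte.
        if ttfb_ms is None and chunk:
            ttfb_ms = (time.perf_counter() - start) * 1000.0
    # The loop ends once the full response has been consumed.
    total_ms = (time.perf_counter() - start) * 1000.0
    if ttfb_ms is None:
        # Assumption: if no bytes arrived, report TTFB as the full wait.
        ttfb_ms = total_ms
    return ttfb_ms, total_ms
```

Total latency is always at least as large as TTFB. For a short, non-streamed reply the two are nearly equal; for a long streamed reply the gap can be several seconds.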

Latency Percentiles

We track percentiles across all validations:

Metric           Meaning
latency_p50_ms   Median (half of all responses are faster than this)
latency_p95_ms   95th percentile (95% of responses are at least this fast)
latency_p99_ms   99th percentile (near worst case; only 1% are slower)
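How a percentile is computed affects the numbers you see, so a small sketch may help. The Python below uses the nearest-rank method, which is one common definition. Whether Agent Status uses this method or an interpolating one is not stated here, so treat `percentile` and `latency_summary` as hypothetical helpers.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Summarise latencies using the field names from the table above."""
    return {
        "latency_p50_ms": percentile(latencies_ms, 50),
        "latency_p95_ms": percentile(latencies_ms, 95),
        "latency_p99_ms": percentile(latencies_ms, 99),
    }


# Five validations, one of them very slow.
summary = latency_summary([100, 200, 300, 400, 5000])
```

With only five samples, the single slow response sets both the p95 and the p99 while leaving the median untouched. This is why the tail percentiles are tracked separately from the median.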

TTFB Percentiles

Similar statistics are tracked for time to first byte:

Metric        Meaning
ttfb_p50_ms   Median TTFB
ttfb_p95_ms   95th percentile TTFB
ttfb_max_ms   Slowest TTFB observed

TTFB SLA Threshold

You can set a TTFB SLA threshold (default: 5000ms / 5 seconds).

TTFB SLA: 5000ms

If TTFB exceeds your threshold:

  • The validation is marked as an SLA miss
  • Overall verdict may be DEGRADED
  • You'll see ttfb_sla_pass: false in results

This catches "slow but technically working" scenarios.
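The check itself is a single comparison. The sketch below is an assumption about the boundary case: since a miss is described as TTFB exceeding the threshold, a TTFB exactly equal to the threshold is treated as a pass. The function name is hypothetical and simply mirrors the `ttfb_sla_pass` field.

```python
DEFAULT_TTFB_SLA_MS = 5000  # default threshold stated above


def ttfb_sla_pass(ttfb_ms: float, threshold_ms: float = DEFAULT_TTFB_SLA_MS) -> bool:
    """Return True when TTFB is within the threshold.

    Assumption: only a TTFB strictly greater than the threshold is a miss.
    """
    return ttfb_ms <= threshold_ms
```

For example, `ttfb_sla_pass(4200)` is a pass under the default, while `ttfb_sla_pass(2500, threshold_ms=2000)` is a miss.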

Impact on Verdict

Latency affects your verdict:

Scenario                        Verdict
All fast, all correct           UP
Fast, some correctness issues   DEGRADED
Slow (SLA miss), all correct    DEGRADED
Slow + incorrect                DOWN

You can have perfect correctness and still be DEGRADED if responses are too slow.
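The table can be read as a two-input decision. The Python sketch below is a simplification, not the actual verdict logic: it collapses "some correctness issues" and "incorrect" into one boolean, and the function name `verdict` is hypothetical. The real rules may weigh partial failures differently.

```python
def verdict(sla_met: bool, all_correct: bool) -> str:
    """Map the two checks onto UP / DEGRADED / DOWN, following the table."""
    if sla_met and all_correct:
        return "UP"
    if sla_met or all_correct:
        # Exactly one of the two checks failed.
        return "DEGRADED"
    return "DOWN"
```

Under this reading, `verdict(sla_met=False, all_correct=True)` returns `"DEGRADED"`, which is the "slow but technically working" case.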

Latency Breakdown

Agent Status records timing components to help diagnose slow responses:

Component   What It Measures
dns_ms      DNS resolution time
tcp_ms      TCP connection time
tls_ms      TLS handshake time
ttfb_ms     Time to first byte
total_ms    Total response time

This helps identify whether slowness is:

  • Network — High DNS/TCP/TLS times
  • Server — High TTFB after connection established
  • Response size — Total >> TTFB (large response)
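As a rough illustration of that diagnosis, the Python sketch below splits one timing record into three phases and reports the largest. It rests on an assumption that is not confirmed above: that `ttfb_ms` and `total_ms` are both measured from the start of the request, so `ttfb_ms` already includes DNS, TCP, and TLS time. The function name `slowest_phase` is hypothetical.

```python
from typing import Dict


def slowest_phase(timing: Dict[str, float]) -> str:
    """Return 'network', 'server', or 'response size', whichever phase
    accounts for the most time.

    Assumes ttfb_ms and total_ms are measured from the start of the
    request, so ttfb_ms includes the connection setup.
    """
    connect_ms = timing["dns_ms"] + timing["tcp_ms"] + timing["tls_ms"]
    phases = {
        "network": connect_ms,
        # Wait between connection established and first byte.
        "server": max(0.0, timing["ttfb_ms"] - connect_ms),
        # Time spent receiving the rest of the response.
        "response size": max(0.0, timing["total_ms"] - timing["ttfb_ms"]),
    }
    return max(phases, key=phases.get)


example = {"dns_ms": 20, "tcp_ms": 30, "tls_ms": 50, "ttfb_ms": 2100, "total_ms": 2300}
phase = slowest_phase(example)  # connection 100 ms, wait 2000 ms, download 200 ms
```

The "server" figure also contains one network round trip for the request and the first byte, so a small value there does not necessarily point at the server.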

Configuring TTFB SLA

When creating or editing an agent:

TTFB SLA Threshold: 5000 (ms)

Recommendations:

Agent Type            Recommended TTFB SLA
Chatbot               3000-5000ms
Code assistant        5000-10000ms
Simple Q&A            2000-3000ms
Streaming responses   1000-2000ms (first chunk)

Viewing Latency Data

In Dashboard

  • Latency shown on agent cards
  • Historical graph shows P50/P95 over time
  • Click a run to see per-validation latencies

Via API

GET /api/v1/agents/{agent_id}/status

Returns:

{
  "latency_p50_ms": 1234,
  "latency_p95_ms": 2456,
  "ttfb_p50_ms": 890,
  "ttfb_sla_pass": true
}
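A minimal Python sketch of calling this endpoint with the standard library follows. Several details are assumptions: the host in `BASE_URL` is a placeholder, authentication is not described above and is left to the caller through `headers`, and `needs_attention` with its `p95_budget_ms` parameter is a hypothetical helper rather than part of the API.

```python
import json
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import quote

BASE_URL = "https://agentstatus.example"  # placeholder host, replace with yours


def status_url(agent_id: str, base_url: str = BASE_URL) -> str:
    """Build the status URL for one agent."""
    return f"{base_url}/api/v1/agents/{quote(agent_id, safe='')}/status"


def fetch_status(agent_id: str, headers: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """GET the status endpoint and decode the JSON body.

    Any authentication headers are an assumption and must be supplied
    by the caller.
    """
    request = urllib.request.Request(status_url(agent_id), headers=headers or {})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def needs_attention(status: Dict[str, Any], p95_budget_ms: float) -> bool:
    """Flag an agent whose SLA check failed or whose p95 is over budget."""
    return (not status["ttfb_sla_pass"]) or status["latency_p95_ms"] > p95_budget_ms
```

With the sample response above, `needs_attention(status, p95_budget_ms=3000)` is false, because the SLA check passed and the p95 latency of 2456 ms is under the budget.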

Troubleshooting Slow Responses

High DNS time:

  • DNS server issues
  • Missing DNS caching
  • CDN not configured

High TCP/TLS time:

  • Server far from users
  • No edge caching
  • TLS misconfiguration

High TTFB (after connection):

  • Slow model inference
  • Database queries
  • Upstream API calls
  • Cold starts (serverless)

High total (after TTFB):

  • Large response payloads
  • Streaming taking too long
  • Network throttling
