Your agent works. Your model provider is up. Your prompt evals pass. Your dashboard is green.
Then a customer pings you: "The 'lookup_order' tool keeps failing."
You check the agent endpoint: 200 OK. You check the LLM API: green. You check your retrieval store: fine. Two hours later you discover the problem wasn't in any of those layers. The Model Context Protocol (MCP) server that exposes your tools to the agent had silently changed transports during a deploy. The agent was reaching it. The handshake was failing. Every tool call was returning an error wrapped in a perfectly valid agent response.
This is the layer almost no monitoring stack covers, and the one most likely to break next.
Why MCP Is Different (And Dangerous)
MCP is the protocol that exposes tools, resources, and prompts to LLM agents. In 2025 it was a curiosity. By mid-2026 it's everywhere: Claude Desktop, Cursor, agentic IDEs, internal LLM gateways, and most production agent stacks now sit on top of one or more MCP servers.
Three properties make MCP failures uniquely invisible to traditional monitoring:
- The agent is the consumer, not the user. A broken MCP call doesn't return a 5xx to your user-facing API. It returns a successful agent response that contains an error message, an apology, or worse, a hallucinated answer.
- Transports drift. A server can switch transports from one deploy to the next (for example, from the legacy HTTP+SSE transport to Streamable HTTP). Your agent SDK may downgrade silently. Your monitoring sees a 200 because tools/list returned something, even if that something is unusable.
- The contract isn't a URL. It's a JSON-RPC method set: tools/list, resources/list, prompts/list, plus per-tool tools/call. None of those show up in URL-level uptime checks.
If you're monitoring MCP servers the way you monitor REST APIs, you're not monitoring them at all.
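To make the "method set, not URL" point concrete, here is a minimal sketch of the JSON-RPC 2.0 bodies a client sends for discovery and invocation. The tool name lookup_order and its order_id argument are hypothetical, and a real session must first complete MCP's initialize handshake, which is omitted here.

```python
import json
from typing import Optional


def jsonrpc_request(request_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return body


# Discovery: one request per capability listing.
discovery = [
    jsonrpc_request(1, "tools/list"),
    jsonrpc_request(2, "resources/list"),
    jsonrpc_request(3, "prompts/list"),
]

# Invocation: tools/call names the tool and passes its arguments.
# "lookup_order" and "order_id" are hypothetical.
invocation = jsonrpc_request(
    4, "tools/call", {"name": "lookup_order", "arguments": {"order_id": "TEST-123"}}
)

# Over Streamable HTTP all of these are POSTed to the same endpoint URL,
# so only the request body says which method is being exercised.
for body in discovery + [invocation]:
    print(json.dumps(body))
```

Because every method shares one URL, a check that only looks at the URL's status code cannot distinguish a healthy tools/call from a broken one.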
Failure Mode 1: The Transport Swap
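When a server switches transports between deploys, a transport-aware validate records the change explicitly. In the Agent Status validate path, this is the difference between:

```python
result['error'] = 'transport_not_supported: sse'
result['transport_type'] = 'sse'
result['transport_supported'] = False
```

and the silent 200 your dashboard would otherwise report.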
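One way to produce fields like these is sketched below. It is a hypothetical classifier, loosely modeled on the backwards-compatibility guidance in the MCP specification: POST an InitializeRequest first, and if that fails with a 4xx, fall back to a GET and look for a legacy SSE stream. The function names and the SUPPORTED_TRANSPORTS set are assumptions, and the HTTP calls themselves are left out so the decision logic can be checked on its own.

```python
from typing import Optional

# Assumption: the transports our client SDK can actually speak.
SUPPORTED_TRANSPORTS = {"streamable_http"}


def _media_type(content_type: str) -> str:
    """Strip parameters such as '; charset=utf-8' from a Content-Type value."""
    return content_type.split(";")[0].strip().lower()


def classify_transport(post_status: int, post_content_type: str,
                       get_content_type: Optional[str] = None) -> str:
    """Guess the server transport from the initialize POST and an optional fallback GET."""
    # Streamable HTTP answers the POST with JSON or an SSE stream.
    if 200 <= post_status < 300 and _media_type(post_content_type) in (
        "application/json", "text/event-stream"
    ):
        return "streamable_http"
    # A 4xx on POST plus an SSE stream on GET suggests the legacy HTTP+SSE transport.
    if 400 <= post_status < 500 and get_content_type is not None:
        if _media_type(get_content_type) == "text/event-stream":
            return "sse"
    return "unknown"


def transport_result(transport_type: str) -> dict:
    """Build validate fields in the same shape as the result dict shown above."""
    result = {
        "transport_type": transport_type,
        "transport_supported": transport_type in SUPPORTED_TRANSPORTS,
    }
    if not result["transport_supported"]:
        result["error"] = f"transport_not_supported: {transport_type}"
    return result
```

For example, a server that rejects the POST with 405 but serves text/event-stream on GET yields transport_type 'sse', transport_supported False, and the transport_not_supported error, instead of a bare 200.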
Failure Mode 2: The Vanishing Tool
A real validate should answer three distinct questions:
| Question | Validation |
|---|---|
| Is the server reachable? | tools/list returns 200 with valid JSON-RPC |
| Are the right tools present? | Diff result.tools against expected set |
| Do the tools actually work? | tools/call with a known input, then assert the known output |
A tool that exists in the listing but errors on every call is the most expensive kind of broken: it looks healthy and acts dead.
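A minimal sketch of the three checks in the table, assuming the JSON-RPC responses have already been fetched and parsed into dicts. The helper names are hypothetical; the field names (result.tools, content, isError) follow the MCP tools/list and tools/call result shapes. Note the two distinct failure paths in check_call: a protocol-level JSON-RPC error, and a tool result flagged isError, which is exactly the listed-but-dead tool described above.

```python
from typing import Iterable


def check_listing(response: dict) -> bool:
    """Reachability: a well-formed JSON-RPC 2.0 result containing a tools array."""
    result = response.get("result") or {}
    return (
        response.get("jsonrpc") == "2.0"
        and "error" not in response
        and isinstance(result.get("tools"), list)
    )


def diff_inventory(response: dict, expected: Iterable[str]) -> dict:
    """Inventory: compare listed tool names against the expected set."""
    listed = {tool["name"] for tool in response["result"]["tools"]}
    wanted = set(expected)
    return {"missing": sorted(wanted - listed), "unexpected": sorted(listed - wanted)}


def check_call(response: dict, expected_text: str) -> bool:
    """Behavior: the tools/call result is not an error and contains the known output."""
    result = response.get("result")
    # Protocol error, missing result, or a tool-level error all count as failure.
    if "error" in response or result is None or result.get("isError", False):
        return False
    texts = [c.get("text", "") for c in result.get("content", []) if c.get("type") == "text"]
    return any(expected_text in text for text in texts)
```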
Failure Mode 3: The Slow Discovery
Failure Mode 4: The Auth Boundary
Failure Mode 5: The Smithery / Gateway Hop
What "Real" MCP Monitoring Looks Like
A complete MCP validate answers six questions on every cycle:
- Reachability: did tools/list succeed?
- Transport: is the response in a transport your agent's SDK supports?
- Inventory: are the expected tools, resources, and prompts present and correctly named?
- Schema: does each tool's input/output contract match what the agent prompt expects?
- Behavior: does at least one tools/call per critical tool return the expected output for a known input?
- Performance: is discovery plus invocation latency within the user-experience SLA?
A failure of any one is a customer-visible defect, even when the URL still returns 200.
This is exactly the validate surface Agent Status runs against your MCP servers. The verdict isn't UP/DOWN based on HTTP. It's:
- UP: reachable, transport supported, all tool validations pass
- DEGRADED(mcp_tool_fail): reachable, but one or more tool calls fail
- DEGRADED(mcp_transport_unsupported: <type>): the server changed transports out from under your client
- DOWN(mcp_unreachable): the handshake failed entirely
Each verdict is attributable, alertable, and actionable.
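A sketch of how those verdicts might be derived from a single validate cycle. The ValidateResult fields are hypothetical, and the precedence (unreachable first, then unsupported transport, then tool failures) is an assumption rather than documented Agent Status behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValidateResult:
    """Hypothetical summary of one validate cycle."""
    reachable: bool
    transport_type: Optional[str] = None
    transport_supported: bool = True
    failed_tools: List[str] = field(default_factory=list)


def verdict(r: ValidateResult) -> str:
    """Map a validate result to a verdict string, most severe condition first."""
    if not r.reachable:
        return "DOWN(mcp_unreachable)"
    if not r.transport_supported:
        return f"DEGRADED(mcp_transport_unsupported: {r.transport_type})"
    if r.failed_tools:
        return "DEGRADED(mcp_tool_fail)"
    return "UP"
```

Keeping the verdict a pure function of the collected facts makes each alert traceable to the specific check that produced it.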
The Validation Profiles That Matter
Different MCP servers need different validate depths. We use four:
| Profile | What it does | When to use |
|---|---|---|
| health_only | tools/list only | Public servers, cheap continuous checks |
| full_discovery | tools + resources + prompts | Servers exposing more than just tools |
| tool_contract | Discovery + named tools/call validations | Production servers with known critical tools |
| full_validation | Discovery + auto-generated validations from listings | Servers under active development |
Most teams default to health_only, which is roughly equivalent to a TCP ping. It's better than nothing, but it misses everything that matters.
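As a rough sketch, the four profiles can be expressed as validate plans. This assumes tool_contract and full_validation both start from full discovery, and it represents each invocation as a placeholder step; generating real inputs for auto-generated validations is out of scope here. All names are hypothetical.

```python
from typing import List, Sequence

# Discovery-only profiles map directly to a fixed list of JSON-RPC methods.
DISCOVERY = {
    "health_only": ["tools/list"],
    "full_discovery": ["tools/list", "resources/list", "prompts/list"],
}


def plan(profile: str, listed_tools: Sequence[str] = (),
         critical_tools: Sequence[str] = ()) -> List[str]:
    """Return the ordered steps a validate cycle would run for a profile."""
    if profile in DISCOVERY:
        return list(DISCOVERY[profile])
    steps = list(DISCOVERY["full_discovery"])
    if profile == "tool_contract":
        targets = critical_tools   # only the tools you named up front
    elif profile == "full_validation":
        targets = listed_tools     # every tool the server currently lists
    else:
        raise ValueError(f"unknown profile: {profile}")
    return steps + [f"tools/call:{name}" for name in targets]
```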
Quick Wins
If you only do three things this week:
1. Diff your tool inventory daily
Snapshot tools/list once a day. Diff against yesterday. Alert on any change you didn't ship.
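A minimal sketch of the daily diff, assuming you persist yesterday's fingerprints (for example as a JSON file) between runs. Hashing each tool's full definition, not just its name, means a changed description or input schema shows up alongside renames. Function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(tools: List[dict]) -> Dict[str, str]:
    """Map tool name -> SHA-256 of its canonical (key-sorted) JSON definition."""
    return {
        tool["name"]: hashlib.sha256(
            json.dumps(tool, sort_keys=True).encode("utf-8")
        ).hexdigest()
        for tool in tools
    }


def diff_snapshots(yesterday: Dict[str, str], today: Dict[str, str]) -> Dict[str, List[str]]:
    """Report tools added, removed, or changed since the previous snapshot."""
    return {
        "added": sorted(today.keys() - yesterday.keys()),
        "removed": sorted(yesterday.keys() - today.keys()),
        "changed": sorted(
            name for name in today.keys() & yesterday.keys()
            if today[name] != yesterday[name]
        ),
    }
```

Any non-empty list that doesn't correspond to a change you shipped is the alert.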
2. Validate at least one critical tool end-to-end
Pick the single MCP tool whose failure would hurt customers most. Call it on every validate cycle with a known input. Assert the known output. This catches 80% of "broken in a way nobody notices" cases.
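A sketch of that canary, written against a hypothetical send callable that delivers one JSON-RPC request through whatever transport your agent uses and returns the parsed response. The tool name, arguments, expected text, and SLA threshold are placeholders you would replace; the error labels are likewise made up for illustration.

```python
import time
from typing import Callable, Dict

# Hypothetical: sends one JSON-RPC request and returns the parsed response dict.
Send = Callable[[dict], dict]


def run_canary(send: Send, tool: str, arguments: dict, expected_text: str,
               sla_ms: float, request_id: int = 1) -> Dict[str, object]:
    """Call one critical tool with a known input and assert output and latency."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
               "params": {"name": tool, "arguments": arguments}}
    start = time.perf_counter()
    try:
        response = send(request)
    except Exception as exc:  # a transport-level failure counts as a failed canary
        return {"ok": False, "error": f"send_failed: {exc}", "latency_ms": None}
    latency_ms = (time.perf_counter() - start) * 1000

    result = response.get("result") or {}
    texts = [c.get("text", "") for c in result.get("content", []) if c.get("type") == "text"]
    if "error" in response or result.get("isError", False):
        error = "call_failed"
    elif not any(expected_text in text for text in texts):
        error = "unexpected_output"
    elif latency_ms > sla_ms:
        error = "sla_exceeded"
    else:
        error = None
    return {"ok": error is None, "error": error, "latency_ms": round(latency_ms, 1)}
```

Injecting send keeps the assertion logic identical whether the canary runs against a local fake in CI or through the production gateway.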
3. Validate through the same path your agent uses
If your agent calls MCP through a gateway, validate through the gateway. If it authenticates, your validate authenticates. The validate's job is to be indistinguishable from the agent; anything else is theater.
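A sketch of building the probe from the agent's own settings rather than a separate monitoring config, so URL, auth, and protocol headers cannot drift apart. The agent_config keys and the gateway URL are hypothetical; the Accept header reflects what the Streamable HTTP transport asks clients to send, and the protocol-version value should be whatever your agent actually negotiates. The request is only built here, not sent.

```python
import json
import urllib.request
from typing import Dict


def build_probe(agent_config: Dict[str, str], body: dict) -> urllib.request.Request:
    """Build a JSON-RPC POST that mirrors the agent's URL, auth, and headers."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients accept either a JSON body or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if agent_config.get("auth_token"):
        headers["Authorization"] = f"Bearer {agent_config['auth_token']}"
    if agent_config.get("protocol_version"):
        headers["MCP-Protocol-Version"] = agent_config["protocol_version"]
    return urllib.request.Request(
        agent_config["mcp_url"],  # the same gateway URL the agent uses
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it; note that urllib normalizes header names, so get_header expects keys such as "Mcp-protocol-version".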
The Bottom Line
The MCP layer sits between your agent and everything it can actually do. When it breaks, your agent doesn't fail. It lies. It returns confident, fluent responses that just happen to omit the capabilities it lost.
Traditional uptime monitoring catches none of this. URL pings, HTTP status codes, and even LLM evaluators can't see a tool that's been silently renamed, a transport that quietly switched, or a gateway that started rejecting connections at the boundary.
If your agent depends on MCP (and increasingly, every production agent does), you need monitoring designed for the protocol, not for the URL it happens to live behind.
Your MCP server is one renamed tool away from a customer-visible regression. The only question is whether you'll find out from a validate or from a support ticket.