MCP in Production: Giving AI Agents Safe Hands, Not Just a Voice

A chat window is a voice. It can explain, summarize, and draft. But the moment you ask it to look something up in your actual systems — your document library, your database, your CRM — the voice goes quiet, or worse, it makes something up.

The fix is giving the model hands: real, governed connections to the systems where your knowledge and work actually live. Model Context Protocol (MCP) is the most practical way I've found to do that, and I've now wired it into a production enterprise platform. This post covers what MCP is in plain terms, why it beats the integration approach most teams start with, where the security lines belong, and what a sensible two-week pilot looks like.

## What MCP actually is

MCP is an open protocol, published by Anthropic in late 2024 and since adopted by other major model providers and tool vendors, that standardizes how an AI model connects to tools and data sources.

Before MCP, every connection between a model and a system was a custom build. You wrote glue code that described your database to GPT, then different glue code to describe it to Claude, then maintained both as the APIs changed underneath you. With three models and five systems, that's fifteen integrations. The math gets ugly fast.
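To make that arithmetic concrete, here is a small sketch comparing the two growth patterns. Treating a shared protocol as "one adapter per model plus one server per system" is a simplifying assumption for illustration, and both function names are made up.

```python
def point_to_point(models: int, systems: int) -> int:
    """One custom integration for every (model, system) pair."""
    return models * systems


def shared_protocol(models: int, systems: int) -> int:
    """Assumed simplification: one client adapter per model, one server per system."""
    return models + systems


for models, systems in [(3, 5), (4, 8), (6, 12)]:
    print(
        f"{models} models x {systems} systems: "
        f"{point_to_point(models, systems)} custom integrations vs "
        f"{shared_protocol(models, systems)} protocol components"
    )
```

The gap widens with every model or system you add, which is why the custom approach feels fine at first and painful a year later.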

MCP collapses that. You stand up one MCP server in front of a system (your standards library, your ticketing system, your data warehouse) and any MCP-compatible client can use it. The server publishes what tools it offers ("search documents," "fetch record," "create draft"), what inputs they take, and what they return. The client discovers those tools at runtime and hands them to the model, which decides when to call them.
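As a rough picture of what "publishing tools" means, the sketch below models a tool listing as plain Python data. The `name`, `description`, and `inputSchema` fields follow the shape of an MCP tool definition as I understand the specification; the two tools, their schemas, and the `missing_arguments` helper are hypothetical, and a real server would use an MCP SDK rather than hand-rolled dictionaries.

```python
import json

# Hypothetical listing for a document-library server (illustrative only).
TOOLS = [
    {
        "name": "search_documents",
        "description": "Full-text search over the document library. "
                       "Returns document IDs, titles, and short snippets.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "limit": {"type": "integer", "minimum": 1, "maximum": 20},
            },
            "required": ["query"],
        },
    },
    {
        "name": "fetch_record",
        "description": "Fetch the full text of one document by its ID.",
        "inputSchema": {
            "type": "object",
            "properties": {"document_id": {"type": "string"}},
            "required": ["document_id"],
        },
    },
]


def missing_arguments(tool_name: str, arguments: dict) -> list:
    """Return the required inputs a proposed call has left out."""
    tool = next(t for t in TOOLS if t["name"] == tool_name)
    required = tool["inputSchema"].get("required", [])
    return [key for key in required if key not in arguments]


# What a client would see when it asks the server which tools exist.
print(json.dumps([t["name"] for t in TOOLS]))
print(missing_arguments("search_documents", {"limit": 5}))
```

The point of the schema is that neither side has to guess: the client can check a proposed call against it before anything reaches the underlying system.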

The mental model I give executives: MCP does for AI-to-system connections what USB did for peripherals. You stop caring which model is on the other end, and you stop rebuilding the plug every time something changes.

## Why standardization beats one-off integrations

I built things the hard way first, so I can tell you specifically where one-off integrations break down.

**They couple you to a model vendor.** Custom function-calling code written for one provider's API format doesn't transfer cleanly. When a better or cheaper model ships, and in this market one seems to ship every quarter, your integrations are an anchor. With MCP servers, the system side stays untouched, so swapping the model is mostly a config change, not a rewrite.

**They rot quietly.** A bespoke integration is code only one person understands, exercised only by one workflow. The underlying API changes, nothing alerts you, and the agent starts failing in ways that look like model stupidity. A standardized server is a single, testable surface with one place to fix.

**They don't compound.** This is the strategic point. The third MCP server you build is dramatically more valuable than the first, because every agent in your organization can now reach three systems instead of one. An agent that can search your document library *and* check your CRM *and* query your warehouse can answer questions no single integration could. One-off integrations are dead ends; servers are infrastructure.

The honest counterpoint: if you have exactly one model talking to exactly one system and that will never change, a direct integration is less ceremony. I have yet to meet the enterprise where that stays true for six months.

## What changes when your knowledge becomes agent-accessible

The concrete case: on the Spark AI platform I built for a standards-development organization, one of the services I'm proudest of is an MCP-based standards search. An agent connects through an MCP server to the organization's full technical standards library and answers engineering questions with citations back to the specific documents.

Here's what surprised me about what that changed.

**The questions got better.** When search is a keyword box, people ask keyword questions. When an agent can traverse a standards library, people ask the question they actually have: "What do our standards say about this connector type, and have the requirements changed?" The agent decomposes that into multiple searches, reads the results, and synthesizes an answer — with citations, so an engineer can verify rather than trust.

**Citations stopped being optional.** This was the single most important design decision. An answer about a technical standard that can't point to its source is a liability, not a feature. The MCP tools were built so that every retrieval returns document identifiers the agent must surface. Grounding the model in retrievable documents, and forcing it to show its work, is what makes the output usable in an environment where being wrong has consequences.
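One way to back that rule with code is a post-check that compares the identifiers cited in an answer against the identifiers the tools actually returned. This is a minimal sketch, not the platform's implementation: the bracketed ID format, the `Retrieved` type, and `check_citations` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Retrieved:
    doc_id: str
    snippet: str


# Assumed citation format: IDs in square brackets, e.g. [STD-1042].
CITATION = re.compile(r"\[([A-Z]+-\d+)\]")


def check_citations(answer: str, retrieved: list) -> tuple:
    """Return (ok, unknown_ids).

    ok is True only if the answer cites at least one document and every
    cited ID came back from a retrieval call in this conversation.
    """
    cited = set(CITATION.findall(answer))
    known = {r.doc_id for r in retrieved}
    unknown = cited - known
    return bool(cited) and not unknown, unknown


results = [
    Retrieved("STD-1042", "Connector housings shall..."),
    Retrieved("STD-2210", "Revision history for connector requirements..."),
]
print(check_citations("The housing requirement is in [STD-1042].", results))
print(check_citations("The requirement changed recently.", results))
print(check_citations("See [STD-9999] for details.", results))
```

An answer that fails the check can be retried or shown with a warning instead of being presented as grounded.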

**The same server fed multiple experiences.** Once the standards library had an MCP front door, it wasn't only the chat agent that could use it. Other tools on the platform could reach the same governed interface. That's the compounding effect in practice: build the door once, and everything that comes later walks through it.

The general lesson for any CTO: your highest-value MCP target is the system your experts query every day but everyone else avoids because the interface is painful. Document libraries, internal wikis, and read-heavy databases are ideal first candidates.

## Security boundaries: where to draw the lines

This is the part decision-makers worry about most, and they're right to. An agent with hands can break things. The discipline that's worked for me in production:

**Read-only first.** Your first MCP servers should not be able to write anything. Search, fetch, summarize — that's it. A read-only agent over your standards library or warehouse can deliver most of the early value at a fraction of the risk. Earn the right to write operations with months of observed behavior, not weeks.

**Scoped credentials, never master keys.** The MCP server gets its own service account with the minimum permissions for its published tools, nothing more. If the server only needs to search documents, its credentials cannot touch user records. And the agent never sees raw credentials at all — it sees tools. The server holds the keys; the model holds nothing.

**The user's permissions are the ceiling.** An agent acting for a user must never see more than that user could see directly. Enforce this in the data layer — row-level security, properly scoped API tokens — not in the prompt. A prompt that says "don't show restricted documents" is a suggestion. A database policy is a fact.
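Here is a small sketch of enforcing the ceiling in the query rather than the prompt. SQLite has no row-level security, so this emulates the idea with a mandatory join against an access table; in PostgreSQL the same rule would more naturally be a row security policy. The schema, the sample users, and `search_as` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE document_access (doc_id TEXT, user_id TEXT);
    INSERT INTO documents VALUES
        ('STD-1042', 'Connector requirements'),
        ('HR-0007', 'Connector vendor contract terms');
    INSERT INTO document_access VALUES
        ('STD-1042', 'alice'), ('STD-1042', 'bob'), ('HR-0007', 'alice');
    """
)


def search_as(user_id: str, term: str) -> list:
    """Search on behalf of a user.

    user_id must come from the authenticated session, never from
    arguments the model supplies, or the filter means nothing.
    """
    rows = conn.execute(
        """
        SELECT d.id, d.title
        FROM documents d
        JOIN document_access a ON a.doc_id = d.id
        WHERE a.user_id = ? AND d.title LIKE ?
        ORDER BY d.id
        """,
        (user_id, f"%{term}%"),
    )
    return rows.fetchall()


print(search_as("alice", "connector"))
print(search_as("bob", "connector"))
```

The agent asks the same question for both users, and the restricted document never enters the model's context for the user who lacks access to it.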

**Human gates on anything irreversible.** When you do graduate to write operations, every consequential action goes through a human approval step. On the broader platform this MCP work lives in, autonomous pipelines draft and propose, but a person approves before anything ships. The agent prepares the work; a human owns the decision. That pattern has never once felt like bureaucracy — it's what lets you sleep.
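The propose-then-approve pattern can be sketched in a few lines. This is an illustration of the idea only, not the platform's pipeline: `ApprovalQueue`, `Proposal`, and `publish_draft` are hypothetical names, and a real system would persist proposals and authenticate reviewers.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Proposal:
    id: int
    description: str
    action: Callable[[], str]
    status: str = "pending"


class ApprovalQueue:
    """Agents may only propose; the action runs when a person approves."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._items: Dict[int, Proposal] = {}

    def propose(self, description: str, action: Callable[[], str]) -> Proposal:
        proposal = Proposal(next(self._ids), description, action)
        self._items[proposal.id] = proposal
        return proposal

    def approve(self, proposal_id: int, reviewer: str) -> str:
        proposal = self._items[proposal_id]
        if proposal.status != "pending":
            raise ValueError(f"proposal {proposal_id} is already {proposal.status}")
        proposal.status = f"approved by {reviewer}"
        return proposal.action()

    def reject(self, proposal_id: int, reviewer: str) -> None:
        proposal = self._items[proposal_id]
        if proposal.status != "pending":
            raise ValueError(f"proposal {proposal_id} is already {proposal.status}")
        proposal.status = f"rejected by {reviewer}"


published = []


def publish_draft() -> str:
    published.append("release-notes")
    return "published"


queue = ApprovalQueue()
pending = queue.propose("Publish release notes draft", publish_draft)
print(pending.status, published)  # the side effect has not happened yet
```

Calling `queue.approve(pending.id, "some-reviewer")` is the only path that runs `publish_draft`, and a second approval of the same proposal raises instead of running it twice.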

**Log everything.** Every tool call, every input, every result. When an agent does something odd — and one eventually will — the difference between a five-minute diagnosis and a lost week is whether you can replay exactly what it saw and did.
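A decorator is one low-friction way to get that coverage, because a tool cannot be registered without it. The sketch below is an assumption-laden example: `audited`, `AUDIT_LOG`, and the sample tool are hypothetical, and the in-memory list stands in for whatever append-only log sink you actually use.

```python
import functools
import json
import time
import uuid

AUDIT_LOG = []  # stand-in for an append-only log sink


def audited(tool):
    """Record every call's tool name, arguments, result or error, and duration."""

    @functools.wraps(tool)
    def wrapper(**arguments):
        record = {
            "call_id": str(uuid.uuid4()),
            "tool": tool.__name__,
            "arguments": arguments,
            "started_at": time.time(),
        }
        try:
            result = tool(**arguments)
            record["result"] = result
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and failure, so failed calls are logged too.
            record["duration_s"] = round(time.time() - record["started_at"], 6)
            AUDIT_LOG.append(json.dumps(record, default=str))

    return wrapper


@audited
def search_documents(query: str, limit: int = 5) -> list:
    # Placeholder body; a real tool would query the library.
    return [f"STD-{1000 + i}" for i in range(limit)]


search_documents(query="connector", limit=2)
print(AUDIT_LOG[-1])
```

One JSON line per call is enough to replay a session in order, which is what turns "the agent did something odd" into a diagnosable event.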

None of this is exotic. It's the same least-privilege thinking you already apply to human access, applied to a new kind of user.

## Lessons from wiring it for real

A few things I'd tell my past self before the first production deployment:

**Tool descriptions are prompts.** The model decides which tool to call based on the description your server publishes. Vague descriptions produce vague behavior. I've gained more reliability from rewriting a tool description than from upgrading the model.

**Fewer, better tools win.** An agent staring at thirty overlapping tools picks the wrong one constantly. Five well-named tools with crisp boundaries outperform a kitchen sink every time.

**Design for token budgets.** A naive "search" tool that returns fifty full documents will blow out the model's context and degrade every answer. Return compact summaries with identifiers, and give the agent a separate fetch tool for drilling into one document. Retrieval and reading are different operations; model them that way.
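The split between retrieval and reading might look like the sketch below. The in-memory `LIBRARY`, the 160-character snippet size, and the `search` and `fetch` signatures are all assumptions for illustration; the shape of the results is what matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    body: str


# Hypothetical two-document library with deliberately long bodies.
LIBRARY = {
    "STD-1042": Document(
        "STD-1042",
        "Connector requirements",
        "Connector housings shall withstand the specified load. " * 50,
    ),
    "STD-2210": Document(
        "STD-2210",
        "Cable marking",
        "Cable jackets shall carry a durable marking. " * 50,
    ),
}


def search(query: str, limit: int = 5, snippet_chars: int = 160) -> list:
    """Cheap retrieval: identifiers, titles, and short snippets only."""
    needle = query.lower()
    hits = []
    for doc in LIBRARY.values():
        if needle in doc.title.lower() or needle in doc.body.lower():
            hits.append(
                {
                    "doc_id": doc.doc_id,
                    "title": doc.title,
                    "snippet": doc.body[:snippet_chars],
                }
            )
    return hits[:limit]


def fetch(doc_id: str) -> str:
    """Expensive read: the full text of one document the agent chose."""
    return LIBRARY[doc_id].body


hits = search("connector")
print([(h["doc_id"], len(h["snippet"])) for h in hits])
print(len(fetch(hits[0]["doc_id"])))
```

The agent pays the full-document cost only for the one or two results it decides are worth reading.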

**Test the agent, not just the server.** Your server can be perfectly correct and the agent can still use it badly. Build a small suite of realistic questions and check the agent's tool-call patterns against them whenever you change a model, a description, or a tool.
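A tool-call regression check can be very small. In this sketch the traces are hard-coded; in practice they would come from the call logs of real agent runs. `check_trace`, the case format, and the tool names are hypothetical.

```python
from typing import List, Sequence


def is_subsequence(needed: Sequence[str], trace: Sequence[str]) -> bool:
    """True if the needed tool names appear in the trace in that order."""
    remaining = iter(trace)
    return all(name in remaining for name in needed)


def check_trace(
    trace: Sequence[str],
    must_call_in_order: Sequence[str],
    must_not_call: Sequence[str] = (),
    max_calls: int = 6,
) -> List[str]:
    """Return a list of problems; an empty list means the trace passes."""
    problems = []
    if not is_subsequence(must_call_in_order, trace):
        problems.append(f"expected calls in order: {list(must_call_in_order)}")
    for name in must_not_call:
        if name in trace:
            problems.append(f"unexpected call: {name}")
    if len(trace) > max_calls:
        problems.append(f"too many calls: {len(trace)} > {max_calls}")
    return problems


CASES = [
    {
        "question": "What do our standards say about this connector type?",
        "trace": ["search_documents", "search_documents", "fetch_document"],
        "must_call_in_order": ["search_documents", "fetch_document"],
    },
    {
        "question": "Has the marking requirement changed?",
        "trace": ["fetch_document"],  # the agent guessed an ID without searching
        "must_call_in_order": ["search_documents", "fetch_document"],
    },
]

for case in CASES:
    problems = check_trace(case["trace"], case["must_call_in_order"])
    print(case["question"], "->", problems or "ok")
```

Checking the pattern of calls rather than the exact wording of the answer keeps the suite stable across model changes while still catching the failures that matter, such as skipping the search step.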

## What a two-week MCP pilot looks like

If this sounds worth testing, here's the shape of a pilot I'd actually run — scoped tightly enough to finish, real enough to mean something.

**Days 1–3: Pick one system and one audience.** One read-heavy knowledge source (document library, wiki, product database) and one team of five to ten people who query it weekly. Write down the ten questions they ask most. Those are your acceptance tests.

**Days 4–7: Stand up a read-only MCP server.** Two or three tools, maximum: search, fetch, maybe list. Scoped service credentials. Logging on every call from day one. Many systems already have community or vendor MCP servers you can adapt rather than build.

**Days 8–10: Connect an agent and grind the test questions.** Run all ten, inspect the tool calls, fix descriptions and result formats until the agent reliably finds and cites the right material. This iteration loop is where the pilot is won or lost.

**Days 11–14: Put it in front of the pilot team.** Real users, real questions, with logs running. Collect every miss. At the end you'll have three things: a working agent over a real system, a log-backed view of what people actually asked, and an evidence-based answer to "should we do more of this."

That last artifact is the valuable one. It turns the next conversation from speculation into a roadmap.

## Where to go from here

MCP isn't the whole answer to enterprise AI — you still need the agent design, the security model, and the judgment about which workflows deserve automation. But it's the connective layer that makes everything else reusable, and it's the difference between a chatbot that talks about your business and an agent that works inside it.

If you want to see the platform where this thinking runs in production, the [Spark AI case study](/projects/spark-ai-platform) covers the broader system. And if you're weighing an MCP pilot for your own document library, database, or CRM, [get in touch](/contact) — scoping that first two-week build is exactly the kind of conversation I enjoy.