
MCP Best Practices: Production Checklist

By Ralph Duin

The gap between a working MCP server and an enterprise-ready one is mostly a checklist problem. Teams that skip it end up with unauthenticated endpoints, agent loops hung on unbounded tool calls, and no audit trail when something goes wrong. This guide organizes the critical MCP best practices into three domains (Security, Operations, and Governance) so each item lands with the team that owns it.


Security Checklist

These items are one-liners here; for the full threat model and prioritized controls, see the MCP Security Enterprise Guide.

1. Require mutual TLS or token-based auth on every connection. Unauthenticated MCP endpoints are open sockets—at enterprise scale, any internal service reachable by an agent is a lateral-movement target.

2. Apply least-privilege scopes to every tool. Blanket admin credentials in an MCP server are the equivalent of a service account with domain-admin rights.

3. Validate and sanitize all inputs before passing them to tools. Prompt injection via MCP is real; attacker-controlled text in a document can become an unauthorized tool call if the server trusts model output blindly.

4. Run each MCP server in an isolated process or container. A compromised tool server should not be able to read the secrets or memory of adjacent servers—process isolation is the simplest blast-radius limiter.

5. Rotate credentials and tool tokens on a defined schedule. Long-lived static credentials are the most common source of post-breach persistence.

6. Log every tool invocation with full request and response payloads. You cannot investigate an incident you cannot replay.
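
Items 2 and 6 meet at the point where a tool is actually invoked. The following is a minimal sketch of that choke point, not a reference implementation: the `REQUIRED_SCOPES` table, the tool names, the `AuditLog` class, and the `invoke_tool` function are all hypothetical names introduced for illustration and do not come from the MCP specification or any MCP SDK. It assumes scopes are plain strings and that an in-memory list stands in for a real append-only log sink.

```python
import json
import time
from dataclasses import dataclass, field


class ScopeError(PermissionError):
    """Raised when a caller lacks a scope the tool requires."""


@dataclass
class AuditLog:
    """In-memory stand-in for an append-only audit sink (assumption)."""
    records: list = field(default_factory=list)

    def write(self, record: dict) -> None:
        # Serialize immediately so later mutation cannot alter the record.
        self.records.append(json.dumps(record, sort_keys=True, default=str))


# Hypothetical mapping of tool name -> scopes that tool requires.
REQUIRED_SCOPES = {
    "read_ticket": {"tickets:read"},
    "close_ticket": {"tickets:read", "tickets:write"},
}


def invoke_tool(tool, args, caller_scopes, handler, log):
    """Check scopes, run the tool, and log the request and the response."""
    record = {"ts": time.time(), "tool": tool, "request": args}

    required = REQUIRED_SCOPES.get(tool)
    if required is None:
        # Unknown tools are denied rather than treated as needing no scopes.
        record["outcome"] = "denied_unknown_tool"
        log.write(record)
        raise ScopeError(f"unknown tool: {tool}")

    missing = required - set(caller_scopes)
    if missing:
        record["outcome"] = "denied"
        record["missing_scopes"] = sorted(missing)
        log.write(record)
        raise ScopeError(f"{tool} requires scopes: {sorted(missing)}")

    try:
        result = handler(**args)
    except Exception as exc:
        # Failed calls are logged too, then re-raised unchanged.
        record["outcome"] = "error"
        record["error"] = repr(exc)
        log.write(record)
        raise

    record["outcome"] = "ok"
    record["response"] = result
    log.write(record)
    return result
```

Denied and failed calls are logged as well as successful ones, since those are usually the entries an investigation needs first. In a real deployment the payloads would likely need redaction before being written, which this sketch does not attempt.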


Operations Checklist

This is the domain most checklists omit. The items below—timeouts, circuit breakers, retry budgets, schema versioning—are where production MCP deployments actually fail.

7. Register all MCP servers through a central gateway, not point-to-point. Ad-hoc direct connections proliferate silently; a gateway gives you a single plane to enforce policy and observe traffic. See What Is an MCP Gateway? for the full architectural case.

8. Version every tool schema and treat breaking changes like an API contract. Clients and agent runtimes often cache tool descriptions; a silent schema change breaks downstream agents in ways that are hard to trace without version history. Tag schemas with a version field and maintain a changelog, because schema drift is a common cause of silent agent regressions.

9. Set explicit timeouts on all tool calls. A slow or hung external tool stalls an agent's entire reasoning loop. No tool call should run unbounded; set a wall-clock limit appropriate to the tool's expected latency.

10. Add circuit breakers between the agent runtime and each tool server. Repeated failures against a degraded backend cascade into workflow failures across every session sharing that server. A circuit breaker opens after a threshold of errors and allows periodic probe requests to detect recovery—preventing the thundering-herd retries that worsen outages.

11. Define retry budgets and idempotency requirements per tool. Not all tools are safe to retry—a payment or email tool called twice due to a transient failure is a production incident. Classify each tool as idempotent or non-idempotent at registration time and enforce retry limits accordingly.

12. Monitor token consumption per tool and per model session. A single misconfigured agent calling a verbose tool in a loop can exhaust a month's token budget overnight. Alert on per-session token velocity, not just aggregate monthly spend.

13. Implement graceful degradation for unavailable tool servers. Agents should continue with partial capability rather than hard-failing. Design fallback paths before you need them in an outage—"tool unavailable" should be a handled state, not an unhandled exception.
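
Items 10 and 11 interact: retries should stop as soon as the breaker opens, and non-idempotent tools should not be retried at all. The sketch below shows one way to wire the two together, assuming a single-threaded caller and a simple consecutive-failure threshold. `CircuitBreaker`, `CircuitOpenError`, and `call_with_retry_budget` are hypothetical names, and the thresholds are placeholder values rather than recommendations.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without contacting the backend."""


class CircuitBreaker:
    """Opens after N consecutive failures; allows a probe after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("tool server marked unavailable")
            # Cool-down elapsed: let this call through as a probe.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed probe.
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_with_retry_budget(breaker, fn, idempotent, max_retries=2):
    """Retry only idempotent tools; non-idempotent tools get exactly one attempt."""
    attempts = 1 + (max_retries if idempotent else 0)
    last_error = None
    for _ in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # never retry against an open circuit
        except Exception as exc:
            last_error = exc
    raise last_error
```

Catching `CircuitOpenError` at the agent layer is one way to make "tool unavailable" a handled state, as item 13 asks. The sketch omits backoff between retries and is not thread-safe; both would need attention before production use.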


Governance Checklist

For the broader AI governance framework these items sit inside, see Enterprise AI Governance.

14. Maintain a published inventory of all active MCP servers and tools. You cannot govern what you cannot enumerate; an up-to-date registry is the foundation for compliance reporting.

15. Assign a human owner to each MCP server, with a review cadence. Ownerless servers accumulate permissions and drift from their original purpose.

16. Enforce approval workflows for new tool registrations in production. The risk of an unreviewed tool granting excessive access to sensitive systems outweighs the inconvenience of a review gate.

17. Produce auditable records of which model accessed which tool under which policy. Tool-level audit logs are the atomic unit of AI decision evidence for regulators and internal auditors.

18. Review tool permissions quarterly alongside your access certification cycle. MCP servers accumulate scope creep the same way service accounts do; periodic recertification catches permissions that outlived their purpose.

19. Establish incident-response playbooks specific to MCP scenarios. A prompt injection event, an unauthorized tool call, and a data exfiltration via MCP each require different containment steps. Generic IR playbooks miss MCP-specific signals.
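
Items 14, 15, and 18 reduce to a query over the inventory: which servers have no owner, and which are past their review date? A minimal sketch of that query follows, assuming the inventory is available as a list of records. The `ServerRecord` fields, the `needs_attention` function, and the 90-day default are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class ServerRecord:
    """One inventory entry for an MCP server (hypothetical shape)."""
    name: str
    owner: Optional[str]             # None models an ownerless server
    last_reviewed: date
    review_interval_days: int = 90   # roughly quarterly (assumption)


def needs_attention(inventory, today):
    """Return (name, reason) pairs for ownerless or overdue servers."""
    findings = []
    for rec in inventory:
        if not rec.owner:
            findings.append((rec.name, "no owner"))
        due = rec.last_reviewed + timedelta(days=rec.review_interval_days)
        if today > due:
            findings.append((rec.name, "review overdue"))
    return findings
```

Running a check like this on a schedule and routing the findings to the owning team is one way to keep the registry from going stale between formal recertification cycles.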


What Changes at Enterprise Scale

Volume and heterogeneity are the two forcing functions. Thousands of tool calls per hour make manual review impossible and require automated anomaly detection; dozens of MCP servers built by different teams in different languages mean governance must be enforced at the gateway layer, not inside each server individually.
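
The simplest form of automated detection is a per-session rate check, which also covers the token-velocity alert from item 12. The sketch below assumes the gateway can report a session identifier and a token count for each tool call. `TokenVelocityMonitor`, its parameters, and the fixed threshold are hypothetical; a fixed threshold is a stand-in for whatever anomaly model a team actually adopts.

```python
from collections import defaultdict, deque


class TokenVelocityMonitor:
    """Flags sessions whose token use within a sliding window exceeds a budget."""

    def __init__(self, window_seconds=60.0, max_tokens_per_window=50_000):
        self.window_seconds = window_seconds
        self.max_tokens = max_tokens_per_window
        self._events = defaultdict(deque)  # session_id -> deque of (timestamp, tokens)
        self._totals = defaultdict(int)    # session_id -> tokens currently in window

    def record(self, session_id, tokens, now):
        """Record usage; return True if the session is over budget for the window."""
        events = self._events[session_id]
        events.append((now, tokens))
        self._totals[session_id] += tokens

        # Evict events that have aged out of the window.
        cutoff = now - self.window_seconds
        while events and events[0][0] <= cutoff:
            _, old_tokens = events.popleft()
            self._totals[session_id] -= old_tokens

        return self._totals[session_id] > self.max_tokens
```

Because the check runs per session, one runaway agent trips the alert without being averaged away by quiet sessions, which is the failure mode of aggregate monthly spend reports.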


Frequently Asked Questions

Do these best practices apply to locally hosted MCP servers?

Yes. Local servers are often treated as inherently trusted, but they run on machines that can be compromised and are called by models that accept untrusted input. Apply auth, isolation, and logging regardless of where the server runs.

How often should we review our MCP tool inventory?

Align reviews with your existing access recertification cycle—typically quarterly. Trigger an out-of-cycle review whenever a new team onboards, a new data source is connected, or a security incident touches any MCP component.

Is a gateway strictly required, or can we enforce controls per-server?

Per-server enforcement scales poorly. Each new server requires its own configuration, logging integration, and policy implementation. A gateway centralizes all of that—one place to update a policy, one place to pull audit logs. Beyond a handful of servers, the operational cost of the per-server approach consistently outweighs the complexity of running a gateway.
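
To make the "one place to update a policy" point concrete, here is a minimal sketch of a gateway-side wrapper that applies a per-tool timeout from a single policy table. The `POLICIES` table, the tool name, the timeout values, and `gateway_call` are hypothetical, and the sketch assumes tool calls are ordinary Python callables rather than real MCP transport calls.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical central policy table: one place to change a tool's limits.
POLICIES = {
    "search_docs": {"timeout_s": 2.0},
}
DEFAULT_POLICY = {"timeout_s": 5.0}

_executor = ThreadPoolExecutor(max_workers=8)


def gateway_call(tool, fn, *args, **kwargs):
    """Run a tool call under the wall-clock limit from the central policy."""
    policy = POLICIES.get(tool, DEFAULT_POLICY)
    future = _executor.submit(fn, *args, **kwargs)
    try:
        result = future.result(timeout=policy["timeout_s"])
    except FutureTimeout:
        # This stops waiting; it does not kill work already running in the
        # thread. cancel() only prevents calls that have not started yet.
        future.cancel()
        return {
            "status": "unavailable",
            "reason": f"{tool} exceeded {policy['timeout_s']}s",
        }
    return {"status": "ok", "result": result}
```

Changing a limit means editing one table entry rather than redeploying each server. Returning a structured "unavailable" result instead of raising lets the agent continue with partial capability.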

MCP Beast

MCP Beast implements this checklist as infrastructure. The gateway enforces authentication, scoping, timeouts, and circuit breakers at the connection layer—without modifying individual servers. The registry maintains a live inventory of every server and tool with owner metadata. Audit logs are produced at the tool-call level in a structured format compatible with SIEM ingestion. Quarterly access reviews are supported with exportable permission snapshots.


