Is MCP safe? The security risks AI assistants don't mention

Published '2026-07-05'

You connected your AI assistant to a dozen MCP servers this year. Calendar, email, knowledge base, the works. Here's what nobody mentioned when you clicked allow: the protocol underneath those connections was built for functionality, with security left as someone else's problem. Peer-reviewed research published in 2026 shows the attacks succeed far more often than you'd guess from clicking allow.

Quick answer > MCP (Model Context Protocol) is not safe by default. A 2026 DSN paper showed tool-poisoning attacks succeed 80–100% of the time against Claude Sonnet 4 and Gemini 2.5 Pro. A separate audit of six public MCP registries found 833 vulnerable servers, 9 leaked GitHub tokens (5 still valid), and over 200 hijackable accounts. Treat every MCP server as untrusted code.

What does an MCP server actually have access to?

An MCP server is a process that exposes tools to an AI assistant. Once you connect one, the assistant can call any tool that server exposes, with whatever credentials you handed it during setup. There is no permission prompt per call in most clients. You authorized the server once; it can do whatever its tools allow for as long as it stays connected.

In practice that means: if you connected an MCP server to read your calendar, it can read your calendar. If that server is malicious or gets compromised later, it can still read your calendar. The protocol does not constrain what the server does with the data it reads, and most clients do not log individual tool calls in a way you can review afterward.

The attack surface grows with every server you add. Two servers from different authors can both claim the tool name send_email, and your AI client may not be able to tell them apart.

How does a tool-poisoning attack work?

A 2026 paper accepted to the DSN conference (arXiv 2510.16558) documents two attack classes that work against the AI clients people actually use.

The first is tool-name collision. When two MCP servers expose tools with identical names, Cursor silently invokes the first-listed server, ignoring its own mcp_<server>_<tool> prefix disambiguation. In the paper's controlled test, the model selected a tool from server B, and Cursor called server A instead. The user saw no warning. The bug is corroborated by at least four Cursor forum threads and a matching issue in OpenAI's Agents SDK.

The second is tool poisoning. A malicious server crafts its tool descriptions or error messages to include instructions that override the model's actual task. When the model reads the tool list (which it does before every call), it follows the injected instruction. Across Cursor, Windsurf, and Cline, the attack succeeded 80–100% of the time on Claude Sonnet 4 and Gemini 2.5 Pro. GPT-4o was more conservative but not immune. Related work (MCPTox, MindGuard) confirms the threat class is real and growing.

This isn't a theoretical vulnerability. It is a near-deterministic attack against the models most people are using right now.

What went wrong on public MCP registries?

The same paper audited six public MCP registries (67,057 servers total) in mid-2025 and found:

833 servers with confirmed security vulnerabilities.
9 GitHub personal access tokens leaked in configuration examples on mcp.so. 5 were still valid when the researchers checked.
212 deleted GitHub accounts that could be re-registered by an attacker, plus 304 redirected accounts vulnerable to reclamation. Because most MCP servers are hosted on public GitHub repos, a reclaimed account means a hijacked server, and every client that ever connected to it.

Anyone running an MCP server they found on a public registry is depending on the author's operational hygiene. Most authors are hobbyists. Some have stopped maintaining the repo. A few have abandoned the GitHub account.

Which AI clients are most exposed?

Any client that auto-loads servers and passes tool descriptions to the model is exposed to tool poisoning. The DSN paper tested Cursor, Windsurf, and Cline; the mechanism applies to any MCP client.

Claude Desktop, VS Code Copilot, and ChatGPT's Developer Mode MCP all read tool descriptions into the model context. None of them sandbox the server process by default. ChatGPT's MCP support (still in Developer Mode beta as of July 2026) is remote-only, which removes the localhost attack vector but does nothing if the remote server itself is compromised.

How do you use MCP safely?

You don't have to stop using MCP. You do have to stop trusting it.

Audit what you've connected. Open your client's MCP config (in Claude Desktop it's claude_desktop_config.json) and remove any server you don't actively use. Every connected server is a live attack surface.
Read the source. If the server is on GitHub, check the last commit date, the issue tracker, and whether the author's account is active. Abandoned repos with live tokens are exactly the hijack scenario the researchers found.
Scope the credentials. Never give an MCP server a token with more permissions than the server needs. A calendar reader should not have write access to your entire Google Workspace.
Prefer revocable auth. Long-lived OAuth tokens that live for the whole token family are the wrong default for something you don't fully trust. Use scoped, individually revocable tokens where the platform offers them. Calmara's MCP layer, for example, issues cmra_pat_* personal access tokens that are SHA-256 hashed at rest, scoped to a subset of 13 OAuth scopes, and revocable one at a time without invalidating anything else.
Run it where you can watch it. A self-hosted MCP server on your own infrastructure is auditable in a way a third-party hosted one never is. Calmara ships a self-hosted MCP container for exactly this reason: every mutating tool call lands in a tool_call_audit log, and per-tool allowlists let you restrict which tools each user can call. Restricted tools return unknown_tool to unauthorized callers, so the tool's existence doesn't leak either.

Treat MCP like a browser extension from an unknown developer. Sometimes that's fine. Sometimes it's how the credentials leave the building.

FAQ

Is MCP safe to use?

MCP is safe to use the way browser extensions are safe to use: some are fine, some are malicious, and the protocol itself does not protect you. Verify the source, scope the credentials, and remove servers you don't use.

Has anyone actually been hacked through MCP?

No public breach has been attributed to an MCP tool-poisoning attack as of July 2026, but the research showing 80–100% attack success rates is recent. The vulnerability class is established; the incidents usually lag the research by months.

Does Anthropic vet MCP servers?

Anthropic curates a directory of connectors for Claude, but a curated connector and an arbitrary third-party MCP server are different things. Servers you add manually to claude_desktop_config.json are not vetted by anyone.

Which AI models are most vulnerable to MCP tool poisoning?

Per the DSN 2026 paper, Claude Sonnet 4 and Gemini 2.5 Pro were successfully attacked 80–100% of the time across Cursor, Windsurf, and Cline. GPT-4o was more conservative but not immune.

Can I use MCP without exposing my credentials?

Not really. The whole point of MCP is that the server does something on your behalf, which requires credentials. The mitigation is scoping (give the server the minimum it needs) and revocation (use tokens you can kill individually).

Is Calmara's MCP safer?

Calmara ships a self-hosted MCP container with scoped revocable tokens (cmra_pat_*), a per-tool user allowlist, and a tool-call audit log. That is a meaningful defense-in-depth difference versus connecting to an unvetted server from a public registry. It is not a complete defense (no MCP layer is), but it removes the "I connected something I don't understand and now my credentials are gone" failure mode.