
As AI agents move from assistants to operators, securing the model is no longer enough. The real risk now lives in the tools agents can access and the context they carry across systems.
Security always follows scale
The internet has a habit: we ship protocols that work first, then we secure them once adoption makes the risk impossible to ignore.
HTTP was revolutionary, but HTTP over TLS only became the pragmatic default once the web scaled, even though the https scheme and its behavior had been documented years earlier in RFC 2818.
That pattern matters because the Model Context Protocol (MCP) is starting to look less like a niche developer convenience and more like shared infrastructure for agentic systems.
If you like analogies, think of MCP as a universal power adapter for agents: suddenly your assistant can plug into everything. Convenient, yes. Also a much bigger fire risk if you do not standardize safety.
MCP is turning into critical infrastructure
Anthropic introduced MCP as an open standard for connecting AI assistants to the systems where data lives, aiming to replace one-off, bespoke connectors with a single protocol.
In practice, MCP gives you a consistent way to plug an AI system into tools, services, and data sources via MCP hosts, clients, and servers. The spec formalizes how those components communicate and negotiate capabilities, and how security should be handled in real deployments.
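To make the moving parts concrete, here is a minimal sketch of an MCP server exposing a single tool, assuming the official Python SDK and its FastMCP helper. The server name and the tool are illustrative, not taken from any real deployment; the point is how little code stands between an agent and a privileged capability.

```python
# Minimal, illustrative MCP server (assumes `pip install mcp`).
# "ticket-lookup" and get_ticket are hypothetical examples.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ticket-lookup")

@mcp.tool()
def get_ticket(ticket_id: str) -> str:
    """Return a short summary for a support ticket."""
    # In a real deployment this would query an internal system --
    # exactly the kind of privileged access this article is about.
    return f"Ticket {ticket_id}: status=open, priority=high"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

Once a host is configured to launch this server, any connected agent can discover and invoke get_ticket. That is the convenience, and it is also the new attack surface.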
The ecosystem is also being built in public, which is a strong signal that MCP is positioning itself as a widely adopted, vendor-neutral layer, not a single-vendor feature.
The reason MCP spreads is the same reason app ecosystems spread: it turns bespoke integrations into a reusable connector layer. But that convenience also moves the security boundary. When an agent can read files, query databases, and take actions through external tools, the attack surface becomes the tool layer and its surrounding trust model, not just the model prompt.
So the question is not whether MCP is useful. It clearly is. The sharper question is:
What happens when MCP scales faster than security habits do?
We are already seeing the incident pattern
We do not need to speculate; we already have early, concrete examples of the failure modes.
Data exfiltration via MCP-connected applications. Invariant Labs describes an MCP-related WhatsApp exploitation scenario focused on extracting chat history through the MCP integration surface.
Prompt injection leading to access of sensitive GitHub content. Simon Willison documents a GitHub MCP exploitation scenario in which prompt injection results in unintended access to private repositories through MCP-connected workflows.
Classic vulnerability classes showing up in MCP tooling. NIST’s National Vulnerability Database entry for CVE-2025-6514 describes OS command injection exposure in “mcp-remote” when connecting to untrusted MCP servers, rooted in crafted input from an authorization endpoint response URL.
Different stories, same theme: once you add tool access, model steering becomes an operational security problem, not a prompt engineering problem.
And this is exactly why the UK National Cyber Security Centre (NCSC) has warned against treating prompt injection as a simple analog of SQL injection. Even the title of its post makes the point: prompt injection is not SQL injection, so you cannot just reuse the old playbook unchanged.
Secure MCP is emerging, but security does not end at the protocol boundary
Researchers from Huazhong University of Science and Technology in China propose “Secure Model Context Protocol” (SMCP) as a layered security framework aimed at closing gaps in how agents connect to tools.
The interesting part is not the branding. It is the security shape of the proposal: SMCP tries to make trust measurable by turning “who is calling what, on whose behalf, with what risk” into protocol-native data.
- Trusted component registry establishes a unified identity and trust anchor, with structured identity codes, credential binding, and pre-access verification so only registered components participate in the invocation plane.
- Security extended capabilities push security up into discovery, attaching attributes like sensitivity level, minimum identity assurance, allowed delegators, retention requirements, and even whether outputs may enter long-term memory or be used for training, so the tool advertises constraints before it is invoked.
- Continuous security context makes call chain identifiers, authentication assertions, delegator chains, risk level, and data sensitivity first-class fields that travel with each invocation, so downstream systems can enforce policy with context instead of guesswork.
- Policy enforcement with obligations separates decision from enforcement, using explicit outcomes like PERMIT, DENY, or PERMIT WITH OBLIGATIONS, where obligations can include masking, aggregation, or limits, and invocations can return DOWNGRADED with applied security recorded.
- Correlated auditability ties tool-side audit references to end-to-end call chains and signed responses, so investigations can reconstruct what happened across agent, tool, and policy engine boundaries.
If MCP is the universal power adapter, SMCP is the fuse box: identity, policy, and traceability built into the wiring rather than bolted on afterwards.
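To make the policy enforcement idea concrete, here is a hedged sketch of a decision point that returns PERMIT, DENY, or PERMIT WITH OBLIGATIONS. The decision names follow the proposal; the context fields, thresholds, and obligations are illustrative assumptions, not the actual SMCP schema.

```python
# Illustrative-only sketch of an SMCP-style policy decision point.
# Field names, risk levels, and the example policy are assumptions.
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PERMIT = "PERMIT"
    DENY = "DENY"
    PERMIT_WITH_OBLIGATIONS = "PERMIT_WITH_OBLIGATIONS"


@dataclass
class SecurityContext:
    """Fields that travel with each tool invocation."""
    call_chain_id: str
    delegator_chain: list[str]   # who is acting on whose behalf, e.g. user -> agent
    risk_level: str              # "low" | "medium" | "high"
    data_sensitivity: str        # "public" | "internal" | "restricted"


@dataclass
class PolicyResult:
    decision: Decision
    obligations: list[str] = field(default_factory=list)  # e.g. masking, retention limits


def evaluate(ctx: SecurityContext, tool_sensitivity: str) -> PolicyResult:
    """Decide whether an invocation may proceed, and under what obligations."""
    if ctx.risk_level == "high" and tool_sensitivity == "restricted":
        return PolicyResult(Decision.DENY)
    if tool_sensitivity == "restricted":
        # Permit, but require masking and forbid long-term memory retention.
        return PolicyResult(
            Decision.PERMIT_WITH_OBLIGATIONS,
            obligations=["mask_pii", "no_long_term_memory"],
        )
    return PolicyResult(Decision.PERMIT)


# Example: an agent acting for a user invokes a restricted tool at medium risk.
ctx = SecurityContext(
    call_chain_id="chain-7f3a",
    delegator_chain=["user:alice", "agent:support-bot"],
    risk_level="medium",
    data_sensitivity="internal",
)
print(evaluate(ctx, tool_sensitivity="restricted"))
```

The useful property is that the decision and its obligations are explicit data, so downstream enforcement and auditing can act on them instead of guessing.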
But even if SMCP, or a secure evolution of MCP, becomes mainstream, it will not magically make real deployments safe. Protocol guarantees do not automatically fix weak implementations, risky defaults, or the messy reality of tool sprawl.
So what should defenders do now, while the ecosystem is still forming?
The risk is multi-step, not a single bug
A secure protocol reduces whole classes of accidental insecurity, but agent failures are often multi-step.
The recent “Promptware Kill Chain” paper, whose authors are affiliated with the Harvard Kennedy School and the Munk School at the University of Toronto, frames attacks on LLM applications as staged campaigns: initial access, privilege escalation, persistence, lateral movement, and actions on objectives.
This matters because MCP style systems can make later stages easier. Tools provide privilege. Memory and retrieval can become persistence. Connectors can become lateral movement.
And the web is not a safe environment for agents to “just read.” WebSentinel, whose authors are affiliated with Duke University and UC Berkeley, focuses on detecting and localizing prompt injection attacks against web agents, highlighting why localization matters for response and recovery.
Taken together, the lesson is straightforward:
Protocol hardening matters, but it does not prove that your specific deployment cannot be abused.
That proof only comes from validation.
The Hadrian lens: verification beats optimism
In my previous Hadrian research article, “Beyond EPSS,” the core argument was that probability scores and theoretical prioritization do not keep pace with current attacker timelines, and that adversarial validation is what turns uncertainty into evidence.
MCP fits this exact failure mode. It introduces new high-leverage seams that do not look like a classic CVE backlog. Even worse, an attacker does not need to “compromise the model.” They can compromise or manipulate the context an agent consumes, or the tools it can invoke.
The Hadrian Offensive Security Benchmark Report 2026 reinforces the same point with blunt guidance: shift from visibility-first to verification-first, treat AI as a production dependency, and continuously test AI-exposed attack paths.
MCP makes that message more urgent, because it multiplies both the number of things that might be risky and the speed at which a small mistake becomes an incident.
So the operational move is simple to say, and hard to execute:
Treat MCP-exposed systems like internet-facing production infrastructure, then continuously validate the paths attackers would actually use.
How to operationalize verification-first for MCP
Start with inventory and ownership. Enumerate every MCP server, what it can do, where it runs, and who owns it. Treat each one like a high-leverage interface, because it is.
Define the trust model explicitly. Which third-party servers are allowed, how are they authenticated, and what happens by default when the agent cannot prove identity or policy compliance? SMCP is useful here as a reference architecture even if you are not implementing it today.
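In practice, the trust model can start as an explicit allowlist with a default-deny posture. The sketch below is illustrative only: the server names, origins, and config shape are hypothetical, and a real deployment would layer credential and policy checks on top.

```python
# Hedged sketch of a default-deny trust model for MCP servers.
# Server names, origins, and the config shape are hypothetical;
# real clients differ in how they store MCP server definitions.
import json
from pathlib import Path

# name -> expected origin; anything not listed here is rejected by default.
ALLOWED_SERVERS = {
    "internal-tickets": "https://mcp.tickets.internal.example",
    "docs-search": "https://mcp.docs.internal.example",
}


def is_trusted(name: str, origin: str) -> bool:
    """Default deny: a server is trusted only if both name and origin match."""
    return ALLOWED_SERVERS.get(name) == origin


def audit_client_config(path: str) -> None:
    """Flag any configured MCP server that falls outside the allowlist.

    Assumed config shape: {"mcpServers": {"<name>": {"url": "<origin>"}}}.
    """
    config = json.loads(Path(path).expanduser().read_text())
    for name, entry in config.get("mcpServers", {}).items():
        origin = entry.get("url", "")
        verdict = "OK" if is_trusted(name, origin) else "UNTRUSTED - investigate"
        print(f"{name:20s} {origin:45s} {verdict}")


# audit_client_config("~/.config/agent/mcp.json")  # hypothetical path
```

Anything that fails the check is either removed or becomes a target for the validation work described in the next steps.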
Assume the web is hostile. Web agents operate inside an untrusted execution environment, and prompt injection can be embedded in the very webpages the agent is asked to read. WebSentinel is a useful reminder that detection is not enough if you cannot localize what was contaminated.
Validate the seams attackers will actually use. The early MCP incident write-ups and the CVE show that both social steering and classic vulnerability classes can appear in this ecosystem. You want evidence of exploitability and blast radius, not a theoretical checklist.
Measure the outcome. The Benchmark Report notes that many programs still measure coverage more than verified reductions in exploitable exposure. For agents, track the removal of validated agent attack paths, not just the number of policies you wrote.
Conclusion
MCP is becoming infrastructure. Secure-MCP-style thinking is necessary, but not sufficient. If you ship MCP servers at scale without verification, you are creating a high-speed path from untrusted input to high-privilege action.
Security still needs proof.
Download the Benchmark Report
To benchmark your own program against what is actually being exploited, download Hadrian’s Offensive Security Benchmark Report 2026. It includes guidance on verification-first security and how to continuously test AI-exposed attack paths.
If you are deploying MCP servers or agentic workflows and want to move from theoretical safety to measurable assurance, connect with me on LinkedIn. I work with teams to translate Secure MCP principles into adversarial validation and structured evaluation benchmarks, so autonomy, high-privilege tool use, and context propagation are continuously tested against real attack paths, not assumed to be safe.
References
[1] – https://www.rfc-editor.org/rfc/rfc2818.html
[2] – https://www.anthropic.com/news/model-context-protocol
[3] – https://modelcontextprotocol.io/specification/2025-11-25
[4] – https://github.com/modelcontextprotocol
[5] – https://invariantlabs.ai/blog/whatsapp-mcp-exploited
[6] – https://simonwillison.net/2025/May/26/github-mcp-exploited/
[7] – https://nvd.nist.gov/vuln/detail/CVE-2025-6514
[8] – https://www.ncsc.gov.uk/blog-post/prompt-injection-is-not-sql-injection
[9] – https://arxiv.org/pdf/2602.01129
[10] – https://arxiv.org/pdf/2601.09625
[11] – https://arxiv.org/pdf/2602.03792v1
[12] – https://hadrian.io/blog/beyond-epss-redefining-validation-in-ctem
[13] – https://hadrian.io/resources/2026-offensive-security-benchmark-report