
AI Agent Security: Threats AppSec Misses

By Antoine van der Lee

Security teams that spent years hardening web apps and APIs are discovering that AI agent security breaks nearly every assumption those controls were built on. The attack surface is different. The failure modes are different. And the budget line that covers it is, in most enterprises, still blank.

This article maps the real threats — not the theoretical ones — so you can start honest conversations with leadership before an incident forces the issue.


Why Traditional AppSec Fails for Agents

Classic application security is built around one mental model: a human authenticates, a session is created, requests are audited, and the session ends. WAFs, SAST scanners, and RBAC policies all assume a bounded, human-initiated interaction.

AI agents violate every one of those assumptions. They act autonomously over extended time horizons, call tools in sequences no human choreographed at runtime, and operate under system prompts that security teams often never review. A WAF cannot catch an agent doing something semantically wrong but syntactically valid. A SAST scanner cannot find a vulnerability in a model's reasoning. Traditional AppSec is necessary — and no longer sufficient.

Four failure modes define the gap: non-human identity sprawl, the blast radius of autonomous actions, runaway loops with no kill switch, and indirect injection via documents, email, and memory stores. Each is addressed below.


Non-Human Identity Sprawl

The fastest-growing identity problem in enterprise environments is not human accounts — it is non-human identities (NHIs). Service accounts, API keys, OAuth tokens, and agent credentials are proliferating faster than IAM teams can track them.

AI agents accelerate this in two ways. First, each agent deployment typically needs its own credentials to call downstream tools and APIs: a single agentic workflow touching five systems can generate five new service identities, each with its own secret, rotation schedule, and blast radius if compromised. Second, because agents are easy to spin up — often by developers or line-of-business teams without formal IT review — those identities frequently bypass the provisioning controls that govern human accounts.

The result: credentials living in environment variables, CI pipelines, and model context windows, none of which appear in your identity governance platform. Governing NHIs for agents requires the same rigor applied to human identities — centralized issuance, automatic rotation, least-privilege scoping, and continuous discovery of out-of-band credentials. See AI Agent Access Control for a detailed framework.
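
As a rough illustration of continuous discovery, the sketch below compares credentials found by scanning the environment against the set registered in an identity governance platform. The `Credential` type, the location labels, and the sample data are all hypothetical; a real scanner would pull from your secrets manager, CI system, and IAM inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    key_id: str    # identifier of the secret, never the secret value itself
    location: str  # where the scan found it (hypothetical labels)


def find_out_of_band(discovered, registered_ids):
    """Return credentials seen in the environment but unknown to identity governance."""
    return [c for c in discovered if c.key_id not in registered_ids]


# Hypothetical scan results and IAM inventory, for illustration only.
discovered = [
    Credential("svc-crm-agent", "env:support-agent"),
    Credential("svc-billing-agent", "ci:deploy-pipeline"),
    Credential("tmp-key-7f3a", "context:system-prompt"),
]
registered_ids = {"svc-crm-agent", "svc-billing-agent"}

for cred in find_out_of_band(discovered, registered_ids):
    print(f"ungoverned credential {cred.key_id} found in {cred.location}")
```

The comparison itself is trivial; the hard part in practice is making the discovery feed complete enough that the difference means something.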


Blast Radius of Autonomous Actions

Human operators make mistakes slowly. An analyst who misconfigures a firewall rule does so in one deliberate step. An AI agent can issue hundreds of API calls in seconds, each compounding the one before it.

When an agent has broad permissions and makes a wrong decision — because of a bad prompt, a malicious injection, or a model hallucination — damage propagates across systems before any alert fires. Consider a customer-support agent with write access to a CRM, an email system, and a billing platform: a single injected instruction in a customer message could update records, send unauthorized communications, and modify billing data in one autonomous session. The permissions were each individually reasonable. The combination was never evaluated as a risk surface.

Blast radius control means thinking about agent permissions the way you think about network segmentation — not just "can this agent do X?" but "if this agent is compromised or confused, what is the maximum damage it can inflict?" That analysis must drive permission scoping before deployment, not after.
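
One way to make the "maximum damage" question concrete is to evaluate an agent's grants as a combination rather than one at a time. The sketch below is a minimal example under assumed data: the `GRANTS` table, the `HIGH_IMPACT_ACTIONS` classification, and the one-system budget are hypothetical and would come from your own permission model.

```python
# Hypothetical grants: agent name -> set of (system, action) pairs.
GRANTS = {
    "support-agent": {("crm", "write"), ("email", "send"), ("billing", "write")},
    "faq-agent": {("kb", "read")},
}

# Assumed classification of actions that mutate state or leave the organization.
HIGH_IMPACT_ACTIONS = {"write", "send", "delete", "pay"}


def blast_radius(agent):
    """Systems the agent could change or send from if compromised or confused."""
    return {system for system, action in GRANTS[agent] if action in HIGH_IMPACT_ACTIONS}


def exceeds_budget(agent, max_systems=1):
    """Flag agents whose combined high-impact reach spans too many systems."""
    return len(blast_radius(agent)) > max_systems


for name in sorted(GRANTS):
    print(name, sorted(blast_radius(name)), "REVIEW" if exceeds_budget(name) else "ok")
```

Run against the customer-support example above, each grant looks reasonable alone, but the combined reach across CRM, email, and billing is what gets flagged for review.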


Runaway Loops and Kill Switches

Agents that call tools can also call agents. Multi-agent architectures are becoming common because they perform better on complex tasks. They also introduce a failure mode most security runbooks don't cover: the runaway loop.

A runaway loop occurs when an agent repeatedly calls a tool or re-invokes itself because its exit condition is never met — due to ambiguous instructions, unexpected tool outputs, or a model that misinterprets state. With no hard stop, the agent burns compute, makes API calls, and potentially exfiltrates or mutates data until an operator manually intervenes, if they notice at all.

Kill switches are a basic operational control. At minimum, every agent deployment needs:

  • Hard iteration caps — a maximum number of tool calls per session, enforced at the infrastructure layer, not the model layer.
  • Human-in-the-loop checkpoints — mandatory pauses before high-consequence actions: deletes, sends, payments.
  • Out-of-band termination — an operator can stop an agent without routing through the agent itself.
  • Session time-to-live — sessions exceeding a wall-clock threshold are automatically terminated and logged.

None of these require exotic tooling. All of them are routinely absent in early-stage agentic deployments.
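
The four controls above can be sketched as a thin wrapper that sits between the agent and its tools, so that enforcement does not depend on the model cooperating. This is a minimal sketch, not a production runner: `GuardedSession`, `AgentStopped`, and the `approve` callback are hypothetical names, and a real deployment would enforce the same checks in the infrastructure that hosts the agent.

```python
import threading
import time


class AgentStopped(Exception):
    """Raised when the runner refuses to execute another tool call."""


class GuardedSession:
    def __init__(self, max_calls, ttl_seconds, approve, clock=time.monotonic):
        self.max_calls = max_calls          # hard iteration cap
        self.ttl_seconds = ttl_seconds      # session time-to-live
        self.approve = approve              # human-in-the-loop callback (assumed)
        self.clock = clock
        self.started = clock()
        self.calls = 0
        self.stop_event = threading.Event()  # set by an operator, outside the agent

    def call_tool(self, name, func, *args, high_consequence=False):
        # Every check runs before the tool executes, on every call.
        if self.stop_event.is_set():
            raise AgentStopped("terminated by operator")
        if self.clock() - self.started > self.ttl_seconds:
            raise AgentStopped("session time-to-live exceeded")
        if self.calls >= self.max_calls:
            raise AgentStopped("iteration cap reached")
        if high_consequence and not self.approve(name, args):
            raise AgentStopped(f"approval denied for {name}")
        self.calls += 1
        return func(*args)
```

An operator process that holds a reference to `stop_event` can call `stop_event.set()` at any time; the next tool call then fails regardless of what the model intends to do.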


Indirect Injection via Documents, Email, and Memory Stores

Agents introduce exfiltration vectors that don't exist in conventional software. The most significant is indirect prompt injection: an attacker embeds instructions in content the agent will process — a web page, a document, an email — and those instructions redirect the agent's behavior, including instructing it to summarize and transmit sensitive context it has accumulated. For a deeper treatment of this attack class, see Prompt Injection and MCP.

Because the agent's output channel — a chat response, an API call, a document it writes — is also a data channel, exfiltration can look identical to normal operation. There is no outbound port to block, no unusual binary to detect. The agent is doing exactly what it was designed to do, and the attacker has simply redirected where the output goes.

Secondary paths include:

  • Tool call parameters — sensitive data passed as arguments to external APIs that log request payloads.
  • Model context leakage — system prompts containing credentials or PII that the model reflects back in responses.
  • Long-term memory stores — vector databases that accumulate session context and are insufficiently access-controlled.

DLP strategies for agents require monitoring the semantic content of tool calls and outputs, not just network flows — a significant capability gap for most enterprise security teams today.
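
As a starting point, tool call arguments can be inspected before they leave the agent runtime. The sketch below uses a few regular expressions, which is only a crude stand-in for the semantic monitoring described above; the pattern set and the `scan_arguments` function are illustrative assumptions, not a complete detector.

```python
import json
import re

# Illustrative patterns only; a real deployment would need a much broader set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_arguments(arguments):
    """Return the sorted names of sensitive patterns found in a tool call's arguments."""
    # Serialize once so nested values are covered by the same scan.
    payload = json.dumps(arguments, sort_keys=True)
    return sorted(name for name, rx in PATTERNS.items() if rx.search(payload))


# Hypothetical outbound tool call carrying data it should not.
findings = scan_arguments({
    "url": "https://example.com/webhook",
    "body": {"note": "key AKIAABCDEFGHIJKLMNOP belongs to a@b.com"},
})
print(findings)
```

Pattern matching of this kind catches obvious secrets in transit, but it will miss paraphrased or summarized data, which is why the capability gap noted above remains.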


Building a Defensible Posture

The goal is not to prevent all agent use; the business value is real. The goal is to deploy agents without introducing unquantified risk.

A defensible posture starts with inventory — knowing every agent running in your environment, what identity it uses, and what tools it can call. Least-privilege enforcement and continuous observability (every tool call logged in a queryable format) then give you the foundation to enforce policy: actions above a risk threshold require approval, and certain action classes are categorically blocked regardless of prompt. Runbooks should treat "agent behaving unexpectedly" as a distinct incident class with defined triage steps. The full governance and policy layer is covered in Enterprise AI Governance; protocol-layer controls that complement identity enforcement are in MCP Security: Enterprise Guide.
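
The policy layer described above can be sketched as a small decision function evaluated before every tool call, with each decision written as one queryable log line. The action names, risk scores, and threshold below are hypothetical values chosen for illustration; the one deliberate design choice is that unknown actions default to high risk.

```python
import json
from enum import Enum


class Decision(str, Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


# Hypothetical policy values, for illustration only.
BLOCKED_ACTIONS = {"delete_database", "disable_audit_log"}
RISK_SCORES = {"read_ticket": 1, "send_email": 5, "issue_refund": 8}
APPROVAL_THRESHOLD = 4
DEFAULT_RISK = 10  # actions missing from the table are treated as high risk


def evaluate(action):
    """Decide on an action from policy alone, independent of the prompt."""
    if action in BLOCKED_ACTIONS:
        return Decision.BLOCK
    if RISK_SCORES.get(action, DEFAULT_RISK) > APPROVAL_THRESHOLD:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW


def audit_record(agent_id, action):
    """Build one JSON line per tool call so the log can be queried later."""
    record = {"agent": agent_id, "action": action, "decision": evaluate(action).value}
    return json.dumps(record, sort_keys=True)


print(audit_record("support-agent", "issue_refund"))
```

Because the decision depends only on the action name and the policy tables, an injected instruction cannot talk its way past a block.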


MCP Beast

MCP Beast is an enterprise control plane that enforces the identity, policy, observability, and kill-switch controls described above across your MCP-connected agent fleet.


Frequently Asked Questions

Are AI agent security risks materially different from API security risks?

Yes. APIs have bounded, human-defined behavior. Agents have emergent behavior driven by model reasoning, meaning the same agent can behave differently given identical inputs depending on context accumulated earlier in a session. Threat modeling approaches that work for APIs do not transfer directly.

How should we handle non-human identities created by agent frameworks without IT involvement?

Start with discovery: audit your CI pipelines, environment variable stores, and secrets managers for credentials not provisioned through your standard IAM process. Then establish a policy that any credential capable of calling production APIs must go through formal provisioning, regardless of whether a human or a deployment script created it.

Is prompt injection a realistic enterprise threat or mostly theoretical?

It is realistic: security researchers have demonstrated it repeatedly, and the same techniques apply to production agents. The practical risk scales with how much external content your agents process. Web scraping, email processing, and document analysis all represent live attack surfaces. Defense requires input sanitization, output monitoring, and architectural choices that limit what an injected instruction can do even if the model follows it.