Agentic AI and the Identity Gap

David Goldschlag · May 17, 2026

The adoption curve for agentic AI systems in production environments has diverged sharply from the security infrastructure available to govern them. Teams are deploying autonomous agents that call APIs, read and write databases, spawn sub-agents, and execute code on external systems — and the identity model governing those actions is, in most cases, either a shared API key stored in environment variables or nothing at all.

This isn't a new problem in kind. The same pattern played out with microservices, when teams decomposed monoliths faster than security teams could develop service-to-service authentication frameworks. It played out again with cloud infrastructure, when teams provisioned resources faster than IAM policies could be rationalized. What's different with agentic AI is the velocity: the gap between deployment and security infrastructure opens within weeks rather than years, and the agents themselves make more access decisions per unit time than any previous generation of automated systems.

Non-human identity management is the missing layer. The question is what a correctly built identity layer actually looks like for AI agents, and where the non-human identity problem for agents differs from the same problem for conventional microservices.

What Makes Agent Identity Different

For a conventional microservice, identity is relatively static. The payment service has an identity; that identity is defined at deployment time; the identity doesn't change while the service is running. The authorization policy that governs what the payment service can access is a function of what the payment service is, not what it's currently doing.

Agents are different in at least two dimensions:

Dynamic spawning: An orchestrator agent can spawn sub-agents at runtime based on the task it's been given. Those sub-agents have an implicit identity relationship to the parent — they're acting on behalf of the orchestrator's mandate — but conventional workload identity systems don't have a way to express or enforce that relationship. The sub-agent gets its own identity, but the link to the parent's authorization context is lost.

Task-variable access patterns: A conventional service makes the same types of requests to the same downstream systems over its lifetime. An agent's access pattern varies dramatically by task: a research agent given a narrow question accesses a small set of documents; the same agent given a broad analysis task might attempt to access an entire knowledge base. The identity infrastructure needs to answer "is this agent authorized to access this resource for this task?" — a much richer question than "is this service authorized to call this endpoint?"

Neither of these differences changes the fundamental primitives of workload identity: SPIFFE IDs, OIDC tokens, and access policies. But they do require the identity layer to be richer and more dynamic than what statically deployed microservices need.

The Three Identity Questions for Any Agent

Before deploying an agent system to production, you need answers to three questions that the identity model must be able to address:

1. What is this agent?

This is the attestation question. What verifiable fact proves that the workload making an access request is the agent you think it is, rather than some other process that has obtained a token? For Kubernetes-deployed agents, the answer is a SPIFFE ID backed by the projected service account token: the Kubernetes API server attests that this pod is running under this service account. For agents running in Lambda or Cloud Run, the answer is the platform-assigned identity (an execution role on Lambda, a service account on Cloud Run), with credentials issued by the cloud provider's runtime rather than asserted by the agent.

The attestation mechanism needs to be cryptographic, not based on claimed identity. An agent saying "I am the document retrieval agent" is not attestation. The Kubernetes API server signing a token that encodes the pod's service account is attestation.
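The difference between a claimed and an attested identity can be sketched in a few lines. This is an illustration only: real platforms sign with asymmetric keys (Kubernetes issues service account tokens as signed JWTs), and an HMAC with a shared key stands in here so the example stays within the standard library. The key, the function names, and the SPIFFE IDs are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical signing key held by the platform, never by the agent.
# Assumption: HMAC stands in for the platform's asymmetric signature.
PLATFORM_KEY = b"demo-platform-signing-key"


def platform_issue(claims: dict) -> str:
    """Stand-in for the platform signing a statement about a workload."""
    payload = base64.urlsafe_b64encode(
        json.dumps(claims, sort_keys=True).encode()
    ).decode()
    sig = hmac.new(PLATFORM_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_attested(token: str) -> Optional[dict]:
    """Return the claims only if the signature over the payload checks out."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(PLATFORM_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # a claimed identity without a valid signature proves nothing
    return json.loads(base64.urlsafe_b64decode(payload))


# A token issued by the platform verifies.
token = platform_issue({"sub": "spiffe://example.org/ns/agents/sa/retriever"})

# A process that swaps in a different identity but reuses the old signature fails.
forged_payload = base64.urlsafe_b64encode(
    json.dumps({"sub": "spiffe://example.org/ns/agents/sa/orchestrator"}).encode()
).decode()
forged = forged_payload + "." + token.rpartition(".")[2]
```

The verifier never consults what the caller says about itself; it only accepts claims covered by a signature the caller could not have produced.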

2. What is this agent authorized to do?

This is the access policy question. The answer should be explicit and verifiable, not implied by the agent's behavior or assumed from its role description. An agent authorized to read customer records should have a policy that specifies: client workload = this SPIFFE ID, server workload = customer records database, access scope = read-only.

The policy should be the source of truth, not the agent code. If a security engineer wants to understand what a deployed agent can access, they should be able to read the access policies — not audit the agent code to infer what API calls it might make.
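As a rough sketch of what "policy as the source of truth" could look like, the policy set below is plain data that a security engineer can read directly, and the check denies anything not listed. The class name, workload names, and SPIFFE IDs are hypothetical, and a real deployment would keep policies in a policy engine or identity platform rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    client: str  # SPIFFE ID of the calling agent
    server: str  # name of the server workload being accessed
    scope: str   # "read-only" or "read-write"


# Hypothetical policy set: the retrieval agent may read customer records, nothing else.
POLICIES = [
    AccessPolicy(
        client="spiffe://example.org/ns/agents/sa/retriever",
        server="customer-records-db",
        scope="read-only",
    ),
]


def policies_for(client: str) -> list:
    """Answer 'what can this agent access?' by reading policy, not agent code."""
    return [p for p in POLICIES if p.client == client]


def is_allowed(client: str, server: str, action: str) -> bool:
    """Default deny: only an explicit matching policy grants access."""
    for p in policies_for(client):
        if p.server == server:
            return action == "read" or p.scope == "read-write"
    return False
```

Because the check starts from deny, an agent that begins making a new kind of API call gains nothing until someone adds a policy for it.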

3. Can you trace this access event back to its cause?

This is the audit question. For a conventional service, the answer is usually yes: the service is statically deployed, and each access event is attributable to a service invocation, which is in turn attributable to a user request through the distributed trace. For agents, especially long-running or autonomously triggered ones, the causal chain is harder to maintain. An agent that wakes up on a schedule and processes a backlog of tasks needs its access events to be attributable to specific task IDs, not just to "the agent ran at 2am."

The audit trail infrastructure needs to propagate a task context through the agent's access events, even when those events span multiple downstream systems and multiple agent invocations.
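One way to carry a task context within a single agent process is a context variable that every access event reads when it is recorded. The sketch below assumes a hypothetical event shape and helper names. Carrying the same ID across process or network boundaries (for example in a request header) is a separate step not shown here.

```python
import contextvars
import time

# Holds the ID of the task the agent is currently working on, if any.
task_id_var = contextvars.ContextVar("task_id", default=None)


def run_task(task_id, fn, *args):
    """Run fn with task_id bound, restoring the previous value afterwards."""
    token = task_id_var.set(task_id)
    try:
        return fn(*args)
    finally:
        task_id_var.reset(token)


def access_event(agent_id: str, resource: str, action: str) -> dict:
    """Build a structured access log record that includes the current task ID."""
    return {
        "ts": time.time(),
        "agent": agent_id,
        "resource": resource,
        "action": action,
        "task_id": task_id_var.get(),
    }


def process_backlog(agent_id: str, backlog: list) -> list:
    """A scheduled agent working through a backlog, one task context per item."""
    events = []
    for task_id, resource in backlog:
        events.append(run_task(task_id, access_event, agent_id, resource, "read"))
    return events
```

With this in place, a 2am run that touches two documents produces two events with two different task IDs, rather than two events attributed only to the agent.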

The Minimum Viable Identity Model

For teams deploying agentic AI today, the minimum viable identity model has five components:

Per-agent-type SPIFFE identities. Orchestrators, retrievers, executors, and any other distinct agent types each get their own Kubernetes service account and corresponding SPIFFE ID. Agents of the same type share an identity; agents of different types are distinguishable in access logs and policies. This is the foundation — without this, everything else is building on shared credentials, and you can't enforce access policies or interpret audit logs.

Explicit access policies per agent type per server workload. For each agent type, define every server workload it's allowed to access. Start from zero permissions and add only what's needed. A retrieval agent that needs read access to a vector store and nothing else should have exactly one server workload in its access policy set.

Short-lived tokens, not stored keys. Agents should not hold long-lived credentials. Every access event should use a token issued for that specific access, expiring within minutes. This limits the blast radius when an agent process is compromised — the attacker has tokens that expire rather than persistent credentials.

Task context in every access log. Build a task ID or pipeline run ID into the agent's context at invocation time and propagate it through all access events. The access log should show not just which agent made which request, but which task invocation triggered it.

Break-glass procedure per agent type. Define what happens when an agent type needs to be immediately suspended — what credential revocation looks like, how to disable the policy, and what the operational impact is. This should be a documented runbook, not something you're figuring out during an incident.
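Two of the components above, short-lived tokens and break-glass suspension, interact in a way that is easy to show in a small sketch. The issuer below is hypothetical: the five-minute lifetime, the class and method names, and the in-memory suspension set are assumptions for illustration, not a description of any particular identity platform.

```python
import secrets
import time
from dataclasses import dataclass

TOKEN_TTL_SECONDS = 300  # assumption: "minutes", here five


@dataclass(frozen=True)
class Token:
    value: str
    agent_type: str
    audience: str       # the server workload this token is good for
    expires_at: float   # Unix timestamp


class TokenIssuer:
    def __init__(self):
        self.suspended = set()  # agent types disabled by break-glass

    def issue(self, agent_type: str, audience: str, now=None) -> Token:
        """Mint a token for one audience that expires after TOKEN_TTL_SECONDS."""
        if agent_type in self.suspended:
            raise PermissionError(f"agent type {agent_type!r} is suspended")
        now = time.time() if now is None else now
        return Token(secrets.token_urlsafe(16), agent_type, audience,
                     now + TOKEN_TTL_SECONDS)

    def is_valid(self, token: Token, audience: str, now=None) -> bool:
        """Reject expired tokens, wrong audiences, and suspended agent types."""
        now = time.time() if now is None else now
        return (
            token.agent_type not in self.suspended
            and token.audience == audience
            and now < token.expires_at
        )

    def suspend(self, agent_type: str) -> None:
        """Break-glass: stop issuing and stop honoring tokens for this type."""
        self.suspended.add(agent_type)
```

In this sketch suspension takes effect at validation time as well as at issuance. If downstream systems validate tokens without consulting the issuer, the token lifetime sets the upper bound on how long a suspended agent type can keep working, which is one more reason to keep it short.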

Where Current Frameworks Fall Short

Most agent orchestration frameworks — systems for building multi-agent pipelines — don't have native concepts of workload identity. They provide tool invocation, context management, and inter-agent communication, but they don't issue per-agent identities, don't manage credential scope, and don't produce structured access audit logs.

This is an infrastructure problem that agent frameworks have reasonably decided isn't their layer. The argument is that identity and access control belong in the underlying platform, not in the agent framework itself. That's correct in principle but creates a gap in practice: the platform identity infrastructure (Kubernetes service accounts, IAM roles) exists, but the connection between "this agent is making this tool call" and "this workload identity is authorizing this access" isn't made automatically. Someone has to build that connection, and currently that someone is usually no one.

The gap shows up when you try to answer the three identity questions above without purpose-built infrastructure. What is this agent? The orchestration framework says "it's the retrieval agent," but that's a string in the agent config, not a cryptographically attested identity. What is it authorized to do? Whatever the code lets it do. Can you trace this? Only if you instrumented the framework's tool calls to emit correlation IDs, which most teams haven't.

The Longer Horizon: Dynamic Identity and Task-Scoped Credentials

The per-agent-type identity model described above is the right first step, but it doesn't address the dynamic spawning problem: when an orchestrator spawns a sub-agent for a specific task, that sub-agent should ideally have a credential scoped to the specific task, not just to its agent type's general permissions.

Task-scoped credentials require a richer identity model: at task dispatch time, the orchestrator requests a task-specific token for the sub-agent that encodes the task's authorization context (which customer's data, which scope of operations, what time window). The sub-agent presents this task token when making downstream access requests. The downstream system can enforce task-level scoping in addition to agent-type-level scoping.

This is achievable with RFC 8693 token exchange, as described in our earlier post on autonomous AI pipelines. The orchestrator's SPIFFE token is combined with the task authorization context to produce a delegated token for the sub-agent. The sub-agent can only access resources within the delegated scope, even if its agent-type-level policy would permit broader access.
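A minimal sketch of the request side follows. The `grant_type`, `subject_token`, `subject_token_type`, `audience`, and `scope` parameters and the two URN values come from RFC 8693. Everything else is an assumption made for illustration: the scope strings such as `customer:42:read`, the function names, and the choice to compute the delegated scope as an intersection on the client before asking for it. How task context is encoded is not standardized, and the authorization server remains responsible for enforcing the final scope.

```python
from urllib.parse import urlencode

# Values defined by RFC 8693.
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def delegated_scope(agent_type_scope: set, task_scope: set) -> set:
    """The sub-agent gets only what both its type policy and the task allow."""
    return agent_type_scope & task_scope


def build_exchange_request(orchestrator_token: str, audience: str, scope: set) -> str:
    """Form-encode a token exchange request body for the authorization server."""
    return urlencode({
        "grant_type": GRANT_TYPE,
        "subject_token": orchestrator_token,   # the orchestrator's own token
        "subject_token_type": JWT_TOKEN_TYPE,
        "audience": audience,                  # the downstream server workload
        "scope": " ".join(sorted(scope)),      # space-delimited, narrowed to the task
    })


# Hypothetical scopes: the type-level policy is broader than this task needs.
type_scope = {"customer:42:read", "customer:42:write", "customer:7:read"}
task_scope = {"customer:42:read", "billing:read"}

body = build_exchange_request(
    orchestrator_token="<orchestrator-jwt>",   # placeholder, not a real token
    audience="customer-records-db",
    scope=delegated_scope(type_scope, task_scope),
)
```

The body would be sent as an `application/x-www-form-urlencoded` POST to the authorization server's token endpoint. Note that `billing:read` is dropped even though the task asked for it, because the agent type's policy never permitted it.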

Most production deployments aren't there yet. The minimum viable model is the right starting point. But designing for task-scoped credentials from the beginning — even if you implement type-level scoping initially — means you won't need to rebuild the identity architecture when you need finer granularity.

Why This Matters Now

The agent deployment window is open. Teams that are shipping agentic systems to production right now are defining what their identity architecture looks like — by design or by default. The "by default" path is shared credentials, no audit trail, and access scope limited only by what the agent code happens to request. That path works until it doesn't: until a compromised agent has persistent credentials to your production systems, until an incident requires forensics across an audit trail that doesn't have the right attribution, until an auditor asks what each agent is authorized to access and the honest answer is "we're not sure."

The "by design" path requires investment that teams under deployment pressure often don't feel they can make. We're not saying slow down — the product velocity is real and the competitive pressure is real. We're saying the identity infrastructure work is smaller than it appears when you're looking at it from the outside, especially if you start with the minimum viable model rather than trying to implement the full task-scoped delegation system immediately. The five components above can be implemented in a few days for a new agent system, and they provide a foundation that's genuinely auditable and secure.

The agents are here. The identity infrastructure needs to catch up.