
OIDC Token Lifecycle Deep Dive

Raj Patel · 14 min read

OIDC tokens are the core primitive in workload authentication — they're what replaces passwords and API keys when you build a secretless identity model. But "use OIDC tokens" glosses over the decisions that determine whether your implementation is actually secure: how claims are scoped, how validation is done correctly, what expiry windows make sense, and what you need to log to reconstruct an authentication event later.

This post goes through the lifecycle of an OIDC token used for workload-to-workload authentication: from issuance through validation and logging. We'll focus on the security-relevant decisions, not the OAuth 2.0 flow plumbing that's already well-documented.

Anatomy of an OIDC JWT

A JWT has three base64url-encoded sections separated by dots: header, payload, and signature. The header identifies the signing algorithm and key ID. The payload carries the claims. The signature proves the token was issued by the expected issuer and hasn't been modified.
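
As a rough illustration of that structure, the sketch below splits a token into its three sections and decodes them. It deliberately verifies nothing, and the function names are ours, not from any particular library.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop the base64 padding, so restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def split_jwt(token: str) -> tuple:
    """Return (header, payload, signature) WITHOUT verifying anything."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT must have exactly three dot-separated sections")
    header = json.loads(b64url_decode(parts[0]))
    payload = json.loads(b64url_decode(parts[1]))
    signature = b64url_decode(parts[2])
    return header, payload, signature
```

Decoding like this is only useful for inspection and logging. Nothing in the payload can be trusted until the signature has been checked.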

For workload authentication, a typical payload looks like:

{
  "iss": "https://idp.internal/oauth2",
  "sub": "spiffe://cluster.internal/ns/payments/sa/payment-processor",
  "aud": ["https://vault.internal"],
  "iat": 1746200000,
  "exp": 1746200900,
  "nbf": 1746200000,
  "jti": "8f4e2a91-3c7b-4d5e-b1f2-9a0e3d6c8b5f",
  "workload": {
    "cluster": "prod-us-east-1",
    "namespace": "payments",
    "service_account": "payment-processor",
    "pod_name": "payment-processor-7d9f8c6b5-xr2kp"
  },
  "scope": "vault:read:secret/payments/*"
}

Walk through the security-relevant claims:

iss (Issuer)

The issuer claim identifies who signed the token. The resource server (Vault, your service, Snowflake) must validate that the issuer exactly matches an expected value. An issuer mismatch should be an immediate hard rejection: it means the token was signed by an unexpected authority, which reflects either a misconfiguration or an attempted token injection attack.

The issuer URL must be HTTPS. Its discovery document lives at <issuer>/.well-known/openid-configuration and lists the JWKS URL in its jwks_uri field. Token validators should fetch the JWKS dynamically from the location named in that discovery document rather than hardcode the keys, because hardcoded keys turn every key rotation into a code change.
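
A minimal sketch of the issuer check and the discovery URL, assuming a hypothetical allowlist named TRUSTED_ISSUERS whose single value mirrors the example payload. The helper names are illustrative, not from any library.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the value mirrors the example payload.
TRUSTED_ISSUERS = {"https://idp.internal/oauth2"}


def check_issuer(iss: str) -> None:
    """Raise ValueError unless iss is an exact match for a trusted HTTPS issuer."""
    if urlsplit(iss).scheme != "https":
        raise ValueError("issuer must be an https URL")
    # Exact string comparison: no prefix matching and no normalisation.
    if iss not in TRUSTED_ISSUERS:
        raise ValueError("unknown_issuer")


def discovery_url(iss: str) -> str:
    """Discovery document location: the issuer plus the well-known suffix."""
    return iss.rstrip("/") + "/.well-known/openid-configuration"
```

Exact matching matters here: a prefix or substring comparison would accept an issuer such as https://idp.internal/oauth2/evil.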

sub (Subject)

For workloads, the subject should be the SPIFFE ID: a URI in the form spiffe://<trust-domain>/<path> that uniquely identifies the workload within your infrastructure. This is the claim the resource server uses for authorization — "which workload is this?"

The trust domain (cluster.internal in the example) should be consistent within a trust boundary. If you have multiple Kubernetes clusters, they can share a trust domain or have separate ones — the decision depends on whether you want cross-cluster identity claims to be valid at your resource servers.

Avoid using machine-generated IDs (instance IDs, pod UIDs) as the subject. Use the stable logical identity — service account name, role name, pipeline identifier. Instance-level identities make authorization policy maintenance difficult because the subject changes with every deployment.

aud (Audience)

Audience binding is one of the most important security controls in a token. The audience claim specifies which resource server this token is intended for. The resource server must validate that its identifier appears in the aud claim — if it doesn't, the token must be rejected even if the signature is valid.

Without audience validation, a valid token issued for one service can be replayed at any other service. A token issued for vault.internal would also work at snowflake.internal if they both accept tokens from the same issuer and don't check audience. That's a lateral movement path: compromise one service's token, use it everywhere.

Audience values should be specific resource server URIs, not broad scope strings. https://vault.internal is correct. internal-services is too broad — it describes a class of services, not a specific one.
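
The audience check itself is small. The sketch below assumes the validator knows its own identifier as a single string; the function name is ours.

```python
def audience_matches(aud_claim, expected: str) -> bool:
    """True only if `expected` appears exactly in the token's aud claim.

    The aud claim may be a single string or a list of strings.
    """
    if isinstance(aud_claim, str):
        audiences = [aud_claim]
    elif isinstance(aud_claim, list):
        audiences = aud_claim
    else:
        return False  # a missing or malformed aud is a rejection
    return expected in audiences
```

With this check in place, a token whose aud is ["https://vault.internal"] is rejected by a validator that expects https://snowflake.internal, even though the signature is valid.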

iat, exp, nbf (Time claims)

All three time claims matter for workload tokens:

  • iat (issued at): when the token was created. Used for logging and anomaly detection. A token with an iat far in the past but still within exp could indicate a token that was delayed in transit or, more concerning, was stored and replayed.
  • exp (expires at): when the token becomes invalid. For workload tokens, 15 minutes is a reasonable default. Shorter (5 minutes) provides better security properties but increases failure risk if there's any clock skew between issuer and validator. Longer (1 hour) is acceptable for batch jobs with long startup times.
  • nbf (not before): the token is not valid before this time. Typically set to the same time as iat, but can be set slightly in the future to handle a propagation window — useful if the token is generated before the workload that needs it starts.

Clock skew tolerance on the validator side should be narrow: 30 to 60 seconds is typical. Some validators allow 5 minutes of skew by default. Applied to both nbf and exp, that widens a 15-minute validity window to 25 minutes and lets a token be accepted up to 20 minutes after it was issued. Tighten the tolerance to 60 seconds unless you have specific infrastructure with known clock sync issues.
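
To make the skew arithmetic concrete, here is a sketch of a time check in which the tolerance is applied to both nbf and exp. The function names and the 60-second default are assumptions for illustration.

```python
def check_time_claims(claims: dict, now: float, skew_seconds: int = 60) -> None:
    """Raise ValueError with a reason if the token is outside its validity window."""
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)):
        raise ValueError("missing exp")
    # nbf is optional; without it the token is valid immediately.
    nbf = claims.get("nbf", 0)
    if now < nbf - skew_seconds:
        raise ValueError("not_yet_valid")
    if now >= exp + skew_seconds:
        raise ValueError("expired")


def acceptance_window_seconds(claims: dict, skew_seconds: int) -> int:
    """Width of the interval in which a validator with this skew accepts the token."""
    return (claims["exp"] + skew_seconds) - (claims["nbf"] - skew_seconds)
```

For the example payload (nbf 1746200000, exp 1746200900), a 300-second tolerance gives a 1500-second acceptance window, while a 60-second tolerance gives 1020 seconds.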

jti (JWT ID)

The jti claim is a unique identifier for the token. Its security value depends on whether you implement a token replay cache. If you maintain a cache of recently-seen jti values and reject tokens whose jti has already been seen, you prevent an attacker who intercepts a valid token from reusing it within its validity window.

A token replay cache requires state that your validators need to share. For distributed services, this means a shared cache (Redis, Memcached) in which each entry lives until the token's exp plus the validator's clock skew tolerance; after that point the token fails the expiry check anyway. It's operationally more complex than stateless validation. The tradeoff is real: replay attacks against short-lived workload tokens are rare in practice, but the window exists. Our recommendation is to implement replay detection for high-value resource servers (secrets managers, databases) and skip it for lower-sensitivity services where the operational cost isn't justified.
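
The sketch below shows the check-and-store logic using an in-memory dictionary as a stand-in for the shared cache, so it only works within a single process. The class and method names are hypothetical.

```python
import time
from typing import Optional


class ReplayCache:
    """In-memory stand-in for a shared cache such as Redis (single process only)."""

    def __init__(self, skew_seconds: int = 60):
        self._seen = {}  # jti -> time after which the entry can be dropped
        self._skew = skew_seconds

    def check_and_store(self, jti: str, exp: float, now: Optional[float] = None) -> bool:
        """Return True on first sight of jti, False if it is a replay."""
        now = time.time() if now is None else now
        # Drop entries for tokens that can no longer pass the expiry check.
        self._seen = {j: t for j, t in self._seen.items() if t > now}
        if jti in self._seen:
            return False
        self._seen[jti] = exp + self._skew
        return True
```

In a shared deployment the check and the store have to happen as one atomic operation, otherwise two validators can both accept the same token. With Redis, a SET with the NX option and an expiry is one way to get that behaviour.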

Validation Sequence

A secure token validation implementation checks these things in order:

  1. Structural validity: Is it a properly formatted JWT? Three base64url-encoded sections? Valid JSON in header and payload?
  2. Algorithm check: Is the algorithm in the header an expected algorithm? Reject alg: none absolutely. Restrict to RS256, RS384, PS256, or ES256. Never accept HS256 for workload tokens — symmetric algorithms require the validator to know the secret, which defeats the purpose of a PKI-based identity model.
  3. Signature verification: Fetch the public key from the JWKS endpoint using the kid (key ID) from the header. Verify the signature. Fail hard if the key is not found — don't fall back to trying all keys in the JWKS, as that creates an oracle attack surface.
  4. Issuer match: Does iss match a configured trusted issuer?
  5. Audience match: Does aud contain this resource server's identifier?
  6. Time validity: Is the current time after nbf and before exp? Apply narrow clock skew tolerance.
  7. Subject authorization: Does the sub claim match a policy that allows this workload to access this resource? This is your authorization check, separate from authentication.
  8. Optional: JTI replay check: Has this jti been seen in the replay cache?

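Steps 1-6 are authentication. Step 7 is authorization. The optional replay check in step 8 belongs with authentication, since it asks whether the token is acceptable and not what the workload may do. Many implementations conflate authentication and authorization, but keeping them separate makes it easier to reason about security properties and to change authorization rules without touching the authentication logic.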
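
The ordering of steps 1-7 can be sketched as follows. This is an outline under several assumptions, not a complete validator: signature verification is delegated to a hypothetical verify_signature callable (in practice, a JWT library backed by the JWKS), the replay check is omitted, and the "malformed" and "unsupported_algorithm" reasons are our own additions.

```python
import base64
import json
import time
from typing import Callable, Optional

ALLOWED_ALGS = ("RS256", "RS384", "PS256", "ES256")


class TokenRejected(Exception):
    """Raised with a short machine-readable reason."""


def _decode_segment(segment: str) -> dict:
    raw = base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("segment is not a JSON object")
    return obj


def authenticate(token: str, *, verify_signature: Callable[[str, dict], bool],
                 trusted_issuers: set, expected_aud: str,
                 skew: int = 60, now: Optional[float] = None) -> dict:
    """Steps 1-6. Returns the claims or raises TokenRejected."""
    now = time.time() if now is None else now
    # 1. Structural validity: three sections, JSON objects in header and payload.
    try:
        header_b64, payload_b64, _signature_b64 = token.split(".")
        header = _decode_segment(header_b64)
        claims = _decode_segment(payload_b64)
    except ValueError:
        raise TokenRejected("malformed") from None
    # 2. Algorithm allowlist: "none" and HS256 fail because they are not listed.
    if header.get("alg") not in ALLOWED_ALGS:
        raise TokenRejected("unsupported_algorithm")
    # 3. Signature verification, delegated to the injected callable.
    if not verify_signature(token, header):
        raise TokenRejected("invalid_signature")
    # 4. Issuer match.
    iss = claims.get("iss")
    if not isinstance(iss, str) or iss not in trusted_issuers:
        raise TokenRejected("unknown_issuer")
    # 5. Audience match; aud may be a string or a list.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else aud if isinstance(aud, list) else []
    if expected_aud not in audiences:
        raise TokenRejected("audience_mismatch")
    # 6. Time validity with narrow skew. A missing exp is treated as malformed.
    nbf, exp = claims.get("nbf", 0), claims.get("exp")
    if not isinstance(nbf, (int, float)) or not isinstance(exp, (int, float)):
        raise TokenRejected("malformed")
    if now < nbf - skew:
        raise TokenRejected("not_yet_valid")
    if now >= exp + skew:
        raise TokenRejected("expired")
    return claims


def authorize(claims: dict, allowed_subjects: set) -> bool:
    """Step 7, kept separate: an exact-match policy on sub."""
    sub = claims.get("sub")
    return isinstance(sub, str) and sub in allowed_subjects
```

Because authorize only sees claims that authenticate has already returned, the policy can be changed or replaced without touching the authentication path.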

JWKS Caching and Key Rotation

Every token validation requires access to the signing public key. Fetching the JWKS on every token validation would be both slow and a significant load on the JWKS endpoint. Validators should cache the JWKS with a TTL determined by the Cache-Control header on the JWKS response — typically 1 to 24 hours.

Key rotation creates a cache invalidation problem. When the issuer rotates its signing keys, existing cached JWKS entries won't include the new key. Validators will fail to verify tokens signed with the new key until the cache expires.

The correct handling is: on a kid not-found error (the key ID in the token header isn't in the cached JWKS), immediately re-fetch the JWKS before returning a validation failure. This handles the case where a key rotation happened after the last cache fill. If the re-fetched JWKS still doesn't contain the kid, return a validation failure — the token's key ID is genuinely unknown.

This logic is in most mature JWT libraries, but verify it for whichever library you're using. Libraries that don't implement the re-fetch-on-cache-miss will have an outage window proportional to their cache TTL every time keys are rotated.
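
If you need to check what your library does, or implement the behaviour yourself, the re-fetch-on-miss logic looks roughly like this. The fetch_jwks callable is a hypothetical stand-in for the HTTP fetch, and the minimum re-fetch interval is our own addition so that tokens carrying random kid values cannot force a fetch on every request.

```python
import time
from typing import Callable, Optional


class JwksCache:
    """Caches a JWKS and re-fetches when a kid is missing."""

    def __init__(self, fetch_jwks: Callable[[], dict], ttl_seconds: int = 3600,
                 min_refetch_interval: int = 30):
        self._fetch = fetch_jwks          # hypothetical callable returning {"keys": [...]}
        self._ttl = ttl_seconds
        self._min_interval = min_refetch_interval
        self._keys = {}
        self._fetched_at = float("-inf")

    def _refresh(self, now: float) -> None:
        jwks = self._fetch()
        self._keys = {k["kid"]: k for k in jwks.get("keys", []) if "kid" in k}
        self._fetched_at = now

    def get_key(self, kid: str, now: Optional[float] = None) -> Optional[dict]:
        now = time.monotonic() if now is None else now
        if now - self._fetched_at >= self._ttl:
            self._refresh(now)            # normal TTL expiry
        if kid not in self._keys and now - self._fetched_at >= self._min_interval:
            self._refresh(now)            # kid miss: the issuer may have rotated keys
        return self._keys.get(kid)        # None means the kid is still unknown
```

The throttle trades a short delay after a rotation for protection of the JWKS endpoint. A fixed TTL is used here for simplicity; a fuller version would derive it from the Cache-Control header on the JWKS response.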

What to Log for Each Token Validation

Every token validation event should produce a structured log entry. At minimum:

{
  "event": "token_validation",
  "result": "success",
  "jti": "8f4e2a91-3c7b-4d5e-b1f2-9a0e3d6c8b5f",
  "iss": "https://idp.internal/oauth2",
  "sub": "spiffe://cluster.internal/ns/payments/sa/payment-processor",
  "aud_presented": ["https://vault.internal"],
  "aud_expected": "https://vault.internal",
  "iat": 1746200000,
  "exp": 1746200900,
  "time_until_exp_seconds": 840,
  "validator_id": "vault-prod-us-east-1",
  "resource": "secret/payments/stripe-key",
  "operation": "read",
  "source_ip": "10.0.4.23",
  "ts": "2025-05-02T15:34:20Z"
}

For failure cases, add a failure_reason field with a specific value from a controlled vocabulary: expired, invalid_signature, unknown_issuer, audience_mismatch, not_yet_valid, jwt_replay. Avoid free-form error messages in structured log fields — they're hard to alert on and query.

Log failures at a higher severity than successes. A burst of invalid_signature failures from a specific source IP warrants an alert. A burst of expired failures might indicate a clock sync problem or a pipeline that's not refreshing tokens correctly.
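
One way to keep failure_reason inside a controlled vocabulary is to make it an enum, so a free-form string cannot reach the log. The sketch below builds an entry from the token claims; the function name and the "failure" result value are assumptions, and the resource, operation, and source_ip fields are left out for brevity.

```python
import json
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class FailureReason(str, Enum):
    EXPIRED = "expired"
    INVALID_SIGNATURE = "invalid_signature"
    UNKNOWN_ISSUER = "unknown_issuer"
    AUDIENCE_MISMATCH = "audience_mismatch"
    NOT_YET_VALID = "not_yet_valid"
    JWT_REPLAY = "jwt_replay"


def validation_log_entry(claims: dict, aud_expected: str, validator_id: str,
                         now: float, failure: Optional[FailureReason] = None) -> str:
    """Serialize one token validation event as a JSON log line."""
    entry = {
        "event": "token_validation",
        "result": "success" if failure is None else "failure",
        "jti": claims.get("jti"),
        "iss": claims.get("iss"),
        "sub": claims.get("sub"),
        "aud_presented": claims.get("aud"),
        "aud_expected": aud_expected,
        "iat": claims.get("iat"),
        "exp": claims.get("exp"),
        "validator_id": validator_id,
        "ts": datetime.fromtimestamp(now, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if isinstance(claims.get("exp"), (int, float)):
        entry["time_until_exp_seconds"] = int(claims["exp"] - now)
    if failure is not None:
        entry["failure_reason"] = failure.value
    return json.dumps(entry)
```

Claims from a token that failed signature verification are untrusted input. Logging them is still useful for investigation, but the log consumer should treat them as attacker-controlled.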

Expiry Windows in Practice

The choice of token TTL involves a tradeoff between security and operational complexity. Shorter TTLs reduce the window for credential misuse but increase the frequency of token refresh operations and the blast radius of token service outages (if your token service is down, nothing can get new tokens).

For Aembit-issued workload tokens, we default to 15 minutes. The reasoning: 15 minutes is short enough that a leaked token has a limited useful life, but long enough to tolerate normal operational variance like a token service being briefly overloaded. It's also the shortest session duration AWS STS will issue (900 seconds) and a common setting for STS session tokens, which means 15-minute tokens are what platform engineers are used to reasoning about.

We've seen teams go as low as 5 minutes and encounter issues when there's any network latency or retry logic in their pipeline — a token fetched at the start of a network-call-heavy operation can expire during that operation if the operation takes longer than expected. If you go below 10 minutes, build in pre-expiry refresh logic on the client side, not just try-and-retry on validation failure.
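
Pre-expiry refresh on the client side can be as simple as the sketch below. It assumes a hypothetical fetch_token callable that returns the token together with its exp as a Unix timestamp; the 120-second margin is an illustrative value, not a recommendation.

```python
import time
from typing import Callable, Optional, Tuple


class RefreshingTokenSource:
    """Hands out a cached token and replaces it before it gets close to expiry."""

    def __init__(self, fetch_token: Callable[[], Tuple[str, float]],
                 refresh_margin_seconds: float = 120):
        self._fetch = fetch_token      # hypothetical: returns (token, exp as Unix time)
        self._margin = refresh_margin_seconds
        self._token: Optional[str] = None
        self._exp = 0.0

    def get(self, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        # Refresh when nothing is cached or the remaining lifetime is inside the margin.
        if self._token is None or self._exp - now <= self._margin:
            self._token, self._exp = self._fetch()
        return self._token
```

The margin should be at least as long as the longest operation that will carry the token, so that a token handed out by get() is still valid when that operation finishes.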

We're not saying 15 minutes is the right number for every system. A secrets manager might warrant 5-minute tokens because the sensitivity of the assets is very high. A read-only analytics database might be fine with 60-minute tokens where the operational simplicity benefit outweighs the marginally larger compromise window.

Token Exchange and Delegated Identity

The OAuth 2.0 Token Exchange specification (RFC 8693) defines how a service can exchange one token for another — converting a platform token (Kubernetes projected service account token) into a resource-server-specific token, or enabling delegation where service A calls service B on behalf of a user or upstream workload.

In workload identity systems, token exchange is how you implement the attestation-to-credential conversion: the workload presents a platform identity assertion (a Kubernetes OIDC token, an AWS STS token) to a token exchange endpoint, which validates the platform identity and issues an application-layer token scoped for specific resource access.

The exchange request carries the subject token plus the intended audience:

POST /oauth2/token HTTP/1.1
Content-Type: application/x-www-form-urlencoded

grant_type=urn:ietf:params:oauth:grant-type:token-exchange
&subject_token=eyJ...  (platform OIDC token)
&subject_token_type=urn:ietf:params:oauth:token-type:jwt
&audience=https://vault.internal
&scope=vault:read:secret/payments/*

The exchange endpoint validates the subject token, checks that the requesting workload is allowed to access the requested audience and scope, and issues a new token. This is the mechanism Aembit uses to convert platform identity into scoped resource credentials without any stored secrets in the chain.
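
The request body shown above is written unencoded for readability. On the wire the values have to be form-encoded, which the sketch below does with the standard library. The function name is ours, and sending the request and authenticating the client are left out.

```python
from urllib.parse import urlencode


def build_exchange_body(subject_token: str, audience: str, scope: str) -> str:
    """Form-encode an RFC 8693 token exchange request body."""
    return urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "audience": audience,
        "scope": scope,
    })
```

The returned string is what goes in the POST body alongside the Content-Type: application/x-www-form-urlencoded header.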