The Credential Rotation Trap

David Goldschlag · Apr 21, 2024 · 9 min read

The 90-day rotation policy exists in security standards because it's better than never rotating. It's not based on evidence that 90 days is the right interval — it's a heuristic from an era when credentials were primarily user passwords. For non-human workload credentials, 90-day rotation is both too infrequent and structurally unreliable. Teams that have 90-day rotation policies for service accounts and API keys mostly don't rotate on schedule, and the reason isn't lack of discipline.

Why Rotation Fails in Practice

Imagine a production service that connects to a PostgreSQL database using a password stored in a Kubernetes Secret. Rotating that password requires: generating a new password in your secrets manager, updating the database user's password, updating the Kubernetes Secret, triggering a rolling restart of the service (or waiting for the secret to be re-read, if you've implemented that), verifying the new credentials work, and decommissioning the old credentials.

Each step is straightforward in isolation. Together, they form a coordination problem across multiple systems. If your PostgreSQL is managed by one team, your Kubernetes deployment by another, and your secrets manager by a third, rotation requires coordination across all three. If the service is in production and the rolling restart causes brief connection drops, you need a maintenance window or a connection retry strategy that survives the cutover.
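
A connection retry strategy that survives the cutover can be sketched roughly as follows. This is a minimal illustration, not a driver-specific implementation: `AuthError`, `load_password`, and `connect` are hypothetical stand-ins for whatever your database driver and secret source provide. The point it shows is that the credential is re-read before every attempt, so a service that first sees the old password can pick up the new one without a restart.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical stand-in for a driver's 'authentication failed' error."""


def connect_with_reload(
    load_password: Callable[[], str],
    connect: Callable[[str], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try to connect, re-reading the credential before every attempt."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        password = load_password()  # fresh read, never cached across attempts
        try:
            return connect(password)
        except AuthError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"still unauthorized after {attempts} attempts") from last_error
```

In a Kubernetes setup, `load_password` would typically read the mounted Secret file each time it is called, assuming the Secret is mounted as a volume rather than injected as an environment variable.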

This is not a team discipline problem. This is a system that was not designed for rotation. The credential was issued once, stored in many places, and every consumer of that credential must be updated simultaneously. The difficulty of rotation is proportional to the number of places the credential lives and the number of systems that need to change in coordination.

The Propagation Problem

Long-lived credentials proliferate. When a credential is created for one purpose, it gets copied. The Postgres password goes into the Kubernetes Secret, the developer's local .env file for testing, a Terraform variable in the infrastructure repo, a note in a Confluence doc, possibly an email sent during a production incident last year. Each copy is a separate place that needs to change when you rotate, and you often don't know about all of them until one breaks after rotation.

This is the propagation trap. The longer a credential has lived, the more places it exists in, the harder it is to rotate safely. Security guidance says rotate frequently. The practical effect of that guidance is usually: teams rotate recently-created credentials and don't rotate old ones, because old credentials have unknown propagation and high rotation risk.

The result: the credentials with the highest proliferation risk — old, widely distributed, connected to critical systems — are exactly the ones that get rotated least often.
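
A rough first pass at finding the copies described above is to search checked-out repositories and config directories for the credential's value. The sketch below assumes plaintext files on a local filesystem; `find_secret_copies` is a hypothetical helper, and it cannot see copies in wikis, email, or other people's laptops, which is exactly why the inventory is rarely complete.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_secret_copies(
    root: Path, secret: str, max_bytes: int = 1_000_000
) -> Iterator[Tuple[Path, int]]:
    """Yield (file, line_number) for every line under root that contains the secret."""
    for path in sorted(root.rglob("*")):
        # Skip directories and very large files (likely binaries or dumps).
        if not path.is_file() or path.stat().st_size > max_bytes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            if secret in line:
                yield path, number
```

Handing a live secret to a scanning script is itself a form of propagation, so a run like this is best done from a trusted machine without logging the value.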

What "Safe Rotation" Actually Requires

For rotation to work reliably, you need:

1. A complete inventory of where the credential is used. This is harder than it sounds. For a secret that's been in production for two years, you need to know every service that reads it, every automation script that uses it, and every developer who has a local copy. Most teams don't have this inventory.
2. Dual-credential support during cutover. The old credential should remain valid long enough for all consumers to pick up the new one, which means the backing service (database, API provider, etc.) must support two active credentials at once. PostgreSQL can, with a caveat: changing a role's password leaves existing sessions connected so they can drain, but new connections with the old password fail immediately, so a true overlap means creating a second role and moving consumers to it. AWS IAM supports this directly: a user can have two active access keys. Many SaaS APIs do not: there's one API key, and changing it immediately invalidates the old one.
3. Automated propagation. If updating a rotated credential requires manually updating 12 different places, you will not rotate on schedule. The only rotation that happens reliably is the rotation that requires zero human coordination: the system issues a new credential, distributes it automatically, and deactivates the old one without anyone needing to act.

Most rotation implementations in growing teams satisfy condition 2 for some credentials and fail on conditions 1 and 3. They rotate what's easy to rotate (secrets manager entries with one consumer) and leave unrotated what's hard (credentials with multiple consumers or poor inventory).
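
Conditions 2 and 3 together amount to an overlap window: issue the new credential while the old one still works, move every consumer, verify, and only then retire the old one. The sketch below simulates that order with an in-memory `Backend` that allows two active keys, loosely modeled on the IAM access key limit. All names here are hypothetical, and the consumer update is a plain dictionary write standing in for real automated propagation.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Backend:
    """Toy backing service that accepts any currently active key (at most two)."""
    active: Set[str] = field(default_factory=set)

    def create_key(self) -> str:
        if len(self.active) >= 2:
            raise RuntimeError("two keys already active; deactivate one first")
        key = secrets.token_urlsafe(16)
        self.active.add(key)
        return key

    def deactivate(self, key: str) -> None:
        self.active.discard(key)

    def authenticate(self, key: str) -> bool:
        return key in self.active


def rotate(backend: Backend, consumers: Dict[str, str], old_key: str) -> str:
    """Overlapping rotation: new key first, old key retired last."""
    new_key = backend.create_key()  # both keys are valid from here on
    for name in list(consumers):
        consumers[name] = new_key   # stand-in for automated propagation
    failed = [n for n, k in consumers.items() if not backend.authenticate(k)]
    if failed:
        raise RuntimeError(f"consumers failing with new key: {failed}")
    backend.deactivate(old_key)     # overlap ends only after verification
    return new_key
```

With a backend that allows only one key, `create_key` has nothing to overlap with, and the same procedure turns into a hard cutover where every consumer breaks until it is updated.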

The 90-Day Heuristic Is the Wrong Frame

The security value of rotation is that it bounds the exposure window if a credential is compromised. If a database password is stolen today and you rotate every 90 days, the attacker has up to 90 days of access before the stolen credential stops working. If you rotate every 30 days, the window is at most 30 days. If you rotate every day, it is at most 24 hours.
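
The arithmetic is simple enough to write down. The worst case equals the rotation interval. The average figure below rests on an added assumption that is not in the argument above: that the theft happens at a uniformly random point in the rotation cycle, in which case the expected remaining validity is half the interval.

```python
from typing import Tuple


def exposure_window(interval_hours: float) -> Tuple[float, float]:
    """Return (worst_case_hours, average_hours) a stolen credential stays valid.

    The average assumes theft at a uniformly random point in the cycle.
    """
    worst_case = interval_hours
    average = interval_hours / 2
    return worst_case, average


for label, hours in [("90 days", 90 * 24), ("30 days", 30 * 24), ("1 day", 24), ("1 hour", 1)]:
    worst, avg = exposure_window(hours)
    print(f"{label:>8}: worst case {worst:g} h, average {avg:g} h")
```

Both figures only hold if rotation actually happens on schedule, which is the assumption the rest of this article questions.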

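The logical endpoint of this reasoning is continuous rotation: credentials that expire automatically in minutes or hours, so a stolen credential is useless almost immediately after theft. This isn't hypothetical. AWS STS temporary credentials are valid for between 15 minutes and 36 hours depending on the API call, and assumed-role sessions are capped at 12 hours. The OIDC tokens GitHub Actions issues to a workflow job are short-lived, and the cloud credentials they are exchanged for typically default to a 1-hour session. SPIRE, the reference SPIFFE implementation, issues X.509 SVIDs with a 1-hour default TTL. Kubernetes ServiceAccount tokens bound to pods are time-limited, refreshed by the kubelet, and invalidated when the pod is deleted.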

The difference between these ephemeral credentials and the 90-day-rotation credential is not frequency of change — it's architecture. Ephemeral credentials are issued just-in-time by a trusted authority that evaluates whether the requesting workload is authorized at that moment. They don't need to be propagated to multiple places because they're never stored. They don't need rotation ceremonies because they expire automatically. The "rotation" happens continuously, invisibly, without any manual coordination.
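
To make the architecture concrete, here is a toy issuer and verifier built on the standard library. It is a sketch of the idea only, not how STS, SPIFFE, or any real token service works internally: `SIGNING_KEY` and `AUTHORIZED_WORKLOADS` are hypothetical, and a real issuer would verify the caller's identity rather than trust a name. What it shows is that authorization is checked at issue time and expiry is carried inside the credential, so nothing has to be rotated.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SIGNING_KEY = b"demo-only-key"            # hypothetical; a real issuer protects this
AUTHORIZED_WORKLOADS = {"billing-api"}    # hypothetical policy, checked at issue time


def issue_token(workload: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a signed claim that `workload` is authorized until now + ttl."""
    if workload not in AUTHORIZED_WORKLOADS:
        raise PermissionError(f"{workload} is not authorized")
    issued_at = time.time() if now is None else now
    payload = json.dumps({"sub": workload, "exp": issued_at + ttl_seconds}).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(signature).decode())


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the workload name if the token is authentic and unexpired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # wrong shape or invalid base64
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # payload was altered or signed with another key
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims["sub"] if current < claims["exp"] else None
```

A token stolen from this scheme stops verifying once `exp` passes, with no revocation step and no stored copy to hunt down.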

We're not saying 90-day rotation is worthless for credentials that genuinely can't be made ephemeral. We're saying that treating the rotation interval as the primary security lever misses the structural problem: credentials that need manual rotation will be rotated infrequently regardless of the stated policy.

Vault and HashiCorp's Dynamic Secrets Model

HashiCorp Vault's dynamic secrets feature is worth understanding as an example of the ephemeral architecture applied to database credentials. Rather than storing a database password and rotating it, Vault dynamically creates a new database user with a unique random password when a service requests credentials. The credentials are valid for a configurable TTL (e.g., 1 hour). When the TTL expires, Vault automatically revokes the database user.

The result: every credential request gets its own unique, time-limited username and password. There's no "stored credential" to rotate. The compromise exposure window is bounded by the TTL regardless of when the compromise occurred. Revocation is automatic.
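
The lease lifecycle can be imitated in a few lines. The class below is an in-memory simulation for illustration only. It is not Vault's API, and `DynamicSecretBroker` and its method names are made up. It shows the three properties that matter: each request yields a distinct credential, validity is bounded by the TTL, and cleanup needs no human action.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DynamicSecretBroker:
    """In-memory imitation of a dynamic-secrets lease lifecycle (not Vault's API)."""
    ttl_seconds: float
    # username -> (password, expiry time, workload that requested it)
    leases: Dict[str, Tuple[str, float, str]] = field(default_factory=dict)

    def issue(self, workload: str, now: float) -> Tuple[str, str]:
        """Create a unique username/password pair valid for ttl_seconds."""
        username = f"v-{workload}-{secrets.token_hex(4)}"
        password = secrets.token_urlsafe(24)
        self.leases[username] = (password, now + self.ttl_seconds, workload)
        return username, password

    def is_valid(self, username: str, password: str, now: float) -> bool:
        lease = self.leases.get(username)
        if lease is None:
            return False
        stored_password, expiry, _workload = lease
        return now < expiry and secrets.compare_digest(stored_password, password)

    def revoke_expired(self, now: float) -> List[str]:
        """Drop every lease past its expiry; a real system would drop the database user."""
        expired = [u for u, (_p, expiry, _w) in self.leases.items() if expiry <= now]
        for username in expired:
            del self.leases[username]
        return expired
```

Because the lease record keeps the requesting workload next to the generated username, the same structure also answers who held a credential at a given time.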

This is the right architecture for any credential that can support it. The question to ask about each credential type in your system: can we make this ephemeral and issued-on-demand, rather than static and rotated periodically? For many credentials, the answer is yes — with the right tooling in front of the backing service.

The Audit Trail Problem

Rotation under the static-credential model has a secondary problem: audit trails. When a database credential is shared across multiple service instances (a common pattern for read replicas and connection pooling), any access log entry from that database says "credential X authenticated." It doesn't say which service instance used the credential, which pod, which request chain. All consumers are indistinguishable from the database's perspective.

With dynamic, per-workload credentials, the database logs can show which unique credential was used — and each credential maps to a specific workload that was issued it. The audit trail is structurally richer without any additional log collection work, because the uniqueness of the credential carries identity information.
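
The difference shows up as soon as you try to attribute a log line. In the sketch below, `holders` is a hypothetical record of which workloads were given each database username, and the log entries are invented examples. With a shared credential the best possible answer is a list of suspects; with per-workload credentials it is a single name.

```python
from typing import Dict, Iterable, List, Tuple


def candidate_workloads(
    log_entries: Iterable[Tuple[str, str]],
    holders: Dict[str, List[str]],
) -> List[Tuple[str, List[str]]]:
    """For each (db_user, statement) log entry, list the workloads that could have run it."""
    return [(statement, holders.get(db_user, [])) for db_user, statement in log_entries]


# Shared static credential: every consumer logs in as the same user.
shared = candidate_workloads(
    [("app_user", "DELETE FROM invoices WHERE id = 42")],
    {"app_user": ["billing-api", "report-job", "admin-cli"]},
)

# Per-workload credentials: the username itself identifies the caller.
dynamic = candidate_workloads(
    [("v-report-job-c21e", "DELETE FROM invoices WHERE id = 42")],
    {"v-billing-api-3f9a": ["billing-api"], "v-report-job-c21e": ["report-job"]},
)
```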

What to Actually Do

There's no single migration path from static-rotated to ephemeral credentials, because the feasibility depends on each credential type. A practical framework:

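Cloud provider access keys (AWS IAM access keys, GCP service account keys): Migrate to short-lived credentials issued through OIDC federation (for CI/CD) or through the role attached to the compute platform (EC2 instance profiles, ECS/Fargate task roles, Lambda execution roles; GCP offers Workload Identity Federation and attached service accounts as the equivalents). These are static credentials that have a well-supported ephemeral alternative. The migration is typically a week of work per credential type.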
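
For the OIDC federation path on AWS, the exchange is a single STS call. The wrapper below is a sketch: the function name, default session name, and returned key names are choices made here, while `assume_role_with_web_identity` and its parameters are the boto3 STS client's. The client is passed in so the function can be exercised without AWS access. It assumes an IAM role whose trust policy already accepts your OIDC provider.

```python
from typing import Any, Dict


def fetch_short_lived_credentials(
    sts_client: Any,
    role_arn: str,
    web_identity_token: str,
    session_name: str = "ci-job",
    duration_seconds: int = 900,  # 900 seconds is the minimum STS accepts
) -> Dict[str, Any]:
    """Exchange an OIDC token for temporary AWS credentials."""
    response = sts_client.assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        WebIdentityToken=web_identity_token,
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
        "expires_at": creds["Expiration"],
    }


# Usage (requires boto3; role_arn and token come from your environment):
#   import boto3
#   creds = fetch_short_lived_credentials(boto3.client("sts"), role_arn, token)
```

Nothing returned here needs to be stored beyond the job that requested it, which is what removes it from the rotation problem.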

Database passwords: The hardest category. Options: Vault dynamic secrets (requires Vault), AWS RDS IAM authentication (for RDS with the appropriate driver support), or per-pod database users generated at pod startup and revoked on shutdown. None of these is trivial, but all are achievable for greenfield projects or sufficiently motivated teams.
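
The per-pod option can be sketched as a function that generates the PostgreSQL statements to run at pod startup and shutdown. This is an illustration under assumptions: `per_pod_role`, the `pod_` prefix, and the `app_readwrite` group role are hypothetical, and something with `CREATEROLE` privilege still has to execute the SQL. `VALID UNTIL` is standard `CREATE ROLE` syntax and acts as a backstop if the shutdown hook never runs.

```python
import re
import secrets
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class PodRole(NamedTuple):
    username: str
    password: str
    create_sql: str
    drop_sql: str


def per_pod_role(
    pod_name: str,
    ttl: timedelta,
    now: datetime,                      # pass a timezone-aware datetime
    group_role: str = "app_readwrite",  # hypothetical, trusted constant
) -> PodRole:
    """Build CREATE/DROP statements for a short-lived, pod-specific database role."""
    # Keep only characters that are safe inside a quoted identifier.
    safe = re.sub(r"[^a-z0-9_]", "_", pod_name.lower())
    username = f"pod_{safe}"[:63]         # PostgreSQL identifiers are limited to 63 bytes
    password = secrets.token_urlsafe(24)  # URL-safe alphabet contains no quote characters
    expires = (now + ttl).astimezone(timezone.utc).isoformat()
    create_sql = (
        f'CREATE ROLE "{username}" LOGIN PASSWORD \'{password}\' '
        f"VALID UNTIL '{expires}' IN ROLE {group_role};"
    )
    drop_sql = f'DROP ROLE IF EXISTS "{username}";'
    return PodRole(username, password, create_sql, drop_sql)
```

Note that `VALID UNTIL` only stops new password logins after the timestamp. Sessions already open keep running, so the shutdown-time `DROP ROLE` (or terminating the role's sessions) is still needed for a hard cutoff.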

Third-party API keys: Often the most intractable. Many SaaS APIs don't support short-lived token models. The practical answer here is a secrets manager with automatic rotation and a robust propagation mechanism — not ephemeral issuance. This is the category where rotation hygiene matters most because ephemeral issuance often isn't available.
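
On the consumer side, a robust propagation mechanism usually means the service re-reads the key instead of holding it for the life of the process. The class below is a minimal sketch of that idea: `RefreshingSecret` is a hypothetical name, and `fetch` stands for whatever call reads the current value from your secrets manager.

```python
import time
from typing import Callable, Optional


class RefreshingSecret:
    """Cache a secret for at most max_age seconds, then re-read it from the source."""

    def __init__(
        self,
        fetch: Callable[[], str],
        max_age: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached value, refreshing it if it is missing or too old."""
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._max_age:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value

    def invalidate(self) -> None:
        """Force a re-read on the next get(), e.g. after the API rejects the cached key."""
        self._value = None
```

For a provider that invalidates the old key the moment a new one is created, `max_age` bounds how long a consumer keeps sending a dead key, and calling `invalidate()` on an authentication error shortens that further.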

The goal isn't to eliminate every rotation ceremony immediately. The goal is to understand which credentials in your system are structurally resistant to good rotation hygiene and to prioritize replacing those with ephemeral-issuance alternatives, one category at a time. That's the actual security improvement — not stricter enforcement of the 90-day interval on a system that's architected to resist it.