Before Your Next SOC 2 Audit: Non-Human Identity Checklist

Raj Patel · Feb 27, 2025

SOC 2 auditors ask four questions about every credential in your environment: What is it? Who created it? When does it expire? What does it access? Human identities are manageable — you have an IdP, user provisioning flows, offboarding checklists. But service credentials? Most teams can't answer all four questions without a multi-day investigation.

This is the non-human identity gap. It's not a gap in intent — most security teams know they should have this visibility. It's a gap in tooling and process. Service credentials accumulate over time, created by engineers who've since left, embedded in deployment scripts from 2021, rotated manually when someone remembers. By the time an auditor asks you to demonstrate control, you're reconstructing history from git blame and Slack archives.

What follows is a structured checklist we've developed through building Aembit. It's not about achieving perfection before your audit — it's about knowing your actual posture so you can answer honestly and show a credible remediation path.

The Four Audit Questions, Applied to Service Credentials

1. Inventory: What exists?

Start with an enumeration pass. You're looking for every credential that a non-human workload uses to authenticate to another system. This includes:

AWS IAM access keys (long-lived, not role-based)
Database connection strings with embedded passwords
API keys stored in environment variables, secret managers, or config files
Service account keys for GCP, including downloaded JSON key files
Kubernetes service account tokens projected to pods
OAuth client secrets used by backend services
SSH keys used for service-to-service connections
mTLS client certificates issued to specific services
Webhook signing secrets
SMTP credentials for notification services

The enumeration itself is non-trivial. You'll need to pull from: your secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager), environment variable configurations in your deployment system (Kubernetes Secrets, ECS task definitions, Helm values files), CI/CD secrets (GitHub Actions secrets, GitLab CI variables), your IAM console filtered to service accounts and non-federated access keys, and any hardcoded credentials you can surface through static analysis.

Secret-scanning tools like trufflehog or gitleaks help with the git history scan. For live infrastructure, you're combining aws iam list-access-keys for each IAM user, gcloud iam service-accounts list followed by gcloud iam service-accounts keys list for each service account, and equivalent queries per cloud provider.

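aws iam generate-credential-report
# Report generation is asynchronous; retry the next command if the report is not ready yet.
aws iam get-credential-report --query 'Content' --output text \
  | base64 -d \
  | awk -F',' 'NR>1 && ($9=="true" || $14=="true") {print $1, $10, $11, $15, $16}'

This pipeline lists IAM users that have at least one active access key, along with the last-rotated and last-used dates for both key slots (columns 9 to 11 describe the first key, columns 14 to 16 the second). The output is often alarming.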
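
If you want the same report as structured findings rather than raw columns, a short script can do the filtering. The following is a minimal sketch, assuming the decoded report CSV is already in a string and that timestamps use the ISO 8601 form with a +00:00 offset. The function name, the 90-day threshold, and the sample users are illustrative, not part of any AWS tooling.

```python
import csv
import io
from datetime import datetime, timezone

MAX_KEY_AGE_DAYS = 90  # assumed policy threshold, not an AWS default


def stale_access_keys(report_csv: str, now: datetime) -> list:
    """Return one finding per active access key older than MAX_KEY_AGE_DAYS."""
    findings = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for slot in ("access_key_1", "access_key_2"):
            if row.get(f"{slot}_active") != "true":
                continue
            rotated_raw = row.get(f"{slot}_last_rotated", "N/A")
            if rotated_raw in ("", "N/A"):
                continue  # no usable timestamp; review this key by hand
            age_days = (now - datetime.fromisoformat(rotated_raw)).days
            if age_days > MAX_KEY_AGE_DAYS:
                findings.append({
                    "user": row["user"],
                    "key_slot": slot,
                    "age_days": age_days,
                    # "N/A" here means the key has never been used
                    "last_used": row.get(f"{slot}_last_used_date", "N/A"),
                })
    return findings


# Hypothetical sample with only the columns the function reads.
sample = (
    "user,access_key_1_active,access_key_1_last_rotated,access_key_1_last_used_date,"
    "access_key_2_active,access_key_2_last_rotated,access_key_2_last_used_date\n"
    "deploy-bot,true,2021-03-04T12:00:00+00:00,2025-02-01T08:00:00+00:00,false,N/A,N/A\n"
    "report-svc,true,2025-01-15T09:30:00+00:00,N/A,false,N/A,N/A\n"
)
now = datetime(2025, 2, 27, tzinfo=timezone.utc)
for finding in stale_access_keys(sample, now):
    print(finding)
```

Only deploy-bot is reported here: its key was last rotated in 2021, while report-svc's key is under the threshold.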

2. Provenance: Who created it?

For each credential identified, you want a creation audit trail: who created it, through what mechanism, and when. AWS CloudTrail, GCP Audit Logs, and GitHub audit logs all retain this information — but only if the credential was created recently enough and if you've been collecting logs that far back.

For credentials created before your log retention window, you're often relying on git blame in your infrastructure-as-code repositories. This is imperfect. The person who committed the Terraform that created the IAM user may not be the person who decided it needed admin permissions.

The honest answer here for many teams is: "We know what exists; we don't have complete provenance for credentials older than 18 months." That's a legitimate SOC 2 finding with a straightforward remediation path — enforce IaC for all new credential creation so provenance is always in version control going forward.
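
Where logs do exist, the provenance fields can be lifted straight from the creation event. The sketch below assumes a CloudTrail CreateAccessKey record as a JSON string, with the field names (userIdentity, responseElements, userAgent, eventTime) as they appear in CloudTrail records to the best of our knowledge. The function name and the sample values are hypothetical, so verify the shape against an event from your own trail before relying on it.

```python
import json


def provenance_from_event(record_json: str) -> dict:
    """Extract who/how/when from a CloudTrail CreateAccessKey record."""
    record = json.loads(record_json)
    access_key = (record.get("responseElements") or {}).get("accessKey", {})
    return {
        "credential_id": access_key.get("accessKeyId", "unknown"),
        "created_for": access_key.get("userName", "unknown"),
        "created_by": record.get("userIdentity", {}).get("arn", "unknown"),
        "created_at": record.get("eventTime", "unknown, pre-log-retention"),
        # The user agent hints at the mechanism: console, CLI, or an IaC tool.
        "mechanism": record.get("userAgent", "unknown"),
    }


# Hypothetical event, trimmed to the fields used above.
event = json.dumps({
    "eventTime": "2025-01-15T09:30:00Z",
    "eventName": "CreateAccessKey",
    "userAgent": "console.amazonaws.com",
    "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::123456789012:user/alice"},
    "responseElements": {
        "accessKey": {"accessKeyId": "AKIAIOSFODNN7EXAMPLE", "userName": "deploy-bot"}
    },
})
print(provenance_from_event(event))
```

Missing fields fall back to "unknown" instead of raising, which matches how the inventory treats credentials that predate log retention.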

3. Expiry: When do they rotate?

This is where the distinction between secret-bearing and secretless architectures becomes concrete. Long-lived credentials don't expire — they rotate on a schedule you define and enforce, or they don't rotate at all. Short-lived tokens (OIDC, STS AssumeRole, SPIFFE SVIDs) expire in minutes to hours and never need a rotation policy because they're ephemeral by design.

For your audit inventory, categorize credentials by type:

Static credentials with rotation policy: Document the rotation interval, mechanism (manual, automated), and last rotation date. If rotation is manual, document who performs it and how compliance is tracked.
Static credentials with no rotation policy: These need immediate attention. Record the age of each one. AWS IAM access keys older than 90 days with active usage are a common finding.
Short-lived tokens: Document the TTL, issuer, and renewal mechanism. No rotation policy needed — expiry is the control.
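
For short-lived tokens issued as JWTs, the TTL and issuer can be read from a sample token instead of taken on trust from configuration. This is a minimal sketch, assuming the token carries the standard iss, iat, and exp claims (iat is optional in some issuers). It decodes the payload for documentation purposes only and does not verify the signature. The helper names and the demo issuer URL are hypothetical.

```python
import base64
import json


def token_lifetime(jwt: str) -> dict:
    """Read issuer and TTL from a JWT payload. Does NOT verify the signature."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return {"issuer": claims["iss"], "ttl_seconds": claims["exp"] - claims["iat"]}


def _b64(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Unsigned demo token with a 15-minute lifetime.
demo_token = ".".join([
    _b64({"alg": "none"}),
    _b64({"iss": "https://issuer.example", "iat": 1740650000, "exp": 1740650900}),
    "",
])
print(token_lifetime(demo_token))
# {'issuer': 'https://issuer.example', 'ttl_seconds': 900}
```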

A credential rotation matrix is a useful artifact to produce for auditors. Rows are credential types; columns are rotation interval, last rotation, enforcement mechanism (manual/automated), and owner.
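
One way to keep the matrix honest is to compute the status column instead of typing it in. The sketch below is illustrative only: the RotationRow class, its field names, and the sample rows are assumptions, and the status rules (no policy, overdue, compliant) are one reasonable reading of the categories above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RotationRow:
    credential_type: str
    owner: str
    enforcement: str               # "manual" or "automated"
    interval_days: Optional[int]   # None means no rotation policy
    last_rotation: Optional[date]  # None means never rotated or unknown

    def status(self, today: date) -> str:
        if self.interval_days is None:
            return "no policy"
        if self.last_rotation is None:
            return "overdue"
        age_days = (today - self.last_rotation).days
        return "overdue" if age_days > self.interval_days else "compliant"


def render_matrix(rows, today: date) -> str:
    """Render rows as a plain-text table, one credential type per line."""
    lines = ["credential type | interval | last rotation | enforcement | owner | status"]
    for r in rows:
        interval = "none" if r.interval_days is None else f"{r.interval_days}d"
        last = "unknown" if r.last_rotation is None else r.last_rotation.isoformat()
        lines.append(" | ".join(
            [r.credential_type, interval, last, r.enforcement, r.owner, r.status(today)]
        ))
    return "\n".join(lines)


# Hypothetical rows for illustration.
rows = [
    RotationRow("IAM access key", "platform", "manual", 90, date(2024, 9, 1)),
    RotationRow("Database password", "data", "automated", 30, date(2025, 2, 10)),
    RotationRow("Webhook signing secret", "integrations", "manual", None, None),
]
print(render_matrix(rows, today=date(2025, 2, 27)))
```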

4. Scope: What do they access?

Every credential should have a documented scope: the minimum set of permissions required for the workload to function. In practice, credentials tend to accumulate permissions over time as engineers add capabilities without removing old ones.

AWS IAM Access Analyzer and its GCP counterpart, IAM Recommender, can identify over-privileged service accounts by comparing granted permissions against observed usage. These tools are imperfect: they cannot tell a permission that is genuinely unnecessary from one that simply was not exercised during the measurement window, such as a quarterly job or a disaster-recovery path. They do surface the obvious cases.

For your audit, you want to document: what each service account is authorized to do, what it actually does (from access logs where available), and any gap between those two. The gap is your over-provisioning surface.
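
The gap itself is a comparison between two sets. The following sketch assumes you have already extracted the granted action patterns (which may contain IAM-style * wildcards) and the actions seen in access logs. The function name and sample actions are hypothetical, and matching is case-sensitive here for simplicity even though IAM action names are not.

```python
from fnmatch import fnmatchcase


def scope_gap(granted: set, observed: set) -> dict:
    """Compare granted action patterns against actions seen in access logs."""
    unused = sorted(
        g for g in granted if not any(fnmatchcase(o, g) for o in observed)
    )
    not_granted = sorted(
        o for o in observed if not any(fnmatchcase(o, g) for g in granted)
    )
    return {"unused_grants": unused, "observed_but_not_granted": not_granted}


# Hypothetical service account: four grants, two of them exercised.
granted = {"s3:GetObject", "s3:PutObject", "dynamodb:*", "iam:PassRole"}
observed = {"s3:GetObject", "dynamodb:Query"}
print(scope_gap(granted, observed))
# {'unused_grants': ['iam:PassRole', 's3:PutObject'], 'observed_but_not_granted': []}
```

Treat the unused list as candidates for review, not automatic removal, since a grant may cover a path that did not run during the observation window.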

Producing the Audit Artifact

Auditors want a credential inventory document. The exact format varies by auditor, but the minimum viable content is:

Credential identifier (ARN, key ID, or internal name)
Credential type (IAM access key, service account key, database password, API key)
Owner team
Creation date (or "unknown, pre-log-retention")
Last rotation date
Rotation interval / policy
Accessing system (which workload uses this credential)
Accessed system (what the credential authenticates to)
Permission scope summary
Remediation status if flagged

Producing this document manually for a first audit is acceptable. Maintaining it manually for subsequent audits is not — it will drift from reality immediately. The audit artifact needs to be generated from live system state, not maintained as a separate spreadsheet.
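
As a starting point for generating the artifact, the fields listed above map directly onto a record type that can be serialized on every run. This is a sketch under assumptions: the InventoryRecord class, its field names, and the sample row are illustrative, and in practice the records would be built from your cloud APIs and secrets manager, not written by hand.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryRecord:
    credential_id: str       # ARN, key ID, or internal name
    credential_type: str
    owner_team: str
    creation_date: str       # or "unknown, pre-log-retention"
    last_rotation_date: str
    rotation_policy: str
    accessing_system: str    # workload that uses the credential
    accessed_system: str     # system the credential authenticates to
    permission_scope: str
    remediation_status: str = ""


def to_csv(records) -> str:
    """Serialize inventory records to CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(InventoryRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Hypothetical record for illustration.
example = InventoryRecord(
    credential_id="reporting-db-password",
    credential_type="database password",
    owner_team="data",
    creation_date="unknown, pre-log-retention",
    last_rotation_date="2025-02-10",
    rotation_policy="30 days, automated",
    accessing_system="reporting service",
    accessed_system="reporting database",
    permission_scope="read-only on reporting schema",
)
print(to_csv([example]))
```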

What Aembit Tracks Continuously

When workloads authenticate through Aembit, the platform maintains this inventory automatically. Every workload identity — defined as a SPIFFE-based selector matching a specific Kubernetes service account, ECS task definition, or compute identity — has a corresponding record of which client workloads hold it, which server workloads it accesses, and what policy constraints govern that access.

Because Aembit uses ephemeral OIDC tokens rather than stored credentials, the "expiry" column in your inventory becomes trivially answerable: all credentials expire in 15 minutes. There's no rotation management surface to audit because there are no long-lived credentials to rotate.

We're not suggesting that deploying Aembit is a prerequisite for passing a SOC 2 audit — you can pass with well-documented manual processes. The point is that the audit questions themselves reveal what good credential management looks like: bounded scope, short or zero lifetime, automatic provenance, and continuous inventory. Those properties are architecturally easier to achieve when the credential layer is built around token exchange rather than secret storage.

Legacy Services and the Honest Exceptions List

Not everything can be moved to token-based authentication quickly. Legacy services with JDBC connection strings, third-party APIs that only accept API keys, on-premises systems with static credentials — these exist in real infrastructure and auditors know it.

The correct approach for these is an exceptions list: explicitly documented credential types that can't yet be migrated, with the reason, the compensating controls in place (network segmentation, vault-based rotation, break-glass access procedures), and a target migration date if one exists.
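
An exceptions list is easier to keep complete if each entry is checked for the fields an auditor will ask about. The sketch below is one possible shape: the CredentialException class, its field names, and the sample entries are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CredentialException:
    credential_type: str
    reason: str
    compensating_controls: List[str] = field(default_factory=list)
    target_migration_date: Optional[str] = None  # optional by design

    def problems(self) -> List[str]:
        """Return what is missing before this entry is audit-ready."""
        issues = []
        if not self.reason.strip():
            issues.append("missing reason")
        if not self.compensating_controls:
            issues.append("no compensating controls listed")
        return issues


# Hypothetical entries: one complete, one not.
exceptions = [
    CredentialException(
        credential_type="JDBC connection string (legacy billing service)",
        reason="Driver does not support token-based authentication",
        compensating_controls=["network segmentation", "vault-based rotation"],
        target_migration_date="2025-Q4",
    ),
    CredentialException(credential_type="Third-party API key", reason=""),
]
for entry in exceptions:
    print(entry.credential_type, "->", entry.problems() or "ok")
```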

An exceptions list is not a weakness in your SOC 2 response. It demonstrates that you understand your environment and have made deliberate tradeoffs rather than overlooking the hard cases by accident. Auditors respond better to "here's what we can't change yet and why, here are the controls around it" than to a partial inventory that doesn't acknowledge the hard cases.

Remediations That Show Up Well in Audits

If you're doing your first comprehensive credential inventory and finding gaps, here are the remediations that have the most audit impact relative to implementation effort:

Enforce IaC for all credential creation. Any IAM role, service account, or API key created outside of Terraform/Pulumi/CloudFormation should be flagged for deletion or import. This gives you provenance for everything created from the enforcement date forward.

Enable rotation enforcement for secrets in your secrets manager. AWS Secrets Manager and HashiCorp Vault can enforce rotation intervals. Enable this for every secret type that supports it, even if the interval is 180 days — it's better than no enforcement.

Purge unused credentials. Run the IAM credential report and identify access keys not used in the last 90 days. Disable them with a 2-week notice period; if nothing breaks, delete them. This reduces your audit surface meaningfully.
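
The purge can be planned before anything is touched. This sketch assumes you already have each key's last-used date (None for never used), for example from the credential report. It reads the two-week period as the wait between disabling a key and deleting it, which is one interpretation of the process above. The function name, thresholds, and key IDs are hypothetical.

```python
from datetime import date, timedelta

UNUSED_AFTER_DAYS = 90  # assumed threshold for "unused"
GRACE_DAYS = 14         # assumed wait between disable and delete


def purge_plan(last_used_by_key: dict, today: date) -> list:
    """List keys to disable today and the earliest date each may be deleted."""
    plan = []
    for key_id, last_used in sorted(last_used_by_key.items()):
        idle_days = None if last_used is None else (today - last_used).days
        if idle_days is None or idle_days >= UNUSED_AFTER_DAYS:
            plan.append({
                "key_id": key_id,
                "idle_days": idle_days,  # None means never used
                "disable_on": today,
                "delete_after": today + timedelta(days=GRACE_DAYS),
            })
    return plan


# Hypothetical keys for illustration.
keys = {
    "KEY-EXAMPLE-OLD": date(2024, 10, 1),
    "KEY-EXAMPLE-NEW": date(2025, 2, 20),
    "KEY-EXAMPLE-NEVER": None,
}
for step in purge_plan(keys, today=date(2025, 2, 27)):
    print(step)
```

The recently used key is left alone; the other two are scheduled for disabling today and deletion no earlier than 2025-03-13.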

Document the credential-to-workload mapping. Even a spreadsheet that says "this database password is used by the reporting service, which runs as ECS task X" is more auditable than undocumented credentials. Start with your most sensitive systems.

Migrate CI/CD to OIDC federation. GitHub Actions, GitLab CI, and CircleCI all support OIDC-based federation with AWS and GCP. Replacing IAM access keys in your CI pipelines with short-lived federated tokens eliminates one of the highest-risk static credential categories in most environments.
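
On the AWS side, federation for GitHub Actions comes down to an IAM role whose trust policy accepts tokens from GitHub's OIDC provider for a specific repository and branch. The sketch below builds such a trust policy as a Python dict. The function name, account ID, organization, and repository are placeholders, and it assumes the OIDC identity provider for token.actions.githubusercontent.com has already been registered in the account.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, org: str, repo: str,
                             branch: str = "main") -> dict:
    """Build an IAM trust policy scoped to one repository branch."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                    # Pin the subject so other repositories and branches are rejected.
                    f"{GITHUB_OIDC_HOST}:sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}",
                }
            },
        }],
    }


# Placeholder account and repository names.
policy = github_oidc_trust_policy("123456789012", "example-org", "example-repo")
print(json.dumps(policy, indent=2))
```

Pinning the sub claim is the important design choice: without it, any workflow that can obtain a token from the same provider could assume the role.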

Timing and Sequencing

If your audit is 60 days away, you can realistically complete an inventory pass, implement one or two high-visibility remediations, and document your exceptions list. That's enough to demonstrate control over the credential lifecycle even if the environment isn't perfect.

What you cannot do in 60 days is migrate all credentials to ephemeral tokens, implement comprehensive access analysis, and build automated continuous inventory. Plan that work for the next audit cycle with a clear roadmap showing directional progress.

The auditors who matter are evaluating whether you understand your environment and are moving in the right direction, not whether you've achieved a perfect credential-free architecture. Show the inventory, show the remediation work in flight, show the exceptions with compensating controls, and be direct about where the gaps are and why.