Secrets Management Deep Pattern
Beyond `vault-pattern.md` — operational specifics: dynamic secrets, short-lived credentials, secret-zero, secret-less architectures, OIDC federation.
TL;DR (human)
Long-lived secrets are the highest-value target. Modern best practice: short-lived dynamic secrets, OIDC federation, secret-less where possible (workload identity). The vault still exists but issues credentials valid for minutes, not years.
For agents
Secret lifecycle taxonomy
| Class | Lifetime | Issuance |
|---|---|---|
| Static long-lived | Months-years (API keys, DB passwords) | Manual; rotated rarely |
| Static short-lived | Days (refresh tokens) | Manual; auto-refresh |
| Dynamic short-lived | Minutes-hours (just-in-time DB creds) | On request, per session |
| Workload identity (secret-less) | Per-request | Federation; never stored |
Move toward dynamic secrets and workload identity. Static long-lived secrets are the primary leak target.
Dynamic secrets (Vault example)
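Instead of:
```bash
# .env (committed)
DATABASE_URL=postgresql://app:long_lived_password@host:5432/db
```
Do:
```typescript
// At startup. `vault` is an already-authenticated Vault client
// (how it authenticates is the secret-zero problem below).
const creds = await vault.read("database/creds/app-readonly");
// Vault returns the credentials under `data` and the lease TTL in `lease_duration` (seconds), e.g. 1 hour.
const { username, password } = creds.data;
// Open new connections with them; renew the lease or fetch new creds before it expires.
```
Vault creates a new database user on the fly and revokes it when the lease expires. The compromise window is the lease lifetime, not "forever".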
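The "renew before expiry" step can be sketched as a small cache that fetches fresh credentials once most of the lease has elapsed. This is a minimal sketch under stated assumptions, not a Vault client: `LeasedCredential`, `needsRefresh`, `DynamicCredentialCache`, and the 2/3 refresh point are hypothetical, and the injected fetcher stands in for whatever actually performs the vault read.

```typescript
// Hypothetical credential shape; leaseDurationSec mirrors Vault's lease_duration (seconds).
interface LeasedCredential {
  username: string;
  password: string;
  leaseDurationSec: number;
}

// True once `refreshFraction` of the lease has elapsed since the creds were requested.
function needsRefresh(
  issuedAtMs: number,
  leaseDurationSec: number,
  nowMs: number,
  refreshFraction = 2 / 3, // assumption: refresh with a third of the lease left
): boolean {
  return nowMs - issuedAtMs >= leaseDurationSec * 1000 * refreshFraction;
}

class DynamicCredentialCache {
  private current: LeasedCredential | null = null;
  private issuedAtMs = 0;
  private readonly fetchCredential: () => Promise<LeasedCredential>;
  private readonly now: () => number;

  // fetchCredential wraps the real vault read (e.g. database/creds/app-readonly);
  // injecting it keeps the sketch independent of any particular client library.
  constructor(fetchCredential: () => Promise<LeasedCredential>, now: () => number = Date.now) {
    this.fetchCredential = fetchCredential;
    this.now = now;
  }

  // Returns cached creds, fetching a new set on first use or once the lease is mostly spent.
  async get(): Promise<LeasedCredential> {
    const t = this.now();
    if (this.current === null || needsRefresh(this.issuedAtMs, this.current.leaseDurationSec, t)) {
      const fresh = await this.fetchCredential();
      this.current = fresh;
      // Timestamp taken before the request, so the remaining lease is never overestimated.
      this.issuedAtMs = t;
      return fresh;
    }
    return this.current;
  }
}
```

Refreshing before expiry, rather than at it, leaves headroom for a slow or failing vault. A production version would also deduplicate concurrent fetches, could renew the existing lease instead of requesting a new user, and would drain pool connections opened with the old credentials.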
Workload identity (the secretless future)
Modern cloud: a workload's identity is its IAM role. Example (AWS):
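```text
EC2 instance → role "app-prod" → policy allows access to S3 / RDS / Secrets Manager
```
The workload calls AWS APIs with temporary credentials for that role, which the AWS SDK fetches from the instance metadata service and which are rotated automatically. No secret lives in code or config.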
For Kubernetes on EKS: IRSA (IAM Roles for Service Accounts). Each service account maps to its own IAM role, so each pod gets the role of the service account it runs as.
For CI/CD: OIDC federation:
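```text
GitHub Actions workflow → assumes AWS role (verified via OIDC token) → temporary creds
```
No long-lived secret is stored in CI. The only thing to set up is the trust relationship: the AWS role trusts GitHub's OIDC provider and restricts which repositories and branches may assume it.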
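With federation, the security of the whole chain rests on how tightly the role's trust relationship pins the token's issuer, audience, and subject. The sketch below models only those claim checks. The claim names `iss`, `aud`, and `sub`, the GitHub Actions issuer URL, and the `sts.amazonaws.com` audience are the values GitHub and AWS use; `TrustCondition`, `trustAllows`, and the `example-org/app` repository are hypothetical. Signature and expiry verification against the provider's published keys is done by AWS STS and is not modeled.

```typescript
// Claims GitHub Actions puts in its OIDC token (subset; names as issued).
interface OidcClaims {
  iss: string;
  aud: string;
  sub: string;
}

// Hypothetical, simplified model of a role trust policy's conditions.
interface TrustCondition {
  issuer: string;
  audience: string;
  allowedSubjects: string[]; // exact match, or prefix match when the entry ends with "*"
}

function subjectMatches(sub: string, pattern: string): boolean {
  return pattern.endsWith("*") ? sub.startsWith(pattern.slice(0, -1)) : sub === pattern;
}

// Claim checks only; token signature and expiry are verified by the cloud's STS.
function trustAllows(claims: OidcClaims, cond: TrustCondition): boolean {
  return (
    claims.iss === cond.issuer &&
    claims.aud === cond.audience &&
    cond.allowedSubjects.some((pattern) => subjectMatches(claims.sub, pattern))
  );
}

// Only workflows on the main branch of one (hypothetical) repo may assume the prod role.
const prodDeployRole: TrustCondition = {
  issuer: "https://token.actions.githubusercontent.com",
  audience: "sts.amazonaws.com",
  allowedSubjects: ["repo:example-org/app:ref:refs/heads/main"],
};
```

A subject pattern such as `repo:example-org/*` would let any branch of any repository in the organization assume the role; pinning the exact repository and branch (or a GitHub environment) keeps a compromised feature branch away from production.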
This is the single highest-leverage security move modern teams make. Adopt aggressively.
Cross-cloud federation
- AWS ↔ GCP: workload identity federation (no long-lived service-account keys).
- AWS ↔ Azure: similar, via Microsoft Entra workload identity federation.
- Cloud ↔ on-prem: harder; usually requires a bridge vault.
Secret zero — the bootstrap problem
To read from the vault, you need credentials. To get those credentials, you need to read from the vault. That circular dependency is the bootstrap problem.
Solutions:
- Cloud workload identity: trust the cloud's identity for the first credential.
- Hardware token: physical key for the first credential (HSM, TPM).
- Operator-injected: human pastes initial seed on first boot; rotated immediately.
- Sealed initial secret: deploy with sealed credentials only the trusted runtime can unseal.
Secret zero is the hardest. Get the rest of your hygiene right first.
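Whichever bootstrap route is used, the first credential is safest when it is single-use and short-lived: if an attacker intercepts and redeems it first, the legitimate workload's own redemption fails, which is a detectable signal. This is similar in spirit to Vault's response-wrapping tokens; the in-memory `BootstrapTokenStore` below is a hypothetical sketch of the idea, not Vault's implementation.

```typescript
// Hypothetical single-use, short-TTL bootstrap token entry.
interface BootstrapEntry {
  secret: string;
  expiresAtMs: number;
  used: boolean;
}

type UnwrapResult =
  | { ok: true; secret: string }
  | { ok: false; reason: "unknown" | "expired" | "already-used" };

class BootstrapTokenStore {
  private readonly entries = new Map<string, BootstrapEntry>();

  issue(token: string, secret: string, ttlMs: number, nowMs: number): void {
    this.entries.set(token, { secret, expiresAtMs: nowMs + ttlMs, used: false });
  }

  // The first successful unwrap wins; any later attempt fails loudly, which is
  // the signal that someone else may have intercepted and redeemed the token.
  unwrap(token: string, nowMs: number): UnwrapResult {
    const entry = this.entries.get(token);
    if (entry === undefined) return { ok: false, reason: "unknown" };
    if (entry.used) return { ok: false, reason: "already-used" };
    if (nowMs >= entry.expiresAtMs) return { ok: false, reason: "expired" };
    entry.used = true;
    const secret = entry.secret;
    entry.secret = ""; // don't keep secret zero around after hand-off
    return { ok: true, secret };
  }
}
```

On an `already-used` result the workload should refuse to start and alert, rather than quietly retry with a new token.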
Secret types + storage
| Type | Storage |
|---|---|
| API keys (third-party) | Vault; rotate quarterly |
| DB passwords | Vault; preferably dynamic |
| Encryption keys (DEK) | Vault; wrapped by KEK in KMS |
| Encryption keys (KEK) | KMS / HSM; never extractable |
| TLS certs | Cert manager + ACME (Let's Encrypt) or internal CA |
| OAuth refresh tokens | Vault; per-user; per-connector |
| JWT signing keys | Vault; rotated on schedule; old keys retained for verification |
| Webhook secrets | Vault; per-integration |
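The DEK/KEK rows describe envelope encryption: each record is encrypted under its own data key, and only a KEK-wrapped copy of that key is stored. Below is a minimal sketch using Node's `node:crypto` AES-256-GCM. `FakeKms` is a hypothetical in-process stand-in for a real KMS or HSM, where the KEK would never be extractable and `wrap`/`unwrap` would be remote calls; the other names are illustrative too.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// AES-256-GCM output: random 12-byte IV, ciphertext, 16-byte authentication tag.
interface AeadBox {
  iv: Buffer;
  ciphertext: Buffer;
  tag: Buffer;
}

function sealBox(key: Buffer, plaintext: Buffer): AeadBox {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function openBox(key: Buffer, box: AeadBox): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.tag);
  // final() throws if the ciphertext or tag was tampered with, or the key is wrong.
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]);
}

// Hypothetical KMS stand-in: a real KEK never leaves the KMS/HSM.
class FakeKms {
  private readonly kek = randomBytes(32);
  wrap(dek: Buffer): AeadBox {
    return sealBox(this.kek, dek);
  }
  unwrap(wrapped: AeadBox): Buffer {
    return openBox(this.kek, wrapped);
  }
}

interface EncryptedRecord {
  data: AeadBox; // payload encrypted under the DEK
  wrappedDek: AeadBox; // DEK encrypted under the KEK; safe to store next to the data
}

function encryptRecord(kms: FakeKms, plaintext: string): EncryptedRecord {
  const dek = randomBytes(32); // fresh data key per record
  const data = sealBox(dek, Buffer.from(plaintext, "utf8"));
  const wrappedDek = kms.wrap(dek);
  dek.fill(0); // best-effort wipe of the plaintext DEK
  return { data, wrappedDek };
}

function decryptRecord(kms: FakeKms, rec: EncryptedRecord): string {
  const dek = kms.unwrap(rec.wrappedDek);
  try {
    return openBox(dek, rec.data).toString("utf8");
  } finally {
    dek.fill(0);
  }
}
```

Because only wrapped DEKs are stored, rotating the KEK means re-wrapping small keys rather than re-encrypting all the data.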
Per-environment isolation
- Dev vault separate from prod vault.
- Dev workloads cannot reach prod vault.
- Different sealer keys per environment.
- Different ACLs; no cross-env reads.
Common mistake: shared vault with namespace separation. One ACL bug = cross-env leak.
Auditing access
Every vault read / write logs:
- Caller (principal id).
- Secret path.
- Timestamp.
- Source (IP, host).
- Outcome (granted / denied).
Per `audit-ledger-pattern.md`: the audit log itself goes into the same signed ledger.
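The fields listed above map directly onto a structured event. The sketch below links each event to the previous one with a SHA-256 hash chain, so editing, removing, or reordering entries is detectable. This is a simpler, swapped-in technique (tamper-evident but not signed), and `audit-ledger-pattern.md` remains the reference for the actual ledger; `VaultAccessEvent`, `append`, and `verify` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// One entry per vault read/write, carrying the fields listed above.
interface VaultAccessEvent {
  principal: string; // caller identity
  path: string; // secret path
  timestamp: string; // ISO-8601
  source: string; // IP / host
  outcome: "granted" | "denied";
}

interface LedgerEntry {
  event: VaultAccessEvent;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64);

function entryHash(prevHash: string, event: VaultAccessEvent): string {
  return createHash("sha256").update(prevHash).update(JSON.stringify(event)).digest("hex");
}

function append(ledger: LedgerEntry[], event: VaultAccessEvent): void {
  const prevHash = ledger.length > 0 ? ledger[ledger.length - 1].hash : GENESIS;
  ledger.push({ event, prevHash, hash: entryHash(prevHash, event) });
}

// Recomputes the chain. Edited, removed, or reordered entries break it; truncating
// the tail does not, unless the latest hash is also anchored somewhere else.
function verify(ledger: LedgerEntry[]): boolean {
  let prev = GENESIS;
  for (const e of ledger) {
    if (e.prevHash !== prev || e.hash !== entryHash(prev, e.event)) return false;
    prev = e.hash;
  }
  return true;
}
```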
Anomaly detection: if a service that normally reads secret X once an hour suddenly reads it 100 times an hour, either its access pattern changed for a legitimate reason or it has been compromised. Surface the spike and investigate.
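Per-principal read-rate monitoring can start as simply as comparing each (principal, path) pair's reads in the current hour with its baseline. A sketch under assumptions: the 10× factor, the 20-read floor that suppresses noise from tiny counts, and the `principal|path` key format are illustrative, and `detectSpikes` is a hypothetical helper.

```typescript
// Hypothetical spike alert for one (principal, secret path) pair.
interface RateAlert {
  key: string; // "principal|path"
  baselinePerHour: number;
  currentPerHour: number;
}

// Compare this hour's read counts with each pair's baseline.
function detectSpikes(
  baselinePerHour: Map<string, number>,
  currentPerHour: Map<string, number>,
  factor = 10, // assumption: alert above 10x the usual rate
  minReads = 20, // assumption: ignore small absolute counts
): RateAlert[] {
  const alerts: RateAlert[] = [];
  for (const [key, current] of currentPerHour) {
    // No history counts as a baseline of 0: a new reader at volume is itself suspicious.
    const baseline = baselinePerHour.get(key) ?? 0;
    if (current >= minReads && current > baseline * factor) {
      alerts.push({ key, baselinePerHour: baseline, currentPerHour: current });
    }
  }
  return alerts;
}
```

The alert only surfaces the spike; deciding whether it was a deploy that changed the access pattern or a compromise remains an investigation step.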
Operator access
Humans accessing prod secrets:
- Step-up auth (2FA / hardware key).
- Time-boxed grant (per `rbac-pattern.md` break-glass).
- Audit-logged with reason.
- Notification to security team.
- Auto-revoke after window.
Operators reading prod secrets should be an exception, not routine. If routine, you have automation gaps.
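The operator-access rules can be enforced in code rather than by convention. In this hypothetical sketch, `issueGrant` refuses requests without step-up authentication or a reason, caps the window (one hour here, an assumption), and calls a `notify` hook that stands in for the security-team alert and the audit record; `grantAllows` makes revocation automatic, because an expired grant simply stops authorizing reads.

```typescript
// Hypothetical break-glass request, mirroring the rules above.
interface BreakGlassRequest {
  operator: string;
  path: string; // prod secret path
  reason: string;
  stepUpVerified: boolean; // e.g. fresh hardware-key authentication
  durationMs: number;
}

interface Grant {
  operator: string;
  path: string;
  reason: string;
  expiresAtMs: number;
}

const MAX_GRANT_MS = 60 * 60 * 1000; // assumption: grants last at most one hour

function issueGrant(req: BreakGlassRequest, nowMs: number, notify: (msg: string) => void): Grant {
  if (!req.stepUpVerified) throw new Error("step-up authentication required");
  if (req.reason.trim() === "") throw new Error("a reason is required");
  const grant: Grant = {
    operator: req.operator,
    path: req.path,
    reason: req.reason,
    expiresAtMs: nowMs + Math.min(req.durationMs, MAX_GRANT_MS),
  };
  // Stand-in for both the audit-ledger entry and the security-team notification.
  notify(`break-glass: ${grant.operator} on ${grant.path} until ${new Date(grant.expiresAtMs).toISOString()}: ${grant.reason}`);
  return grant;
}

// Auto-revoke: once expired, the grant no longer authorizes anything.
function grantAllows(grant: Grant, operator: string, path: string, nowMs: number): boolean {
  return grant.operator === operator && grant.path === path && nowMs < grant.expiresAtMs;
}
```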
Secret in environment variables
Common but problematic:
- Visible in process listings (`ps`, `/proc`).
- Inherited by child processes.
- Often leaked into logs / error dumps.
Mitigations:
- Read once into memory; clear env var.
- Memory-only; don't write to disk.
- Teach the logger's redactor the secret env var names so their values are masked.
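When env vars can't be avoided, the first and third mitigations combine naturally: take the value once, delete the variable, and register the value with the log redactor. The names here (`takeEnvSecret`, `Redactor`) are hypothetical. Deleting from `process.env` stops child processes spawned afterwards from inheriting the value, but on Linux `/proc/<pid>/environ` may still show the environment the process started with, so this reduces exposure rather than eliminating it.

```typescript
// Take a secret from the environment exactly once and remove the variable, so child
// processes spawned afterwards don't inherit it. (On Linux, /proc/<pid>/environ may
// still expose the environment the process was started with.)
function takeEnvSecret(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  // The error names the variable, never its value.
  if (value === undefined || value === "") throw new Error(`missing secret env var ${name}`);
  delete env[name];
  return value;
}

// Log redactor that knows the secret values taken from those env vars.
class Redactor {
  private readonly secrets: string[] = [];

  register(value: string): void {
    if (value === "") return;
    this.secrets.push(value);
    // Longest first, so a secret that contains another is masked in one piece.
    this.secrets.sort((a, b) => b.length - a.length);
  }

  redact(line: string): string {
    let out = line;
    for (const s of this.secrets) out = out.split(s).join("[REDACTED]");
    return out;
  }
}
```

Typical startup use would be `const dbPassword = takeEnvSecret("DB_PASSWORD"); redactor.register(dbPassword);`, with `DB_PASSWORD` as an example variable name.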
Better: don't put secrets in env at all. Read from vault at startup.
Secrets at build vs runtime
| Stage | Should contain secrets? |
|---|---|
| Source code | Never |
| Lock files | No |
| Build artifacts (image, bundle) | No — secrets are runtime concerns |
| Container env vars (at run) | OK if vault-injected, never baked in |
| Runtime memory | Yes, transiently |
| Logs | Never (redact) |
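A container image with baked-in secrets gets pulled by many developers and copied into registries and caches, where it eventually leaks. Deleting the file in a later build step doesn't help: the earlier image layers still contain it.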
Sealed secrets (for GitOps)
When secrets live in git (rare; usually avoided):
- SOPS + KMS (Mozilla): encrypt before commit; decrypt at deploy.
- Sealed-Secrets (Bitnami): asymmetric encryption; controller decrypts in-cluster.
- AWS Secrets Manager + external-secrets operator: secrets in cloud; manifest references them.
For most cases, secrets don't live in git. The above are for unavoidable GitOps integration.
Common failure modes
- Long-lived secrets in CI. Single most common leak. → OIDC federation.
- One secret used across environments. Dev compromise = prod compromise. → Per-env.
- Secret rotated but consumers not updated. Outage. → Tooling that pushes to all consumers atomically.
- Vault unreachable = total outage. App can't read any secret. → Local cache with short TTL; circuit-breaker semantics.
- HSM not used for KEK. Sealer key sits on disk. → Always keep the KEK in an external keystore (KMS / HSM).
- No anomaly detection on vault. Compromise goes undetected. → Per-principal read-rate monitoring.
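The "vault unreachable" mitigation combines a short-TTL cache with a circuit breaker. In this hypothetical sketch, `VaultReadBreaker` serves the cached value while it is younger than `ttlMs`, falls back to it for up to `maxStaleMs` when a vault read fails, and stops calling the vault for `cooldownMs` after `failureThreshold` consecutive failures. All names are illustrative, and any concrete option values are assumptions to tune per secret.

```typescript
// Hypothetical options for serving secrets when the vault is slow or down.
interface BreakerOptions {
  ttlMs: number; // serve from cache without calling the vault while younger than this
  maxStaleMs: number; // on vault failure, serve a cached value up to this age
  failureThreshold: number; // consecutive failures before the breaker opens
  cooldownMs: number; // how long the breaker stays open (no vault calls)
}

class VaultReadBreaker {
  private cached: { value: string; fetchedAtMs: number } | null = null;
  private consecutiveFailures = 0;
  private openUntilMs = 0;
  private readonly read: () => Promise<string>;
  private readonly opts: BreakerOptions;
  private readonly now: () => number;

  constructor(read: () => Promise<string>, opts: BreakerOptions, now: () => number = Date.now) {
    this.read = read;
    this.opts = opts;
    this.now = now;
  }

  async get(): Promise<string> {
    const t = this.now();
    // Short-TTL cache: avoid a vault round-trip for every lookup.
    if (this.cached !== null && t - this.cached.fetchedAtMs < this.opts.ttlMs) {
      return this.cached.value;
    }
    // Breaker closed (or cooldown over): try the vault.
    if (t >= this.openUntilMs) {
      try {
        const value = await this.read();
        this.cached = { value, fetchedAtMs: t };
        this.consecutiveFailures = 0;
        return value;
      } catch {
        this.consecutiveFailures += 1;
        if (this.consecutiveFailures >= this.opts.failureThreshold) {
          this.openUntilMs = t + this.opts.cooldownMs; // open the breaker
        }
      }
    }
    // Vault failed or breaker open: fall back to a cached value if it is not too stale.
    if (this.cached !== null && t - this.cached.fetchedAtMs <= this.opts.maxStaleMs) {
      return this.cached.value;
    }
    throw new Error("secret unavailable: vault unreachable and no usable cached value");
  }
}
```

For dynamic secrets, keep `maxStaleMs` below the lease duration: a cached credential whose lease has expired has already been revoked, so serving it only delays the outage.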
See also
- `vault-pattern.md`: vault basics.
- `secrets-leak-postmortem-playbook.md`: when a leak happens.
- `audit-ledger-pattern.md`: access audit trail.
- `../quality/ci-cd-pipeline-pattern.md`: OIDC federation in CI.
- `rbac-pattern.md`: break-glass for operator access.