Often, secrets are simply credentials used to authenticate client applications and users so that they can access sensitive systems, services, and information. How do you operate these secrets effectively and at scale so that they remain secret?
Because secrets have to be distributed securely, secrets management solutions must account for and mitigate the risks to these secrets in transit, at rest, and in use.
Do not hold your breath: secrets management vendors will not handle all of that for you, simply because they cannot. Don’t get me wrong, these vendors do provide the foundation to manage secrets, but you remain responsible for keeping your secrets management solution secure.

Tip #1: For each service your workload uses, be well aware of the shared responsibility model defined by the provider and make sure you understand where the responsibility of the vendor ends and yours begins.

Generally speaking, cryptographic private and symmetric keys are managed separately via Key Management System (KMS) and Hardware Security Module (HSM) technologies, a topic that deserves its own post. These technologies have a lot in common and usually complement one another, as illustrated in Figure-1. Secrets management is a higher-level concept; don’t confuse it with KMS. Secrets management technologies include built-in KMS capabilities to support the cryptographic operations required to manage secrets.

Figure-1 – Secrets Management, KMS and HSM

The Darkness Before The Light

Just a few years back, enterprises suffered from a massive proliferation of passwords, passphrases, private/symmetric cryptographic keys and API keys, scattered all over the place. The business had to make sure these secrets were stored, distributed and rotated securely, with minimal or no impact on production environments. The ability to keep humans away from secrets was limited. Operating these processes was no picnic either, which led to shortcuts such as infrequent secret rotation, ‘standard’ well-known passwords, etc. To make the chaos less chaotic, wherever possible, we implemented single sign-on (LDAP, Kerberos, SAML federation, etc.), which helped reduce the volume of secrets we had to manage. But we still had more than a handful of secrets to handle. So we encrypted them using keys, which are themselves just more secrets to protect, and sometimes those keys were managed via an HSM, which was not always used for the right reasons. As if that were not bad enough, there was no standard, unified approach for operating these technologies, and automation was a luxury we could rarely afford. As you can imagine, this approach did not scale very well and was quite a nightmare for both operations and security personnel.

What Good Looks Like?

The ideal secrets management solution allows both humans and machines to securely access and use secrets. Furthermore, it limits the attack surface by making sure our secrets are either automatically rotated or automatically provisioned and revoked at relatively short intervals (emulating temporary credentials). Basically, the entire lifecycle of our secrets is fully or mostly automated, with minimal or no human intervention. Keeping humans away reduces the risk of secrets eventually being leaked.

Depending on your business, you may need your customers to authorize your system so it can access their SaaS accounts on their behalf. Sound familiar? For example, it could be an application that organizes users’ photos in Google Photos, or an application that reads users’ new Gmail messages aloud. For that to work, your system stores and uses secrets that allow it to access customers’ SaaS accounts on their behalf. Your customers trust you with their precious secrets. Do not disappoint them: keep these secrets safe, and treat them as being as important to you as they are to your customers. To manage customers’ secrets properly, we want to maintain tenant isolation and prevent cross-tenant access. We keep these secrets away even from our own system administrators. Also, as opposed to system secrets, customers’ secrets often need to be supported at a much higher scale. Moreover, despite our desire to always automate secret rotation with no human intervention, occasionally this is simply not up to us, as some service providers do not support it. In these cases we monitor the manual secret rotation process and alert our security personnel whenever the rotation policy is not met for a given secret.
From the moment a secret is created until the point it is deactivated, it must be secured all the way.

Tip #2: If you have a viable option to use temporary credentials to access a resource/service (e.g., AWS IAM Temporary Security Credentials), seriously consider giving it precedence over alternatives that involve the use of secrets (e.g., on AWS there are multiple alternatives you can use other than SSH keys).

Let’s go through some use cases that illustrate the three main phases: storing a secret, distributing it to its destination(s), and rotating it, automatically if possible.

Use Case #1: Incident-Response Playbook

Let’s assume your system makes use of a secrets management solution and you are the company’s SecOps engineer. At some point, your security monitoring system generates an alert indicating that the master credentials (a system secret) of a very sensitive database have been leaked. Even though the database resides in the company’s private network, company policy requires you to follow an incident-response playbook that addresses this exact incident. As you can imagine, the same sequence of steps is also quite useful for remediating a failed automatic secret rotation. The process is illustrated in Figure-2:

SecOps: signs in to the cloud account.
SecOps: triggers a secret rotation sequence for the compromised database credentials via the Secrets Manager.
Secrets Manager: generates a new random secret, rotates the database credentials either directly or via the identity provider it is configured to use (e.g., Microsoft Active Directory), tests the new credentials and then stores them in the Secrets Manager.
Secrets Manager: notifies the application that the database credentials changed.
Application: retrieves the new database credentials.
Application: re-initializes its database connections and starts using them.

CloudOps, who may apply certain database schema fixes, goes through a flow very similar to ‘Applications’ flow in order to gain database access.
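To make the playbook above a little more tangible, here is a minimal in-memory sketch of the rotate, test, store, notify and retrieve steps. Everything in it is hypothetical (`InMemorySecretsManager`, `FakeDatabase`, `Application` and the secret name `db/master` are illustration-only names, not a real product API); a real secrets manager would of course persist, encrypt and authorize every one of these calls.

```python
import secrets
from typing import Callable, Dict, List


class InMemorySecretsManager:
    """Toy stand-in for a secrets manager; not a real product API."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, name: str, callback: Callable[[str], None]) -> None:
        self._subscribers.setdefault(name, []).append(callback)

    def get(self, name: str) -> str:
        return self._values[name]

    def rotate(self, name: str, change_and_test: Callable[[str], bool]) -> None:
        new_value = secrets.token_urlsafe(32)   # generate a new random secret
        if not change_and_test(new_value):      # rotate on the target, then test it
            raise RuntimeError(f"rotation of {name!r} failed")
        self._values[name] = new_value          # store only after the test passed
        for notify in self._subscribers.get(name, []):
            notify(name)                        # tell applications it changed


class FakeDatabase:
    """Hypothetical database whose master password was leaked."""

    def __init__(self) -> None:
        self.master_password = "leaked-password"

    def change_and_test(self, new_password: str) -> bool:
        self.master_password = new_password
        # Stands in for a real test login with the new credentials.
        return self.master_password == new_password


class Application:
    """Re-reads the secret and re-initializes connections when notified."""

    def __init__(self, manager: InMemorySecretsManager, secret_name: str) -> None:
        self.manager = manager
        self.password = None
        self.reconnects = 0
        manager.subscribe(secret_name, self.on_secret_changed)

    def on_secret_changed(self, name: str) -> None:
        self.password = self.manager.get(name)  # retrieve the new credentials
        self.reconnects += 1                    # stands in for re-opening connections


db = FakeDatabase()
manager = InMemorySecretsManager()
app = Application(manager, "db/master")
manager.rotate("db/master", db.change_and_test)  # what SecOps triggers
```

Note the ordering: the new value is stored and announced only after it has been tested against the target, so applications never pick up credentials that do not work.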

Tip #3: While secure access to secrets reduces risk, the preferred approach is always to automate your way out of needing human access in the first place.

Figure-2 – Incident-Response Playbook – Compromised (Secret) Credentials

Tip #4: Leverage the alternating users rotation strategy and keep credentials for two users in one secret in order to support high-availability.
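A rough sketch of how the alternating users strategy could work is shown below. It assumes a single secret that holds the passwords of two users and a marker for the one applications should currently use; `AlternatingSecret`, `rotate_alternating` and the `set_password` callback are hypothetical names made up for this illustration.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class AlternatingSecret:
    """One secret holding credentials for two users; only one is active."""
    users: Dict[str, str]   # username -> password
    active: str             # the user applications should use right now

    def current(self) -> Tuple[str, str]:
        return self.active, self.users[self.active]


def rotate_alternating(secret: AlternatingSecret,
                       set_password: Callable[[str, str], None]) -> str:
    """Rotate the standby user's password, then make that user the active one."""
    # Pick the user that applications are NOT using at the moment.
    standby = next(user for user in secret.users if user != secret.active)
    new_password = secrets.token_urlsafe(32)
    set_password(standby, new_password)   # change it on the target system
    secret.users[standby] = new_password
    secret.active = standby               # new readers now get the standby user
    return standby
```

Because the previously active user keeps its password until the next rotation, applications that have not yet picked up the new secret can keep working in the meantime, which is where the high-availability benefit comes from.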

Tip #5: Your secrets management solution is incomplete if it is not connected to a SIEM or similar system that monitors it and alerts on access anomalies, non-compliant secrets (e.g., secrets that failed to rotate) and other threats. Since secrets are often credentials for other service providers, make sure those services are regularly audited and connected to your SIEM as well.

Use Case #2: Manually Operating Secret Rotation

Although a great many services already support API-based credential rotation, occasionally we come across services that do not (e.g., at the time of writing, AWS SSO Automatic Provisioning and its access tokens are one such example). Even though automatic rotation cannot be supported in these cases, we can still detect, as described in Figure-3, the secrets that must be rotated just before they fall out of compliance with security policies:

Secrets Manager: using its internal scheduler, generates an event notifying that a given secret is approaching its rotation time.
Secrets Manager: processes the event by sending an alert to SecOps.
SecOps: signs in to the cloud account.
SecOps: follows the manual sequence defined by the service provider to rotate the credentials.
SecOps: stores the new credentials in the Secrets Manager.
Secrets Manager: notifies the application that the service credentials changed.
Application: retrieves the new service credentials.
Application: re-establishes its service connections and starts using them.

Figure-3 – Monitoring & Manually Operating Secret Rotation (if you must…)

Tip #6: For improved performance and reliability, especially in highly-distributed systems, consider reducing the coupling with the secrets manager by distributing and caching secrets for short time intervals in a secure, ephemeral, local store, which allows applications to process secrets most efficiently.
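As a minimal sketch of such a cache, assume the application holds secrets only in process memory and re-fetches them once a short time-to-live has elapsed. `SecretCache` and its `fetch` callback are hypothetical names; in a real system `fetch` would wrap the call to your secrets manager.

```python
import time
from typing import Callable, Dict, Tuple


class SecretCache:
    """Keeps secrets in process memory only, for a short time-to-live."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical call to the secrets manager
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]                      # still fresh, no remote call
        value = self._fetch(name)                # expired or missing: re-fetch
        self._entries[name] = (value, now + self._ttl)
        return value

    def invalidate(self, name: str) -> None:
        """Drop a cached secret, e.g., after a rotation notification."""
        self._entries.pop(name, None)
```

The time-to-live bounds how long a rotated secret can linger in the cache, so it should be chosen with your rotation interval in mind.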

Tip #7: Secrets are best protected within the secrets manager. If a secret must be highly protected while it is being used, distributing it to a non-compliant store might not be an option. This is where technologies such as AWS Nitro Enclaves can really shine. They provide an isolated execution environment, which allows you to protect sensitive data while it is in use, even in untrusted environments.

Use Case #3: Authorize Access To 3rd Party Service Account

Figure-4 illustrates the use-case in which a customer of yours follows the steps to authorize your application to access their 3rd party SaaS account (e.g., Salesforce CRM) on their behalf:

Customer: signs in to your system.
Application: redirects the user to the IdP of the customer’s 3rd party SaaS provider, requesting the customer to authorize your system.
Customer: signs in to her SaaS account and submits her consent to authorize your system.
Customer: stores a secret that grants your application permissions to access the customer’s 3rd party SaaS account.
Secrets Manager: notifies the application that the secret of that customer changed.
Application: retrieves the customer’s new secret.
Application: connects to the customer’s 3rd party SaaS account and starts using it.

Secrets management that can scale in proportion to your number of customers plays an important role here.

Tip #8: Every service has its limits (e.g., AWS Secrets Manager quotas). Make sure your secrets management solution meets your requirements (e.g., latency, number of secrets, API call rate limits, secret size, etc.).

Figure-4 – Customer Authorizes Access To Her 3rd Party Service Account

Tip #9: To prevent cross-tenant access, make sure your SaaS architecture enforces tenant-aware authorization policies.
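One simple way to picture a tenant-aware policy is a store that namespaces every secret by tenant and refuses any call where the caller’s tenant differs from the tenant being accessed. The sketch below is only an illustration under that assumption; `Caller`, `TenantSecretStore` and the `tenants/<id>/<name>` path layout are hypothetical, and a real deployment would enforce the same rule in its identity and access management layer rather than in application code alone.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Caller:
    """Identity of the caller, as established by your authentication layer."""
    tenant_id: str


class CrossTenantAccessError(PermissionError):
    pass


class TenantSecretStore:
    """Namespaces secrets per tenant and checks the caller on every access."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    @staticmethod
    def _path(tenant_id: str, name: str) -> str:
        return f"tenants/{tenant_id}/{name}"

    @staticmethod
    def _authorize(caller: Caller, tenant_id: str) -> None:
        if caller.tenant_id != tenant_id:
            raise CrossTenantAccessError(
                f"tenant {caller.tenant_id!r} may not access tenant {tenant_id!r}")

    def put(self, caller: Caller, tenant_id: str, name: str, value: str) -> None:
        self._authorize(caller, tenant_id)
        self._data[self._path(tenant_id, name)] = value

    def get(self, caller: Caller, tenant_id: str, name: str) -> str:
        self._authorize(caller, tenant_id)
        return self._data[self._path(tenant_id, name)]
```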

Tip #10: Adhere to the principle of least privilege and enforce separation of duties with appropriate authorization for each interaction with your secrets management solution. A secrets management solution that is integrated with a strong identity foundation is a key prerequisite for that.

Use Case #4: Automated Secret Rotation

Automating secret rotation significantly reduces credentials leakage risk just by eliminating the need to run such sensitive security operations manually.

Tip #11: Be biased towards automating secret rotation; it is a key enabler for scaling out your security operations around secrets management.

Figure-5 illustrates the use-case in which a service provider is initialized with a secret just once during its deployment and then the automatic secret rotation kicks off immediately:

CI/CD: a deployment tool configures the service provider to use credentials it randomly generates on the fly. The deployment tool then uses the same credentials to initialize a secret’s value in the Secrets Manager.
Secrets Manager: if the secret has just been initialized, automated secret rotation is triggered almost immediately; otherwise, the secrets manager’s internal scheduler triggers automated secret rotation at fixed time intervals. In both cases, the secrets manager generates a new value for the secret.
Secrets Manager: rotates the database credentials and tests them.
Secrets Manager: notifies the application that the database credentials changed
Applications: retrieves the updated credentials of the service provider.
Applications: starts using the updated credentials to access the service provider.

Tip #12: The more frequently a secret is rotated, the shorter the window in which a leaked value is useful to a potential intruder.

Tip #13: You should keep critical manual procedures available for use when automated procedures fail - monitor your automated secret rotation process and trigger incident-response procedure whenever the automatic process falls short.
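A small sketch of what this could look like in code: wrap the automated rotation, retry a few times, and if it still fails, record the failed event and alert a human so the manual playbook can take over. The names `rotate_with_fallback`, `failed_events` and `alert`, and the default of three attempts, are assumptions made for the sake of the example.

```python
from typing import Callable, Dict, List


def rotate_with_fallback(secret_name: str,
                         rotate: Callable[[str], None],
                         failed_events: List[Dict[str, object]],
                         alert: Callable[[str], None],
                         attempts: int = 3) -> bool:
    """Try automated rotation; on repeated failure, record it and alert SecOps."""
    last_error = None
    for _ in range(attempts):
        try:
            rotate(secret_name)
            return True
        except Exception as exc:   # broad on purpose: any failure must surface
            last_error = exc
    # Keep the failed event for later investigation and hand over to a human.
    failed_events.append({
        "secret": secret_name,
        "error": repr(last_error),
        "attempts": attempts,
    })
    alert(f"Automatic rotation of {secret_name} failed after {attempts} attempts")
    return False
```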

Enough With The Mumbo Jumbo

There are many ways to put our theory into practice. Let’s walk through one implementation example that combines both operational excellence and of course, security.

In our imaginary business, we run a very successful online cookie store. An extremely naive functional view of our eCommerce system and the flow for buying cookies is illustrated in Figure-6.

Figure 6 – arealcookie.com online store – functional view

The Payment microservice redirects the user to pay via PayPal.
The Payment microservice receives a callback from PayPal confirming the payment.
The Payment microservice publishes an event confirming the order request.
The Messenger and Order microservices consume the published event in parallel. The Messenger microservice sends an email to the user confirming the order. The Order microservice takes care of fulfilling the order request.

Our application workload runs on AWS, in its managed Kubernetes cluster, AWS EKS. The messaging system the application uses is a managed version of RabbitMQ, AWS AmazonMQ. The secrets management solution we use allows our three microservices to securely access their unique credentials so they can authenticate and gain access to AWS AmazonMQ. Moreover, the secrets management solution takes care of automatically and securely rotating these credentials of AWS AmazonMQ, which makes our CISO extremely delighted that no human has to execute this delicate runbook manually and on a routine basis.
Table-1 provides the reasoning for the technology choices we made.

Table-1 – Secrets Management Solution – Technology Stack

	Technology	Reasoning
	AWS Secrets Manager	Features: extensible secret rotation, event-driven triggers, tagging, versioning, structured and binary secrets, fine-grained permissions, auditing, etc. Interfaces: user-friendly UI, CLI and API. Best of Suite: SaaS that is pre-integrated with AWS services (e.g., CloudWatch, EventBridge, CloudTrail, Config, IAM, KMS, etc.). High-Availability: 99.9% (including cross-region replication support). Audit & Security Monitoring: via integration with CloudWatch, CloudTrail, Config, Security Hub, etc. Compliance: HIPAA, PCI, ISO, etc.
	AWS KMS	Features: master key rotation, event-driven triggers, tagging, versioning, fine-grained permissions, auditing, symmetric and asymmetric keys, high standards for cryptography, native support for envelope encryption, protection of secrets in transit and at rest, etc. Interfaces: user-friendly UI, CLI and API. Best of Suite: SaaS that is pre-integrated with the majority of AWS services. High-Availability: 99.999% (including cross-region replication support). Durability: 99.999999999%. Scalability: automatically scales to meet demand (see AWS KMS quotas). Audit & Security Monitoring: via integration with CloudWatch, CloudTrail, Config, Security Hub, etc. Compliance: ISO, PCI-DSS, SOC, etc.
	Kubernetes Secrets	Features: makes secrets easily and natively accessible to authorized service accounts assigned to Kubernetes Pods. The secrets are kept in etcd, encrypted at rest by the AWS EKS KMS plugin.
	Kubernetes External Secrets	Features: allows using external secret management systems (as the source of truth) to securely add secrets in Kubernetes. Interfaces: extends the Kubernetes API by adding an `ExternalSecrets` object using a Custom Resource Definition and a controller that implements the behavior of the object itself. The conversion from `ExternalSecrets` is completely transparent to Pods, which can access Kubernetes Secrets normally. Integration: supports native integration with cloud providers’ identities, service accounts and IAM. Security: supports fine-grained access permissions. Multi-Cloud: supports AWS Systems Manager, Akeyless, Hashicorp Vault, Azure Key Vault, Google Secret Manager and Alibaba Cloud KMS Secret Manager. Possible Future Alternative: AWS Secrets and Configuration Provider, ASCP (and implementations for other cloud providers: GCP, Azure) for the Kubernetes Secrets Store CSI Driver. This is still alpha, but it is definitely something to consider once it becomes production ready.

Modern secrets management technologies (e.g., Akeyless, Hashicorp Vault) offer much more than just secrets management and may combine several disciplines, e.g., secrets, PAM (which allows authorized clients to get temporary credentials to the target systems the PAM function supports), KMS, PKI, etc. At the time of writing, AWS Secrets Manager still does not support the common functionality that allows an authorized identity to get unique, temporary credentials, on demand, to access various 3rd party services. For PKI and KMS, AWS offers complementary managed services (AWS KMS and AWS Certificate Manager Private Certificate Authority).
If you choose to go with a managed secrets management technology (SaaS), keep in mind that even though it greatly simplifies security operations, you are still fully responsible for certain things and you must be aware of your vendor’s Shared Responsibility Model.
If you choose to go with a self-managed secrets management technology, you have much more work to do to securely operate it in an effective manner.
In our examples, we embraced the best-of-suite strategy and chose AWS Secrets Manager, a SaaS secrets management technology that is pre-integrated with all the important AWS services.
Going with the cloud provider’s native secrets management may save you all kinds of wiring and integrations you would otherwise have to implement yourself. In addition, the implementation and operations are often quite consistent within the same cloud provider, e.g., on AWS: resource-based policies, IAM policies, KMS CMK, IaC, CLI, API, AWS Config, AWS Security Hub, CloudTrail, AWS EventBridge, etc.

Tip #14: On AWS, prefer a multi-account strategy to isolate workloads by systems and also by SDLC environment (e.g., sandbox, development, staging, production, etc.). An independent and isolated secrets management instance shall be used by each of these accounts preferably sharing no secret with the other accounts.

Now let’s see how it all fits together. Figure-7 illustrates this architecture in greater detail.

Figure-7 – arealcookie.com online store – technical view

Using an internal scheduler, AWS Secrets Manager invokes the Secret Rotation Lambda function.
The Secret Rotation Lambda function computes a new password and makes an API call to AWS Secrets Manager to save a new version of the secret in a Pending stage.
The Secret Rotation Lambda function calls the Amazon MQ API to set a new password for the application user.
The Secret Rotation Lambda function tests the new password by creating a new RabbitMQ connection to the AWS AmazonMQ broker.
The Secret Rotation Lambda function finishes the rotation flow by promoting the stage of the secret to Current.
The Secret Rotation Lambda function asynchronously invokes the Secret2EKS Lambda function to notify the EKS cluster about the updated secret.
1. [6-error] If the Secret Rotation Lambda function fails, a message with details about the failed event is published to an AWS SNS topic that serves as a Dead-Letter-Queue.
2. [7-error] An AWS SQS queue subscribed to the AWS SNS DLQ topic queues the message and keeps it for further error handling.
3. [8-error] SecOps gets notified about the error and executes a playbook to investigate and remediate the problem.
The Secret2EKS Lambda function applies the corresponding ExternalSecret objects to the AWS EKS Cluster API.
Kubernetes External Secrets calls the AWS Secrets Manager GetSecretValue API to retrieve the secret corresponding to the ExternalSecret object.
Using the secret retrieved from AWS Secrets Manager, Kubernetes External Secrets applies the Kubernetes Secret object corresponding to the ExternalSecret object to the AWS EKS Cluster API.
Using the AWS EKS Cluster API, our application running in the AWS EKS Cluster gets to use the Kubernetes Secret object.
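The core of the rotation function in the flow above (save a pending version, set it on the broker, test it, promote it) can be sketched without any AWS dependency. The code below is an in-memory illustration only: `VersionedSecret`, `FakeBroker` and `rotate` are hypothetical stand-ins, not the Secrets Manager or Amazon MQ APIs. The stage names mirror the `AWSPENDING`, `AWSCURRENT` and `AWSPREVIOUS` staging labels that AWS Secrets Manager uses.

```python
import secrets
import uuid
from typing import Dict


class VersionedSecret:
    """In-memory stand-in for a secret with staged versions."""

    def __init__(self, initial_value: str) -> None:
        version_id = uuid.uuid4().hex
        self.versions: Dict[str, str] = {version_id: initial_value}
        self.stages: Dict[str, str] = {"AWSCURRENT": version_id}

    def put_pending(self, value: str) -> None:
        version_id = uuid.uuid4().hex
        self.versions[version_id] = value
        self.stages["AWSPENDING"] = version_id

    def value(self, stage: str) -> str:
        return self.versions[self.stages[stage]]

    def promote_pending(self) -> None:
        self.stages["AWSPREVIOUS"] = self.stages["AWSCURRENT"]
        self.stages["AWSCURRENT"] = self.stages.pop("AWSPENDING")


class FakeBroker:
    """Hypothetical message broker holding one password per user."""

    def __init__(self, user: str, password: str) -> None:
        self.users = {user: password}

    def set_password(self, user: str, password: str) -> None:
        self.users[user] = password

    def can_connect(self, user: str, password: str) -> bool:
        return self.users.get(user) == password


def rotate(secret: VersionedSecret, broker: FakeBroker, user: str) -> None:
    # Create: compute a new password and save it as the pending version.
    secret.put_pending(secrets.token_urlsafe(32))
    # Set: apply the pending password to the broker user.
    broker.set_password(user, secret.value("AWSPENDING"))
    # Test: open a connection with the pending password.
    if not broker.can_connect(user, secret.value("AWSPENDING")):
        raise RuntimeError("pending password rejected; current version untouched")
    # Finish: promote the pending version to current.
    secret.promote_pending()
```

Keeping the new value in a pending stage until it has been tested means readers of the current version never see a password that was not verified against the broker.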

In this scenario we would follow tip #4 and implement user toggling to ensure stability during secret rotation.
Our application must do one of two things: either poll the secret from the volume to detect updates (e.g., the reload feature of Spring Cloud Kubernetes) and re-initialize its RabbitMQ connections, or lazily re-initialize its RabbitMQ connections once a connection fails due to invalid credentials.
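The lazy option can be sketched as a thin wrapper that drops its connection on an authentication failure, re-reads the secret and retries once. All names here (`LazyReconnectingClient`, `FakeConnection`, `AuthenticationError`, the `read_secret` and `connect` callbacks) are hypothetical; the fake connection simply simulates a broker that starts rejecting the old password after a rotation.

```python
from typing import Callable, Dict


class AuthenticationError(Exception):
    pass


class FakeConnection:
    """Simulated broker connection that fails once its password is stale."""

    def __init__(self, broker: Dict, password: str) -> None:
        if password != broker["password"]:
            raise AuthenticationError("invalid credentials")
        self._broker, self._password = broker, password

    def publish(self, message: str) -> None:
        if self._password != self._broker["password"]:
            raise AuthenticationError("credentials no longer valid")
        self._broker["messages"].append(message)


class LazyReconnectingClient:
    def __init__(self, read_secret: Callable[[], str],
                 connect: Callable[[str], FakeConnection]) -> None:
        self._read_secret = read_secret   # e.g., reads the mounted secret
        self._connect = connect
        self._conn = None
        self.connects = 0

    def _connection(self) -> FakeConnection:
        if self._conn is None:
            self._conn = self._connect(self._read_secret())
            self.connects += 1
        return self._conn

    def publish(self, message: str) -> None:
        try:
            self._connection().publish(message)
        except AuthenticationError:
            self._conn = None                     # drop the stale connection
            self._connection().publish(message)   # re-read secret, retry once


broker = {"password": "old", "messages": []}
volume = {"password": "old"}                      # stands in for the secret volume
client = LazyReconnectingClient(lambda: volume["password"],
                                lambda pw: FakeConnection(broker, pw))
client.publish("order-1")
broker["password"] = volume["password"] = "new"   # a rotation happens
client.publish("order-2")                         # reconnects transparently
```

If the retry fails as well, the error propagates to the caller, which is usually what you want: it means the secret visible to the application is itself out of date.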

Tip #15: Once you know what you need to protect, you can begin developing secrets management strategies. However, before you spend a dollar of your budget or an hour of your time implementing a secrets management solution to reduce risk, be sure to consider which risk you are addressing, how high its priority is, and whether you are approaching it in the most cost-effective way.

Wrapping Up

Making decisions around secrets management technology is never easy. It requires trading off one concern against another: cost, reliability, operational excellence and, of course, security. In this post we chose to focus, more than anything, on how to distribute secrets safely to applications and on how to support short secret rotation intervals. This is because no matter how secure your secrets management technology is, once secrets leave its secure boundaries there is always a risk that they will be compromised.

I hope you find the use cases and tips presented here valuable.
I tried to focus on those secrets management areas you should deeply care about even if you are already paying for the best secrets management technology out there.