Considering Secrets Management

Posted on July 16, 2016

We have seen an explosion of solutions in the secrets management space over the last two years. Each has its particular strengths, and competition in any space is a healthy thing. Much like Docker before it, these solutions have a simple and straightforward utility that anyone can appreciate. On closer consideration, however, we find a much more profound architectural paradigm hiding beneath the surface: obvious to some users in situations that demand it, but much more difficult to grok for those in environments with less pressing need. It is toward this paradigm shift that I'd like to expend some consideration.

What is Secrets Management?

Secrets Management, on the surface, is a way of centralizing sensitive information, which is usually the means of accessing yet more sensitive data, such as the username and password for a database containing customer data. In the past, these types of secrets were manually placed on servers by an administrator during installation. When Configuration Management appeared, these secrets were placed in Chef, Puppet, or CFEngine. Today, clouds like AWS have strict access policies (IAM in AWS) which can be inherited by resources (such as EC2 instances) to provide access to databases (like DynamoDB) or S3 buckets which contain the secrets. That works well in clouds with this type of support, but it isn't portable. A more portable solution has been to share the secrets from a database or web server of some type, protected by SSL/TLS and/or some other type of key… but the inevitable problem is: how do you get the key onto the system? The answer takes us back to the beginning of this cycle: an administrator puts it there during installation, or puts it in Config Management… and around we go.

So let's step way back and ask ourselves a question:

Why do we need secrets management?

There are several problems we face today which make this more pressing than in the past. The first is that we run more "servers" (VMs, instances, containers, etc.) now than we did before. Not only more, but with shorter lifespans; in some cases they exist for only minutes. This means that manually placing secrets isn't practical, and in some cases isn't even possible. That suggests we should "bake" the secrets into our images or go with configuration management, assuming the latter is even possible (which for Docker containers it isn't).

The second is that we increasingly desire for our applications, scripts, tools, and config management to be publicly accessible. In some cases we're outright open-sourcing these things and making them available on GitHub for the world. In most cases we're moving these assets around so much (between CI solutions, developer laptops, one or more clouds, etc.) that reliably controlling them is untenable, which makes the only viable option to ensure these assets do not themselves contain the secrets.

The third is that we should be rotating these secrets from time to time, so that in the event of exposure (real or imagined) of a secret we limit the damage and can quickly change it to something new. This has always been a problem, particularly for SSL/TLS certificates, which would have expiration periods of three or more years. With regard to passwords, it would take so long to change every client's password to match the database that you'd be forced to take a maintenance window. The result is that you would simply avoid changing passwords and rotating keys because of the anger it might cause in the rest of the organization. You'd only do it when you really had to, which was normally too late, and it was extremely difficult because you did it so rarely and then played the "what did we miss?" game.

The fourth is that secrets are usually known to humans. Fear of losing access to your own resources is a bigger threat than those secrets falling into the wrong hands, so there are copies of them all over the place in insecure and un-audited locations. The best way to gain access to an environment is to first gain access to an administrator's workstation or laptop; it likely holds all the information and keys you need, and it's likely less protected than the production environment.

The fifth follows on from the fourth: we need a way to audit access to our secrets. In many environments, when a senior administrator leaves the company you must change all passwords everywhere, because the admin likely knew them. You don't actually know, but because it's possible, best practice says to simply change them all. That then runs us back into the third problem. We don't know which secrets are where, or with whom, and we don't have any reliable way to find out.

The sixth is that generating passwords and keys is rarely standardized. Even if you do have a standard, you generally can't ensure it's being followed. Are all the keys 2048 bits in strength? Are the passwords longer than 20 characters with lots of special characters? At best you create a policy, try your best to ensure compliance with it, and hope no one is lying or forgetful. That's a lot of ifs.
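The way out of the "lot of ifs" is to make the policy executable rather than aspirational. As a minimal sketch (the length and character-class rules here are a made-up example policy, not a recommendation), a generator can simply refuse to emit anything non-compliant:

```python
import secrets
import string

SPECIALS = "!@#$%^&*"

def generate_password(length=24):
    """Generate a password satisfying a hypothetical policy:
    `length` chars, with lowercase, uppercase, digit, and special chars."""
    alphabet = string.ascii_letters + string.digits + SPECIALS
    while True:
        pw = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until every required character class is present.
        if (any(c.islower() for c in pw)
                and any(c.isupper() for c in pw)
                and any(c.isdigit() for c in pw)
                and any(c in SPECIALS for c in pw)):
            return pw
```

Centralize generation like this in one trusted service and the "is everyone actually following the standard?" question becomes moot.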

… and there are more, but we'll stop there.

So the next question is to ask:

What should we expect of Secrets Management?

It needs to be accessible by both humans and machines. Ideally we store all our secrets in a small and controlled number of places to facilitate rotations and changes. It needs to allow the creation of keys and passwords according to a policy, so we always adhere to our standards. It needs to be auditable, so we can determine exposure both to humans and machines. It needs to work with our configuration management but not be part of it, both so that our config management and scripts don’t need to be tightly controlled but also so that we can manage secrets in places where config management is difficult, impossible, or inadvisable.

Most of all, we have to trust our secrets management more than anything else in our infrastructure. Lots of things in our infrastructure can break, but if our gateway to accessing our infrastructure breaks, we're just totally f….ed. It's got to be bulletproof.

A critical consideration for any type of secrets management is deciding which parts of your infrastructure should have access to it, and to what degree. We always want to follow the Least-Access Principle, limiting what anything can see via some policy. But even so, not all parts of infrastructure have the same risk profile. A web server receiving requests from external users carries more risk than a database server a layer deeper; but even the database has high risk, because the web server is generally authorized to talk to it. That in turn makes the database higher risk than an LDAP server the web server doesn't know about; yet if the web server's OS is authorized to contact LDAP, the LDAP server is higher risk than a system which can't be discovered at all by breaching the edge servers. And so there is a cascade of services. This is the principle behind relying on AWS's security services: they operate (effectively) out-of-band, beneath the VMs, and are thus significantly harder to exploit.
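To make "limiting what anything can see via some policy" concrete: in Vault (which we'll get to below), access is granted by path-based policies, and anything not explicitly granted is denied. A minimal sketch, with hypothetical paths and a hypothetical policy name:

```hcl
# Hypothetical "webapp" policy: the web tier may read its own
# secrets and nothing else; all other paths are implicitly denied.
path "secret/webapp/*" {
  policy = "read"
}
```

A breached web server holding only this policy can't even enumerate the secrets belonging to deeper layers, which is exactly the cascade described above.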

What is security?

It is said, "There is no such thing as security." By which it is meant, "There is no such thing as impenetrable security." Which is to say, "If an attacker really wants to get in, they will." The point being that given unlimited time, resources, and willpower, any security barrier can be overcome, even if that means paying off an employee or socially engineering a support staff member.

This is a super important idea to grasp for understanding security! The reciprocal principle derived from the above reasoning is that "security" is a series of barriers which require greater time, resources, and willpower to overcome than are reasonably available to a potential attacker. Consider your house! You buy a nice deadbolt for the door, but this assumes the attacker is trying to be covert and won't simply smash in a window. You may put up a fence, even one with barbed wire, but this assumes that someone won't just use bolt-cutters to cut a hole in it. And regardless, we put these barriers in place to slow down and deter the casual attacker who is unwilling to go to these extremes. If an attacker really wants to rob you they'll cut down the fence, break through a window, hold you at gunpoint and take the loot, but very few attackers are so bold. On the internet, where anonymity is easier to procure, the boldness is much higher by default, but the principles remain the same. At the end of the day, the best home defense is an alarm which triggers an armed response (by an owner or the police) to directly confront the situation, but we know even then that resources, time, and willpower are still a factor… but I won't be that grim.

Let's bring it back to Secrets Management specifically. The point is that there is no completely foolproof method of providing automated access to secrets; there will always be a "What if…?" scenario you can't protect against. Therefore we must do a risk analysis of our infrastructure to determine what series of barriers will dissuade an attacker through lack of time, resources, and willpower. A firewall reduces the attack surface, limiting the resources available for exploit. A scary "UNAUTHORIZED ACCESS WILL BE PROSECUTED" banner doesn't stop anything, but it does reduce an attacker's willingness to proceed. A multi-layered key system increases the time required for exploit; it may not stop an attack, but it can slow one down, increasing the probability of detection and bringing humans in to combat the threat.

This is why I hate it when people say, "There is no such thing as security." That statement is unfruitful. Rather, let's focus on composing a variety of protections together to form a strong barrier, the most important of which is detection: prevent the majority of attackers, detect the others in the shortest amount of time possible, and have prepared procedures for response (sometimes called "break-glass" procedures) to those threats.

When it comes to accessing secrets from a machine, we have a unique problem:

What about the chicken and egg problem for unattended access?

This is the problem that has kept most of us, I think, from really adopting secrets management in the past. Sure, you can centralize your secrets on a web server or in a database, but you still have to give your machines some secret so they can access it. And this is the worst part: the secret you're now spreading around your infrastructure is the most valuable secret you have, because it's the key that unlocks all the other keys.

Many available solutions don't address this problem. You have to distribute a set of TLS certs and keys, or a symmetric key, to all your servers… but how does that key get to the server? You end up going back through the list of problems that started us down this journey in the first place! The only advantage then is that you distribute one secret instead of many. This is the secrets management regression loop of despair. The best answer then regresses to using cloud-specific tools like AWS IAM, which means your non-cloud applications are just out of luck. Round and round we go.

But now there is a solution that is finally exciting:

How Hashicorp’s Vault changes the game!

Of our problem set, several solutions are available for centrally storing and auditing access to secrets. In fact, most databases can do that. We could even construct access policies in the database to limit access to secrets. That's not terribly exciting or new. But what about the rest of our problems?

First, Vault immediately beats any database by not only storing secrets but actually producing them. This solves several of our problems, such as ensuring no human ever sees them or has reason to keep a local copy. It also means that rotating secrets is easier, because we don't need to rely on a variety of generators; we have just one place to create, store, and share secrets.
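To illustrate what "producing" means in practice: a dynamic secret is just a read against a backend's `creds` endpoint over Vault's HTTP API, and each read mints a fresh, short-lived credential pair rather than returning a stored one. A sketch (the address, token, and role name are hypothetical; the `postgresql` backend is one real example):

```python
def dynamic_creds_request(vault_addr, token, backend, role):
    """Build the URL and headers for reading a dynamic credential.

    Each GET against <backend>/creds/<role> asks Vault to *generate*
    a brand-new username/password with a lease, not to look one up.
    """
    url = "{}/v1/{}/creds/{}".format(vault_addr, backend, role)
    headers = {"X-Vault-Token": token}
    return url, headers

url, headers = dynamic_creds_request(
    "https://vault.example.com:8200", "hypothetical-token",
    "postgresql", "readonly")
```

Because every consumer gets its own credentials on demand, revoking one lease affects one client, not the whole fleet.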

Secondly, Vault offers a uniquely large (and extensible) variety of methods for accessing secrets, by both humans and machines. Humans can use systems they are already familiar with, such as LDAP; newer methods, such as GitHub access tokens; Vault-specific mechanisms like tokens or a Vault-managed username and password; and, in a recent update, even Multi-Factor Authentication (MFA) via a Duo integration.
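The uniformity is worth noticing: every method, human or machine, resolves to a login endpoint under `auth/` in Vault's HTTP API. A sketch of the path layout for the methods mentioned (userpass and LDAP put the username in the path; GitHub sends its access token in the request body):

```python
def login_path(method, username=None):
    """Return the Vault HTTP API login path for a given auth method."""
    if method in ("userpass", "ldap"):
        if username is None:
            raise ValueError("{} login requires a username".format(method))
        return "/v1/auth/{}/login/{}".format(method, username)
    if method == "github":
        # The GitHub token travels in the POST body, not the path.
        return "/v1/auth/github/login"
    raise ValueError("unknown auth method: {}".format(method))
```

Once authenticated by any of these routes, a client holds an ordinary Vault token, so the rest of the machinery doesn't care how you got in.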

But most important of all, it provides some new and useful machine access capabilities that are cloud agnostic. Tokens and TLS can be used, but they run us back into that chicken-and-egg problem. By composing a variety of different authentication types, we can build solutions which are finally viable. In my next blog we'll consider several of them in turn, to hopefully find one or more viable options for your deployments.