First of all, a KMS (Key Management Server) manages encryption keys not only for virtual machines but also for any other IT infrastructure objects. So your organization may already have a KMS that is not connected to VMware vSphere. You can use any KMS server, although VMware has officially certified only a limited list of systems.
If you have a different KMS provider, don't worry: it will most likely work as long as it supports the KMIP 1.1 key management protocol.
Start with the availability of the KMS server. It is the most important server in your infrastructure, more important than DNS or a domain controller. Without DNS nothing works either, just as without KMS, but if the data in the KMS key store is lost, it is a disaster for the organization: there is absolutely no way to recover the encrypted data.
KMS thus becomes a critical point not only of the infrastructure but of the business as a whole, and backing it up and keeping it available are the top priorities. Let's look at several basic KMS concepts:
- KMIP (Key Management Interoperability Protocol) is the standard that KMIP clients use to communicate with KMS servers. Every year it goes through serious interoperability testing at the RSA Conference.
- DEK (Data Encryption Key). This is the key that the ESXi host generates when it needs to encrypt a virtual machine. The same key is used to decrypt the VM data; it is stored in encrypted form in the VMX file / VM Advanced settings.
- KEK (Key Encryption Key). This is the key that encrypts the DEK. The KEK ID is also stored in the VMX file / VM Advanced settings.
- Key Manager Cluster / Alias is a collection of FQDNs/IP addresses of all KMS servers that replicate to each other. It is also stored with the encrypted VM in the VMX file / VM Advanced settings, so that when the VM is powered on, vCenter Server can contact the right KMS cluster and get the KEK that unlocks the VM.
- VM and vSAN encryption. From the point of view of how the encryption mechanism is implemented, these are in fact the same: both use the same libraries, configurations, and interfaces.
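To make the DEK/KEK relationship concrete, here is a minimal sketch of envelope encryption in Python. Assumptions, loudly: the `wrap`/`unwrap` functions use HMAC-SHA256 in counter mode as a toy stand-in for a real key-wrapping cipher (it is not the algorithm vSphere uses), and `VmxEncryptionSettings`, `kms_clusters`, `encrypt_vm`, and `power_on` are hypothetical names that only model what the list above says is stored where.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Dict, Tuple


def _subkey(kek: bytes, label: bytes) -> bytes:
    # Derive separate encryption and MAC keys from the KEK.
    return hmac.new(kek, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode: a toy stream cipher, NOT what vSphere uses.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def wrap(kek: bytes, dek: bytes) -> bytes:
    """Encrypt (wrap) a DEK under a KEK: returns nonce || ciphertext || tag."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(kek, b"enc"), nonce, len(dek))
    ct = bytes(a ^ b for a, b in zip(dek, stream))
    tag = hmac.new(_subkey(kek, b"mac"), nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unwrap(kek: bytes, blob: bytes) -> bytes:
    """Recover the DEK; raises ValueError on a wrong KEK or a tampered blob."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(kek, b"mac"), nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong KEK or corrupted wrapped DEK")
    stream = _keystream(_subkey(kek, b"enc"), nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, stream))


@dataclass
class VmxEncryptionSettings:
    # The three items kept with the VM (VMX / VM Advanced settings).
    kms_cluster_alias: str
    kek_id: str
    wrapped_dek: bytes


# Hypothetical in-memory "KMS": cluster alias -> {KEK ID -> KEK}.
kms_clusters: Dict[str, Dict[str, bytes]] = {}


def encrypt_vm(alias: str) -> Tuple[VmxEncryptionSettings, bytes]:
    kek_id = secrets.token_hex(8)
    kek = secrets.token_bytes(32)
    kms_clusters.setdefault(alias, {})[kek_id] = kek  # the KEK stays in the KMS
    dek = secrets.token_bytes(32)  # generated by the host, per the description above
    return VmxEncryptionSettings(alias, kek_id, wrap(kek, dek)), dek


def power_on(settings: VmxEncryptionSettings) -> bytes:
    # Use the alias to find the KMS cluster, fetch the KEK by ID, unwrap the DEK.
    kek = kms_clusters[settings.kms_cluster_alias][settings.kek_id]
    return unwrap(kek, settings.wrapped_dek)
```

The VM carries everything except the KEK. If the KMS key store is lost, `power_on` has nothing to unwrap with, and the DEK (and with it the VM data) is gone for good, which is exactly why KMS availability and backup come first.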
A KMS cluster is simply a set of KMS servers that replicate their key stores. They do not take over for one another the way, for example, VMware HA does. For example, with servers KMS-A, KMS-B, and KMS-C, a key added for a new VM appears instantly in all of their stores.
If KMS-A is unavailable as a server, vCenter Server requests the keys from KMS-B. If KMS-A is unavailable as a service (that is, the server is running but the KMS service is not responding), vCenter waits 60 seconds for the service to recover and only then switches to KMS-B. This behavior cannot be changed.
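The two failover cases can be modeled in a few lines. This is a simplified sketch, not a vCenter API: `KmsState` and `pick_kms` are hypothetical names, the model assumes a hung service does not recover during the 60-second wait, and it ignores network timeouts when a host is unreachable.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple

SERVICE_WAIT_SECONDS = 60  # fixed in vCenter according to the text; not configurable


class KmsState(Enum):
    OK = "ok"
    HOST_DOWN = "server unreachable"
    SERVICE_DOWN = "server up, KMS service not responding"


def pick_kms(order: List[str], state: Dict[str, KmsState]) -> Tuple[Optional[str], int]:
    """Walk KMS servers in order; return (server used or None, seconds spent waiting).

    Assumes each server's state does not change while vCenter walks the list.
    """
    waited = 0
    for server in order:
        current = state[server]
        if current is KmsState.OK:
            return server, waited
        if current is KmsState.SERVICE_DOWN:
            # The host answers but the service does not: wait, then move on.
            waited += SERVICE_WAIT_SECONDS
        # HOST_DOWN: go straight to the next server (network timeouts ignored here).
    return None, waited
```

With KMS-A's host down, `pick_kms` returns `("KMS-B", 0)`; with KMS-A's host up but its service hung, it returns `("KMS-B", 60)`. The order of the list therefore directly determines how long a VM power-on can stall.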
The best practices for using KMS are:
- If possible, delegate responsibility for supporting the KMS infrastructure to a separate team (for example, a Security Team) or a dedicated person.
- Do not run KMS servers as virtual machines on the same encrypted cluster (for example, an encrypted vSAN cluster). Otherwise the box locks with the key inside it: if the cluster goes down, the keys needed to unlock it are stuck in it. Keep at least one KMS server outside such a failure domain.
- The same applies to vCenter Server and PSC (Platform Services Controller) servers: to be able to unlock other VMs, they must remain accessible in case of failure.
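The last two recommendations boil down to a placement check you can run against your inventory. A minimal sketch, assuming you can export where each KMS server and management VM runs into a simple name-to-location mapping; the function name, the `placement` format, and the cluster labels are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def failure_domain_problems(
    placement: Dict[str, str],
    kms_servers: Iterable[str],
    management_vms: Iterable[str],
    encrypted_clusters: Set[str],
) -> List[str]:
    """List layout problems; an empty list means the check passes.

    placement maps each KMS server / management VM to where it runs:
    a cluster name, or any other label (e.g. "physical") outside clusters.
    """
    kms_servers = list(kms_servers)
    management_vms = list(management_vms)
    problems = []
    for cluster in sorted(encrypted_clusters):
        # At least one KMS server must live outside each encrypted cluster.
        if all(placement[k] == cluster for k in kms_servers):
            problems.append(f"{cluster}: every KMS server runs inside this encrypted cluster")
        # vCenter / PSC must also stay reachable if this cluster goes down.
        for vm in management_vms:
            if placement[vm] == cluster:
                problems.append(f"{cluster}: {vm} runs inside this encrypted cluster")
    return problems
```

For example, with KMS-A and KMS-B on an encrypted `vsan-prod` cluster, KMS-C on a physical host, and vCenter/PSC on a separate management cluster, the check returns an empty list.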
Now a word about disaster recovery of KMS servers. A typical cluster within one site consists of three KMS servers replicating to each other.
But if an incident at this site affects all three servers, the business may come to an end. Ideally, therefore, use a configuration that spans two sites.
As a rule, small and medium-sized businesses do not have a backup site. In this case, a public cloud such as Amazon AWS or Microsoft Azure can help: maintaining a single VM there does not cost much.
If you use a multi-site configuration, be sure to plan the order in which vCenter at each site accesses the KMS servers. If KMS-A, KMS-B, and KMS-C are at the first site and KMS-D, KMS-E, and KMS-F at the second, the order at the first site should be A, B, C, D, E, F, and at the second D, E, F, A, B, C.
If you use the first site's order at the second site as well, its vCenter Server will go to the other site's servers first, which may be unavailable, and will start using the local KMS only after working through all of them.
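The ordering rule generalizes to any number of sites: each site lists its own servers first, then everyone else's. A minimal sketch; the site labels and helper names are hypothetical, and the code only computes the order, it does not configure vCenter.

```python
from typing import Dict, List


def site_order(sites: Dict[str, List[str]], local_site: str) -> List[str]:
    """KMS access order for one site: local servers first, then the other sites'."""
    order = list(sites[local_site])
    for site, servers in sites.items():
        if site != local_site:
            order.extend(servers)
    return order


def remote_tries_before_local(order: List[str], local_servers: List[str]) -> int:
    """How many remote servers come before the first local one in `order`."""
    for position, server in enumerate(order):
        if server in local_servers:
            return position
    return len(order)


sites = {
    "site1": ["KMS-A", "KMS-B", "KMS-C"],
    "site2": ["KMS-D", "KMS-E", "KMS-F"],
}
```

Reusing site 1's order at site 2 puts three remote servers in front of the local ones. If those remote hosts are up but their KMS services hang, the 60-second rule described earlier means up to 3 × 60 = 180 seconds before the local KMS is even tried.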