The Mystery of the KBS Identity

One simple question has confounded countless developers working on Confidential Containers: how do we know we are connecting to the correct KBS? For context, KBS is short for Key Broker Service, the trusted entity that conditionally grants access to client secrets. The KBS could also be described as the relying party. Inside the guest, there is a Key Broker Client (KBC) built into the Attestation Agent (AA). The KBC talks to the KBS to get container decryption keys, among other things.

The connection between the KBC and the KBS is secured with public key cryptography. The KBC generates a random keypair and sends the public key to the KBS when requesting confidential resources. Since the KBC has the lifespan of one VM, it makes sense for it to have an ephemeral keypair. The hash of the public key is included in the hardware evidence, which is also sent to the KBS. With this evidence, the KBS (with the help of an Attestation Service) can verify that the public key it receives from the KBC was generated inside a real TEE with a certain initial TCB. This is precisely what the KBS needs to validate before releasing client secrets to the KBC.
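The binding between the public key and the hardware evidence can be sketched in a few lines. This is a minimal illustration, not the actual KBC or KBS code: the function names are hypothetical, SHA-384 is an assumed choice of hash, random bytes stand in for a real public key, and verifying the hardware signature over the evidence is left out entirely.

```python
import hashlib
import hmac
import secrets


def make_report_data(public_key: bytes) -> bytes:
    """Digest that the guest asks the hardware to embed in its evidence."""
    return hashlib.sha384(public_key).digest()


def key_is_bound_to_evidence(public_key: bytes, report_data: bytes) -> bool:
    """KBS-side check: does the key we received match the attested digest?"""
    expected = hashlib.sha384(public_key).digest()
    return hmac.compare_digest(expected, report_data)


# Guest side: stand-in for an ephemeral public key (random bytes, not a real key).
kbc_public_key = secrets.token_bytes(32)
report_data = make_report_data(kbc_public_key)

# KBS side: the key that came with the evidence matches, a substituted key does not.
substituted_key = secrets.token_bytes(32)
print(key_is_bound_to_evidence(kbc_public_key, report_data))
print(key_is_bound_to_evidence(substituted_key, report_data))
```

A key swapped in transit fails the check because its digest no longer matches what the hardware signed.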

While it's no problem to send the public key of the KBC to the KBS along with the hardware evidence, it's not as clear how we should get the public key of the KBS to the KBC. Some KBCs sidestep this question by setting up the secure channel via RSA, but even if RSA does not require the public key of the KBS to be known to the KBC, it might still seem like we should verify that we are talking to the right party.

Why don't we measure the key?

The KBS is a long-running process with a fixed public key known to the guest owner. Since verifying the KBS public key would be part of the setup of the secure channel between the KBC and the KBS, we can't use this secure channel to provision the key. The public key is not confidential, so instead of secret injection, we could provide the key as part of the measured boot state of the guest. For SEV-SNP this could be done by putting the public key or a hash of the public key in the host data field. There is a corresponding field for TDX, and we could even do this with SEV by injecting the key into the firmware binary. Unfortunately, there are two issues with these approaches.

First, Confidential Containers is designed to use a two-stage measurement system. The first stage is the measurement of the initial TCB by the confidential hardware. The second stage is the measurement/decryption of the workload container images. These stages are decoupled so that any container can run in the same confidential environment without adjustments to the initial TCB. The host data is a first-class entity in the SNP Attestation Report, so changes to it do not affect other parts of the measurement. Even so, continuously updating the host data is in tension with the goal of having a generic guest measurement and a relatively simple verification process. For platforms like SEV that do not have a host data field, measuring the KBS public key would complicate verification of the attestation evidence even more.

Even if the public key of the KBS were in the host data field, however, it's not clear that this would provide an additional security guarantee. The host data is validated by the KBS. Let's imagine that a malicious CSP tampers with the network to connect the KBC to a malicious KBS. The CSP could also change the host data field to point to the public key of that KBS. Since the KBS is malicious it can validate the hardware evidence any way it chooses and establish a secure channel with the KBC. The presence of the KBS public key in the host data field does absolutely nothing to guarantee that we are talking to the correct KBS. The KBS is essentially being asked to validate itself, something that a malicious KBS can do.
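The self-validation problem can be shown with a toy model. Everything here is hypothetical: the `Kbs` class, the key strings, and the use of SHA-256 are only there to illustrate the argument, and no real attestation takes place.

```python
import hashlib


def digest(key: bytes) -> bytes:
    """Hash of a KBS public key, as it might be placed in the host data field."""
    return hashlib.sha256(key).digest()


class Kbs:
    """Toy KBS that checks the host data against its own public key."""

    def __init__(self, public_key: bytes, honest: bool = True):
        self.public_key = public_key
        self.honest = honest

    def accepts(self, host_data: bytes) -> bool:
        if not self.honest:
            # A malicious KBS can approve whatever evidence it is shown.
            return True
        return host_data == digest(self.public_key)


owner_kbs = Kbs(b"owner-kbs-public-key")
rogue_kbs = Kbs(b"rogue-kbs-public-key", honest=False)

# The CSP controls the network and also the host data the guest is launched with.
host_data_set_by_csp = digest(rogue_kbs.public_key)

print(rogue_kbs.accepts(host_data_set_by_csp))  # the rogue KBS validates itself
print(owner_kbs.accepts(host_data_set_by_csp))  # only the owner's KBS would object
```

The only party that would reject the tampered host data is the one the guest has been steered away from.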

Fortunately, there is a much simpler way to know that we've connected to the correct KBS. Only the correct KBS will have our secrets. If we can run an encrypted container image, then we must have connected to the KBS with the keys to decrypt this image. More specifically, the execution of a confidential workload should be gated on the receipt of a secret. This could mean that the container image itself is encrypted and contains some identifying information, such as the credentials for a database. Since workloads can make requests to the Attestation Agent, we could also use a signed container image that requests secrets from the AA.
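Gating a workload on a secret might look roughly like the sketch below. The cipher is a toy (a SHA-256 keystream plus an HMAC tag, with no nonce) chosen only so the example runs with the standard library; it is not how Confidential Containers encrypts images, and `seal`, `open_sealed`, and `start_workload` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream. Illustration only, not for real use."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt the payload and prepend a 32-byte authentication tag."""
    body = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, len(plaintext))))
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def open_sealed(key: bytes, sealed: bytes) -> bytes:
    """Return the plaintext, or raise if the key is not the one used to seal."""
    tag, body = sealed[:32], sealed[32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key: this KBS does not hold our secrets")
    return bytes(c ^ k for c, k in zip(body, _keystream(key, len(body))))


def start_workload(sealed_image: bytes, key_from_kbs: bytes) -> bytes:
    """The workload runs only if the key released by the KBS opens the image."""
    return open_sealed(key_from_kbs, sealed_image)


image_key = secrets.token_bytes(32)  # held only by the guest owner's KBS
sealed_image = seal(image_key, b"hypothetical database credentials")

print(start_workload(sealed_image, image_key))
try:
    start_workload(sealed_image, secrets.token_bytes(32))  # key from some other KBS
except ValueError as err:
    print(err)
```

A KBS that does not hold `image_key` can still complete a handshake, but nothing it returns will open the image, so the workload never starts.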

Unfortunately, not all resources that the KBS provides are confidential. The KBS can also be used to provision the policy and key information for validating image signatures. Image signature validation requires public keys. Since these aren't secret, they don't confirm the identity of the KBS. A malicious KBS could provide policies and keys to validate just about any image. As a result, workloads cannot rely on signature validation alone. Doing so would break the edict mentioned above: workloads must be gated on the receipt of a secret. We can use signatures in combination with secrets, as many pods likely will.

Another Perspective

The above is the response that I have typically given to people asking about how the KBS public key is provisioned. Recently, however, I have begun to look at the issue from a different perspective that I think is much more intuitive. It turns out that even the simplest questions posed here rely on some assumptions that lead to confusion. Let's start from the basics.

Imagine that you have just inherited the Klopman Diamond. You want to keep it in a safe place so you visit your local bank. You decide to rent out a safe deposit box. Before depositing your diamond, you inspect the box to make sure that it is sturdy and that it is empty. This is akin to an attestation. You are validating the isolation mechanism and initial TCB of the box. Once you are satisfied, you use the key to lock up the box with the diamond inside. There is only one key. This is your secure connection to the box.

While the bank probably asked you for some money (much like a CSP), at no point did the safe deposit box try to inspect you. The safe deposit box is an inanimate container. It does not care who it is being rented by. The more important thing is that the renter is satisfied with the guarantees of the box. This maps directly onto Confidential Containers and perhaps confidential computing more generally. In the previous section we were conceptualizing the flow from the perspective of the KBC. Instead, we should think about it from the perspective of the KBS, because the KBS operates on behalf of the client. In short, we don't need to validate the identity of the KBS because we are the KBS.

In Confidential Computing we usually talk about the hardware as the root of trust. The hardware is the root of trust of an enclave, but in Confidential Containers, it's really the KBS that is the root of trust of a workload. Through attestation this trust is extended to the enclave. It's tempting to think about this process the other way around, but this causes confusion. If we start from the KBS, most of the questions evaporate, including the question that started everything off.

Let’s say that the CSP spins up a VM for us, but then manipulates the network such that the KBC connects to a KBS that does not belong to us. This is no different from a bank renting out a safe deposit box to someone who is not us. There might be some orchestration snafus if the resources are misallocated, but this is not a security issue. The boxes are generic. I will use any one that meets my standards. There is a lurking concern, however. Surely it would be an issue if a user becomes convinced that they are communicating with a workload that does not actually belong to them. It’s important that only one guest can have privileged communication with an enclave. In other words, it’s important that an enclave connects only with one KBS. Otherwise, a pod might end up with a mix of containers representing different entities, some of which could be hostile to each other. Fortunately, this situation is relatively easy to avoid. The Attestation Agent must connect with only one KBS, just like a safe deposit box should have only one key.
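The one-KBS rule could be enforced with something as simple as the sketch below. This is not the Attestation Agent's actual code: the class, the method, and the example URLs are hypothetical, and identifying a KBS by its URL is an assumption made for brevity.

```python
class AttestationAgent:
    """Toy agent that binds to the first KBS it uses and refuses any other."""

    def __init__(self):
        self._kbs_url = None

    def request_resource(self, kbs_url: str, resource: str) -> str:
        if self._kbs_url is None:
            # First contact: this KBS becomes the only one for the enclave's lifetime.
            self._kbs_url = kbs_url
        elif kbs_url != self._kbs_url:
            raise PermissionError(f"already bound to {self._kbs_url}")
        # A real agent would attest and fetch the resource here.
        return f"{resource} from {self._kbs_url}"


agent = AttestationAgent()
print(agent.request_resource("https://kbs.owner.example", "image-key"))
try:
    agent.request_resource("https://kbs.other.example", "image-key")
except PermissionError as err:
    print(err)
```

Which KBS the agent binds to does not matter for security. What matters is that it can never be made to serve a second one.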

There is one final scenario to consider. Let’s say that after you leave the bank one of the workers rearranges the labels on all the safe deposit boxes. This would be annoying, but it wouldn’t compromise confidentiality. You are still the only one who can access your box. It might take a while to find it again, but you’ll know you’ve got the right one when you use your key to open it and you find all your stuff inside. If you didn’t have anything in your safe deposit box or if you only deposited some non-identifying pocket lint, then you might not be as confident that you found your box again. This highlights the importance of having a non-generic workload. In Confidential Containers, enclaves are generic. The secrets provided by the KBS are the identity of a guest.

These aren’t new conclusions. In fact, these are the same conclusions we reached in the first half of this article. To me this viewpoint from the perspective of the KBS rather than the KBC is a lot easier to think about.

@danmihai1

@fitzthum Thanks for explaining again! I missed initially the idea of replaying the detailed log during attestation.

I will think more about this, but currently I am down to this other set of concerns about the RTMR/vTPM + CoCo Log proposal:

Containers in a pod can be re-started, and livenessProbes can be executed after attestation too. Therefore, these actions wouldn’t be included in the attestation log. (This is the item you already mentioned that you’ll think more about)

afaik restarting containers isn't very common, but I agree that there are some cases we'll need to think about with timing.

Requires a custom CoCo protocol for providing the log to the Verifier and/or Relying Party.

I think we will want to add an endpoint to the KBC that a workload can use to get the evidence. This will return the attestation report and the log and some metadata. Exposing attestation reports to the guest directly means that a workload will be platform specific. We should be able to generalize that with a KBC endpoint. It's true that this format might not be immediately compatible with existing non-standard or proprietary verifiers, but that should mainly be a question of conversion.

Requires a custom CoCo implementation for replaying the log in the Verifier and/or Relying Party service.

Yeah, we would need some code to replay the log.
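Roughly, the replay would be a hash chain over the log entries, compared against the register value in the attestation report. A minimal sketch, assuming a SHA-384 extend operation starting from an all-zero register; the log entries and their format here are made up for illustration.

```python
import hashlib


def extend(register: bytes, event_digest: bytes) -> bytes:
    """One extend step: new value = SHA-384(old value || event digest)."""
    return hashlib.sha384(register + event_digest).digest()


def replay(event_log) -> bytes:
    """Recompute the register value implied by an ordered list of events."""
    register = bytes(48)  # assumed initial value: all zeros
    for event in event_log:
        register = extend(register, hashlib.sha384(event).digest())
    return register


# Hypothetical log entries; a real log would have a defined format.
log = [b"pull image A", b"start container A", b"exec livenessProbe"]
reported_register = replay(log)  # stands in for the value from the hardware

print(replay(log) == reported_register)
print(replay(log[:-1]) == reported_register)
```

A log with a dropped, altered, or reordered entry replays to a different value, so the verifier can tell that it does not match the measurement.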

Checking the value of HOST_DATA/MRCONFIGID/CONFIGID in a Verifier and/or Relying Party service is more common, since those are fields intended for configuring the TEE.

I don't know if any approach is particularly common yet.

After a successful hack of the guest VM software, it will be harder for the K8s operator to help with forensic analysis:

If any secrets were provisioned, the KBS would have the log, which should be handy for forensics. In a debug scenario you can use the debug console to read the log, which would be very useful.

...

A customer already trusts the TEE. Trusting a vTPM implementation too weakens the confidentiality promise.

If the vTPM is properly implemented (see AMD's linux-svsm), it will not change the trust model. The vTPM is part of the TEE. In TDX RTMRs are part of the standard measurement flow and don't change the trust model at all.

fitzthum/kbs-identity.md

fitzthum commented Feb 17, 2023 (edited)

Xynnn007 commented Feb 20, 2023

thomas-fossati commented Mar 7, 2024
