[RFD] TPM Enrollment and secure secret delivery #40

alexlovelltroy · 2024-06-21T18:49:47Z

Attestation Background

Attestation is a method for verifying the integrity of a computer’s software, hardware, and firmware using a Trusted Platform Module (TPM). As the system boots, cryptographic measurements of its software and firmware configuration are recorded in the TPM's Platform Configuration Registers (PCRs). On request, the TPM produces a "quote": a report over those measurements, signed with a key whose private half never leaves the TPM and whose public half is certified through a Public Key Infrastructure (PKI).
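As a concrete illustration of "measurement": a TPM 2.0 records each measurement by *extending* a PCR, which replaces the register with the hash of its old value concatenated with the new digest. The minimal sketch below models a single SHA-256 PCR bank. The function names and the sample boot components are illustrative assumptions, not part of any TPM library. Because extend is a one-way chain, the final value depends on every measurement and on their order, which is what lets a quote over a few PCRs summarize the whole boot state.

```python
import hashlib

# A SHA-256 PCR is 32 bytes and (for most PCRs) starts at all zeros after reset.
PCR_SIZE = hashlib.sha256().digest_size


def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """Measure-then-extend: hash the component, then new = H(old || digest)."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def measure_boot(components: list[bytes]) -> bytes:
    """Extend a freshly reset PCR with each component in order."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = pcr_extend(pcr, component)
    return pcr


if __name__ == "__main__":
    # Hypothetical boot chain; real inputs are firmware, bootloader, and kernel images.
    chain = [b"firmware-v1", b"bootloader-v2", b"kernel-6.1"]
    print(measure_boot(chain).hex())
```

Swapping any component, or the order in which two components are measured, yields a different final PCR value.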

During remote attestation, this signed report is sent to a remote verifier, which uses PKI to authenticate and validate it. The report includes a nonce (a unique, random value generated by the verifier for each request) to prevent replay attacks and ensure the report's freshness. The verifier checks the report against expected values, using PKI to confirm the authenticity of the quote and the system’s integrity. Successful validation allows the verifier to grant access or permissions, affirming the system's trustworthiness.
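The freshness and validity checks can be sketched as a challenge-response exchange. In this minimal sketch an HMAC over the PCR digest and nonce stands in for the TPM's asymmetric quote signature, purely so the example runs with the Python standard library; a real verifier checks an asymmetric signature made by a TPM-resident key against a certificate, and a real TPM 2.0 quote has its own binary structure. The names `sign_quote` and `ToyVerifier`, and the 32-byte nonce length, are hypothetical.

```python
import hashlib
import hmac
import secrets


def sign_quote(key: bytes, pcr_digest: bytes, nonce: bytes) -> bytes:
    """Stand-in for the TPM signing a quote. HMAC here, NOT a real TPM signature."""
    return hmac.new(key, pcr_digest + nonce, hashlib.sha256).digest()


class ToyVerifier:
    def __init__(self, key: bytes, expected_pcr_digest: bytes):
        self._key = key  # stands in for the attester's certified public key
        self._expected = expected_pcr_digest  # known-good measurement
        self._outstanding: set[bytes] = set()  # nonces issued but not yet consumed

    def challenge(self) -> bytes:
        """Issue a fresh, unpredictable nonce for one attestation request."""
        nonce = secrets.token_bytes(32)
        self._outstanding.add(nonce)
        return nonce

    def verify(self, pcr_digest: bytes, nonce: bytes, signature: bytes) -> bool:
        # Freshness: the nonce must be one we issued, and each is usable once.
        if nonce not in self._outstanding:
            return False
        self._outstanding.discard(nonce)
        # Authenticity: the report must be signed by the expected key.
        if not hmac.compare_digest(sign_quote(self._key, pcr_digest, nonce), signature):
            return False
        # Integrity: the measured state must match the expected value.
        return hmac.compare_digest(pcr_digest, self._expected)
```

Replaying a report that already passed fails, because its nonce was consumed by the first successful check.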

Beyond integrity verification, the same PKI framework used in attestation can secure communications between nodes. Data can be encrypted to a key held in a node's TPM so that only that node can decrypt and read it. Conversely, a node can prove that a message originated from a TPM-equipped system by signing it with a TPM-resident private key; recipients use PKI to verify the signature, confirming both the message’s origin and its authenticity. Together, these uses of PKI and the TPM support both secure connections and reliable verification of communications.

Bootstrapping Attestation

Bootstrapping remote attestation requires establishing trust in the TPMs themselves. Without that foundation, a verifier has no basis for believing that a signed quote came from genuine TPM hardware rather than from software impersonating one.

Initial trust is established through key provisioning and certification. Each TPM holds a primary Endorsement Key (EK), derived from a seed that is typically set during manufacturing, plus additional keys for other functions. The EK's public key is certified by a trusted Certificate Authority (CA), usually the TPM manufacturer's, in an Endorsement Key Certificate (EKCert) that binds the key to that specific TPM. This endorsement, signed by a trusted CA, establishes a basis of trust for the TPM’s operations.

With initial trust established, the remote attestation process can proceed. The remote verifier sends a request to the TPM that includes a nonce to ensure the report’s freshness. The TPM returns a quote, signed with an Attestation Key (AK), that covers the nonce and a measurement of the system’s state; this quote serves as proof of the system's integrity. Because a TPM 2.0 EK is normally a decryption-only key, it does not sign quotes itself: the verifier uses the EKCert to confirm that the AK belongs to a genuine TPM (through a credential-activation exchange bound to the EK) and then checks the quote's signature with the AK's public key. Successful validation confirms the TPM’s trustworthiness and the system’s integrity.

Ongoing trust management involves periodic attestation checks to ensure system integrity, key rotation to maintain security, and mechanisms for certificate revocation if a TPM is compromised. These practices help maintain the robustness and reliability of the attestation framework.
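The periodic-check and revocation rules can be expressed as a simple trust predicate. This is a minimal sketch under assumed names (`NodeTrust`, `is_trusted`) and an assumed policy: a node is trusted only if its TPM's EK fingerprint is not on a revocation list and its last successful attestation is younger than a configured maximum age.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class NodeTrust:
    """Hypothetical per-node state a verifier might keep between checks."""
    ek_fingerprint: str                # e.g. hex SHA-256 of the EK public key
    last_attested: Optional[datetime]  # time of the last successful attestation


def is_trusted(node: NodeTrust, revoked: set[str], max_age: timedelta,
               now: datetime) -> bool:
    """Trust lapses if the TPM is revoked or its attestation has gone stale."""
    if node.ek_fingerprint in revoked:
        return False  # TPM or its certificate revoked, e.g. after compromise
    if node.last_attested is None:
        return False  # never successfully attested
    return now - node.last_attested <= max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    node = NodeTrust("ab12", now - timedelta(hours=2))
    print(is_trusted(node, revoked=set(), max_age=timedelta(hours=24), now=now))
```

Keeping the policy as a pure function of recorded state makes it easy to re-evaluate after each periodic attestation or revocation update.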

Infrastructure Challenges for Remote Attestation

Managing the original Endorsement Keys (EKs) securely is a critical challenge, especially when integrating new computers as new racks are delivered or nodes are swapped. The integrity of the attestation process depends on the secure handling of these keys from generation to deployment. Whether in the factory or on delivery, each TPM's EK must be certified by a trusted Certificate Authority (CA), and the EK public key or certificate must then reach the verifier. The private half of an EK never leaves the TPM, so the risk in transit is substitution rather than disclosure: the public keys and certificates must be protected against tampering so that an attacker cannot enroll a device they control.

Ongoing management of EKs also includes secure handling of key rotations and updates. Procedures must be in place to address the replacement of TPMs and the updating or revocation of EKs to maintain system integrity and trustworthiness.

OpenCHAMI Attestation and Enrollment Service

This RFD proposes a process for managing enrollment keys and supporting remote attestation.

Extend OpenCHAMI to use TPMs for identity

In today's system, each node is primarily identified by its xname, which denotes a location in the system. This use of location as the primary identifier is inherited from CSM through our use of the CSM service SMD as the primary inventory interface in OpenCHAMI. Location, while unique across the system, isn't stable: it is possible, and even somewhat common, to remove a blade from one chassis and reinstall it in another. Tracking errors per blade as it moves from one part of the system to another is possible in CSM, but not trivial.

The TPM contains several pieces of data that can be used for identity and that are both unique and stable. The TPM 2.0 specification linked in the references below describes two IDs that are practical for our use:

> The 802.1AR standard defines two Device Identity (DevID) types, depending on the CA signing the issued certificates and expectations for certificate lifetimes.
>
> The initially installed identity is defined as an IDevID/IAK (“I” for initial) and is installed by the product OEM. The IDevID credential is intended to be usable for the life of the product. The IDevID/IAK is expected to be created at device manufacturing time.
>
> On the other hand, owner-created and signed identities are named LDevID & LAK (“L” for local). LDevID/LAK credentials may “overlay” IDevID/IAK credentials, thereby replacing IDevID with LDevID in operation. Alternatively, IDevID/IAK and LDevID/LAK may be used for different purposes. LDevID/LAK certificates are not expected to be long-lived certificates. LDevID/LAK credentials are expected to be removed when a device is “zeroized” or is at its end of life.

One approach would be to include a TPM identifier as an additional piece of data stored by SMD and provide functions for interacting with the unique and stable identities in addition to the xnames.
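A minimal sketch of this first approach, assuming a hypothetical `tpm_id` field stored next to the xname (this is not SMD's actual schema or API). Keeping both indexes in one store means a blade move updates location and identity together, and the hardware can still be found by its stable TPM identity afterwards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """Hypothetical SMD-style record extended with a TPM-derived identity."""
    xname: str                    # location, e.g. "x1000c0s0b0n0"; changes on a move
    tpm_id: Optional[str] = None  # stable ID, e.g. a hash of the IDevID/EK public key


class Inventory:
    """Toy in-memory store indexed by both location and TPM identity."""

    def __init__(self) -> None:
        self._by_xname: dict[str, Component] = {}
        self._by_tpm: dict[str, Component] = {}

    def add(self, comp: Component) -> None:
        self._by_xname[comp.xname] = comp
        if comp.tpm_id is not None:
            self._by_tpm[comp.tpm_id] = comp

    def move(self, tpm_id: str, new_xname: str) -> Component:
        """Record that the hardware carrying this TPM now sits at a new location."""
        comp = self._by_tpm[tpm_id]
        del self._by_xname[comp.xname]
        comp.xname = new_xname
        self._by_xname[new_xname] = comp
        return comp

    def by_tpm(self, tpm_id: str) -> Component:
        return self._by_tpm[tpm_id]

    def by_xname(self, xname: str) -> Component:
        return self._by_xname[xname]
```

After `move`, any per-blade history keyed on `tpm_id` still applies, while lookups by the old xname fail with `KeyError`.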

A second approach would be to create a new service for externally managing these identities and provide integrations with SMD and other microservices.

I recommend the first approach as identity and inventory are intrinsically linked. Keeping them separate introduces race conditions and other potential consistency problems.

Extend OpenCHAMI to boot a dedicated discovery image for collecting TPM keys/IDs

The remote attestation process requires establishing a collection of valid public keys/certificates that identify the TPMs and that can be compared with responses during remote attestation. We have considered several options for a process that works with OpenCHAMI:

1. A fully manual process in which sysadmins boot the node, log in as root, retrieve the appropriate certificates and IDevIDs from the TPM, and add them to SMD.
2. A fully automatic process in the boot cycle that sends the appropriate certificates and DevIDs to an unauthenticated internal service protected by network controls.
3. A dedicated system image that allows Ansible to connect and retrieve certificates and DevIDs.

We believe the first option to be the most secure, but it is also the most labor-intensive. We judged it impractical and did not pursue it.

We believe the second option creates an opportunity for a rogue device to register itself as a fake node and could provide an avenue for future attacks. If we can adjust network settings or provide other protections, it might be workable. We chose not to pursue it at this time.

The third option appears to provide the security and manageability we need and lets us build on tooling we already have. We are pursuing this option, while ensuring that other options remain available for sites that do not wish to use Ansible in this way.
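Whatever collects the records (Ansible or otherwise), the receiving side has to decide which bindings to accept. This sketch assumes each collected record is an xname plus the EK public key bytes, fingerprints the key with SHA-256, and refuses bindings that look like a rogue or misplaced device. The names `Collected`, `fingerprint`, and `enroll`, and the three rejection rules, are hypothetical illustrations rather than an OpenCHAMI API.

```python
import hashlib
from typing import NamedTuple


class Collected(NamedTuple):
    xname: str
    ek_public: bytes  # EK public key bytes as retrieved from the node (format assumed)


def fingerprint(ek_public: bytes) -> str:
    return hashlib.sha256(ek_public).hexdigest()


def enroll(records: list[Collected], expected_xnames: set[str],
           bindings: dict[str, str]) -> list[tuple[str, str]]:
    """Bind EK fingerprints to xnames; return (xname, reason) for each rejection.

    `bindings` maps xname -> fingerprint and is updated in place.
    """
    rejected: list[tuple[str, str]] = []
    for rec in records:
        fp = fingerprint(rec.ek_public)
        if rec.xname not in expected_xnames:
            # Not in inventory: possibly a rogue device.
            rejected.append((rec.xname, "unknown xname"))
        elif fp in bindings.values() and bindings.get(rec.xname) != fp:
            # Same TPM claimed at a second location.
            rejected.append((rec.xname, "TPM already enrolled at another xname"))
        elif rec.xname in bindings and bindings[rec.xname] != fp:
            # A different TPM at a known location: require explicit re-enrollment.
            rejected.append((rec.xname, "xname already bound to a different TPM"))
        else:
            bindings[rec.xname] = fp
    return rejected
```

Under these rules a legitimate blade move or TPM replacement shows up as a rejection that an administrator must resolve, rather than being accepted silently.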

References

alexlovelltroy · 2024-08-06T20:31:45Z

Further work here could include using the attestation process to set up a WireGuard tunnel for cloud-init and removing the need for authentication in cloud-init itself.

alexlovelltroy · 2024-08-19T12:41:11Z

https://github.com/keylime/keylime may provide much of the functionality needed for the actual enrollment and management of certs/keys.

davidallendj · 2024-08-19T13:31:12Z

> https://github.com/keylime/keylime May provide much of the functionality needed for the actual enrollment and management of certs/keys

Should we be looking at the Rust version instead, since the Python version is deprecated?

alexlovelltroy · 2024-08-19T13:37:35Z

I think both repos are relevant.

From the README:

> Keylime consists of three main components; The Verifier, Registrar and the Agent.

> The Verifier continuously verifies the integrity state of the machine that the agent is running on.

> The Registrar is a database of all agents registered with Keylime and hosts the public keys of the TPM vendors.

> The Agent is deployed to the remote machine that is to be measured or provisioned with secrets stored within an encrypted payload released once trust is established.

I think only the agent has been rewritten in Rust and moved to the other repo. I don't see evidence that they are moving the Registrar or Verifier to Rust at this point.

alexlovelltroy · 2024-08-29T12:41:38Z

Relevant to this discussion:

https://datatracker.ietf.org/doc/html/rfc9334

alexlovelltroy · 2024-08-29T12:45:44Z

Possible alternative to Keylime. Doesn't look ready for prime time yet.

https://github.com/veraison

dev-zero · 2024-09-16T10:11:46Z

(All views expressed are my own. If at all, they originate from my role as an OpenCUBE developer.)
In the text itself, the part about permitting replacement of TPMs seems inconsistent with using an embedded IDevID as an alternative stable ID to the xname. Furthermore, a TPM is just one secure enclave; CPUs, GPUs, NICs, and DPUs may provide their own in the future. And with current architectures PCIe devices are privileged, so an alternative ID might instead be a hash over device IDs, ensuring that a machine is re-enrolled if crucial components change.
On the other hand, whether someone will actually want to implement that is a different question.
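The hash-over-device-IDs idea could look roughly like the following sketch. Which identifiers to include (TPM EK, NIC MACs, GPU serials) and their string format are assumptions, and `composite_id` is a hypothetical name. Any change to a listed component changes the resulting ID, which is what would force re-enrollment.

```python
import hashlib


def composite_id(device_ids: list[str]) -> str:
    """Derive one machine ID from the IDs of its components.

    Sorting makes the result independent of enumeration order; length-prefixing
    each entry keeps ["ab", "c"] and ["a", "bc"] from hashing the same bytes.
    """
    h = hashlib.sha256()
    for dev in sorted(device_ids):
        data = dev.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()
```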
Nonetheless, I would suggest splitting this RFD into two parts:

1. rollout of TPM enrolment and ongoing remote attestation
2. new unique ID

In general, I think that tracking faults per device is yet another story, and this data should be tied to subcomponents instead, if possible.

> I recommend the first approach as identity and inventory are intrinsically linked. Keeping them separate introduces race conditions and other potential consistency problems.

I agree that additional IDs must be maintained within SMD. Keeping them separate quickly becomes unmaintainable.

> Further work here could include using the attestation process to set up a wireguard tunnel for cloud-init and removing the need for authentication in cloud-init itself.

How the actual transport happens is secondary to this proposal, I think. I would prefer HTTP with mTLS over an entirely separate protocol, though.
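For the HTTP-with-mTLS option, the server-side setup in Python's standard library is small; the essential part is that the server demands a client certificate and trusts only the site's own CA for it. The function name and file paths are placeholders, and how client certificates would be issued (for example, after successful attestation) is left open here.

```python
import ssl
from typing import Optional


def mtls_server_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None,
                        client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context that requires a client certificate.

    Only the CA in `client_ca_file` is trusted for client certificates;
    system default CAs are deliberately not loaded.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a valid client cert
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

In use, the context would be built with the site's server certificate, key, and client CA, then applied to a listening socket with `ctx.wrap_socket(sock, server_side=True)`.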

> We believe the second option creates an opportunity for a rogue device to register itself as a fake node and could provide an avenue for future attacks. If we can adjust network settings or provide other protections, it might be workable. We chose not to pursue it at this time.

I don't see how the third option improves on the MITM situation, but I am fine with either.

What I am missing here is a disclaimer that this proposal is geared towards managed nodes and already assumes that the environment where OpenCHAMI runs is to be considered secure.

alexlovelltroy added this to Roadmap Project Jun 21, 2024

alexlovelltroy converted this from a draft issue Jun 21, 2024

alexlovelltroy added the rfd Request for Discussion label Jun 21, 2024

alexlovelltroy added the WIP Work In Progress label Aug 4, 2024

alexlovelltroy mentioned this issue Aug 21, 2024 in [RFD] Managing Unique Names in ochami microservices #10

alexlovelltroy assigned njones-lanl Aug 22, 2024

alexlovelltroy moved this to In Progress in Roadmap Project Aug 22, 2024

alexlovelltroy mentioned this issue Aug 26, 2024 in Support Encrypted Workloads #44

alexlovelltroy mentioned this issue Nov 7, 2024 in [DEV] Integrate Keylime for Secure Attestation in Deployment Recipes OpenCHAMI/deployment-recipes#83

alexlovelltroy mentioned this issue Mar 14, 2025 in [RFD] Network Management and Configuration Service #70
