Image metadata without affecting the CAS #59

Closed
vishvananda opened this issue May 2, 2016 · 20 comments

Comments

@vishvananda
Contributor

It would be nice to be able to add metadata to an image (or a layer) without affecting the CAS id for the image. This allows images to be built reproducibly while still keeping useful information with the image, like the build date, build software versioning, build machine, etc.

This could go into a specific directory (.meta/) inside the image that can be ignored when generating the content id, or it could be shipped in a separate filesystem structure that is outside the "data" filesystem for the image.

@vishvananda
Contributor Author

I note that there is some discussion in #22 about where signatures should be stored so that they don't disrupt the CAS hash as well. It seems that this is a general issue.

@vbatts
Member

vbatts commented May 2, 2016

Like the annotations field?

On Mon, May 2, 2016 at 5:03 PM Vish Ishaya [email protected] wrote:

I note that there is some discussion in #22 about where signatures should be stored so that they don't disrupt the CAS hash as well. It seems that this is a general issue.

@wking
Contributor

wking commented May 2, 2016

On Mon, May 02, 2016 at 01:56:02PM -0700, Vish Ishaya wrote:

It would be nice to be able to add metadata to an image (or a layer)
without affecting the CAS id for the image.

I like TUF's ‘custom’ on the signed name ↔ hash assertion [1]. You
could use that for, among other things, expiration timestamps [2].

@vishvananda
Contributor Author

@vbatts interesting, I missed that addition. It depends on whether the json for the image is part of the CAS or lives separately. Does the image itself have a reproducible ID or is that outside the scope of the spec? I suppose annotations could work if the image ID were generated by filtering out the annotations field and taking a sha of the json file.

My goal would be to have two different build machines building an image from the same source, throwing their own annotations in, but still producing the same image id.
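
To make the "filter out the annotations field and take a sha of the json" idea concrete, here is a minimal Python sketch. It assumes a simple canonical form (sorted keys, fixed separators) and hypothetical field values; `filtered_image_id` and `full_digest` are illustrative names, not part of any spec.

```python
import hashlib
import json


def canonical_bytes(obj):
    # Sorted keys and fixed separators give byte-stable output for equal data.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def filtered_image_id(image_json):
    # Drop the top-level "annotations" field before hashing.
    stripped = {k: v for k, v in image_json.items() if k != "annotations"}
    return "sha256:" + hashlib.sha256(canonical_bytes(stripped)).hexdigest()


def full_digest(image_json):
    # Ordinary content address: every field, annotations included.
    return "sha256:" + hashlib.sha256(canonical_bytes(image_json)).hexdigest()


# Two hypothetical build machines: same layers, different annotations.
build_a = {"layers": ["sha256:aaa"], "annotations": {"build.machine": "builder-1"}}
build_b = {"layers": ["sha256:aaa"], "annotations": {"build.machine": "builder-2"}}

same_id = filtered_image_id(build_a) == filtered_image_id(build_b)
same_digest = full_digest(build_a) == full_digest(build_b)
```

Here `same_id` is true while `same_digest` is false: the filtered id is stable across builders, but the annotations are no longer covered by that id.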

@wking
Contributor

wking commented May 2, 2016

On Mon, May 02, 2016 at 02:13:15PM -0700, Vish Ishaya wrote:
“I suppose annotations could work if the image ID were generated by
filtering out the annotations field and taking a sha of the json
file.”

^ this makes annotations unsignable and breaks content-addressability
for the image, so I'd rather not ;).

“My goal would be to have two different build machines building an
image from the same source, throwing their own annotations in, but
still producing the same image id.”

This is going to be hard. For one thing, you'd need to use
compatible, stable tar and JSON implementations. But it would be cool
if you got it working [1] (assuming your mutated annotations were
living above the image level of the CAS hierarchy).

[1]: Subject: Re: OCI Bundle Digests Summary
     Date: Thu, 15 Oct 2015 16:52:42 -0700
     Message-ID: <[email protected]>

@vishvananda
Contributor Author

@wking I currently have tar and json producing stable results for a custom build system so I'm not so worried there. I understand the desire to keep everything content-addressable, but it is also important to have build metadata available. In this case I would think that annotations are not the right place for this information, because it is useful at the very least to determine that multiple builds have produced the same manifest.json, and for image ids to remain stable if nothing has changed.

Maybe what we actually want is for build servers to be able to sign existing manifests and put their metadata somewhere else. If there was a way to search these metadata signatures, you could potentially see all build servers that produced the same image along with their metadata.

@vishvananda
Contributor Author

vishvananda commented May 2, 2016

I think the ideal solution is essentially something akin to a git annotated tag. Each build server could create a signed tag with their build metadata. Then there could be some way to find all annotated tags for a given image id.

Then a user could download an image based on the sha of the tag (which would be unique per build) or the sha of the image id. The runtime (docker, rkt, etc.) could resolve tags to the sha of the underlying image for display purposes, so it would be easy to determine whether multiple machines are running the same image even if they downloaded different "tags".
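
A rough Python sketch of that lookup behaviour, using an in-memory dict as a stand-in for the CAS. The `kind` field and the `put`, `resolve`, and `tags_for` helpers are assumptions made for illustration only.

```python
import hashlib
import json

store = {}  # digest -> raw bytes; stands in for the CAS


def put(obj):
    # Store an object under the sha256 of its canonical JSON encoding.
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def resolve(digest):
    # Follow a tag to the image it points at; an image resolves to itself.
    obj = json.loads(store[digest])
    return obj["object"] if obj.get("kind") == "tag" else digest


def tags_for(image_id):
    # Linear scan over the store; a real registry would need an index here.
    found = []
    for digest, raw in store.items():
        obj = json.loads(raw)
        if obj.get("kind") == "tag" and obj["object"] == image_id:
            found.append(digest)
    return sorted(found)


image_id = put({"kind": "image", "layers": ["sha256:aaa"]})
tag_a = put({"kind": "tag", "object": image_id, "metadata": {"machine": "builder-1"}})
tag_b = put({"kind": "tag", "object": image_id, "metadata": {"machine": "builder-2"}})
```

Each builder's tag gets its own digest, while `resolve(tag_a)` and `resolve(tag_b)` both return `image_id`, and `tags_for(image_id)` lists both tags.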

@wking
Contributor

wking commented May 2, 2016

On Mon, May 02, 2016 at 03:14:41PM -0700, Vish Ishaya wrote:

I think the ideal solution is essentially something akin to a git
annotated tag.

This sounds like “custom metadata in the signed name ↔ hash assertion”
[1] to me, so +1.

@philips
Contributor

philips commented May 3, 2016

@vishvananda Happy to consider a concrete proposal. What is the use case?

@stevvooe
Contributor

stevvooe commented May 4, 2016

Complex hash funnels typically lead to fragile and insecure behavior. They make implementations fragile and create an opening for unverified data to creep through. Anything that requires pre-processing before calculating a hash is suspect.

The right approach to handling ephemeral metadata is to appropriately layer it. Only change fields at the level that corresponds to the build. For example, rather than pushing build version to the image config, keep it at the tag-level.

An annotated tag-like object is probably the right balance. The key here is keeping the lowest layers (image config, layer file) as simple as possible then tag these hashed artifacts with metadata.

@philips
Contributor

philips commented May 4, 2016

@stevvooe Yep, +1 on all points.

I would be OK with considering the schema for this "external" metadata in this repo. So, if @vishvananda comes up with a good use case and wants to drive the work I am OK with it.

@philips
Contributor

philips commented May 6, 2016

@vishvananda Want to try and start on something next week so we can consider it for v0.2.0?

@vishvananda
Contributor Author

My current thinking is that this should be merged with the idea of signing, and that signed objects are canonical JSON that live in the CAS as a reference to another object. Something along the lines of:

{ "signature": {...}, "metadata": {...}, "object": "some-cas-image-id" }

The runtime software (rkt, docker) can choose to treat this as a "tag" (in the sense of git annotated tags) on the underlying object. The object in this case would be the image json, but other objects could theoretically be signed as well.

@wking
Contributor

wking commented May 9, 2016

On Mon, May 09, 2016 at 09:37:21AM -0700, Vish Ishaya wrote:

The runtime software (rkt, docker) can choose to treat this as a
"tag" (in the sense of git annotated tags)…

A crucial piece of annotated tags that your JSON sketch misses is the
name being assigned. In [1] I have a similar sketch that includes a
‘name’ property, which allows you to say “I assert that the manifest
at sha256:ca6e… is ‘debian 8.0’”.

@vishvananda
Contributor Author

@wking agreed. Both "name" and "signature" could be embedded in metadata and interpreted by tools, but I think it makes sense to treat those two as special and split them out of the generic metadata.
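
As a sketch of what such a tag object could look like with "name" and "signature" split out from the generic metadata, the Python below signs the canonical JSON of the other fields. It uses HMAC from the standard library purely as a stand-in for a real public-key signature; the field layout, the `make_tag` and `verify_tag` helpers, the key, and the all-zero digest are placeholders, not a proposed schema.

```python
import hashlib
import hmac
import json


def canonical_bytes(obj):
    # Sorted keys and fixed separators so signer and verifier hash the same bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_tag(name, obj_digest, metadata, key):
    # The signature covers name, object and metadata, but not itself.
    payload = {"name": name, "object": obj_digest, "metadata": metadata}
    sig = hmac.new(key, canonical_bytes(payload), hashlib.sha256).hexdigest()
    return dict(payload, signature={"alg": "hmac-sha256", "value": sig})


def verify_tag(tag, key):
    # Recompute the signature over everything except the "signature" field.
    payload = {k: v for k, v in tag.items() if k != "signature"}
    expected = hmac.new(key, canonical_bytes(payload), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag["signature"]["value"])


key = b"demo-key"  # placeholder secret for the sketch only
placeholder_digest = "sha256:" + "0" * 64  # not a real manifest digest
tag = make_tag("debian 8.0", placeholder_digest, {"machine": "builder-1"}, key)
tampered = dict(tag, name="debian 9.0")
```

`verify_tag(tag, key)` succeeds, while `verify_tag(tampered, key)` fails because the name is covered by the signature. The underlying object's digest is untouched either way.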

@philips philips added this to the post-v1.0.0 milestone May 24, 2016
@philips
Contributor

philips commented May 24, 2016

I am putting this in the post-v1.0.0 milestone as it feels like a nice-to-have with other dependencies like #22.

@philips
Contributor

philips commented Jun 2, 2016

cc @gtank

@vbatts
Member

vbatts commented Mar 9, 2017

@vishvananda I think this can be done now with the image-layout, and an index (old manifest-list). I'm inclined to close this issue out.
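
One possible reading of this, sketched in Python: per-build metadata rides on the index's descriptors as annotations, so the manifest bytes and their digest stay untouched. The manifest content and the `com.example.*` annotation key are hypothetical, and this has not been checked against what #533 actually changed.

```python
import hashlib
import json

# Stand-in manifest; a real one would list config and layer descriptors.
manifest_bytes = json.dumps({"schemaVersion": 2, "layers": []}, sort_keys=True).encode("utf-8")
manifest_digest = "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def descriptor(annotations):
    # Descriptor pointing at the same manifest, carrying build-specific annotations.
    return {
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "digest": manifest_digest,
        "size": len(manifest_bytes),
        "annotations": annotations,
    }


index = {
    "schemaVersion": 2,
    "manifests": [
        descriptor({"com.example.build.machine": "builder-1"}),
        descriptor({"com.example.build.machine": "builder-2"}),
    ],
}

digests = {m["digest"] for m in index["manifests"]}
```

Both entries carry different annotations yet `digests` contains only the single manifest digest, since the metadata lives one level above the hashed manifest.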

@vishvananda
Contributor Author

vishvananda commented Mar 9, 2017 via email

@vbatts
Member

vbatts commented Mar 9, 2017

fixed by #533

@vbatts vbatts closed this as completed Mar 9, 2017