Proposal: Return all recursive references in referrer API #80
Comments
Given that flat response list, how should I reconstruct the original tree structure? As an example, how could I determine that scan-result-365-signature refers to scan-result-365, and not scan-result-11? |
What's the user scenario behind it? I'm wondering if this could imply a new API, perhaps /index? My motivation was that I wanted to keep the current referrers response schema intact, and for my use case I didn't need that info. Some possibilities: Add
Nested responses (new API?)
|
Don't you need that information to construct the same tree structure in the target registry/repo, that you're copying into? If you really don't care about the tree structure, why not just store them all as referring to the root element directly? |
As for that, those are implementation decisions that haven't been made yet 😛 My thoughts are that since the copy is executed incrementally, manifest by manifest, the target system could incrementally build up an index containing the tree structure as it parses each newly copied manifest. |
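The incremental index described above could look roughly like the sketch below. It assumes each copied manifest names the digest it refers to (called `subject` here); the `ReferrerIndex` class and the short names standing in for digests are hypothetical, not part of any existing API.

```python
from collections import defaultdict


class ReferrerIndex:
    """Hypothetical index built up as manifests arrive one at a time."""

    def __init__(self):
        # subject digest -> digests of the artifacts that refer to it
        self.children = defaultdict(list)

    def add_manifest(self, digest, subject=None):
        # Assumption: the manifest itself records which digest it refers to.
        if subject is not None:
            self.children[subject].append(digest)

    def tree(self, root):
        # Rebuild the nested structure below `root` from the recorded edges.
        return {child: self.tree(child) for child in self.children.get(root, [])}


index = ReferrerIndex()
# Manifests can arrive in any order; only the subject link matters.
index.add_manifest("scan-result-365-signature", subject="scan-result-365")
index.add_manifest("scan-result-11", subject="v1")
index.add_manifest("scan-result-365", subject="v1")
index.add_manifest("v1")

rebuilt = index.tree("v1")
print(rebuilt)
```

With this, scan-result-365-signature ends up under scan-result-365 rather than scan-result-11, because the link comes from the manifest and not from the position in the flat list.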
Let's visualize the DAG as mentioned by @nelson-wu.
To copy the
GET /v2/net-monitor/manifests/<v1_digest>
GET /v2/net-monitor/manifests/<scan-result-1_digest>
GET /v2/net-monitor/blobs/<scan-result-1-signature_digest>
...
GET /v2/net-monitor/manifests/<scan-result-365_digest>
GET /v2/net-monitor/blobs/<scan-result-365-signature_digest>
Besides, clients need to recursively find the up edges / ancestors of
GET /oras/artifacts/v1/net-monitor/manifests/<v1_digest>/referrers
GET /oras/artifacts/v1/net-monitor/manifests/<scan-result-1_digest>/referrers
...
GET /oras/artifacts/v1/net-monitor/manifests/<scan-result-365_digest>/referrers
I think @nelson-wu is trying to convey the following
GET /oras/artifacts/v1/net-monitor/manifests/<scan-result-1_digest>/referrers
...
GET /oras/artifacts/v1/net-monitor/manifests/<scan-result-365_digest>/referrers
in a single referrer API call
GET /oras/artifacts/v1/net-monitor/manifests/<v1_digest>/referrers
so that we can reduce overall |
Is the question related to ordering of the references? |
If the limiting/filtering is motivated by performance concerns, yes. The idea behind this is users may not want to traverse the whole tree to find what they want out of performance concerns. We could have some sort of filter, returning top N of property X. This would allow clients to get a much smaller list of artifacts. However I think the potential issue is we'd have to implement a limited set of filters that may or may not match client side needs. If customers want something more specific, i.e. "I want top N artifacts that have this annotation but not this other annotation sorted by date ascending" we'll have to end up implementing a whole SQL query engine. I think we could provide a top-down view of the whole artifact graph, similar to a sitemap, and customers can decide for themselves what to pull. It would allow them to address their own performance concerns, and give them more freedom to do their own filtering. |
Performance is always a good thing to think about, but is that the motivator here? The 365+ scans on an image is interesting. Although, I'm wondering if we'd really even hit 365 as a total. How many images last that long, before they are rebuilt, and replaced with a newer version (tag)? Do we need to re-scan archived images that are maintained for compliance reasons, and no longer in deployment? Are the images actually scanned every day? Or, would you do an initial scan, and catalog what's in the image? Then, when a new java vulnerability is discovered, the scanner takes its inventory and scans the java-based images. Inventorying on the SBOM is even more interesting. Should we optimize the scans, and have fewer, more accurate? If you have a history of scans, do you need all of them? Or, just the last n from each signing authority? I'd just suggest let's start small, and increment, based on specific use cases. |
Closing for now. As we have more usage, we can reconsider, and reactivate. |
Motivation
Right now it appears that in many use cases we consider all the artifacts referencing an image to be indivisible from the image. What this means is in scenarios where the image needs to be moved from system A to system B, all of the artifacts associated with the image need to be moved with the image.
During that move operation, discovering all the artifacts associated with the image can be a costly operation that involves many recursive calls to the referrer API. Consider the following example:
Problem
If image net-monitor:v1 has daily scans, and each scan is signed, there would be the following image structure:
To move net-monitor:v1 and all its associated artifacts, the following network calls would be needed:
A total of 731 referrers calls is needed just to move this one image. This is extremely computationally expensive and may cause livelocks or be exploited for DDoS attacks.
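To make the count concrete, here is a small sketch that simulates the discovery walk on an in-memory graph instead of a real registry. The graph shape (365 scans, one signature each) follows the example above; the function names and the use of plain names in place of digests are assumptions for illustration.

```python
from collections import defaultdict


def build_example_graph(days=365):
    """Map each subject to the artifacts that refer to it directly."""
    referrers = defaultdict(list)
    for day in range(1, days + 1):
        scan = f"scan-result-{day}"
        referrers["net-monitor:v1"].append(scan)
        referrers[scan].append(f"{scan}-signature")
    return referrers


def discover_all(root, referrers):
    """Walk the graph as a client would today: one referrers call per node."""
    calls = 0
    found = []
    pending = [root]
    while pending:
        node = pending.pop()
        calls += 1  # stands in for one GET .../manifests/<digest>/referrers
        children = referrers.get(node, [])
        found.extend(children)
        pending.extend(children)
    return found, calls


graph = build_example_graph()
artifacts, calls = discover_all("net-monitor:v1", graph)
print(len(artifacts), calls)  # 730 artifacts found with 731 calls
```

The extra call per signature is the wasteful part: the client has to ask each leaf for its referrers just to learn that there are none.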
Proposed solution
The referrers API would accept a recursive=true query parameter. When this is true, it would return all the artifacts transitively referencing an image in a flat list.
This would allow all 731 calls to this API to be shortened to just 1 call.
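As a rough sketch of the idea, the snippet below builds the proposed request URL and shows the kind of flat list a registry might compute for it. The `recursive` parameter is the one proposed here and does not exist yet; the helper names, the `sha256:abc` placeholder digest, the breadth-first ordering, and the in-memory `direct` map are all assumptions for illustration.

```python
from urllib.parse import urlencode


def referrers_url(repo, digest, recursive=False):
    """Build the referrers path, optionally with the proposed query parameter."""
    path = f"/oras/artifacts/v1/{repo}/manifests/{digest}/referrers"
    if recursive:
        return f"{path}?{urlencode({'recursive': 'true'})}"
    return path


def transitive_referrers(root, direct):
    """Collect every artifact that refers to root, directly or indirectly."""
    result = []
    pending = list(direct.get(root, []))
    while pending:
        node = pending.pop(0)  # breadth-first: direct referrers come first
        result.append(node)
        pending.extend(direct.get(node, []))
    return result


# Hypothetical direct-referrer map, using short names in place of digests.
direct = {
    "v1": ["scan-result-1", "scan-result-2"],
    "scan-result-1": ["scan-result-1-signature"],
    "scan-result-2": ["scan-result-2-signature"],
}

url = referrers_url("net-monitor", "sha256:abc", recursive=True)
flat = transitive_referrers("v1", direct)
print(url)
print(flat)
```

The client makes one request and receives every transitive referrer; the traversal still happens, but on the registry side where the graph is local.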