Skip to content
This repository was archived by the owner on May 15, 2024. It is now read-only.

Add sorting by artifact creation datetime #82

Merged
merged 1 commit into from
Mar 10, 2022

Conversation

michaelb990
Copy link
Contributor

I've been hearing feedback about issues with
returning an unordered list of references for
use cases that may have a high number of artifacts.
Scan results or short-lived signatures are two examples
of this. In most cases, the artifact which was created
the most recently is usually the more important one.
This is an initial attempt at specifying how that might
look like.

Would love feedback about:

  • Sorting mechanism. In talking with @SteveLasker and others, the creation time seems like the right field to sort by. Using the annotation instead of the time the artifact was pushed allows for an image with its related artifacts to be copied between registries while keeping a consistent order.
  • I'm suggesting that artifacts without the created annotation be sorted last in the list of results.
  • If registries are expected to sort based on this field, should we add it to the result descriptor items? In order to sort by this field, registries will have to store it, so it seems rather trivial to include it in the response list from the /referrers api.

@michaelb990 michaelb990 requested a review from a team as a code owner December 17, 2021 18:56
Copy link
Contributor

@SteveLasker SteveLasker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I'm betting we'll add other criteria, over time, I'd suggest having a default sort order of desc on created, but provide for other sorting options in the future.
For instance, to specify the sort order of asc, the user would pass ?orderBy=created:desc Which raises a good question. What are good patterns for including two values on a parameter?
?artifactType=example/scan&sortBy=created:asc


**Pre-Defined Annotation Keys:**
This defines a set of keys that have been pre-defined for use by authors of ORAS artifacts.
- `org.cncf.oras.artifact.created` date and time on which the artifact was created (string, date-time as defined by [RFC 3339][rfc-3339])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we only have one annotation now, should this be placed in an ./annotations.md file? I'm sure we'll have many more, which we'll add as needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's nicer to have inline until it gets too big. Is there a reason to move it to a new file now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do it later, as we have more annotations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note on the annotation namespace. We'll want to change this, but I created #89 to track this, so I'd suggest we merge this, and do a separate PR to address 89 as we close on the namespace.

@@ -36,7 +36,8 @@ Future Minor versions MUST be additive.
## Artifact Referrers API results

- Implementations MUST implement [paging](#paging-results).
- Implementations MAY implement [`artifactType` filtering](#filtering-results).
- Implementations MUST implement [sorting](#sorting-results)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure we'll get some pushback on this, but I agree we MUST provide sorting, or the paging becomes only partially helpful. Paging and sorting are combined to get the top N of a given artifactType.
I suppose we could also throw a 400 when sortby parameter is passed? But, I'd suggest results MUST be sorted descending by default, with the option to sortby=created:asc

@@ -141,6 +142,13 @@ The value of the header would be:
```
Please see [RFC5988][rfc5988] for details.

### Sorting Results
The `/referrers` API MUST allow for artifacts to be sorted by the date and time in which they were created, which SHOULD be included in the artifact manifest's list of `annotations`.
The artifact's creation time MUST be the value of the `org.cncf.oras.artifact.created` annotation, as specified in the [artifact-manifest spec][artifact-manifest-spec].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest link to a new ./annotations.md file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acking this is deferred to when we have more annotations.

@SteveLasker
Copy link
Contributor

Adding xref to #80

@SteveLasker SteveLasker added this to the 1.0.0-draft.2 milestone Jan 6, 2022
@michaelb990
Copy link
Contributor Author

I think that I'd prefer to not include sorting method/parameters in this version. We don't support multiple sort orders or attributes in any other API on the registry, and doing that here creates additional complications, especially for smaller registries. Having a single order allows for the most common use case to be addressed without going overboard. This proposal already makes me a little concerned about overcomplicating, so I would also appreciate feedback on how to simplify this.

@SteveLasker
Copy link
Contributor

I can go either way on the annotations file.
The latest usecases around sorting the list of scan results, signatures is what prompted this change. If we want to say it's optional, that's good as well. But, I think registries will implement this pattern, and if we don't include it as an optional approach in the specs, "the proverbial API cat will be out of the bag", and we'll never get a standard.
Can we start with an optional standard?

@michaelb990
Copy link
Contributor Author

What do you mean by start with an optional standard?

The way the PR currently reads, the sorting is a MUST. I was pushing back against the need to support more than descending ordering and/or sorting by other attributes. I think we should not plan to add that support, as that begins to cause issues for a registry implemented as a static website/filesystem and requires additional indexes/fields to be lifted out of the manifest.

@michaelb990
Copy link
Contributor Author

Just thinking out loud here -- should we consider changing this to the time the actual artifact was pushed?

I worry about lifting an annotation out of the manifest and using it for sorting. Especially considering it's expected to be a time value. If someone includes this annotation, but doesn't conform to a certain time format, what is the expected behavior? Should we store the value of this annotation in our database as a string? Should we attempt to parse the time and convert it to something standard so that multiple time formats can be used? What if someone puts a version number in the time field to get the sorting functionality? Do we start rejecting manifests that contain this annotation but aren't conforming to the time RFC?

If we changed to the actual time of push, instead of lifting an annotation:

Pros

  • simpler to implement and understand
  • it would always exist for every artifact with the same format

Cons

  • ordering wouldn't be preserved when a client copies a list of artifacts
    • could it? what if a well-behaved client pushed the references for an image in order?
    • maybe ordering is less important for content copied between registries, not sure 🤔

@SteveLasker
Copy link
Contributor

SteveLasker commented Jan 15, 2022

should we consider changing this to the time the actual artifact was pushed?

That was the initial thought. However, since our goal is to enable content promotion, what does "push" mean? Is it the push to the first registry, and it's maintained as it's promoted to other registries? How does the 2nd registry know it's the second registry? (your cons list). We did discuss putting the dependency on clients to pull/push in the right order, but that just leaves a long list of holes that would surface, making the value and ordering non-trusted.

This is how we landed on sorting by the new annotation, as the client determines the created date, and the registry just reflects that. Similar to a file on your local disk. As a file is copied across folders, and different machines, the last modified date (pseudo for created date) is maintained.

If someone includes this annotation, but doesn't conform to a certain time format, what is the expected behavior?

This is an excellent question, and something we'll need to cope with as annotations become more generally used.
I'd suggest this is like any other non-typed language. When the value is parsed, if it cant be converted, it's effectively null. The value is maintained in the manifest, (as we can't change it), but any query on it would treat the value as null and not sort it when queried as a date. As registries start supporting meta-data querying and a user was troubleshooting and asked: org.cncf.oras.artifact.created=jan-02-2022 which is an invalid date format, they'd get the artifact back. As that was a string comparison. If they were sorting on date/desc we could either ignore the date, or sort it at the end of the list of valid dates.

I realize we're starting to deal with a larger set of issues, but this is the right time to start dealing with issues I think we recognize we need to solve. ie meta-data querying

BTW, I'm also ok if we don't require this in the spec. I do want to enable a registry to be able to implement the sorting pattern. and it be consistent when other registries decide to implement it.

I do not think we should let an unknown quantity (if any) static registries limit the rest of the dynamic registries from being to deliver on this pretty well defined need to sort scan results/updates/signatures/sboms in an order.

Copy link
Contributor

@SteveLasker SteveLasker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@michaelb990 michaelb990 merged commit 95bbea5 into oras-project:main Mar 10, 2022
@michaelb990 michaelb990 deleted the sorting-proposal branch March 10, 2022 22:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants