Add new abstract methods for reporting source provenance #1997

Open
wants to merge 4 commits into
base: master
gtristan (Contributor) commented Mar 4, 2025

This is an initial implementation for the proposal up at: https://lists.apache.org/thread/q6gxjpld2vb1c9rqlsv24m12c087snc4

Some thoughts about this approach:

  • This prioritizes machine readability and standardization of source provenance and version information

    Sources have a lot of freedom in how they implement things, and so we may very well need to expand on the types and constants added here, such as SourceInfoMedium, SourceVersionType, etc.

    The idea here is to have greater certainty about how sources are obtained, even if this cannot be covered by all currently existing source implementations (e.g. I didn't initially add a bzr medium for which we have a plugin, or a cvs medium for which we do not yet have a plugin).

    Aspirationally, forcing this data to be precise can allow adjacent tooling to do useful things.

  • This drops the freeform "public data" mentioned in the proposal discussion

    My rationale for this choice in this branch is that ultimately we want data with a constant shape, and if, for example, we want the user to be able to override or assist a source in determining the reported "version", then the Source implementation already has everything it needs to do so:

    • it can add additional configuration keys for the user to configure
    • it has the power to implement collect_source_info() however it wants.
  • This does not yet attempt to cover the concept of tracking information

    I would like to consider this, but we should think carefully about how this can be useful. For instance, some git plugins have different interpretations of what their "tracking" strings mean, sometimes following a branch head, sometimes looking for the latest tag in history which matches a given regular expression.

    If we export this tracking information, it should probably be useful for external tooling to figure out how to do the tracking and come to the same conclusion, otherwise it is unclear what this is useful for.

  • This does not cover the CVE information

    While the SourceInfo objects representing a source's provenance form a list, I believe that the CVE information continues to be a per-element concept.

    For example, when we have applied security patches to a module, those security patches are themselves sources, whose provenance is that they are revisioned in the local project.
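
To make the "constant shape" point concrete, here is a minimal standalone sketch of what such data classes and a collect_source_info() implementation could look like. It is not the code in this branch: the enum members, the SourceInfo field names, the `reported_version` field and the ExampleGitSource class (including its `version` configuration key) are all assumptions made for illustration. Only the names SourceInfo, SourceInfoMedium, SourceVersionType and collect_source_info() come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class SourceInfoMedium(Enum):
    # Hypothetical members; the constants in the branch may differ.
    GIT = "git"
    REMOTE_FILE = "remote-file"
    LOCAL = "local"


class SourceVersionType(Enum):
    # Hypothetical members; the constants in the branch may differ.
    COMMIT = "commit"
    SHA256 = "sha256"


@dataclass(frozen=True)
class SourceInfo:
    # Constant shape: every source reports exactly these fields
    # (field names are assumptions).
    url: str
    medium: SourceInfoMedium
    version_type: SourceVersionType
    version: str
    reported_version: Optional[str] = None  # human-readable version, if known


class ExampleGitSource:
    """Stand-in for a Source plugin; not a BuildStream class."""

    def __init__(self, url: str, commit: str, version: Optional[str] = None):
        self.url = url
        self.commit = commit
        # Hypothetical extra configuration key that lets the user
        # assist the source in reporting a version.
        self.version = version

    def collect_source_info(self) -> List[SourceInfo]:
        # A list, because one source may report several provenance entries.
        return [
            SourceInfo(
                url=self.url,
                medium=SourceInfoMedium.GIT,
                version_type=SourceVersionType.COMMIT,
                version=self.commit,
                reported_version=self.version,
            )
        ]
```

In this sketch the user-supplied version travels in a fixed, typed field rather than in freeform public data, so external tooling can rely on the same fields for every source.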

gtristan added 4 commits March 4, 2025 20:03
This is an opaque public data structure passed through SourceFetcher.fetch() which, if provided, must be used when invoking Source.translate_url().
This comes with some data classes to describe source provenance and versioning information.
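
The pattern in the first commit description above (an opaque value handed to SourceFetcher.fetch() and forwarded to Source.translate_url()) could be sketched roughly as follows. These are toy stand-in classes, not BuildStream's own: the AliasOverride name, the `alias_override` parameter, and the idea that the value carries a mirror table are all assumptions made for illustration.

```python
from typing import Dict, Optional


class AliasOverride:
    """Hypothetical opaque value: plugins only pass it along."""

    def __init__(self, mirrors: Dict[str, str]):
        self._mirrors = mirrors


class ExampleSource:
    """Stand-in for a Source; not a BuildStream class."""

    ALIASES = {"upstream": "https://example.com/"}

    def translate_url(
        self, url: str, alias_override: Optional[AliasOverride] = None
    ) -> str:
        # Split "alias:path" and substitute the alias from the active table.
        alias, _, rest = url.partition(":")
        table = alias_override._mirrors if alias_override else self.ALIASES
        return table.get(alias, alias + ":") + rest


class ExampleFetcher:
    """Stand-in for a SourceFetcher; not a BuildStream class."""

    def __init__(self, source: ExampleSource, url: str):
        self.source = source
        self.url = url

    def fetch(self, alias_override: Optional[AliasOverride] = None) -> str:
        # If an override was provided, forward it unchanged.
        # A real fetcher would go on to download from the resolved URL.
        return self.source.translate_url(self.url, alias_override=alias_override)


fetcher = ExampleFetcher(ExampleSource(), "upstream:repo.git")
default_url = fetcher.fetch()
mirror_url = fetcher.fetch(AliasOverride({"upstream": "https://mirror.example.org/"}))
```

The fetcher never inspects the value it receives; it only forwards it, which is what keeps the structure opaque to plugin code.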