crates.io: Crate Deletions #3660

Merged (1 commit) on Aug 29, 2024

130 changes: 130 additions & 0 deletions text/3660-crates-io-crate-deletions.md
@@ -0,0 +1,130 @@
- Start Date: 2024-06-20
- RFC PR: [rust-lang/rfcs#3660](https://github.com/rust-lang/rfcs/pull/3660)

# Summary

This RFC proposes a mechanism for crate authors to delete their crates from crates.io under certain conditions.


# Motivation

There are a variety of reasons why a crate author might want to delete a crate or version from crates.io:

* You published something accidentally.
* You wanted to test crates.io.
* You published content you didn't intend to be public.
* You want to rename a crate. (The only way to rename a package is to re-publish it under a new name.)

The current [crates.io usage policy](https://crates.io/policies) says:

> Crate deletion by their owners is not possible to keep the registry as immutable as possible.

This restriction makes sense for the majority of crates that have been around for a while and are actively used, but the above list of reasons shows that there are valid use cases for allowing crate authors to delete their crates without having to contact the crates.io team.

To make this process easier for our users and to reduce the workload of the crates.io team dealing with such support requests, we propose to codify our current set of informal rules into a formal policy that allows crate authors to delete their crates themselves under certain conditions (see below).


# Proposal

We propose to allow crate authors to delete their **crates** from crates.io under the following conditions:

* The crate has been published for less than 72 hours,

Member

Does this mean that even if there is a crate that depends on this crate, we still allow users to delete it within 72 hours?

Member

This seems different from the npm policy. Not sure if this is going to lead us to the PyPI situation. (At least we have a time limit 😆)

Member Author

> Does this mean that even if there is a crate that depends on this crate, we still allow users to delete it within 72 hours?

yes

> This seems different from the npm policy.

I admit that I don't remember why I didn't include the restriction in this case 😅

I don't think it is a case that will come up particularly often though. If this is a new crate (< 72 hours) then it usually won't have the popularity yet to get a lot of reverse dependencies in that time frame. And if the popularity happens to exist because of the popularity of the author then that author will most likely be aware of the consequences of deleting a popular new crate.

As I mentioned before in other comments, we can always adjust the rules if we notice that something isn't quite working as intended :)

* or if all the following conditions are met:
    * The crate has a single owner,
    * The crate is not depended upon by any other crate on crates.io (i.e. it has no reverse dependencies),
    * The crate has been downloaded less than 100 times for each month it has been published.
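
The crate deletion conditions above can be sketched as a single predicate. This is only an illustration: the struct and its field names are hypothetical rather than the crates.io schema, a "month" is assumed to be 30 days, partial months are assumed to count as full months, and the download rule is read as a limit on total downloads.

```rust
/// Facts about a crate that the proposed rules look at.
/// Hypothetical field names, not the real crates.io schema.
pub struct CrateFacts {
    pub hours_since_publish: u64,
    pub owner_count: u32,
    pub reverse_dependency_count: u32,
    pub total_downloads: u64,
}

const GRACE_PERIOD_HOURS: u64 = 72;
const MAX_DOWNLOADS_PER_MONTH: u64 = 100;
/// Assumption: a "month" is 30 days.
const HOURS_PER_MONTH: u64 = 30 * 24;

/// Returns true if the owner may delete the crate under the proposed rules.
pub fn can_delete_crate(c: &CrateFacts) -> bool {
    // Rule 1: anything younger than 72 hours may be deleted.
    if c.hours_since_publish < GRACE_PERIOD_HOURS {
        return true;
    }
    // Assumption: a partially elapsed month counts as a full month.
    let months_published = c.hours_since_publish / HOURS_PER_MONTH + 1;
    // Rule 2: single owner, no reverse dependencies, low download count.
    c.owner_count == 1
        && c.reverse_dependency_count == 0
        && c.total_downloads < MAX_DOWNLOADS_PER_MONTH * months_published
}
```

Under these assumptions a crate published 1500 hours ago is in its third month, so it stays deletable while it has fewer than 300 total downloads, a single owner, and no reverse dependencies.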

We also propose to allow crate authors to delete **versions** of their crates from crates.io under the following conditions:

* The version has been published for less than 72 hours.
Comment on lines +37 to +39

Contributor

I feel like this will replace yank in a lot of situations and feel like more friction or guard rails are needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem worrisome to me. I think depending on a deleted crate would be very disruptive for those users. I'm uncertain of the exact behavior, but I think users of those crates will either get a confusing download error (if using Cargo.lock), or a confusing resolution error (if not using Cargo.lock).

I think it would be helpful to make sure the UI has loud and explicit warnings about the consequences of this.

Would it be possible to also warn owners of reverse dependencies when this happens?

Would it be possible to include a marker in the index similar to yanked so that cargo can provide a better error message?


> Would it be possible to also warn owners of reverse dependencies when this happens?

Well, there aren't any owners of reverse dependencies, because "there are no reverse dependencies" is a requirement for deletion in this RFC.

I do think it's a good idea to include a marker in the index to provide a better error message for people who depend on the deleted crate in things which are not published to crates.io.

Contributor

> Well, there aren't any owners of reverse dependencies, because "there are no reverse dependencies" is a requirement for deletion in this RFC.

I'm responding specifically to the version deletion, which does allow reverse dependencies. IIUC, the "no reverse dependencies" is for entire crate deletions only.

Member Author

> I think it would be helpful to make sure the UI has loud and explicit warnings about the consequences of this.

I agree. This is certainly not an action that should be taken lightly. If implemented as a button on the website it should have a confirmation step that highlights the consequences of this action. If implemented as a cargo plugin or in cargo itself a similar confirmation step will probably make sense.

> Would it be possible to also warn owners of reverse dependencies when this happens?

  • for crates that are deleted within 72 hours of being created: yes, that could be possible
  • for crates that are deleted after 72 hours: in this case deletion is not possible if reverse deps exist
  • for versions that are deleted: since most reverse deps are using implicit ^ version requirements the reverse dependency would fall back to the previous version, so I'm not sure if a warning is necessary in this case
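
The fallback described in the last bullet can be sketched as follows. This is a simplified model and not cargo's resolver: it only handles plain `major.minor.patch` versions, and the names `Version`, `caret_matches`, and `resolve` are made up for the illustration.

```rust
/// A plain `major.minor.patch` version (pre-release and build metadata ignored).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version(pub u64, pub u64, pub u64);

/// Returns true if `v` satisfies the caret requirement `^req`.
pub fn caret_matches(req: Version, v: Version) -> bool {
    if v < req {
        return false;
    }
    if req.0 > 0 {
        // ^1.2.3 allows >=1.2.3, <2.0.0
        v.0 == req.0
    } else if req.1 > 0 {
        // ^0.2.3 allows >=0.2.3, <0.3.0
        v.0 == 0 && v.1 == req.1
    } else {
        // ^0.0.3 allows only 0.0.3
        v == req
    }
}

/// Picks the highest available version that satisfies `^req`, if any.
pub fn resolve(req: Version, available: &[Version]) -> Option<Version> {
    available
        .iter()
        .copied()
        .filter(|v| caret_matches(req, *v))
        .max()
}
```

With `1.2.3` and `1.2.4` published, a requirement of `^1.2.0` resolves to `1.2.4`; once `1.2.4` is deleted, the same requirement resolves to `1.2.3`. A requirement of `^1.2.4` has no match left after the deletion.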

> Would it be possible to include a marker in the index similar to yanked so that cargo can provide a better error message?

Generally, yes. When I thought about this question yesterday I thought a `{ "version": "1.2.3", "deleted": true }` would be sufficient, but I have a feeling that older cargo versions won't like such an index record particularly much.

Member Author

hmm, so if I understand correctly a `v: 3` with `deleted: true` isn't really helping if old cargo would essentially ignore the `v` and `deleted` fields 🤔

I assume if we only included the `v`, `version` and `deleted` fields then old cargo would consider the whole index file invalid? or would it only consider these records invalid?

anyway, I'm open to adjusting the index to add the deleted information in some way if it helps cargo to display better errors and if we can figure out a way to make it work reasonably well with older cargo versions :)

Contributor

> I assume if we only included the `v`, `version` and `deleted` fields then old cargo would consider the whole index file invalid? or would it only consider these records invalid?

No, cargo parses line-by-line, so it would just skip over deleted versions.
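
To illustrate the line-by-line behaviour, here is a minimal sketch and not cargo's actual parser. It assumes, purely for illustration, that an older reader requires string fields named `vers` and `cksum` in every record and drops any line that lacks them. The helper `string_field` is a deliberately naive stand-in for a real JSON parser.

```rust
/// Extracts the string value of `key` from a single-line JSON object.
/// Naive on purpose: no escape handling, no nesting; illustration only.
fn string_field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let marker = format!("\"{key}\"");
    let start = line.find(marker.as_str())? + marker.len();
    let after_colon = line[start..].trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    Some(&value[..value.find('"')?])
}

/// The fields our hypothetical older reader insists on.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub vers: String,
    pub cksum: String,
}

/// Parses one index line; returns `None` if a required field is missing.
fn parse_line(line: &str) -> Option<Record> {
    Some(Record {
        vers: string_field(line, "vers")?.to_string(),
        cksum: string_field(line, "cksum")?.to_string(),
    })
}

/// Parses an index file line by line, skipping lines that do not parse
/// instead of rejecting the whole file.
pub fn parse_index(contents: &str) -> Vec<Record> {
    contents.lines().filter_map(parse_line).collect()
}
```

In this model a deletion marker such as `{"version":"1.2.3","deleted":true}` is dropped by the older reader, while the surrounding records are still returned.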

I'm not personally too concerned about older cargos. I'm more concerned about current cargo's ability to display a reasonable message.

For the record, I still think this 72-hour policy is dangerous and am concerned about the consequences.

Contributor

@Turbo87 I noticed that this conversation was marked as resolved, but I didn't see a response or corresponding update to the RFC to resolve it. Would it be possible to at least add the index considerations to the unresolved section?

Member Author

> Would it be possible to at least add the index considerations to the unresolved section?

sure, I'll add that :)

Contributor

  1. This was marked resolved without any commits being pushed
  2. That still leaves "this is too lax". Ideally this risk would be covered in the Drawbacks ("if we make the interface for deletion too easy and the metrics too light, people might over-rely on it rather than yanking") or at minimum left Unresolved. I would very much want other crates.io reviewers to see this and weigh this when approving the RFC.

Member

@joshtriplett Jun 20, 2024

Suggested change
* The version has been published for less than 72 hours.
* The version has been published for less than 72 hours.
* There are no published crates that depend on the version (with a version requirement not satisfied by any other published version).

Otherwise, this may produce the result that people are hesitant to count on new versions even of well-established crates.

Member

I don't know that only looking at published version requirements is enough for version deletion. A new crate with no reverse dependencies is unlikely to be picked up and used from git repositories within its first 72 hours. A new version of a widely used crate is likely to have dozens of repos updating to it within a day via dependabot or similar mechanisms.

Member Author

I'm not sure about the dependencies requirement. I see two scenarios:

  • crate A is used as a dependency of another library crate B published on crates.io: in this case the dependency declaration doesn't need to be bumped to always match the latest version of crate B since the implicit ^ version requirement makes cargo always choose the latest available version and it makes it possible for downstream projects to choose which version to use (e.g. to not yet upgrade to https://github.com/servo/rust-url/releases/tag/v2.5.1 as a recent example 😅).

  • crate A is used as a dev-dependency of another library, or as a proper dependency in some application: in this case you usually want to track the latest version and potentially even pin that version if you're using services like renovatebot. Such services can already deal with deletions on registries like npm, so I would assume they could also handle it for crates.io. Worst case: you try to compile a project and cargo greets you with a "can not find matching version" error. I expect these cases to be pretty rare though, and I assume in most cases the crate author is likely to publish a fixed version afterwards anyway.

It is also worth keeping the technical complexity of such a check in mind. We would have to iterate through all dependency records in our database for this crate and run them against all of the versions of the crate. I'm not sure how viable that would be.
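
A rough sketch of what such a check involves, assuming only exact (`=`) and caret (`^`) requirements and plain `major.minor.patch` versions. The types and the function name are hypothetical. The nested loop is the cost in question: every dependency record is tested against every published version.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version(pub u64, pub u64, pub u64);

/// Two requirement kinds are enough for this sketch.
#[derive(Debug, Clone, Copy)]
pub enum Req {
    Exact(Version),
    Caret(Version),
}

impl Req {
    pub fn matches(&self, v: Version) -> bool {
        match *self {
            Req::Exact(r) => v == r,
            Req::Caret(r) => {
                let same_series = if r.0 > 0 {
                    v.0 == r.0
                } else if r.1 > 0 {
                    v.0 == 0 && v.1 == r.1
                } else {
                    v == r
                };
                v >= r && same_series
            }
        }
    }
}

/// Returns true if deleting `target` would leave at least one requirement
/// with no other published version that satisfies it.
pub fn deletion_strands_a_dependent(
    target: Version,
    published: &[Version],
    reverse_dep_reqs: &[Req],
) -> bool {
    reverse_dep_reqs.iter().any(|req| {
        req.matches(target)
            && !published.iter().any(|v| *v != target && req.matches(*v))
    })
}
```

With `1.2.3` and `1.2.4` published, deleting `1.2.4` is harmless for a dependent requiring `^1.2.0`, but strands dependents that require `=1.2.4` or `^1.2.4`.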

I'm open to a downloads requirement though, but as @Nemo157 pointed out, services like renovatebot will likely cause a large number of downloads immediately after release for the more popular packages, which will prevent them from ever deleting a version.

Contributor

> the dependency declaration doesn't need to be bumped to always match the latest version of crate B since the implicit ^ version requirement makes cargo always choose the latest available version and it makes it possible for downstream projects to choose which version to use

I wouldn't agree with this assumption. Some projects immediately bump the dependency declaration. The reason for doing that is that without testing minimal versions, there's no way to know if you are depending on something that was updated in a newer version. The only safe path is to also update Cargo.toml to the latest version.

Contributor

In a recent Cargo team meeting, we were discussing whether we should abandon `-Zminimal-versions` and instead be more proactively moving version requirements up, which would hit this case more often.

See https://internals.rust-lang.org/t/zminimal-versions-cargo-update-and-cargo-upgrade/21335

Member

In my opinion, if we start with a more strict policy, it will be easier to relax it later. However, if we just publish this policy, it might be difficult to add additional requirements later on.

Suppose I depended on that exact version (`= 1.2.3`) and released my crate shortly afterwards. What should I do after this particular version of my dependency is deleted, and what happens when cargo resolves my dependency? What happens to packages that depend on my crate? If this happens, do I need to remove my crate as well?

Member Author

> what should I do after this particular version of my dependency crate is deleted

if you're the author of a crate depending on a deleted version then you should probably widen the dependency to a previous version of that crate and release a new version with that widened dependency declaration.

> do I need to remove my crate as well?

I don't think so. Removing/Deleting crates should be reserved for critical situations like the detection of malicious code or if you unintentionally published something that wasn't intended to be public. You can publish a new version of your crate and if you want you can yank the version of your crate that depends on the deleted dependency version.

> what happens when cargo resolves my dependency?

as far as I remember cargo will show an error that it can't find a matching version for the dependency

Contributor

> if you're the author of a crate depending on a deleted version then you should probably widen the dependency to a previous version of that crate and release a new version with that widened dependency declaration.

This comes across as a pretty glib stance towards breaking people and requiring ecosystem churn because of it. "Re-release" is not always a trivial answer.

> I don't think so. Removing/Deleting crates should be reserved for critical situations like the detection of malicious code or if you unintentionally published something that wasn't intended to be public.

That implies that you are wanting a specific set of behaviors, or culture, around removing packages but this RFC does nothing to establish this culture. For ideas on how to instill such a culture, see #3660 (comment)


These crate owner actions will be enabled by two new API endpoints:

- `DELETE /api/v1/crates/:crate_id` to delete a crate
- `DELETE /api/v1/crates/:crate_id/:version` to delete a version
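
As a small illustration, a client could build the request paths for these two routes like this. The `DeleteTarget` type is hypothetical, and the sketch only constructs the path; sending the authenticated `DELETE` request is left out.

```rust
/// What the owner wants to delete. Hypothetical client-side type.
pub enum DeleteTarget<'a> {
    Crate { name: &'a str },
    Version { name: &'a str, version: &'a str },
}

impl DeleteTarget<'_> {
    /// Path for the `DELETE` request, following the two routes listed above.
    pub fn path(&self) -> String {
        match self {
            DeleteTarget::Crate { name } => format!("/api/v1/crates/{name}"),
            DeleteTarget::Version { name, version } => {
                format!("/api/v1/crates/{name}/{version}")
            }
        }
    }
}
```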


# Drawbacks

> Why should we *not* do this?

The main drawback of this proposal is that it makes the crates.io registry less immutable.
This could lead to confusion if a crate is deleted that is depended on by other projects that are not published on crates.io themselves.
However, we believe that the conditions we propose are strict enough to prevent this from happening in practice due to the additional download threshold.

Another potential drawback is that it can create confusion on when it would be better to yank a version instead of deleting it.
We plan to address this by adding a note to the usage policy that explains the difference between yanking and deleting a version, and when to use which action based on the list in the [Motivation](#motivation) section above.


# Rationale and alternatives

> Why is this design the best in the space of possible designs?

The proposed design is based on the current informal rules that the crates.io team uses to decide whether to delete a crate or version.
These rules have been derived from the npm registry, which has a similar policy (see below).
We believe that the proposed conditions are strict enough to prevent accidental deletions while still allowing crate authors to delete their crates in the cases where it makes sense.

> What other designs have been considered and what is the rationale for not choosing them?

We considered not having restrictions on the number of reverse dependencies, but since that would leave the package index in an inconsistent state, we decided to require that the crate has no reverse dependencies.
Situations like the [`everything` package on npm](https://uncenter.dev/posts/npm-install-everything/) require manual intervention anyway, so we decided to keep the restrictions strict.

> What is the impact of not doing this?

If we don't implement this proposal, we will continue to rely on the crates.io team to handle these requests manually, which is time-consuming and error-prone.

# Prior art

## npm

The main inspiration for this proposal comes from the npm registry, which has a similar policy for deleting packages and versions:

- https://docs.npmjs.com/policies/unpublish
- https://docs.npmjs.com/unpublishing-packages-from-the-registry

The npm registry started with a more permissive policy, but had to tighten it over time.
It started out with a policy that allowed package owners to delete their packages at any time, but this led to a number of issues, [such as packages being deleted that were depended on by other packages](https://en.wikipedia.org/wiki/Npm_left-pad_incident).
Their policy was later changed to require that packages can only be deleted within 72 hours of being published, and then [changed again in January 2020](https://blog.npmjs.org/post/190553543620/changes-to-npmunpublish-policy-january-2020) to allow deletions outside the 72-hour window under certain conditions.


## PyPI

The Python Package Index (PyPI) still allows package owners to delete their packages (or a subset of released files) at any time.
A member of the PyPI team has proposed to [stop allowing deleting things from PyPI](https://discuss.python.org/t/stop-allowing-deleting-things-from-pypi/17227) due to the same issues that the npm registry faced. The most current proposed ruleset can be found [here](https://discuss.python.org/t/stop-allowing-deleting-things-from-pypi/17227/71).

Their proposal is also inspired by the npm registry policy, but notably does not include a reverse dependency restriction. It seems that PyPI might not currently be tracking dependencies between packages, which would make it harder for them to implement such a restriction.

## Others

<https://discuss.python.org/t/stop-allowing-deleting-things-from-pypi/17227/59> contains a list of other package registries and their deletion policies.


# Unresolved questions

## Should names of deleted crates be blocked so that they can't be re-used?

The reason for this would be to prevent someone else from re-publishing a crate with the same name, which could lead to potential security issues.
Due to the restrictions on the number of downloads and reverse dependencies, this seems like a low risk though.
The advantage of allowing others to re-use such names is that it allows name-squatted/placeholder crates to be released back to the community without the crates.io team having to manually intervene.

Member

Are we expecting to at least clearly flag in the UI or index that the crate existed under some other author / version series? I could see this being very confusing when looking at e.g. crater results or other cases where we try to pull old crates.

Contributor

A marker in the index is one of several topics discussed at #3660 (comment)

Member

Right, I saw that, but I guess I interpreted that as mostly being when the version isn't reused - it's not obvious (lacking dates?) that we have a mechanism for identifying a crate after this RFC other than by hash of its contents, since the same name and version can refer to ~infinitely many different hashes afaict.


The npm registry blocks re-use of deleted package names for 24 hours.


## Should deleted versions be blocked from being re-uploaded?

Since version deletions would also be possible for widely used crates, it might make sense to block re-uploads of deleted versions to prevent security issues.
However, this would make it impossible to fix a mistakenly published new major version, for example.

Member

I'm not sure what this impossibility is referencing? If you publish a 2.0.0, realize you left an API token or something in it and delete it, then you can still publish a 2.0.1 as a new major version. Blocking a single version number indefinitely seems fine to me as there's ~infinite mostly equivalent numbers to use instead (the one case where there isn't is if you were publishing with gaps in your patch numbers and going back to fill in a missing patch number, but I've never seen that done with semver).

Member

I agree. And there's a lot of advantage in saying "this old version might stop existing, but it'll never show up again with a different hash". (Also keep in mind that people might have cached local builds with the old version.)

Member

one place where i could see that running into problems is where you're trying to track exact versions of some 3rd-party library, where if they publish a particular version, you'd want to publish the exact same version number even if you mistakenly published something with that version number before

Member Author

@Nemo157 I personally agree, but I've seen enough cases of "the marketing team" pushing for "the big 2.0.0 release" and if that version number has already been burned it becomes a problem. But that's why it's in the "Unresolved questions" section, I'm open to input on what the community prefers. I tend to follow what other successful projects/communities/registries are doing unless there are good reasons against it, and since npm blocks deleted version numbers I would personally vote for that option too.


The npm registry blocks re-uploads of deleted versions indefinitely.
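
If crates.io chose the same behaviour, one possible model is a per-crate set of "tombstones" that is consulted on every publish. The sketch below is hypothetical and does not describe how crates.io or npm store this information.

```rust
use std::collections::HashSet;

/// Tracks which version numbers of one crate are live and which are burned.
#[derive(Default)]
pub struct VersionLedger {
    published: HashSet<String>,
    deleted: HashSet<String>,
}

#[derive(Debug, PartialEq)]
pub enum PublishError {
    AlreadyPublished,
    PreviouslyDeleted,
}

impl VersionLedger {
    /// Publishes `version` unless it is live already or was deleted before.
    pub fn publish(&mut self, version: &str) -> Result<(), PublishError> {
        if self.deleted.contains(version) {
            return Err(PublishError::PreviouslyDeleted);
        }
        if !self.published.insert(version.to_string()) {
            return Err(PublishError::AlreadyPublished);
        }
        Ok(())
    }

    /// Deletes a live version and records a tombstone for its number.
    /// Returns false if the version was not live.
    pub fn delete(&mut self, version: &str) -> bool {
        if self.published.remove(version) {
            self.deleted.insert(version.to_string());
            true
        } else {
            false
        }
    }
}
```

In this model, publishing `2.0.0`, deleting it, and publishing `2.0.0` again fails with `PreviouslyDeleted`, while `2.0.1` can still be published.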


## Should we keep and mark deleted versions in the index?

The cargo team has expressed interest in potentially keeping deleted versions in the index and marking them as deleted, so that this information can be used to improve dependency resolution messages. Whether this can be accomplished without breaking older cargo versions that expect a certain index format still needs to be researched. It might be possible to only add these markers to the sparse index, which is only used by newer cargo versions.


# Future possibilities

It is conceivable that the restrictions could be adjusted in the future if the crates.io team finds that the proposed restrictions are too strict or too lenient. For example, the download threshold could be adjusted based on how well the proposed ruleset works in practice.

Once the backend of crates.io has been updated to support this feature, we could also consider adding a web interface for crate owners to delete their crates and versions directly from the crates.io website. Similarly, we could add a subcommand to the `cargo` CLI, either implemented as a plugin or as part of the main `cargo` codebase.