Prefer requirements with upper bounds #13273

Open: notatallshaw wants to merge 6 commits into main from prefer-requirements-with-upper-bounds
Conversation

@notatallshaw (Member) commented Mar 8, 2025

Fixes: #12993
Fixes: #12990

This is the last PR related to get_preference I am making this release cycle. It provides significant performance improvements for some pathological resolutions, and I don't have any examples where it makes the resolution worse.

There are a number of intuitive ways to think about why this helps find a resolution faster. The one I like most is that operators which disallow versions after a certain point limit the possible versions to some finite set (ignoring that, technically, a package could simply never bump its major/minor version number), whereas other operators still allow unbounded future versions to be selected.
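As an illustrative sketch of that distinction (not pip's actual implementation; the function name and operator grouping are assumptions for the example), classifying which specifier operators cap versions from above might look like this:

```python
def has_upper_bound(specifiers: str) -> bool:
    """Return True if any comma-separated specifier clause rules out
    all sufficiently new versions of a package.

    Hypothetical helper for illustration only, not pip's code.
    """
    # <, <=, ==, ~= and === all exclude future versions;
    # >, >= and != leave the future unbounded.
    capping = {"<", "<=", "==", "~=", "==="}
    for clause in specifiers.split(","):
        clause = clause.strip()
        # Extract the leading operator characters of the clause.
        op = ""
        for ch in clause:
            if ch in "<>=!~":
                op += ch
            else:
                break
        if op in capping:
            return True
    return False

print(has_upper_bound(">=1.0,<2.0"))  # True
print(has_upper_bound(">=1.0"))       # False
```

A real implementation would parse specifiers with the `packaging` library rather than by hand; the point here is only which operators bound the version space from above.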

Checking against the scenarios in Pip-Resolution-Scenarios-and-Benchmarks, there are two problematic scenarios that go from failing to succeeding compared to the main branch:

Difference for scenario scenarios/problematic.toml - autogluon:
    	Success: False -> True
    	Failure Reason: Build Failure -> None
    	Distinct Sdists visited: 1 -> 2 (200.00%)
    	Distinct Wheels visited: 162 -> 259 (159.88%)
    	Total visited packages: 2398 -> 1254 (52.29%)
    	Total visited requirements: 23945 -> 13006 (54.32%)
    	Total rejected requirements: 12791 -> 4101 (32.06%)
    	Total pinned packages: 729 -> 536 (73.53%)
    	Total rounds: 1283 -> 768 (59.86%)

Difference for scenario scenarios/problematic.toml - sphinx:
    	Success: False -> True
    	Failure Reason: Resolution Too Deep -> None
    	Distinct Sdists visited: 1 -> 0 (0.00%)
    	Distinct Wheels visited: 124 -> 79 (63.71%)
    	Total visited packages: 97707 -> 80 (0.08%)
    	Total visited requirements: 225360 -> 225 (0.10%)
    	Total rejected requirements: 137840 -> 6 (0.00%)
    	Total pinned packages: 8113 -> 82 (1.01%)
    	Total rounds: 15005 -> 82 (0.55%)

Note, for the latter scenario I have the benchmark tool stop at ~15k rounds, whereas normally pip won't stop until 200k rounds.

@notatallshaw force-pushed the prefer-requirements-with-upper-bounds branch from 2d2ad3c to de28f96 on March 8, 2025 at 20:31
@notatallshaw notatallshaw added this to the 25.1 milestone Mar 8, 2025
@pfmoore (Member) commented Mar 8, 2025

My only concern with this is that it could result in people thinking that upper bounds are good, whereas it's generally acknowledged that they are a bad idea in practice. I can imagine people seeing this change and thinking that it means that if they add upper version caps to their requirements, they will get faster resolution times "magically".

While the performance numbers can't lie, I would be much happier if we could give a clear explanation of why and when this improves resolution times, in a way that doesn't make it appear that you'll always be better with an upper cap. In particular, the intuition you suggest makes no sense to me - upper version limits don't change an infinite (or even large) set of possibilities into a finite one, the set of possibilities is always finite.

I think this is more a matter of how we present the change than a problem with the actual logic, though. If I understand correctly, the real saving is due to the fact that, because pip always tries the latest version available first, requirements that guarantee that one or more of the newest versions will not work allow us to skip those newer versions sooner, so we should prefer them over requirements that don't. The key is that an upper cap is only useful to us if it actually excludes new versions, and therefore it's an explicit "the project is known not to work with recent versions of this dependency" rather than the pre-emptive "we've only tested with dependency versions up to X" cap that's called out as problematic in the article I linked.
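The ordering idea described above can be sketched as a sort key (illustrative only; the `(name, has_cap)` pairs and function are invented for the example and are not pip's actual get_preference):

```python
# Requirements known to exclude the newest versions are handled first,
# so guaranteed-incompatible new versions are skipped sooner.
def preference_key(requirement):
    name, has_cap = requirement
    # False sorts before True, so negate: capped requirements come first.
    return (not has_cap, name)

reqs = [("urllib3", False), ("requests", True), ("idna", False)]
print(sorted(reqs, key=preference_key))
# [('requests', True), ('idna', False), ('urllib3', False)]
```

In pip itself, get_preference returns a richer key covering several heuristics; the upper-bound signal discussed in this PR would be one component of that ordering.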

I don't know how we get that much nuance[1] in a release note, though. Maybe we'd be best simply not explaining in the news item - just saying something like "Improve the heuristics for determining the order for resolving dependencies", and adding a short explanatory note ("because this allows us to skip guaranteed-incompatible later versions sooner" or similar) to the docstring and documentation.

Footnotes

[1] About something that's frankly an obscure internal detail.

@notatallshaw (Member, Author)

@pfmoore I would say your whole comment is on point and I agree with your concern. In fact it's exactly because adding upper bound requirements makes resolution hard that it's useful to consider them before other requirements, so I definitely don't want to encourage a proliferation of them.

I will rethink the documentation, explaining more of the nuance, and the news items, simplifying it to not go into the nuance.

@notatallshaw (Member, Author) commented Mar 8, 2025

In particular, the intuition you suggest makes no sense to me - upper version limits don't change an infinite (or even large) set of possibilities into a finite one, the set of possibilities is always finite.

It's not really important, but the thought process goes like this:

Let's say versions are released in a strictly monotonically increasing fashion (not always true, but often true), e.g. 1.0, 1.1, 1.2, 1.3, ..., 1.99, and there is a requirement <1.100. Then that upper bound limits the number of possibilities to 100 versions for the rest of time, whereas an unbounded number of versions can be released after this, e.g. 1.100, ..., 1.10000, ..., 2.0, 2.1, etc.
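A quick numeric illustration of that intuition, using toy version tuples rather than real releases:

```python
# Toy illustration: with 150 1.x releases plus two 2.x releases already
# out, a <1.100 cap admits a fixed 100 versions no matter what is
# released later, while >=1.0 admits every future release as well.
released = [(1, minor) for minor in range(150)] + [(2, 0), (2, 1)]

capped = [v for v in released if v < (1, 100)]
uncapped = [v for v in released if v >= (1, 0)]

print(len(capped))    # 100: fixed forever by the upper bound
print(len(uncapped))  # 152: grows with every new release
```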

Another, related way of thinking about it is that upper bounds make statements about the future of a package, which usually is not known for sure, and can therefore be problematic; lower bounds say something about the past, which can be known, so they tend not to be problematic. Resolving problematic requirements first is usually better.

Your intuitive explanation is probably a better one, and I'll update the docs to reflect it.

@potiuk (Contributor) commented Mar 9, 2025

@pfmoore I would say your whole comment is on point and I agree with your concern. In fact it's exactly because adding upper bound requirements makes resolution hard that it's useful to consider them before other requirements, so I definitely don't want to encourage a proliferation of them.

100%. And thanks for the article link @pfmoore - I have not seen it before, but this is pure gold.

@notatallshaw (Member, Author)

I've updated docs and news entry based on feedback.

I'm limiting this PR to just touching the get_preference docs, but I do think it's worthwhile, in a separate PR, to take a look over the dependency resolution docs in general. I'm planning to raise another PR soon that makes ResolutionTooDeep a diagnostic error and, as part of that, points users to the resolution docs, so I will reread the docs and see what can be improved.

@notatallshaw force-pushed the prefer-requirements-with-upper-bounds branch from a5bc3d2 to 1e68d9c on March 9, 2025 at 20:57
@ichard26 (Member) left a comment

While I'm far from being knowledgeable on dependency resolution, I did spend some time reading the pip documentation on dependency resolution. This makes a lot more sense. LGTM (but this isn't a formal approval).
