Correctly rounded floating point div_euclid
#134145
base: master
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @workingjubilee (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
At a surface level, I'm wondering if these would be better in libm so we get the fuller test suite against mpfr.
We already ship
Sorry, I deleted a comment you were replying to that asked "is it OK for". I'm happy to move this to
How does this relate to #134062?
That PR -- my draft PR -- explores how far one can go without doing a full softfloat implementation of this operation. On the one hand, that means it can be implemented directly on intrinsic floating point operations. On the other, it means that it has a restriction on its codomain (the maximum magnitude of the quotient that can be supported). This PR from @tczajka takes the softfloat implementation approach. This means that it does not have the codomain restriction, but it has to implement more of the operation in software.
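For context, the pre-existing implementation is built directly on the native float operations; roughly like this (a paraphrase from memory, not a quote of the std source):

```rust
// Roughly the pre-fix shape of f64::div_euclid (paraphrased): both the
// division and the remainder round, so for quotients too large to be
// represented exactly, `q` can land on the wrong value -- the bug this
// PR fixes.
fn div_euclid_old(a: f64, b: f64) -> f64 {
    let q = (a / b).trunc();
    if a % b < 0.0 {
        return if b > 0.0 { q - 1.0 } else { q + 1.0 };
    }
    q
}

fn main() {
    assert_eq!(div_euclid_old(7.0, 2.0), 3.0);
    assert_eq!(div_euclid_old(-7.0, 2.0), -4.0);
}
```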
It'd need libs-api acceptance, of course, but my feeling is that, whether with or following on to this fix, we should expose
I'm now doing some largish refactoring to clean this up a bit, and deal with the
`f{16,32,64,128}/soft.rs` are now very similar; f32 and f64 are almost identical. The `U256` helper has moved to `crate::u256`. `(-5f32).div_euclid(f32::INFINITY)` now returns -1 rather than 0.
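A quick check of that last behavior change, as described above:

```rust
fn main() {
    // Post-refactor semantics per the note above: a finite negative
    // dividend Euclidean-divided by +infinity yields -1, not 0, which
    // keeps the corresponding remainder non-negative.
    assert_eq!((-5f32).div_euclid(f32::INFINITY), -1.0);
}
```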
OK, done refactoring.
It's easy to add these using the functions implemented in this PR.
@tczajka: What thoughts do you have on how we could best later go about optimizing this? That is, we don't necessarily need to do that before merging this, but I'm curious about your thoughts on the avenues we might be able to pursue for optimization on various platforms. It might also be interesting if you have any preliminary benchmarking data for this PR. Perhaps it could be compared with other libraries capable of supporting this operation. It might also be compared with the approach in #134062 (that approach itself could be optimized further also). If the performance hit is significant, I wonder whether we should explore using the #134062 method as a fast path and then falling back to the softfloat approach for larger quotients (at the cost of making the operation for those larger quotients somewhat slower as some work would be wasted).
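As a purely illustrative sketch of what such a hybrid dispatch could look like (this is not the #134062 method; the fast path shown handles only the easy integer-valued case, and every name here is hypothetical):

```rust
// Hypothetical hybrid: handle a cheap, provably exact case with native
// integer arithmetic, and punt everything else to the soft-float path.
fn div_euclid_hybrid(a: f64, b: f64) -> f64 {
    const EXACT: f64 = 9007199254740992.0; // 2^53: all such ints are exact
    if a.fract() == 0.0 && b.fract() == 0.0 && b != 0.0
        && a.abs() < EXACT && b.abs() < EXACT
    {
        // Integer-valued operands: i64 Euclidean division is exact, and
        // since |b| >= 1 the quotient's magnitude is <= |a| < 2^53, so it
        // converts back to f64 without rounding.
        return (a as i64).div_euclid(b as i64) as f64;
    }
    div_euclid_soft(a, b)
}

// Stand-in for the correctly rounded soft-float implementation of this PR.
fn div_euclid_soft(a: f64, b: f64) -> f64 {
    a.div_euclid(b)
}

fn main() {
    assert_eq!(div_euclid_hybrid(7.0, 2.0), 3.0);
    assert_eq!(div_euclid_hybrid(-7.0, 2.0), -4.0);
}
```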
Force-pushed from 3bef9eb to 009cf4b.
I have added benchmarks.
Before the change (incorrectly rounded):
After the change (correctly rounded):
So the corrected implementation actually improves performance significantly in many cases, and there is no significant performance regression.
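For reference, a minimal harness of the sort one could use to reproduce numbers like these (hypothetical; not the PR's actual benchmark code):

```rust
#![feature(test)] // nightly-only benchmark harness
extern crate test;
use test::{black_box, Bencher};

#[bench]
fn bench_f64_div_euclid(b: &mut Bencher) {
    let xs: Vec<f64> = (1..1024).map(|i| i as f64 * 12345.6789).collect();
    b.iter(|| {
        for &x in &xs {
            // black_box keeps the division from being constant-folded away
            black_box(black_box(x).div_euclid(black_box(7.25)));
        }
    });
}
```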
The current (old) implementation is also really soft-float: the operation is not built into hardware. Additionally, it uses
My implementation is one integer division with some quick adjustments for correct rounding. This is why you see large performance gains.
One thing that has room for optimization is
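To illustrate the "one integer division" idea: a minimal sketch, assuming positive finite operands and an exponent gap small enough that the shifted mantissa fits in `u128` (the PR additionally handles signs, zeros, infinities, NaN, rounding of the quotient back to a float, and uses the wider `U256` helper for the general case):

```rust
/// Decompose a positive finite f64 into (mantissa, exp) with
/// value == mantissa * 2^exp and mantissa an integer.
fn decompose(x: f64) -> (u64, i32) {
    let bits = x.to_bits();
    let biased_exp = ((bits >> 52) & 0x7ff) as i32;
    let frac = bits & ((1u64 << 52) - 1);
    if biased_exp == 0 {
        (frac, -1074) // subnormal: no implicit leading bit
    } else {
        (frac | (1u64 << 52), biased_exp - 1075) // normal: add hidden bit
    }
}

/// floor(a / b) for positive finite a and b, via one integer division.
fn floor_div(a: f64, b: f64) -> u128 {
    let (ma, ea) = decompose(a);
    let (mb, eb) = decompose(b);
    let shift = ea - eb;
    if shift >= 0 {
        // a/b = (ma << shift) / mb; integer division gives the exact floor.
        ((ma as u128) << shift) / (mb as u128)
    } else if -shift <= 74 {
        (ma as u128) / ((mb as u128) << -shift)
    } else {
        0 // mantissas are < 2^53, so here a/b < 1 and the floor is 0
    }
}

fn main() {
    assert_eq!(floor_div(7.5, 2.0), 3);
    assert_eq!(floor_div(1e9, 3.0), 333_333_333);
}
```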
Thanks for writing up the careful work in this PR and putting it forward. I'm convinced this approach is the right one. (I've closed #134062, which was always just kind of an exploration of the possible, in favor of this.)
oh dear
The code for
Maybe (or maybe not) there's some decent way to factor a unified implementation over all four with generics. The behavior differences for
Probably I have in mind something like this: Playground link
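One possible shape for that kind of unification (a sketch under assumed names such as `SoftFloat`; not the playground code):

```rust
// Sketch: a private helper trait so the soft-float algorithm is written
// once. The trait and method names here are illustrative assumptions.
trait SoftFloat: Copy {
    type Bits;
    const SIGNIFICAND_BITS: u32;
    fn to_raw_bits(self) -> Self::Bits;
}

impl SoftFloat for f32 {
    type Bits = u32;
    const SIGNIFICAND_BITS: u32 = 23;
    fn to_raw_bits(self) -> u32 { self.to_bits() }
}

impl SoftFloat for f64 {
    type Bits = u64;
    const SIGNIFICAND_BITS: u32 = 52;
    fn to_raw_bits(self) -> u64 { self.to_bits() }
}

// The four per-type soft.rs files could then share one generic driver:
fn div_euclid_generic<F: SoftFloat>(_a: F, _b: F) -> F {
    todo!("decompose via F::to_raw_bits, divide, recompose")
}

fn main() {
    println!("{}", 1.0f64.to_raw_bits()); // 4607182418800017408
}
```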
I still think that a better place for more complex soft float routines is probably in
We don't currently have anything like this though. @Amanieu, would there be any concerns here?
☔ The latest upstream changes (presumably #135947) made this pull request unmergeable. Please resolve the merge conflicts.
Finally getting around to this. I have some documentation-related comments for a first pass, since with that concern out of the way the rest of this is much less, relatively speaking, to review.
@@ -235,10 +237,14 @@ impl f128 {
/// Calculates Euclidean division, the matching method for `rem_euclid`.
///
/// This computes the integer `n` such that
/// In infinite precision this computes the integer `n` such that
Usually I see this arranged as "This computes the operation on the integer `n` with infinite precision".
It's a nit, but I'd say the calculation is done "in infinite precision" rather than "with" it, just as I might say that a calculation is done "in a finite field".
Of course, if we want, this choice could be avoided by phrasing it differently. E.g., IEEE 754 tends to say things like "...computes the infinitely precise product x × y..." or that "every operation shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range".
Those are fine too. It's honestly at least partly that "In infinite precision" feels like an odd way to start the sentence. It places it next to the subject ("this", the function) when it should be attached either to the object or the verb.
/// However, due to a floating point round-off error the result can be rounded if
/// `self` is much larger in magnitude than `rhs`, such that `n` can't be represented
/// exactly.
This makes the "with infinite precision" sound incorrect. It is not a random "round-off error", the rounding is a consequence of fitting the result into the floating-point value's representation. It is questionable to call it an "error" at all, since it is a deliberate choice in how floats work. Please be clear when identifying such. We can afford a few words.
/// In infinite precision the return value `r` satisfies `0 <= r < rhs.abs()`.
///
/// However, due to a floating point round-off error the result can round to
same comments
///
/// # Precision
///
/// The result of this operation is guaranteed to be the rounded
/// infinite-precision result.
///
/// The result is precise when `self >= 0.0`.
Floating-point operations are always precise (in the sense that they always conform precisely to floating-point semantics), but they are not always accurate; see https://en.wikipedia.org/wiki/Accuracy_and_precision for more on the distinction I am drawing.
/// The result is precise when `self >= 0.0`.
/// The result is exactly accurate when `self >= 0.0`.
We could also say within 0 ULP, I suppose?
Probably I'd say the results of floating-point operations are always accurate to a given degree of precision.
We could side-step this and say e.g. that the "computed result is the same as the infinitely precise result when..." or "the infinitely precise result is exactly representable when...".
The "exactly representable" form sounds good.
The documentation-related comments I have on f128.rs apply here.
The documentation-related comments I have on f128.rs apply here.
The documentation-related comments I have on f128.rs apply here.
Another partial review, this time of the soft-float tests. Their current state points to a need for slightly more documentation and perhaps a slightly different organization of the code proper. The way things currently are, I do not quite agree with any assessment that sounds like "well, you only need to review one type's implementation", since I need to make sure that there isn't some important deviation.
    abs: Positive::Finite(PositiveFinite { exp: -16494, mantissa: 1 << 112 }),
});

assert_eq!(Representation::new(f128::MIN_POSITIVE / 2.0), Representation {
    sign: Sign::Positive,
    abs: Positive::Finite(PositiveFinite { exp: -16495, mantissa: 1 << 112 }),
This `1 << 112` is a meaningful magic number, and should be derivable from the existing constants (`SIGNIFICAND_BITS` and so on). Can we lift this into a `const MANTISSA_TOP_BIT` (or whatever) that computes it? If you want to also `assert_eq!` that const against `1 << 112` still, that's fine too.
assert_eq!(Representation::new(f128::MAX), Representation {
    sign: Sign::Positive,
    abs: Positive::Finite(PositiveFinite { exp: 16271, mantissa: (1 << 113) - 1 }),
Likewise, `const MANTISSA_MAX`.
    abs: Positive::Finite(PositiveFinite { exp: -149, mantissa: 0x800000 }),
});

assert_eq!(Representation::new(f32::MIN_POSITIVE / 2.0), Representation {
    sign: Sign::Positive,
    abs: Positive::Finite(PositiveFinite { exp: -150, mantissa: 0x800000 }),
});

assert_eq!(Representation::new(f32::MAX), Representation {
    sign: Sign::Positive,
    abs: Positive::Finite(PositiveFinite { exp: 104, mantissa: 0xffffff }),
This is a different way to write these mantissa values than is done elsewhere...?
}

/// Euclidean division.
#[allow(dead_code)]
Why this `allow`...?
assert_eq!(Representation::new(f64::MIN_POSITIVE), Representation {
    sign: Sign::Positive,
    abs: Positive::Finite(PositiveFinite { exp: -1074, mantissa: 1 << 52 }),
...this exponent is a fair bit away from `-1021`, so I am going to have to go over a number of things very carefully, and I am not immediately seeing an explanation of the encoding choice in this module. I'd like to see a comment somewhere (perhaps on the struct, or on the associated constants, or this test?) that explains why what I am seeing here is Actually Correct.
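For what it's worth, the value follows from the module's integer-mantissa convention (`number = 2^exp * mantissa` with `mantissa >= 2^SIGNIFICAND_BITS`, per the struct docs below): for f64, `MIN_POSITIVE = 2^-1022 = 2^52 * 2^-1074`, hence `exp: -1074` rather than anything near `-1021`. A sanity check:

```rust
fn main() {
    // f64::from_bits(1) is 2^-1074, the smallest positive subnormal.
    let two_pow_minus_1074 = f64::from_bits(1);
    // MIN_POSITIVE = 2^-1022 = (integer mantissa 2^52) * 2^-1074,
    // matching the test's { exp: -1074, mantissa: 1 << 52 }.
    assert_eq!(f64::MIN_POSITIVE, (1u64 << 52) as f64 * two_pow_minus_1074);
}
```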
/// Represents an `FP` number.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
struct Representation {
    /// Sign.
    sign: Sign,
    /// Absolute value.
    abs: Positive,
}

#[derive(Copy, Clone, Debug, Eq, PartialEq)]
enum Sign {
    Positive,
    Negative,
}

/// Represents a positive number.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
enum Positive {
    Zero,
    Finite(PositiveFinite),
    Infinity,
    NaN,
}

/// Represents a non-zero, positive finite number.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
struct PositiveFinite {
    /// `number = 2^exp * mantissa`.
    exp: Exponent,
    /// `2^SIGNIFICAND_BITS <= mantissa < 2^(SIGNIFICAND_BITS + 1)`
    mantissa: Bits,
}
This PR winds up repeating these types a number of times... can it just be made generic? If we need to have separate `impl` blocks for each concrete combination of params, so be it, but we could at least have a unified declaration in an adjacent `pub(crate) mod float;`.
@tczajka If you can make the soft-float impl as generic as possible given the current state of things then I think we can very quickly actually finish this out.
Pls can we consider
...apologies, just... @tgross35 I am curious to know why, to you, "adding float traits in yet another place" is a sign that this PR should be against another repo, instead of fixing the fact that we have "float traits" everywhere except in the core libraries? It cannot be "because it would be a miserable pile of bikeshedding". We're just inflicting a miserable pile of bikeshedding on everyone every time they have to reimpl their own pile of float traits. Skipping numeric traits in std made sense in 2015 but it doesn't really make the same sense in 2025.
It also does not actually seem to help us to have three different repositories one must consult to verify that a floating point implementation that we use is correct, either.
I don't disagree that it would be very nice to have the traits in
I'm not seeing this, got a link?
We do have a mechanism for using nightly features with a fallback. But we may not need to make these const for them to be const if they can become intrinsics (from some discussion with const-eval). I'm planning to use the
Yeah, it's not ideal. I would like to merge
I mean the mention of using a
Hm. I am not sure some of those tests should not be in libcore, but I do not expect us to add a probably-randomization-driven test suite to our already flaky CI, for reasons previously discussed, so if that is what is ideal then moving this over there is fine. Just as long as it's not about our 99 float traits lying around.
@tczajka any updates on this? Thanks!
Sorry, I've been busy with other things; I'm planning to get back to the review within ~2 weeks.
Fixes #107904.