Correctly rounded floating point div_euclid
#134145
base: master
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @workingjubilee (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
At a surface level, I'm wondering if these would be better in libm so we get the fuller test suite against mpfr.
We already ship
Sorry, I deleted a comment you were replying to that asked "is it OK for". I'm happy to move this to
How does this relate to #134062?
That PR -- my draft PR -- explores how far one can go without doing a full softfloat implementation of this operation. On the one hand, that means it can be implemented directly on intrinsic floating point operations. On the other, it means that it has a restriction on its codomain (the maximum magnitude of the quotient that can be supported). This PR from @tczajka takes the softfloat implementation approach. This means that it does not have the codomain restriction, but it has to implement more of the operation in software.
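For context, the pre-existing implementation is built directly on the native float operations; roughly like this (a paraphrase from memory, not a quote of the std source):

```rust
// Roughly the pre-fix shape of f64::div_euclid (paraphrased): both the
// division and the remainder round, so for quotients too large to be
// represented exactly, `q` can land on the wrong value -- the bug this
// PR fixes.
fn div_euclid_old(a: f64, b: f64) -> f64 {
    let q = (a / b).trunc();
    if a % b < 0.0 {
        return if b > 0.0 { q - 1.0 } else { q + 1.0 };
    }
    q
}

fn main() {
    assert_eq!(div_euclid_old(7.0, 2.0), 3.0);
    assert_eq!(div_euclid_old(-7.0, 2.0), -4.0);
}
```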
It'd need libs-api acceptance, of course, but my feeling is that, whether with or following on to this fix, we should expose
I'm now doing some largish refactoring to clean this up a bit, and deal with the
`f{16,32,64,128}/soft.rs` are now very similar; f32 and f64 are almost identical. The `U256` helper has moved to `crate::u256`. `(-5f32).div_euclid(f32::INFINITY)` now returns -1 rather than 0.
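A quick check of that last behavior change, as described above:

```rust
fn main() {
    // Post-refactor semantics per the note above: a finite negative
    // dividend Euclidean-divided by +infinity yields -1, not 0, which
    // keeps the corresponding remainder non-negative.
    assert_eq!((-5f32).div_euclid(f32::INFINITY), -1.0);
}
```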
OK, done refactoring.
It's easy to add these using the functions implemented in this PR.
@tczajka: What thoughts do you have on how we could best later go about optimizing this? That is, we don't necessarily need to do that before merging this, but I'm curious about your thoughts on the avenues we might be able to pursue for optimization on various platforms. It might also be interesting if you have any preliminary benchmarking data for this PR. Perhaps it could be compared with other libraries capable of supporting this operation. It might also be compared with the approach in #134062 (that approach itself could be optimized further also). If the performance hit is significant, I wonder whether we should explore using the #134062 method as a fast path and then falling back to the softfloat approach for larger quotients (at the cost of making the operation for those larger quotients somewhat slower as some work would be wasted).
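As a purely illustrative sketch of what such a hybrid dispatch could look like (this is not the #134062 method; the fast path shown handles only the easy integer-valued case, and every name here is hypothetical):

```rust
// Hypothetical hybrid: handle a cheap, provably exact case with native
// integer arithmetic, and punt everything else to the soft-float path.
fn div_euclid_hybrid(a: f64, b: f64) -> f64 {
    const EXACT: f64 = 9007199254740992.0; // 2^53: all such ints are exact
    if a.fract() == 0.0 && b.fract() == 0.0 && b != 0.0
        && a.abs() < EXACT && b.abs() < EXACT
    {
        // Integer-valued operands: i64 Euclidean division is exact, and
        // since |b| >= 1 the quotient's magnitude is <= |a| < 2^53, so it
        // converts back to f64 without rounding.
        return (a as i64).div_euclid(b as i64) as f64;
    }
    div_euclid_soft(a, b)
}

// Stand-in for the correctly rounded soft-float implementation of this PR.
fn div_euclid_soft(a: f64, b: f64) -> f64 {
    a.div_euclid(b)
}

fn main() {
    assert_eq!(div_euclid_hybrid(7.0, 2.0), 3.0);
    assert_eq!(div_euclid_hybrid(-7.0, 2.0), -4.0);
}
```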
Force-pushed from 3bef9eb to 009cf4b.
I have added benchmarks.
Before the change (incorrectly rounded):
After the change (correctly rounded):
So the corrected implementation actually improves performance significantly in many cases, and there is no significant performance regression.
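For reference, a minimal harness of the sort one could use to reproduce numbers like these (hypothetical; not the PR's actual benchmark code):

```rust
#![feature(test)] // nightly-only benchmark harness
extern crate test;
use test::{black_box, Bencher};

#[bench]
fn bench_f64_div_euclid(b: &mut Bencher) {
    let xs: Vec<f64> = (1..1024).map(|i| i as f64 * 12345.6789).collect();
    b.iter(|| {
        for &x in &xs {
            // black_box keeps the division from being constant-folded away
            black_box(black_box(x).div_euclid(black_box(7.25)));
        }
    });
}
```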
The current (old) implementation is also really soft-float: the operation is not built into hardware. Additionally, it uses
My implementation is one integer division with some quick adjustments for correct rounding. This is why you see large performance gains.
One thing that has room for optimization is
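To illustrate the "one integer division" idea: a minimal sketch, assuming positive finite operands and an exponent gap small enough that the shifted mantissa fits in `u128` (the PR additionally handles signs, zeros, infinities, NaN, rounding of the quotient back to a float, and uses the wider `U256` helper for the general case):

```rust
/// Decompose a positive finite f64 into (mantissa, exp) with
/// value == mantissa * 2^exp and mantissa an integer.
fn decompose(x: f64) -> (u64, i32) {
    let bits = x.to_bits();
    let biased_exp = ((bits >> 52) & 0x7ff) as i32;
    let frac = bits & ((1u64 << 52) - 1);
    if biased_exp == 0 {
        (frac, -1074) // subnormal: no implicit leading bit
    } else {
        (frac | (1u64 << 52), biased_exp - 1075) // normal: add hidden bit
    }
}

/// floor(a / b) for positive finite a and b, via one integer division.
fn floor_div(a: f64, b: f64) -> u128 {
    let (ma, ea) = decompose(a);
    let (mb, eb) = decompose(b);
    let shift = ea - eb;
    if shift >= 0 {
        // a/b = (ma << shift) / mb; integer division gives the exact floor.
        ((ma as u128) << shift) / (mb as u128)
    } else if -shift <= 74 {
        (ma as u128) / ((mb as u128) << -shift)
    } else {
        0 // mantissas are < 2^53, so here a/b < 1 and the floor is 0
    }
}

fn main() {
    assert_eq!(floor_div(7.5, 2.0), 3);
    assert_eq!(floor_div(1e9, 3.0), 333_333_333);
}
```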
Thanks for writing up the careful work in this PR and putting it forward. I'm convinced this approach is the right one. (I've closed #134062, which was always just kind of an exploration of the possible, in favor of this.)
oh dear
The code for
Maybe (or maybe not) there's some decent way to factor a unified implementation over all four with generics. The behavior differences for
Probably I have in mind something like this: Playground link
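One possible shape for that kind of unification (a sketch under assumed names such as `SoftFloat`; not the playground code):

```rust
// Sketch: a private helper trait so the soft-float algorithm is written
// once. The trait and method names here are illustrative assumptions.
trait SoftFloat: Copy {
    type Bits;
    const SIGNIFICAND_BITS: u32;
    fn to_raw_bits(self) -> Self::Bits;
}

impl SoftFloat for f32 {
    type Bits = u32;
    const SIGNIFICAND_BITS: u32 = 23;
    fn to_raw_bits(self) -> u32 { self.to_bits() }
}

impl SoftFloat for f64 {
    type Bits = u64;
    const SIGNIFICAND_BITS: u32 = 52;
    fn to_raw_bits(self) -> u64 { self.to_bits() }
}

// The four per-type soft.rs files could then share one generic driver:
fn div_euclid_generic<F: SoftFloat>(_a: F, _b: F) -> F {
    todo!("decompose via F::to_raw_bits, divide, recompose")
}

fn main() {
    println!("{}", 1.0f64.to_raw_bits()); // 4607182418800017408
}
```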
I still think that a better place for more complex soft float routines is probably in
We don't currently have anything like this though. @Amanieu, would there be any concerns here?
☔ The latest upstream changes (presumably #135947) made this pull request unmergeable. Please resolve the merge conflicts.
Finally getting around to this. I have some documentation-related comments for a first pass, since with that concern out of the way the rest of this is much less, relatively speaking, to review.
@@ -235,10 +237,14 @@ impl f128 {
/// Calculates Euclidean division, the matching method for `rem_euclid`.
///
/// This computes the integer `n` such that
/// In infinite precision this computes the integer `n` such that
Usually I see this arranged as "This computes the operation on the integer `n` with infinite precision".
It's a nit, but I'd say the calculation is done "in infinite precision" rather than "with" it, just as I might say that a calculation is done "in a finite field".
Of course, if we want, this choice could be avoided by phrasing it differently. E.g., IEEE 754 tends to say things like "...computes the infinitely precise product x × y..." or that "every operation shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range".
Those are fine too. It's honestly at least partly that "In infinite precision" feels like an odd way to start the sentence. It places it next to the subject ("this", the function) when it should be attached either to the object or the verb.
/// However, due to a floating point round-off error the result can be rounded if
/// `self` is much larger in magnitude than `rhs`, such that `n` can't be represented
/// exactly.
This makes the "with infinite precision" sound incorrect. It is not a random "round-off error", the rounding is a consequence of fitting the result into the floating-point value's representation. It is questionable to call it an "error" at all, since it is a deliberate choice in how floats work. Please be clear when identifying such. We can afford a few words.
/// In infinite precision the return value `r` satisfies `0 <= r < rhs.abs()`.
///
/// However, due to a floating point round-off error the result can round to
same comments
///
/// # Precision
///
/// The result of this operation is guaranteed to be the rounded
/// infinite-precision result.
///
/// The result is precise when `self >= 0.0`.
Floating-point operations are always precise (in the sense that they always conform precisely to floating-point semantics), but they are not always accurate; see https://en.wikipedia.org/wiki/Accuracy_and_precision for more on the distinction I am drawing.
/// The result is precise when `self >= 0.0`.
/// The result is exactly accurate when `self >= 0.0`.
We could also say within 0 ULP, I suppose?
Probably I'd say the results of floating-point operations are always accurate to a given degree of precision.
We could side-step this and say e.g. that the "computed result is the same as the infinitely precise result when..." or "the infinitely precise result is exactly representable when...".
The "exactly representable" form sounds good.
The documentation-related comments I have on f128.rs apply here.
The documentation-related comments I have on f128.rs apply here.
The documentation-related comments I have on f128.rs apply here.
Another partial review, this time of the soft-float tests. Their current state points to a need for slightly more documentation and perhaps a slightly different organization of the code proper. The way things currently are, I do not quite agree with any assessment that sounds like "well, you only need to review one type's implementation", since I need to make sure that there isn't some important deviation.
    abs: Positive::Finite(PositiveFinite { exp: -16494, mantissa: 1 << 112 }),
});

assert_eq!(Representation::new(f128::MIN_POSITIVE / 2.0), Representation {
    sign: Sign::Positive,
    abs: Positive::Finite(PositiveFinite { exp: -16495, mantissa: 1 << 112 }),
This `1 << 112` is a meaningful magic number, and should be derivable from the existing constants (`SIGNIFICAND_BITS` and so on). Can we lift this into a `const MANTISSA_TOP_BIT` (or whatever) that computes it? If you want to also `assert_eq!` that const against `1 << 112` still, that's fine too.
assert_eq!(Representation::new(f128::MAX), Representation {
    sign: Sign::Positive,
    abs: Positive::Finite(PositiveFinite { exp: 16271, mantissa: (1 << 113) - 1 }),
Likewise, `const MANTISSA_MAX`.
    abs: Positive::Finite(PositiveFinite { exp: -149, mantissa: 0x800000 }),
});

assert_eq!(Representation::new(f32::MIN_POSITIVE / 2.0), Representation {
    sign: Sign::Positive,
    abs: Positive::Finite(PositiveFinite { exp: -150, mantissa: 0x800000 }),
});

assert_eq!(Representation::new(f32::MAX), Representation {
    sign: Sign::Positive,
    abs: Positive::Finite(PositiveFinite { exp: 104, mantissa: 0xffffff }),
This is a different way to write these mantissa values than is done elsewhere...?
}

/// Euclidean division.
#[allow(dead_code)]
Why this `allow`...?
assert_eq!(Representation::new(f64::MIN_POSITIVE), Representation {
    sign: Sign::Positive,
    abs: Positive::Finite(PositiveFinite { exp: -1074, mantissa: 1 << 52 }),
...this exponent is a fair bit away from `-1021`, so I am going to have to go over a number of things very carefully, and I am not immediately seeing an explanation of the encoding choice in this module. I'd like to see a comment somewhere (perhaps on the struct, or on the associated constants, or this test?) that explains why what I am seeing here is Actually Correct.
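For what it's worth, the value follows from the module's integer-mantissa convention (`number = 2^exp * mantissa` with `mantissa >= 2^SIGNIFICAND_BITS`, per the struct docs below): for f64, `MIN_POSITIVE = 2^-1022 = 2^52 * 2^-1074`, hence `exp: -1074` rather than anything near `-1021`. A sanity check:

```rust
fn main() {
    // f64::from_bits(1) is 2^-1074, the smallest positive subnormal.
    let two_pow_minus_1074 = f64::from_bits(1);
    // MIN_POSITIVE = 2^-1022 = (integer mantissa 2^52) * 2^-1074,
    // matching the test's { exp: -1074, mantissa: 1 << 52 }.
    assert_eq!(f64::MIN_POSITIVE, (1u64 << 52) as f64 * two_pow_minus_1074);
}
```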
/// Represents an `FP` number.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
struct Representation {
    /// Sign.
    sign: Sign,
    /// Absolute value.
    abs: Positive,
}

#[derive(Copy, Clone, Debug, Eq, PartialEq)]
enum Sign {
    Positive,
    Negative,
}

/// Represents a positive number.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
enum Positive {
    Zero,
    Finite(PositiveFinite),
    Infinity,
    NaN,
}

/// Represents a non-zero, positive finite number.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
struct PositiveFinite {
    /// `number = 2^exp * mantissa`.
    exp: Exponent,
    /// `2^SIGNIFICAND_BITS <= mantissa < 2^(SIGNIFICAND_BITS + 1)`
    mantissa: Bits,
}
This PR winds up repeating these types a number of times... can it just be made generic? If we need to have separate `impl` blocks for each concrete combination of params, so be it, but we could at least have a unified declaration in an adjacent `pub(crate) mod float;`.
@tczajka If you can make the soft-float impl as generic as possible given the current state of things then I think we can very quickly actually finish this out.
Pls can we consider
...apologies, just... @tgross35 I am curious to know why, to you, "adding float traits in yet another place" is a sign that this PR should be against another repo, instead of fixing the fact that we have "float traits" everywhere except in the core libraries? It cannot be "because it would be a miserable pile of bikeshedding". We're just inflicting a miserable pile of bikeshedding on everyone every time they have to reimpl their own pile of float traits. Skipping numeric traits in std made sense in 2015 but it doesn't really make the same sense in 2025.
It also does not actually seem to help us to have three different repositories one must consult to verify that a floating point implementation that we use is correct, either.
I don't disagree that it would be very nice to have the traits in
I'm not seeing this, got a link?
We do have a mechanism for using nightly features with a fallback. But we may not need to make these const for them to be const if they can become intrinsics (from some discussion with const-eval). I'm planning to use the
Yeah, it's not ideal. I would like to merge
I mean the mention of using a
Hm. I am not sure some of those tests should not be in libcore, but I do not expect us to add a probably-randomization-driven test suite to our already flaky CI, for reasons previously discussed, so if that is what is ideal then moving this over there is fine. Just as long as it's not about our 99 float traits lying around.
@tczajka any updates on this? Thanks!
Sorry, I've been busy with other things; I'm planning to get back to the review within ~2 weeks.
Fixes #107904.