
Correctly rounded floating point div_euclid. #134145

Open
wants to merge 5 commits into base: master

Conversation

@tczajka commented Dec 10, 2024

Fixes #107904.

@rustbot (Collaborator) commented Dec 10, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @workingjubilee (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, to keep review turnaround times to a minimum, PR authors and assigned reviewers should ensure that the review labels (S-waiting-on-review and S-waiting-on-author) stay updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 10, 2024
@tgross35 (Contributor)

At a surface level, I'm wondering if these would be better in libm so we get the more complete test suite against MPFR.

@tgross35 (Contributor) commented Dec 10, 2024

We already ship libm with the sysroot as part of compiler_builtins, so this would just be via an extern symbol like the other functions that call sys::cmath. I think this is fine for now though, that can just be something to investigate in the future (considering we have some ongoing changes w.r.t. when libm symbols get made available).

@tczajka (Author) commented Dec 10, 2024

We already ship libm with the sysroot as part of compiler_builtins, so this would just be via an extern symbol like the other functions that call sys::cmath. I think this is fine for now though, that can just be something to investigate in the future (considering we have some ongoing changes w.r.t. when libm symbols get made available).

Sorry, I deleted a comment you were replying to that asked "is it OK for std to depend on libm to get this function".

I'm happy to move this to libm if it makes sense, although as I understand it libm exports C libm's API which doesn't have this function, so that may be a bit confusing.

@theemathas (Contributor)

How does this relate to #134062?

@traviscross (Contributor)

How does this relate to #134062?

That PR -- my draft PR -- explores how far one can go without doing a full softfloat implementation of this operation. On the one hand, that means it can be implemented directly on intrinsic floating point operations. On the other, it means that it has a restriction on its codomain (the maximum magnitude of the quotient that can be supported).

This PR from @tczajka takes the softfloat implementation approach. This means that it does not have the codomain restriction, but it has to implement more of the operation in software.

@traviscross (Contributor)

It'd need libs-api acceptance, of course, but my feeling is that, whether with or following on from this fix, we should expose div_floor, div_ceil, and div_trunc as public methods on the floating-point types. Perhaps, or perhaps not, that possibility might influence how we think about code organization here.
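Here is a small illustration (not from the PR) of what the three proposed methods would compute and how div_euclid relates to them. The naive `(a / b).floor()` style shown here is only correct while the exact quotient survives the intermediate rounding, which is precisely the problem this PR addresses for div_euclid.

```rust
// Illustration only, not the PR's code. For a positive divisor, div_euclid
// coincides with div_floor; for a negative divisor it coincides with div_ceil.
fn main() {
    let (a, b) = (7.0_f64, -2.0_f64);
    let div_trunc = (a / b).trunc(); // -3
    let div_floor = (a / b).floor(); // -4
    let div_ceil = (a / b).ceil(); // -3
    assert_eq!(a.div_euclid(b), div_ceil);
    println!("trunc: {div_trunc}, floor: {div_floor}, ceil: {div_ceil}");
}
```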

@tczajka (Author) commented Dec 11, 2024

I'm now doing some largish refactoring to clean this up a bit and to deal with the (-finite / inf) case correctly, as suggested by @quaternic. This means the whole operation, including special cases, will be handled fully in soft-float.

`f{16,32,64,128}/soft.rs` are now very similar; f32 and f64 are almost identical.

`U256` helper moved to `crate::u256`.

`(-5f32).div_euclid(f32::INFINITY)` now returns -1 not 0.
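A short gloss on why -1 is the consistent answer (not text from the PR): for a positive divisor, div_euclid is the floor of the exact quotient, and the exact quotient of a negative finite number by an arbitrarily large divisor is a negative value just below zero, whose floor is -1.

```rust
// Sanity check of the behavior described above: for any finite rhs > 5 the
// floor of -5 / rhs is already -1, and +infinity is treated as the limit.
fn main() {
    assert_eq!((-5f32).div_euclid(1e30f32), -1.0);
    assert_eq!((-5f32).div_euclid(f32::INFINITY), -1.0);
}
```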
@tczajka (Author) commented Dec 11, 2024

OK done refactoring.

@tczajka (Author) commented Dec 11, 2024

It'd need libs-api acceptance, of course, but my feeling is that, whether with or following on from this fix, we should expose div_floor, div_ceil, and div_trunc as public methods on the floating-point types. Perhaps, or perhaps not, that possibility might influence how we think about code organization here.

It's easy to add these using the functions implemented in this PR.

@traviscross (Contributor) commented Dec 12, 2024

@tczajka: What thoughts do you have on how we could best later go about optimizing this?

That is, we don't necessarily need to do that before merging this, but I'm curious about your thoughts on the avenues we might be able to pursue for optimization on various platforms.

It might also be interesting if you have any preliminary benchmarking data of this PR. Perhaps it could be compared with other libraries capable of supporting this operation. It might also be compared with the approach in #134062 (that approach itself could be optimized further also).

If the performance hit is significant, I wonder whether we should explore using the #134062 method as a fast path and then falling back to the softfloat approach for larger quotients (at the cost of making the operation for those larger quotients somewhat slower as some work would be wasted).

@rustbot rustbot added has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 12, 2024
@rustbot

This comment has been minimized.

@rustbot rustbot removed has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 12, 2024
@tczajka (Author) commented Dec 12, 2024

I have added benchmarks for div_euclid.

Before the change (incorrectly rounded):

    f128::div_euclid_large                                             44859.45/iter +/- 1923.27
    f128::div_euclid_medium                                              135.82/iter    +/- 2.15
    f128::div_euclid_small                                                63.94/iter    +/- 0.27
    f16::div_euclid_large                                                 23.89/iter    +/- 1.24
    f16::div_euclid_small                                                 21.83/iter    +/- 1.35
    f32::div_euclid_large                                                 24.49/iter    +/- 0.06
    f32::div_euclid_medium                                                 8.44/iter    +/- 0.19
    f32::div_euclid_small                                                  4.02/iter    +/- 0.01
    f64::div_euclid_large                                                151.79/iter    +/- 0.24
    f64::div_euclid_medium                                                10.56/iter    +/- 0.02
    f64::div_euclid_small                                                  4.16/iter    +/- 0.02

After the change (correctly rounded):

    f128::div_euclid_large                                               565.23/iter     +/- 1.73
    f128::div_euclid_medium                                              414.26/iter     +/- 1.92
    f128::div_euclid_small                                                11.54/iter     +/- 0.01
    f16::div_euclid_large                                                 11.90/iter     +/- 0.45
    f16::div_euclid_small                                                 11.87/iter     +/- 0.42
    f32::div_euclid_large                                                  5.14/iter     +/- 0.36
    f32::div_euclid_medium                                                 4.97/iter     +/- 0.18
    f32::div_euclid_small                                                  4.26/iter     +/- 0.23
    f64::div_euclid_large                                                  8.04/iter     +/- 0.42
    f64::div_euclid_medium                                                 7.19/iter     +/- 0.39
    f64::div_euclid_small                                                  4.75/iter     +/- 0.30

So the corrected implementation actually improves performance significantly in many cases, and there is no significant performance regression.

@tczajka (Author) commented Dec 12, 2024

This PR from @tczajka takes the softfloat implementation approach.

The current (old) implementation is also really soft-float: the operation is not built into hardware. Additionally, it uses `%` (fmod) internally, which is not only not implemented in hardware but is also very slow for large numbers.
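For reference, the pre-existing implementation has roughly the shape sketched below (a paraphrase as a free function, not a verbatim copy): a single rounded division, a trunc, and a sign fix-up via fmod. Because `a / rhs` is already rounded before the trunc, the result can land on the wrong side of the exact quotient when that quotient is large.

```rust
// Roughly the shape of the old f64::div_euclid (a paraphrase, not a verbatim copy).
fn old_div_euclid(a: f64, rhs: f64) -> f64 {
    let q = (a / rhs).trunc();
    if a % rhs < 0.0 {
        return if rhs > 0.0 { q - 1.0 } else { q + 1.0 };
    }
    q
}

fn main() {
    // Fine for small quotients...
    assert_eq!(old_div_euclid(7.0, -2.0), -3.0);
    // ...but for huge quotients the pre-rounded division (and the slow fmod)
    // is where the incorrect rounding and the poor performance come from.
    println!("{}", old_div_euclid(f64::MAX, f64::MIN_POSITIVE));
}
```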

My implementation is one integer division with some quick adjustments for correct rounding. This is why you see large performance gains.

One thing that has room for optimization is f128. I need 256-bit operations for this, so I use a custom implementation of U256 with a naive division algorithm. Its performance can easily be improved, either by a change in the algorithm, or by using LLVM's built-in i256 type.
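A simplified sketch of that "one integer division" idea (a paraphrase, not the PR's code): decompose both operands into an integer mantissa and a power-of-two exponent, and the exact floor quotient becomes a single integer division of scaled mantissas. The toy below only handles positive normal f64 inputs whose exponent difference fits in a u128 shift; the real implementation covers the full range (widening up to U256 for f128) and then rounds the exact quotient back to a float.

```rust
// Toy version: positive, normal f64 only, and only when the shifted mantissa
// still fits in a u128. Returns the exact floor of a / b as an integer.
fn floor_div_sketch(a: f64, b: f64) -> Option<u128> {
    assert!(a.is_normal() && b.is_normal() && a > 0.0 && b > 0.0);
    // Decompose into (mantissa, exp) with value == mantissa * 2^exp.
    fn decompose(x: f64) -> (u64, i32) {
        let bits = x.to_bits();
        let mantissa = (bits & ((1u64 << 52) - 1)) | (1u64 << 52);
        let exp = ((bits >> 52) & 0x7ff) as i32 - 1023 - 52;
        (mantissa, exp)
    }
    let (ma, ea) = decompose(a);
    let (mb, eb) = decompose(b);
    let shift = ea - eb;
    if !(0..=74).contains(&shift) {
        return None; // out of scope for this toy sketch
    }
    Some(((ma as u128) << shift as u32) / mb as u128)
}

fn main() {
    assert_eq!(floor_div_sketch(7.5, 2.5), Some(3));
    assert_eq!(floor_div_sketch(2f64.powi(60), 3.0), Some((1u128 << 60) / 3));
}
```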

@traviscross (Contributor) commented Dec 12, 2024

Thanks for writing up the careful work in this PR and putting it forward. I'm convinced this approach is the right one.

(I've closed #134062, which was always just kind of an exploration of the possible, in favor of this.)

@jieyouxu jieyouxu added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Dec 19, 2024
@workingjubilee (Member)

+2,219 −65

oh dear

@workingjubilee workingjubilee self-requested a review January 5, 2025 01:27
@tczajka (Author) commented Jan 5, 2025

The code for f32 and f64 is identical; the code for f16 and f128 is a bit different. So I'm open to suggestions on how best to write this with minimal duplication. Macros for f32 and f64?

@traviscross (Contributor) commented Jan 5, 2025

Maybe (or maybe not) there's some decent way to factor a unified implementation over all four with generics. The behavior differences for f16 and f128 could be conditionalized on some associated constant, e.g.

Probably I have in mind something like this: Playground link
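A rough sketch of that shape (all names are hypothetical, not taken from the PR or the linked playground): a private helper trait whose associated constants carry the per-width differences, so the bulk of the implementation is written once.

```rust
// Sketch only; the real trait would carry whatever the soft-float code needs.
trait SoftFloat: Copy {
    const SIGNIFICAND_BITS: u32;
    const EXPONENT_BITS: u32;
    /// Widths that need wider intermediate arithmetic (e.g. f128) could gate
    /// their extra code path on a constant like this.
    const NEEDS_WIDE_DIVISION: bool;
}

impl SoftFloat for f32 {
    const SIGNIFICAND_BITS: u32 = 24;
    const EXPONENT_BITS: u32 = 8;
    const NEEDS_WIDE_DIVISION: bool = false;
}

impl SoftFloat for f64 {
    const SIGNIFICAND_BITS: u32 = 53;
    const EXPONENT_BITS: u32 = 11;
    const NEEDS_WIDE_DIVISION: bool = false;
}

// Shared helpers can then be written once over the trait.
fn exponent_bias<F: SoftFloat>() -> i32 {
    (1 << (F::EXPONENT_BITS - 1)) - 1
}

fn main() {
    assert_eq!(exponent_bias::<f32>(), 127);
    assert_eq!(exponent_bias::<f64>(), 1023);
}
```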

@tgross35 (Contributor) commented Jan 5, 2025

I still think that a better place for more complex soft-float routines is probably libm, distributed as part of compiler-builtins - we already have the float traits, u256 (currently in compiler-builtins; it doesn't have division yet, but that can be changed), tests against MPFR, and (soon) exhaustive tests. This isn't technically a math.h routine, but we don't need to make it public in libm itself, just reexport it in builtins.

We don't currently have anything like this though, @Amanieu would there be any concerns here?

@bors (Contributor) commented Jan 24, 2025

☔ The latest upstream changes (presumably #135947) made this pull request unmergeable. Please resolve the merge conflicts.

@workingjubilee (Member) left a comment

Finally getting around to this. I have some documentation-related comments for a first pass, since with that concern out of the way the rest of this is, relatively speaking, much less to review.

@@ -235,10 +237,14 @@ impl f128 {

 /// Calculates Euclidean division, the matching method for `rem_euclid`.
 ///
-/// This computes the integer `n` such that
+/// In infinite precision this computes the integer `n` such that
Member:

Usually I see this arranged as "This computes the operation on the integer n with infinite precision"

@traviscross (Contributor) Feb 1, 2025

It's a nit, but I'd say the calculation is done "in infinite precision" rather than "with" it, just as I might say that a calculation is done "in a finite field".

Of course, if we want, this choice could be avoided by phrasing it differently. E.g., IEEE 754 tends to say things like "...computes the infinitely precise product x × y..." or that "every operation shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range".

@workingjubilee (Member) Feb 3, 2025

Those are fine too. It's honestly at least partly that "In infinite precision" feels like an odd way to start the sentence. It places it next to the subject ("this", the function) when it should be attached either to the object or the verb.

Comment on lines +245 to +247
/// However, due to a floating point round-off error the result can be rounded if
/// `self` is much larger in magnitude than `rhs`, such that `n` can't be represented
/// exactly.
Member:

This makes the "with infinite precision" sound incorrect. It is not a random "round-off error", the rounding is a consequence of fitting the result into the floating-point value's representation. It is questionable to call it an "error" at all, since it is a deliberate choice in how floats work. Please be clear when identifying such. We can afford a few words.

Comment on lines +278 to +280
/// In infinite precision the return value `r` satisfies `0 <= r < rhs.abs()`.
///
/// However, due to a floating point round-off error the result can round to
Member:

same comments

///
/// # Precision
///
/// The result of this operation is guaranteed to be the rounded
/// infinite-precision result.
///
/// The result is precise when `self >= 0.0`.
Member:

Floating-point operations are always precise (in the sense that they always precisely conform to floating-point semantics); they are not always accurate. See https://en.wikipedia.org/wiki/Accuracy_and_precision for more on the distinction I am drawing.

Suggested change
-/// The result is precise when `self >= 0.0`.
+/// The result is exactly accurate when `self >= 0.0`.

We could also say within 0 ULP, I suppose?

@traviscross (Contributor) Feb 1, 2025

Probably I'd say the results of floating-point operations are always accurate to a given degree of precision.

We could side-step this and say e.g. that the "computed result is the same as the infinitely precise result when..." or "the infinitely precise result is exactly representable when...".

Member:

The "exactly representable" form sounds good.

Member:

The documentation-related comments I have on f128.rs apply here.

Member:

The documentation-related comments I have on f128.rs apply here.

Member:

The documentation-related comments I have on f128.rs apply here.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 31, 2025
@workingjubilee (Member) left a comment

Another partial review, this time of the soft-float tests. Their current state points to a need for slightly more documentation and perhaps a slightly different organization of the code proper. The way things currently are, I do not quite agree with any assessment that sounds like "well, you only need to review one type's implementation", since I need to make sure that there isn't some important deviation.

Comment on lines +12 to +17
abs: Positive::Finite(PositiveFinite { exp: -16494, mantissa: 1 << 112 }),
});

assert_eq!(Representation::new(f128::MIN_POSITIVE / 2.0), Representation {
sign: Sign::Positive,
abs: Positive::Finite(PositiveFinite { exp: -16495, mantissa: 1 << 112 }),
Member:

This 1 << 112 is a meaningful magic number, and should be derivable from the existing constants (SIGNIFICAND_BITS and so on).

Can we lift this into a const MANTISSA_TOP_BIT (or whatever) that computes it? If you want to also assert_eq! that const against 1 << 112 still, that's fine too.


assert_eq!(Representation::new(f128::MAX), Representation {
sign: Sign::Positive,
abs: Positive::Finite(PositiveFinite { exp: 16271, mantissa: (1 << 113) - 1 }),
Member:

likewise const MANTISSA_MAX
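A sketch of what that could look like (the constant names and the use of the public MANTISSA_DIGITS are illustrative choices, not the PR's; its own SIGNIFICAND_BITS-style constants would serve the same purpose, and f128 itself still requires a nightly feature gate):

```rust
// Derive the magic values from the significand width instead of hard-coding them.
const MANTISSA_TOP_BIT: u128 = 1u128 << (f128::MANTISSA_DIGITS - 1); // == 1 << 112
const MANTISSA_MAX: u128 = (1u128 << f128::MANTISSA_DIGITS) - 1; // == (1 << 113) - 1

fn main() {
    assert_eq!(MANTISSA_TOP_BIT, 1 << 112);
    assert_eq!(MANTISSA_MAX, (1 << 113) - 1);
}
```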

Comment on lines +12 to +22
abs: Positive::Finite(PositiveFinite { exp: -149, mantissa: 0x800000 }),
});

assert_eq!(Representation::new(f32::MIN_POSITIVE / 2.0), Representation {
sign: Sign::Positive,
abs: Positive::Finite(PositiveFinite { exp: -150, mantissa: 0x800000 }),
});

assert_eq!(Representation::new(f32::MAX), Representation {
sign: Sign::Positive,
abs: Positive::Finite(PositiveFinite { exp: 104, mantissa: 0xffffff }),
Member:

This is a different way to write these mantissa values than is done elsewhere...?

}

/// Euclidean division.
#[allow(dead_code)]
Member:

Why this allow...?


assert_eq!(Representation::new(f64::MIN_POSITIVE), Representation {
sign: Sign::Positive,
abs: Positive::Finite(PositiveFinite { exp: -1074, mantissa: 1 << 52 }),
Member:

...this exponent is a fair bit away from -1021, so I am going to have to go over a number of things very carefully, and I am not immediately seeing an explanation of the encoding choice in this module. I'd like to see a comment somewhere (perhaps on the struct, or on the associated constants, or this test?), that explains why what I am seeing here is Actually Correct.
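For what it's worth, the arithmetic behind that exponent (an explanatory note, not text from the PR): these tests appear to store the full integer significand in `mantissa`, so the stored exponent is the IEEE exponent minus the 52 fraction bits, and f64::MIN_POSITIVE = 2^-1022 becomes 2^52 * 2^-1074.

```rust
// Quick check that the (exp, mantissa) pair shown above reconstructs the value.
fn main() {
    let reconstructed = (1u64 << 52) as f64 * 2.0f64.powi(-1074);
    assert_eq!(reconstructed, f64::MIN_POSITIVE);
}
```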

Comment on lines +30 to +61
/// Represents an `FP` number.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
struct Representation {
/// Sign.
sign: Sign,
/// Absolute value.
abs: Positive,
}

#[derive(Copy, Clone, Debug, Eq, PartialEq)]
enum Sign {
Positive,
Negative,
}

/// Represents a positive number.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
enum Positive {
Zero,
Finite(PositiveFinite),
Infinity,
NaN,
}

/// Represents a non-zero, positive finite number.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
struct PositiveFinite {
/// `number = 2^exp * mantissa`.
exp: Exponent,
/// `2^SIGNIFICAND_BITS <= mantissa < 2^(SIGNIFICAND_BITS + 1)`
mantissa: Bits,
}
Member:

This PR winds up repeating these types a number of times... can it just be made generic? If we need to have separate impl blocks for each concrete combination of params, so be it, but we could at least have a unified declaration in an adjacent pub(crate) mod float;
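A sketch of what such a unified declaration might look like (module layout and field types are hypothetical, not from the PR): the representation types become generic over the mantissa integer type, and each width keeps its own constructor in a separate impl block.

```rust
// Sketch of an adjacent shared module; not the PR's code.
pub(crate) mod float {
    #[derive(Copy, Clone, Debug, Eq, PartialEq)]
    pub enum Sign {
        Positive,
        Negative,
    }

    #[derive(Copy, Clone, Debug, Eq, PartialEq)]
    pub enum Positive<Bits> {
        Zero,
        Finite(PositiveFinite<Bits>),
        Infinity,
        NaN,
    }

    /// A non-zero, positive finite number: `value = 2^exp * mantissa`.
    #[derive(Copy, Clone, Debug, Eq, PartialEq)]
    pub struct PositiveFinite<Bits> {
        pub exp: i32,
        pub mantissa: Bits,
    }

    #[derive(Copy, Clone, Debug, Eq, PartialEq)]
    pub struct Representation<Bits> {
        pub sign: Sign,
        pub abs: Positive<Bits>,
    }
}

// Width-specific constructors can then live in separate impl blocks, e.g.
// impl float::Representation<u64> { fn new(x: f64) -> Self { /* ... */ } }
```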

@workingjubilee (Member)

@tczajka If you can make the soft-float impl as generic as possible given the current state of things then I think we can very quickly actually finish this out.

@tgross35 (Contributor) commented Feb 7, 2025

@tczajka If you can make the soft-float impl as generic as possible given the current state of things then I think we can very quickly actually finish this out.

Pls can we consider libm rather than adding float traits in yet another place

@workingjubilee (Member) commented Feb 7, 2025

  1. "generics" is not necessarily "trait".
  2. libm will not be able to have const traits until those are completely finished.
  3. as already discussed, this PR would probably benefit if it had access to the powers of the compiler (and the compiler was augmented).
  4. just because libs-api feels compelled to ask "what's the use-case?" over "let's make {numeric_types} generic!" doesn't mean there is not one, and it is frankly absurd that it occurs outside the standard library.

@workingjubilee (Member) commented Feb 7, 2025

...apologies, just...

@tgross35 I am curious to know why, to you, "adding float traits in yet another place" is a sign that this PR should be against another repo, instead of fixing the fact that we have "float traits" everywhere except in the core libraries?

It cannot be "because it would be a miserable pile of bikeshedding". We're just inflicting a miserable pile of bikeshedding on everyone every time they have to reimpl their own pile of float traits. Skipping numeric traits in std made sense in 2015 but it doesn't really make the same sense in 2025.

@workingjubilee (Member)

It also does not actually seem to help us to have three different repositories one must consult to verify that a floating point implementation that we use is correct, either.

@tgross35 (Contributor) commented Feb 8, 2025

@tgross35 I am curious to know why, to you, "adding float traits in yet another place" is a sign that this PR should be against another repo, instead of fixing the fact that we have "float traits" everywhere except in the core libraries?

I don't disagree that it would be very nice to have the traits in core. Keeping traits in one place was just one motivation in response to generics. The reasoning is deeper though: libm does have the traits (for floats, integers, doubling width, etc) but it also has a reasonably thorough test suite (tests against MPFR with edge cases, fuzzing, extensive/exhaustive tests), icount benchmarks, u256, and a handful of other code reuse things. Not necessarily that this routine would add that much more, imo we just have a better home for softfloat :)

as already discussed, this PR would probably benefit if it had access to the powers of the compiler (and the compiler was augmented).

I'm not seeing this, got a link?

and libm will not be able to have const traits until those are completely finished.

We do have a mechanism for using nightly features with a fallback. But we may not need to make these methods const for the operations themselves to be const, if they can become intrinsics (from some discussion with const-eval, I'm planning to use the libm generic methods to add an implementation over rustc_apfloat types, which would allow us to just use libm in CTFE).

It also does not actually seem to help us to have three different repositories one must consult to verify that a floating point implementation that we use is correct, either.

Yeah, it's not ideal. I would like to merge libm into compiler-builtins but that has unfortunately been blocked for months on #135501 (well, just making it a subtree isn't, but some of the restructuring I would like to do afterward is). After that we could probably bump to a subtree here, TBD.

@workingjubilee (Member)

as already discussed, this PR would probably benefit if it had access to the powers of the compiler (and the compiler was augmented).

I'm not seeing this, got a link?

I mean the mention of using a u256 provided directly by the compiler: there's nothing technically infeasible about us doing so, given that LLVM supports it.

@workingjubilee (Member)

Hm. I am not sure some of those tests should not be in libcore, but I do not expect us to add a probably-randomization-driven test suite to our already flaky CI, for reasons previously discussed, so if that is what is ideal then moving this over there is fine.

Just as long as it's not about our 99 float traits lying around.

@Dylan-DPC (Member)

@tczajka any updates on this? thanks

@tczajka (Author) commented Mar 6, 2025

Sorry, I've been busy with other things; I'm planning to get back to the review within ~2 weeks.

Labels
S-waiting-on-author  Status: This is awaiting some action (such as code changes or more information) from the author.
T-libs  Relevant to the library team, which will review and decide on the PR/issue.
T-libs-api  Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

f64::div_euclid and f64::rem_euclid yield inconsistent results
10 participants