Improve precision for Sinh and Cosh TP routines #99763
Conversation
```csharp
private const float Single_LOGV = 0.693161f;
private const float Single_HALFV = 1.0000138f;
private const float Single_INVV2 = 0.24999309f;

private const ulong Double_MAX_VECTORIZED_VALUE = 0x41600000000000ul;
```
NB: there's no `double` implementation in the AMD library. I picked this value by deriving it from upper bounds used elsewhere and made it small enough that all our tests pass.
This is not a good value: it represents 1.9330329145781312E-307, which is so small that essentially no real-world input will be vectorized. I'd expect a more realistic value to at least match `float` and be `0x4056_5A9F_8000_0000`, although in practice I'd expect it to be quite a bit higher. If you look at the scalar `cosh` algorithm, you'll note that `double` uses `0x4086_33CE_8FB9_F87E`, which is the bitwise representation of 710.475860073944.

In general, this is a case where porting the scalar kernel is likely better overall. The biggest "blocker" is that the `double` version currently uses a lookup table for the tail adjustment, which tends not to be trivially or efficiently vectorizable; the `float` version does not, and uses more branches instead.

There are options here, just not "trivial" options, and the PR would be more involved. So for the time being, I think simply matching the "correct" arg max is the more desirable behavior.
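For anyone wanting to sanity-check those bit patterns, a quick reinterpretation (Python used purely for illustration; the values being decoded are the ones quoted above):

```python
import struct

def bits_to_double(bits: int) -> float:
    # Reinterpret a 64-bit integer pattern as an IEEE 754 binary64 value.
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

print(bits_to_double(0x0041600000000000))    # the PR's current cutoff: ~1.933e-307
print(bits_to_double(0x4056_5A9F_8000_0000)) # float-equivalent cutoff: ~89.416
print(bits_to_double(0x4086_33CE_8FB9_F87E)) # scalar double cosh arg max: ~710.47586
```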
That's my bad: I was toying around with the binary representation of the upper bound to find something that passes and hadn't realized it ended up being so small! After more careful experimentation, it turns out that the maximal cut-off within which we hit our precision targets for `double` is ~16.5, much lower than for the `float` algorithm. Assuming we don't change the algorithm, I see two options here:
- Use the reduced cut-off of ~16.5, or
- Increase or remove the cut-offs and increase testing tolerances for the hyperbolic methods.
I think we really need to use the normal cutoff here, otherwise users will end up seeing major perf slowdowns for somewhat typical inputs.
The main reason for the precision issue is going to come from `LOGV`, `HALFV`, and `INVV2` being "exact" (or nearest representable). I had given a rough breakdown of where the values came from in #97874 (comment), but didn't end up having time to figure out the right amount of adjustment to give to the values for `double`.

If you look at https://github.com/amd/aocl-libm-ose/blob/master/src/optimized/cosh.c, you'll note that values over 20 get simplified, because the negative exponential in `(e^x + e^-x) / 2` is small enough that it no longer matters (we just do `exp(|x|) / 2` with the relevant sign from `x`, while values under this range use the head/tail split to account for imprecision).

Given the above, and given that we need to call `exp` regardless, we should be able to conditionalize the work to keep it accurate and efficient.
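For context, here is one plausible reading of how those constants relate, inferred from their names and not confirmed against the kernel source: with V = e^LOGV (≈ 2.0000276), HALFV = V/2 and INVV2 = 1/V², so that cosh(|x|) = HALFV · (z + INVV2/z) where z = e^(|x| − LOGV). A quick sanity check:

```python
import math

# Constants from the PR diff; relationship below is my reconstruction
# (assumption): V = e^LOGV, HALFV = V / 2, INVV2 = 1 / V**2.
LOGV = 0.693161
HALFV = 1.0000138
INVV2 = 0.24999309

def cosh_kernel(x: float) -> float:
    # HALFV * z           reduces to e^|x| / 2
    # HALFV * INVV2 / z   reduces to e^-|x| / 2
    z = math.exp(abs(x) - LOGV)
    return HALFV * (z + INVV2 / z)

for x in (0.5, 1.0, 5.0, 40.0):
    assert math.isclose(cosh_kernel(x), math.cosh(x), rel_tol=1e-6)
```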
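The simplification being described can be sketched as follows (a minimal scalar sketch, not the vectorized kernel; the 20.0 cutoff is the one cited from the AOCL source, and the real kernel's head/tail split is omitted):

```python
import math

CUTOFF = 20.0  # above this, dropping e^-|x| costs < e^-40 (~4e-18) relative error

def cosh_split(x: float) -> float:
    ax = abs(x)
    if ax > CUTOFF:
        # Negative exponential is below double precision; one exp suffices.
        return math.exp(ax) / 2.0
    # Otherwise keep both terms (the real kernel also applies a head/tail
    # split here for extra precision, omitted in this sketch).
    e = math.exp(ax)
    return (e + 1.0 / e) / 2.0

for x in (0.5, 5.0, 25.0, 700.0):
    assert math.isclose(cosh_split(x), math.cosh(x), rel_tol=1e-13)
```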
Ok, in that case I'd be inclined to close this PR.
Tagging subscribers to this area: @dotnet/area-system-numerics
It wasn't "missed"... it was removed in 0e6a327. But I don't remember why.
Maybe because it was missing a double counterpart at the time? That's where the precision issues manifest in our testing.
The AMD algorithm for cosh falls back to scalar execution for values exceeding a specified threshold. This check was missed in the TP transcription, resulting in a loss of precision for large inputs.
Contributes to #98861.
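For reference, the fallback pattern described above looks roughly like this (scalar Python sketch; the cutoff value and the `fast_cosh` body are placeholders for illustration, not the actual TensorPrimitives implementation):

```python
import math

# Placeholder cutoff; choosing the right double value is what this thread debates.
MAX_VECTORIZED_VALUE = 89.415985

def fast_cosh(x: float) -> float:
    # Stand-in for the fast vectorizable kernel.
    e = math.exp(abs(x))
    return (e + 1.0 / e) / 2.0

def cosh_block(values):
    # If any input exceeds the cutoff, fall back to the accurate scalar
    # routine for the whole block, mirroring the AOCL approach.
    if any(abs(v) > MAX_VECTORIZED_VALUE for v in values):
        return [math.cosh(v) for v in values]
    return [fast_cosh(v) for v in values]

inputs = [1.0, -3.5, 200.0]
assert all(math.isclose(a, math.cosh(b), rel_tol=1e-12)
           for a, b in zip(cosh_block(inputs), inputs))
```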