printf: %a output is different from coreutils #7364

Open
drinkcat opened this issue Feb 26, 2025 · 4 comments
Comments

@drinkcat
Contributor

drinkcat commented Feb 26, 2025

%a should output "Hexadecimal floating point, lowercase"

After fixing #7362, we still see some issues.

It seems like GNU coreutils prefers "shifting" the output so that there is a single hex digit between 0x1 and 0xf before the decimal point, while uutils always picks 0x1 and pads the output with trailing zeros.

```
$ cargo run printf "%a %a\n" 15.125 16.125
0x1.e400000000000p+3 0x1.0200000000000p+4
$ printf "%a %a\n" 15.125 16.125
0xf.2p+0 0x8.1p+1
```

The value is technically correct though:

```
0x1.e400000000000p+3 = (1+14/16+4/256)*2**3 = 15.125
0x1.0200000000000p+4 = (1+2/256)*2**4   = 16.125
```

(Note: be careful to add env before printf, as some shell implementations provide a built-in printf...)
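
For illustration (this is not the uutils implementation), here is a minimal Rust sketch that reproduces both outputs above from an f64's bit pattern: the leading-1 normalization uutils currently emits, and the glibc-on-x86 style where the leading hex digit carries 4 significant bits, as if the double had been widened to the 80-bit x87 format. The function names are made up, and zero, negative and subnormal inputs are ignored for brevity.

```rust
/// Decompose a normal, positive f64 into its 52 fraction bits and unbiased exponent.
fn hex_parts(x: f64) -> (u64, i64) {
    let bits = x.to_bits();
    let exp = ((bits >> 52) & 0x7ff) as i64 - 1023; // unbiased exponent
    let frac = bits & ((1u64 << 52) - 1);           // 52 fraction bits
    (frac, exp)
}

/// "0x1.…" style (what uutils currently emits): leading digit is always 1,
/// fraction padded to the full 13 nibbles of an f64.
fn hex_float_one(x: f64) -> String {
    let (frac, exp) = hex_parts(x);
    format!("0x1.{frac:013x}p{exp:+}")
}

/// glibc-on-x86 style: treat the value as an 80-bit long double with an
/// explicit integer bit, so the leading hex digit carries 4 significant bits.
fn hex_float_x86(x: f64) -> String {
    let (frac, exp) = hex_parts(x);
    let sig64 = ((1u64 << 52) | frac) << 11;  // 64-bit significand
    let lead = sig64 >> 60;                   // top 4 bits -> leading digit
    let rest = trim_zeros(format!("{:015x}", sig64 & ((1u64 << 60) - 1)));
    let e = exp - 3;                          // exponent shifted by the 3 moved bits
    format!("{lead:#x}.{rest}p{e:+}")
}

/// Drop trailing zero nibbles, keeping at least one digit.
fn trim_zeros(mut s: String) -> String {
    while s.len() > 1 && s.ends_with('0') {
        s.pop();
    }
    s
}

fn main() {
    for x in [15.125_f64, 16.125] {
        // prints: 0x1.e400000000000p+3 vs 0xf.2p+0
        //         0x1.0200000000000p+4 vs 0x8.1p+1
        println!("{} vs {}", hex_float_one(x), hex_float_x86(x));
    }
}
```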

Also, the behaviour is different across platforms. Running `LANG=C env printf '%a %.6a\n' 0.12544 0.12544` in various Docker images (gist):

| arch | %a | %.6a |
| --- | --- | --- |
| linux-386 / linux-amd64 | 0x8.07357e670e2c12bp-6 | 0x8.07357ep-6 |
| linux-arm-v5 / linux-arm-v7 | 0x1.00e6afcce1c58p-3 | 0x1.00e6b0p-3 |
| linux-arm64-v8 / linux-mips64le / linux-ppc64le / linux-s390x | 0x1.00e6afcce1c58255b035bd512ec7p-3 | 0x1.00e6b0p-3 |

According to https://en.cppreference.com/w/c/io/fprintf: "The default precision is sufficient for exact representation of the value."

On x86, at most 16 nibbles = 64 bits are printed, including the integer part. That corresponds to the internal x86 80-bit floating point format, the long double type. printf shifts 3 of the fraction bits into the integer part (before the .), so that the whole 64-bit significand fits neatly in 16 nibbles when printed. Interestingly, this behaviour is preserved even when a precision is specified (e.g. %.6a).

On arm64 (and a bunch of other archs), 28 nibbles = 112 bits are printed after the decimal point. That corresponds to the quad-precision 128-bit float, which is also the long double type there.

On arm32, 13 nibbles = 52 bits are printed after the decimal point. That corresponds to the double-precision 64-bit float, which is also the long double type there.
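
Put differently, the digit counts above follow directly from each platform's long double significand width. A quick sketch of that arithmetic (the layout mapping is my assumption based on the outputs above, not uutils code):

```rust
// Assumed long double layouts behind the outputs above:
//   arm32:      IEEE binary64  -> 52 fraction bits
//   x86/x86-64: x87 extended   -> 64-bit significand, 4 bits of which go into
//               the leading digit, leaving 60 fraction bits after the point
//   arm64 etc.: IEEE binary128 -> 112 fraction bits
fn fraction_nibbles(fraction_bits: u32) -> u32 {
    (fraction_bits + 3) / 4 // round up to whole hex digits
}

fn main() {
    assert_eq!(fraction_nibbles(52), 13);  // 0x1.00e6afcce1c58p-3
    assert_eq!(fraction_nibbles(60), 15);  // 0x8.07357e670e2c12bp-6
    assert_eq!(fraction_nibbles(112), 28); // 0x1.00e6afcce1c58255b035bd512ec7p-3
}
```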

@tertsdiepraam
Member

Yeah, this was one of the shortcuts I took while implementing this. I had to do a big refactor and simplified it to always use 1 as the first digit.

> According to https://en.cppreference.com/w/c/io/fprintf: "The default precision is sufficient for exact representation of the value."

While that is a pretty good reference, note that coreutils has a custom implementation that differs slightly in a few places. I might be misremembering (it's been a while), but I think that some C implementations also do the thing where they always use 1 as the first digit.

Also, if I recall correctly, GNU makes a distinction between the default precision and an explicitly specified precision of the same length, which explains the difference in precision that you're seeing.

@drinkcat
Contributor Author

drinkcat commented Feb 28, 2025

I understand one goal of this project is to exactly match the output of GNU coreutils for compatibility? Is that correct?

(for context... I actually bumped into this in an attempt to debug further what's going on in #5759...)

So, there are at least 2 issues here with %a...

First, GNU coreutils appears to always pack 4 bits into the first hex digit (0x8-0xf) on x86(-64), but only 1 bit (0x1) on arm platforms, no matter the specified precision. (Yes, the printf built into bash, when bash runs as sh, only packs 1 bit on x86-64 as well.)

Second, GNU coreutils appears to use long double on all platforms, which is either a 64-bit, 80-bit, or 128-bit float (on arm32, x86(-64), and arm64 respectively). But uutils uses f64 for printf's %a. So we're too low in precision (on anything but arm32). I wonder if we should switch to BigDecimal here too (like we do in seq).

Generally, do we need to detect the architecture (I assume uutils wants to target at least x86-64 and arm64?), adjust the number of bits to pack into the first hex digit accordingly, and then trim the precision from BigDecimal to whatever long double would be on the given platform?
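
For what it's worth, here is a purely hypothetical sketch of what such a per-architecture parametrization could look like (none of these names exist in uutils today); the arbitrary-precision value would then be rounded to `significand_bits` bits before formatting:

```rust
/// Hypothetical description of the target `long double`-like format.
struct HexFloatTarget {
    /// Significant bits carried by the leading hex digit (1 on arm, 4 on x86).
    leading_bits: u32,
    /// Total significand bits to keep before formatting (53, 64 or 113).
    significand_bits: u32,
}

/// Pick the format GNU coreutils would presumably use on the same target
/// (assumption based on the outputs observed above).
fn hex_float_target() -> HexFloatTarget {
    if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
        HexFloatTarget { leading_bits: 4, significand_bits: 64 }   // x87 extended
    } else if cfg!(target_arch = "aarch64") {
        HexFloatTarget { leading_bits: 1, significand_bits: 113 }  // IEEE binary128
    } else {
        HexFloatTarget { leading_bits: 1, significand_bits: 53 }   // IEEE binary64
    }
}

fn main() {
    let t = hex_float_target();
    println!(
        "pack {} bit(s) into the leading digit, keep {} significand bits",
        t.leading_bits, t.significand_bits
    );
}
```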

@tertsdiepraam
Member

> I understand one goal of this project is to exactly match the output of GNU coreutils for compatibility? Is that correct?

Ah yes, don't take my previous comment as discouraging you, it was meant as quite the opposite!

@tertsdiepraam
Member

A small note though: I think you might run into difficulties with long double and its platform-specific nature. You could also argue that uutils emulates the behaviour of one specific architecture, which in some sense actually makes it more portable. I think uutils currently makes very few modifications that take architecture differences this seriously, so that part of uutils' compatibility is essentially undefined right now.
