seq: different output when comparing with GNU #5759

Open
sylvestre opened this issue Dec 31, 2023 · 4 comments
@sylvestre
Contributor

Found with the seq fuzzer

$ ./target/debug/coreutils seq -80 18.333615880731813 8158 > a.txt
$ LANG=C /usr/bin/seq -80 18.333615880731813 8158 > b.txt
$  diff -u a.txt b.txt
--- a.txt       2023-12-31 16:35:42.919872401 +0100
+++ b.txt       2023-12-31 16:36:00.859817826 +0100
@@ -376,75 +376,75 @@
 6795.105955274429875
 6813.439571155161688
 6831.773187035893501
-6850.106802916625314
+6850.106802916625313
 6868.440418797357127
 6886.774034678088940
 6905.107650558820753
 6923.441266439552566
 6941.774882320284379
 6960.108498201016192
-6978.442114081748005
+6978.442114081748004
 6996.775729962479818
 7015.109345843211631
 7033.442961723943444
 7051.776577604675257
 7070.110193485407070
 7088.443809366138883
-7106.777425246870696
+7106.777425246870695
 7125.111041127602509
 7143.444657008334322
 7161.778272889066135
 7180.111888769797948
 7198.445504650529761
 7216.779120531261574
-7235.112736411993387
+7235.112736411993386
 7253.446352292725200
 7271.779968173457013
 7290.113584054188826
 7308.447199934920639
 7326.780815815652452
 7345.114431696384265
-7363.448047577116078
+7363.448047577116077
 7381.781663457847891
 7400.115279338579704
 7418.448895219311517
 7436.782511100043330
 7455.116126980775143
 7473.449742861506956
-7491.783358742238769
+7491.783358742238768
 7510.116974622970582
 7528.450590503702395
 7546.784206384434208
 7565.117822265166021
 7583.451438145897834
 7601.785054026629647
-7620.118669907361460
+7620.118669907361459
 7638.452285788093273
-7656.785901668825086
+7656.785901668825085
 7675.119517549556899
 7693.453133430288712
 7711.786749311020525
 7730.120365191752338
-7748.453981072484151
+7748.453981072484150
 7766.787596953215964
-7785.121212833947777
+7785.121212833947776
 7803.454828714679590
 7821.788444595411403
 7840.122060476143216
 7858.455676356875029
-7876.789292237606842
+7876.789292237606841
 7895.122908118338655
-7913.456523999070468
+7913.456523999070467
 7931.790139879802281
 7950.123755760534094
 7968.457371641265907
 7986.790987521997720
-8005.124603402729533
+8005.124603402729532
 8023.458219283461346
-8041.791835164193159
+8041.791835164193158
 8060.125451044924972
 8078.459066925656785
 8096.792682806388598
 8115.126298687120411
-8133.459914567852224
-8151.793530448584037
+8133.459914567852223
+8151.793530448584036

Probably the usual numerical computing issues.

@samueltardieu
Contributor

Yes, and uutils is more consistent than GNU coreutils here.
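One way to check which side matches the mathematically exact result is to redo the arithmetic with scaled integers. The sketch below is only an illustration and not the uutils implementation: `SCALE`, `term` and `format_scaled` are hypothetical names, and it assumes each output line is `first + i * step` printed with 15 decimals, with line 379 of the output corresponding to `i = 378`.

```rust
// Exact decimal check of two lines from the diff, using integers scaled by 10^15.
// Illustrative only: these helper names are made up for this sketch.
const SCALE: i128 = 1_000_000_000_000_000; // 10^15, one unit per 15th decimal

/// Exact i-th term of the sequence, in units of 10^-15.
fn term(first: i128, step: i128, i: i128) -> i128 {
    first + i * step
}

/// Render a scaled integer as a decimal string with 15 fractional digits.
fn format_scaled(v: i128) -> String {
    let sign = if v < 0 { "-" } else { "" };
    let a = v.abs();
    format!("{}{}.{:015}", sign, a / SCALE, a % SCALE)
}

fn main() {
    let first = -80 * SCALE;
    let step: i128 = 18_333_615_880_731_813; // 18.333615880731813 * 10^15
    // i = 378 and i = 385 are the first two lines that differ in the diff.
    for i in [378, 385] {
        println!("{}", format_scaled(term(first, step, i)));
    }
}
```

Under those assumptions this prints `6850.106802916625314` and `6978.442114081748005`, which are the values on the uutils side of the diff.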

@ghost
ghost commented Jan 22, 2024

I'll take on this issue (as a good first issue). Since uutils is more accurate, we'd have to degrade accuracy to mimic GNU coreutils, which would help with compatibility, but there are also cases where accuracy is better. I think the easier way forward would be to add a new flag, but should I have the more accurate version be specified by flag (and have GNU compatibility by default), or have the GNU compatibility be specified by flag?

@ghost
ghost commented Feb 16, 2024

Update after some late night readings.
GNU uses the long double data type, which is most often 10 bytes but can vary between systems. Since uutils uses a more precise data type, there will always be an accuracy discrepancy when comparing the raw output of the two tools. I thought about updating GNU, but that would be more difficult, because it would more or less require rewriting many functions to work with a new custom data type. Alternatively, we can modify uutils to have a flag that selects a long double data type as defined by the system. The issue is that the result would still depend on the compiler implementation, even on the same system.
I think I'm going to add a flag that lets the user degrade performance but match GNU, then find out the size of a long double on that system with that compiler, and use that information accordingly.
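To get a feel for why the width of long double matters at 15 printed decimals, here is a rough sketch that estimates the spacing between adjacent representable values (one ULP) near 6850 for a few significand widths. The widths are assumptions about common formats (53 bits for IEEE double, 64 for x87 extended, 113 for IEEE quad), not something measured on the systems discussed here, and `ulp` is a hypothetical helper.

```rust
/// Approximate spacing between adjacent representable values near `value`
/// for a binary floating-point format with `precision_bits` significand bits
/// (counting the leading bit). Hypothetical helper for illustration.
fn ulp(value: f64, precision_bits: i32) -> f64 {
    let exponent = value.abs().log2().floor() as i32;
    2f64.powi(exponent - (precision_bits - 1))
}

fn main() {
    let value = 6850.106802916625314_f64;
    // Assumed significand widths for three common formats.
    for (name, bits) in [("IEEE double", 53), ("x87 extended", 64), ("IEEE quad", 113)] {
        println!("{name}: ulp near {value:.3} is about {:e}", ulp(value, bits));
    }
}
```

With 15 decimals printed, the output resolution is 1e-15. Under these assumptions the 64-bit significand has a spacing of roughly 4.4e-16 in this range, which is just below that resolution, so a small rounding error could plausibly flip the last printed digit. A 53-bit significand has a spacing of roughly 9e-13, which would disturb several trailing digits.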

@drinkcat
Contributor

Played a bit with coreutils on different architectures here: https://gist.github.com/drinkcat/04b193dce9429205db3a3ed8dca8a7e6

Difference matrix follows:

  • There are 450 lines in the files
  • arm (32-bit) gives out totally different output (just a few numbers at the beginning are the same, 443/450 are different...)
  • 16 differences between amd64 coreutils and rust version (as we know from above).
  • The Rust implementation on amd64 is consistent with arm64-v8, mips64le, ppc64le and s390x coreutils.
X linux-386 linux-amd64 linux-amd64(RUST) linux-arm64-v8 linux-arm-v5 linux-arm-v7 linux-mips64le linux-ppc64le linux-s390x
linux-386 0 0 16 16 443 443 16 16 16
linux-amd64 0 0 16 16 443 443 16 16 16
linux-amd64-rust 16 16 0 0 443 443 0 0 0
linux-arm64-v8 16 16 0 0 443 443 0 0 0
linux-arm-v5 443 443 443 443 0 0 443 443 443
linux-arm-v7 443 443 443 443 0 0 443 443 443
linux-mips64le 16 16 0 0 443 443 0 0 0
linux-ppc64le 16 16 0 0 443 443 0 0 0
linux-s390x 16 16 0 0 443 443 0 0 0

I'm not sure what the objective is in terms of exact compatibility, but given that coreutils isn't consistent across architectures, I wonder if this is worth fixing, or if this is just something we can accept (and that no one would ever notice...).
