cmd/go,cmd/link: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing on LUCI gotip-darwin-amd64-longtest builder due to non-reproducible LC_UUID #64947

Closed
prattmic opened this issue Jan 3, 2024 · 52 comments
Labels: compiler/runtime, GoCommand, NeedsFix, OS-Darwin

@prattmic
Member

prattmic commented Jan 3, 2024

(It is unclear to me if this is an issue with the test, cmd/go, the compiler/linker, or the builder itself)

Example failure: https://ci.chromium.org/ui/inv/build-8759926960216361809/test-results?sortby=&groupby=

Both tests are failing because they aren't getting a reproducible build.

    script_test.go:156: FAIL: testdata/script/build_issue48319.txt:29: cmp -q main.exe main1.exe: main.exe and main1.exe differ
    script_test.go:156: FAIL: testdata/script/build_plugin_reproducible.txt:6: cmp -q a.so b.so: a.so and b.so differ

I haven't yet been able to reproduce on a gomote because the LUCI gomote setup doesn't currently set up Xcode properly, so cgo doesn't work (which these tests require).

cc @bcmills @dmitshur @mknyszek @cagedmantis

@prattmic added the Builders, NeedsInvestigation, and GoCommand labels Jan 3, 2024
@prattmic
Member Author

prattmic commented Jan 3, 2024

Workaround to get Xcode on a gomote:

Note: Depending on which machine you get, the mac_toolchain binary referenced below may be at either /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain or /Volumes/Work/s/w/ir/tools/bin/mac_toolchain.

$ gomote run mpratt-gotip-darwin-amd64-longtest-0 /bin/mkdir /tmp/xcode
$ gomote run mpratt-gotip-darwin-amd64-longtest-0 /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain install -xcode-version 15a240d -output-dir /tmp/xcode/Xcode.app
$ gomote run mpratt-gotip-darwin-amd64-longtest-0 /usr/bin/sudo xcode-select --switch /tmp/xcode/Xcode.app

@bcmills
Contributor

bcmills commented Jan 3, 2024

Might have something to do with code-signing? (But then why aren't those tests failing on the darwin-amd64-longtest legacy TryBots too?)

@prattmic
Member Author

prattmic commented Jan 3, 2024

With Xcode installed, this (thankfully) does reproduce (no pun intended):

$ gomote run mpratt-gotip-darwin-amd64-longtest-0 ./go/bin/go test -run=TestScript/build_plugin_reproducible -v cmd/go  
# Streaming results from "mpratt-gotip-darwin-amd64-longtest-0" to "/tmp/gomote2019819704/mpratt-gotip-darwin-amd64-longtest-0.stdout"...
=== RUN   TestScript
vcs-test.golang.org rerouted to http://127.0.0.1:50941
https://vcs-test.golang.org rerouted to https://127.0.0.1:50942
go test proxy running at GOPROXY=http://127.0.0.1:50943/mod
=== RUN   TestScript/build_plugin_reproducible
=== PAUSE TestScript/build_plugin_reproducible
=== CONT  TestScript/build_plugin_reproducible
    script_test.go:132: 2024-01-03T19:36:00Z
    script_test.go:134: $WORK=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168
    script_test.go:156: 
        PATH=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/testbin:/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/go/bin:/Users/swarming/.swarming/w/ir/tools/bin:/Users/swarming/.swarming/cipd_cache/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
        HOME=/no-home
        CCACHE_DISABLE=1
        GOARCH=amd64
        TESTGO_GOHOSTARCH=amd64
        GOCACHE=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/gocache
        GOCOVERDIR=
        GODEBUG=
        GOEXE=
        GOEXPERIMENT=
        GOOS=darwin
        TESTGO_GOHOSTOS=darwin
        GOPROXY=http://127.0.0.1:50943/mod
        GOPRIVATE=
        GOROOT=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/go
        GOROOT_FINAL=
        GOTRACEBACK=system
        TESTGO_GOROOT=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/go
        TESTGO_EXE=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/testbin/go
        TESTGO_VCSTEST_HOST=127.0.0.1:50941
        TESTGO_VCSTEST_TLS_HOST=127.0.0.1:50942
        TESTGO_VCSTEST_CERT=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/vcstest63272679/cert.pem
        TESTGONETWORK=panic
        GOSUMDB=localhost.localdev/sumdb+00000c67+AcTrnkbUA+TU4heY3hkjiSES/DSQniBqIeQ/YppAUtK6
        GONOPROXY=
        GONOSUMDB=
        GOVCS=*:all
        devnull=/dev/null
        goversion=1.22
        CMDGO_TEST_RUN_MAIN=true
        HGRCPATH=
        GOTOOLCHAIN=auto
        newline=
        
        WORK=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168
        TMPDIR=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168/tmp
        GOPATH=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168/gopath
        PWD=/Users/swarming/.swarming/w/itsy7ss432/workdir-swarming-task/tmp/cmd-go-test-555795669/tmpdir1260424131/build_plugin_reproducible1539278168/gopath/src
        
        > [!buildmode:plugin] skip
        [condition not met]
        > [short] skip
        [condition not met]
        > go build -trimpath -buildvcs=false -buildmode=plugin -o a.so main.go
        > go build -trimpath -buildvcs=false -buildmode=plugin -o b.so main.go
        > cmp -q a.so b.so
    script_test.go:156: FAIL: testdata/script/build_plugin_reproducible.txt:6: cmp -q a.so b.so: a.so and b.so differ
--- FAIL: TestScript (0.10s)
    --- FAIL: TestScript/build_plugin_reproducible (8.78s)
FAIL
FAIL    cmd/go  9.089s
FAIL
# Wrote results from "mpratt-gotip-darwin-amd64-longtest-0" to "/tmp/gomote2019819704/mpratt-gotip-darwin-amd64-longtest-0.stdout".
Error running run: unable to execute ./go/bin/go: rpc error: code = Unknown desc = command execution failed: exit status 1

@prattmic
Member Author

prattmic commented Jan 3, 2024

Complete recipe:

Note: Depending on which machine you get, the mac_toolchain binary referenced below may be at either /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain or /Volumes/Work/s/w/ir/tools/bin/mac_toolchain.

$ export GOROOT=/home/prattmic/src/go/ # set to your GOROOT
$ export GOMOTELUCI=true
$ gomote create gotip-darwin-amd64-longtest
mpratt-gotip-darwin-amd64-longtest-1
$ export INSTANCE=mpratt-gotip-darwin-amd64-longtest-1
$ gomote run ${INSTANCE} /bin/mkdir /tmp/xcode
$ gomote run ${INSTANCE} /Users/swarming/.swarming/w/ir/tools/bin/mac_toolchain install -xcode-version 15a240d -output-dir /tmp/xcode/Xcode.app
$ gomote run ${INSTANCE} /usr/bin/sudo xcode-select --switch /tmp/xcode/Xcode.app
$ gomote push ${INSTANCE}
$ gomote run ${INSTANCE} ./go/src/make.bash
$ gomote run ${INSTANCE} ./go/bin/go test -run=TestScript/build_plugin_reproducible -v cmd/go

@prattmic
Member Author

prattmic commented Jan 3, 2024

The only differences between a.so and b.so are something near the beginning of the file (still investigating) and the Go Build ID:

diff -C 5 a.hex b.hex
*** a.hex       Wed Jan  3 12:03:42 2024
--- b.hex       Wed Jan  3 12:03:47 2024
***************
*** 117,128 ****
  00000740: 0b00 0000 5000 0000 0000 0000 a70b 0000  ....P...........
  00000750: a70b 0000 5603 0000 fd0e 0000 3300 0000  ....V.......3...
  00000760: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000770: 0000 0000 0000 0000 6048 1700 5801 0000  ........`H..X...
  00000780: 0000 0000 0000 0000 0000 0000 0000 0000  ................
! 00000790: 1b00 0000 1800 0000 edfb 2d7d ab6e 374d  ..........-}.n7M
! 000007a0: 8eba 7c75 012c c264 3200 0000 2000 0000  ..|u.,.d2... ...
  000007b0: 0100 0000 0000 0e00 0000 0e00 0100 0000  ................
  000007c0: 0300 0000 0007 f703 2a00 0000 1000 0000  ........*.......
  000007d0: 0000 0000 0000 0000 0c00 0000 3800 0000  ............8...
  000007e0: 1800 0000 0200 0000 0000 3805 0000 0100  ..........8.....
  000007f0: 2f75 7372 2f6c 6962 2f6c 6962 5379 7374  /usr/lib/libSyst
--- 117,128 ----
  00000740: 0b00 0000 5000 0000 0000 0000 a70b 0000  ....P...........
  00000750: a70b 0000 5603 0000 fd0e 0000 3300 0000  ....V.......3...
  00000760: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  00000770: 0000 0000 0000 0000 6048 1700 5801 0000  ........`H..X...
  00000780: 0000 0000 0000 0000 0000 0000 0000 0000  ................
! 00000790: 1b00 0000 1800 0000 f0cb 7393 b5bd 3b76  ..........s...;v
! 000007a0: 9fb5 5f03 dd32 4c8a 3200 0000 2000 0000  .._..2L.2... ...
  000007b0: 0100 0000 0000 0e00 0000 0e00 0100 0000  ................
  000007c0: 0300 0000 0007 f703 2a00 0000 1000 0000  ........*.......
  000007d0: 0000 0000 0000 0000 0c00 0000 3800 0000  ............8...
  000007e0: 1800 0000 0200 0000 0000 3805 0000 0100  ..........8.....
  000007f0: 2f75 7372 2f6c 6962 2f6c 6962 5379 7374  /usr/lib/libSyst
***************
*** 916,928 ****
  00003930: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  00003940: ff20 476f 2062 7569 6c64 2049 443a 2022  . Go build ID: "
  00003950: 4c42 3648 7a64 376b 6c31 6258 726e 7948  LB6Hzd7kl1bXrnyH
  00003960: 697a 5859 2f70 2d7a 3839 4146 354e 6136  izXY/p-z89AF5Na6
  00003970: 6f31 736e 4466 704a 682f 3644 745f 4f44  o1snDfpJh/6Dt_OD
! 00003980: 4769 7571 452d 7652 4e52 7831 5878 2f66  GiuqE-vRNRx1Xx/f
! 00003990: 4735 6f64 7563 424e 6d4f 7053 6455 4e51  G5oducBNmOpSdUNQ
! 000039a0: 7861 5522 0a20 ffcc cccc cccc cccc cccc  xaU". ..........
  000039b0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  000039c0: 5548 89e5 4883 ec10 4c8b 3dd1 4607 0049  UH..H...L.=.F..I
  000039d0: 8b4f 084c 8b3d c646 0700 498b 170f 1f00  .O.L.=.F..I.....
  000039e0: 4839 c87d 1b73 3948 c1e0 0448 8b0c 0248  H9.}.s9H...H...H
  000039f0: 8b5c 0208 4889 c848 83c4 105d c30f 1f00  .\..H..H...]....
--- 916,928 ----
  00003930: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  00003940: ff20 476f 2062 7569 6c64 2049 443a 2022  . Go build ID: "
  00003950: 4c42 3648 7a64 376b 6c31 6258 726e 7948  LB6Hzd7kl1bXrnyH
  00003960: 697a 5859 2f70 2d7a 3839 4146 354e 6136  izXY/p-z89AF5Na6
  00003970: 6f31 736e 4466 704a 682f 3644 745f 4f44  o1snDfpJh/6Dt_OD
! 00003980: 4769 7571 452d 7652 4e52 7831 5878 2f6d  GiuqE-vRNRx1Xx/m
! 00003990: 436f 5971 6470 5854 386a 7a54 6e64 6d4f  CoYqdpXT8jzTndmO
! 000039a0: 3038 5022 0a20 ffcc cccc cccc cccc cccc  08P". ..........
  000039b0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
  000039c0: 5548 89e5 4883 ec10 4c8b 3dd1 4607 0049  UH..H...L.=.F..I
  000039d0: 8b4f 084c 8b3d c646 0700 498b 170f 1f00  .O.L.=.F..I.....
  000039e0: 4839 c87d 1b73 3948 c1e0 0448 8b0c 0248  H9.}.s9H...H...H
  000039f0: 8b5c 0208 4889 c848 83c4 105d c30f 1f00  .\..H..H...]....

@prattmic
Member Author

prattmic commented Jan 3, 2024

Based on the otool output, it looks like this other component is the LC_UUID value: EDFB2D7D-AB6E-374D-8EBA-7C75012CC264 vs F0CB7393-B5BD-3B76-9FB5-5F03DD324C8A.

I don't know Mach-O very well, but it seems that this is just another build ID...
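
The mapping between the raw bytes in the hex dump and the UUID that otool prints can be checked by hand. In the dump above, the load command starts at 0x790 with `1b00 0000` (command 0x1b, LC_UUID) and `1800 0000` (size 24), so the 16 UUID bytes start at 0x798. A minimal sketch follows; the helper names `read_hex` and `format_uuid` are hypothetical, and the fixed offset is an assumption that holds only for this particular file.

```shell
# Hypothetical helper: print COUNT bytes of FILE, starting at byte OFFSET,
# as one lowercase hex string.
#   usage: read_hex FILE OFFSET COUNT
read_hex() {
    od -A n -t x1 -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# Hypothetical helper: turn 32 hex digits into the 8-4-4-4-12 uppercase
# form that otool and llvm-objdump print for LC_UUID.
format_uuid() {
    printf '%s\n' "$1" | tr 'a-f' 'A-F' |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/'
}
```

With these, `format_uuid "$(read_hex a.so 0x798 16)"` would be expected to reproduce the first UUID quoted above for that a.so.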

@bcmills
Contributor

bcmills commented Jan 3, 2024

Huh. See also https://bugs.chromium.org/p/chromium/issues/detail?id=1068970. 😵‍💫

@prattmic
Member Author

prattmic commented Jan 3, 2024

The output of GODEBUG=gocachehash=1 is identical for both builds.

@bcmills
Contributor

bcmills commented Jan 3, 2024

Yeah, looks like the LC_UUID depends on at least the last component of the output file path:
https://github.com/apple-opensource/ld64/blame/e28c028b20af187a16a7161d89e91868a450cadc/src/ld/OutputFile.cpp#L3724-L3733

I'm not sure what _options.buildContextName() is derived from.
Looks like maybe the RC_RELEASE environment variable?
(https://github.com/apple-opensource/ld64/blob/e28c028b20af187a16a7161d89e91868a450cadc/src/ld/Options.cpp#L4529-L4530C30)

@prattmic
Member Author

prattmic commented Jan 3, 2024

Thanks for the reference! Looking at the go build -x output, the last few steps are:

GOROOT_FINAL='$GOROOT' /Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/link -o a.out.so -importcfg $WORK/b001/importcfg.link -installsuffix dynlink -pluginpath plugin/unnamed-bf82aa353b25c4b8a6ab19fdb37f3d07a25be28e -buildmode=plugin -buildid=LB6Hzd7kl1bXrnyHizXY/p-z89AF5Na6o1snDfpJh/6Dt_ODGiuqE-vRNRx1Xx/LB6Hzd7kl1bXrnyHizXY -extld=clang $WORK/b001/_pkg_.a
/Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/buildid -w $WORK/b001/exe/a.out.so # internal
mv $WORK/b001/exe/a.out.so b.so

Running the link step (first line) multiple times, even without changing the path, yields a .so with different LC_UUID each time:

$ /Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/link -o a.out.so -importcfg $WORK/b001/importcfg.link -installsuffix dynlink -pluginpath plugin/unnamed-bf82aa353b25c4b8a6ab19fdb37f3d07a25be28e -buildmode=plugin -buildid=LB6Hzd7kl1bXrnyHizXY/p-z89AF5Na6o1snDfpJh/6Dt_ODGiuqE-vRNRx1Xx/LB6Hzd7kl1bXrnyHizXY -extld=clang $WORK/b001/_pkg_.a
$ shasum5.30 a.out.so 
9029867749bbecd942c0037526ababa6f0d83932  a.out.so
$ /Volumes/Work/s/w/it_9mfvcff/workdir-swarming-task/go/pkg/tool/darwin_amd64/link -o a.out.so -importcfg $WORK/b001/importcfg.link -installsuffix dynlink -pluginpath plugin/unnamed-bf82aa353b25c4b8a6ab19fdb37f3d07a25be28e -buildmode=plugin -buildid=LB6Hzd7kl1bXrnyHizXY/p-z89AF5Na6o1snDfpJh/6Dt_ODGiuqE-vRNRx1Xx/LB6Hzd7kl1bXrnyHizXY -extld=clang $WORK/b001/_pkg_.a
$ shasum5.30 a.out.so
d76b438421660ffd24d4ca06dc30b3150b0b9fee  a.out.so

Diffing the binary shows that the UUID is the only difference (Go Build ID is identical presumably because I'm not running the buildid command).
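
Repeating a link step and comparing checksums by eye gets tedious after a few runs. The loop can be wrapped as below. This is a minimal sketch: `count_distinct_outputs` is a hypothetical name, it uses POSIX `cksum` rather than `shasum5.30`, and it assumes the command writes its result to the given output path on every run.

```shell
# Hypothetical helper: run a build command RUNS times and print how many
# distinct checksums the output file OUT took. "1" means every run produced
# identical bytes.
#   usage: count_distinct_outputs OUT RUNS CMD [ARGS...]
count_distinct_outputs() {
    out=$1
    runs=$2
    shift 2
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1    # rebuild; the result is expected at $out
        cksum < "$out"          # prints "<crc> <size>" with no file name
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}
```

For the case above, the command would be the full `link -o a.out.so ...` line and OUT would be `a.out.so`; any result greater than 1 indicates a non-reproducible link.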

@prattmic
Member Author

prattmic commented Jan 3, 2024

It doesn't seem to be related to the file paths. cmd/link invokes clang like so:

host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" "-dynamiclib" "-o" "a.out.so" "-Qunused-arguments" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/go.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000000.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000001.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000002.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000003.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000004.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000005.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000006.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000007.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000008.o" "/Volumes/Work/s/w/it_9mfvcff/go-link-202179431/000009.o" "-O2" "-g" "-lpthread"

The go-link-202179431 path component changes each iteration, but this can be forced to be the same with -tmpdir /tmp/tmp:

host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" "-dynamiclib" "-o" "a.out.so" "-Qunused-arguments" "/tmp/tmp/go.o" "/tmp/tmp/000000.o" "/tmp/tmp/000001.o" "/tmp/tmp/000002.o" "/tmp/tmp/000003.o" "/tmp/tmp/000004.o" "/tmp/tmp/000005.o" "/tmp/tmp/000006.o" "/tmp/tmp/000007.o" "/tmp/tmp/000008.o" "/tmp/tmp/000009.o" "-O2" "-g" "-lpthread"

Even with identical paths each time we get different output.

@bcmills
Contributor

bcmills commented Jan 3, 2024

@prattmic
Member Author

prattmic commented Jan 3, 2024

Perhaps, but I'd like to better understand what is happening. Plus it seems like some users may want the UUID, as Chrome did.

FWIW, the clang command consistently generates identical output from the same inputs. It seems it is the output of dsymutil that is differing:

host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load" "-dynamiclib" "-o" "a.out.so" "-Qunused-arguments" "/tmp/tmp2/go.o" "/tmp/tmp2/000000.o" "/tmp/tmp2/000001.o" "/tmp/tmp2/000002.o" "/tmp/tmp2/000003.o" "/tmp/tmp2/000004.o" "/tmp/tmp2/000005.o" "/tmp/tmp2/000006.o" "/tmp/tmp2/000007.o" "/tmp/tmp2/000008.o" "/tmp/tmp2/000009.o" "-O2" "-g" "-lpthread"
host link dsymutil: "/tmp/xcode/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/dsymutil" "-f" "a.out.so" "-o" "/tmp/tmp2/go.dwarf"
host link strip: "/tmp/xcode/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/strip" "-S" "a.out.so"
$ for f in /tmp/tmp/*; do shasum5.30 $f; done
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp/000000.o
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp/000001.o
c85aae746d1a1f270a1cea350b377d1b5f9ff376  /tmp/tmp/000002.o
a4741bd785ce981348a908880917a43074de02a7  /tmp/tmp/000003.o
2050abddb774ad2600414caa5596608d5260424c  /tmp/tmp/000004.o
160fbd77666d9949ef8b8fa502ad1e665388dad3  /tmp/tmp/000005.o
0c4db5947508c4a8c035484c4ef97182efb572e1  /tmp/tmp/000006.o
e2724d0e3997897da123884a7dd2496d094bf45e  /tmp/tmp/000007.o
311978e69c428fbfc059f92eb48ae0a8e3e19d80  /tmp/tmp/000008.o
5fc69d54547370cba80b9d5272b62db3126ff85f  /tmp/tmp/000009.o
aa2a253f8abd8c84c3ef27ddfc9c12bc1481f277  /tmp/tmp/go.dwarf
480e9721586f5e764d59826677acff5d6bbe3588  /tmp/tmp/go.o
556b5a818027717b0399b1e94ba268ff147c932e  /tmp/tmp/trivial.c
$ for f in /tmp/tmp2/*; do shasum5.30 $f; done
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp2/000000.o
60cc7e733a36962e7bc73ee38291e6f37fca8272  /tmp/tmp2/000001.o
c85aae746d1a1f270a1cea350b377d1b5f9ff376  /tmp/tmp2/000002.o
a4741bd785ce981348a908880917a43074de02a7  /tmp/tmp2/000003.o
2050abddb774ad2600414caa5596608d5260424c  /tmp/tmp2/000004.o
160fbd77666d9949ef8b8fa502ad1e665388dad3  /tmp/tmp2/000005.o
0c4db5947508c4a8c035484c4ef97182efb572e1  /tmp/tmp2/000006.o
e2724d0e3997897da123884a7dd2496d094bf45e  /tmp/tmp2/000007.o
311978e69c428fbfc059f92eb48ae0a8e3e19d80  /tmp/tmp2/000008.o
5fc69d54547370cba80b9d5272b62db3126ff85f  /tmp/tmp2/000009.o
3c3cf42192f5f6427f05b2dceca7c5733e6f1721  /tmp/tmp2/go.dwarf
480e9721586f5e764d59826677acff5d6bbe3588  /tmp/tmp2/go.o
556b5a818027717b0399b1e94ba268ff147c932e  /tmp/tmp2/trivial.c

(go.dwarf differs)

Edit: I'm not 100% certain about dsymutil being at fault here, as I can't seem to reproduce the non-reproducibility when running clang + dsymutil manually.
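
Rather than scanning two checksum listings for the odd line out, the two temporary directories can be compared file by file. A minimal sketch, assuming both directories were populated by `-tmpdir` as above; `differing_files` is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of every file in DIR1 whose contents
# differ from the same-named file in DIR2 (or that is missing from DIR2).
#   usage: differing_files DIR1 DIR2
differing_files() {
    for f in "$1"/*; do
        name=$(basename "$f")
        cmp -s "$f" "$2/$name" 2>/dev/null || printf '%s\n' "$name"
    done
}
```

For the listings above, `differing_files /tmp/tmp /tmp/tmp2` would be expected to report only go.dwarf.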

@prattmic
Member Author

prattmic commented Jan 3, 2024

cc @thanm see #64947 (comment) for reproducer instructions

@bcmills changed the title from "cmd/go: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing continuously on LUCI gotip-darwin-amd64-longtest builder" to "cmd/go: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing on LUCI gotip-darwin-amd64-longtest builder due to non-reproducible LC_UUID" Jan 3, 2024
@bcmills added the compiler/runtime label Jan 3, 2024
@bcmills changed the title from "cmd/go: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing on LUCI gotip-darwin-amd64-longtest builder due to non-reproducible LC_UUID" to "cmd/go,cmd/link: TestScript/build_issue48319 and TestScript/build_plugin_reproducible failing on LUCI gotip-darwin-amd64-longtest builder due to non-reproducible LC_UUID" Jan 3, 2024
@thanm
Contributor

thanm commented Jan 4, 2024

I spent a little while looking at this. What's weird is that the actual DWARF in the two go.dwarf files is identical; what is different is (again) the UUID. E.g.

$  llvm-dwarfdump-16 xxx/tmpdir1/go.dwarf > dw1.txt
$  llvm-dwarfdump-16 xxx/tmpdir2/go.dwarf > dw2.txt
$ diff dw1.txt dw2.txt
1c1
< xxx/tmpdir1/go.dwarf:	file format Mach-O 64-bit x86-64
---
> xxx/tmpdir2/go.dwarf:	file format Mach-O 64-bit x86-64
$
$ llvm-objdump-16 --macho --all-headers xxx/tmpdir1/go.dwarf > h1.txt
$ llvm-objdump-16 --macho --all-headers xxx/tmpdir2/go.dwarf > h2.txt
$ diff h1.txt h2.txt
1c1
< xxx/tmpdir1/go.dwarf:
---
> xxx/tmpdir2/go.dwarf:
3880c3880
<     uuid 3BA8085B-DD85-312C-B9AD-2CEDAE928E62
---
>     uuid E559C1A0-DDFF-3BD3-8CD8-7652DC367F9F
$

So basically what seems to be happening is that dsymutil is generating a different uuid each time and embedding it into the go.dwarf file, in spite of the fact that the dwarf is the same, hmm.

I will spend a little time digging into the dsymutil source code, maybe I can find out more.
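
When only the UUID matters, it can be pulled out of the saved header dumps directly instead of diffing whole files. A minimal sketch, assuming the dump contains a line of the form `uuid <value>` as in the diff above; the helper names `uuid_of_dump` and `same_uuid` are hypothetical.

```shell
# Hypothetical helper: print the UUID recorded in a saved
# "llvm-objdump --macho --all-headers" dump.
uuid_of_dump() {
    awk '$1 == "uuid" { print $2 }' "$1"
}

# Hypothetical helper: succeed only if two saved dumps carry the same UUID.
same_uuid() {
    [ "$(uuid_of_dump "$1")" = "$(uuid_of_dump "$2")" ]
}
```

For example, `same_uuid h1.txt h2.txt || echo "UUID differs"` would flag the pair of dumps shown above.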

@prattmic
Member Author

prattmic commented Jan 4, 2024

FWIW, the version of Xcode we're installing is 15.0.0. I peeked at the release notes for 15.0.1 and 15.1 and nothing stood out as a fix for this kind of issue, but I'll see if we can get a different version to test.

@thanm
Contributor

thanm commented Jan 4, 2024

OK (duh) in fact dsymutil is just faithfully copying the uuid from its input, so the problem here is that clang is generating a different uuid. I'll look into the clang source code instead.

@prattmic
Member Author

prattmic commented Jan 4, 2024

FWIW, it looks like there are more versions of Xcode available to try out, though I haven't tested them:

  • mac_toolchain install -xcode-version 15a240d: 15.0
  • mac_toolchain install -xcode-version 15A507: 15.0.1
  • mac_toolchain install -xcode-version 15C65: 15.1
  • mac_toolchain install -xcode-version 15C5500c: 15.2 (beta, I guess)

@thanm
Contributor

thanm commented Jan 4, 2024

OK, I think I am making some progress here. For a while I thought this might be an ld-prime problem, but that turned out to be a red herring. In fact it looks like it is a bit simpler than that.

Running the link with -ldflags=-v -tmpdir=/tmp/tmp I see

# command-line-arguments
HEADER = -H1 -T0x1001000 -R0x1000
host link: "clang" "-arch" "x86_64" "-m64" "-Wl,-headerpad,1144" "-Wl,-flat_namespace" "-Wl,-bind_at_load"
 "-dynamiclib" "-o" "/Users/swarming/.swarming/w/itvprlhos9/workdir-swarming-task/tmp/go-build2833294421/b001/exe/a.out.so" "-Qunused-arguments" "/tmp/tmp/go.o" "/tmp/tmp/000000.o" 
"/tmp/tmp/000001.o" "/tmp/tmp/000002.o" "/tmp/tmp/000003.o" "/tmp/tmp/000004.o" "/tmp/tmp/000005.o" 
"/tmp/tmp/000006.o" "/tmp/tmp/000007.o" "/tmp/tmp/000008.o" "-O2" "-g" "-lpthread" "-ld64"

Note the "-o" output, which incorporates the go build dir go-build2833294421, which is going to vary from build to build. The problem is that this is being incorporated into the dynamic info in the a.out.so output, e.g. from the output of llvm-objdump-16 --macho --all-headers I see:

Load command 4
          cmd LC_ID_DYLIB
      cmdsize 128
         name /Users/swarming/.swarming/w/itvprlhos9/workdir-swarming-task/tmp/go-build1516003297/b001/exe/a.out.so (offset 24)
   time stamp 1 Thu Jan  1 00:00:01 1970
      current version 0.0.0
compatibility version 0.0.0

and the external linker is almost certainly going to hash this section when creating the build ID.

Not sure what the best approach is to fix this. Also not sure why we aren't seeing similar problems with the older gomotes (I will spin one up and compare).

@bcmills
Contributor

bcmills commented Jan 4, 2024

@thanm, that sounds very similar to an existing reproducibility workaround here:
https://cs.opensource.google/go/go/+/master:src/cmd/go/internal/work/gc.go;l=649-663;drc=66b8107a26e515bbe19855d358bdf12bd6326347

Perhaps we need to extend that workaround to more build modes, or take a similar approach when running other commands?

@thanm
Contributor

thanm commented Jan 4, 2024

Well phooey, I am afraid I've had a Homer Simpson moment here.

My gomote expired, and I created a new one, but when I started using the new one I didn't update the PATH setting in my script, so it wasn't picking up the correct version of Go. It looks like with LUCI gomotes the location of GOROOT is slightly different each time:

bindir from my first gomote: "/Users/swarming/.swarming/w/ituz4dfd04/workdir-swarming-task/go/bin"
bindir from my second gomote: "/Users/swarming/.swarming/w/itvprlhos9/workdir-swarming-task/go/bin"

Oh well, a learning experience I suppose.

That explains why I was not picking up Cherry's fix (https://go-review.googlesource.com/c/go/+/478196, which extends the workaround that you mention, Bryan).

Now I'm back to seeing only a difference in the UUID.
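
Because the work directory changes with every LUCI gomote, a script can guard against this kind of stale PATH before running anything. A minimal sketch; `tool_is_from` is a hypothetical helper, and the bindir passed to it is whatever the current gomote reports.

```shell
# Hypothetical helper: succeed only if TOOL resolves, via PATH, to an
# executable directly inside DIR.
#   usage: tool_is_from TOOL DIR
tool_is_from() {
    case "$(command -v "$1")" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

A script could then start with something like `tool_is_from go "$BINDIR" || exit 1`, where `BINDIR` is an assumed variable holding the expected go/bin directory.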

@thanm
Contributor

thanm commented Jan 4, 2024

One more important bit of info: the problem goes away if I build with -extldflags=-ld_classic, meaning that this may be another thing we can add to the long list of problems that crop up with "ld-prime" (e.g. issue #61229).

Looking at the setup we have on our old-style gomotes I see:

$ gomote run `cat mote.txt` softwareupdate --history

Display Name                                       Version    Date                  
------------                                       -------    ----                  
Command Line Tools for Xcode                       14.0       11/07/2022, 16:16:24  
Command Line Tools for Xcode                       14.1       11/07/2022, 16:16:24

i.e. command line tools, not a complete Xcode installation. For the new LUCI gomotes we are obviously using a full Xcode install, and we're using version 15, which defaults to ld-prime.

@thanm
Contributor

thanm commented Jan 4, 2024

> FWIW, it looks like there are more versions of Xcode available to try out, though I haven't tested them:
>
>   • mac_toolchain install -xcode-version 15a240d: 15.0
>   • mac_toolchain install -xcode-version 15A507: 15.0.1
>   • mac_toolchain install -xcode-version 15C65: 15.1
>   • mac_toolchain install -xcode-version 15C5500c: 15.2 (beta, I guess)

I just tested the most recent one (15.2) and it appears to have the same problem. Hmph.

@thanm
Contributor

thanm commented Jan 4, 2024

OK, one more update. I can reproduce the problem with just the C compiler, and what I think must be going on is that the name of the output file is being incorporated into the UUID. If I do:

$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o a.so example.cpp
$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o b.so example.cpp
$ llvm-objdump-16 --macho --all-headers a.so > ash.txt
$ llvm-objdump-16 --macho --all-headers b.so > bsh.txt

then I see a difference, whereas if I instead do

$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o a.so example.cpp
$ mv a.so b.so
$ clang -arch x86_64 -m64 -Wl,-headerpad,1144 -Wl,-flat_namespace -Wl,-bind_at_load -dynamiclib -o a.so example.cpp
$ llvm-objdump-16 --macho --all-headers a.so > ash.txt
$ llvm-objdump-16 --macho --all-headers b.so > bsh.txt

The UUIDs are the same (the only thing different in the second example is that both builds target a.so).

How would we feel about changing the test in question to target the same filename? Or does the current ld-prime behavior not really meet our criteria for reproducible builds?
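
Why an identical input can still yield two IDs is easy to see with a toy model. The sketch below is an illustration only and is not the algorithm that ld-prime or ld64 uses: it assumes, as hypothesized above, that the ID is a hash over the file contents plus the last component of the output path. `toy_uuid` is a hypothetical name and `cksum` stands in for whatever hash the real linker uses.

```shell
# Toy model (assumption, NOT the real linker algorithm): derive an ID from
# the contents of CONTENT_FILE plus the base name of OUTPUT_PATH.
#   usage: toy_uuid CONTENT_FILE OUTPUT_PATH
toy_uuid() {
    { cat "$1"; basename "$2"; } | cksum | cut -d ' ' -f 1
}
```

Under this model, `toy_uuid obj /x/a.so` and `toy_uuid obj /y/a.so` agree while `toy_uuid obj /x/b.so` does not, which matches the observation that building to the same name and renaming afterwards gives matching UUIDs.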

@cherrymui removed this from Test Flakes Jan 30, 2024
@dmitshur modified the milestones: Backlog, Go1.23 Feb 11, 2024
@prattmic
Member Author

Should we add skips for these tests while we figure out how to work around this?

@thanm
Contributor

thanm commented Feb 20, 2024

> Should we add skips for these tests while we figure out how to work around this?

Probably a good idea. I sent a CL (565376).

@gopherbot
Contributor

Change https://go.dev/cl/565376 mentions this issue: cmd/go/testdata/script: add darwin skips for selected buildrepro tests

gopherbot pushed a commit that referenced this issue Feb 20, 2024
Skip two build reproducibility tests (build_issue48319 and
build_plugin_reproducible) on Darwin if GO_BUILDER_NAME is set until
issue 64947 can be resolved; on the LUCI darwin longtest builder the
more contemporary version of Xcode is doing things that are unfriendly
to Go's build reproducibility.

For #64947.

Change-Id: Iebd433ad6dfeb84b6504ae9355231d897d8ae174
Reviewed-on: https://go-review.googlesource.com/c/go/+/565376
Reviewed-by: Cherry Mui <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
@bcmills
Contributor

bcmills commented Mar 14, 2024

@gopherbot
Contributor

Change https://go.dev/cl/584937 mentions this issue: cmd/go/testdata/script: disable build_plugin_reproducible on darwin

@matloob
Contributor

matloob commented May 10, 2024

@thanm Should we backport CL 565376 to Go 1.22? The Go 1.22 darwin/amd64 builders seem to have been broken for a while.

@dmitshur
Contributor

Please do. Given there isn't a better fix available, let's do that to get GOOS=darwin builders passing on release branches. Thanks.

@matloob
Contributor

matloob commented May 10, 2024

@gopherbot please create a backport issue for Go 1.22

@gopherbot
Contributor

Backport issue(s) opened: #67314 (for 1.22).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@dmitshur
Contributor

dmitshur commented May 10, 2024

Comment #64947 (comment) suggested trying to set ZERO_AR_DATE=1 in the environment. I gave it a shot, and it in fact worked for me locally with the Xcode version I had installed: the test was passing with that change, and failing without it. However, it turned out not to be enough to get the test passing on the LUCI builder (perhaps because of the difference in Xcode version); see CL 574895 and the trybot run on it.

So ZERO_AR_DATE may still be relevant, but it appears not to be enough on its own.
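
Setting the variable for a single build, rather than exporting it for the whole session, might look like the sketch below. `with_zero_ar_date` is a hypothetical wrapper; whether the variable has any effect depends on the installed Xcode toolchain, as noted above.

```shell
# Hypothetical wrapper: run a command with ZERO_AR_DATE=1 set only for that
# command, leaving the calling shell's environment untouched.
#   usage: with_zero_ar_date CMD [ARGS...]
with_zero_ar_date() {
    env ZERO_AR_DATE=1 "$@"
}
```

For example, `with_zero_ar_date go build -trimpath -buildvcs=false -buildmode=plugin -o a.so main.go` runs the plugin build from the test with the variable set.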

@dmitshur removed the Builders label May 10, 2024
@gopherbot
Contributor

Change https://go.dev/cl/584238 mentions this issue: [release-branch.go1.22] cmd/go/testdata/script: add darwin skips for selected buildrepro tests

gopherbot pushed a commit that referenced this issue May 11, 2024
It's broken with the latest XCode versions, and is also already disabled
on darwin builders. Disable the test to get go test cmd/go working on
local builds again.

For #64947

Change-Id: I5a4b46cf23cbe887df4903f90b54cd2225f51233
Reviewed-on: https://go-review.googlesource.com/c/go/+/584937
Reviewed-by: Dmitri Shuralyov <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
gopherbot pushed a commit that referenced this issue May 13, 2024
…selected buildrepro tests

Skip two build reproducibility tests (build_issue48319 and
build_plugin_reproducible) on Darwin if GO_BUILDER_NAME is set until
issue 64947 can be resolved; on the LUCI darwin longtest builder the
more contemporary version of Xcode is doing things that are unfriendly
to Go's build reproducibility.

For #64947.
Fixes #67314

Change-Id: Iebd433ad6dfeb84b6504ae9355231d897d8ae174
Reviewed-on: https://go-review.googlesource.com/c/go/+/565376
Reviewed-by: Cherry Mui <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
(cherry picked from commit 53708d8)
Reviewed-on: https://go-review.googlesource.com/c/go/+/584238
Reviewed-by: Than McIntosh <[email protected]>
Auto-Submit: Dmitri Shuralyov <[email protected]>
TryBot-Bypass: Dmitri Shuralyov <[email protected]>
@gopherbot
Contributor

Change https://go.dev/cl/585356 mentions this issue: cmd/go/testdata/script: turn back on build_plugin_reproducible script test

@gopherbot
Contributor

Change https://go.dev/cl/585355 mentions this issue: cmd/link/internal/ld: clean tmpdir obj timestamps

gopherbot pushed a commit that referenced this issue May 14, 2024
This patch changes the Go linker to "clean" (reset to Unix epoch) the
timestamps on object files copied to the tmpdir that is presented to
the external linker or archive tool. The intent is to improve build
reproducibility on Darwin, where later versions of Xcode seem to want
to incorporate object file timestamps into the hash used for the final
build ID (which precludes the possibility of having reproducible Go
builds). Credit for this idea goes to Cherry (see
#64947 (comment)).

Updates #64947.

Change-Id: I2eb7dddff538e247122b04fdcf8a57c923f61201
Reviewed-on: https://go-review.googlesource.com/c/go/+/585355
Reviewed-by: Cherry Mui <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
@thanm
Contributor

thanm commented May 16, 2024

OK, latest go-round: following CL 585355 if I run this sequence of commands (essentially the guts of the plugin repro test) on the gotip-darwin-amd64-longtest gomote (with Xcode 15E204a):

    rm -f a.so b.so

    # First build
    rm -rf /tmp/tmp
    mkdir /tmp/tmp
    go build -trimpath -buildvcs=false -buildmode=plugin -o a.so \
        -ldflags=-tmpdir=/tmp/tmp main.go 1> err1.txt 2>&1

    # Second build
    rm -rf /tmp/tmp
    mkdir /tmp/tmp
    go build -trimpath -buildvcs=false -buildmode=plugin -o b.so \
        -ldflags=-tmpdir=/tmp/tmp main.go 1> err2.txt 2>&1

    # Compare
    cmp a.so b.so

I can execute this 50, 100, 200 times and it works every time. If I remove the -ldflags=-tmpdir=/tmp/tmp flag from the build, then the compare works most of the time but fails maybe once every 10 or 20 iterations.

So, the working theory: the external linker sometimes (not always, and it is unclear why) takes the paths of the object files it is given into account when it generates the UUID.

At this point I am leaning towards falling back on Cherry's second suggestion: run the external linker and then just stomp on the UUID in the generated binary.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/586079 mentions this issue: cmd/link/internal/ld: rewrite LC_UUID for darwin external links

gopherbot pushed a commit that referenced this issue May 21, 2024
When building Go binaries using external linking, rewrite the LC_UUID
Mach-O load command to replace the content placed there by the external
linker, so as to ensure that we get reproducible builds.

Updates #64947.

Change-Id: I263a89d1a067807404febbc801d4dade33bc3288
Reviewed-on: https://go-review.googlesource.com/c/go/+/586079
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
@github-project-automation github-project-automation bot moved this from In Progress to Done in Go Compiler / Runtime May 21, 2024
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 21, 2024