
sbcl: Disable failing doCheck on aarch64-darwin #359214

Closed


pcboy
Contributor

@pcboy pcboy commented Nov 26, 2024

sbcl is failing its test phase on aarch64-darwin.

It seems it was already failing on x86_64-darwin, since the derivation contains this comment:

  # Tests on ofBorg’s x86_64-darwin platforms are so unstable that a random one
  # will fail every other run. There’s a deeper problem here; we might as well
  # disable them entirely so at least the other platforms get to benefit from
  # testing.
  doCheck = stdenv.hostPlatform.system != "x86_64-darwin"; 

The tests fail on aarch64-darwin with:

error: builder for '/nix/store/kgd0bni68zzrya0r87kn3qfdl074sgjn-sbcl-2.4.4.drv' failed with exit code 1;
       last 25 log lines:
       >  Expected failure:   debug.impure.lisp / (TRACE COMPILER-MACRO REDEFINED)
       >  Expected failure:   debug.impure.lisp / (TRACE WHEREIN ENCAPSULATE NIL)
       >  Expected failure:   debug.impure.lisp / (TRACE WHEREIN RECURSIVE ENCAPSULATE NIL)
       >  Expected failure:   debug.impure.lisp / (TRACE MACRO)
       >  Expected failure:   debug.impure.lisp / (TRACE LABELS WITHIN-MACRO)
       >  Expected failure:   debug.impure.lisp / (TRACE MACRO REDEFINED)
       >  Expected failure:   debug.impure.lisp / (TRACE CAS)
       >  Expected failure:   debug.impure.lisp / (TRACE CAS GENERIC)
       >  Expected failure:   debug.impure.lisp / (TRACE SETF)
       >  Expected failure:   full-eval.impure.lisp / INLINE-FUN-CAPTURES-DECL
       >  Expected failure:   gc.impure.lisp / M-A-O-THREADLOCALLY-PRECISE
       >  Expected failure:   gc.impure.lisp / PIN-ALL-CODE-WITH-GC-ENABLED
       >  Skipped (broken):   gc.impure.lisp / CODE-ITERATION-FAST
       >  Unexpected success: gc.impure.lisp / PAGE-PROTECTED-P
       >  Expected failure:   gc.impure.lisp / ROSPACE-STRINGS
       >  Expected failure:   packages.impure.lisp / USE-PACKAGE-CONFLICT-SET
       >  Expected failure:   packages.impure.lisp / IMPORT-SINGLE-CONFLICT
       >  Skipped (broken):   run-program.impure.lisp / (RUN-PROGRAM AUTOCLOSE-STREAMS)
       >  Expected failure:   traceroot.impure.lisp / (SEARCH-ROOTS STACK-INDIRECT)
       >  Expected failure:   traceroot.impure.lisp / TRACEROOT-COLLAPSE-LISTS
       >  Expected failure:   traceroot.impure.lisp / (SEARCH-ROOTS SIMPLE-FUN)
       >  Expected failure:   traceroot.impure.lisp / SEARCH-FOR-SYMBOL-NAME
       >  Invalid exit status: run-program.test.sh
       >  (70 tests skipped for this combination of platform and features)
       > test failed, expected 104 return code, got 1
       For full logs, run 'nix log /nix/store/kgd0bni68zzrya0r87kn3qfdl074sgjn-sbcl-2.4.4.drv'.

Running nix-shell -p 'sbcl.overrideAttrs { doCheck = false; }' on that aarch64-darwin machine worked.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@pcboy pcboy marked this pull request as ready for review November 26, 2024 08:27
@ofborg ofborg bot added the 6.topic: darwin Running or building packages on Darwin label Nov 27, 2024
@ofborg ofborg bot requested review from Uthar, hraban, lukego, nagy and 7c6f434c November 27, 2024 01:06
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Nov 27, 2024
@hraban
Member

hraban commented Nov 27, 2024

error: builder for '/nix/store/kgd0bni68zzrya0r87kn3qfdl074sgjn-sbcl-2.4.4.drv' failed with exit code 1;

Why is it building 2.4.4? Is this on Hydra somewhere? master only has 2.4.{6,9,10} as far as I can see.

@pcboy
Contributor Author

pcboy commented Nov 27, 2024

@hraban Good catch.

Indeed, it seems 2.4.10 on master currently passes its tests on aarch64-darwin. Sorry for the false alarm.

@pcboy pcboy closed this Nov 27, 2024
@hraban
Member

hraban commented Nov 27, 2024

I figured out where this comes from: nixos-24.05 has v2.4.4. I could reproduce the bug locally:

;;; Smoke tests: PASS
Unhandled SIMPLE-ERROR in thread #<SB-THREAD:THREAD tid=259 "main thread" RUNNING
                                    {7004AF0143}>:
  The assertion
  (EQUAL STRING "FEEFIE=foefum
")
  failed with STRING = "FEEFIE=foefum
__CF_USER_TEXT_ENCODING=0x15F:0:0
".

This is new; I'm certain github:nixos/nixpkgs/nixos-24.05#sbcl used to work fine. Not sure what changed, and why it doesn't show up on nixpkgs-unstable! @Uthar do you perhaps recognize this at all?

@hraban
Member

hraban commented Nov 27, 2024

FWIW, the upstream test is at https://github.com/sbcl/sbcl/blob/master/tests/run-program.test.sh#L51-L56 and it hasn't changed in quite a while.
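The failing assertion in the log above compares a child process's environment against a single expected variable. As a rough illustration only (this is not the upstream SBCL test, and the function name `check_child_env` is hypothetical), the same idea can be sketched in POSIX shell: start `env` with an otherwise empty environment plus `FEEFIE=foefum`, and check that the child reports exactly that one line.

```shell
#!/bin/sh
# Rough approximation of the assertion in the log above; NOT the upstream
# SBCL test, just the same idea expressed in POSIX shell.
# Start a child `env` with an otherwise empty environment plus one variable,
# then check that the child reports exactly that one variable.
check_child_env() {
  env_bin=$(command -v env)   # absolute path, so lookup doesn't depend on the cleared PATH
  expected='FEEFIE=foefum'
  actual=$(env -i FEEFIE=foefum "$env_bin")
  if [ "$actual" = "$expected" ]; then
    echo PASS
  else
    # Any variable injected into the child (for example
    # __CF_USER_TEXT_ENCODING=...) would appear here as an extra line.
    printf 'FAIL, child environment was:\n%s\n' "$actual"
    return 1
  fi
}

check_child_env
```

On a system that adds nothing to a cleared environment this prints `PASS`; an extra injected line, like the one in the SBCL log, makes the comparison fail in the same way the Lisp `EQUAL` assertion did.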

@Uthar
Contributor

Uthar commented Nov 27, 2024

Hey @hraban, unfortunately I don't recognize this issue. I only remember SBCL sometimes breaking after certain macOS updates, but that was on a client's machine, and in that case the error would probably be more spectacular. Anyway, maybe we could search sbcl-devel.

@hraban
Member

hraban commented Nov 27, 2024

According to https://superuser.com/questions/82123/mac-whats-cfusertextencoding-for it's an environment variable that macOS injects into processes if a file ~/.CFUserTextEncoding exists. The envvar has the format <userid in hex>:0:0, hence 0x15F = 351. I'm surprised we haven't run into this anywhere else...
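The hex-to-uid arithmetic above can be checked with a small POSIX shell helper. This is a sketch that only assumes the `<userid in hex>:0:0` format described in the comment; the function name `cf_encoding_uid` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: take a __CF_USER_TEXT_ENCODING value in the
# "<userid in hex>:0:0" form and print the uid field in decimal.
cf_encoding_uid() {
  uid_hex=${1%%:*}          # keep the text before the first colon, e.g. 0x15F
  printf '%d\n' "$uid_hex"  # printf reads a leading 0x as hexadecimal
}

cf_encoding_uid '0x15F:0:0'  # prints 351
```

Comparing the decoded value with `id -u` for the process that received the variable is one way to confirm it belongs to that user.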

@Uthar
Contributor

Uthar commented Nov 28, 2024

it's an environment variable that macOS injects into processes if a file ~/.CFUserTextEncoding exists.

Nasty!

@hraban
Member

hraban commented Nov 29, 2024

I ran a bisect and found a specific nixpkgs commit that introduced this bug in the --first-parent lineage of the 24.05 release branch:

23a55bf763106d8c15ddaf9cd60de023d41d7a2a is the first bad commit
commit 23a55bf763106d8c15ddaf9cd60de023d41d7a2a
Merge: ec96bdc34854 a4d32ff88947
Author: Vladimír Čunát <[email protected]>
Date:   Sun Jun 23 11:24:23 2024 +0200

    Merge #319254: staging-next-24.05 iteration 1

    ...into release-24.05

Something in that merge introduces whatever it is that starts triggering this bug. Online resources describe this behavior as existing for more than 10 years, so it's not some new Apple feature. I'm very curious why the same bug doesn't rear its head in 24.11 or master, or in other software (I guess sbcl is the only one with a unit test that checks the calling env?)

Very interesting bug!

EDIT: if someone is interested in reproducing this, here's how to use git bisect:

  1. find a known good commit
  2. find a known bad commit (in this case it was origin/nixos-24.05)
  3. find a command that you know triggers the bug, returning 0 if there is no bug and non-0 if there is (nix-build --no-out-link --expr 'let p = import ./. {} ; in p.sbcl.overrideAttrs { checkPhase = "(cd tests ; ./run-tests.sh -- run-program.test.sh)"; }')
  4. run:
    git bisect start --first-parent <bad> <good>
    git bisect run nix-build --no-out-link --expr 'let p = import ./. {} ; in p.sbcl.overrideAttrs { checkPhase = "(cd tests ; ./run-tests.sh -- run-program.test.sh)"; }'
    

nixpkgs uses a lot of merges, so you want to start with --first-parent so that bisect only tries commits that were actually on nixos-24.05, rather than intermediate commits from side branches.

@hraban
Member

hraban commented Nov 29, 2024

Related: I'm still not completely clear on how exactly Hydra releases work, but shouldn't this failing sbcl build somehow cause Hydra to fail to build 24.05 entirely? Apparently nixos-24.05 has been failing to build sbcl since June 23, and somehow that's fine? How can we get alerted to this? This part of nixpkgs is still kinda voodoo to me.

@hraban
Member

hraban commented Dec 2, 2024

I traced it to 57b36ea from #313773. @toonn, does this maybe ring a bell for you? The context here is that since that commit, a test in this compiler has started failing. It uses something like execve and checks that the child process's environment contains exactly the one variable passed to it, but since that commit the child gets an extra envvar, __CF_USER_TEXT_ENCODING=0x15F:0:0 (0x15F being 351, the uid of the nix builder process).

It doesn't seem to happen in nixpkgs-unstable, only on 24.05. I can't find much more about this envvar online, or about whether all software is supposed to simply accept that it may be passed in its env and ignore it. Is this opt-in? Is the test technically wrong on Darwin, or...?

I'm trying to figure out if this will at some point start failing in nixpkgs-unstable as well.

@hraban
Member

hraban commented Dec 2, 2024

Including @szlend, who, I now see, was the actual author of the patch.

@szlend
Member

szlend commented Dec 2, 2024

I can't say exactly why this happens, but the Darwin stdenv has been redesigned in 24.11/unstable, removing these hooks entirely. So I don't see this coming back all of a sudden.

7c6f434c pushed a commit that referenced this pull request Dec 8, 2024
See #359214 and in particular @szlend’s
comment.  Thanks to @pcboy for reporting.