Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3Packages.jax: fix libstdc++ mismatch when built with CUDA #225661

Merged
merged 2 commits into from
Apr 13, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions pkgs/development/compilers/cudatoolkit/extension.nix
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,14 @@ final: prev: let
# E.g. for cudaPackages_11_8 we use gcc11 with gcc12's libstdc++
# Cf. https://github.com/NixOS/nixpkgs/pull/218265 for context
backendStdenv = final.callPackage ./stdenv.nix {
nixpkgsStdenv = prev.pkgs.stdenv;
nvccCompatibleStdenv = prev.pkgs.buildPackages."${finalVersion.gcc}Stdenv";
# We use buildPackages (= pkgsBuildHost) because we look for a gcc that
# runs on our build platform, and that produces executables for the host
# platform (= platform on which we deploy and run the downstream packages).
# The target platform of buildPackages.gcc is our host platform, so its
# .lib output should be the libstdc++ we want to be writing in the runpaths
# Cf. https://github.com/NixOS/nixpkgs/pull/225661#discussion_r1164564576
nixpkgsCompatibleLibstdcxx = final.pkgs.buildPackages.gcc.cc.lib;
nvccCompatibleCC = final.pkgs.buildPackages."${finalVersion.gcc}".cc;
};

### Add classic cudatoolkit package
Expand Down
40 changes: 28 additions & 12 deletions pkgs/development/compilers/cudatoolkit/stdenv.nix
Original file line number Diff line number Diff line change
@@ -1,17 +1,33 @@
{ nixpkgsStdenv
, nvccCompatibleStdenv
{ lib
, nixpkgsCompatibleLibstdcxx
, nvccCompatibleCC
, overrideCC
, stdenv
, wrapCCWith
}:

overrideCC nixpkgsStdenv (wrapCCWith {
cc = nvccCompatibleStdenv.cc.cc;
let
cc = wrapCCWith
{
cc = nvccCompatibleCC;

# This option is for clang's libcxx, but we (ab)use it for gcc's libstdc++.
# Note that libstdc++ maintains forward-compatibility: if we load a newer
# libstdc++ into the process, we can still use libraries built against an
# older libstdc++. This, in practice, means that we should use libstdc++ from
# the same stdenv that the rest of nixpkgs uses.
# We currently do not try to support anything other than gcc and linux.
libcxx = nixpkgsCompatibleLibstdcxx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't think libcxx here has the semantics you're looking for.

nixpkgsCompatibleLibstdcxx is added to propagatedDeps but I think that's it; If you look in cc-wrapper, libcxx is only consulted for libc++ include paths when libcxx.isLLVM is present and true:

+ optionalString (libcxx != null || (useGccForLibs && gccForLibs.langCC or false) || (isGNU && cc.langCC or false)) ''
touch "$out/nix-support/libcxx-cxxflags"
touch "$out/nix-support/libcxx-ldflags"
''
+ optionalString (libcxx == null && (useGccForLibs && gccForLibs.langCC or false)) ''
for dir in ${gccForLibs}${lib.optionalString (hostPlatform != targetPlatform) "/${targetPlatform.config}"}/include/c++/*; do
echo "-isystem $dir" >> $out/nix-support/libcxx-cxxflags
done
for dir in ${gccForLibs}${lib.optionalString (hostPlatform != targetPlatform) "/${targetPlatform.config}"}/include/c++/*/${targetPlatform.config}; do
echo "-isystem $dir" >> $out/nix-support/libcxx-cxxflags
done
''
+ optionalString (libcxx.isLLVM or false) ''
echo "-isystem ${lib.getDev libcxx}/include/c++/v1" >> $out/nix-support/libcxx-cxxflags
echo "-stdlib=libc++" >> $out/nix-support/libcxx-ldflags
echo "-l${libcxx.cxxabi.libName}" >> $out/nix-support/libcxx-ldflags
''

Even if nixpkgsCompatibleLibstdcxx.isLLVM were present and true this wouldn't work as intended; the logic for libcxx in cc-wrapper (i.e. -stdlib, cxxabi, include/c++/v1) really is specific to clang and libc++.

Further, setting libcxx does not inhibit cc-wrapper from tacking on the gcc library paths where the cc's libraries (i.e. libstdc++ and friends) to cflags/ldflags.


Unfortunately I don't think cc-wrapper as it exists today has the right knobs to support this use case (cc = gcc but libstdc++ from a different gcc) but I think we can get pretty close with useCcForLibs and gccForLibs = gccFromWhichYouWantToGrabLibstdcxx. Here's a quick PoC (imperfect -- it seems to lose libc headers somehow -- but hopefully gets the point across): dc6a8f9

Note that with the commit above, using "${np.cudaPackages.backendStdenv.cc}/bin/g++ to compile binaries results in them being linked against libstdc++ from gcc12 (observable by running ldd on the binaries). With the PR as it exists right now, such binaries are still linked against libstdc++ from backendStdenv.cc.cc.

Depending on what the requirement here is (just a newer libstdc++? compiler builtins like libgcc_s too? etc) getting cc-wrapper to support this use case might make more sense than adding link options for the desired libstdc++ to packages in an ad-hoc way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't think libcxx here has the semantics you're looking for.

Yes, I was almost sure about that when I added the original line

nixpkgsCompatibleLibstdcxx is added to propagatedDeps but I think that's it

Indeed! And by sheer accident this has been sufficient for most of our broken cuda packages. This is why took a blind eye on the obvious erroneousness of the current "approach" and hastily merged #223664

I think it shouldn't be hard, though, to develop this into a correctly working stdenv while keeping all of the interfaces intact

Even if nixpkgsCompatibleLibstdcxx.isLLVM were present and true this wouldn't work as intended; this logic ... is specific to clang

I suspected so. Again, effectively we've only relied on the propagated build inputs so far, but we definitely should keep it this way for long

Unfortunately I don't think cc-wrapper as it exists today has the right knobs to support this use case

I was afraid of that. I think it would make sense to extend cc-wrapper for our use-case

Here's a quick PoC ... dc6a8f9

Oh wow, thank you so much! I think this is exactly the direction we should be heading in!

With the PR as it exists right now, such binaries are still linked against libstdc++ from backendStdenv.cc.cc

Thankfully, backendStdenv.cc.cc does not really link to its .lib and happily picks up any libstdc++ we put in downstream package's buildInputs... unless they are built with Bazel

But, as I admit above, this behaviour is accidental and we must not rely on it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on what the requirement here is (just a newer libstdc++? compiler builtins like libgcc_s too? etc) getting cc-wrapper to support this use case might make more sense than adding link options for the desired libstdc++ to packages in an ad-hoc way.

At this point, I think we only need to override what c++ stdlib gets written into runpaths of downstream derivations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And by sheer accident this has been sufficient for most of our broken cuda packages. This is why took a blind eye on the obvious erroneousness of the current "approach" and hastily merged #223664

I think it shouldn't be hard, though, to develop this into a correctly working stdenv while keeping all of the interfaces intact

I think it would make sense to extend cc-wrapper for our use-case

But, as I admit above, this behaviour is accidental and we must not rely on it

Ah okay, thanks for clarifying; this is good to hear 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thankfully, backendStdenv.cc.cc does not really link to its .lib and happily picks up any libstdc++ we put in downstream package's buildInputs... unless they are built with Bazel

Oh, interesting...

In general I'd think that your propagated deps would make their way into the linkopts that Bazel uses (propagatedDeps should get picked up by the bintools stdenv hooks and added to NIX_LDFLAGS which buildBazelPackage should map to linkopts).

Maybe it's an ordering thing? I think normally things specified in the NIX_LDFLAGS env var would have precedence over flags in libc-ldflags; with Bazel's linkopts this wouldn't be the case.

Easiest way to test would be to add NIX_LDFLAGS as an action_env in the Bazel options I think.

};
cudaStdenv = overrideCC stdenv cc;
passthruExtra = {
inherit nixpkgsCompatibleLibstdcxx;
# cc already exposed
};
assertCondition = true;
in
lib.extendDerivation
assertCondition
passthruExtra
cudaStdenv

# This option is for clang's libcxx, but we (ab)use it for gcc's libstdc++.
# Note that libstdc++ maintains forward-compatibility: if we load a newer
# libstdc++ into the process, we can still use libraries built against an
# older libstdc++. This, in practice, means that we should use libstdc++ from
# the same stdenv that the rest of nixpkgs uses.
# We currently do not try to support anything other than gcc and linux.
libcxx = nixpkgsStdenv.cc.cc.lib;
})
5 changes: 3 additions & 2 deletions pkgs/development/python-modules/jaxlib/default.nix
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@
}:

let
inherit (cudaPackages) cudatoolkit cudaFlags cudnn nccl;
inherit (cudaPackages) backendStdenv cudatoolkit cudaFlags cudnn nccl;

pname = "jaxlib";
version = "0.3.22";
Expand Down Expand Up @@ -81,7 +81,7 @@ let
cudatoolkit_cc_joined = symlinkJoin {
name = "${cudatoolkit.cc.name}-merged";
paths = [
cudatoolkit.cc
backendStdenv.cc
binutils.bintools # for ar, dwp, nm, objcopy, objdump, strip
];
};
Expand Down Expand Up @@ -271,6 +271,7 @@ let
sed -i 's@include/pybind11@pybind11@g' $src
done
'' + lib.optionalString cudaSupport ''
export NIX_LDFLAGS+=" -L${backendStdenv.nixpkgsCompatibleLibstdcxx}/lib"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the order matter here? I'm assuming stdenv's (undesirable) libstdc++ will also be in NIX_LDFLAGS somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any extra -L in NIX_LDFLAGS, but maybe I should quickly check it by adding an echo somewhere

patchShebangs ../output/external/org_tensorflow/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
'' + lib.optionalString stdenv.isDarwin ''
# Framework search paths aren't added by bintools hook
Expand Down