Linking libllvm-19 before using libllvmlite (currently requiring libllvm-15) leads to segmentation faults. #99

Open
1 task done
timostrunk opened this issue Jan 31, 2025 · 8 comments · May be fixed by #100

@timostrunk

timostrunk commented Jan 31, 2025

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

Disclaimer:

Part of this issue was discussed in #72 for libllvm-14 and libllvm-15, and most of it in #84, where @bschindler suggested this might be a symbol isolation bug in libllvm. Because this concerns newer versions and I want it to be discoverable, I opened a new issue.

Summary:

Linking libllvm-19 before using libllvmlite (currently requiring libllvm-15) leads to segmentation faults.

Because llvmlite is not statically linked against libLLVM, having both libllvm19 and libllvm15 (currently required by llvmlite) dynamically loaded into the same process leads to segmentation faults in llvmlite. As a result, the most recent pytorch and numba cannot be installed and used together without segmentation faults.
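
To check whether a process really has two libLLVM copies mapped, one option on Linux is to scan /proc/self/maps. The sketch below is not part of llvmlite or numba; the helper names llvm_libs_in_maps and loaded_llvm_libs are made up for illustration, and the file-name pattern assumes the naming seen in this issue (libLLVM-15.so, libLLVM.so.19.1).

```python
import re
from pathlib import Path

# Matches file names such as libLLVM-15.so or libLLVM.so.19.1.
# Case-sensitive on purpose, so libllvmlite.so is not matched.
_LLVM_LIB = re.compile(r"libLLVM[^/]*\.so[^/]*$")


def llvm_libs_in_maps(maps_text):
    """Return sorted, de-duplicated libLLVM paths found in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname is optional)
        fields = line.split(None, 5)
        if len(fields) == 6 and _LLVM_LIB.search(fields[5]):
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_llvm_libs():
    """Linux only (assumption): inspect the mappings of the current process."""
    return llvm_libs_in_maps(Path("/proc/self/maps").read_text())
```

Calling loaded_llvm_libs() right after the imports in a reproducer script lists one path per mapped LLVM library; more than one entry would be consistent with the clash described above.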

Reproduction

The following code segfaults when libLLVM-19 is preloaded:

from numba import jit

@jit(nopython=True)
def numba_segfault():
    return 0.0

print(numba_segfault())

Save it as only_numba.py and run it with:

LD_PRELOAD=$CONDA_PREFIX/lib/libLLVM-19.so python ./only_numba.py

You can create a new conda environment like this:

micromamba create --name=numba_seg numba libllvm19

The complete environment file is here:

name: numba_seg
channels:
- conda-forge
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_gnu
- bzip2=1.0.8=h4bc722e_7
- ca-certificates=2025.1.31=hbcca054_0
- ld_impl_linux-64=2.43=h712a8e2_2
- libblas=3.9.0=28_h59b9bed_openblas
- libcblas=3.9.0=28_he106b2a_openblas
- libexpat=2.6.4=h5888daf_0
- libffi=3.4.2=h7f98852_5
- libgcc=14.2.0=h77fa898_1
- libgcc-ng=14.2.0=h69a702a_1
- libgfortran=14.2.0=h69a702a_1
- libgfortran5=14.2.0=hd5240d6_1
- libgomp=14.2.0=h77fa898_1
- libiconv=1.17=hd590300_2
- liblapack=3.9.0=28_h7ac8fdf_openblas
- libllvm15=15.0.7=hb3ce162_4
- libllvm19=19.1.7=ha7bfdaf_1
- liblzma=5.6.3=hb9d3cd8_1
- libnsl=2.0.1=hd590300_0
- libopenblas=0.3.28=pthreads_h94d23a6_1
- libsqlite=3.48.0=hee588c1_1
- libstdcxx=14.2.0=hc0a3c3a_1
- libstdcxx-ng=14.2.0=h4852527_1
- libuuid=2.38.1=h0b41bf4_0
- libxcrypt=4.4.36=hd590300_1
- libxml2=2.13.5=h0d44e9d_1
- libzlib=1.3.1=hb9d3cd8_2
- llvmlite=0.44.0=py312h374181b_0
- ncurses=6.5=h2d0b736_3
- numba=0.61.0=py312h2e6246c_0
- numpy=2.1.3=py312h58c1407_0
- openssl=3.4.0=h7b32b05_1
- pip=25.0=pyh8b19718_0
- python=3.12.8=h9e4cc4f_1_cpython
- python_abi=3.12=5_cp312
- readline=8.2=h8228510_1
- setuptools=75.8.0=pyhff2d567_0
- tk=8.6.13=noxft_h4845f30_101
- tzdata=2025a=h78e105d_0
- wheel=0.45.1=pyhd8ed1ab_1
- zstd=1.5.6=ha6fb4c9_0

Installed packages

List of packages in environment: "/home/strunk/micromamba/envs/numba_seg"

  Name              Version    Build                 Channel    
──────────────────────────────────────────────────────────────────
  _libgcc_mutex     0.1        conda_forge           conda-forge
  _openmp_mutex     4.5        2_gnu                 conda-forge
  bzip2             1.0.8      h4bc722e_7            conda-forge
  ca-certificates   2025.1.31  hbcca054_0            conda-forge
  ld_impl_linux-64  2.43       h712a8e2_2            conda-forge
  libblas           3.9.0      28_h59b9bed_openblas  conda-forge
  libcblas          3.9.0      28_he106b2a_openblas  conda-forge
  libexpat          2.6.4      h5888daf_0            conda-forge
  libffi            3.4.2      h7f98852_5            conda-forge
  libgcc            14.2.0     h77fa898_1            conda-forge
  libgcc-ng         14.2.0     h69a702a_1            conda-forge
  libgfortran       14.2.0     h69a702a_1            conda-forge
  libgfortran5      14.2.0     hd5240d6_1            conda-forge
  libgomp           14.2.0     h77fa898_1            conda-forge
  libiconv          1.17       hd590300_2            conda-forge
  liblapack         3.9.0      28_h7ac8fdf_openblas  conda-forge
  libllvm15         15.0.7     hb3ce162_4            conda-forge
  libllvm19         19.1.7     ha7bfdaf_1            conda-forge
  liblzma           5.6.3      hb9d3cd8_1            conda-forge
  libnsl            2.0.1      hd590300_0            conda-forge
  libopenblas       0.3.28     pthreads_h94d23a6_1   conda-forge
  libsqlite         3.48.0     hee588c1_1            conda-forge
  libstdcxx         14.2.0     hc0a3c3a_1            conda-forge
  libstdcxx-ng      14.2.0     h4852527_1            conda-forge
  libuuid           2.38.1     h0b41bf4_0            conda-forge
  libxcrypt         4.4.36     hd590300_1            conda-forge
  libxml2           2.13.5     h0d44e9d_1            conda-forge
  libzlib           1.3.1      hb9d3cd8_2            conda-forge
  llvmlite          0.44.0     py312h374181b_0       conda-forge
  ncurses           6.5        h2d0b736_3            conda-forge
  numba             0.61.0     py312h2e6246c_0       conda-forge
  numpy             2.1.3      py312h58c1407_0       conda-forge
  openssl           3.4.0      h7b32b05_1            conda-forge
  pip               25.0       pyh8b19718_0          conda-forge
  python            3.12.8     h9e4cc4f_1_cpython    conda-forge
  python_abi        3.12       5_cp312               conda-forge
  readline          8.2        h8228510_1            conda-forge
  setuptools        75.8.0     pyhff2d567_0          conda-forge
  tk                8.6.13     noxft_h4845f30_101    conda-forge
  tzdata            2025a      h78e105d_0            conda-forge
  wheel             0.45.1     pyhd8ed1ab_1          conda-forge
  zstd              1.5.6      ha6fb4c9_0            conda-forge

Environment info

(numba_seg) [strunk@bastille numba]$ micromamba info

       libmamba version : 1.5.6
     micromamba version : 1.5.6
           curl version : libcurl/8.5.0 OpenSSL/3.2.0 zlib/1.2.13 zstd/1.5.5 libssh2/1.11.0 nghttp2/1.58.0
     libarchive version : libarchive 3.7.2 zlib/1.2.13 bz2lib/1.0.8 libzstd/1.5.5
       envs directories : /home/strunk/micromamba/envs
          package cache : /home/strunk/micromamba/pkgs
                          /home/strunk/.mamba/pkgs
            environment : numba_seg (active)
           env location : /home/strunk/micromamba/envs/numba_seg
      user config files : /home/strunk/.mambarc
 populated config files : /home/strunk/.condarc
       virtual packages : __unix=0=0
                          __linux=6.12.10=0
                          __glibc=2.40=0
                          __archspec=1=x86_64-v4
                          __cuda=12.7=0
               channels : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/nodefaults/linux-64
                          https://conda.anaconda.org/nodefaults/noarch
       base environment : /home/strunk/micromamba
               platform : linux-64
@timostrunk
Author

To provide more context: During the segfault, the libllvm-15 library calls into the libllvm-19 library:

==7229== Invalid read of size 4
==7229==    at 0x63303F5: AddNodeIDCustom(llvm::FoldingSetNodeID&, llvm::SDNode const*) (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM.so.19.1)
==7229==    by 0x1FFEFFAC7F: ???
==7229==    by 0x9D37F1BB: llvm::FoldingSetBase::GetOrInsertNode(llvm::FoldingSetBase::Node*, llvm::FoldingSetBase::FoldingSetInfo const&) (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM-15.so)
==7229==    by 0x9DE516A2: llvm::SelectionDAG::AddModifiedNodeToCSEMaps(llvm::SDNode*) (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM-15.so)
==7229==    by 0x9DE51BD2: llvm::SelectionDAG::ReplaceAllUsesOfValueWith(llvm::SDValue, llvm::SDValue) (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM-15.so)
==7229==    by 0x9DE8C5BF: llvm::SelectionDAGISel::MorphNode(llvm::SDNode*, unsigned int, llvm::SDVTList, llvm::ArrayRef<llvm::SDValue>, unsigned int) (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM-15.so)
==7229==    by 0x9DE8FD67: llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM-15.so)
==7229==    by 0xA0B2C067: (anonymous namespace)::X86DAGToDAGISel::Select(llvm::SDNode*) (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM-15.so)
==7229==    by 0x9DE89935: llvm::SelectionDAGISel::DoInstructionSelection() (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM-15.so)
==7229==    by 0x9DE93FA7: llvm::SelectionDAGISel::CodeGenAndEmitDAG() (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM-15.so)
==7229==    by 0x9DE96B11: llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM-15.so)
==7229==    by 0x9DE99299: llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) [clone .part.0] (in /home/strunk/micromamba/envs/numba_seg/lib/libLLVM-15.so)
==7229==  Address 0x10 is not stack'd, malloc'd or (recently) free'd

Does this mean the symbol isolation bug is due to AddNodeIDCustom being in both libraries?
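
One way to explore this question is to compare the dynamic symbols that the two libraries export. The sketch below is an illustration, not a diagnosis: it assumes binutils `nm` is on PATH, the helper names defined_symbols and shared_exports are hypothetical, and it does not establish whether AddNodeIDCustom itself is among the exported symbols.

```python
import subprocess


def defined_symbols(nm_output):
    """Parse `nm -D --defined-only` output into a set of (mangled) symbol names."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) >= 3:  # address, type letter, name
            names.add(parts[-1])
    return names


def shared_exports(lib_a, lib_b):
    """Names defined in the dynamic symbol tables of both shared libraries."""

    def run_nm(lib):
        # -D: dynamic symbols only; --defined-only: skip undefined references
        return subprocess.run(
            ["nm", "-D", "--defined-only", lib],
            check=True,
            capture_output=True,
            text=True,
        ).stdout

    return defined_symbols(run_nm(lib_a)) & defined_symbols(run_nm(lib_b))
```

For the environment in this issue, the two arguments would be the libLLVM-15.so and libLLVM.so.19.1 paths from the valgrind trace. A large overlap of unversioned names would make cross-library symbol resolution plausible, but would not prove it.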

@sognetic

sognetic commented Feb 12, 2025

I know pinging busy maintainers is pretty impolite, but given the huge amount of noise in the conda-forge GH project and the fact that this seems to be a direct consequence of the dynamic linking in conda, I'll still give it a try:
@jakirkham, any idea how this issue could be solved while keeping conda's dynamic linking (if that's really the cause)? Or any idea where one could get help with fixing this issue if it's not fixable here?

@h-vetinari
Member

This has been discussed at some length in #72; we used to have some segfaults with numba (through sparse) in pyarrow, which caused me to comment at the time (>1 year ago!)

Given the segfaults with pyarrow + sparse + numba + llvmlite when using shared libllvm, I think we should proceed with linking llvm statically here, as strongly suggested by @Hardcode84 - I feel we're not affording sufficient weight to the opinion of upstream maintainers here; it's fine to insist on our way as long as things work, but here they clearly don't.

Those objecting to linking statically would need to at least propose a solution that avoids those segfaults for arrow (link statically there?), but overall I find these segfaults to be pretty compelling evidence for making a CFEP-18 exception, because of course they could be happening in other circumstances as well.

This never happened though, and eventually the pyarrow segfaults went away, if only because arrow started using a newer llvm. In the meantime I see that other people are also running into this (#84).

I still think llvmlite should switch to linking libllvm statically, and anyone objecting to this needs to actually fix these issues in a reasonable timeframe rather than indefinitely blocking a by-now clearly justified fix.

@timostrunk timostrunk linked a pull request Feb 13, 2025 that will close this issue
5 tasks
@timostrunk
Author

I opened a PR to fix the issue: #100

@isuruf
Member

isuruf commented Feb 21, 2025

Does anyone have a simple reproducer without LD_PRELOAD?

@timostrunk
Author

timostrunk commented Feb 21, 2025

Sure. Create the environment with:

CONDA_OVERRIDE_CUDA=12.6 micromamba create --name=numba_torch python=3.12 pytorch=2.5.1 numba

Then run this script:

import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))


from numba import jit
import numpy as np

@jit(nopython=True) # Set "nopython" mode for best performance, equivalent to @njit
def zero(): # Function is compiled to machine code when called the first time
    return 0.0

print(zero())

This results in a segmentation fault.

The newest pytorch uses libllvm 20; with it, the same script only results in an error:

CONDA_OVERRIDE_CUDA=12.6 micromamba create --name=numba_torch python=3.12 pytorch=2.6.0 numba

File "/home/ctfy/micromamba/envs/numba_torch/lib/python3.12/site-packages/llvmlite/binding/module.py", line 115, in verify
    raise RuntimeError(str(outmsg))
RuntimeError: Attribute list does not match Module context!
AttributeList[
  { function => noinline }
]
void (i8*)* @NRT_incref
Attribute list does not match Module context!
AttributeList[
  { function => noinline }
]
void (i8*)* @NRT_decref

@timostrunk
Author

timostrunk commented Feb 21, 2025

And for full info: if you remove the CUDA override, it does not install a second libllvm and you have no segmentation fault, but rather just a 0.0 at the end (as expected).
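
A quick way to confirm which libLLVM files the solver actually installed is to list them in the environment's lib directory. This is a minimal sketch under the assumption that the libraries live in <prefix>/lib, as the paths in this issue suggest; llvm_libs_in_prefix is a hypothetical helper name.

```python
from pathlib import Path


def llvm_libs_in_prefix(prefix):
    """Return sorted libLLVM file names found in <prefix>/lib (symlinks included)."""
    lib_dir = Path(prefix) / "lib"
    return sorted(p.name for p in lib_dir.glob("libLLVM*.so*"))
```

Passing the value of CONDA_PREFIX for the active environment should show a single LLVM major version without the CUDA override and more than one with it, if the observation above holds. Symlinks and their targets both appear, so one version can produce several entries.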

@h-vetinari
Member

That sounds like it's coming from triton (it only appears in the CUDA variant of pytorch), which is the only reason why llvm20 shows up anywhere already.

Does anyone have a simple reproducer without LD_PRELOAD?

Building libarrow+pyarrow with LLVM 15 was also a reliable trigger.
