Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVX-512f intrinsics fail to compile with MemorySanitizer #957

Open
g2p opened this issue Nov 24, 2020 · 8 comments
Open

AVX-512f intrinsics fail to compile with MemorySanitizer #957

g2p opened this issue Nov 24, 2020 · 8 comments

Comments

@g2p
Copy link

g2p commented Nov 24, 2020

I found an issue when using the memory sanitizer (which requires rebuilding the standard library with extra flags). The compiler has problems generating code for _mm_cvt_roundss_u32 or _mm512_shuffle_ps.

I don't have a CPU supporting these, but enabling sanitizers does require linking everything due to warts in linkers and LLVM's coverage measurement runtime.

Here is a PR showing how to test this from the rust repository: rust-lang/rust#79382

Alternatively, it may be slightly faster to test it like this (also from a rust checkout):

time RUSTFLAGS_NOT_BOOTSTRAP="-Cpasses=sancov -Clink-dead-code -Zsanitizer=memory -C codegen-units=1" ./x.py build -i library/core 

The errors look like this:

LLVM ERROR: Cannot select: 0x7f96e9336bf0: v64i8 = X86ISD::PALIGNR 0x7f96e93296b0, 0x7f96e9381fe8, TargetConstant:i8<8>
  0x7f96e93296b0: v64i8,ch = load<(load 64 from %ir.2245)> 0x7f96f547cba8, 0x7f96e9337270, undef:i64
    0x7f96e9337270: i64 = xor 0x7f96e93298b8, 0x7f96e9279c58
      0x7f96e93298b8: i64,ch = CopyFromReg 0x7f96f547cba8, Register:i64 %1119
        0x7f96e93424a0: i64 = Register %1119
      0x7f96e9279c58: i64 = AssertZext 0x7f96e9272c00, ValueType:ch:i47
        0x7f96e9272c00: i64,ch = CopyFromReg 0x7f96f547cba8, Register:i64 %6
          0x7f96e927fad0: i64 = Register %6
    0x7f96e931fb58: i64 = undef
  0x7f96e9381fe8: v64i8 = bitcast 0x7f96e9344420
    0x7f96e9344420: v16i32,ch = CopyFromReg 0x7f96f547cba8, Register:v16i32 %782
   Compiling rustc-std-workspace-core v1.99.0 (/home/g2p/src/github.com/rust-lang/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/rustc-std-workspace-core)
      0x7f96e927a410: v16i32 = Register %782
  0x7f96e93820b8: i8 = TargetConstant<8>
In function: _ZN4core9core_arch3x867avx512f17_mm512_shuffle_ps17h2adad9c5dc64a280E

And

PHI node operands are not the same type as the result!
  %_msphi_s = phi i32 [ %42, %38 ], [ %35, %31 ], [ %28, %24 ], [ %21, %17 ], [ %14, %10 ]
in function _ZN4core9core_arch3x867avx512f19_mm_cvt_roundss_u3217h42028e7a281c0c10E
LLVM ERROR: Broken function found, compilation aborted!
@tmiasko
Copy link
Contributor

tmiasko commented Nov 24, 2020

Looks like a bug in MemorySanitizer instrumentation pass, I would recommend reporting upstream.

#include <immintrin.h>

unsigned test_mm_cvt_roundss_u32(__m128 __A) {
  return _mm_cvt_roundss_u32(__A, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
}
# Note clang has assertions disabled.
$ clang -march=skylake-avx512 a.c -emit-llvm -S -fsanitize=memory
$ opt a.ll
opt: a.ll:36:9: error: stored value and pointer type do not match
  store <4 x i32> %11, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8
# Note opt has assertions enabled.
$ clang -march=skylake-avx512 a.c -emit-llvm -S 
$ opt a.ll -msan
opt: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:2684: void {anonymous}::MemorySanitizerVisitor::handleVectorConvertIntrinsic(llvm::IntrinsicInst&, int): Assertion `CopyOp->getType() == I.getType()' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.

@g2p
Copy link
Author

g2p commented Nov 25, 2020

I can try to forward your reproducer, but for now I'm still waiting for Bugzilla account approval.

@tmiasko
Copy link
Contributor

tmiasko commented Nov 25, 2020

Oh, I forgot about those small roadblocks when reporting bugs to LLVM. I opened https://bugs.llvm.org/show_bug.cgi?id=48298.

@g2p g2p changed the title AVX-512f intrinsics fail to compile AVX-512f intrinsics fail to compile with MemorySanitizer Nov 25, 2020
@g2p
Copy link
Author

g2p commented Nov 27, 2020

Thanks, glad it got fixed already!
I was not able to get a similar minimal reduction for _mm512_shuffle_ps.
Because clang is a beast to build (the linker runs out of memory), I don't have a clang build with assertions enabled. The above commands seem to work with this in a.c

#include <immintrin.h>

__m512 test_mm512_shuffle_ps(__m512 __M, __m512 __V) {
  return _mm512_shuffle_ps(__M, __V, 8); 
}

I however get _mm512_shuffle_ps code generation errors with this overall test in a rust checkout:

RUSTFLAGS_NOT_BOOTSTRAP="-Cpasses=sancov -Clink-dead-code -Zsanitizer=memory -C codegen-units=1" ./x.py build -i library/core 

@bjorn3
Copy link
Member

bjorn3 commented Nov 27, 2020

Which clang version did you build? The one in https://github.com/rust-lang/llvm-project?

I however get _mm512_shuffle_ps code generation errors with this overall test in a rust checkout:

RUSTFLAGS_NOT_BOOTSTRAP="-Cpasses=sancov -Clink-dead-code -Zsanitizer=memory -C codegen-units=1" ./x.py build -i library/core 

That command builds the standard library with the bootstrap compiler, which I think currently uses an older version than master. You should try RUSTFLAGS="-Cpasses=sancov -Clink-dead-code -Zsanitizer=memory -C codegen-units=1" cargo +nightly build -Zbuild-std --target x86_64-unknown-linux-gnu I think. (assuming that you use x86_64-unknown-linux-gnu and have the rust-src component installed) (you already tried that in the rust PR that I only now saw)

@g2p
Copy link
Author

g2p commented Nov 27, 2020

First I cherry-picked the fix for the cvt functions from upstream llvm (and committed the llvm submodule in the rust repo), ran my "build core with instrumentation" test, which proved that _mm512_shuffle_ps needs an independent fix.

For more in-depth tests, and hopefully a reduction, I tried to get clang built from a rust checkout using

RUSTBUILD_FORCE_CLANG_BASED_TESTS=1 ./x.py build --stage 1

but ld kept getting killed by the OOM killer, and using LLD failed in a different way (ld.lld: error: asan_malloc_linux.cpp:(.debug_loc+0x222F70): has non-ABS relocation R_386_GOTOFF against symbol 'alloc_memory_for_dlsym').

@tmiasko
Copy link
Contributor

tmiasko commented Nov 28, 2020

#include <immintrin.h>

__m512 test_mm512_shuffle_ps(__m512 __M, __m512 __V) {
  return _mm512_shuffle_ps(__M, __V, 78); 
}
$ clang -cc1 -target-feature +avx512f -ffreestanding -triple x86_64-unknown-linux-gnu -x c a.c -internal-isystem /usr/lib64/clang/11.0.0/include -S -emit-obj -fsanitize=memory
clang: llvm/lib/Target/X86/X86ISelLowering.cpp:12493: llvm::SDValue lowerShuffleAsByteRotate(const llvm::SDLoc&, llvm::MVT, llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>, const llvm::X86Subtarget&, llvm::SelectionDAG&): Assertion `(!VT.is512BitVector() || Subtarget.hasBWI()) && "512-bit PALIGNR requires BWI instructions"' failed.

Reduced:

; ModuleID = 'a.c'
source_filename = "a.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: norecurse nounwind readnone
define <16 x i32> @shuffle(<16 x i32> %a, <16 x i32> %b) local_unnamed_addr #0 {
entry:
  %c = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 2, i32 3, i32 16, i32 17, i32 6, i32 7, i32 20, i32 21, i32 10, i32 11, i32 24, i32 25, i32 14, i32 15, i32 28, i32 29>
  ret <16 x i32> %c
}

attributes #0 = { norecurse nounwind readnone "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="512" "no-builtins" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 12.0.0 (https://github.com/llvm/llvm-project 530c69e90964444bc916d38b337105ab44f0961b)"}
$ llc a.ll
llc: llvm/lib/Target/X86/X86ISelLowering.cpp:12493: llvm::SDValue lowerShuffleAsByteRotate(const llvm::SDLoc&, llvm::MVT, llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>, const llvm::X86Subtarget&, llvm::SelectionDAG&): Assertion `(!VT.is512BitVector() || Subtarget.hasBWI()) && "512-bit PALIGNR requires BWI instructions"' failed.

@g2p
Copy link
Author

g2p commented Nov 28, 2020

Thanks! Forwarded: https://bugs.llvm.org/show_bug.cgi?id=48322

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants