[NVPTX] Fix crash caused by ComputePTXValueVTs #104524

justinfargnoli · 2024-08-15T23:38:44Z

When lowering return values from LLVM IR to SelectionDAG, we check that the number of values SelectionDAG tells us to return is equal to the number of values that ComputePTXValueVTs() tells us to return. However, this check can fail on valid IR. For example:

define <6 x half> @foo() {
  ret <6 x half> zeroinitializer
}

ComputePTXValueVTs() tells us to return 3 v2f16 values, while SelectionDAG tells us to return 6 f16 values. Thus, the compiler will crash.

ComputePTXValueVTs() supports all half-element vectors with an even number of elements, whereas SelectionDAG only supports power-of-2 sized vectors. This is the root of the discrepancy.

Assuming that the developers who added the code to ComputePTXValueVTs() overlooked this, I've restricted ComputePTXValueVTs() to compute the same number of return values as SelectionDAG, instead of extending SelectionDAG to support non-power-of-2 sized vectors.
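
To make the mismatch concrete, here is a minimal sketch of the half-vector condition before and after this change. It is standalone C++ rather than the actual LLVM code: `isPowerOf2`, `halfPartsBefore`, and `halfPartsAfter` are hypothetical names used only for illustration, and the sketch assumes that a vector which does not take the v2f16 path is returned as one scalar per element (which is what the 6 f16 values in the example above suggest).

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::isPowerOf2_32: true for 1, 2, 4, 8, ...
constexpr bool isPowerOf2(std::uint32_t N) {
  return N != 0 && (N & (N - 1)) == 0;
}

// Number of return values computed for <NumElts x half> BEFORE the patch:
// any even element count is packed into v2f16 pairs.
// Assumption: everything else falls through to one scalar per element.
constexpr unsigned halfPartsBefore(unsigned NumElts) {
  return (NumElts % 2 == 0) ? NumElts / 2 : NumElts;
}

// Number of return values AFTER the patch: only even, power-of-2 element
// counts are packed into v2f16 pairs.
constexpr unsigned halfPartsAfter(unsigned NumElts) {
  return (NumElts % 2 == 0 && isPowerOf2(NumElts)) ? NumElts / 2 : NumElts;
}

// <6 x half>: 3 v2f16 values before, 6 f16 values after.
static_assert(halfPartsBefore(6) == 3, "old logic packs <6 x half> into 3 pairs");
static_assert(halfPartsAfter(6) == 6, "new logic returns 6 scalars");
// <8 x half> is a power of 2, so it is still packed into 4 pairs.
static_assert(halfPartsAfter(8) == 4, "power-of-2 sizes keep the v2f16 path");
```

Under these assumptions, the two functions only disagree for even element counts that are not powers of 2 (6, 10, 14, ...), which are the half-vector cases the new tests exercise.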

llvmbot · 2024-08-15T23:39:19Z

@llvm/pr-subscribers-backend-nvptx

Author: Justin Fargnoli (justinfargnoli)

Changes

When lowering return values from LLVM IR to SelectionDAG, we check that the number of values SelectionDAG tells us to return is equal to the number of values that ComputePTXValueVTs() tells us to return. However, this check can fail on valid IR. For example:

define <6 x half> @foo() {
  ret <6 x half> zeroinitializer
}

ComputePTXValueVTs() tells us to return 3 v2f16 values, while SelectionDAG tells us to return 6 f16 values.

ComputePTXValueVTs() supports all half-element vectors with an even number of elements, whereas SelectionDAG only supports power-of-2 sized vectors.

Assuming that the developers who added the code to ComputePTXValueVTs() overlooked this, I've restricted ComputePTXValueVTs() to compute the same number of return values as SelectionDAG instead of extending SelectionDAG to support non-power-of-2 vector sizes.

Full diff: https://github.com/llvm/llvm-project/pull/104524.diff

2 Files Affected:

(modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+7-5)
(added) llvm/test/CodeGen/NVPTX/compute-ptx-value-vts.ll (+45)

diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 43a3fbf4d1306a..878c792a9b06ca 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -207,10 +207,11 @@ static void ComputePTXValueVTs(const TargetLowering &TLI, const DataLayout &DL,
     if (VT.isVector()) {
       unsigned NumElts = VT.getVectorNumElements();
       EVT EltVT = VT.getVectorElementType();
-      // Vectors with an even number of f16 elements will be passed to
-      // us as an array of v2f16/v2bf16 elements. We must match this so we
-      // stay in sync with Ins/Outs.
-      if ((Is16bitsType(EltVT.getSimpleVT())) && NumElts % 2 == 0) {
+      if ((Is16bitsType(EltVT.getSimpleVT())) && NumElts % 2 == 0 &&
+          isPowerOf2_32(NumElts)) {
+        // Vectors with an even number of f16 elements will be passed to
+        // us as an array of v2f16/v2bf16 elements. We must match this so we
+        // stay in sync with Ins/Outs.
         switch (EltVT.getSimpleVT().SimpleTy) {
         case MVT::f16:
           EltVT = MVT::v2f16;
@@ -226,7 +227,8 @@ static void ComputePTXValueVTs(const TargetLowering &TLI, const DataLayout &DL,
         }
         NumElts /= 2;
       } else if (EltVT.getSimpleVT() == MVT::i8 &&
-                 (NumElts % 4 == 0 || NumElts == 3)) {
+                 ((NumElts % 4 == 0 && isPowerOf2_32(NumElts)) ||
+                  NumElts == 3)) {
         // v*i8 are formally lowered as v4i8
         EltVT = MVT::v4i8;
         NumElts = (NumElts + 3) / 4;
diff --git a/llvm/test/CodeGen/NVPTX/compute-ptx-value-vts.ll b/llvm/test/CodeGen/NVPTX/compute-ptx-value-vts.ll
new file mode 100644
index 00000000000000..4c56b5fb5a34c3
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/compute-ptx-value-vts.ll
@@ -0,0 +1,45 @@
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_20
+
+define <6 x half> @half6() {
+  ret <6 x half> zeroinitializer
+}
+
+define <10 x half> @half10() {
+  ret <10 x half> zeroinitializer
+}
+
+define <14 x half> @half14() {
+  ret <14 x half> zeroinitializer
+}
+
+define <18 x half> @half18() {
+  ret <18 x half> zeroinitializer
+}
+
+define <998 x half> @half998() {
+  ret <998 x half> zeroinitializer
+}
+
+define <12 x i8> @byte12() {
+  ret <12 x i8> zeroinitializer
+}
+
+define <20 x i8> @byte20() {
+  ret <20 x i8> zeroinitializer
+}
+
+define <24 x i8> @byte24() {
+  ret <24 x i8> zeroinitializer
+}
+
+define <28 x i8> @byte28() {
+  ret <28 x i8> zeroinitializer
+}
+
+define <36 x i8> @byte36() {
+  ret <36 x i8> zeroinitializer
+}
+
+define <996 x i8> @byte996() {
+  ret <996 x i8> zeroinitializer
+}

gonzalobg · 2024-08-17T14:30:26Z

llvm/test/CodeGen/NVPTX/compute-ptx-value-vts.ll

+; CHECK-NEXT:    st.param.v4.b8 [func_retval0+84], {%rs1, %rs1, %rs1, %rs1};
+; CHECK-NEXT:    st.param.v4.b8 [func_retval0+88], {%rs1, %rs1, %rs1, %rs1};
+; CHECK-NEXT:    st.param.v4.b8 [func_retval0+92], {%rs1, %rs1, %rs1, %rs1};
+; CHECK-NEXT:    st.param.v4.b8 [func_retval0+96], {%rs1, %


Why:

Is a .b16 register being used here for the zero initialization, instead of a .b8?

Is a <3 x i1> taking 3 bytes of storage instead of 1?

I'd expect a warp-wide <32 x i1> mask to take 4 bytes, but according to this, it takes 32 bytes.

I'm not suggesting this PR should fix those, but maybe we need an issue tracking these things.

NVPTX data layout says that i1 is stored as 8 bits.

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
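
The arithmetic behind the two numbers in this thread can be sketched as follows. This is standalone C++ that only models the byte counts being discussed; it does not describe how the backend actually lays out i1 vectors, and `bytesPerElementLayout` and `bytesBitPacked` are hypothetical names introduced for illustration.

```cpp
// Size in bytes when every element occupies its own storage slot.
// The default of 8 bits per element mirrors the "i1:8:8" entry in the
// data layout string quoted above.
constexpr unsigned bytesPerElementLayout(unsigned NumElts,
                                         unsigned StoreBitsPerElt = 8) {
  return NumElts * (StoreBitsPerElt / 8);
}

// Size in bytes if the i1 elements were packed one bit each,
// rounded up to a whole byte.
constexpr unsigned bytesBitPacked(unsigned NumElts) {
  return (NumElts + 7) / 8;
}

// <3 x i1>: 3 bytes with one byte per element, 1 byte if bit-packed.
static_assert(bytesPerElementLayout(3) == 3, "one byte per i1 element");
static_assert(bytesBitPacked(3) == 1, "3 bits fit in one byte");

// <32 x i1>: 32 bytes with one byte per element, 4 bytes if bit-packed.
static_assert(bytesPerElementLayout(32) == 32, "one byte per i1 element");
static_assert(bytesBitPacked(32) == 4, "32 bits fit in four bytes");
```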

justinfargnoli · 2024-08-19T21:04:46Z

llvm/test/CodeGen/NVPTX/vector-returns.ll

+define <2 x i8> @byte2() {
+  ret <2 x i8> zeroinitializer
+}


This test exposes another crash caused by lowering a v2i8 return type.

Unless I'm able to fix the bug before I submit this PR, I'll fix the crash and reenable the test in a separate PR.

What's the status of this crash?

I've briefly looked into it.

I've posted what I found on #104864.

However, I must postpone fixing this until I resolve two other bugs.

Artem-B · 2024-08-19T21:08:06Z

llvm/test/CodeGen/NVPTX/vector-returns.ll

+; CHECK-NEXT:    st.param.v4.b8 [func_retval0+0], {%rs1, %rs1, %rs1, %rs1};
+; CHECK-NEXT:    st.param.v4.b8 [func_retval0+4], {%rs1, %rs1, %rs1, %rs1};
+; CHECK-NEXT:    st.param.v4.b8 [func_retval0+8], {%rs1, %rs1, %rs1, %rs1};
+; CHECK-NEXT:    st.param.v2.b8 [func_retval0+12], {%rs1, %rs1};


Another optimization opportunity would be to lower the first 8 bytes in one instruction.

justinfargnoli · 2024-08-27T23:22:34Z

Ping @Artem-B

Artem-B · 2024-08-27T23:44:29Z

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

        NumElts /= 2;
      } else if (EltVT.getSimpleVT() == MVT::i8 &&
-                 (NumElts % 4 == 0 || NumElts == 3)) {
+                 ((NumElts % 4 == 0 && isPowerOf2_32(NumElts)) ||


This condition is rather puzzling.
AFAICT, previously, we'd accept i8 vectors with multiples of 4 elements, and a special case of v3i8, and lowered them all as N * v4i8.

Now we'll only accept multiples of 4 that are also a power of 2, and it's not clear why. Is there any reason we should be able to handle v16i8 here but not v12i8?

This, at the very least, needs a comment explaining what's going on, or, possibly, a change to the condition to better reflect what we're checking for here.

AFAICT, previously, we'd accept i8 vectors with multiples of 4 elements, and a special case of v3i8, and lowered them all as N * v4i8.

Without this PR, vector types like v12i8 will result in a compiler crash.

This PR should only disable this code for inputs that would have crashed when reached via LowerReturn().

This, at the very least, needs a comment explaining what's going on

I've added a comment in 0c7cdb6.
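
For reference, the i8 condition before and after this PR can be compared side by side. The sketch below is standalone C++, not the LLVM source; `isPowerOf2`, `packsAsV4I8Before`, `packsAsV4I8After`, and `v4i8Parts` are hypothetical names that mirror the expressions in the diff.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::isPowerOf2_32.
constexpr bool isPowerOf2(std::uint32_t N) {
  return N != 0 && (N & (N - 1)) == 0;
}

// Condition from the '-' line of the diff: any multiple of 4, plus v3i8.
constexpr bool packsAsV4I8Before(unsigned NumElts) {
  return NumElts % 4 == 0 || NumElts == 3;
}

// Condition from the '+' lines of the diff: multiples of 4 that are also
// a power of 2, plus the unchanged v3i8 special case.
constexpr bool packsAsV4I8After(unsigned NumElts) {
  return (NumElts % 4 == 0 && isPowerOf2(NumElts)) || NumElts == 3;
}

// Number of v4i8 values produced when the packed path is taken,
// matching "NumElts = (NumElts + 3) / 4" in the diff.
constexpr unsigned v4i8Parts(unsigned NumElts) { return (NumElts + 3) / 4; }

// v16i8 is still packed (4 x v4i8); v12i8 no longer is.
static_assert(packsAsV4I8Before(12) && !packsAsV4I8After(12), "v12i8 changes");
static_assert(packsAsV4I8Before(16) && packsAsV4I8After(16), "v16i8 unchanged");
// v3i8 keeps its special case and is lowered as a single v4i8.
static_assert(packsAsV4I8After(3) && v4i8Parts(3) == 1, "v3i8 unchanged");
static_assert(v4i8Parts(16) == 4, "v16i8 -> 4 x v4i8");
```

In this model, the only inputs whose behavior changes are multiples of 4 that are not powers of 2 (12, 20, 24, 28, 36, ...), which matches the byte12 through byte996 tests in the diff.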

llvm-ci · 2024-08-30T01:28:15Z

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/4576

Here is the relevant piece of the build log for the reference

Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[38/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/memmove-hip-6.0.2.dir/memmove.hip.o -o External/HIP/memmove-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/memmove.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/memmove.reference_output-hip-6.0.2
[39/40] /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /buildbot/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
[40/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -o External/HIP/TheNextWeek-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/TheNextWeek.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/TheNextWeek.reference_output-hip-6.0.2
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
+ ninja -v check-hip-simple
@@@BUILD_STEP Testing HIP test-suite@@@
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 7)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 371.55s

Total Discovered Tests: 7
  Passed: 6 (85.71%)
  Failed: 1 (14.29%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
program finished with exit code 1
elapsedTime=485.177021

justinfargnoli added 3 commits August 15, 2024 13:27

[NVPTX] Fix crash caused by diff between ComputePTXValueVTs and SelectionDAG

0c7cdb6

clang-format

60a9e55

Add large test to prevent special casing

c670aa9

justinfargnoli requested a review from Artem-B August 15, 2024 23:38

justinfargnoli self-assigned this Aug 15, 2024

llvmbot added the backend:NVPTX label Aug 15, 2024

justinfargnoli requested a review from ThomasRaoux August 15, 2024 23:39

Artem-B reviewed Aug 15, 2024

Update test

376a3d3

justinfargnoli requested a review from Artem-B August 16, 2024 19:42

Artem-B reviewed Aug 16, 2024

justinfargnoli added 2 commits August 16, 2024 18:06

Update test with combinations less than 24 bytes

a9bb5f4

Fix tabs

67de9a5

justinfargnoli requested a review from Artem-B August 17, 2024 01:10

gonzalobg reviewed Aug 17, 2024

justinfargnoli added 3 commits August 19, 2024 12:48

Revert "Fix tabs"

79b5c16

This reverts commit 67de9a5.

Revert "Update test with combinations less than 24 bytes"

51afa32

This reverts commit a9bb5f4.

Add vector-returns.ll. XFAIL bc byte2 crashes.

7bd0489

justinfargnoli commented Aug 19, 2024

Artem-B reviewed Aug 19, 2024

justinfargnoli mentioned this pull request Aug 19, 2024

[NVPTX] Failure to lower v2i8 #104864

Closed

Remove XFAIL. Add FIXME.

3cea7d6

justinfargnoli mentioned this pull request Aug 19, 2024

[NVPTX] When lowering return types we partially vectorized <9 x i8>, but not <8 x i8> #104865

Open

justinfargnoli requested a review from Artem-B August 19, 2024 21:39

Artem-B reviewed Aug 27, 2024

Add comment explaining why elements must have a power-of-2 length

de42bd9

justinfargnoli commented Aug 28, 2024

justinfargnoli requested a review from Artem-B August 29, 2024 20:57

Artem-B approved these changes Aug 29, 2024

justinfargnoli merged commit cdaebf6 into llvm:main Aug 30, 2024
