[NVPTX] Fix crash caused by ComputePTXValueVTs #104524
@llvm/pr-subscribers-backend-nvptx
Author: Justin Fargnoli (justinfargnoli)
Full diff: https://github.com/llvm/llvm-project/pull/104524.diff
2 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 43a3fbf4d1306a..878c792a9b06ca 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -207,10 +207,11 @@ static void ComputePTXValueVTs(const TargetLowering &TLI, const DataLayout &DL,
     if (VT.isVector()) {
       unsigned NumElts = VT.getVectorNumElements();
       EVT EltVT = VT.getVectorElementType();
-      // Vectors with an even number of f16 elements will be passed to
-      // us as an array of v2f16/v2bf16 elements. We must match this so we
-      // stay in sync with Ins/Outs.
-      if ((Is16bitsType(EltVT.getSimpleVT())) && NumElts % 2 == 0) {
+      if ((Is16bitsType(EltVT.getSimpleVT())) && NumElts % 2 == 0 &&
+          isPowerOf2_32(NumElts)) {
+        // Vectors with an even number of f16 elements will be passed to
+        // us as an array of v2f16/v2bf16 elements. We must match this so we
+        // stay in sync with Ins/Outs.
         switch (EltVT.getSimpleVT().SimpleTy) {
         case MVT::f16:
           EltVT = MVT::v2f16;
@@ -226,7 +227,8 @@ static void ComputePTXValueVTs(const TargetLowering &TLI, const DataLayout &DL,
         }
         NumElts /= 2;
       } else if (EltVT.getSimpleVT() == MVT::i8 &&
-                 (NumElts % 4 == 0 || NumElts == 3)) {
+                 ((NumElts % 4 == 0 && isPowerOf2_32(NumElts)) ||
+                  NumElts == 3)) {
         // v*i8 are formally lowered as v4i8
         EltVT = MVT::v4i8;
         NumElts = (NumElts + 3) / 4;
diff --git a/llvm/test/CodeGen/NVPTX/compute-ptx-value-vts.ll b/llvm/test/CodeGen/NVPTX/compute-ptx-value-vts.ll
new file mode 100644
index 00000000000000..4c56b5fb5a34c3
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/compute-ptx-value-vts.ll
@@ -0,0 +1,45 @@
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_20
+
+define <6 x half> @half6() {
+ ret <6 x half> zeroinitializer
+}
+
+define <10 x half> @half10() {
+ ret <10 x half> zeroinitializer
+}
+
+define <14 x half> @half14() {
+ ret <14 x half> zeroinitializer
+}
+
+define <18 x half> @half18() {
+ ret <18 x half> zeroinitializer
+}
+
+define <998 x half> @half998() {
+ ret <998 x half> zeroinitializer
+}
+
+define <12 x i8> @byte12() {
+ ret <12 x i8> zeroinitializer
+}
+
+define <20 x i8> @byte20() {
+ ret <20 x i8> zeroinitializer
+}
+
+define <24 x i8> @byte24() {
+ ret <24 x i8> zeroinitializer
+}
+
+define <28 x i8> @byte28() {
+ ret <28 x i8> zeroinitializer
+}
+
+define <36 x i8> @byte36() {
+ ret <36 x i8> zeroinitializer
+}
+
+define <996 x i8> @byte996() {
+ ret <996 x i8> zeroinitializer
+}

; CHECK-NEXT: st.param.v4.b8 [func_retval0+84], {%rs1, %rs1, %rs1, %rs1};
; CHECK-NEXT: st.param.v4.b8 [func_retval0+88], {%rs1, %rs1, %rs1, %rs1};
; CHECK-NEXT: st.param.v4.b8 [func_retval0+92], {%rs1, %rs1, %rs1, %rs1};
; CHECK-NEXT: st.param.v4.b8 [func_retval0+96], {%rs1, %

Why:
- Is a .b16 register being used here for the zero initialization, instead of a .b8?
- Is a <3 x i1> taking 3 bytes of storage instead of 1?

I'd expect a warp-wide <32 x i1> mask to take 4 bytes, but according to this, it takes 32 bytes.
I'm not suggesting this PR should fix those, but maybe we need an issue tracking these things.

The NVPTX data layout says that i1 is stored as 8 bits.
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
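
The storage arithmetic in this exchange can be sketched as follows. This is only an illustration, not NVPTX backend code: the helper names are hypothetical, and the sketch assumes that each i1 element is stored in its own 8-bit slot (as the i1:8:8 entry in the data layout suggests), compared against a hypothetical bit-packed layout.

```cpp
// Hypothetical helpers; neither exists in LLVM.

// Bytes used when every i1 element occupies its own slot of StorageBits bits.
constexpr unsigned bytesPerElementStorage(unsigned NumElts,
                                          unsigned StorageBits = 8) {
  return NumElts * (StorageBits / 8);
}

// Bytes used if the i1 elements were packed one bit each, rounded up.
constexpr unsigned bytesBitPacked(unsigned NumElts) {
  return (NumElts + 7) / 8;
}

// The sizes mentioned in the review comment.
static_assert(bytesPerElementStorage(3) == 3, "<3 x i1>, one byte per element");
static_assert(bytesPerElementStorage(32) == 32, "<32 x i1>, one byte per element");
static_assert(bytesBitPacked(3) == 1, "<3 x i1>, bit-packed");
static_assert(bytesBitPacked(32) == 4, "<32 x i1>, bit-packed");
```

Under these assumptions, the 32 bytes observed for a <32 x i1> value follow from per-element storage, while the 4 bytes the reviewer expected would require packing.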

define <2 x i8> @byte2() {
  ret <2 x i8> zeroinitializer
}

This test exposes another crash caused by lowering a v2i8 return type.
Unless I'm able to fix the bug before I submit this PR, I'll fix the crash and re-enable the test in a separate PR.

What's the status of this crash?

I've briefly looked into it and posted what I found on #104864.
However, I must postpone fixing this until I resolve two other bugs.

; CHECK-NEXT: st.param.v4.b8 [func_retval0+0], {%rs1, %rs1, %rs1, %rs1};
; CHECK-NEXT: st.param.v4.b8 [func_retval0+4], {%rs1, %rs1, %rs1, %rs1};
; CHECK-NEXT: st.param.v4.b8 [func_retval0+8], {%rs1, %rs1, %rs1, %rs1};
; CHECK-NEXT: st.param.v2.b8 [func_retval0+12], {%rs1, %rs1};

Another optimization opportunity would be to lower the first 8 bytes in one instruction.

Ping @Artem-B

         NumElts /= 2;
       } else if (EltVT.getSimpleVT() == MVT::i8 &&
-                 (NumElts % 4 == 0 || NumElts == 3)) {
+                 ((NumElts % 4 == 0 && isPowerOf2_32(NumElts)) ||

This condition is rather puzzling.
AFAICT, previously, we'd accept i8 vectors with multiples of 4 elements, and a special case of v3i8, and lowered them all as N * v4i8.
Now we'll only accept multiples of 4 that are also a power of 2, and it's not clear why. Is there any reason we should be able to handle v16i8 here but not v12i8?
This, at the very least, needs a comment explaining what's going on, or, possibly, a change to the condition to better reflect what we're checking for here.

> AFAICT, previously, we'd accept i8 vectors with multiples of 4 elements, and a special case of v3i8, and lowered them all as N * v4i8.

Without this PR, vector types like v12i8 will result in a compiler crash.
This PR should only disable this code on inputs which, when invoked via LowerReturn(), would've resulted in a crash.

> This, at the very least, needs a comment explaining what's going on

I've added a comment in 0c7cdb6.
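
To make the v16i8 versus v12i8 question concrete, here is a minimal sketch of the two i8 conditions from the diff, written as standalone predicates. The function names are hypothetical, isPowerOf2 is a local stand-in for llvm::isPowerOf2_32 so the sketch builds without LLVM headers, and the returned count assumes that one value is emitted per resulting element, which is how the surrounding code appears to use NumElts.

```cpp
#include <cstdint>

// Local stand-in for llvm::isPowerOf2_32 (true only for nonzero powers of 2).
constexpr bool isPowerOf2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Values computed for <N x i8> with the condition before the patch:
// any multiple of 4 (or exactly 3) is grouped into v4i8 chunks.
constexpr unsigned i8ValuesBefore(unsigned N) {
  return (N % 4 == 0 || N == 3) ? (N + 3) / 4 : N;
}

// Values computed with the condition after the patch: grouping also
// requires N to be a power of 2 (the v3i8 special case is unchanged).
constexpr unsigned i8ValuesAfter(unsigned N) {
  return ((N % 4 == 0 && isPowerOf2(N)) || N == 3) ? (N + 3) / 4 : N;
}

static_assert(i8ValuesBefore(16) == 4 && i8ValuesAfter(16) == 4,
              "v16i8 is grouped as 4 x v4i8 both before and after");
static_assert(i8ValuesBefore(12) == 3 && i8ValuesAfter(12) == 12,
              "v12i8 is no longer grouped after the patch");
static_assert(i8ValuesBefore(3) == 1 && i8ValuesAfter(3) == 1,
              "v3i8 special case is kept");
```

In this model the only sizes whose result changes are multiples of 4 that are not powers of 2 (12, 20, 24, 28, 36, 996 in the new test), which matches the author's statement that the patch only disables the grouping on inputs that previously crashed.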

LLVM Buildbot has detected a new failure on a builder.
Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/4576

When lowering return values from LLVM IR to SelectionDAG, we check that the number of values SelectionDAG tells us to return is equal to the number of values that ComputePTXValueVTs() tells us to return. However, this check can fail on valid IR. For example, when returning a <6 x half> vector, ComputePTXValueVTs() tells us to return 3 v2f16 values, while SelectionDAG tells us to return 6 f16 values. Thus, the compiler will crash.

ComputePTXValueVTs() supports all half element vectors with an even number of elements, whereas SelectionDAG only supports power-of-2 sized vectors. This is the root of the discrepancy.

Assuming that the developers who added the code to ComputePTXValueVTs() overlooked this, I've restricted ComputePTXValueVTs() to compute the same number of return values as SelectionDAG, instead of extending SelectionDAG to support non-power-of-2 sized vectors.
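
The mismatch described above can be sketched for half vectors as a pair of standalone predicates. This is an illustrative model only: the function names are hypothetical, isPowerOf2 is a local stand-in for llvm::isPowerOf2_32, and the counts assume one value per resulting element (v2f16 pairs when the condition holds, individual f16 values otherwise).

```cpp
#include <cstdint>

// Local stand-in for llvm::isPowerOf2_32 (true only for nonzero powers of 2).
constexpr bool isPowerOf2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Values computed for <N x half> before the patch: any even N is paired
// into v2f16 values.
constexpr unsigned halfValuesBefore(unsigned N) {
  return (N % 2 == 0) ? N / 2 : N;
}

// Values computed after the patch: pairing also requires N to be a power
// of 2; otherwise each f16 element is counted separately.
constexpr unsigned halfValuesAfter(unsigned N) {
  return (N % 2 == 0 && isPowerOf2(N)) ? N / 2 : N;
}

// <6 x half>: 3 v2f16 values before, 6 f16 values after, the latter
// matching the count the description attributes to SelectionDAG.
static_assert(halfValuesBefore(6) == 3, "3 x v2f16 before the patch");
static_assert(halfValuesAfter(6) == 6, "6 x f16 after the patch");

// Power-of-2 sizes are unaffected.
static_assert(halfValuesBefore(8) == 4 && halfValuesAfter(8) == 4,
              "<8 x half> stays 4 x v2f16");
```

The design choice in the PR is to narrow ComputePTXValueVTs() until the two counts agree, rather than to widen what SelectionDAG accepts.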