[VectorCombine] Apply InstSimplify in scalarizeOpOrCmp to avoid infinite loop #153069
Conversation
@llvm/pr-subscribers-vectorizers

Author: XChy (XChy)

Changes

Fixes #153012

As we miss the cost of the unfoldable constant expression, we may fold

define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}

to

define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}

And it would be folded back in `foldInsExtBinop`, resulting in an infinite loop.

This patch calculates the cost precisely and adds a constraint to prevent splitting constant expressions.

Full diff: https://github.com/llvm/llvm-project/pull/153069.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 6345b18b809a6..31c8320bd4933 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -769,6 +769,11 @@ bool VectorCombine::foldInsExtBinop(Instruction &I) {
if (!ResultTy)
return false;
+ // Avoid splitting the unfoldable constant expression binop(x,y), otherwise
+ // binop(insert(x,a,idx),insert(y,b,idx)) may be folded back and forth.
+ if (match(VecBinOp, m_BinOp(m_Constant(), m_Constant())))
+ return false;
+
// TODO: Attempt to detect m_ExtractElt for scalar operands and convert to
// shuffle?
@@ -1250,6 +1255,10 @@ bool VectorCombine::scalarizeOpOrCmp(Instruction &I) {
InstructionCost NewCost =
ScalarOpCost + TTI.getVectorInstrCost(Instruction::InsertElement, VecTy,
CostKind, *Index, NewVecC);
+ // Additional cost for unfoldable constant expression.
+ if (!NewVecC)
+ NewCost += VectorOpCost;
+
for (auto [Idx, Op, VecC, Scalar] : enumerate(Ops, VecCs, ScalarOps)) {
if (!Scalar || (II && isVectorIntrinsicWithScalarOpAtArg(
II->getIntrinsicID(), Idx, &TTI)))
diff --git a/llvm/test/Transforms/VectorCombine/binop-scalarize.ll b/llvm/test/Transforms/VectorCombine/binop-scalarize.ll
index 52a706a0b59a7..bc07f8b086496 100644
--- a/llvm/test/Transforms/VectorCombine/binop-scalarize.ll
+++ b/llvm/test/Transforms/VectorCombine/binop-scalarize.ll
@@ -20,3 +20,22 @@ define <4 x i8> @udiv_ub(i8 %x, i8 %y) {
%v = udiv <4 x i8> %x.insert, %y.insert
ret <4 x i8> %v
}
+
+
+; Unfoldable constant expression may cause infinite loop between
+; scalarizing insertelement and folding binop(insert(x,a,idx),insert(y,b,idx))
+@val = external hidden global ptr, align 8
+
+define <2 x i64> @pr153012(i64 %idx) #0 {
+; CHECK-LABEL: define <2 x i64> @pr153012(
+; CHECK-SAME: i64 [[IDX:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[A:%.*]] = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 [[IDX]], i32 0
+; CHECK-NEXT: [[B:%.*]] = or disjoint <2 x i64> splat (i64 2), [[A]]
+; CHECK-NEXT: ret <2 x i64> [[B]]
+;
+entry:
+ %a = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
+ %b = or disjoint <2 x i64> splat (i64 2), %a
+ ret <2 x i64> %b
+}
          CostKind, *Index, NewVecC);
  // Additional cost for unfoldable constant expression.
  if (!NewVecC)
    NewCost += VectorOpCost;
Would replacing m_Constant(VecC) with m_ImmConstant(VecC) above work instead?
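For illustration, a minimal sketch of the suggested pattern swap; `Op` and the helper name are placeholders, not the actual VectorCombine code:

```cpp
#include "llvm/IR/PatternMatch.h"

using namespace llvm;
using namespace llvm::PatternMatch;

// Sketch: m_Constant accepts any constant operand, including constant
// expressions such as ptrtoint (ptr @val to i64); m_ImmConstant rejects
// constant expressions, so the unfoldable vector constant from the
// reproducer would be filtered out up front.
static bool isScalarizableConstantOperand(Value *Op) {
  Constant *VecC;
  return match(Op, m_ImmConstant(VecC));
}
```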
Thanks for your review. And yes, it prevents any constant expression.
The comment below says "Create a new base vector if the constant folding failed." The original code seemed to attempt the fold even after constant folding failed, and it is theoretically possible for that to be profitable if the operand of the binop has multiple uses. What do you think about that?
If that is still OK, I will adopt your advice.
Hmm, and note that m_ImmConstant still allows some constant expressions with poison that are unfoldable for TargetFolder.
; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 4 x float> @llvm.fabs.nxv4f32(<vscale x 4 x float> poison)
; CHECK-NEXT: [[V:%.*]] = insertelement <vscale x 4 x float> [[TMP1]], float [[V_SCALAR]], i64 0
; CHECK-NEXT: [[X_INSERT:%.*]] = insertelement <vscale x 4 x float> poison, float [[X]], i32 0
; CHECK-NEXT: [[V:%.*]] = call <vscale x 4 x float> @llvm.fabs.nxv4f32(<vscale x 4 x float> [[X_INSERT]])
This looks like a regression. Is there a way to avoid this?
Yes, we can replace TargetFolder with InstSimplifyFolder and rely on InstSimplify to fold n-ary intrinsics to avoid such a regression. Could that be put into a new patch?
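As a rough sketch of that direction (not the actual patch; how the DataLayout is obtained here is an assumption):

```cpp
#include "llvm/Analysis/InstSimplifyFolder.h"
#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// Sketch: an IRBuilder parameterized with InstSimplifyFolder runs
// InstSimplify on every instruction it creates, so trivially simplifiable
// vector ops are folded away instead of being materialized.
static void buildWithInstSimplify(Instruction &I) {
  const DataLayout &DL = I.getModule()->getDataLayout();
  IRBuilder<InstSimplifyFolder> Builder(I.getContext(), InstSimplifyFolder(DL));
  Builder.SetInsertPoint(&I);
  // ... create the replacement instructions with Builder as usual ...
}
```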
With this change, I compiled the source where I originally encountered the infinite loop, and it fixes the issue. Thanks!
  // Additional cost for unfoldable constant expression.
  if (!NewVecC)
    NewCost += VectorOpCost;
Sorry for the delay with the review.
After thinking about this for a bit, I think scalarizeOpOrCmp should only scalarize iff the base vector NewVecC can be folded.
I think that's what the original code in 0d2a0b4 kind of assumed. It's probably never profitable to do the scalarization if the base vector doesn't go away.
And when I was last working on scalarizeOpOrCmp, my only motivation was to scalarize scalable splats, which are inserts into poison vectors that are constant foldable anyway.
If we change TargetFolder to InstSimplifyFolder in this PR to minimize the regressions, and do something like
    if (!NewVecC)
      return nullptr;
then we can get rid of the early out in foldInsExtBinop, and also this bit below:
  // Create a new base vector if the constant folding failed.
  if (!NewVecC) {
    if (CI)
      NewVecC = Builder.CreateCmp(CI->getPredicate(), VecCs[0], VecCs[1]);
    else if (UO || BO)
      NewVecC = Builder.CreateNAryOp(Opcode, VecCs);
    else
      NewVecC = Builder.CreateIntrinsic(VecTy, II->getIntrinsicID(), VecCs);
  }

I think the only regression we would be left with is that we can't constant fold n-ary intrinsics at the moment, but this is something that we should fix in IRBuilderFolder. This would basically undo #138406, but I'm happy to take those regressions for now if we want this fix to land in LLVM 21.
What do you think? cc @RKSimon
I've opened up #153743 to track folding n-ary intrinsics
It sounds reasonable to me, given that there is no potential profitability. And I think simplifyInstruction(Instruction *I, const SimplifyQuery &Q) in InstSimplify would resolve your concern about n-ary intrinsics.
Or ConstantFoldInstOperands may be simpler.
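A hedged sketch of that alternative; the `ConstOps` operand list and `DL` are stand-ins for whatever the surrounding code already has:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Analysis/ConstantFolding.h"

using namespace llvm;

// Sketch: try to fold the original vector instruction from known-constant
// operands; a nullptr result is the "don't scalarize" signal discussed above.
static Constant *tryFoldVectorOp(Instruction &I, ArrayRef<Constant *> ConstOps,
                                 const DataLayout &DL) {
  return ConstantFoldInstOperands(&I, ConstOps, DL);
}
```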
simplifyInstWithOperands requires distinguishing intrinsics among the operators. Thus, I implemented it with simplifyCmpInst, simplifyUnOp, etc., case by case.
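For illustration, a minimal sketch of that case-by-case dispatch; the helper's shape and variable names are assumptions, not the exact code in the patch:

```cpp
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Sketch: pick the InstSimplify entry point that matches the operator kind.
// Intrinsic calls have no per-opcode simplify* helper here, which is why the
// comment above mentions handling things case by case.
static Value *simplifyVectorOp(unsigned Opcode, CmpInst *CI,
                               BinaryOperator *BO, UnaryOperator *UO,
                               ArrayRef<Value *> Ops, const SimplifyQuery &SQ) {
  if (CI)
    return simplifyCmpInst(CI->getPredicate(), Ops[0], Ops[1], SQ);
  if (BO)
    return simplifyBinOp(Opcode, Ops[0], Ops[1], SQ);
  if (UO)
    return simplifyUnOp(Opcode, Ops[0], SQ);
  return nullptr; // intrinsics: fall back to constant folding elsewhere
}
```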
Oh perfect, great to see we could avoid the regression. Thanks!
RKSimon left a comment:
LGTM (the CI failure appears to be an unrelated lldb failure)
lukel97 left a comment:
LGTM, thanks!
/cherry-pick 3a4a60d |
/pull-request #153958 |
…ite loop (llvm#153069)

Fixes llvm#153012

As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we may fold

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @Val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}
```

to

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @Val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}
```

And it would be folded back in `foldInsExtBinop`, resulting in an infinite loop.

This patch forces scalarization iff InstSimplify can fold the constant expression.

(cherry picked from commit 3a4a60d)
Fixes #153012

As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we may fold the IR shown above into its scalarized form, and it would be folded back in `foldInsExtBinop`, resulting in an infinite loop.

This patch forces scalarization iff InstSimplify can fold the constant expression.