[VectorCombine] Apply InstSimplify in scalarizeOpOrCmp to avoid infinite loop #153069
Conversation
@llvm/pr-subscribers-vectorizers

Author: XChy (XChy)

Changes

Fixes #153012

As we miss the cost of the unfoldable constant expression, we may fold

define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}

to

define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}

And it would be folded back in `foldInsExtBinop`, resulting in an infinite loop.

This patch calculates the cost precisely and adds a constraint to prevent splitting constant expressions.

Full diff: https://github.com/llvm/llvm-project/pull/153069.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 6345b18b809a6..31c8320bd4933 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -769,6 +769,11 @@ bool VectorCombine::foldInsExtBinop(Instruction &I) {
if (!ResultTy)
return false;
+ // Avoid splitting the unfoldable constant expression binop(x,y), otherwise
+ // binop(insert(x,a,idx),insert(y,b,idx)) may be folded back and forth.
+ if (match(VecBinOp, m_BinOp(m_Constant(), m_Constant())))
+ return false;
+
// TODO: Attempt to detect m_ExtractElt for scalar operands and convert to
// shuffle?
@@ -1250,6 +1255,10 @@ bool VectorCombine::scalarizeOpOrCmp(Instruction &I) {
InstructionCost NewCost =
ScalarOpCost + TTI.getVectorInstrCost(Instruction::InsertElement, VecTy,
CostKind, *Index, NewVecC);
+ // Additional cost for unfoldable constant expression.
+ if (!NewVecC)
+ NewCost += VectorOpCost;
+
for (auto [Idx, Op, VecC, Scalar] : enumerate(Ops, VecCs, ScalarOps)) {
if (!Scalar || (II && isVectorIntrinsicWithScalarOpAtArg(
II->getIntrinsicID(), Idx, &TTI)))
diff --git a/llvm/test/Transforms/VectorCombine/binop-scalarize.ll b/llvm/test/Transforms/VectorCombine/binop-scalarize.ll
index 52a706a0b59a7..bc07f8b086496 100644
--- a/llvm/test/Transforms/VectorCombine/binop-scalarize.ll
+++ b/llvm/test/Transforms/VectorCombine/binop-scalarize.ll
@@ -20,3 +20,22 @@ define <4 x i8> @udiv_ub(i8 %x, i8 %y) {
%v = udiv <4 x i8> %x.insert, %y.insert
ret <4 x i8> %v
}
+
+
+; Unfoldable constant expression may cause infinite loop between
+; scalarizing insertelement and folding binop(insert(x,a,idx),insert(y,b,idx))
+@val = external hidden global ptr, align 8
+
+define <2 x i64> @pr153012(i64 %idx) #0 {
+; CHECK-LABEL: define <2 x i64> @pr153012(
+; CHECK-SAME: i64 [[IDX:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[A:%.*]] = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 [[IDX]], i32 0
+; CHECK-NEXT: [[B:%.*]] = or disjoint <2 x i64> splat (i64 2), [[A]]
+; CHECK-NEXT: ret <2 x i64> [[B]]
+;
+entry:
+ %a = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
+ %b = or disjoint <2 x i64> splat (i64 2), %a
+ ret <2 x i64> %b
+}
          CostKind, *Index, NewVecC);
  // Additional cost for unfoldable constant expression.
  if (!NewVecC)
    NewCost += VectorOpCost;
Would replacing m_Constant(VecC) with m_ImmConstant(VecC) above work instead?
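For illustration, a minimal sketch of the suggested pattern swap; `Op` and the helper name are placeholders, not the actual VectorCombine code:

```cpp
#include "llvm/IR/PatternMatch.h"

using namespace llvm;
using namespace llvm::PatternMatch;

// Sketch: m_Constant accepts any constant operand, including constant
// expressions such as ptrtoint (ptr @val to i64); m_ImmConstant rejects
// constant expressions, so the unfoldable vector constant from the
// reproducer would be filtered out up front.
static bool isScalarizableConstantOperand(Value *Op) {
  Constant *VecC;
  return match(Op, m_ImmConstant(VecC));
}
```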
Thanks for your review. And yes, it prevents any constant expression.
The comment below says "Create a new base vector if the constant folding failed." The original code seemed to attempt the fold even after constant folding failed, and it is theoretically possible for that to be profitable if the operand of the binop has multiple uses. What do you think about that?
If that is still OK, I will adopt your advice.
Hmm, and note that m_ImmConstant still allows some constant expressions with poison that are unfoldable for TargetFolder.
; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 4 x float> @llvm.fabs.nxv4f32(<vscale x 4 x float> poison)
; CHECK-NEXT: [[V:%.*]] = insertelement <vscale x 4 x float> [[TMP1]], float [[V_SCALAR]], i64 0
; CHECK-NEXT: [[X_INSERT:%.*]] = insertelement <vscale x 4 x float> poison, float [[X]], i32 0
; CHECK-NEXT: [[V:%.*]] = call <vscale x 4 x float> @llvm.fabs.nxv4f32(<vscale x 4 x float> [[X_INSERT]])
This looks like a regression. Is there a way to avoid this?
Yes, we can replace TargetFolder with InstSimplifyFolder and rely on InstSimplify to fold n-ary intrinsics to avoid such a regression. Could that be put into a new patch?
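As a rough sketch of that direction (not the actual patch; how the DataLayout is obtained here is an assumption):

```cpp
#include "llvm/Analysis/InstSimplifyFolder.h"
#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// Sketch: an IRBuilder parameterized with InstSimplifyFolder runs
// InstSimplify on every instruction it creates, so trivially simplifiable
// vector ops are folded away instead of being materialized.
static void buildWithInstSimplify(Instruction &I) {
  const DataLayout &DL = I.getModule()->getDataLayout();
  IRBuilder<InstSimplifyFolder> Builder(I.getContext(), InstSimplifyFolder(DL));
  Builder.SetInsertPoint(&I);
  // ... create the replacement instructions with Builder as usual ...
}
```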
With this change, I compiled the source where I originally encountered the infinite loop, and it fixes the issue. Thanks!
  // Additional cost for unfoldable constant expression.
  if (!NewVecC)
    NewCost += VectorOpCost;
Sorry for the delay with the review.
After thinking about this for a bit, I think scalarizeOpOrCmp should only scalarize iff the base vector NewVecC can be folded.
I think that's what the original code in 0d2a0b4 kind of assumed. It's probably never profitable to do the scalarization if the base vector doesn't go away.
And when I was last working on scalarizeOpOrCmp, my only motivation was to scalarize scalable splats, which are inserts into poison vectors that are constant foldable anyway.
If we change TargetFolder to InstSimplifyFolder in this PR to minimize the regressions, and do something like
    if (!NewVecC)
      return nullptr;
then we can get rid of the early out in foldInsExtBinop, and also this bit below:
  // Create a new base vector if the constant folding failed.
  if (!NewVecC) {
    if (CI)
      NewVecC = Builder.CreateCmp(CI->getPredicate(), VecCs[0], VecCs[1]);
    else if (UO || BO)
      NewVecC = Builder.CreateNAryOp(Opcode, VecCs);
    else
      NewVecC = Builder.CreateIntrinsic(VecTy, II->getIntrinsicID(), VecCs);
  }

I think the only regression we would be left with is that we can't constant fold n-ary intrinsics at the moment, but this is something that we should fix in IRBuilderFolder. This would basically undo #138406, but I'm happy to take those regressions for now if we want this fix to land in LLVM 21.
What do you think? cc @RKSimon
I've opened up #153743 to track folding n-ary intrinsics
It sounds reasonable to me, given that there is no potential profitability. And I think simplifyInstruction(Instruction *I, const SimplifyQuery &Q) in InstSimplify would resolve your concern about n-ary intrinsics.
Or ConstantFoldInstOperands may be simpler.
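A hedged sketch of that alternative; the `ConstOps` operand list and `DL` are stand-ins for whatever the surrounding code already has:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Analysis/ConstantFolding.h"

using namespace llvm;

// Sketch: try to fold the original vector instruction from known-constant
// operands; a nullptr result is the "don't scalarize" signal discussed above.
static Constant *tryFoldVectorOp(Instruction &I, ArrayRef<Constant *> ConstOps,
                                 const DataLayout &DL) {
  return ConstantFoldInstOperands(&I, ConstOps, DL);
}
```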
simplifyInstWithOperands requires distinguishing intrinsics among the operators. Thus, I implemented it with simplifyCmpInst, simplifyUnOp, etc., case by case.
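For illustration, a minimal sketch of that case-by-case dispatch; the helper's shape and variable names are assumptions, not the exact code in the patch:

```cpp
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Sketch: pick the InstSimplify entry point that matches the operator kind.
// Intrinsic calls have no per-opcode simplify* helper here, which is why the
// comment above mentions handling things case by case.
static Value *simplifyVectorOp(unsigned Opcode, CmpInst *CI,
                               BinaryOperator *BO, UnaryOperator *UO,
                               ArrayRef<Value *> Ops, const SimplifyQuery &SQ) {
  if (CI)
    return simplifyCmpInst(CI->getPredicate(), Ops[0], Ops[1], SQ);
  if (BO)
    return simplifyBinOp(Opcode, Ops[0], Ops[1], SQ);
  if (UO)
    return simplifyUnOp(Opcode, Ops[0], SQ);
  return nullptr; // intrinsics: fall back to constant folding elsewhere
}
```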
Oh perfect, great to see we could avoid the regression. Thanks!
RKSimon left a comment:
LGTM (the CI failure appears to be an unrelated lldb failure)
lukel97 left a comment:
LGTM, thanks!
/cherry-pick 3a4a60d |
/pull-request #153958 |
…ite loop (llvm#153069)

Fixes llvm#153012

As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we may fold

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @Val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}
```

to

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @Val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}
```

And it would be folded back in `foldInsExtBinop`, resulting in an infinite loop.

This patch forces scalarization iff InstSimplify can fold the constant expression.

(cherry picked from commit 3a4a60d)
Fixes #153012

As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we may fold the IR shown above into its scalarized form, and it would be folded back in `foldInsExtBinop`, resulting in an infinite loop.

This patch forces scalarization iff InstSimplify can fold the constant expression.