
Conversation

@XChy
Member

@XChy XChy commented Aug 11, 2025

Fixes #153012

As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we may fold

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}
```

to

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}
```

And it would be folded back in `foldInsExtBinop`, resulting in an infinite loop.

This patch performs the scalarization only if InstSimplify can fold the constant expression.

@llvmbot
Member

llvmbot commented Aug 11, 2025

@llvm/pr-subscribers-vectorizers

Author: XChy (XChy)

Changes

Fixes #153012

As we miss the cost of unfoldable constant expressions, we may fold

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}
```

to

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}
```

And it would be folded back in foldInsExtBinop, resulting in an infinite loop.

This patch calculates the cost precisely and adds a constraint to prevent splitting constant expressions.


Full diff: https://github.com/llvm/llvm-project/pull/153069.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+9)
  • (modified) llvm/test/Transforms/VectorCombine/binop-scalarize.ll (+19)
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 6345b18b809a6..31c8320bd4933 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -769,6 +769,11 @@ bool VectorCombine::foldInsExtBinop(Instruction &I) {
   if (!ResultTy)
     return false;
 
+  // Avoid splitting the unfoldable constant expression binop(x,y), otherwise
+  // binop(insert(x,a,idx),insert(y,b,idx)) may be folded back and forth.
+  if (match(VecBinOp, m_BinOp(m_Constant(), m_Constant())))
+    return false;
+
   // TODO: Attempt to detect m_ExtractElt for scalar operands and convert to
   // shuffle?
 
@@ -1250,6 +1255,10 @@ bool VectorCombine::scalarizeOpOrCmp(Instruction &I) {
   InstructionCost NewCost =
       ScalarOpCost + TTI.getVectorInstrCost(Instruction::InsertElement, VecTy,
                                             CostKind, *Index, NewVecC);
+  // Additional cost for unfoldable constant expression.
+  if (!NewVecC)
+    NewCost += VectorOpCost;
+
   for (auto [Idx, Op, VecC, Scalar] : enumerate(Ops, VecCs, ScalarOps)) {
     if (!Scalar || (II && isVectorIntrinsicWithScalarOpAtArg(
                               II->getIntrinsicID(), Idx, &TTI)))
diff --git a/llvm/test/Transforms/VectorCombine/binop-scalarize.ll b/llvm/test/Transforms/VectorCombine/binop-scalarize.ll
index 52a706a0b59a7..bc07f8b086496 100644
--- a/llvm/test/Transforms/VectorCombine/binop-scalarize.ll
+++ b/llvm/test/Transforms/VectorCombine/binop-scalarize.ll
@@ -20,3 +20,22 @@ define <4 x i8> @udiv_ub(i8 %x, i8 %y) {
   %v = udiv <4 x i8> %x.insert, %y.insert
   ret <4 x i8> %v
 }
+
+
+; Unfoldable constant expression may cause infinite loop between 
+; scalarizing insertelement and folding binop(insert(x,a,idx),insert(y,b,idx))
+@val = external hidden global ptr, align 8
+
+define <2 x i64> @pr153012(i64 %idx) #0 {
+; CHECK-LABEL: define <2 x i64> @pr153012(
+; CHECK-SAME: i64 [[IDX:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[A:%.*]] = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 [[IDX]], i32 0
+; CHECK-NEXT:    [[B:%.*]] = or disjoint <2 x i64> splat (i64 2), [[A]]
+; CHECK-NEXT:    ret <2 x i64> [[B]]
+;
+entry:
+  %a = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @val to i64)>, i64 %idx, i32 0
+  %b = or disjoint <2 x i64> splat (i64 2), %a
+  ret <2 x i64> %b
+}

@llvmbot
Member

llvmbot commented Aug 11, 2025

@llvm/pr-subscribers-llvm-transforms

Author: XChy (XChy)

(This notification repeats the same changes summary and full diff shown in the @llvm/pr-subscribers-vectorizers notification above.)

CostKind, *Index, NewVecC);
// Additional cost for unfoldable constant expression.
if (!NewVecC)
NewCost += VectorOpCost;
Contributor

Would replacing `m_Constant(VecC)` with `m_ImmConstant(VecC)` above work instead?

Member Author

Thanks for your review. And yes, that would prevent scalarizing any constant expression.
The comment below says "Create a new base vector if the constant folding failed", so the original code seems to attempt the transform even after constant folding fails. That could in theory still be profitable if the operand of the binop has multiple uses. What do you think about that?
If it's still OK, I'll adopt your suggestion.

Member Author

Hmm, and note that `m_ImmConstant` still allows some constant expressions containing poison that TargetFolder cannot fold:

; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 4 x float> @llvm.fabs.nxv4f32(<vscale x 4 x float> poison)
; CHECK-NEXT: [[V:%.*]] = insertelement <vscale x 4 x float> [[TMP1]], float [[V_SCALAR]], i64 0
; CHECK-NEXT: [[X_INSERT:%.*]] = insertelement <vscale x 4 x float> poison, float [[X]], i32 0
; CHECK-NEXT: [[V:%.*]] = call <vscale x 4 x float> @llvm.fabs.nxv4f32(<vscale x 4 x float> [[X_INSERT]])
Contributor

This looks like a regression. Is there a way to avoid this?

Member Author

Yes, we can replace TargetFolder with InstSimplifyFolder and rely on InstSimplify to fold n-ary intrinsics, which avoids this regression. Maybe that should go into a separate patch?

@redstar
Copy link
Member

redstar commented Aug 12, 2025

I compiled the source where I originally encountered the infinite loop with this change, and it fixes the issue. Thanks!

// Additional cost for unfoldable constant expression.
if (!NewVecC)
NewCost += VectorOpCost;

Contributor

Sorry for the delay with the review.

After thinking about this for a bit, I think `scalarizeOpOrCmp` should only scalarize if the base vector `NewVecC` can be folded.

I think that's what the original code in 0d2a0b4 kind of assumed. It's probably never profitable to do the scalarization if the base vector doesn't go away.

And whenever I was last working on scalarizeOpOrCmp, my only motivation was to scalarize scalable splats, which are inserts into poison vectors that are constant foldable anyway.

If we change TargetFolder to InstSimplifyFolder in this PR to minimize the regressions, and do something like

Suggested change
if (!NewVecC)
return nullptr;

then we can get rid of the early out in foldInsExtBinop, and also this bit below:

  // Create a new base vector if the constant folding failed.
  if (!NewVecC) {
    if (CI)
      NewVecC = Builder.CreateCmp(CI->getPredicate(), VecCs[0], VecCs[1]);
    else if (UO || BO)
      NewVecC = Builder.CreateNAryOp(Opcode, VecCs);
    else
      NewVecC = Builder.CreateIntrinsic(VecTy, II->getIntrinsicID(), VecCs);
  }

I think the only regression we would be left with is that we can't constant fold n-ary intrinsics at the moment, but this is something that we should fix in IRBuilderFolder. This would basically undo #138406, but I'm happy to take those regressions for now if we want this fix to land in LLVM 21.

What do you think? cc @RKSimon

Contributor

I've opened up #153743 to track folding n-ary intrinsics

Member Author

That sounds reasonable to me if there is no potential profitability. And I think simplifyInstruction(Instruction *I, const SimplifyQuery &Q) in InstSimplify would resolve your concern about n-ary intrinsics.

Member Author

Or ConstantFoldInstOperands may be simpler.

Member Author

simplifyInstWithOperands requires distinguishing intrinsics from the other operators, so I implemented it case by case with simplifyCmpInst, simplifyUnOp, etc.

Contributor

Oh perfect, great to see we could avoid the regression. Thanks!

Collaborator

@RKSimon RKSimon left a comment

LGTM (the CI failure appears to be an unrelated lldb failure)

Contributor

@lukel97 lukel97 left a comment

LGTM, thanks!


@lukel97 lukel97 added this to the LLVM 21.x Release milestone Aug 15, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in LLVM Release Status Aug 15, 2025
@XChy XChy changed the title [VectorCombine] Precisely calculate the cost in foldInstExtBinop and avoid infinite loop [VectorCombine] Apply InstSimplify in foldInstExtBinop and avoid infinite loop Aug 15, 2025
@XChy XChy changed the title [VectorCombine] Apply InstSimplify in foldInstExtBinop and avoid infinite loop [VectorCombine] Apply InstSimplify in scalarizeOpOrCmp to avoid infinite loop Aug 15, 2025
@XChy XChy enabled auto-merge (squash) August 15, 2025 18:19
@XChy XChy merged commit 3a4a60d into llvm:main Aug 15, 2025
11 of 12 checks passed
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in LLVM Release Status Aug 15, 2025
@lukel97
Contributor

lukel97 commented Aug 16, 2025

/cherry-pick 3a4a60d

@llvmbot
Member

llvmbot commented Aug 16, 2025

/pull-request #153958

tru pushed a commit to llvmbot/llvm-project that referenced this pull request Aug 18, 2025
…ite loop (llvm#153069)

Fixes llvm#153012

As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we
may fold
```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @Val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}
```

to

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @Val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}
```
And it would be folded back in `foldInsExtBinop`, resulting in an
infinite loop.

This patch forces scalarization iff InstSimplify can fold the constant
expression.

(cherry picked from commit 3a4a60d)
Successfully merging this pull request may close these issues:

[VectorCombine] VectorCombiner goes into infinite loop