
Conversation

lukel97 (Contributor) commented Nov 13, 2025

For a scalar-only VPlan with tail folding that has a live-out phi, legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask:

<x1> vector loop: {
  vector.body:
    EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
    vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
    EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
    EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
    EMIT branch-on-count vp<%index.next>, vp<%2>
  No successors
}
Successor(s): middle.block

middle.block:
  EMIT vp<%8> = last-active-lane vp<%6>
  EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
Successor(s): ir-bb<exit>

The verifier complains about this, even though it still generates the correct last active lane, so fix the assert by handling this case in isHeaderMask. A similar pattern already exists there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe.

Fixes #167813
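
Concretely, the new case added to isHeaderMask matches this shape (condensed from the diff in the llvmbot comment below):

    if (match(V,
              m_ICmp(m_ScalarIVSteps(
                         m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
                         m_One(), m_Specific(&Plan.getVF())),
                     m_Specific(Plan.getBackedgeTakenCount()))))
      return true;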

llvmbot (Member) commented Nov 13, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/167897.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanUtils.cpp (+7)
  • (added) llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll (+60)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
index e22c5dfdb9f38..c9de9b82bca7c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -66,6 +66,13 @@ bool vputils::isHeaderMask(const VPValue *V, const VPlan &Plan) {
                       m_One(), m_Specific(&Plan.getVF()))) ||
             IsWideCanonicalIV(A));
 
+  if (match(V,
+            m_ICmp(m_ScalarIVSteps(
+                       m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
+                       m_One(), m_Specific(&Plan.getVF())),
+                   m_Specific(Plan.getBackedgeTakenCount()))))
+    return true;
+
   return match(V, m_ICmp(m_VPValue(A), m_VPValue(B))) && IsWideCanonicalIV(A) &&
          B == Plan.getBackedgeTakenCount();
 }
diff --git a/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll b/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll
new file mode 100644
index 0000000000000..5964cf45fb6be
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll
@@ -0,0 +1,60 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 6
+; RUN: opt -p loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s
+
+define i64 @live_out_scalar_vf(i64 %n) {
+; CHECK-LABEL: define i64 @live_out_scalar_vf(
+; CHECK-SAME: i64 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = add i64 [[N]], 1
+; CHECK-NEXT:    br label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 [[TMP0]], 1
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 2
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[TMP0]], 1
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP1:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = add i64 [[INDEX]], 1
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp ugt i64 [[TMP1]], [[TRIP_COUNT_MINUS_1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ugt i64 [[TMP2]], [[TRIP_COUNT_MINUS_1]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], 2
+; CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = icmp eq i1 [[TMP4]], false
+; CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+; CHECK-NEXT:    [[TMP8:%.*]] = add i64 1, [[TMP7]]
+; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i1 [[TMP3]], false
+; CHECK-NEXT:    [[TMP10:%.*]] = zext i1 [[TMP9]] to i64
+; CHECK-NEXT:    [[TMP11:%.*]] = add i64 0, [[TMP10]]
+; CHECK-NEXT:    [[TMP12:%.*]] = icmp ne i64 [[TMP10]], 1
+; CHECK-NEXT:    [[TMP13:%.*]] = select i1 [[TMP12]], i64 [[TMP11]], i64 [[TMP8]]
+; CHECK-NEXT:    [[LAST_ACTIVE_LANE:%.*]] = sub i64 [[TMP13]], 1
+; CHECK-NEXT:    [[TMP14:%.*]] = sub i64 [[LAST_ACTIVE_LANE]], 1
+; CHECK-NEXT:    [[TMP15:%.*]] = icmp uge i64 [[LAST_ACTIVE_LANE]], 1
+; CHECK-NEXT:    [[TMP16:%.*]] = select i1 [[TMP15]], i64 [[TMP2]], i64 [[TMP1]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i64 [[TMP16]]
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
+  br label %latch
+
+latch:
+  ; Need to use a phi otherwise the header mask will use a
+  ; VPWidenCanonicalIVRecipe instead of a VPScalarIVStepsRecipe.
+  %exitval = phi i64 [ %iv, %loop ]
+  %iv.next = add i64 %iv, 1
+  %ec = icmp eq i64 %iv, %n
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i64 %exitval
+}
+


loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
  br label %latch
Contributor:

Can we turn this into a non-trivial branch?

Contributor Author:

I was able to remove the branch entirely in 3863112, and the crash is still reproducible.

                      m_One(), m_Specific(&Plan.getVF()))) ||
            IsWideCanonicalIV(A));

  if (match(V,
Contributor:

Suggested change:

-  if (match(V,
+  // For scalar plans, the header mask uses the scalar steps.
+  if (match(V,

Contributor Author:

Done in 23bce03

                       m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
                       m_One(), m_Specific(&Plan.getVF())),
                   m_Specific(Plan.getBackedgeTakenCount()))))
    return true;
Contributor:

can we assert Plan.hasScalarVFOnly()?

Contributor Author:

Added in 23bce03
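
For reference, a minimal sketch of how the requested assert could sit inside the new match; the exact placement and message are whatever landed in 23bce03:

    if (match(V,
              m_ICmp(m_ScalarIVSteps(
                         m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
                         m_One(), m_Specific(&Plan.getVF())),
                     m_Specific(Plan.getBackedgeTakenCount())))) {
      // Hypothetical message: only scalar-only plans should build their
      // header mask from scalar IV steps.
      assert(Plan.hasScalarVFOnly() &&
             "scalar-steps header mask in a non-scalar plan");
      return true;
    }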

lukel97 (Contributor Author) commented Nov 14, 2025

I see that #149042 was reverted, but I've updated this on main if you'd like to land it separately anyway.


  // For scalar plans, the header mask uses the scalar steps.
  if (match(V,
            m_ICmp(m_ScalarIVSteps(
Contributor:

Looks like this pattern is now used twice in this function, i.e.

m_ScalarIVSteps(m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
                m_One(), m_Specific(&Plan.getVF()))

Is it worth creating a specific pattern matcher for this? Something like:

  m_CanonicalScalarIVSteps(Plan)

Contributor Author:

Good point, I've added a helper in efca5e5
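
One possible shape for such a helper, using the suggested name; this is a sketch, and the version added in efca5e5 may be structured differently:

    /// Match the VPScalarIVSteps of Plan's canonical IV with a unit step,
    /// i.e. the pattern that appeared twice in isHeaderMask.
    static inline auto m_CanonicalScalarIVSteps(const VPlan &Plan) {
      return m_ScalarIVSteps(
          m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()), m_One(),
          m_Specific(&Plan.getVF()));
    }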

Successfully merging this pull request may close these issues:

[LoopVectorize] Assertion `verifyVPlanIsValid(*Plan) && "VPlan is invalid"' failed. (#167813)