[VPlan] Simplify Plan's entry in removeBranchOnConst. #154510
Conversation
@llvm/pr-subscribers-backend-powerpc @llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Changes

After #153643, there may be a BranchOnCond with a constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks.

In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch.

A couple of places that assume the scalar loop to still be valid need updating.

Depends on #153643.

Patch is 2.60 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154510.diff

332 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 98554310c74df..64cbf509a3118 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2357,9 +2357,9 @@ EpilogueVectorizerMainLoop::createIterationCountCheck(ElementCount VF,
/// VPBB are moved to the end of the newly created VPIRBasicBlock. VPBB must
/// have a single predecessor, which is rewired to the new VPIRBasicBlock. All
/// successors of VPBB, if any, are rewired to the new VPIRBasicBlock.
-static VPIRBasicBlock *replaceVPBBWithIRVPBB(VPBasicBlock *VPBB,
+static VPIRBasicBlock *replaceVPBBWithIRVPBB(VPlan &Plan, VPBasicBlock *VPBB,
BasicBlock *IRBB) {
- VPIRBasicBlock *IRVPBB = VPBB->getPlan()->createVPIRBasicBlock(IRBB);
+ VPIRBasicBlock *IRVPBB = Plan.createVPIRBasicBlock(IRBB);
auto IP = IRVPBB->begin();
for (auto &R : make_early_inc_range(VPBB->phis()))
R.moveBefore(*IRVPBB, IP);
@@ -2571,6 +2571,9 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
// Remove redundant induction instructions.
cse(HeaderBB);
+ if (Plan.getScalarPreheader()->getNumPredecessors() == 0)
+ return;
+
// Set/update profile weights for the vector and remainder loops as original
// loop iterations are now distributed among them. Note that original loop
// becomes the scalar remainder loop after vectorization.
@@ -7226,6 +7229,12 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
VPlanTransforms::simplifyRecipes(BestVPlan);
VPlanTransforms::removeBranchOnConst(BestVPlan);
+ if (BestVPlan.getEntry()->getSingleSuccessor() ==
+ BestVPlan.getScalarPreheader()) {
+ // TODO: Should not even try to vectorize.
+ return DenseMap<const SCEV *, Value *>();
+ }
+
VPlanTransforms::narrowInterleaveGroups(
BestVPlan, BestVF,
TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
@@ -7268,7 +7277,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
BasicBlock *EntryBB =
cast<VPIRBasicBlock>(BestVPlan.getEntry())->getIRBasicBlock();
State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton();
- replaceVPBBWithIRVPBB(BestVPlan.getScalarPreheader(),
+ replaceVPBBWithIRVPBB(BestVPlan, BestVPlan.getScalarPreheader(),
State.CFG.PrevBB->getSingleSuccessor());
VPlanTransforms::removeDeadRecipes(BestVPlan);
@@ -7351,8 +7360,9 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
} else {
// Keep all loop hints from the original loop on the vector loop (we'll
// replace the vectorizer-specific hints below).
- if (MDNode *LID = OrigLoop->getLoopID())
- L->setLoopID(LID);
+ if (BestVPlan.getScalarPreheader()->getNumPredecessors() > 0)
+ if (MDNode *LID = OrigLoop->getLoopID())
+ L->setLoopID(LID);
LoopVectorizeHints Hints(L, true, *ORE);
Hints.setAlreadyVectorized();
@@ -7383,6 +7393,16 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
addRuntimeUnrollDisableMetaData(L);
}
+ if (BestVPlan.getScalarPreheader()->getNumPredecessors() == 0) {
+ // If the original loop became unreachable, we need to delete it.
+ auto Blocks = OrigLoop->getBlocksVector();
+ Blocks.push_back(cast<VPIRBasicBlock>(BestVPlan.getScalarPreheader())
+ ->getIRBasicBlock());
+ for (auto *BB : Blocks)
+ LI->removeBlock(BB);
+ LI->erase(OrigLoop);
+ }
+
// 3. Fix the vectorized code: take care of header phi's, live-outs,
// predication, updating analyses.
ILV.fixVectorizedLoop(State);
@@ -7460,7 +7480,8 @@ EpilogueVectorizerMainLoop::emitIterationCountCheck(BasicBlock *Bypass,
// generated here dominates the vector epilog iter check.
EPI.TripCount = Count;
} else {
- VectorPHVPBB = replaceVPBBWithIRVPBB(VectorPHVPBB, LoopVectorPreHeader);
+ VectorPHVPBB =
+ replaceVPBBWithIRVPBB(Plan, VectorPHVPBB, LoopVectorPreHeader);
}
BranchInst &BI =
@@ -7493,7 +7514,7 @@ BasicBlock *EpilogueVectorizerEpilogueLoop::createVectorizedLoopSkeleton() {
BasicBlock *VecEpilogueIterationCountCheck =
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->begin(), DT, LI,
nullptr, "vec.epilog.iter.check", true);
- VectorPHVPBB = replaceVPBBWithIRVPBB(VectorPHVPBB, LoopVectorPreHeader);
+ VectorPHVPBB = replaceVPBBWithIRVPBB(Plan, VectorPHVPBB, LoopVectorPreHeader);
emitMinimumVectorEpilogueIterCountCheck(LoopScalarPreHeader,
VecEpilogueIterationCountCheck);
@@ -10213,11 +10234,22 @@ bool LoopVectorizePass::processLoop(Loop *L) {
LLVM_DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');
}
+ if (ORE->allowExtraAnalysis(LV_NAME))
+ checkMixedPrecision(L, ORE);
+
bool DisableRuntimeUnroll = false;
MDNode *OrigLoopID = L->getLoopID();
+ bool LoopRemoved = false;
{
using namespace ore;
if (!VectorizeLoop) {
+ ORE->emit([&]() {
+ return OptimizationRemark(LV_NAME, "Interleaved", L->getStartLoc(),
+ L->getHeader())
+ << "interleaved loop (interleaved count: "
+ << NV("InterleaveCount", IC) << ")";
+ });
+
assert(IC > 1 && "interleave count should not be 1 or 0");
// If we decided that it is not legal to vectorize the loop, then
// interleave it.
@@ -10234,14 +10266,11 @@ bool LoopVectorizePass::processLoop(Loop *L) {
LVP.addMinimumIterationCheck(BestPlan, VF.Width, IC,
VF.MinProfitableTripCount);
LVP.executePlan(VF.Width, IC, BestPlan, Unroller, DT, false);
-
- ORE->emit([&]() {
- return OptimizationRemark(LV_NAME, "Interleaved", L->getStartLoc(),
- L->getHeader())
- << "interleaved loop (interleaved count: "
- << NV("InterleaveCount", IC) << ")";
- });
+ LoopRemoved = BestPlan.getScalarPreheader()->getNumPredecessors() == 0;
} else {
+ // Report the vectorization decision.
+ reportVectorization(ORE, L, VF, IC);
+
// If we decided that it is *legal* to vectorize the loop, then do it.
VPlan &BestPlan = LVP.getPlanFor(VF.Width);
@@ -10311,23 +10340,23 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// rarely used is not worth unrolling.
if (!Checks.hasChecks())
DisableRuntimeUnroll = true;
+ LoopRemoved = BestPlan.getScalarPreheader()->getNumPredecessors() == 0;
}
- // Report the vectorization decision.
- reportVectorization(ORE, L, VF, IC);
}
-
- if (ORE->allowExtraAnalysis(LV_NAME))
- checkMixedPrecision(L, ORE);
}
assert(DT->verify(DominatorTree::VerificationLevel::Fast) &&
"DT not preserved correctly");
+ if (LoopRemoved)
+ return true;
+
std::optional<MDNode *> RemainderLoopID =
makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
LLVMLoopVectorizeFollowupEpilogue});
if (RemainderLoopID) {
- L->setLoopID(*RemainderLoopID);
+ if (!LoopRemoved)
+ L->setLoopID(*RemainderLoopID);
} else {
if (DisableRuntimeUnroll)
addRuntimeUnrollDisableMetaData(L);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 1438dc366b55d..4a7618f40164b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -972,12 +972,14 @@ void VPlan::execute(VPTransformState *State) {
setName("Final VPlan");
LLVM_DEBUG(dump());
- // Disconnect scalar preheader and scalar header, as the dominator tree edge
- // will be updated as part of VPlan execution. This allows keeping the DTU
- // logic generic during VPlan execution.
BasicBlock *ScalarPh = State->CFG.ExitBB;
- State->CFG.DTU.applyUpdates(
- {{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}});
+ if (getScalarPreheader()->getNumPredecessors() > 0) {
+ // Disconnect scalar preheader and scalar header, as the dominator tree edge
+ // will be updated as part of VPlan execution. This allows keeping the DTU
+ // logic generic during VPlan execution.
+ State->CFG.DTU.applyUpdates(
+ {{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}});
+ }
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
Entry);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index d32d2a9ad11f7..8e7fc24080c31 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1920,7 +1920,7 @@ void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
vp_depth_first_shallow(Plan.getEntry()))) {
VPValue *Cond;
- if (VPBB->getNumSuccessors() != 2 || VPBB == Plan.getEntry() ||
+ if (VPBB->getNumSuccessors() != 2 || VPBB->empty() ||
!match(&VPBB->back(), m_BranchOnCond(m_VPValue(Cond))))
continue;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.h b/llvm/lib/Transforms/Vectorize/VPlanUtils.h
index 9e1d325a4d8d6..2959e9440e753 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.h
@@ -49,6 +49,7 @@ inline bool isSingleScalar(const VPValue *VPV) {
case Instruction::GetElementPtr:
case Instruction::ICmp:
case Instruction::FCmp:
+ case Instruction::Select:
case VPInstruction::Broadcast:
case VPInstruction::PtrAdd:
return true;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll b/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll
index c18f9f2fae06b..ddfdb257ed49a 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll
@@ -52,8 +52,8 @@ define i64 @predicated_udiv_scalarized_operand(ptr %a, i64 %x) {
; CHECK-NEXT: [[TMP17]] = add <2 x i64> [[VEC_PHI]], [[PREDPHI]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
-; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
-; CHECK: middle.block:
+; CHECK-NEXT: br i1 [[TMP18]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: for.end:
; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> [[TMP17]])
; CHECK-NEXT: ret i64 [[TMP19]]
;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
index e44ddbce34fd5..58965c19ae1cc 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
@@ -202,8 +202,8 @@ exit:
define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i1 %c.0) {
; CHECK-LABEL: define void @test_blend_feeding_replicated_store_2(
; CHECK-SAME: ptr noalias [[SRC:%.*]], ptr [[DST:%.*]], i1 [[C_0:%.*]]) {
-; CHECK-NEXT: [[ENTRY:.*]]:
-; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i1> poison, i1 [[C_0]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i1> [[BROADCAST_SPLATINSERT]], <16 x i1> poison, <16 x i32> zeroinitializer
@@ -366,12 +366,11 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
; CHECK-NEXT: [[TMP71:%.*]] = icmp eq i32 [[INDEX_NEXT]], 96
; CHECK-NEXT: br i1 [[TMP71]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
-; CHECK-NEXT: br label %[[SCALAR_PH]]
+; CHECK-NEXT: br label %[[SCALAR_PH:.*]]
; CHECK: [[SCALAR_PH]]:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 96, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
; CHECK-NEXT: br label %[[LOOP_HEADER:.*]]
; CHECK: [[LOOP_HEADER]]:
-; CHECK-NEXT: [[IV1:%.*]] = phi i32 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
+; CHECK-NEXT: [[IV1:%.*]] = phi i32 [ 96, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
; CHECK-NEXT: [[GEP_SRC1:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[IV1]]
; CHECK-NEXT: [[L:%.*]] = load i8, ptr [[GEP_SRC1]], align 1
; CHECK-NEXT: [[C_1:%.*]] = icmp eq i8 [[L]], 0
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
index f099c22333c3e..387bb4302de60 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
@@ -6,8 +6,8 @@ target triple = "arm64-apple-macosx11.0.0"
define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
; CHECK-LABEL: define void @fshl_operand_first_order_recurrence(
; CHECK-SAME: ptr [[DST:%.*]], ptr noalias [[SRC:%.*]]) {
-; CHECK-NEXT: [[ENTRY:.*]]:
-; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
@@ -30,14 +30,12 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
; CHECK-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[WIDE_LOAD1]], i32 1
-; CHECK-NEXT: br label %[[SCALAR_PH]]
+; CHECK-NEXT: br label %[[SCALAR_PH:.*]]
; CHECK: [[SCALAR_PH]]:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
-; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
-; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
-; CHECK-NEXT: [[RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], %[[SCALAR_PH]] ], [ [[L:%.*]], %[[LOOP]] ]
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 100, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
+; CHECK-NEXT: [[RECUR:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[SCALAR_PH]] ], [ [[L:%.*]], %[[LOOP]] ]
; CHECK-NEXT: [[GEP_SRC:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[IV]]
; CHECK-NEXT: [[L]] = load i64, ptr [[GEP_SRC]], align 8
; CHECK-NEXT: [[OR:%.*]] = tail call i64 @llvm.fshl.i64(i64 1, i64 [[RECUR]], i64 1)
@@ -73,7 +71,7 @@ define void @powi_call(ptr %P) {
; CHECK-LABEL: define void @powi_call(
; CHECK-SAME: ptr [[P:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
-; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
@@ -83,7 +81,7 @@ define void @powi_call(ptr %P) {
; CHECK-NEXT: br label %[[MIDDLE_BLOCK:.*]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br label %[[EXIT:.*]]
-; CHECK: [[SCALAR_PH]]:
+; CHECK: [[SCALAR_PH:.*]]:
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
@@ -93,7 +91,7 @@ define void @powi_call(ptr %P) {
; CHECK-NEXT: store double [[POWI]], ptr [[GEP]], align 8
; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
; CHECK-NEXT: [[EC:%.*]] = icmp eq i64 [[IV]], 1
-; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]]
; CHECK: [[EXIT]]:
; CHECK-NEXT: ret void
;
@@ -224,5 +222,4 @@ declare i64 @llvm.fshl.i64(i64, i64, i64)
; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META2]], [[META1]]}
;.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
index 626242667e203..944f2699d6e62 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
@@ -5,7 +5,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK-LABEL: define void @clamped_tc_8(
; CHECK-SAME: ptr captures(none) [[DST:%.*]], i32 [[N:%.*]], i64 [[VAL:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: entry:
-; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
@@ -36,7 +36,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK: scalar.ph:
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
-; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[P_OUT_TAIL_09:%.*]] = phi ptr [ [[DST]], [[SCALAR_PH]] ], [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[TMP19:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
; CHECK-NEXT: [[SHR3:%.*]] = lshr i64 [[VAL]], [[TMP19]]
@@ -45,7 +45,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[P_OUT_TAIL_09]], i64 1
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]]
; CHECK: for.cond.cleanup:
; CHECK-NEXT: ret void
;
@@ -79,7 +79,7 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[REM]], 7
; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ADD]], 3
; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[SHR]] to i64
-; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
@@ -104,13 +104,13 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP1]]
; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 [[WIDE_TRIP_COUNT]])
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 8 x i64> [[VEC_IND]], [[DOTSPLAT]]
-; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]]
; CHECK: scalar.ph:
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
-; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = ph...
[truncated]
Move the emission of remarks for the vectorization decision before executing the plan, in preparation for #154510.
fhahn
left a comment
ping :)
…e (NFCI). Move the emission of remarks for the vectorization decision before executing the plan, in preparation for llvm/llvm-project#154510.
+static VPIRBasicBlock *replaceVPBBWithIRVPBB(VPlan &Plan, VPBasicBlock *VPBB,
                                              BasicBlock *IRBB) {
-  VPIRBasicBlock *IRVPBB = VPBB->getPlan()->createVPIRBasicBlock(IRBB);
+  VPIRBasicBlock *IRVPBB = Plan.createVPIRBasicBlock(IRBB);
This change is needed because VPBB may no longer be reachable from Plan's entry. VPBB must still have a single predecessor, as documented above, but that pred might itself be predecessor-free? If only some callers pass an unreachable VPBB, Plan could be an optional parameter, to keep other callers intact.
Yep, updated the comment to say that all predecessors/successors are rewired, if any, and added Plan as an optional parameter, which must be passed if the block may be unreachable, thanks!
+  if (Plan.getScalarPreheader()->getNumPredecessors() == 0)
+    return;
Early exit if remainder/default scalar loop is dead, w/o setting/updating profile weights for the vector loop? Worth a comment.
I had to move the code, because preserving profile info on the vector loop when the remainder loop is dead requires retrieving the profile info before removing the remainder loop, and then conditionally setting it on the vector/remainder loops as available.
+  if (BestVPlan.getEntry()->getSingleSuccessor() ==
+      BestVPlan.getScalarPreheader()) {
removeBranchOnConst() could conceivably bypass the vector loop; this actually happens in a few tests. Worth emitting a missed-vectorization remark.
Yep, added an analysis remark. I am not sure if missed-vectorization would be accurate, because this is for cases where we would create a dead vector loop and should not even try to vectorize.
ok, it appears the loop isn't vectorized because the Trip Count guard is known to always jump to the scalar loop, i.e., where VFxUF is known to exceed TC, so conceptually a smaller VFxUF could work. But tests include unvectorizable non-loop cases where TC<=1, which should better be cleaned up before calling LV, certainly before reaching LVP::executePlan().
Agreed, we already have a TODO where we created the known True condition
We have a TODO here too; wondering if the message should specify that vectorization is dead or never executes - due to insufficient trip-count.
Updated message to mention insufficient trip count, thanks
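For reference, the overall shape of the bail-out plus remark discussed here, pieced together from the snippet quoted near the end of this review; the surrounding control flow is paraphrased, and the truncated tail of the message string is an assumption:

  if (BestVPlan.getEntry()->getSingleSuccessor() ==
      BestVPlan.getScalarPreheader()) {
    // Entry branches straight to the scalar preheader, so the vector loop
    // would be dead: emit an analysis remark explaining why, then bail out.
    ORE->emit([&]() {
      return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationDead",
                                        OrigLoop->getStartLoc(),
                                        OrigLoop->getHeader())
             << "Created vector loop never executes due to insufficient trip "
                "count"; // "count" is assumed; the quoted snippet truncates
                         // the message here.
    });
    return DenseMap<const SCEV *, Value *>();
  }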
   // replace the vectorizer-specific hints below).
-  if (MDNode *LID = OrigLoop->getLoopID())
-    L->setLoopID(LID);
+  if (BestVPlan.getScalarPreheader()->getNumPredecessors() > 0)
Suggested change:
-  if (BestVPlan.getScalarPreheader()->getNumPredecessors() > 0)
+  } else if (BestVPlan.getScalarPreheader()->getNumPredecessors() > 0) {
The code below will execute in the general else block, so I left it as-is for now.
   }

+  if (BestVPlan.getScalarPreheader()->getNumPredecessors() == 0) {
+    // If the original loop became unreachable, we need to delete it.
"Native" VPlan constructs can simply be discarded when they become dead or unreached (even then lazily), whereas VPlan constructs that model the scalar loop are kept orphaned after becoming unreachable, to be processed here.
This raises an old thought: VPlan specifies code to be generated, using its blocks and recipes; how/could VPlan be extended to also dismantle existing code, perhaps using "anti-recipes" or "anti-blocks" whose execute() performs the desired clean-up.
Yep, unfortunately we cannot do this when the VPIRBasicBlock for the scalar preheader gets executed, as we only execute reachable VPBBs.
I moved it for now to VPlan::execute. Unfortunately doing it in the destructor of VPIRBasicBlock won't work either, because we need access to LoopInfo.
Having it in VPlan::execute is great!
As a potential follow-up: the VPIRBasicBlock of the scalar preheader and/or a VPIRRegionBlock of the scalar loop could be marked for destruction in VPlan if unreachable (as in CreatedBlocks), indicating that its loop should be removed from LoopInfo etc.
Yep, they are implicitly marked for destruction (all VPIRBasicBlocks that are unreachable are tracked in CreatedBlocks) and can be destroyed in the destructor; but they need LoopInfo/DT passed somehow, to notify those about the removal
@@ -53,76 +53,35 @@ define void @test_tc_less_than_16(ptr %A, i64 %N) {
; VF8UF2-SAME: ptr [[A:%.*]], i64 [[N:%.*]]) {
; VF8UF2-NEXT: [[ENTRY:.*]]:
; VF8UF2-NEXT: [[AND:%.*]] = and i64 [[N]], 15
; VF8UF2-NEXT: br i1 true, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
Another case skipping the vector loop.
yep
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[UMAX]], -1
; CHECK-NEXT: [[TMP1:%.*]] = udiv i64 [[TMP0]], [[G_64]]
; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
; CHECK-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
ditto
yep removed dead vector loop
@@ -12,15 +12,13 @@ define void @test_tc_less_than_16(ptr %A, i64 %N) {
 ; CHECK-NEXT: Live-in vp<[[VF:%.+]]> = VF
 ; CHECK-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF
 ; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
-; CHECK-NEXT: vp<[[TC:%.+]]> = original trip-count
+; CHECK-NEXT: ir<16> = original trip-count
Simplification due to branch-on-const of entry?
I had to update the test because otherwise the vector loop would be dead
; CHECK-NEXT: }
;
entry:
  %and = and i64 %N, 15
Ah, test changed?
; CHECK-NEXT: [[TMP4:%.*]] = fdiv fast <4 x float> splat (float 1.000000e+00), [[BROADCAST_SPLAT]]
; CHECK-NEXT: [[TMP2:%.*]] = fdiv fast <4 x float> splat (float 1.000000e+00), [[BROADCAST_SPLAT]]
; CHECK-NEXT: [[TMP15:%.*]] = fdiv fast <4 x float> splat (float 1.000000e+00), [[BROADCAST_SPLAT]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw float, ptr [[A:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP1]], align 4, !tbaa [[TBAA3:![0-9]+]]
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[WIDE_LOAD]], [[TMP0]]
; CHECK-NEXT: store <4 x float> [[TMP3]], ptr [[TMP1]], align 4, !tbaa [[TBAA3]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX]]
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP6]], i64 16
; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x float>, ptr [[TMP7]], align 4, !tbaa [[TBAA3]]
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <4 x float> [[WIDE_LOAD_1]], [[TMP4]]
; CHECK-NEXT: store <4 x float> [[TMP8]], ptr [[TMP7]], align 4, !tbaa [[TBAA3]]
; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX]]
; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP9]], i64 32
; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x float>, ptr [[TMP10]], align 4, !tbaa [[TBAA3]]
; CHECK-NEXT: [[TMP11:%.*]] = fmul fast <4 x float> [[WIDE_LOAD_2]], [[TMP2]]
; CHECK-NEXT: store <4 x float> [[TMP11]], ptr [[TMP10]], align 4, !tbaa [[TBAA3]]
; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX]]
; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds nuw i8, ptr [[TMP12]], i64 48
; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x float>, ptr [[TMP13]], align 4, !tbaa [[TBAA3]]
; CHECK-NEXT: [[TMP14:%.*]] = fmul fast <4 x float> [[WIDE_LOAD_3]], [[TMP15]]
; CHECK-NEXT: store <4 x float> [[TMP14]], ptr [[TMP13]], align 4, !tbaa [[TBAA3]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw nsw i64 [[INDEX]], 16
?
Fixed now, thanks
Split off from #154510, add helper to check if a block has any predecessors.
Split off from llvm/llvm-project#154510, add helper to check if a block has any predecessors.
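A plausible shape for that helper, sketched from the getNumPredecessors() accessor the patch already uses throughout (the actual implementation in the split-off commit may differ):

  /// Return true if this block has at least one predecessor.
  bool hasPredecessors() const { return getNumPredecessors() != 0; }

It lets checks like getScalarPreheader()->hasPredecessors() in the snippets below read as intent rather than as a count comparison.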
                           bool VectorizingEpilogue,
                           unsigned EstimatedVFxUF,
                           bool DisableRuntimeUnroll);
/// loop on the vector loop and replaces vectorizer-specific metadata
Suggested change:
-/// loop on the vector loop and replaces vectorizer-specific metadata
+/// loop on the vector loop and replaces vectorizer-specific metadata.
i.e., w/o change.
Done, thanks
                           bool VectorizingEpilogue,
                           unsigned EstimatedVFxUF,
                           bool DisableRuntimeUnroll);
/// loop on the vector loop and replaces vectorizer-specific metadata
Augment documentation to cover additional parameters?
added thanks
  BasicBlock *ScalarPh = State->CFG.ExitBB;
  State->CFG.DTU.applyUpdates(
      {{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}});
  if (getScalarPreheader()->hasPredecessors()) {
Suggested change:
-  if (getScalarPreheader()->hasPredecessors()) {
+  VPBasicBlock *VPScalarPh = getScalarPreheader();
+  if (VPScalarPh->hasPredecessors()) {
Reusable below.
Updated, thanks (named ScalarPhVPBB for consistency with other VPBB variables).
  Loop *OrigLoop = State->LI->getLoopFor(
      cast<VPIRBasicBlock>(getScalarPreheader()->getSingleSuccessor())
          ->getIRBasicBlock());
Suggested change:
-  Loop *OrigLoop = State->LI->getLoopFor(
-      cast<VPIRBasicBlock>(getScalarPreheader()->getSingleSuccessor())
-          ->getIRBasicBlock());
+  Loop *OrigLoop = State->LI->getLoopFor(
+      getScalarHeader()->getIRBasicBlock());
?
done thanks
  // If the original loop is unreachable, we need to delete it.
  auto Blocks = OrigLoop->getBlocksVector();
  Blocks.push_back(
      cast<VPIRBasicBlock>(getScalarPreheader())->getIRBasicBlock());
If the scalar preheader is a VPIRBB, should getScalarPreheader() return it as such?
The VPBB for the scalar preheader only gets replaced with the VPIRBB late, so I don't think that's possible yet
  // the epilogue loop, to ensure it is only updated once, or when the became
  // unreachable.
Suggested change:
-  // the epilogue loop, to ensure it is only updated once, or when the became
-  // unreachable.
+  // the epilogue loop to ensure it is updated only once. Also skip the update
+  // when the scalar loop became unreachable.
updated thanks
           vp_depth_first_shallow(Plan.getEntry()))) {
     VPValue *Cond;
-    if (VPBB->getNumSuccessors() != 2 || VPBB == Plan.getEntry() ||
+    if (VPBB->getNumSuccessors() != 2 || VPBB->empty() ||
If a branch-on-cond will be added only later, the following match will early-continue; but this match may fail when a block (the entry, with two successors) is currently free of recipes? Probably worth an explanatory comment.
 ; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 [[WIDE_TRIP_COUNT]])
 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 8 x i64> [[VEC_IND]], [[DOTSPLAT]]
-; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
May be worth leaving a TODO.
 ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
Why has the metadata been dropped on the scalar loop?
The scalar loop is unreachable now, which means we have to remove it from LoopInfo as an unreachable block is dominated by any other unreachable block, breaking LoopInfo verification.
From the PR description:
In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.
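Assembling the snippets from this review thread, the cleanup ends up looking roughly like the sketch below in VPlan::execute (reconstructed, not copied verbatim from the final commit):

  if (!getScalarPreheader()->hasPredecessors()) {
    Loop *OrigLoop =
        State->LI->getLoopFor(getScalarHeader()->getIRBasicBlock());
    // The scalar loop became unreachable. Unreachable blocks may dominate
    // each other, so leaving the loop in LoopInfo would break verification:
    // drop all of its blocks, plus the scalar preheader, then erase the loop.
    auto Blocks = OrigLoop->getBlocksVector();
    Blocks.push_back(
        cast<VPIRBasicBlock>(getScalarPreheader())->getIRBasicBlock());
    for (BasicBlock *BB : Blocks)
      State->LI->removeBlock(BB);
    State->LI->erase(OrigLoop);
  }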
 ; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1000
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label %[[EXIT]], label %[[LOOP_HEADER]]
Again, this doesn't look right; not sure why the loop vectoriser is killing off metadata from the original scalar loop?
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -passes=loop-vectorize -S %s | FileCheck %s --check-prefix=CHECK
 ; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize -disable-output %s 2>&1 | FileCheck %s --check-prefix=CHECK-COST
+; RUN: opt -passes=loop-vectorize,instcombine,simplifycfg < %s -S -o - | FileCheck %s --check-prefix=CHECK
Why are we adding extra passes here? Can you not just simply check the vectoriser output? It adds lots of extra test changes that aren't related to the patch as well.
Yes, this was added back by accident when updating to latest main
 ; DEFAULT-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; DEFAULT-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 4
-; DEFAULT-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; DEFAULT-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]]
Scalar loop metadata dropped.
same as above, loop now unreachable
 %for.next = sitofp i32 %iv to double
 %iv.next = add nsw i32 %iv, 1
-%ec = icmp eq i32 %iv.next, 1025
+%ec = icmp eq i32 %iv.next, %n
Why has this changed?
Without it there would be no merge phis in scalar.ph, which seems to be what the test needs to check.
 store i8 1, ptr %arrayidx, align 1
 %iv.next = add nuw nsw i64 %iv, 1
-%exitcond = icmp ne i64 %iv.next, %n
+%exitcond = icmp ne i64 %iv.next, 1024
Why has this changed? I thought @hassnaaHamdi explicitly wanted to make this a variable in #157512. I think you can revert the change to this file.
Restored the code, thanks
ayalz
left a comment
This LGTM with minor last suggestions. Pending @david-arm's consent.
/// the average trip count and invocation weight of the original loop (\p
/// OrigAverageTripCount and \p OrigLoopInvocationWeight respectively. They
/// cannot be retrieved after the plan has been executed, as the original loop
/// may have been removed.
Suggested change:
-/// may have been removed.
+/// may have been removed).
Closed the ) above, thanks.
   // iterations of the loop are handled in one vector iteration, so instead
   // use the value of vscale used for tuning.
-  setProfileInfoAfterUnrolling(OrigLoop, VectorLoop, OrigLoop, EstimatedVFxUF);
+  if (OrigAverageTripCount) {
Early exit, as originally in llvm::setProfileInfoAfterUnrolling()?
Done, thanks
Done thanks!
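The agreed-on shape, as a sketch (names taken from the quoted snippet; the elided body is paraphrased):

  // No profile data for the original loop: nothing to set, mirroring the
  // early exit in llvm::setProfileInfoAfterUnrolling().
  if (!OrigAverageTripCount)
    return;
  // ... otherwise set weights on the vector loop, and on the remainder loop
  // only if it is still reachable.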
           vp_depth_first_shallow(Plan.getEntry()))) {
     VPValue *Cond;
-    if (VPBB->getNumSuccessors() != 2 || VPBB == Plan.getEntry() ||
+    if (VPBB->getNumSuccessors() != 2 || VPBB->empty() ||
Thanks. Would it suffice to check only if VPBB terminator matches a BranchOnCond? Is the emptiness check needed to prevent failure of this match? Check for two successors could then be an assert.
           vp_depth_first_shallow(Plan.getEntry()))) {
     VPValue *Cond;
-    if (VPBB->getNumSuccessors() != 2 || VPBB == Plan.getEntry() ||
+    if (VPBB->getNumSuccessors() != 2 || VPBB->empty() ||
Understood; plus, even with a pair of successors, VPBB may be empty. Is it then (necessary and) sufficient to check

  if (VPBB->empty() ||
      !match(&VPBB->back(), m_BranchOnCond(m_VPValue(Cond))))
    continue;
  assert(VPBB->getNumSuccessors() == 2 &&
         "Two successors expected for BranchOnCond");

but the redundant first check for two successors may early-continue faster?
  return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationDead",
                                    OrigLoop->getStartLoc(),
                                    OrigLoop->getHeader())
         << "Created vector loop never executes due insufficient trip "
Suggested change:
-         << "Created vector loop never executes due insufficient trip "
+         << "Created vector loop never executes due to insufficient trip "
Ah yes, updated, thanks
…4510) After llvm/llvm-project#153643, there may be a BranchOnCond with constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on llvm/llvm-project#153643. PR: llvm/llvm-project#154510
Build on top of llvm#154510 to completely remove dead scalar loops. Depends on llvm#154510. (Included in the PR)
Build on top of llvm/llvm-project#154510 to completely remove the blocks of dead scalar loops. Depends on llvm/llvm-project#154510. PR: llvm/llvm-project#155497
Build on top of llvm#154510 to completely remove the blocks of dead scalar loops. Depends on llvm#154510. PR: llvm#155497
After #153643, there may be a
BranchOnCond with constant condition in the entry block.
Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branches from entry blocks.
In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.
A couple of places that assume the scalar loop to still be valid need
updating.
Depends on #153643.
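For orientation, a condensed sketch of the traversal this patch touches in removeBranchOnConst, reconstructed from the VPlanTransforms.cpp hunk earlier in the diff rather than copied from upstream (the constant-folding body is elided):

void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
           vp_depth_first_shallow(Plan.getEntry()))) {
    VPValue *Cond;
    // The entry block is no longer skipped; empty blocks are skipped
    // instead, since a block without recipes cannot end in a BranchOnCond
    // and the match below would not be safe on it.
    if (VPBB->getNumSuccessors() != 2 || VPBB->empty() ||
        !match(&VPBB->back(), m_BranchOnCond(m_VPValue(Cond))))
      continue;
    // ... if Cond is a known constant, fold the branch and disconnect the
    // dead successor, which can leave the scalar preheader unreachable.
  }
}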