release/21.x: [MachinePipeliner] Limit the number of stores in BB (#154940) #162639

llvmbot · 2025-10-09T11:53:50Z

Backport 22b79fb

Requested by: @kasuga-fj

llvmbot · 2025-10-09T11:53:57Z

@aankit-ca What do you think about merging this PR to the release branch?

c-rhodes · 2025-10-13T10:04:10Z

generally seems ok, but I noticed there's no tests on the original PR, was this missed or were any added later?

aankit-ca · 2025-10-13T17:18:15Z

I think we missed the testcase here, @kasuga-fj Can you please add a small testcase on this PR and also on main too?

kasuga-fj · 2025-10-14T09:01:59Z

Ah, sorry, that is my silly mistake. Thanks for catching it, I'll submit a PR to add a testcase on main, and request backporting after it's merged.

llvmbot · 2025-10-16T12:09:09Z

@llvm/pr-subscribers-backend-hexagon

Author: None (llvmbot)

Changes

Backport 22b79fb

Requested by: @kasuga-fj

Full diff: https://github.com/llvm/llvm-project/pull/162639.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/MachinePipeliner.cpp (+25)
(added) llvm/test/CodeGen/Hexagon/swp-many-stores.mir (+88)

diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp b/llvm/lib/CodeGen/MachinePipeliner.cpp
index 90005bd181f3a..0e7cb0c980d40 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib/CodeGen/MachinePipeliner.cpp
@@ -110,6 +110,7 @@ STATISTIC(NumFailZeroMII, "Pipeliner abort due to zero MII");
 STATISTIC(NumFailNoSchedule, "Pipeliner abort due to no schedule found");
 STATISTIC(NumFailZeroStage, "Pipeliner abort due to zero stage");
 STATISTIC(NumFailLargeMaxStage, "Pipeliner abort due to too many stages");
+STATISTIC(NumFailTooManyStores, "Pipeliner abort due to too many stores");
 
 /// A command line option to turn software pipelining on or off.
 static cl::opt<bool> EnableSWP("enable-pipeliner", cl::Hidden, cl::init(true),
@@ -193,6 +194,13 @@ static cl::opt<bool>
     MVECodeGen("pipeliner-mve-cg", cl::Hidden, cl::init(false),
                cl::desc("Use the MVE code generator for software pipelining"));
 
+/// A command line argument to limit the number of store instructions in the
+/// target basic block.
+static cl::opt<unsigned> SwpMaxNumStores(
+    "pipeliner-max-num-stores",
+    cl::desc("Maximum number of stores allwed in the target loop."), cl::Hidden,
+    cl::init(200));
+
 namespace llvm {
 
 // A command line option to enable the CopyToPhi DAG mutation.
@@ -544,6 +552,23 @@ bool MachinePipeliner::canPipelineLoop(MachineLoop &L) {
     return false;
   }
 
+  unsigned NumStores = 0;
+  for (MachineInstr &MI : *L.getHeader())
+    if (MI.mayStore())
+      ++NumStores;
+  if (NumStores > SwpMaxNumStores) {
+    LLVM_DEBUG(dbgs() << "Too many stores\n");
+    NumFailTooManyStores++;
+    ORE->emit([&]() {
+      return MachineOptimizationRemarkAnalysis(DEBUG_TYPE, "canPipelineLoop",
+                                               L.getStartLoc(), L.getHeader())
+             << "Too many store instructions in the loop: "
+             << ore::NV("NumStores", NumStores) << " > "
+             << ore::NV("SwpMaxNumStores", SwpMaxNumStores) << ".";
+    });
+    return false;
+  }
+
   // Remove any subregisters from inputs to phi nodes.
   preprocessPhiNodes(*L.getHeader());
   return true;
diff --git a/llvm/test/CodeGen/Hexagon/swp-many-stores.mir b/llvm/test/CodeGen/Hexagon/swp-many-stores.mir
new file mode 100644
index 0000000000000..bf14dcf3c4fb3
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/swp-many-stores.mir
@@ -0,0 +1,88 @@
+# RUN: llc -run-pass pipeliner -debug-only=pipeliner %s -o /dev/null -pipeliner-max-num-stores=5 2>&1 | FileCheck %s
+# REQUIRES: asserts
+
+# This loop has six stores, which exceeds the limit set by
+# `pipeliner-max-num-stores`.
+
+# CHECK: Too many stores
+
+--- |
+  target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
+  target triple = "hexagon-unknown-linux-musl"
+  
+  define void @f(ptr %a, i32 %n) #0 {
+  entry:
+    %guard = icmp sgt i32 %n, 0
+    %btc = sub nsw i32 %n, 1
+    br i1 %guard, label %loop.preheader, label %exit
+  
+  loop.preheader:                                   ; preds = %entry
+    %0 = add i32 %n, 1
+    %cgep = getelementptr i8, ptr %a, i32 %0
+    br label %loop
+  
+  loop:                                             ; preds = %loop.preheader, %loop
+    %lsr.iv = phi ptr [ %cgep, %loop.preheader ], [ %cgep8, %loop ]
+    %i = phi i32 [ %i.dec, %loop ], [ %btc, %loop.preheader ]
+    %cgep7 = getelementptr i8, ptr %lsr.iv, i32 -2
+    store i8 0, ptr %cgep7, align 1
+    %cgep8 = getelementptr i8, ptr %lsr.iv, i32 -1
+    store i8 1, ptr %cgep8, align 1
+    store i8 2, ptr %lsr.iv, align 1
+    %cgep9 = getelementptr i8, ptr %lsr.iv, i32 1
+    store i8 3, ptr %cgep9, align 1
+    %cgep10 = getelementptr i8, ptr %lsr.iv, i32 2
+    store i8 4, ptr %cgep10, align 1
+    %cgep11 = getelementptr i8, ptr %lsr.iv, i32 3
+    store i8 5, ptr %cgep11, align 1
+    %i.dec = sub i32 %i, 1
+    %ec = icmp eq i32 %i.dec, 0
+    br i1 %ec, label %exit, label %loop
+  
+  exit:                                             ; preds = %loop, %entry
+    ret void
+  }
+  
+  attributes #0 = { "target-cpu"="hexagonv79" }
+...
+---
+name:            f
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    successors: %bb.1(0x50000000), %bb.3(0x30000000)
+    liveins: $r0, $r1
+  
+    %7:intregs = COPY $r1
+    %6:intregs = COPY $r0
+    %8:predregs = C2_cmpgti %7, 0
+    J2_jumpf %8, %bb.3, implicit-def dead $pc
+    J2_jump %bb.1, implicit-def dead $pc
+  
+  bb.1.loop.preheader:
+    successors: %bb.2(0x80000000)
+  
+    %0:intregs = A2_addi %7, -1
+    %1:intregs = S4_addaddi %7, %6, 1
+    %10:intregs = A2_tfrsi 0
+    %11:intregs = A2_tfrsi 1
+    %14:intregs = COPY %0
+    J2_loop0r %bb.2, %14, implicit-def $lc0, implicit-def $sa0, implicit-def $usr
+  
+  bb.2.loop (machine-block-address-taken):
+    successors: %bb.3(0x04000000), %bb.2(0x7c000000)
+  
+    %2:intregs = PHI %1, %bb.1, %4, %bb.2
+    S2_storerb_io %2, -2, %10 :: (store (s8) into %ir.cgep7)
+    %4:intregs = A2_addi %2, -1
+    S2_storerb_io %2, -1, %11 :: (store (s8) into %ir.cgep8)
+    S4_storeirb_io %2, 0, 2 :: (store (s8) into %ir.lsr.iv)
+    S4_storeirb_io %2, 1, 3 :: (store (s8) into %ir.cgep9)
+    S4_storeirb_io %2, 2, 4 :: (store (s8) into %ir.cgep10)
+    S4_storeirb_io %2, 3, 5 :: (store (s8) into %ir.cgep11)
+    ENDLOOP0 %bb.2, implicit-def $pc, implicit-def $lc0, implicit $sa0, implicit $lc0
+    J2_jump %bb.3, implicit-def dead $pc
+  
+  bb.3.exit:
+    PS_jmpret $r31, implicit-def dead $pc
+...

kasuga-fj · 2025-10-16T12:10:49Z

Added test which was landed in #163350. Let me know if the commits should be squashed.

The dependency analysis in MachinePipeliner checks dependencies for every pair of store instructions in the target basic block. This means the time complexity of the analysis is `O(N^2)`, where `N` is the number of store instructions. Therefore, compilation time can become significantly long when there are too many store instructions. To mitigate it, this patch introduces logic to count the number of store instructions at the beginning of the pipeliner and bail out if it exceeds the threshold. The default value if the threshold should be large enough. Thus, in most practical cases where the pipeliner is beneficial, this patch should not cause any performance regression. Related issue: llvm#150262 (cherry picked from commit 22b79fb)

This PR adds a testcase where pipeliner bails out early because the number of the store instructions exceeds the threshold set by `pipeliner-max-num-stores`. The test should have been added in llvm#154940, but it was missed.

github-actions · 2025-10-17T12:09:09Z

@kasuga-fj (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR.

llvmbot added this to the LLVM 21.x Release milestone Oct 9, 2025

github-project-automation bot added this to LLVM Release Status Oct 9, 2025

github-project-automation bot moved this to Needs Triage in LLVM Release Status Oct 9, 2025

llvmbot requested a review from aankit-ca October 9, 2025 11:53

llvmbot mentioned this pull request Oct 9, 2025

hexagon compiler runs "forever" on matrix-spec-types at O2 #150262

Closed

llvmbot added the llvm:codegen label Oct 9, 2025

kasuga-fj mentioned this pull request Oct 10, 2025

[MachinePipeliner] Limit the number of stores in BB #154940

Merged

aankit-ca approved these changes Oct 10, 2025

View reviewed changes

github-project-automation bot moved this from Needs Triage to Needs Merge in LLVM Release Status Oct 10, 2025

c-rhodes moved this from Needs Merge to Needs Test Fix in LLVM Release Status Oct 14, 2025

llvmbot added the backend:Hexagon label Oct 16, 2025

aankit-ca approved these changes Oct 16, 2025

View reviewed changes

github-project-automation bot moved this from Needs Test Fix to Needs Merge in LLVM Release Status Oct 16, 2025

kasuga-fj added 2 commits October 17, 2025 12:08

c-rhodes force-pushed the issue150262 branch from 3971b10 to 464d75a Compare October 17, 2025 12:08

c-rhodes merged commit 464d75a into llvm:release/21.x Oct 17, 2025
1 check was pending

github-project-automation bot moved this from Needs Merge to Done in LLVM Release Status Oct 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

release/21.x: [MachinePipeliner] Limit the number of stores in BB (#154940) #162639

release/21.x: [MachinePipeliner] Limit the number of stores in BB (#154940) #162639

Uh oh!

llvmbot commented Oct 9, 2025

Uh oh!

llvmbot commented Oct 9, 2025

Uh oh!

c-rhodes commented Oct 13, 2025

Uh oh!

aankit-ca commented Oct 13, 2025

Uh oh!

kasuga-fj commented Oct 14, 2025

Uh oh!

llvmbot commented Oct 16, 2025

Uh oh!

kasuga-fj commented Oct 16, 2025

Uh oh!

Uh oh!

github-actions bot commented Oct 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

release/21.x: [MachinePipeliner] Limit the number of stores in BB (#154940) #162639

release/21.x: [MachinePipeliner] Limit the number of stores in BB (#154940) #162639

Uh oh!

Conversation

llvmbot commented Oct 9, 2025

Uh oh!

llvmbot commented Oct 9, 2025

Uh oh!

c-rhodes commented Oct 13, 2025

Uh oh!

aankit-ca commented Oct 13, 2025

Uh oh!

kasuga-fj commented Oct 14, 2025

Uh oh!

llvmbot commented Oct 16, 2025

Uh oh!

kasuga-fj commented Oct 16, 2025

Uh oh!

Uh oh!

github-actions bot commented Oct 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants