-
Notifications
You must be signed in to change notification settings - Fork 15.2k
release/21.x: [MachinePipeliner] Limit the number of stores in BB (#154940) #162639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@aankit-ca What do you think about merging this PR to the release branch? |
|
generally seems ok, but I noticed there's no tests on the original PR, was this missed or were any added later? |
|
I think we missed the testcase here, @kasuga-fj Can you please add a small testcase on this PR and also on main too? |
|
Ah, sorry, that is my silly mistake. Thanks for catching it, I'll submit a PR to add a testcase on main, and request backporting after it's merged. |
|
@llvm/pr-subscribers-backend-hexagon Author: None (llvmbot) ChangesBackport 22b79fb Requested by: @kasuga-fj Full diff: https://github.com/llvm/llvm-project/pull/162639.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp b/llvm/lib/CodeGen/MachinePipeliner.cpp
index 90005bd181f3a..0e7cb0c980d40 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib/CodeGen/MachinePipeliner.cpp
@@ -110,6 +110,7 @@ STATISTIC(NumFailZeroMII, "Pipeliner abort due to zero MII");
STATISTIC(NumFailNoSchedule, "Pipeliner abort due to no schedule found");
STATISTIC(NumFailZeroStage, "Pipeliner abort due to zero stage");
STATISTIC(NumFailLargeMaxStage, "Pipeliner abort due to too many stages");
+STATISTIC(NumFailTooManyStores, "Pipeliner abort due to too many stores");
/// A command line option to turn software pipelining on or off.
static cl::opt<bool> EnableSWP("enable-pipeliner", cl::Hidden, cl::init(true),
@@ -193,6 +194,13 @@ static cl::opt<bool>
MVECodeGen("pipeliner-mve-cg", cl::Hidden, cl::init(false),
cl::desc("Use the MVE code generator for software pipelining"));
+/// A command line argument to limit the number of store instructions in the
+/// target basic block.
+static cl::opt<unsigned> SwpMaxNumStores(
+ "pipeliner-max-num-stores",
+ cl::desc("Maximum number of stores allwed in the target loop."), cl::Hidden,
+ cl::init(200));
+
namespace llvm {
// A command line option to enable the CopyToPhi DAG mutation.
@@ -544,6 +552,23 @@ bool MachinePipeliner::canPipelineLoop(MachineLoop &L) {
return false;
}
+ unsigned NumStores = 0;
+ for (MachineInstr &MI : *L.getHeader())
+ if (MI.mayStore())
+ ++NumStores;
+ if (NumStores > SwpMaxNumStores) {
+ LLVM_DEBUG(dbgs() << "Too many stores\n");
+ NumFailTooManyStores++;
+ ORE->emit([&]() {
+ return MachineOptimizationRemarkAnalysis(DEBUG_TYPE, "canPipelineLoop",
+ L.getStartLoc(), L.getHeader())
+ << "Too many store instructions in the loop: "
+ << ore::NV("NumStores", NumStores) << " > "
+ << ore::NV("SwpMaxNumStores", SwpMaxNumStores) << ".";
+ });
+ return false;
+ }
+
// Remove any subregisters from inputs to phi nodes.
preprocessPhiNodes(*L.getHeader());
return true;
diff --git a/llvm/test/CodeGen/Hexagon/swp-many-stores.mir b/llvm/test/CodeGen/Hexagon/swp-many-stores.mir
new file mode 100644
index 0000000000000..bf14dcf3c4fb3
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/swp-many-stores.mir
@@ -0,0 +1,88 @@
+# RUN: llc -run-pass pipeliner -debug-only=pipeliner %s -o /dev/null -pipeliner-max-num-stores=5 2>&1 | FileCheck %s
+# REQUIRES: asserts
+
+# This loop has six stores, which exceeds the limit set by
+# `pipeliner-max-num-stores`.
+
+# CHECK: Too many stores
+
+--- |
+ target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
+ target triple = "hexagon-unknown-linux-musl"
+
+ define void @f(ptr %a, i32 %n) #0 {
+ entry:
+ %guard = icmp sgt i32 %n, 0
+ %btc = sub nsw i32 %n, 1
+ br i1 %guard, label %loop.preheader, label %exit
+
+ loop.preheader: ; preds = %entry
+ %0 = add i32 %n, 1
+ %cgep = getelementptr i8, ptr %a, i32 %0
+ br label %loop
+
+ loop: ; preds = %loop.preheader, %loop
+ %lsr.iv = phi ptr [ %cgep, %loop.preheader ], [ %cgep8, %loop ]
+ %i = phi i32 [ %i.dec, %loop ], [ %btc, %loop.preheader ]
+ %cgep7 = getelementptr i8, ptr %lsr.iv, i32 -2
+ store i8 0, ptr %cgep7, align 1
+ %cgep8 = getelementptr i8, ptr %lsr.iv, i32 -1
+ store i8 1, ptr %cgep8, align 1
+ store i8 2, ptr %lsr.iv, align 1
+ %cgep9 = getelementptr i8, ptr %lsr.iv, i32 1
+ store i8 3, ptr %cgep9, align 1
+ %cgep10 = getelementptr i8, ptr %lsr.iv, i32 2
+ store i8 4, ptr %cgep10, align 1
+ %cgep11 = getelementptr i8, ptr %lsr.iv, i32 3
+ store i8 5, ptr %cgep11, align 1
+ %i.dec = sub i32 %i, 1
+ %ec = icmp eq i32 %i.dec, 0
+ br i1 %ec, label %exit, label %loop
+
+ exit: ; preds = %loop, %entry
+ ret void
+ }
+
+ attributes #0 = { "target-cpu"="hexagonv79" }
+...
+---
+name: f
+tracksRegLiveness: true
+body: |
+ bb.0.entry:
+ successors: %bb.1(0x50000000), %bb.3(0x30000000)
+ liveins: $r0, $r1
+
+ %7:intregs = COPY $r1
+ %6:intregs = COPY $r0
+ %8:predregs = C2_cmpgti %7, 0
+ J2_jumpf %8, %bb.3, implicit-def dead $pc
+ J2_jump %bb.1, implicit-def dead $pc
+
+ bb.1.loop.preheader:
+ successors: %bb.2(0x80000000)
+
+ %0:intregs = A2_addi %7, -1
+ %1:intregs = S4_addaddi %7, %6, 1
+ %10:intregs = A2_tfrsi 0
+ %11:intregs = A2_tfrsi 1
+ %14:intregs = COPY %0
+ J2_loop0r %bb.2, %14, implicit-def $lc0, implicit-def $sa0, implicit-def $usr
+
+ bb.2.loop (machine-block-address-taken):
+ successors: %bb.3(0x04000000), %bb.2(0x7c000000)
+
+ %2:intregs = PHI %1, %bb.1, %4, %bb.2
+ S2_storerb_io %2, -2, %10 :: (store (s8) into %ir.cgep7)
+ %4:intregs = A2_addi %2, -1
+ S2_storerb_io %2, -1, %11 :: (store (s8) into %ir.cgep8)
+ S4_storeirb_io %2, 0, 2 :: (store (s8) into %ir.lsr.iv)
+ S4_storeirb_io %2, 1, 3 :: (store (s8) into %ir.cgep9)
+ S4_storeirb_io %2, 2, 4 :: (store (s8) into %ir.cgep10)
+ S4_storeirb_io %2, 3, 5 :: (store (s8) into %ir.cgep11)
+ ENDLOOP0 %bb.2, implicit-def $pc, implicit-def $lc0, implicit $sa0, implicit $lc0
+ J2_jump %bb.3, implicit-def dead $pc
+
+ bb.3.exit:
+ PS_jmpret $r31, implicit-def dead $pc
+...
|
|
Added test which was landed in #163350. Let me know if the commits should be squashed. |
The dependency analysis in MachinePipeliner checks dependencies for every pair of store instructions in the target basic block. This means the time complexity of the analysis is `O(N^2)`, where `N` is the number of store instructions. Therefore, compilation time can become significantly long when there are too many store instructions. To mitigate it, this patch introduces logic to count the number of store instructions at the beginning of the pipeliner and bail out if it exceeds the threshold. The default value if the threshold should be large enough. Thus, in most practical cases where the pipeliner is beneficial, this patch should not cause any performance regression. Related issue: llvm#150262 (cherry picked from commit 22b79fb)
This PR adds a testcase where pipeliner bails out early because the number of the store instructions exceeds the threshold set by `pipeliner-max-num-stores`. The test should have been added in llvm#154940, but it was missed.
|
@kasuga-fj (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR. |
Backport 22b79fb
Requested by: @kasuga-fj