-
Notifications
You must be signed in to change notification settings - Fork 15k
[llvm-mca][AMDGPU] Retire instructions that have issue carry over correctly #83881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm-mca][AMDGPU] Retire instructions that have issue carry over correctly #83881
Conversation
…rectly. llvm#83775 shows llvm-mca hits sanitizer error in cycleEnd. There was an instruction that takes multiple cycles to issue and is finished executing directly after issue. Prior to this patch, the instruction is retired on the first issue cycle, despite taking multiple cycles to issue. To fix this, if an instruction takes multiple cycles to issue and is done executing after issue, let updateCarriedOver retire the instruction when it is fully issued.
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-tools-llvm-mca Author: Michael Maitland (michaelmaitland) Changes#83775 shows llvm-mca hits sanitizer error in cycleEnd. There was an instruction that takes multiple cycles to issue and is finished executing directly after issue. Prior to this patch, the instruction is retired on the first issue cycle, despite taking multiple cycles to issue. To fix this, if an instruction takes multiple cycles to issue and is done executing after issue, let updateCarriedOver retire the instruction when it is fully issued. Full diff: https://github.com/llvm/llvm-project/pull/83881.diff 2 Files Affected:
diff --git a/llvm/lib/MCA/Stages/InOrderIssueStage.cpp b/llvm/lib/MCA/Stages/InOrderIssueStage.cpp
index 0f1737dc3cbc1f..ce0ddcbc74ffc5 100644
--- a/llvm/lib/MCA/Stages/InOrderIssueStage.cpp
+++ b/llvm/lib/MCA/Stages/InOrderIssueStage.cpp
@@ -257,8 +257,9 @@ llvm::Error InOrderIssueStage::tryIssue(InstRef &IR) {
}
// If the instruction has a latency of 0, we need to handle
- // the execution and retirement now.
- if (IS.isExecuted()) {
+ // the execution and retirement now. If the instruction is issued in multiple
+ // cycles, we cannot retire until it is finished issuing.
+ if (IS.isExecuted() && !ShouldCarryOver) {
PRF.onInstructionExecuted(&IS);
LSU.onInstructionExecuted(IR);
notifyEvent<HWInstructionEvent>(
@@ -299,7 +300,10 @@ void InOrderIssueStage::updateIssuedInst() {
notifyInstructionExecuted(IR);
++NumExecuted;
- retireInstruction(*I);
+ // Allow updateCarriedOver to retire the instruction if the instruction
+ // takes multiple cycles to issue.
+ if (!CarriedOver)
+ retireInstruction(*I);
std::iter_swap(I, E - NumExecuted);
}
@@ -329,6 +333,10 @@ void InOrderIssueStage::updateCarriedOver() {
else
Bandwidth -= CarryOver;
+ // updateIssuedInst did not retireInstruction if it was carried over.
+ if (CarriedOver.getInstruction()->isExecuted())
+ retireInstruction(CarriedOver);
+
CarriedOver = InstRef();
CarryOver = 0;
}
diff --git a/llvm/test/tools/llvm-mca/AMDGPU/carried-over.s b/llvm/test/tools/llvm-mca/AMDGPU/carried-over.s
new file mode 100644
index 00000000000000..5812dc8211fc91
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/AMDGPU/carried-over.s
@@ -0,0 +1,87 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=amdgcn -mcpu=gfx940 --timeline --iterations=1 --timeline-max-cycles=0 < %s | FileCheck %s
+
+v_pk_mov_b32 v[0:1], v[2:3], v[4:5]
+v_pk_add_f32 v[0:1], v[0:1], v[0:1]
+v_pk_mul_f32 v[0:1], v[0:1], v[0:1]
+v_add_co_u32 v5, s[0:1], v1, v2
+v_sub_co_u32 v5, s[0:1], v1, v2
+v_add_u32 v5, v1, v2
+v_sub_u32 v5, v1, v2
+
+# CHECK: Iterations: 1
+# CHECK-NEXT: Instructions: 7
+# CHECK-NEXT: Total Cycles: 10
+# CHECK-NEXT: Total uOps: 9
+
+# CHECK: Dispatch Width: 1
+# CHECK-NEXT: uOps Per Cycle: 0.90
+# CHECK-NEXT: IPC: 0.70
+# CHECK-NEXT: Block RThroughput: 9.0
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 1 1.00 U v_pk_mov_b32 v[0:1], v[2:3], v[4:5]
+# CHECK-NEXT: 1 1 1.00 U v_pk_add_f32 v[0:1], v[0:1], v[0:1]
+# CHECK-NEXT: 1 1 1.00 U v_pk_mul_f32 v[0:1], v[0:1], v[0:1]
+# CHECK-NEXT: 2 1 1.00 U v_add_co_u32_e64 v5, s[0:1], v1, v2
+# CHECK-NEXT: 2 1 1.00 U v_sub_co_u32_e64 v5, s[0:1], v1, v2
+# CHECK-NEXT: 1 1 1.00 U v_add_u32_e32 v5, v1, v2
+# CHECK-NEXT: 1 1 1.00 U v_sub_u32_e32 v5, v1, v2
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - HWBranch
+# CHECK-NEXT: [1] - HWExport
+# CHECK-NEXT: [2] - HWLGKM
+# CHECK-NEXT: [3] - HWSALU
+# CHECK-NEXT: [4] - HWVALU
+# CHECK-NEXT: [5] - HWVMEM
+# CHECK-NEXT: [6] - HWXDL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6]
+# CHECK-NEXT: - - - 2.00 7.00 - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: - - - - 1.00 - - v_pk_mov_b32 v[0:1], v[2:3], v[4:5]
+# CHECK-NEXT: - - - - 1.00 - - v_pk_add_f32 v[0:1], v[0:1], v[0:1]
+# CHECK-NEXT: - - - - 1.00 - - v_pk_mul_f32 v[0:1], v[0:1], v[0:1]
+# CHECK-NEXT: - - - 1.00 1.00 - - v_add_co_u32_e64 v5, s[0:1], v1, v2
+# CHECK-NEXT: - - - 1.00 1.00 - - v_sub_co_u32_e64 v5, s[0:1], v1, v2
+# CHECK-NEXT: - - - - 1.00 - - v_add_u32_e32 v5, v1, v2
+# CHECK-NEXT: - - - - 1.00 - - v_sub_u32_e32 v5, v1, v2
+
+# CHECK: Timeline view:
+# CHECK-NEXT: Index 0123456789
+
+# CHECK: [0,0] DE . . v_pk_mov_b32 v[0:1], v[2:3], v[4:5]
+# CHECK-NEXT: [0,1] .DE . . v_pk_add_f32 v[0:1], v[0:1], v[0:1]
+# CHECK-NEXT: [0,2] . DE . . v_pk_mul_f32 v[0:1], v[0:1], v[0:1]
+# CHECK-NEXT: [0,3] . DE. . v_add_co_u32_e64 v5, s[0:1], v1, v2
+# CHECK-NEXT: [0,4] . DER . v_sub_co_u32_e64 v5, s[0:1], v1, v2
+# CHECK-NEXT: [0,5] . . DE. v_add_u32_e32 v5, v1, v2
+# CHECK-NEXT: [0,6] . . DE v_sub_u32_e32 v5, v1, v2
+
+# CHECK: Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK: [0] [1] [2] [3]
+# CHECK-NEXT: 0. 1 0.0 0.0 0.0 v_pk_mov_b32 v[0:1], v[2:3], v[4:5]
+# CHECK-NEXT: 1. 1 0.0 0.0 0.0 v_pk_add_f32 v[0:1], v[0:1], v[0:1]
+# CHECK-NEXT: 2. 1 0.0 0.0 0.0 v_pk_mul_f32 v[0:1], v[0:1], v[0:1]
+# CHECK-NEXT: 3. 1 0.0 0.0 0.0 v_add_co_u32_e64 v5, s[0:1], v1, v2
+# CHECK-NEXT: 4. 1 0.0 0.0 0.0 v_sub_co_u32_e64 v5, s[0:1], v1, v2
+# CHECK-NEXT: 5. 1 0.0 0.0 0.0 v_add_u32_e32 v5, v1, v2
+# CHECK-NEXT: 6. 1 0.0 0.0 0.0 v_sub_u32_e32 v5, v1, v2
+# CHECK-NEXT: 1 0.0 0.0 0.0 <total>
|
f0f4f60
to
fdd62e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
It turns out this solution has a problem that I didn't catch when I checked locally. Going to have to rework this patch a bit. |
fc397e6
to
0e9b355
Compare
I think the latest fixup solves the problem I was facing. This patch is ready for review. |
a987421
to
1e5febd
Compare
#83775 shows llvm-mca hits sanitizer error in cycleEnd. There was an instruction that takes multiple cycles to issue and is finished executing directly after issue. Prior to this patch, the instruction is retired on the first issue cycle, despite taking multiple cycles to issue.
To fix this, if an instruction takes multiple cycles to issue and is done executing after issue, let updateCarriedOver retire the instruction when it is fully issued.