AMDGPU: Handle V->A MFMA copy from case with immediate src2 #153023
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
Handle a special case for copies from AGPR to VGPR on the MFMA inputs.
Full diff: https://github.com/llvm/llvm-project/pull/153023.diff
2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
index b71c70db5e6b3..4e0d64a20690e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
@@ -375,13 +375,14 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::tryFoldCopiesFromAGPR(
     Register CopyDstReg = UseMI.getOperand(0).getReg();
     if (!CopyDstReg.isVirtual())
       continue;
+    for (MachineOperand &CopyUseMO : MRI.reg_nodbg_operands(CopyDstReg)) {
+      if (!CopyUseMO.readsReg())
+        continue;

-    for (MachineInstr &CopyUseMI : MRI.use_instructions(CopyDstReg)) {
+      MachineInstr &CopyUseMI = *CopyUseMO.getParent();
       if (isRewriteCandidate(CopyUseMI)) {
-        const MachineOperand *Op =
-            CopyUseMI.findRegisterUseOperand(CopyDstReg, /*TRI=*/nullptr);
-        if (tryReassigningMFMAChain(CopyUseMI, Op->getOperandNo(),
-                                    VRM.getPhys(Op->getReg())))
+        if (tryReassigningMFMAChain(CopyUseMI, CopyUseMO.getOperandNo(),
+                                    VRM.getPhys(CopyUseMO.getReg())))
           MadeChange = true;
       }
     }
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
index 632401b6128c5..17a72110767bb 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
@@ -187,8 +187,8 @@ body: |
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
     ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX4_:%[0-9]+]]:areg_128_align2 = GLOBAL_LOAD_DWORDX4 [[COPY]], 0, 0, implicit $exec :: (load (s128), addrspace 1)
-    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vreg_128_align2 = COPY [[GLOBAL_LOAD_DWORDX4_]]
-    ; CHECK-NEXT: [[COPY3:%[0-9]+]].sub0_sub1:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], 0, 0, 0, 0, implicit $mode, implicit $exec
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:areg_128_align2 = COPY [[GLOBAL_LOAD_DWORDX4_]]
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]].sub0_sub1:areg_128_align2 = V_MFMA_F64_4X4X4F64_e64 [[COPY1]], [[COPY2]], 0, 0, 0, 0, implicit $mode, implicit $exec
     ; CHECK-NEXT: GLOBAL_STORE_DWORDX4 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
     ; CHECK-NEXT: SI_RETURN
     %0:vreg_64_align2 = COPY $vgpr4_vgpr5
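A small standalone model may help show why the loop shape changed. It is a sketch under stated assumptions: `Operand`, `Instr`, `useOp`, `defOp`, `immOp`, `useInstructions` and `readingOperands` are hypothetical stand-ins, not LLVM types or APIs. `readsReg` here only approximates `MachineOperand::readsReg()` (internal reads are ignored), and debug operands are left out of the model entirely (the real `reg_nodbg_operands` skips them).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified operand model (NOT LLVM's MachineOperand API).
struct Operand {
  bool IsReg = false;   // false models an immediate operand
  unsigned Reg = 0;     // virtual register number when IsReg is true
  bool IsDef = false;   // def operand (otherwise a use)
  unsigned SubReg = 0;  // nonzero for a subregister operand, e.g. sub0_sub1
  bool IsUndef = false; // undef flag
};

// Hypothetical instruction: just an ordered operand list.
struct Instr {
  std::vector<Operand> Ops;
};

Operand useOp(unsigned Reg) {
  Operand MO;
  MO.IsReg = true;
  MO.Reg = Reg;
  return MO;
}

Operand defOp(unsigned Reg, unsigned SubReg) {
  Operand MO = useOp(Reg);
  MO.IsDef = true;
  MO.SubReg = SubReg;
  return MO;
}

Operand immOp() { return Operand{}; }

// Approximates MachineOperand::readsReg(): uses read the register, and so do
// subregister defs, unless the operand is marked undef.
bool readsReg(const Operand &MO) {
  return MO.IsReg && !MO.IsUndef && (!MO.IsDef || MO.SubReg != 0);
}

// Old shape: visit each instruction that has a *use* operand of Reg, once.
std::vector<std::size_t> useInstructions(const std::vector<Instr> &Body,
                                         unsigned Reg) {
  std::vector<std::size_t> Result;
  for (std::size_t I = 0; I < Body.size(); ++I) {
    for (const Operand &MO : Body[I].Ops) {
      if (MO.IsReg && MO.Reg == Reg && !MO.IsDef) {
        Result.push_back(I);
        break; // step by instruction, not by operand
      }
    }
  }
  return Result;
}

// New shape: visit every operand of Reg (uses and defs), keep the ones that
// read it, and record (instruction index, operand number) directly.
std::vector<std::pair<std::size_t, std::size_t>>
readingOperands(const std::vector<Instr> &Body, unsigned Reg) {
  std::vector<std::pair<std::size_t, std::size_t>> Result;
  for (std::size_t I = 0; I < Body.size(); ++I) {
    for (std::size_t OpNo = 0; OpNo < Body[I].Ops.size(); ++OpNo) {
      const Operand &MO = Body[I].Ops[OpNo];
      if (MO.IsReg && MO.Reg == Reg && readsReg(MO))
        Result.emplace_back(I, OpNo);
    }
  }
  return Result;
}
```

Modeling the test body as `%3 = COPY %10`, `%3.sub0_sub1 = MFMA %1, %2, 0` (immediate src2) and `STORE %0, %3`, the use-only walk over `%3` reaches just the store, while the operand walk also reports operand 0 of the MFMA, i.e. the subregister def that is the only place the MFMA touches the copy result. Carrying the operand itself also removes the need for a separate `findRegisterUseOperand` lookup, which could not return a def operand anyway.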
perlfu
left a comment
LGTM
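Previously, only the inverse situation was handled.
Handle a special case for copies from AGPR to VGPR on the MFMA inputs. If the "input" is really a subregister def, we will not see the usual copy to VGPR for src2, only the read of the subregister def. A subregister def without the undef flag implicitly reads the rest of the register, so the loop now walks all non-debug operands of the copy result and keeps those for which readsReg() is true, instead of visiting only use operands. It is not clear whether this pattern appears in practice.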
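Why a subregister def counts as a read can be made concrete with lane masks. This is a conceptual sketch only: the masks below (one bit per 32-bit lane of a 128-bit register) and the helpers `preservedLanes` and `partialDefReadsReg` are hypothetical. LLVM's `readsReg()` does not compute lanes at all; it simply treats any subregister def without the `undef` flag as a read.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical lane masks: one bit per 32-bit lane of a 128-bit register.
// Real LLVM lane masks are target-defined; these only illustrate the idea.
constexpr std::uint32_t kReg128Lanes = 0b1111;   // lanes 0-3
constexpr std::uint32_t kSub0Sub1Lanes = 0b0011; // lanes 0-1

// Lanes of the full register that a def covering DefMask leaves untouched.
constexpr std::uint32_t preservedLanes(std::uint32_t FullMask,
                                       std::uint32_t DefMask) {
  return FullMask & ~DefMask;
}

// A partial def keeps the old value in the preserved lanes, so it depends on
// (reads) the previous definition, unless it is marked undef, which declares
// the old contents irrelevant. A full def preserves nothing and reads nothing.
constexpr bool partialDefReadsReg(std::uint32_t FullMask,
                                  std::uint32_t DefMask, bool IsUndef) {
  return !IsUndef && preservedLanes(FullMask, DefMask) != 0;
}

// %3.sub0_sub1 = MFMA ... keeps lanes 2-3 from the earlier %3 = COPY.
static_assert(preservedLanes(kReg128Lanes, kSub0Sub1Lanes) == 0b1100,
              "sub2_sub3 lanes survive the sub0_sub1 def");
static_assert(partialDefReadsReg(kReg128Lanes, kSub0Sub1Lanes, false),
              "a non-undef subregister def reads the register");
```

For the `sub0_sub1` def in the test, the upper two lanes still hold the value produced by the `COPY`, which is why the MFMA is a reader of that copy even though its src2 is an immediate.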
