-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[RISCV] Check only demanded VTYPE fields in needVSETVLIPHI #90168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In needVSETVLIPHI we know that the VLs will be the same if the AVL is the output VL of another vsetvli and they have the same VLMAX, so we just check that the VTYPEs are the same. But we don't need to check all the fields if they're not demanded. This allows us to avoid a vsetvli in some cases where e.g. the VLMAX ratio is the same . (We still need the VLMAXes to be the same to make sure the VLs are the same, but we could potentially relax this to allow a smaller VLMAX where there's no risk of truncation)
|
@llvm/pr-subscribers-backend-risc-v Author: Luke Lau (lukel97) ChangesIn needVSETVLIPHI we know that the VLs will be the same if the AVL is the output VL of another vsetvli and they have the same VLMAX, so we just check that the VTYPEs are the same. But we don't need to check all the fields if they're not demanded. This allows us to avoid a vsetvli in some cases where e.g. the VLMAX ratio is the same . (We still need the VLMAXes to be the same to make sure the VLs are the same, but we could potentially relax this to allow a smaller VLMAX where there's no risk of truncation) Full diff: https://github.com/llvm/llvm-project/pull/90168.diff 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index c40b9031543fe2..60210b1d6b92bc 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -808,8 +808,8 @@ class RISCVInsertVSETVLI : public MachineFunctionPass {
private:
bool needVSETVLI(const MachineInstr &MI, const VSETVLIInfo &Require,
const VSETVLIInfo &CurInfo) const;
- bool needVSETVLIPHI(const VSETVLIInfo &Require,
- const MachineBasicBlock &MBB) const;
+ bool needVSETVLIPHI(const VSETVLIInfo &Require, const MachineBasicBlock &MBB,
+ const DemandedFields &Used) const;
void insertVSETVLI(MachineBasicBlock &MBB, MachineInstr &MI,
const VSETVLIInfo &Info, const VSETVLIInfo &PrevInfo);
void insertVSETVLI(MachineBasicBlock &MBB,
@@ -1318,7 +1318,8 @@ void RISCVInsertVSETVLI::computeIncomingVLVTYPE(const MachineBasicBlock &MBB) {
// be unneeded if the AVL is a phi node where all incoming values are VL
// outputs from the last VSETVLI in their respective basic blocks.
bool RISCVInsertVSETVLI::needVSETVLIPHI(const VSETVLIInfo &Require,
- const MachineBasicBlock &MBB) const {
+ const MachineBasicBlock &MBB,
+ const DemandedFields &Used) const {
if (DisableInsertVSETVLPHIOpt)
return true;
@@ -1350,10 +1351,14 @@ bool RISCVInsertVSETVLI::needVSETVLIPHI(const VSETVLIInfo &Require,
if (DefInfo != PBBExit)
return true;
- // Require has the same VL as PBBExit, so if the exit from the
- // predecessor has the VTYPE we are looking for we might be able
- // to avoid a VSETVLI.
- if (PBBExit.isUnknown() || !PBBExit.hasSameVTYPE(Require))
+ // Make sure Require has the same VLMAX as the predecessor block,
+ // otherwise the VL might be different.
+ if (PBBExit.isUnknown() || !PBBExit.hasSameVLMAX(Require))
+ return true;
+
+ // If the exit from the predecessor has the VTYPE we are looking
+ // for we might be able to avoid a VSETVLI.
+ if (!PBBExit.hasCompatibleVTYPE(Used, Require))
return true;
}
@@ -1392,7 +1397,8 @@ void RISCVInsertVSETVLI::emitVSETVLIs(MachineBasicBlock &MBB) {
// wouldn't be used and VL/VTYPE registers are correct. Note that
// we *do* need to model the state as if it changed as while the
// register contents are unchanged, the abstract model can change.
- if (!PrefixTransparent || needVSETVLIPHI(CurInfo, MBB))
+ if (!PrefixTransparent ||
+ needVSETVLIPHI(CurInfo, MBB, getDemanded(MI, MRI, ST)))
insertVSETVLI(MBB, MI, CurInfo, PrevInfo);
PrefixTransparent = false;
}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
index 4ff2fc7a5fff5d..f113f746dec297 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
@@ -494,7 +494,6 @@ define void @saxpy_vec_demanded_fields(i64 %n, float %a, ptr nocapture readonly
; CHECK-NEXT: beqz a3, .LBB9_2
; CHECK-NEXT: .LBB9_1: # %for.body
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT: vsetvli zero, a3, e32, m8, ta, ma
; CHECK-NEXT: vle32.v v8, (a1)
; CHECK-NEXT: vle32.v v16, (a2)
; CHECK-NEXT: slli a4, a3, 2
|
preames
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsound because the CurInfo state during rewriting will differ from the CurInfo computed during the forward data flow. The first instruction may not care, but later ones in the block (or following blocks!) may depend on the other fields.
There is a correct variant of this, but you end of with different states flowing through the block during emission, and the need to compensate. Not sure if it's worth the complexity?
Woops, agreeed I don't think its worth the complexity. The original motivation for this was to split out a change from #89089 needed to avoid a regression, but since RISCVInsertVSETVLI has changed plenty in the meantime I'll check if this is still actually needed first. |
|
Closing as I think there should be a better way to avoid the saxpy_vec regression in #89089. I'm not sure what it is yet though |
In needVSETVLIPHI we know that the VLs will be the same if the AVL is the output VL of another vsetvli and they have the same VLMAX, so we just check that the VTYPEs are the same. But we don't need to check all the fields if they're not demanded.
This allows us to avoid a vsetvli in some cases where e.g. the VLMAX ratio is the same .
(We still need the VLMAXes to be the same to make sure the VLs are the same, but we could potentially relax this to allow a smaller VLMAX where there's no risk of truncation)