Skip to content

Conversation

@andcarminati
Copy link
Collaborator

@andcarminati andcarminati commented Oct 21, 2025

This work is intended to avoid 2D/3D (when possible) register spills.

The idea and rationale behind this work is in a previous Draft PR: #442.

To review, I recommend to follow this PR commit by commit.

Credits also for the co-author @krishnamtibrewala.

@andcarminati
Copy link
Collaborator Author

andcarminati commented Oct 21, 2025

QoR results:

Core_Insn_Count Core_StackSize_absolute Core_PMSize_absolute

@krishnamtibrewala
Copy link
Collaborator

Thanks you @andcarminati, very much !!

@krishnamtibrewala
Copy link
Collaborator

Also do you think following commit will help ?
a681b6e

@andcarminati
Copy link
Collaborator Author

Also do you think following commit will help ? a681b6e

Maybe yes! As mentioned before, I prefer to keep just the minimal necessary changes. We can test after, on top of this PR.

@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from 31b7e71 to a27561f Compare October 22, 2025 08:35
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from a27561f to e124649 Compare October 31, 2025 14:07
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch 3 times, most recently from 80e6f7c to 4c47705 Compare October 31, 2025 14:36
const AIEBaseRegisterInfo &TRI,
std::set<Register> &VisitedVRegs);

SmallSet<int, 8> getRewritableSubRegs(Register Reg,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we have a comment what this function does?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a refactor but it is always good to document.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we swap the documentation here?
I think the high level function should get the documentation, the actual implementation function (the one above this) should get the slimmed documentation.

}
Register SrcReg = RegOp.getParent()->getOperand(1).getReg();
if (!VisitedVRegs.count(SrcReg) &&
getRewritableSubRegs(SrcReg, MRI, TRI, VisitedVRegs).empty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it ever happen, that SrcReg has no SubRegs, but DstReg has them or vis versa?
Can they also have different SubRegs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part was a refactor, but as we are handling a full copy here, we can expect subregs on both sides.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a full copy, I mean a 2d to 2d copy or a 3d to 3d copy.

@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch 2 times, most recently from a88a541 to ab8c08d Compare November 7, 2025 13:02
@andcarminati
Copy link
Collaborator Author

There is no failures across all benchmarks.

@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from eeedaa7 to 925f9d9 Compare November 13, 2025 11:07

MachineInstr *PartCopy = BuildMI(*MI.getParent(), MI, MI.getDebugLoc(),
TII.get(TargetOpcode::COPY))
.addReg(DstReg, RegState::Define, SubRegIdx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we do not need undefs here because we fully define the subregisters that we copy to?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we start the definition on the first copy, and we consider that next copies will read the previous one. In practice, inserting undefs in all copies will not hurt in general, but as is, we have a more accurate model for incremental definitions.


// Replace the original copy by the first one, so we automatically repair
// DstReg's LI.
LIS.ReplaceMachineInstrInMaps(MI, *FirstMI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why aren't we inserting all the newly created machineinstructions into LIS?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because, later on, each new lane copy will make dead the previous one if we try to fix the LI. In this case, we say that the LR will start on the first copy and then it will simply continue though the next ones, until the next last use. I this way we prevent false dead flags everywhere.

}

/// Rewrite a full copy into multiple copies using the subregs in \p CopySubRegs
void rewriteFullCopy(MachineInstr &MI, const std::set<int> &CopySubRegs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MI is a copy, right? Could you rename MI to CopyMI, so it becomes clearer?

for (MachineInstr &MI : make_early_inc_range(MRI.reg_instructions(Reg))) {
if (MI.isFullCopy())
AIESuperRegUtils::rewriteFullCopy(
MI, TRI.getSubRegSplit(MRI.getRegClass(Reg)->getID()), LIS, *TII, TRI,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are also passing TRI, I think you don't need to pass CopySubRegs here

You can get MRI like this:
MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();

VisitedVRegs.insert(Reg);
SmallSet<int, 8> UsedSubRegs;
for (MachineOperand &RegOp : MRI.reg_operands(Reg)) {
int SubReg = RegOp.getSubReg();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const

SmallSet<int, 8> UsedSubRegs;
for (MachineOperand &RegOp : MRI.reg_operands(Reg)) {
int SubReg = RegOp.getSubReg();
if (SubReg && SubRegSplit.count(SubReg)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are a subreg, shouldn't we automatically be in the SubRegSplit?
I think we can assert it instead of checking it in the condition.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will keep as is, this code was just refactored.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding an assert crashes here because we cannot properly handle low and high parts of the 3d Registers, so a whole d0 or d4 register in the case of 3d_0.
They do not get returned from the TRI.getSubRegSplit()
Could you add a comment ?
Also do we know, why we did not add it to TRI.getSubRegSplit()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not full copies but partial copies, that we currently don't handle

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are leaving something on the table here. not sure if it is needed for this PR though

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also do we know, why we did not add it to TRI.getSubRegSplit()?
We always rewrite into the smallest lanes as possible.

Here we are concerned with the things that we can rewrite after. Any update here should be aligned with the rewrite after, and it is not clear the benefits afterwards. I really don't want to complicate this PR even more. Any improvement could be done with an established baseline as follow up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included a TODO comment related to this investigation.


VisitedVRegs.insert(Reg);
SmallSet<int, 8> UsedSubRegs;
for (MachineOperand &RegOp : MRI.reg_operands(Reg)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that we are walking all defs and uses to find subreg uses and full copies.

int SubReg = RegOp.getSubReg();
if (SubReg && SubRegSplit.count(SubReg)) {
UsedSubRegs.insert(SubReg);
} else if (RegOp.getParent()->isFullCopy()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we performing this check on uses?
Isn't this a check we can perform once on the DefI?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a refactor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we don't handle copies uniform in this method?
we also encounter copies in this case here if (SubReg && SubRegSplit.count(SubReg))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we should be able to full expand the copy after. At this moment we don't handle cases like %7.sub_hi_dim:eds = COPY %6:ed.

}
UsedSubRegs.insert(SubRegSplit.begin(), SubRegSplit.end());
} else {
LLVM_DEBUG(dbgs() << " Cannot rewrite " << RegOp << " in "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a regular use/Def of the full register?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a refactor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will catch things like %7.sub_hi_dim:eds = COPY %6:ed, not handled after.

@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch 3 times, most recently from 9e25710 to 8dba52e Compare November 19, 2025 10:11
}

/// Rewrite a full copy into multiple copies using the subregs in \p CopySubRegs
void rewriteFullCopy(MachineInstr &CopyMI, LiveIntervals &LIS,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: in the current state we will never rewrite subreg copies, e.g. copies of d0 or d4 in the case of a 3d_0 register.
Do you know why they are not relevant for the Superregrewriter?

Copy link
Collaborator Author

@andcarminati andcarminati Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this will never occur because it will be prevented by. getRewritableSubRegs.

}

// Rewrite full copies into multiple copies using subregs
for (MachineInstr &MI : make_early_inc_range(MRI.reg_instructions(Reg))) {
Copy link
Collaborator

@F-Stuckmann F-Stuckmann Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we rewrite Full copies again when we AssignedPhysReg is not set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that they may occur in case we have a VReg0_unallocated = COPY VReg1_unallocated. Considering Conv2D_bft16, we start with 5 3D virtual registers and end up creating 32 ones, so I this may be possible and I would like to keep this this possibility.

rewriteFullCopy(MI, LIS, *TII, TRI, VRM, LRM);
}

LLVM_DEBUG(dbgs() << " Splitting range " << LIS.getInterval(Reg) << "\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we split ranges again when we AssignedPhysReg is not set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in this part of the function we change the proper registers by others with SubRegisterClass (allocated) and LargestLegalSuperClass (unallocated). Also, we are not doing this again, each pass covesr one subset: allocated or unallocated.

++VRegIdx) {
const Register Reg = Register::index2VirtReg(VRegIdx);

// Ignore un-used od already allocated registers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: or

//
// (c) Copyright 2025 Advanced Micro Devices, Inc. or its affiliates
//
//===----------------------------------------------------------------------===//
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What registers aren't assigned after the previous 2d/3d reg allocs?
Are we changing here the copies and moves not the padds?
Could you Comment why this pass is necessary and the previous passes do not pick this up?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will complement this part.

andcarminati and others added 9 commits November 21, 2025 03:32
Now we filter by register class and usage. Basically, we exclude here
instructions like copies and non-2D/3D ones.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
The goal of this test is to check if we properly insert undef flag on the def side
of a expanded full copy.  On a sub-register def operand, it refers to the part of the
register that isn't written. A sub-register def implicitly reads the other parts of the
register being redefined unless the <undef> flag is set, and a missing flag can
force the related register to be inserted in liveout set of the predecessors block,
causing dominance problems.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
This will handle properly use of non-dominating definitions. We also
change the handling of the destination registers in two parts:

*Copy expansion: we replace the ogininal index by the index of the first
lane copy to avoid the creation LRs with just one instruction, in this
way we keep que LI correct.

*Rewrite: reset dead flags if necessary.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
If we don't need a full register, we can expand to individual lanes.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
This avoids cycles in bundles that appear in VirtRegRewriter.
We also update LIs related to src and dst operands of those
expanded copies.

Co-Authored-By: Krishnam Tibrewala <[email protected]>
@andcarminati andcarminati force-pushed the andreu.extend.2d3d.allocation branch from 8dba52e to b8476f5 Compare November 21, 2025 12:34
@F-Stuckmann
Copy link
Collaborator

LGTM @martien-de-jong what do you say

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants