[X86InstrInfo] support memfold on spillable inline asm #70832
Conversation
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-regalloc

Author: Nick Desaulniers (nickdesaulniers)

Changes

[X86InstrInfo] support memfold on spillable inline asm

This enables -regalloc=greedy to memfold spillable inline asm MachineOperands.

Because no instruction selection framework marks MachineOperands as spillable, no language frontend can observe functional changes from this patch. That will change once instruction selection frameworks are updated.

Patch is 22.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/70832.diff

9 Files Affected:
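For concreteness, the constraint being handled looks like this in source — a minimal sketch using GNU extended asm (the MIR test below uses the equivalent IR-level asm "# $0", "rm,..." form):

// Minimal sketch: "rm" lets the compiler satisfy the operand with either a
// register or a memory location. With this patch, greedy regalloc may fold a
// spill directly into the INLINEASM rather than keeping a register live.
void input(int x) {
  asm volatile("# %0" ::"rm"(x)); // x may arrive in a register or a stack slot
}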
diff --git a/llvm/include/llvm/CodeGen/MachineInstr.h b/llvm/include/llvm/CodeGen/MachineInstr.h
index 4877f43e8578d1c..93e8ff389d65673 100644
--- a/llvm/include/llvm/CodeGen/MachineInstr.h
+++ b/llvm/include/llvm/CodeGen/MachineInstr.h
@@ -1364,6 +1364,9 @@ class MachineInstr
return getOpcode() == TargetOpcode::INLINEASM ||
getOpcode() == TargetOpcode::INLINEASM_BR;
}
+ /// Returns true if the memory operand can be folded. Does so by checking the
+ /// InlineAsm::Flag immediate operand at OpId - 1.
+ bool mayFoldInlineAsmMemOp(unsigned OpId) const;
bool isStackAligningInlineAsm() const;
InlineAsm::AsmDialect getInlineAsmDialect() const;
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index 8e7499ac626a747..ba98f52c27bbd2e 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -2186,6 +2186,16 @@ class TargetInstrInfo : public MCInstrInfo {
// Get the call frame size just before MI.
unsigned getCallFrameSizeAt(MachineInstr &MI) const;
+ /// Fills in the necessary MachineOperands to refer to a frame index.
+ /// The best way to understand this is to print `asm(""::"m"(x));` after
+ /// finalize-isel. Example:
+ /// INLINEASM ... 262190 /* mem:m */, %stack.0.x.addr, 1, $noreg, 0, $noreg ...
+ /// we would add placeholders for: ^ ^ ^ ^
+ virtual void
+ getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops) const {
+ llvm_unreachable("unknown number of operands necessary");
+ }
+
private:
mutable std::unique_ptr<MIRFormatter> Formatter;
unsigned CallFrameSetupOpcode, CallFrameDestroyOpcode;
diff --git a/llvm/include/llvm/IR/InlineAsm.h b/llvm/include/llvm/IR/InlineAsm.h
index 969ad42816a7e52..2d395a53608b0b7 100644
--- a/llvm/include/llvm/IR/InlineAsm.h
+++ b/llvm/include/llvm/IR/InlineAsm.h
@@ -291,18 +291,23 @@ class InlineAsm final : public Value {
// Bits 30-16 - A ConstraintCode:: value indicating the original
// constraint code. (MemConstraintCode)
// Else:
- // Bits 30-16 - The register class ID to use for the operand. (RegClass)
+ // Bits 29-16 - The register class ID to use for the operand. (RegClass)
+ // Bit 30 - If the register is permitted to be spilled.
+ // (RegMayBeSpilled)
+ // Defaults to false "r", may be set for constraints like
+ // "rm" (or "g").
//
- // As such, MatchedOperandNo, MemConstraintCode, and RegClass are views of
- // the same slice of bits, but are mutually exclusive depending on the
- // fields IsMatched then KindField.
+ // As such, MatchedOperandNo, MemConstraintCode, and
+ // (RegClass+RegMayBeSpilled) are views of the same slice of bits, but are
+ // mutually exclusive depending on the fields IsMatched then KindField.
class Flag {
uint32_t Storage;
using KindField = Bitfield::Element<Kind, 0, 3, Kind::Func>;
using NumOperands = Bitfield::Element<unsigned, 3, 13>;
using MatchedOperandNo = Bitfield::Element<unsigned, 16, 15>;
using MemConstraintCode = Bitfield::Element<ConstraintCode, 16, 15, ConstraintCode::Max>;
- using RegClass = Bitfield::Element<unsigned, 16, 15>;
+ using RegClass = Bitfield::Element<unsigned, 16, 14>;
+ using RegMayBeSpilled = Bitfield::Element<bool, 30, 1>;
using IsMatched = Bitfield::Element<bool, 31, 1>;
@@ -413,6 +418,26 @@ class InlineAsm final : public Value {
"Flag is not a memory or function constraint!");
Bitfield::set<MemConstraintCode>(Storage, ConstraintCode::Unknown);
}
+
+ /// Set a bit to denote that while this operand is some kind of register
+ /// (use, def, ...), a memory flag did appear in the original constraint
+ /// list. This is set by the instruction selection framework, and consumed
+ /// by the register allocator. While the register allocator is generally
+ /// responsible for spilling registers, we need to be able to distinguish
+ /// between registers that the register allocator has permission to spill
+ /// ("rm") vs ones it does not ("r"). This is because the inline asm may use
+ /// instructions which don't support memory addressing modes for that
+ /// operand.
+ void setRegMayBeSpilled(bool B) {
+ assert((isRegDefKind() || isRegDefEarlyClobberKind() || isRegUseKind()) &&
+ "Must be reg");
+ Bitfield::set<RegMayBeSpilled>(Storage, B);
+ }
+ bool getRegMayBeSpilled() const {
+ assert((isRegDefKind() || isRegDefEarlyClobberKind() || isRegUseKind()) &&
+ "Must be reg");
+ return Bitfield::get<RegMayBeSpilled>(Storage);
+ }
};
static std::vector<StringRef> getExtraInfoNames(unsigned ExtraInfo) {
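To make the bit layout above concrete, here is a standalone sketch (plain C++, not the LLVM API) that decodes the flag word 1076101129, which appears in the MIR test below and prints as "reguse:GR32 spillable". The enumerator value (Kind::RegUse == 1) and the GR32 class ID (36) are as observed in that test and may differ across builds:

#include <cstdint>
#include <cstdio>

int main() {
  const uint32_t Flag = 1076101129;          // 0x40240009, from the MIR test
  unsigned Kind = Flag & 0x7;                // 1 -> Kind::RegUse
  unsigned NumOps = (Flag >> 3) & 0x1FFF;    // 1 register operand follows
  unsigned RegClass = (Flag >> 16) & 0x3FFF; // 36 -> GR32 in this build
  bool Spillable = (Flag >> 30) & 0x1;       // true -> printed as "spillable"
  bool Matched = (Flag >> 31) & 0x1;         // false -> not a matched operand
  std::printf("kind=%u nops=%u rc=%u spillable=%d matched=%d\n", Kind, NumOps,
              RegClass, Spillable, Matched);
}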
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index 6e98e2384ef975f..f446e11427e75d4 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -146,6 +146,17 @@ void VirtRegAuxInfo::calculateSpillWeightAndHint(LiveInterval &LI) {
LI.setWeight(Weight);
}
+static bool canMemFoldInlineAsm(LiveInterval &LI,
+ const MachineRegisterInfo &MRI) {
+ for (const MachineOperand &MO : MRI.reg_operands(LI.reg())) {
+ const MachineInstr *MI = MO.getParent();
+ if (MI->isInlineAsm() && MI->mayFoldInlineAsmMemOp(MI->getOperandNo(&MO)))
+ return true;
+ }
+
+ return false;
+}
+
float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
SlotIndex *End) {
MachineRegisterInfo &MRI = MF.getRegInfo();
@@ -315,7 +326,7 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
// into instruction itself makes perfect sense.
if (ShouldUpdateLI && LI.isZeroLength(LIS.getSlotIndexes()) &&
!LI.isLiveAtIndexes(LIS.getRegMaskSlots()) &&
- !isLiveAtStatepointVarArg(LI)) {
+ !isLiveAtStatepointVarArg(LI) && !canMemFoldInlineAsm(LI, MRI)) {
LI.markNotSpillable();
return -1.0;
}
diff --git a/llvm/lib/CodeGen/MachineInstr.cpp b/llvm/lib/CodeGen/MachineInstr.cpp
index 048563cc2bcc4e4..92c789e85a205b4 100644
--- a/llvm/lib/CodeGen/MachineInstr.cpp
+++ b/llvm/lib/CodeGen/MachineInstr.cpp
@@ -1792,6 +1792,12 @@ void MachineInstr::print(raw_ostream &OS, ModuleSlotTracker &MST,
if (F.isUseOperandTiedToDef(TiedTo))
OS << " tiedto:$" << TiedTo;
+ if ((F.isRegDefKind() || F.isRegDefEarlyClobberKind() ||
+ F.isRegUseKind()) &&
+ F.getRegMayBeSpilled()) {
+ OS << " spillable";
+ }
+
OS << ']';
// Compute the index of the next operand descriptor.
@@ -2526,3 +2532,20 @@ void MachineInstr::insert(mop_iterator InsertBefore,
tieOperands(Tie1, Tie2);
}
}
+
+bool MachineInstr::mayFoldInlineAsmMemOp(unsigned OpId) const {
+ assert(OpId && "expected non-zero operand id");
+ assert(isInlineAsm() && "should only be used on inline asm");
+
+ if (!getOperand(OpId).isReg())
+ return false;
+
+ const MachineOperand &MD = getOperand(OpId - 1);
+ if (!MD.isImm())
+ return false;
+
+ InlineAsm::Flag F(MD.getImm());
+ if (F.isRegUseKind() || F.isRegDefKind() || F.isRegDefEarlyClobberKind())
+ return F.getRegMayBeSpilled();
+ return false;
+}
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index fe7efb73a2dce83..5252eeaadb2aeb2 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -565,6 +565,64 @@ static MachineInstr *foldPatchpoint(MachineFunction &MF, MachineInstr &MI,
return NewMI;
}
+static void foldInlineAsmMemOperand(MachineInstr *MI, unsigned OpNo, int FI,
+ const TargetInstrInfo &TII) {
+ MachineOperand &MO = MI->getOperand(OpNo);
+ const VirtRegInfo &RI = AnalyzeVirtRegInBundle(*MI, MO.getReg());
+
+ // If the machine operand is tied, untie it first.
+ if (MO.isTied()) {
+ unsigned TiedTo = MI->findTiedOperandIdx(OpNo);
+ MI->untieRegOperand(OpNo);
+ // Intentional recursion!
+ foldInlineAsmMemOperand(MI, TiedTo, FI, TII);
+ }
+
+ // Change the operand from a register to a frame index.
+ MO.ChangeToFrameIndex(FI, MO.getTargetFlags());
+
+ SmallVector<MachineOperand, 4> NewOps;
+ TII.getFrameIndexOperands(NewOps);
+ assert(!NewOps.empty() && "getFrameIndexOperands didn't create any operands");
+ MI->insert(MI->operands_begin() + OpNo + 1, NewOps);
+
+ // Change the previous operand to a MemKind InlineAsm::Flag. The second param
+ // is the per-target number of operands that represent the memory operand
+ // excluding this one (MD). This includes MO.
+ InlineAsm::Flag F(InlineAsm::Kind::Mem, NewOps.size() + 1);
+ F.setMemConstraint(InlineAsm::ConstraintCode::m);
+ MachineOperand &MD = MI->getOperand(OpNo - 1);
+ MD.setImm(F);
+
+ // Update mayload/maystore metadata.
+ MachineOperand &ExtraMO = MI->getOperand(InlineAsm::MIOp_ExtraInfo);
+ if (RI.Reads)
+ ExtraMO.setImm(ExtraMO.getImm() | InlineAsm::Extra_MayLoad);
+ if (RI.Writes)
+ ExtraMO.setImm(ExtraMO.getImm() | InlineAsm::Extra_MayStore);
+}
+
+// Returns nullptr if not possible to fold.
+static MachineInstr *foldInlineAsmMemOperand(MachineInstr &MI,
+ ArrayRef<unsigned> Ops, int FI,
+ const TargetInstrInfo &TII) {
+ assert(MI.isInlineAsm() && "wrong opcode");
+ if (Ops.size() > 1)
+ return nullptr;
+ unsigned Op = Ops[0];
+ assert(Op && "should never be first operand");
+ assert(MI.getOperand(Op).isReg() && "shouldn't be folding non-reg operands");
+
+ if (!MI.mayFoldInlineAsmMemOp(Op))
+ return nullptr;
+
+ MachineInstr &NewMI = TII.duplicate(*MI.getParent(), MI.getIterator(), MI);
+
+ foldInlineAsmMemOperand(&NewMI, Op, FI, TII);
+
+ return &NewMI;
+}
+
MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
ArrayRef<unsigned> Ops, int FI,
LiveIntervals *LIS,
@@ -612,6 +670,8 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
NewMI = foldPatchpoint(MF, MI, Ops, FI, *this);
if (NewMI)
MBB->insert(MI, NewMI);
+ } else if (MI.isInlineAsm()) {
+ NewMI = foldInlineAsmMemOperand(MI, Ops, FI, *this);
} else {
// Ask the target to do the actual folding.
NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, FI, LIS, VRM);
@@ -683,6 +743,8 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
NewMI = foldPatchpoint(MF, MI, Ops, FrameIndex, *this);
if (NewMI)
NewMI = &*MBB.insert(MI, NewMI);
+ } else if (MI.isInlineAsm() && isLoadFromStackSlot(LoadMI, FrameIndex)) {
+ NewMI = foldInlineAsmMemOperand(MI, Ops, FrameIndex, *this);
} else {
// Ask the target to do the actual folding.
NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, LoadMI, LIS);
@@ -1639,6 +1701,10 @@ std::string TargetInstrInfo::createMIROperandComment(
if (F.isUseOperandTiedToDef(TiedTo))
OS << " tiedto:$" << TiedTo;
+ if ((F.isRegDefKind() || F.isRegDefEarlyClobberKind() || F.isRegUseKind()) &&
+ F.getRegMayBeSpilled())
+ OS << " spillable";
+
return OS.str();
}
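The fold above replaces the register-kind flag with a memory-kind one built by InlineAsm::Flag(Kind::Mem, NewOps.size() + 1). Decoding the resulting word 262190 (printed as "mem:m" in the test CHECK lines) the same way — a sketch, with the constraint-code value as observed in this tree:

#include <cassert>
#include <cstdint>

int main() {
  const uint32_t MemFlag = 262190;         // 0x4002E, from the CHECK lines
  assert((MemFlag & 0x7) == 6);            // Kind::Mem
  assert(((MemFlag >> 3) & 0x1FFF) == 5);  // FI, Scale, Index, Disp, Segment
  assert(((MemFlag >> 16) & 0x7FFF) == 4); // prints as ConstraintCode "m" here
}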
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 4c6854da0ada3d2..2e17cd8ac88ff4a 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -10333,5 +10333,15 @@ void X86InstrInfo::genAlternativeCodeSequence(
}
}
+// See also: X86DAGToDAGISel::SelectInlineAsmMemoryOperand().
+void X86InstrInfo::getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops) const {
+ Ops.append({
+ MachineOperand::CreateImm(1), // Scale
+ MachineOperand::CreateReg(0, false), // Index
+ MachineOperand::CreateImm(0), // Disp
+ MachineOperand::CreateReg(0, false), // Segment
+ });
+}
+
#define GET_INSTRINFO_HELPERS
#include "X86GenInstrInfo.inc"
diff --git a/llvm/lib/Target/X86/X86InstrInfo.h b/llvm/lib/Target/X86/X86InstrInfo.h
index e1199e20c318e24..b08bc66d5b72c19 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.h
+++ b/llvm/lib/Target/X86/X86InstrInfo.h
@@ -648,6 +648,8 @@ class X86InstrInfo final : public X86GenInstrInfo {
return false;
}
+ void getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops) const override;
+
private:
/// This is a helper for convertToThreeAddress for 8 and 16-bit instructions.
/// We use 32-bit LEA to form 3-address code by promoting to a 32-bit
diff --git a/llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir b/llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir
new file mode 100644
index 000000000000000..97f0b561bba1c49
--- /dev/null
+++ b/llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir
@@ -0,0 +1,232 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 3
+# RUN: llc -start-after=finalize-isel -regalloc=greedy -stop-after=greedy \
+# RUN: -verify-machineinstrs -verify-regalloc %s -o - | FileCheck %s
+--- |
+ target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i128:128-f64:32:64-f80:32-n8:16:32-S128"
+ target triple = "i386-unknown-linux-gnu"
+
+ define void @input(i32 %0) #0 {
+ call void asm "# $0", "rm,~{ax},~{cx},~{dx},~{si},~{di},~{bx},~{bp}"(i32 %0)
+ ret void
+ }
+
+ define i32 @output() #0 {
+ %1 = alloca i32, align 4
+ call void asm "# $0", "=*rm,~{ax},~{cx},~{dx},~{si},~{di},~{bx},~{bp}"(ptr nonnull elementtype(i32) %1)
+ %2 = load i32, ptr %1, align 4
+ ret i32 %2
+ }
+
+ define i32 @inout(i32 %0) #0 {
+ %2 = alloca i32, align 4
+ store i32 %0, ptr %2, align 4
+ call void asm "# $0 $1", "=*rm,0,~{ax},~{cx},~{dx},~{si},~{di},~{bx},~{bp}"(ptr nonnull elementtype(i32) %2, i32 %0)
+ %3 = load i32, ptr %2, align 4
+ ret i32 %3
+ }
+
+ attributes #0 = { nounwind }
+
+...
+---
+name: input
+alignment: 16
+exposesReturnsTwice: false
+legalized: false
+regBankSelected: false
+selected: false
+failedISel: false
+tracksRegLiveness: true
+hasWinCFI: false
+callsEHReturn: false
+callsUnwindInit: false
+hasEHCatchret: false
+hasEHScopes: false
+hasEHFunclets: false
+isOutlined: false
+debugInstrRef: false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+ - { id: 0, class: gr32, preferred-register: '' }
+liveins: []
+frameInfo:
+ isFrameAddressTaken: false
+ isReturnAddressTaken: false
+ hasStackMap: false
+ hasPatchPoint: false
+ stackSize: 0
+ offsetAdjustment: 0
+ maxAlignment: 4
+ adjustsStack: false
+ hasCalls: false
+ stackProtector: ''
+ functionContext: ''
+ maxCallFrameSize: 4294967295
+ cvBytesOfCalleeSavedRegisters: 0
+ hasOpaqueSPAdjustment: false
+ hasVAStart: false
+ hasMustTailInVarArgFunc: false
+ hasTailCall: false
+ localFrameSize: 0
+ savePoint: ''
+ restorePoint: ''
+fixedStack:
+ - { id: 0, type: default, offset: 0, size: 4, alignment: 16, stack-id: default,
+ isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
+ debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
+stack: []
+entry_values: []
+callSites: []
+debugValueSubstitutions: []
+constants: []
+machineFunctionInfo: {}
+body: |
+ bb.0 (%ir-block.1):
+ ; CHECK-LABEL: name: input
+ ; CHECK: INLINEASM &"# $0", 8 /* mayload attdialect */, 262190 /* mem:m */, %fixed-stack.0, 1, $noreg, 0, $noreg, 12 /* clobber */, implicit-def dead early-clobber $ax, 12 /* clobber */, implicit-def dead early-clobber $cx, 12 /* clobber */, implicit-def dead early-clobber $dx, 12 /* clobber */, implicit-def dead early-clobber $si, 12 /* clobber */, implicit-def dead early-clobber $di, 12 /* clobber */, implicit-def dead early-clobber $bx, 12 /* clobber */, implicit-def dead early-clobber $bp :: (load (s32) from %fixed-stack.0, align 16)
+ ; CHECK-NEXT: RET 0
+ %0:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 16)
+ INLINEASM &"# $0", 0 /* attdialect */, 1076101129 /* reguse:GR32 spillable */, %0, 12 /* clobber */, implicit-def early-clobber $ax, 12 /* clobber */, implicit-def early-clobber $cx, 12 /* clobber */, implicit-def early-clobber $dx, 12 /* clobber */, implicit-def early-clobber $si, 12 /* clobber */, implicit-def early-clobber $di, 12 /* clobber */, implicit-def early-clobber $bx, 12 /* clobber */, implicit-def early-clobber $bp
+ RET 0
+
+...
+---
+name: output
+alignment: 16
+exposesReturnsTwice: false
+legalized: false
+regBankSelected: false
+selected: false
+failedISel: false
+tracksRegLiveness: true
+hasWinCFI: false
+callsEHReturn: false
+callsUnwindInit: false
+hasEHCatchret: false
+hasEHScopes: false
+hasEHFunclets: false
+isOutlined: false
+debugInstrRef: false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+ - { id: 0, class: gr32, preferred-register: '' }
+liveins: []
+frameInfo:
+ isFrameAddressTaken: false
+ isReturnAddressTaken: false
+ hasStackMap: false
+ hasPatchPoint: false
+ stackSize: 0
+ offsetAdjustment: 0
+ maxAlignment: 4
+ adjustsStack: false
+ hasCalls: false
+ stackProtector: ''
+ functionContext: ''
+ maxCallFrameSize: 4294967295
+ cvBytesOfCalleeSavedRegisters: 0
+ hasOpaqueSPAdjustment: false
+ hasVAStart: false
+ hasMustTailInVarArgFunc: false
+ hasTailCall: false
+ localFrameSize: 0
+ savePoint: ''
+ restorePoint: ''
+fixedStack: []
+stack:
+ - { id: 0, name: '', type: default, offset: 0, size: 4, alignment: 4,
+ stack-id: default, callee-saved-register: '', callee-saved-restored: true,
+ debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
+entry_values: []
+callSites: []
+debugValueSubstitutions: []
+constants: []
+machineFunctionInfo: {}
+body: |
+ bb.0 (%ir-block.0):
+ ; CHECK-LABEL: name: output
+ ; CHECK: INLINEASM &"# $0", 16 /* maystore attdialect */, 262190 /* mem:m */, %stack.1, 1, $noreg, 0, $noreg, 12 /* clobber */, implicit-def dead early-clobber $ax, 12 /* clobber */, implicit-def dead early-clobber $cx, 12 /* clobber */, implicit-def dead early-clobber $dx, 12 /* clobber */, implicit-def dead early-clobber $si, 12 /* clobber */, implicit-def dead early-clobber $di, 12 /* clobber */, implicit-def dead early-clobber $bx, 12 /* clobber */, implicit-def dead early-clobber $bp :: (store (s32) into %stack.1)
+ ; CHECK-NEXT: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %stack.1)
+ ; CHECK-NEXT: MOV32mr %stack.0, 1, $noreg, 0, $noreg, [[MOV32rm]] :: (store (s32) into %ir.1)
+ ; CHECK-NEXT: $eax = COPY [[MOV32rm]]
+ ; CHECK-NEXT: RET 0, $eax
+ INLINEASM &"# $0", 0 /* attdialect */, 1076101130 /* regdef:GR32 spillable */, def %0, 12 /* clobber */, implicit-def early-clobber $ax, 12 /* clobber */, implicit-def early-clobber $cx, 12 /* clobber */, implicit-def early-clobber $dx, 12 /* clobber */, implicit-def early-clobber $si, 12 /* clobber */, implicit-def early-clobber $di, 12 /* clobber */, implicit-def early-clobber $bx, 12 /* clobber */, implicit-def early-clobber $bp
+ MOV32mr %stack.0, 1, $noreg, 0, $noreg, %0 :: (store (s32) into %ir.1)
+ $eax = COPY %0
+ RET 0, $eax
+
+...
+---
+name: inout
+alignment: 16
+exposesReturnsTwice: false
+legalized: false
+regBankSelected: false
+selected: false
+failedISel: false
+tracksRegLiveness: true
+hasWinCFI: false
+callsEHReturn: false
+callsUnwindInit: false
+hasEHCatchret: false
+hasEHScopes: false
+hasEHFunclets: false
+isOutlined: false
+debugInstrRef: false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+ - { id: 0, class: gr32, preferred-register: '' }
+ - { id: 1, class: gr32, preferred-register: '' }
+liveins: []
+frameInfo:
+ isFrameAddressTaken: false
+ isReturnAddressTaken: false
+ hasStackMap: false
+ hasPatchPoint: false
+ stackSize: 0
+ offsetAdjustment: 0
+ ma...
[truncated]
Note to reviewers: this is a stacked commit which depends on:
✅ With the latest revision this PR passed the C/C++ code formatter. |
llvm/lib/Target/X86/X86InstrInfo.cpp
I was wondering if this could/should be expressed with X86AddressMode.
X86AddressMode M; M.getFullAddress(Ops) seems to be very close to what this is doing. Except that this API here does not explicitly add a "base" (reg or frame index), but maybe it should start doing that? (So in foldInlineAsmMemOperand just remove the register operand (instead of using ChangeToFrameIndex) and have this function add the frame index operand as well. That would also help if a target has a different order of operands...)
oh! yeah, it does look potentially reusable.
Though the class seems very x86-specific. Would be nice if there were a target-independent interface I could call from foldInlineAsmMemOperand (added in https://github.com/llvm/llvm-project/pull/70743/files) to get this info.
Let me check the users of X86AddressMode... worst case I could call it and drop the first operand.
and have this function add the frame index operand as well. That would also help if a target has a different order of operands...
I guess if X86AddressMode isn't a per-target interface, then in foldInlineAsmMemOperand I'd have to do some target-specific hack, and then we're back to adding this X86InstrInfo::getFrameIndexOperands interface?
I guess if X86AddressMode isn't a per-target interface, then in foldInlineAsmMemOperand I'd have to do some target specific hack
example of what I roughly have in mind
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 5ede36505b5b..bfbe477f97a3 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -578,13 +578,21 @@ static void foldInlineAsmMemOperand(MachineInstr *MI, unsigned OpNo, int FI,
foldInlineAsmMemOperand(MI, TiedTo, FI, TII);
}
+ SmallVector<MachineOperand, 4> NewOps;
+ MachineOperand *InsertPt;
+ if (isa<X86GenInstrInfo>(TII)) {
+ X86AddressMode M;
+ M.getFullAddress(NewOps);
+ InsertPt = MI->operands_begin() + OpNo;
+ } else {
// Change the operand from a register to a frame index.
- MO.ChangeToFrameIndex(FI, MO.getTargetFlags());
+ MO.ChangeToFrameIndex(FI, MO.getTargetFlags());
+ TII.getFrameIndexOperands(NewOps);
+ InsertPt = MI->operands_begin() + OpNo + 1;
+ }
- SmallVector<MachineOperand, 4> NewOps;
- TII.getFrameIndexOperands(NewOps);
assert(!NewOps.empty() && "getFrameIndexOperands didn't create any operands");
- MI->insert(MI->operands_begin() + OpNo + 1, NewOps);
+ MI->insert(InsertPt, NewOps);
// Change the previous operand to a MemKind InlineAsm::Flag. The second param
// is the per-target number of operands that represent the memory operand
seems kind of ugly, no? Did you have a better usage of X86AddressMode in mind?
Addressing modes are highly target-specific. I was just proposing the use of X86AddressMode for the implementation of X86InstrInfo::getFrameIndexOperands.
Sorry for the excessive delays following up on this; I have been traveling for conference season for the past two weeks. I've just now made the changes you've suggested, and I think I do indeed like it better. Will post a commit on top once my tests finish successfully locally.
done in 06c300b, PTAL
Force-pushed from 6c756cd to 5617e18:
This enables -regalloc=greedy to memfold spillable inline asm MachineOperands. Because no instruction selection framework marks MachineOperands as spillable, no language frontend can observe functional changes from this patch. That will change once instruction selection frameworks are updated.
- change X86InstrInfo::getFrameIndexOperands to use X86AddressMode
Force-pushed from 5617e18 to 06c300b.
; CHECK-NEXT: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %stack.1)
; CHECK-NEXT: MOV32mr %stack.0, 1, $noreg, 0, $noreg, [[MOV32rm]] :: (store (s32) into %ir.1)
Should the target try to eliminate these instructions?
Yes, the load and store are redundant. The INLINEASM could have just stored to %stack.0, eliding the store to %stack.1, the load from %stack.1, and then the store to %stack.0. The code gen when there is memory pressure is sub-optimal because of the redundant store/load.
But it is greedy that determines that a spill is necessary, and creates the slot to spill to (in this case %stack.1). The code I add is just told "use %stack.1."
Could we optimize this case later? Yes, but I don't think that should be part of this commit.
I guess I'm curious why there isn't a later pass that determines that:
(store (s32) into %stack.1)
(load (s32) from %stack.1)
(store (s32) into %ir.1)
can be collapsed into:
(store (s32) into %ir.1)
Is there a pass that does so, but maybe bails when one of the instructions is INLINEASM? Or is it too late to know that nothing else depends on that initial store?
; CHECK-LABEL: name: inout
; CHECK: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 16)
; CHECK-NEXT: MOV32mr %stack.0, 1, $noreg, 0, $noreg, [[MOV32rm]] :: (store (s32) into %stack.0)
; CHECK-NEXT: INLINEASM &"# $0 $1", 24 /* mayload maystore attdialect */, 262190 /* mem:m */, %stack.0, 1, $noreg, 0, $noreg, 262190 /* mem:m */, %stack.0, 1, $noreg, 0, $noreg, 12 /* clobber */, implicit-def dead early-clobber $ax, 12 /* clobber */, implicit-def dead early-clobber $cx, 12 /* clobber */, implicit-def dead early-clobber $dx, 12 /* clobber */, implicit-def dead early-clobber $si, 12 /* clobber */, implicit-def dead early-clobber $di, 12 /* clobber */, implicit-def dead early-clobber $bx, 12 /* clobber */, implicit-def dead early-clobber $bp :: (store (s32) into %stack.0)
I wonder how extra info like (load (s32) from %fixed-stack.0, align 16) etc. is kept in the INLINEASM when we have more than one operand? I saw we only kept (store (s32) into %stack.0) here.
Besides, we don't have a mem-to-mem instruction on X86. Will it be lowered into two instructions when both operands are mem?
I wonder how extra info like (load (s32) from %fixed-stack.0, align 16) etc. is kept in the INLINEASM when we have more than one operand? I saw we only kept (store (s32) into %stack.0) here.
Yeah, it is weird that for an input+output parameter it is printed as just a store (even though the extra info says both mayload and maystore). Let me do some debugging of the print method and see if I can give you a more precise answer FWIW.
Besides, we don't have a mem-to-mem instruction on X86. Will it be lowered into two instructions when both operands are mem?
heh, we just had that exact discussion on the parent patch to this one.
In that case, the programmer was wrong to use "rm" for more than one operand, and should have used "r" (regardless of whether they are using clang, with or without my patches). Also, inline asm may contain multiple instructions, where the inputs/outputs to the inline asm are used as operands of different instructions rather than the same instruction, which is what would expose them to the issue you allude to under register pressure.
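A sketch of that hazard (illustrative, not from this patch): if both operands of a single instruction use "rm" and the allocator folds both to stack slots, the instruction becomes mem-to-mem, which x86 cannot encode, and assembly fails:

int copy(int src) {
  int dst;
  // If both constraints fold to memory under register pressure, this becomes
  // "movl mem, mem", which does not assemble; use "r" on at least one side.
  asm("movl %1, %0" : "=rm"(dst) : "rm"(src));
  return dst;
}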
Let me do some debugging of the print method and see if I can give you a more precise answer FWIW.
I believe it's a loop in MachineInstr::print that's iterating the memoperands() of the MachineInstr and calling print on them.
I'm not super familiar with the MachineMemOperand class. Let me triple-check whether that loop is indeed the one printing the load/store, and if so, whether I need to modify the memoperands() of the MachineInstr (beyond setting the extra info, which is very specific to INLINEASM).
I believe it's a loop in MachineInstr::print that's iterating the memoperands() of the MachineInstr and calling print on them.
Ah, close, but it's the other MIR printer: MIPrinter::print. So I wonder if I'm missing setting memoperands() when doing the fold... let me look into that.
Perhaps I need to add calls to MachineInstr::addMemOperand...
oh! the caller TargetInstrInfo::foldMemoryOperand IS calling MachineInstr::addMemOperand, but IIUC it's only doing so once, which is a problem for in/out parameters since technically there are two distinct operands. Let me work on fixing that up.
fixed in e553a37, PTAL
bumping for review 🦃
phoebewang left a comment:
LGTM with one question.
-  SmallVector<MachineOperand, 4> NewOps;
-  TII.getFrameIndexOperands(NewOps);
+  SmallVector<MachineOperand, 5> NewOps;
+  TII.getFrameIndexOperands(NewOps, FI);
The old behavior was to change to a frame index. Does it mean we need to call this method in TargetInstrInfo::getFrameIndexOperands for other targets?
Yes, and the previous behavior already called TargetInstrInfo::getFrameIndexOperands. Instead of MachineOperand::ChangeToFrameIndex + TargetInstrInfo::getFrameIndexOperands, this change does TargetInstrInfo::getFrameIndexOperands (which now creates the MachineOperand we were changing) followed by MachineInstr::removeOperand (to remove the operand we were previously changing).
I have additional child commits locally for supporting other targets. I need to rebase them onto this change, which shouldn't be a problem; basically, getFrameIndexOperands just needs to set up 3 MachineOperands for most targets but 5 MachineOperands for x86. Even then, I've realized that we only need to support x86 and m68k.
In commit b053359 ("[X86InstrInfo] support memfold on spillable inline asm (llvm#70832)"), I had a last minute fix to update the memoperands. I originally did this in the parent foldInlineAsmMemOperand call, updated the mir test via update_mir_test_checks.py, but then decided to move it to the child call of foldInlineAsmMemOperand. But I forgot to rerun update_mir_test_checks.py. That last minute change caused the same memoperand to be added twice when recursion occurred (for tied operands). I happened to get lucky that trailing content omitted from the CHECK line doesn't result in test failure. But rerunning update_mir_test_checks.py on the mir test added in that commit produces updated output. This is resulting in updates to the test that: 1. conflate additions to the test in child commits with simply updating the test as it should have been when first committed. 2. look wrong because the same memoperand is specified twice (we don't deduplicate memoperands when added). Example: INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0) Fix the bug, so that in child commits, we don't have additional unrelated test changes (which would be wrong anyways) from simply running update_mir_test_checks.py. Link: llvm#20571
In commit b053359 ("[X86InstrInfo] support memfold on spillable inline asm (#70832)"), I had a last minute fix to update the memoperands. I originally did this in the parent foldInlineAsmMemOperand call, updated the mir test via update_mir_test_checks.py, but then decided to move it to the child call of foldInlineAsmMemOperand. But I forgot to rerun update_mir_test_checks.py. That last minute change caused the same memoperand to be added twice when recursion occurred (for tied operands). I happened to get lucky that trailing content omitted from the CHECK line doesn't result in test failure. But rerunning update_mir_test_checks.py on the mir test added in that commit produces updated output. This is resulting in updates to the test that: 1. conflate additions to the test in child commits with simply updating the test as it should have been when first committed. 2. look wrong because the same memoperand is specified twice (we don't deduplicate memoperands when added). Example: INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0) Fix the bug, so that in child commits, we don't have additional unrelated test changes (which would be wrong anyways) from simply running update_mir_test_checks.py. Link: #20571
This enables -regalloc=greedy to memfold spillable inline asm MachineOperands.

Because no instruction selection framework marks MachineOperands as spillable, no language frontend can observe functional changes from this patch. That will change once instruction selection frameworks are updated.
Link: #20571