
Conversation

@nickdesaulniers
Member

@nickdesaulniers nickdesaulniers commented Oct 31, 2023

This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.

Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.

Link: #20571

@nickdesaulniers nickdesaulniers changed the title from "asm rm core4" to "[X86InstrInfo] support memfold on spillable inline asm" Oct 31, 2023
@nickdesaulniers nickdesaulniers marked this pull request as draft October 31, 2023 17:15
@llvmbot
Member

llvmbot commented Oct 31, 2023

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-regalloc

Author: Nick Desaulniers (nickdesaulniers)

Changes

[X86InstrInfo] support memfold on spillable inline asm

This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.

Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.
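
For context, here is a hypothetical example of the kind of constraint this affects (user code, not part of the patch). With "rm", the compiler may satisfy the operand with either a register or a memory location, so under register pressure the allocator can now fold the spill slot directly into the asm:

    // Hypothetical user code. "rm" permits a register or a memory operand,
    // so the operand may be rewritten to a stack slot instead of pinning a
    // register across the asm.
    void use(int x) {
      asm volatile("# consume %0" : /* no outputs */ : "rm"(x));
    }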


Patch is 22.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/70832.diff

9 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineInstr.h (+3)
  • (modified) llvm/include/llvm/CodeGen/TargetInstrInfo.h (+10)
  • (modified) llvm/include/llvm/IR/InlineAsm.h (+30-5)
  • (modified) llvm/lib/CodeGen/CalcSpillWeights.cpp (+12-1)
  • (modified) llvm/lib/CodeGen/MachineInstr.cpp (+23)
  • (modified) llvm/lib/CodeGen/TargetInstrInfo.cpp (+66)
  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+10)
  • (modified) llvm/lib/Target/X86/X86InstrInfo.h (+2)
  • (added) llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir (+232)
diff --git a/llvm/include/llvm/CodeGen/MachineInstr.h b/llvm/include/llvm/CodeGen/MachineInstr.h
index 4877f43e8578d1c..93e8ff389d65673 100644
--- a/llvm/include/llvm/CodeGen/MachineInstr.h
+++ b/llvm/include/llvm/CodeGen/MachineInstr.h
@@ -1364,6 +1364,9 @@ class MachineInstr
     return getOpcode() == TargetOpcode::INLINEASM ||
            getOpcode() == TargetOpcode::INLINEASM_BR;
   }
+  /// Returns true if the memory operand can be folded. Does so by checking the
+  /// InlineAsm::Flag immediate operand at OpId - 1.
+  bool mayFoldInlineAsmMemOp(unsigned OpId) const;
 
   bool isStackAligningInlineAsm() const;
   InlineAsm::AsmDialect getInlineAsmDialect() const;
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index 8e7499ac626a747..ba98f52c27bbd2e 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -2186,6 +2186,16 @@ class TargetInstrInfo : public MCInstrInfo {
   // Get the call frame size just before MI.
   unsigned getCallFrameSizeAt(MachineInstr &MI) const;
 
+  /// Fills in the necessary MachineOperands to refer to a frame index.
+  /// The best way to understand this is to print `asm(""::"m"(x));` after
+  /// finalize-isel. Example:
+  /// INLINEASM ... 262190 /* mem:m */, %stack.0.x.addr, 1, $noreg, 0, $noreg ...
+  /// we would add placeholders for:                     ^  ^       ^  ^
+  virtual void
+  getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops) const {
+    llvm_unreachable("unknown number of operands necessary");
+  }
+
 private:
   mutable std::unique_ptr<MIRFormatter> Formatter;
   unsigned CallFrameSetupOpcode, CallFrameDestroyOpcode;
diff --git a/llvm/include/llvm/IR/InlineAsm.h b/llvm/include/llvm/IR/InlineAsm.h
index 969ad42816a7e52..2d395a53608b0b7 100644
--- a/llvm/include/llvm/IR/InlineAsm.h
+++ b/llvm/include/llvm/IR/InlineAsm.h
@@ -291,18 +291,23 @@ class InlineAsm final : public Value {
   //     Bits 30-16 - A ConstraintCode:: value indicating the original
   //                  constraint code. (MemConstraintCode)
   //   Else:
-  //     Bits 30-16 - The register class ID to use for the operand. (RegClass)
+  //     Bits 29-16 - The register class ID to use for the operand. (RegClass)
+  //     Bit  30    - If the register is permitted to be spilled.
+  //                  (RegMayBeSpilled)
+  //                  Defaults to false "r", may be set for constraints like
+  //                  "rm" (or "g").
   //
-  //   As such, MatchedOperandNo, MemConstraintCode, and RegClass are views of
-  //   the same slice of bits, but are mutually exclusive depending on the
-  //   fields IsMatched then KindField.
+  //   As such, MatchedOperandNo, MemConstraintCode, and
+  //   (RegClass+RegMayBeSpilled) are views of the same slice of bits, but are
+  //   mutually exclusive depending on the fields IsMatched then KindField.
   class Flag {
     uint32_t Storage;
     using KindField = Bitfield::Element<Kind, 0, 3, Kind::Func>;
     using NumOperands = Bitfield::Element<unsigned, 3, 13>;
     using MatchedOperandNo = Bitfield::Element<unsigned, 16, 15>;
     using MemConstraintCode = Bitfield::Element<ConstraintCode, 16, 15, ConstraintCode::Max>;
-    using RegClass = Bitfield::Element<unsigned, 16, 15>;
+    using RegClass = Bitfield::Element<unsigned, 16, 14>;
+    using RegMayBeSpilled = Bitfield::Element<bool, 30, 1>;
     using IsMatched = Bitfield::Element<bool, 31, 1>;
 
 
@@ -413,6 +418,26 @@ class InlineAsm final : public Value {
              "Flag is not a memory or function constraint!");
       Bitfield::set<MemConstraintCode>(Storage, ConstraintCode::Unknown);
     }
+
+    /// Set a bit to denote that while this operand is some kind of register
+    /// (use, def, ...), a memory flag did appear in the original constraint
+    /// list.  This is set by the instruction selection framework, and consumed
+    /// by the register allocator. While the register allocator is generally
+    /// responsible for spilling registers, we need to be able to distinguish
+    /// between registers that the register allocator has permission to spill
+    /// ("rm") vs ones it does not ("r"). This is because the inline asm may use
+    /// instructions which don't support memory addressing modes for that
+    /// operand.
+    void setRegMayBeSpilled(bool B) {
+      assert((isRegDefKind() || isRegDefEarlyClobberKind() || isRegUseKind()) &&
+             "Must be reg");
+      Bitfield::set<RegMayBeSpilled>(Storage, B);
+    }
+    bool getRegMayBeSpilled() const {
+      assert((isRegDefKind() || isRegDefEarlyClobberKind() || isRegUseKind()) &&
+             "Must be reg");
+      return Bitfield::get<RegMayBeSpilled>(Storage);
+    }
   };
 
   static std::vector<StringRef> getExtraInfoNames(unsigned ExtraInfo) {
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index 6e98e2384ef975f..f446e11427e75d4 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -146,6 +146,17 @@ void VirtRegAuxInfo::calculateSpillWeightAndHint(LiveInterval &LI) {
   LI.setWeight(Weight);
 }
 
+static bool canMemFoldInlineAsm(LiveInterval &LI,
+                                const MachineRegisterInfo &MRI) {
+  for (const MachineOperand &MO : MRI.reg_operands(LI.reg())) {
+    const MachineInstr *MI = MO.getParent();
+    if (MI->isInlineAsm() && MI->mayFoldInlineAsmMemOp(MI->getOperandNo(&MO)))
+      return true;
+  }
+
+  return false;
+}
+
 float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
                                        SlotIndex *End) {
   MachineRegisterInfo &MRI = MF.getRegInfo();
@@ -315,7 +326,7 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
   // into instruction itself makes perfect sense.
   if (ShouldUpdateLI && LI.isZeroLength(LIS.getSlotIndexes()) &&
       !LI.isLiveAtIndexes(LIS.getRegMaskSlots()) &&
-      !isLiveAtStatepointVarArg(LI)) {
+      !isLiveAtStatepointVarArg(LI) && !canMemFoldInlineAsm(LI, MRI)) {
     LI.markNotSpillable();
     return -1.0;
   }
diff --git a/llvm/lib/CodeGen/MachineInstr.cpp b/llvm/lib/CodeGen/MachineInstr.cpp
index 048563cc2bcc4e4..92c789e85a205b4 100644
--- a/llvm/lib/CodeGen/MachineInstr.cpp
+++ b/llvm/lib/CodeGen/MachineInstr.cpp
@@ -1792,6 +1792,12 @@ void MachineInstr::print(raw_ostream &OS, ModuleSlotTracker &MST,
       if (F.isUseOperandTiedToDef(TiedTo))
         OS << " tiedto:$" << TiedTo;
 
+      if ((F.isRegDefKind() || F.isRegDefEarlyClobberKind() ||
+           F.isRegUseKind()) &&
+          F.getRegMayBeSpilled()) {
+        OS << " spillable";
+      }
+
       OS << ']';
 
       // Compute the index of the next operand descriptor.
@@ -2526,3 +2532,20 @@ void MachineInstr::insert(mop_iterator InsertBefore,
     tieOperands(Tie1, Tie2);
   }
 }
+
+bool MachineInstr::mayFoldInlineAsmMemOp(unsigned OpId) const {
+  assert(OpId && "expected non-zero operand id");
+  assert(isInlineAsm() && "should only be used on inline asm");
+
+  if (!getOperand(OpId).isReg())
+    return false;
+
+  const MachineOperand &MD = getOperand(OpId - 1);
+  if (!MD.isImm())
+    return false;
+
+  InlineAsm::Flag F(MD.getImm());
+  if (F.isRegUseKind() || F.isRegDefKind() || F.isRegDefEarlyClobberKind())
+    return F.getRegMayBeSpilled();
+  return false;
+}
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index fe7efb73a2dce83..5252eeaadb2aeb2 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -565,6 +565,64 @@ static MachineInstr *foldPatchpoint(MachineFunction &MF, MachineInstr &MI,
   return NewMI;
 }
 
+static void foldInlineAsmMemOperand(MachineInstr *MI, unsigned OpNo, int FI,
+                                    const TargetInstrInfo &TII) {
+  MachineOperand &MO = MI->getOperand(OpNo);
+  const VirtRegInfo &RI = AnalyzeVirtRegInBundle(*MI, MO.getReg());
+
+  // If the machine operand is tied, untie it first.
+  if (MO.isTied()) {
+    unsigned TiedTo = MI->findTiedOperandIdx(OpNo);
+    MI->untieRegOperand(OpNo);
+    // Intentional recursion!
+    foldInlineAsmMemOperand(MI, TiedTo, FI, TII);
+  }
+
+  // Change the operand from a register to a frame index.
+  MO.ChangeToFrameIndex(FI, MO.getTargetFlags());
+
+  SmallVector<MachineOperand, 4> NewOps;
+  TII.getFrameIndexOperands(NewOps);
+  assert(!NewOps.empty() && "getFrameIndexOperands didn't create any operands");
+  MI->insert(MI->operands_begin() + OpNo + 1, NewOps);
+
+  // Change the previous operand to a MemKind InlineAsm::Flag. The second param
+  // is the per-target number of operands that represent the memory operand
+  // excluding this one (MD). This includes MO.
+  InlineAsm::Flag F(InlineAsm::Kind::Mem, NewOps.size() + 1);
+  F.setMemConstraint(InlineAsm::ConstraintCode::m);
+  MachineOperand &MD = MI->getOperand(OpNo - 1);
+  MD.setImm(F);
+
+  // Update mayload/maystore metadata.
+  MachineOperand &ExtraMO = MI->getOperand(InlineAsm::MIOp_ExtraInfo);
+  if (RI.Reads)
+    ExtraMO.setImm(ExtraMO.getImm() | InlineAsm::Extra_MayLoad);
+  if (RI.Writes)
+    ExtraMO.setImm(ExtraMO.getImm() | InlineAsm::Extra_MayStore);
+}
+
+// Returns nullptr if not possible to fold.
+static MachineInstr *foldInlineAsmMemOperand(MachineInstr &MI,
+                                             ArrayRef<unsigned> Ops, int FI,
+                                             const TargetInstrInfo &TII) {
+  assert(MI.isInlineAsm() && "wrong opcode");
+  if (Ops.size() > 1)
+    return nullptr;
+  unsigned Op = Ops[0];
+  assert(Op && "should never be first operand");
+  assert(MI.getOperand(Op).isReg() && "shouldn't be folding non-reg operands");
+
+  if (!MI.mayFoldInlineAsmMemOp(Op))
+    return nullptr;
+
+  MachineInstr &NewMI = TII.duplicate(*MI.getParent(), MI.getIterator(), MI);
+
+  foldInlineAsmMemOperand(&NewMI, Op, FI, TII);
+
+  return &NewMI;
+}
+
 MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
                                                  ArrayRef<unsigned> Ops, int FI,
                                                  LiveIntervals *LIS,
@@ -612,6 +670,8 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
     NewMI = foldPatchpoint(MF, MI, Ops, FI, *this);
     if (NewMI)
       MBB->insert(MI, NewMI);
+  } else if (MI.isInlineAsm()) {
+    NewMI = foldInlineAsmMemOperand(MI, Ops, FI, *this);
   } else {
     // Ask the target to do the actual folding.
     NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, FI, LIS, VRM);
@@ -683,6 +743,8 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
     NewMI = foldPatchpoint(MF, MI, Ops, FrameIndex, *this);
     if (NewMI)
       NewMI = &*MBB.insert(MI, NewMI);
+  } else if (MI.isInlineAsm() && isLoadFromStackSlot(LoadMI, FrameIndex)) {
+    NewMI = foldInlineAsmMemOperand(MI, Ops, FrameIndex, *this);
   } else {
     // Ask the target to do the actual folding.
     NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, LoadMI, LIS);
@@ -1639,6 +1701,10 @@ std::string TargetInstrInfo::createMIROperandComment(
   if (F.isUseOperandTiedToDef(TiedTo))
     OS << " tiedto:$" << TiedTo;
 
+  if ((F.isRegDefKind() || F.isRegDefEarlyClobberKind() || F.isRegUseKind()) &&
+      F.getRegMayBeSpilled())
+    OS << " spillable";
+
   return OS.str();
 }
 
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 4c6854da0ada3d2..2e17cd8ac88ff4a 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -10333,5 +10333,15 @@ void X86InstrInfo::genAlternativeCodeSequence(
   }
 }
 
+// See also: X86DAGToDAGISel::SelectInlineAsmMemoryOperand().
+void X86InstrInfo::getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops) const {
+  Ops.append({
+    MachineOperand::CreateImm(1),        // Scale
+    MachineOperand::CreateReg(0, false), // Index
+    MachineOperand::CreateImm(0),        // Disp
+    MachineOperand::CreateReg(0, false), // Segment
+  });
+}
+
 #define GET_INSTRINFO_HELPERS
 #include "X86GenInstrInfo.inc"
diff --git a/llvm/lib/Target/X86/X86InstrInfo.h b/llvm/lib/Target/X86/X86InstrInfo.h
index e1199e20c318e24..b08bc66d5b72c19 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.h
+++ b/llvm/lib/Target/X86/X86InstrInfo.h
@@ -648,6 +648,8 @@ class X86InstrInfo final : public X86GenInstrInfo {
     return false;
   }
 
+  void getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops) const override;
+
 private:
   /// This is a helper for convertToThreeAddress for 8 and 16-bit instructions.
   /// We use 32-bit LEA to form 3-address code by promoting to a 32-bit
diff --git a/llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir b/llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir
new file mode 100644
index 000000000000000..97f0b561bba1c49
--- /dev/null
+++ b/llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir
@@ -0,0 +1,232 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 3
+# RUN: llc -start-after=finalize-isel -regalloc=greedy -stop-after=greedy \
+# RUN:   -verify-machineinstrs -verify-regalloc %s -o - | FileCheck %s
+--- |
+  target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i128:128-f64:32:64-f80:32-n8:16:32-S128"
+  target triple = "i386-unknown-linux-gnu"
+
+  define void @input(i32 %0) #0 {
+    call void asm "# $0", "rm,~{ax},~{cx},~{dx},~{si},~{di},~{bx},~{bp}"(i32 %0)
+    ret void
+  }
+
+  define i32 @output() #0 {
+    %1 = alloca i32, align 4
+    call void asm "# $0", "=*rm,~{ax},~{cx},~{dx},~{si},~{di},~{bx},~{bp}"(ptr nonnull elementtype(i32) %1)
+    %2 = load i32, ptr %1, align 4
+    ret i32 %2
+  }
+
+  define i32 @inout(i32 %0) #0 {
+    %2 = alloca i32, align 4
+    store i32 %0, ptr %2, align 4
+    call void asm "# $0 $1", "=*rm,0,~{ax},~{cx},~{dx},~{si},~{di},~{bx},~{bp}"(ptr nonnull elementtype(i32) %2, i32 %0)
+    %3 = load i32, ptr %2, align 4
+    ret i32 %3
+  }
+
+  attributes #0 = { nounwind }
+
+...
+---
+name:            input
+alignment:       16
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: gr32, preferred-register: '' }
+liveins:         []
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap:     false
+  hasPatchPoint:   false
+  stackSize:       0
+  offsetAdjustment: 0
+  maxAlignment:    4
+  adjustsStack:    false
+  hasCalls:        false
+  stackProtector:  ''
+  functionContext: ''
+  maxCallFrameSize: 4294967295
+  cvBytesOfCalleeSavedRegisters: 0
+  hasOpaqueSPAdjustment: false
+  hasVAStart:      false
+  hasMustTailInVarArgFunc: false
+  hasTailCall:     false
+  localFrameSize:  0
+  savePoint:       ''
+  restorePoint:    ''
+fixedStack:
+  - { id: 0, type: default, offset: 0, size: 4, alignment: 16, stack-id: default,
+      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
+      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
+stack:           []
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo: {}
+body:             |
+  bb.0 (%ir-block.1):
+    ; CHECK-LABEL: name: input
+    ; CHECK: INLINEASM &"# $0", 8 /* mayload attdialect */, 262190 /* mem:m */, %fixed-stack.0, 1, $noreg, 0, $noreg, 12 /* clobber */, implicit-def dead early-clobber $ax, 12 /* clobber */, implicit-def dead early-clobber $cx, 12 /* clobber */, implicit-def dead early-clobber $dx, 12 /* clobber */, implicit-def dead early-clobber $si, 12 /* clobber */, implicit-def dead early-clobber $di, 12 /* clobber */, implicit-def dead early-clobber $bx, 12 /* clobber */, implicit-def dead early-clobber $bp :: (load (s32) from %fixed-stack.0, align 16)
+    ; CHECK-NEXT: RET 0
+    %0:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 16)
+    INLINEASM &"# $0", 0 /* attdialect */, 1076101129 /* reguse:GR32 spillable */, %0, 12 /* clobber */, implicit-def early-clobber $ax, 12 /* clobber */, implicit-def early-clobber $cx, 12 /* clobber */, implicit-def early-clobber $dx, 12 /* clobber */, implicit-def early-clobber $si, 12 /* clobber */, implicit-def early-clobber $di, 12 /* clobber */, implicit-def early-clobber $bx, 12 /* clobber */, implicit-def early-clobber $bp
+    RET 0
+
+...
+---
+name:            output
+alignment:       16
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: gr32, preferred-register: '' }
+liveins:         []
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap:     false
+  hasPatchPoint:   false
+  stackSize:       0
+  offsetAdjustment: 0
+  maxAlignment:    4
+  adjustsStack:    false
+  hasCalls:        false
+  stackProtector:  ''
+  functionContext: ''
+  maxCallFrameSize: 4294967295
+  cvBytesOfCalleeSavedRegisters: 0
+  hasOpaqueSPAdjustment: false
+  hasVAStart:      false
+  hasMustTailInVarArgFunc: false
+  hasTailCall:     false
+  localFrameSize:  0
+  savePoint:       ''
+  restorePoint:    ''
+fixedStack:      []
+stack:
+  - { id: 0, name: '', type: default, offset: 0, size: 4, alignment: 4,
+      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
+      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo: {}
+body:             |
+  bb.0 (%ir-block.0):
+    ; CHECK-LABEL: name: output
+    ; CHECK: INLINEASM &"# $0", 16 /* maystore attdialect */, 262190 /* mem:m */, %stack.1, 1, $noreg, 0, $noreg, 12 /* clobber */, implicit-def dead early-clobber $ax, 12 /* clobber */, implicit-def dead early-clobber $cx, 12 /* clobber */, implicit-def dead early-clobber $dx, 12 /* clobber */, implicit-def dead early-clobber $si, 12 /* clobber */, implicit-def dead early-clobber $di, 12 /* clobber */, implicit-def dead early-clobber $bx, 12 /* clobber */, implicit-def dead early-clobber $bp :: (store (s32) into %stack.1)
+    ; CHECK-NEXT: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %stack.1)
+    ; CHECK-NEXT: MOV32mr %stack.0, 1, $noreg, 0, $noreg, [[MOV32rm]] :: (store (s32) into %ir.1)
+    ; CHECK-NEXT: $eax = COPY [[MOV32rm]]
+    ; CHECK-NEXT: RET 0, $eax
+    INLINEASM &"# $0", 0 /* attdialect */, 1076101130 /* regdef:GR32 spillable */, def %0, 12 /* clobber */, implicit-def early-clobber $ax, 12 /* clobber */, implicit-def early-clobber $cx, 12 /* clobber */, implicit-def early-clobber $dx, 12 /* clobber */, implicit-def early-clobber $si, 12 /* clobber */, implicit-def early-clobber $di, 12 /* clobber */, implicit-def early-clobber $bx, 12 /* clobber */, implicit-def early-clobber $bp
+    MOV32mr %stack.0, 1, $noreg, 0, $noreg, %0 :: (store (s32) into %ir.1)
+    $eax = COPY %0
+    RET 0, $eax
+
+...
+---
+name:            inout
+alignment:       16
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: gr32, preferred-register: '' }
+  - { id: 1, class: gr32, preferred-register: '' }
+liveins:         []
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap:     false
+  hasPatchPoint:   false
+  stackSize:       0
+  offsetAdjustment: 0
+  ma...
[truncated]

@nickdesaulniers
Member Author

nickdesaulniers commented Oct 31, 2023

@github-actions

github-actions bot commented Oct 31, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 10338 to 10343
Contributor

I was wondering if this could/should be expressed with X86AddressMode.
X86AddressMode M; M.getFullAddress(Ops) seems to be very close to what this is doing, except that the API here does not explicitly add a "base" (reg or frame index); maybe it should start doing that? (So in foldInlineAsmMemOperand, just remove the register operand instead of using ChangeToFrameIndex, and have this function add the frame index operand as well. That would also help if a target has a different order of operands.)

Member Author

@nickdesaulniers nickdesaulniers Nov 3, 2023

oh! yeah it does look potentially reusable.

Though the class seems very x86-specific. It would be nice if there were a target-independent interface I could call from foldInlineAsmMemOperand (added in https://github.com/llvm/llvm-project/pull/70743/files) to get this info.

Let me check the users of X86AddressMode... worst case I could call it and drop the first operand.

and have this function add the frame index operand as well. That would also help if a target has a different order of operands...

I guess if X86AddressMode isn't a per-target interface, then in foldInlineAsmMemOperand I'd have to do some target-specific hack, and then we're back to adding this X86InstrInfo::getFrameIndexOperands interface?

Member Author

@nickdesaulniers nickdesaulniers Nov 3, 2023

I guess if X86AddressMode isn't a per-target interface, then in foldInlineAsmMemOperand I'd have to do some target-specific hack

Example of what I roughly have in mind:

diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 5ede36505b5b..bfbe477f97a3 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -578,13 +578,21 @@ static void foldInlineAsmMemOperand(MachineInstr *MI, unsigned OpNo, int FI,
     foldInlineAsmMemOperand(MI, TiedTo, FI, TII);
   }
 
+  SmallVector<MachineOperand, 4> NewOps;
+  MachineOperand *InsertPt;
+  if (isa<X86GenInstrInfo>(TII)) {
+    X86AddressMode M;
+    M.getFullAddress(NewOps);
+    InsertPt = MI->operands_begin() + OpNo;
+  } else {
   // Change the operand from a register to a frame index.
-  MO.ChangeToFrameIndex(FI, MO.getTargetFlags());
+    MO.ChangeToFrameIndex(FI, MO.getTargetFlags());
+    TII.getFrameIndexOperands(NewOps);
+    InsertPt = MI->operands_begin() + OpNo + 1;
+  }
 
-  SmallVector<MachineOperand, 4> NewOps;
-  TII.getFrameIndexOperands(NewOps);
   assert(!NewOps.empty() && "getFrameIndexOperands didn't create any operands");
-  MI->insert(MI->operands_begin() + OpNo + 1, NewOps);
+  MI->insert(InsertPt, NewOps);
 
   // Change the previous operand to a MemKind InlineAsm::Flag. The second param
   // is the per-target number of operands that represent the memory operand

seems kind of ugly, no? Did you have a better usage of X86AddressMode in mind?

Contributor

@MatzeB MatzeB Nov 3, 2023

Addressing modes are highly target-specific. I was just proposing the use of X86AddressMode for the implementation of X86InstrInfo::getFrameIndexOperands.
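
For reference, a rough sketch of what that implementation might look like, assuming X86AddressMode from X86InstrBuilder.h and the frame-index-taking signature this review later converges on (untested, not the verbatim final code):

    // Sketch: build the 5 x86 memory operands (base, scale, index, disp,
    // segment) from an X86AddressMode whose base is the spill slot.
    void X86InstrInfo::getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops,
                                             int FI) const {
      X86AddressMode M;
      M.BaseType = X86AddressMode::FrameIndexBase;
      M.Base.FrameIndex = FI;
      M.getFullAddress(Ops);
    }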

Member Author

Sorry for the excessive delays following up on this; I have been traveling for conference season for the past two weeks. I've just now made the changes you've suggested, and I think I do indeed like it better. Will post a commit on top once my tests finish successfully locally.

Member Author

done in 06c300b, PTAL

This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.

Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.
- change X86InstrInfo::getFrameIndexOperands to use X86AddressMode
@nickdesaulniers nickdesaulniers marked this pull request as ready for review November 20, 2023 22:06
Comment on lines +151 to +152
; CHECK-NEXT: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %stack.1)
; CHECK-NEXT: MOV32mr %stack.0, 1, $noreg, 0, $noreg, [[MOV32rm]] :: (store (s32) into %ir.1)
Contributor

Should the target try to eliminate these instructions?

Member Author

@nickdesaulniers nickdesaulniers Nov 21, 2023

Yes, the load and store are redundant. The INLINEASM could have just stored to %stack.0, eliding the store to %stack.1, the load from %stack.1, and then the store to %stack.0. The codegen when there is memory pressure is sub-optimal because of the redundant store/load.

But it is greedy that determines that a spill is necessary and creates the slot to spill to (in this case %stack.1). The code I add is just told "use %stack.1."

Could we optimize this case later? Yes, but I don't think that should be part of this commit.

I guess I'm curious why there isn't a later pass that determines that:

(store (s32) into %stack.1)
(load (s32) from %stack.1)
(store (s32) into %ir.1)

can be collapsed into:

(store (s32) into %ir.1)

Is there a pass that does so, but maybe bails when one of the instructions is INLINEASM? Or is it too late to know that nothing else depends on that initial store?

; CHECK-LABEL: name: inout
; CHECK: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 16)
; CHECK-NEXT: MOV32mr %stack.0, 1, $noreg, 0, $noreg, [[MOV32rm]] :: (store (s32) into %stack.0)
; CHECK-NEXT: INLINEASM &"# $0 $1", 24 /* mayload maystore attdialect */, 262190 /* mem:m */, %stack.0, 1, $noreg, 0, $noreg, 262190 /* mem:m */, %stack.0, 1, $noreg, 0, $noreg, 12 /* clobber */, implicit-def dead early-clobber $ax, 12 /* clobber */, implicit-def dead early-clobber $cx, 12 /* clobber */, implicit-def dead early-clobber $dx, 12 /* clobber */, implicit-def dead early-clobber $si, 12 /* clobber */, implicit-def dead early-clobber $di, 12 /* clobber */, implicit-def dead early-clobber $bx, 12 /* clobber */, implicit-def dead early-clobber $bp :: (store (s32) into %stack.0)
Contributor

I wonder how the extra info like (load (s32) from %fixed-stack.0, align 16) etc. is kept in the INLINEASM when we have more than one operand? I saw we only kept (store (s32) into %stack.0) here.
Besides, we don't have a mem-to-mem instruction on X86. Will it be lowered into two instructions when both operands are mem?

Member Author

I wonder how the extra info like (load (s32) from %fixed-stack.0, align 16) etc. is kept in the INLINEASM when we have more than one operand? I saw we only kept (store (s32) into %stack.0) here.

Yeah, it is weird that for an input+output parameter it is printed as just a store (even though the extra info says both mayload and maystore). Let me do some debugging of the print method and see if I can give you a more precise answer FWIW.


Besides, we don't have a mem to mem instrcution on X86. Will it be lowered into two instructions when both operands are mem?

heh, we just had that exact discussion on the parent patch to this one.

In that case, the programmer was wrong to use "rm" for more than one operand, and should have used "r" (regardless of whether they are using clang, with or without my patches). Also, inline asm may contain multiple instructions, where the inputs/outputs to the inline asm are used as operands of different instructions, not the same instruction (which is what would expose them to the issue you allude to under register pressure).
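
A hypothetical example of the hazard discussed here (not from the patch): if both operands of a single instruction use "rm" and both end up spilled, the asm would need a mem-to-mem mov, which x86 cannot encode:

    // Hypothetical user code; unsafe under register pressure. If both
    // operands are folded to stack slots, "mov mem, mem" is not encodable
    // on x86, so at least one operand should use "r".
    int copy(int a) {
      int b;
      asm("mov %1, %0" : "=rm"(b) : "rm"(a));
      return b;
    }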

Member Author

Let me do some debugging of the print method and see if I can give you a more precise answer FWIW.

I believe it's a loop in MachineInstr::print that's iterating the memoperands() of the MachineInstr and calling print on them.

I'm not super familiar with the MachineMemOperand class. Let me triple-check whether that loop is indeed the one printing the load/store, and if so, whether I need to modify the memoperands() of the MachineInstr (beyond setting the extra info, which is very specific to INLINEASM).

Member Author

I believe it's a loop in MachineInstr::print that's iterating the memoperands() of the MachineInstr and calling print on them.

Ah, close, but it's the other MIR printer: MIPrinter::print. So I wonder if I'm missing a call to set memoperands() when doing the fold... let me look into that.

Member Author

Perhaps I need to add calls to MachineInstr::addMemOperand...

Member Author

oh! The caller TargetInstrInfo::foldMemoryOperand IS calling MachineInstr::addMemOperand, but IIUC it's only doing so once, which is a problem for in/out parameters since technically there are 2 distinct operands. Let me work on fixing that up.
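
A minimal sketch of the fix this implies, assuming one MachineMemOperand is added per folded asm operand (MF, FI, OpSize, Reads, and Writes are assumed context, not the verbatim final code):

    // Sketch: attach a distinct MachineMemOperand for each folded operand,
    // with load/store flags reflecting how the asm uses that operand.
    auto Flags = MachineMemOperand::MONone;
    if (Reads)
      Flags |= MachineMemOperand::MOLoad;
    if (Writes)
      Flags |= MachineMemOperand::MOStore;
    MachineMemOperand *MMO = MF.getMachineMemOperand(
        MachinePointerInfo::getFixedStack(MF, FI), Flags, OpSize,
        Align(4)); // alignment assumed for illustration
    NewMI->addMemOperand(MF, MMO);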

Member Author

fixed in e553a37 , PTAL

@nickdesaulniers nickdesaulniers marked this pull request as draft November 21, 2023 23:12
@nickdesaulniers nickdesaulniers marked this pull request as ready for review November 22, 2023 21:56
@nickdesaulniers
Member Author

bumping for review 🦃

Contributor

@phoebewang phoebewang left a comment

LGTM with one question.

-SmallVector<MachineOperand, 4> NewOps;
-TII.getFrameIndexOperands(NewOps);
+SmallVector<MachineOperand, 5> NewOps;
+TII.getFrameIndexOperands(NewOps, FI);
Contributor

The old behavior is to change to a frame index. Does that mean we need to call this method in TargetInstrInfo::getFrameIndexOperands for other targets?

Member Author

Yes, and the previous behavior did already call TargetInstrInfo::getFrameIndexOperands. Instead of MachineOperand::ChangeToFrameIndex + TargetInstrInfo::getFrameIndexOperands, this change does TargetInstrInfo::getFrameIndexOperands (which now creates the MachineOperand we were changing) then MachineInstr::removeOperand (to remove the operand we were previously changing).

I have additional child commits locally for supporting other targets. I need to rebase them onto this change, which shouldn't be a problem; basically, getFrameIndexOperands just needs to set up 3 MachineOperands for most targets but 5 for x86. Even then, I've realized that we only need to support x86 and m68k.
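
Roughly, the reworked fold reads like this (a sketch reconstructed from the description above, not the verbatim final code):

    // Sketch of foldInlineAsmMemOperand after the rework: the target hook
    // now emits the frame-index operand itself, and the register operand it
    // replaces is removed rather than mutated in place.
    SmallVector<MachineOperand, 5> NewOps;
    TII.getFrameIndexOperands(NewOps, FI);
    MI->insert(MI->operands_begin() + OpNo + 1, NewOps);
    MI->removeOperand(OpNo);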

@nickdesaulniers nickdesaulniers merged commit b053359 into llvm:main Nov 29, 2023
@nickdesaulniers nickdesaulniers deleted the asm_rm_core4 branch November 29, 2023 16:19
nickdesaulniers added a commit to nickdesaulniers/llvm-project that referenced this pull request Dec 1, 2023
In commit b053359 ("[X86InstrInfo] support memfold on spillable inline asm
(llvm#70832)"), I had a last minute fix to update the memoperands.  I originally
did this in the parent foldInlineAsmMemOperand call, updated the mir test via
update_mir_test_checks.py, but then decided to move it to the child call of
foldInlineAsmMemOperand.

But I forgot to rerun update_mir_test_checks.py. That last minute change caused
the same memoperand to be added twice when recursion occurred (for tied
operands). I happened to get lucky that trailing content omitted from the
CHECK line doesn't result in test failure.

But rerunning update_mir_test_checks.py on the mir test added in that commit
produces updated output. This is resulting in updates to the test that:
1. conflate additions to the test in child commits with simply updating the
   test as it should have been when first committed.
2. look wrong because the same memoperand is specified twice (we don't
   deduplicate memoperands when added). Example:

    INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0)

Fix the bug, so that in child commits, we don't have additional unrelated test
changes (which would be wrong anyways) from simply running
update_mir_test_checks.py.

Link: llvm#20571
nickdesaulniers added a commit that referenced this pull request Dec 4, 2023
In commit b053359 ("[X86InstrInfo] support memfold on spillable inline asm
(#70832)"), I had a last minute fix to update the memoperands. I originally
did this in the parent foldInlineAsmMemOperand call, updated the mir test via
update_mir_test_checks.py, but then decided to move it to the child call of
foldInlineAsmMemOperand.

But I forgot to rerun update_mir_test_checks.py. That last minute change caused
the same memoperand to be added twice when recursion occurred (for tied
operands). I happened to get lucky that trailing content omitted from the
CHECK line doesn't result in test failure.

But rerunning update_mir_test_checks.py on the mir test added in that commit
produces updated output. This is resulting in updates to the test that:
1. conflate additions to the test in child commits with simply updating the
   test as it should have been when first committed.
2. look wrong because the same memoperand is specified twice (we don't
   deduplicate memoperands when added). Example:

    INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0)

Fix the bug, so that in child commits, we don't have additional unrelated test
changes (which would be wrong anyways) from simply running
update_mir_test_checks.py.

Link: #20571