
Conversation

@nickdesaulniers
Member

@nickdesaulniers nickdesaulniers commented Oct 31, 2023

This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.

Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.

Link: #20571

@nickdesaulniers nickdesaulniers changed the title from "asm rm core4" to "[X86InstrInfo] support memfold on spillable inline asm" Oct 31, 2023
@nickdesaulniers nickdesaulniers marked this pull request as draft October 31, 2023 17:15
@llvmbot
Member

llvmbot commented Oct 31, 2023

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-regalloc

Author: Nick Desaulniers (nickdesaulniers)

Changes

[X86InstrInfo] support memfold on spillable inline asm

This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.

Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.
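
For context, here is a hypothetical example of the kind of constraint this affects (user code, not part of the patch). With "rm", the compiler may satisfy the operand with either a register or a memory location, so under register pressure the allocator can now fold the spill slot directly into the asm:

    // Hypothetical user code. "rm" permits a register or a memory operand,
    // so the operand may be rewritten to a stack slot instead of pinning a
    // register across the asm.
    void use(int x) {
      asm volatile("# consume %0" : /* no outputs */ : "rm"(x));
    }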


Patch is 22.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/70832.diff

9 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineInstr.h (+3)
  • (modified) llvm/include/llvm/CodeGen/TargetInstrInfo.h (+10)
  • (modified) llvm/include/llvm/IR/InlineAsm.h (+30-5)
  • (modified) llvm/lib/CodeGen/CalcSpillWeights.cpp (+12-1)
  • (modified) llvm/lib/CodeGen/MachineInstr.cpp (+23)
  • (modified) llvm/lib/CodeGen/TargetInstrInfo.cpp (+66)
  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+10)
  • (modified) llvm/lib/Target/X86/X86InstrInfo.h (+2)
  • (added) llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir (+232)
diff --git a/llvm/include/llvm/CodeGen/MachineInstr.h b/llvm/include/llvm/CodeGen/MachineInstr.h
index 4877f43e8578d1c..93e8ff389d65673 100644
--- a/llvm/include/llvm/CodeGen/MachineInstr.h
+++ b/llvm/include/llvm/CodeGen/MachineInstr.h
@@ -1364,6 +1364,9 @@ class MachineInstr
     return getOpcode() == TargetOpcode::INLINEASM ||
            getOpcode() == TargetOpcode::INLINEASM_BR;
   }
+  /// Returns true if the memory operand can be folded. Does so by checking the
+  /// InlineAsm::Flag immediate operand at OpId - 1.
+  bool mayFoldInlineAsmMemOp(unsigned OpId) const;
 
   bool isStackAligningInlineAsm() const;
   InlineAsm::AsmDialect getInlineAsmDialect() const;
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index 8e7499ac626a747..ba98f52c27bbd2e 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -2186,6 +2186,16 @@ class TargetInstrInfo : public MCInstrInfo {
   // Get the call frame size just before MI.
   unsigned getCallFrameSizeAt(MachineInstr &MI) const;
 
+  /// Fills in the necessary MachineOperands to refer to a frame index.
+  /// The best way to understand this is to print `asm(""::"m"(x));` after
+  /// finalize-isel. Example:
+  /// INLINEASM ... 262190 /* mem:m */, %stack.0.x.addr, 1, $noreg, 0, $noreg ...
+  /// we would add placeholders for:                     ^  ^       ^  ^
+  virtual void
+  getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops) const {
+    llvm_unreachable("unknown number of operands necessary");
+  }
+
 private:
   mutable std::unique_ptr<MIRFormatter> Formatter;
   unsigned CallFrameSetupOpcode, CallFrameDestroyOpcode;
diff --git a/llvm/include/llvm/IR/InlineAsm.h b/llvm/include/llvm/IR/InlineAsm.h
index 969ad42816a7e52..2d395a53608b0b7 100644
--- a/llvm/include/llvm/IR/InlineAsm.h
+++ b/llvm/include/llvm/IR/InlineAsm.h
@@ -291,18 +291,23 @@ class InlineAsm final : public Value {
   //     Bits 30-16 - A ConstraintCode:: value indicating the original
   //                  constraint code. (MemConstraintCode)
   //   Else:
-  //     Bits 30-16 - The register class ID to use for the operand. (RegClass)
+  //     Bits 29-16 - The register class ID to use for the operand. (RegClass)
+  //     Bit  30    - If the register is permitted to be spilled.
+  //                  (RegMayBeSpilled)
+  //                  Defaults to false "r", may be set for constraints like
+  //                  "rm" (or "g").
   //
-  //   As such, MatchedOperandNo, MemConstraintCode, and RegClass are views of
-  //   the same slice of bits, but are mutually exclusive depending on the
-  //   fields IsMatched then KindField.
+  //   As such, MatchedOperandNo, MemConstraintCode, and
+  //   (RegClass+RegMayBeSpilled) are views of the same slice of bits, but are
+  //   mutually exclusive depending on the fields IsMatched then KindField.
   class Flag {
     uint32_t Storage;
     using KindField = Bitfield::Element<Kind, 0, 3, Kind::Func>;
     using NumOperands = Bitfield::Element<unsigned, 3, 13>;
     using MatchedOperandNo = Bitfield::Element<unsigned, 16, 15>;
     using MemConstraintCode = Bitfield::Element<ConstraintCode, 16, 15, ConstraintCode::Max>;
-    using RegClass = Bitfield::Element<unsigned, 16, 15>;
+    using RegClass = Bitfield::Element<unsigned, 16, 14>;
+    using RegMayBeSpilled = Bitfield::Element<bool, 30, 1>;
     using IsMatched = Bitfield::Element<bool, 31, 1>;
 
 
@@ -413,6 +418,26 @@ class InlineAsm final : public Value {
              "Flag is not a memory or function constraint!");
       Bitfield::set<MemConstraintCode>(Storage, ConstraintCode::Unknown);
     }
+
+    /// Set a bit to denote that while this operand is some kind of register
+    /// (use, def, ...), a memory flag did appear in the original constraint
+    /// list.  This is set by the instruction selection framework, and consumed
+    /// by the register allocator. While the register allocator is generally
+    /// responsible for spilling registers, we need to be able to distinguish
+    /// between registers that the register allocator has permission to spill
+    /// ("rm") vs ones it does not ("r"). This is because the inline asm may use
+    /// instructions which don't support memory addressing modes for that
+    /// operand.
+    void setRegMayBeSpilled(bool B) {
+      assert((isRegDefKind() || isRegDefEarlyClobberKind() || isRegUseKind()) &&
+             "Must be reg");
+      Bitfield::set<RegMayBeSpilled>(Storage, B);
+    }
+    bool getRegMayBeSpilled() const {
+      assert((isRegDefKind() || isRegDefEarlyClobberKind() || isRegUseKind()) &&
+             "Must be reg");
+      return Bitfield::get<RegMayBeSpilled>(Storage);
+    }
   };
 
   static std::vector<StringRef> getExtraInfoNames(unsigned ExtraInfo) {
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index 6e98e2384ef975f..f446e11427e75d4 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -146,6 +146,17 @@ void VirtRegAuxInfo::calculateSpillWeightAndHint(LiveInterval &LI) {
   LI.setWeight(Weight);
 }
 
+static bool canMemFoldInlineAsm(LiveInterval &LI,
+                                const MachineRegisterInfo &MRI) {
+  for (const MachineOperand &MO : MRI.reg_operands(LI.reg())) {
+    const MachineInstr *MI = MO.getParent();
+    if (MI->isInlineAsm() && MI->mayFoldInlineAsmMemOp(MI->getOperandNo(&MO)))
+      return true;
+  }
+
+  return false;
+}
+
 float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
                                        SlotIndex *End) {
   MachineRegisterInfo &MRI = MF.getRegInfo();
@@ -315,7 +326,7 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
   // into instruction itself makes perfect sense.
   if (ShouldUpdateLI && LI.isZeroLength(LIS.getSlotIndexes()) &&
       !LI.isLiveAtIndexes(LIS.getRegMaskSlots()) &&
-      !isLiveAtStatepointVarArg(LI)) {
+      !isLiveAtStatepointVarArg(LI) && !canMemFoldInlineAsm(LI, MRI)) {
     LI.markNotSpillable();
     return -1.0;
   }
diff --git a/llvm/lib/CodeGen/MachineInstr.cpp b/llvm/lib/CodeGen/MachineInstr.cpp
index 048563cc2bcc4e4..92c789e85a205b4 100644
--- a/llvm/lib/CodeGen/MachineInstr.cpp
+++ b/llvm/lib/CodeGen/MachineInstr.cpp
@@ -1792,6 +1792,12 @@ void MachineInstr::print(raw_ostream &OS, ModuleSlotTracker &MST,
       if (F.isUseOperandTiedToDef(TiedTo))
         OS << " tiedto:$" << TiedTo;
 
+      if ((F.isRegDefKind() || F.isRegDefEarlyClobberKind() ||
+           F.isRegUseKind()) &&
+          F.getRegMayBeSpilled()) {
+        OS << " spillable";
+      }
+
       OS << ']';
 
       // Compute the index of the next operand descriptor.
@@ -2526,3 +2532,20 @@ void MachineInstr::insert(mop_iterator InsertBefore,
     tieOperands(Tie1, Tie2);
   }
 }
+
+bool MachineInstr::mayFoldInlineAsmMemOp(unsigned OpId) const {
+  assert(OpId && "expected non-zero operand id");
+  assert(isInlineAsm() && "should only be used on inline asm");
+
+  if (!getOperand(OpId).isReg())
+    return false;
+
+  const MachineOperand &MD = getOperand(OpId - 1);
+  if (!MD.isImm())
+    return false;
+
+  InlineAsm::Flag F(MD.getImm());
+  if (F.isRegUseKind() || F.isRegDefKind() || F.isRegDefEarlyClobberKind())
+    return F.getRegMayBeSpilled();
+  return false;
+}
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index fe7efb73a2dce83..5252eeaadb2aeb2 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -565,6 +565,64 @@ static MachineInstr *foldPatchpoint(MachineFunction &MF, MachineInstr &MI,
   return NewMI;
 }
 
+static void foldInlineAsmMemOperand(MachineInstr *MI, unsigned OpNo, int FI,
+                                    const TargetInstrInfo &TII) {
+  MachineOperand &MO = MI->getOperand(OpNo);
+  const VirtRegInfo &RI = AnalyzeVirtRegInBundle(*MI, MO.getReg());
+
+  // If the machine operand is tied, untie it first.
+  if (MO.isTied()) {
+    unsigned TiedTo = MI->findTiedOperandIdx(OpNo);
+    MI->untieRegOperand(OpNo);
+    // Intentional recursion!
+    foldInlineAsmMemOperand(MI, TiedTo, FI, TII);
+  }
+
+  // Change the operand from a register to a frame index.
+  MO.ChangeToFrameIndex(FI, MO.getTargetFlags());
+
+  SmallVector<MachineOperand, 4> NewOps;
+  TII.getFrameIndexOperands(NewOps);
+  assert(!NewOps.empty() && "getFrameIndexOperands didn't create any operands");
+  MI->insert(MI->operands_begin() + OpNo + 1, NewOps);
+
+  // Change the previous operand to a MemKind InlineAsm::Flag. The second param
+  // is the per-target number of operands that represent the memory operand
+  // excluding this one (MD). This includes MO.
+  InlineAsm::Flag F(InlineAsm::Kind::Mem, NewOps.size() + 1);
+  F.setMemConstraint(InlineAsm::ConstraintCode::m);
+  MachineOperand &MD = MI->getOperand(OpNo - 1);
+  MD.setImm(F);
+
+  // Update mayload/maystore metadata.
+  MachineOperand &ExtraMO = MI->getOperand(InlineAsm::MIOp_ExtraInfo);
+  if (RI.Reads)
+    ExtraMO.setImm(ExtraMO.getImm() | InlineAsm::Extra_MayLoad);
+  if (RI.Writes)
+    ExtraMO.setImm(ExtraMO.getImm() | InlineAsm::Extra_MayStore);
+}
+
+// Returns nullptr if not possible to fold.
+static MachineInstr *foldInlineAsmMemOperand(MachineInstr &MI,
+                                             ArrayRef<unsigned> Ops, int FI,
+                                             const TargetInstrInfo &TII) {
+  assert(MI.isInlineAsm() && "wrong opcode");
+  if (Ops.size() > 1)
+    return nullptr;
+  unsigned Op = Ops[0];
+  assert(Op && "should never be first operand");
+  assert(MI.getOperand(Op).isReg() && "shouldn't be folding non-reg operands");
+
+  if (!MI.mayFoldInlineAsmMemOp(Op))
+    return nullptr;
+
+  MachineInstr &NewMI = TII.duplicate(*MI.getParent(), MI.getIterator(), MI);
+
+  foldInlineAsmMemOperand(&NewMI, Op, FI, TII);
+
+  return &NewMI;
+}
+
 MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
                                                  ArrayRef<unsigned> Ops, int FI,
                                                  LiveIntervals *LIS,
@@ -612,6 +670,8 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
     NewMI = foldPatchpoint(MF, MI, Ops, FI, *this);
     if (NewMI)
       MBB->insert(MI, NewMI);
+  } else if (MI.isInlineAsm()) {
+    NewMI = foldInlineAsmMemOperand(MI, Ops, FI, *this);
   } else {
     // Ask the target to do the actual folding.
     NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, FI, LIS, VRM);
@@ -683,6 +743,8 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
     NewMI = foldPatchpoint(MF, MI, Ops, FrameIndex, *this);
     if (NewMI)
       NewMI = &*MBB.insert(MI, NewMI);
+  } else if (MI.isInlineAsm() && isLoadFromStackSlot(LoadMI, FrameIndex)) {
+    NewMI = foldInlineAsmMemOperand(MI, Ops, FrameIndex, *this);
   } else {
     // Ask the target to do the actual folding.
     NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, LoadMI, LIS);
@@ -1639,6 +1701,10 @@ std::string TargetInstrInfo::createMIROperandComment(
   if (F.isUseOperandTiedToDef(TiedTo))
     OS << " tiedto:$" << TiedTo;
 
+  if ((F.isRegDefKind() || F.isRegDefEarlyClobberKind() || F.isRegUseKind()) &&
+      F.getRegMayBeSpilled())
+    OS << " spillable";
+
   return OS.str();
 }
 
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 4c6854da0ada3d2..2e17cd8ac88ff4a 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -10333,5 +10333,15 @@ void X86InstrInfo::genAlternativeCodeSequence(
   }
 }
 
+// See also: X86DAGToDAGISel::SelectInlineAsmMemoryOperand().
+void X86InstrInfo::getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops) const {
+  Ops.append({
+    MachineOperand::CreateImm(1),        // Scale
+    MachineOperand::CreateReg(0, false), // Index
+    MachineOperand::CreateImm(0),        // Disp
+    MachineOperand::CreateReg(0, false), // Segment
+  });
+}
+
 #define GET_INSTRINFO_HELPERS
 #include "X86GenInstrInfo.inc"
diff --git a/llvm/lib/Target/X86/X86InstrInfo.h b/llvm/lib/Target/X86/X86InstrInfo.h
index e1199e20c318e24..b08bc66d5b72c19 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.h
+++ b/llvm/lib/Target/X86/X86InstrInfo.h
@@ -648,6 +648,8 @@ class X86InstrInfo final : public X86GenInstrInfo {
     return false;
   }
 
+  void getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops) const override;
+
 private:
   /// This is a helper for convertToThreeAddress for 8 and 16-bit instructions.
   /// We use 32-bit LEA to form 3-address code by promoting to a 32-bit
diff --git a/llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir b/llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir
new file mode 100644
index 000000000000000..97f0b561bba1c49
--- /dev/null
+++ b/llvm/test/CodeGen/X86/inline-asm-rm-exhaustion.mir
@@ -0,0 +1,232 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 3
+# RUN: llc -start-after=finalize-isel -regalloc=greedy -stop-after=greedy \
+# RUN:   -verify-machineinstrs -verify-regalloc %s -o - | FileCheck %s
+--- |
+  target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i128:128-f64:32:64-f80:32-n8:16:32-S128"
+  target triple = "i386-unknown-linux-gnu"
+
+  define void @input(i32 %0) #0 {
+    call void asm "# $0", "rm,~{ax},~{cx},~{dx},~{si},~{di},~{bx},~{bp}"(i32 %0)
+    ret void
+  }
+
+  define i32 @output() #0 {
+    %1 = alloca i32, align 4
+    call void asm "# $0", "=*rm,~{ax},~{cx},~{dx},~{si},~{di},~{bx},~{bp}"(ptr nonnull elementtype(i32) %1)
+    %2 = load i32, ptr %1, align 4
+    ret i32 %2
+  }
+
+  define i32 @inout(i32 %0) #0 {
+    %2 = alloca i32, align 4
+    store i32 %0, ptr %2, align 4
+    call void asm "# $0 $1", "=*rm,0,~{ax},~{cx},~{dx},~{si},~{di},~{bx},~{bp}"(ptr nonnull elementtype(i32) %2, i32 %0)
+    %3 = load i32, ptr %2, align 4
+    ret i32 %3
+  }
+
+  attributes #0 = { nounwind }
+
+...
+---
+name:            input
+alignment:       16
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: gr32, preferred-register: '' }
+liveins:         []
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap:     false
+  hasPatchPoint:   false
+  stackSize:       0
+  offsetAdjustment: 0
+  maxAlignment:    4
+  adjustsStack:    false
+  hasCalls:        false
+  stackProtector:  ''
+  functionContext: ''
+  maxCallFrameSize: 4294967295
+  cvBytesOfCalleeSavedRegisters: 0
+  hasOpaqueSPAdjustment: false
+  hasVAStart:      false
+  hasMustTailInVarArgFunc: false
+  hasTailCall:     false
+  localFrameSize:  0
+  savePoint:       ''
+  restorePoint:    ''
+fixedStack:
+  - { id: 0, type: default, offset: 0, size: 4, alignment: 16, stack-id: default,
+      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
+      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
+stack:           []
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo: {}
+body:             |
+  bb.0 (%ir-block.1):
+    ; CHECK-LABEL: name: input
+    ; CHECK: INLINEASM &"# $0", 8 /* mayload attdialect */, 262190 /* mem:m */, %fixed-stack.0, 1, $noreg, 0, $noreg, 12 /* clobber */, implicit-def dead early-clobber $ax, 12 /* clobber */, implicit-def dead early-clobber $cx, 12 /* clobber */, implicit-def dead early-clobber $dx, 12 /* clobber */, implicit-def dead early-clobber $si, 12 /* clobber */, implicit-def dead early-clobber $di, 12 /* clobber */, implicit-def dead early-clobber $bx, 12 /* clobber */, implicit-def dead early-clobber $bp :: (load (s32) from %fixed-stack.0, align 16)
+    ; CHECK-NEXT: RET 0
+    %0:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 16)
+    INLINEASM &"# $0", 0 /* attdialect */, 1076101129 /* reguse:GR32 spillable */, %0, 12 /* clobber */, implicit-def early-clobber $ax, 12 /* clobber */, implicit-def early-clobber $cx, 12 /* clobber */, implicit-def early-clobber $dx, 12 /* clobber */, implicit-def early-clobber $si, 12 /* clobber */, implicit-def early-clobber $di, 12 /* clobber */, implicit-def early-clobber $bx, 12 /* clobber */, implicit-def early-clobber $bp
+    RET 0
+
+...
+---
+name:            output
+alignment:       16
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: gr32, preferred-register: '' }
+liveins:         []
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap:     false
+  hasPatchPoint:   false
+  stackSize:       0
+  offsetAdjustment: 0
+  maxAlignment:    4
+  adjustsStack:    false
+  hasCalls:        false
+  stackProtector:  ''
+  functionContext: ''
+  maxCallFrameSize: 4294967295
+  cvBytesOfCalleeSavedRegisters: 0
+  hasOpaqueSPAdjustment: false
+  hasVAStart:      false
+  hasMustTailInVarArgFunc: false
+  hasTailCall:     false
+  localFrameSize:  0
+  savePoint:       ''
+  restorePoint:    ''
+fixedStack:      []
+stack:
+  - { id: 0, name: '', type: default, offset: 0, size: 4, alignment: 4,
+      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
+      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
+entry_values:    []
+callSites:       []
+debugValueSubstitutions: []
+constants:       []
+machineFunctionInfo: {}
+body:             |
+  bb.0 (%ir-block.0):
+    ; CHECK-LABEL: name: output
+    ; CHECK: INLINEASM &"# $0", 16 /* maystore attdialect */, 262190 /* mem:m */, %stack.1, 1, $noreg, 0, $noreg, 12 /* clobber */, implicit-def dead early-clobber $ax, 12 /* clobber */, implicit-def dead early-clobber $cx, 12 /* clobber */, implicit-def dead early-clobber $dx, 12 /* clobber */, implicit-def dead early-clobber $si, 12 /* clobber */, implicit-def dead early-clobber $di, 12 /* clobber */, implicit-def dead early-clobber $bx, 12 /* clobber */, implicit-def dead early-clobber $bp :: (store (s32) into %stack.1)
+    ; CHECK-NEXT: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %stack.1)
+    ; CHECK-NEXT: MOV32mr %stack.0, 1, $noreg, 0, $noreg, [[MOV32rm]] :: (store (s32) into %ir.1)
+    ; CHECK-NEXT: $eax = COPY [[MOV32rm]]
+    ; CHECK-NEXT: RET 0, $eax
+    INLINEASM &"# $0", 0 /* attdialect */, 1076101130 /* regdef:GR32 spillable */, def %0, 12 /* clobber */, implicit-def early-clobber $ax, 12 /* clobber */, implicit-def early-clobber $cx, 12 /* clobber */, implicit-def early-clobber $dx, 12 /* clobber */, implicit-def early-clobber $si, 12 /* clobber */, implicit-def early-clobber $di, 12 /* clobber */, implicit-def early-clobber $bx, 12 /* clobber */, implicit-def early-clobber $bp
+    MOV32mr %stack.0, 1, $noreg, 0, $noreg, %0 :: (store (s32) into %ir.1)
+    $eax = COPY %0
+    RET 0, $eax
+
+...
+---
+name:            inout
+alignment:       16
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
+hasWinCFI:       false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes:     false
+hasEHFunclets:   false
+isOutlined:      false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: gr32, preferred-register: '' }
+  - { id: 1, class: gr32, preferred-register: '' }
+liveins:         []
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap:     false
+  hasPatchPoint:   false
+  stackSize:       0
+  offsetAdjustment: 0
+  ma...
[truncated]

@nickdesaulniers
Member Author

nickdesaulniers commented Oct 31, 2023

@github-actions

github-actions bot commented Oct 31, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 10338 to 10343
Contributor

I was wondering if this could/should be expressed with X86AddressMode.
X86AddressMode M; M.getFullAddress(Ops) seems to be very close to what this is doing, except that the API here does not explicitly add a "base" (reg or frame index); maybe it should start doing that? (So in foldInlineAsmMemOperand, just remove the register operand instead of using ChangeToFrameIndex, and have this function add the frame index operand as well. That would also help if a target has a different order of operands.)

Member Author

@nickdesaulniers nickdesaulniers Nov 3, 2023

oh! yeah it does look potentially reusable.

Though the class seems very x86-specific. It would be nice if there were a target-independent interface I could call from foldInlineAsmMemOperand (added in https://github.com/llvm/llvm-project/pull/70743/files) to get this info.

Let me check the users of X86AddressMode... worst case I could call it and drop the first operand.

and have this function add the frame index operand as well. That would also help if a target has a different order of operands...

I guess if X86AddressMode isn't a per-target interface, then in foldInlineAsmMemOperand I'd have to do some target-specific hack, and then we're back to adding this X86InstrInfo::getFrameIndexOperands interface?

Member Author

@nickdesaulniers nickdesaulniers Nov 3, 2023

I guess if X86AddressMode isn't a per-target interface, then in foldInlineAsmMemOperand I'd have to do some target-specific hack

Example of what I roughly have in mind:

diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 5ede36505b5b..bfbe477f97a3 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -578,13 +578,21 @@ static void foldInlineAsmMemOperand(MachineInstr *MI, unsigned OpNo, int FI,
     foldInlineAsmMemOperand(MI, TiedTo, FI, TII);
   }
 
+  SmallVector<MachineOperand, 4> NewOps;
+  MachineOperand *InsertPt;
+  if (isa<X86GenInstrInfo>(TII)) {
+    X86AddressMode M;
+    M.getFullAddress(NewOps);
+    InsertPt = MI->operands_begin() + OpNo;
+  } else {
   // Change the operand from a register to a frame index.
-  MO.ChangeToFrameIndex(FI, MO.getTargetFlags());
+    MO.ChangeToFrameIndex(FI, MO.getTargetFlags());
+    TII.getFrameIndexOperands(NewOps);
+    InsertPt = MI->operands_begin() + OpNo + 1;
+  }
 
-  SmallVector<MachineOperand, 4> NewOps;
-  TII.getFrameIndexOperands(NewOps);
   assert(!NewOps.empty() && "getFrameIndexOperands didn't create any operands");
-  MI->insert(MI->operands_begin() + OpNo + 1, NewOps);
+  MI->insert(InsertPt, NewOps);
 
   // Change the previous operand to a MemKind InlineAsm::Flag. The second param
   // is the per-target number of operands that represent the memory operand

seems kind of ugly, no? Did you have a better usage of X86AddressMode in mind?

Contributor

@MatzeB MatzeB Nov 3, 2023

Addressing modes are highly target-specific. I was just proposing the use of X86AddressMode for the implementation of X86InstrInfo::getFrameIndexOperands.
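
For reference, a rough sketch of what that implementation might look like, assuming X86AddressMode from X86InstrBuilder.h and the frame-index-taking signature this review later converges on (untested, not the verbatim final code):

    // Sketch: build the 5 x86 memory operands (base, scale, index, disp,
    // segment) from an X86AddressMode whose base is the spill slot.
    void X86InstrInfo::getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops,
                                             int FI) const {
      X86AddressMode M;
      M.BaseType = X86AddressMode::FrameIndexBase;
      M.Base.FrameIndex = FI;
      M.getFullAddress(Ops);
    }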

Member Author

Sorry for the excessive delays following up on this; I have been traveling for conference season for the past two weeks. I've just now made the changes you've suggested, and I think I do indeed like it better. Will post a commit on top once my tests finish successfully locally.

Member Author

done in 06c300b, PTAL

This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.

Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.
- change X86InstrInfo::getFrameIndexOperands to use X86AddressMode
@nickdesaulniers nickdesaulniers marked this pull request as ready for review November 20, 2023 22:06
Comment on lines +151 to +152
; CHECK-NEXT: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %stack.1, 1, $noreg, 0, $noreg :: (load (s32) from %stack.1)
; CHECK-NEXT: MOV32mr %stack.0, 1, $noreg, 0, $noreg, [[MOV32rm]] :: (store (s32) into %ir.1)
Contributor

Should the target try to eliminate these instructions?

Member Author

@nickdesaulniers nickdesaulniers Nov 21, 2023

Yes, the load and store are redundant. The INLINEASM could have just stored to %stack.0, eliding the store to %stack.1, the load from %stack.1, and then the store to %stack.0. The codegen when there is memory pressure is sub-optimal because of the redundant store/load.

But it is greedy that determines that a spill is necessary and creates the slot to spill to (in this case %stack.1). The code I add is just told "use %stack.1."

Could we optimize this case later? Yes, but I don't think that should be part of this commit.

I guess I'm curious why there isn't a later pass that determines that:

(store (s32) into %stack.1)
(load (s32) from %stack.1)
(store (s32) into %ir.1)

can be collapsed into:

(store (s32) into %ir.1)

Is there a pass that does so, but maybe bails when one of the instructions is INLINEASM? Or is it too late to know that nothing else depends on that initial store?

; CHECK-LABEL: name: inout
; CHECK: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0, align 16)
; CHECK-NEXT: MOV32mr %stack.0, 1, $noreg, 0, $noreg, [[MOV32rm]] :: (store (s32) into %stack.0)
; CHECK-NEXT: INLINEASM &"# $0 $1", 24 /* mayload maystore attdialect */, 262190 /* mem:m */, %stack.0, 1, $noreg, 0, $noreg, 262190 /* mem:m */, %stack.0, 1, $noreg, 0, $noreg, 12 /* clobber */, implicit-def dead early-clobber $ax, 12 /* clobber */, implicit-def dead early-clobber $cx, 12 /* clobber */, implicit-def dead early-clobber $dx, 12 /* clobber */, implicit-def dead early-clobber $si, 12 /* clobber */, implicit-def dead early-clobber $di, 12 /* clobber */, implicit-def dead early-clobber $bx, 12 /* clobber */, implicit-def dead early-clobber $bp :: (store (s32) into %stack.0)
Contributor

I wonder how the extra info like (load (s32) from %fixed-stack.0, align 16) etc. is kept in the INLINEASM when we have more than one operand? I saw we only kept (store (s32) into %stack.0) here.
Besides, we don't have a mem-to-mem instruction on X86. Will it be lowered into two instructions when both operands are mem?

Member Author

I wonder how the extra info like (load (s32) from %fixed-stack.0, align 16) etc. is kept in the INLINEASM when we have more than one operand? I saw we only kept (store (s32) into %stack.0) here.

Yeah, it is weird that for an input+output parameter it is printed as just a store (even though the extra info says both mayload and maystore). Let me do some debugging of the print method and see if I can give you a more precise answer FWIW.


Besides, we don't have a mem to mem instrcution on X86. Will it be lowered into two instructions when both operands are mem?

heh, we just had that exact discussion on the parent patch to this one.

In that case, the programmer was wrong to use "rm" for more than one operand, and should have used "r" (regardless of whether they are using clang, with or without my patches). Also, inline asm may contain multiple instructions, where the inputs/outputs to the inline asm are used as operands of different instructions, not the same instruction (which is what would expose them to the issue you allude to under register pressure).
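
A hypothetical example of the hazard discussed here (not from the patch): if both operands of a single instruction use "rm" and both end up spilled, the asm would need a mem-to-mem mov, which x86 cannot encode:

    // Hypothetical user code; unsafe under register pressure. If both
    // operands are folded to stack slots, "mov mem, mem" is not encodable
    // on x86, so at least one operand should use "r".
    int copy(int a) {
      int b;
      asm("mov %1, %0" : "=rm"(b) : "rm"(a));
      return b;
    }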

Member Author

Let me do some debugging of the print method and see if I can give you a more precise answer FWIW.

I believe it's a loop in MachineInstr::print that's iterating the memoperands() of the MachineInstr and calling print on them.

I'm not super familiar with the MachineMemOperand class. Let me triple-check whether that loop is indeed the one printing the load/store, and if so, whether I need to modify the memoperands() of the MachineInstr (beyond setting the extra info, which is very specific to INLINEASM).

Member Author

I believe it's a loop in MachineInstr::print that's iterating the memoperands() of the MachineInstr and calling print on them.

Ah, close, but it's the other MIR printer: MIPrinter::print. So I wonder if I'm missing a call to set memoperands() when doing the fold... let me look into that.

Member Author

Perhaps I need to add calls to MachineInstr::addMemOperand...

Member Author

oh! The caller TargetInstrInfo::foldMemoryOperand IS calling MachineInstr::addMemOperand, but IIUC it's only doing so once, which is a problem for in/out parameters since technically there are 2 distinct operands. Let me work on fixing that up.
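
A minimal sketch of the fix this implies, assuming one MachineMemOperand is added per folded asm operand (MF, FI, OpSize, Reads, and Writes are assumed context, not the verbatim final code):

    // Sketch: attach a distinct MachineMemOperand for each folded operand,
    // with load/store flags reflecting how the asm uses that operand.
    auto Flags = MachineMemOperand::MONone;
    if (Reads)
      Flags |= MachineMemOperand::MOLoad;
    if (Writes)
      Flags |= MachineMemOperand::MOStore;
    MachineMemOperand *MMO = MF.getMachineMemOperand(
        MachinePointerInfo::getFixedStack(MF, FI), Flags, OpSize,
        Align(4)); // alignment assumed for illustration
    NewMI->addMemOperand(MF, MMO);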

Member Author

fixed in e553a37 , PTAL

@nickdesaulniers nickdesaulniers marked this pull request as draft November 21, 2023 23:12
@nickdesaulniers nickdesaulniers marked this pull request as ready for review November 22, 2023 21:56
@nickdesaulniers
Member Author

bumping for review 🦃

Contributor

@phoebewang phoebewang left a comment

LGTM with one question.

-SmallVector<MachineOperand, 4> NewOps;
-TII.getFrameIndexOperands(NewOps);
+SmallVector<MachineOperand, 5> NewOps;
+TII.getFrameIndexOperands(NewOps, FI);
Contributor

The old behavior is to change to a frame index. Does that mean we need to call this method in TargetInstrInfo::getFrameIndexOperands for other targets?

Member Author

Yes, and the previous behavior did already call TargetInstrInfo::getFrameIndexOperands. Instead of MachineOperand::ChangeToFrameIndex + TargetInstrInfo::getFrameIndexOperands, this change does TargetInstrInfo::getFrameIndexOperands (which now creates the MachineOperand we were changing) then MachineInstr::removeOperand (to remove the operand we were previously changing).

I have additional child commits locally for supporting other targets. I need to rebase them onto this change, which shouldn't be a problem; basically, getFrameIndexOperands just needs to set up 3 MachineOperands for most targets but 5 for x86. Even then, I've realized that we only need to support x86 and m68k.
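
Roughly, the reworked fold reads like this (a sketch reconstructed from the description above, not the verbatim final code):

    // Sketch of foldInlineAsmMemOperand after the rework: the target hook
    // now emits the frame-index operand itself, and the register operand it
    // replaces is removed rather than mutated in place.
    SmallVector<MachineOperand, 5> NewOps;
    TII.getFrameIndexOperands(NewOps, FI);
    MI->insert(MI->operands_begin() + OpNo + 1, NewOps);
    MI->removeOperand(OpNo);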

@nickdesaulniers nickdesaulniers merged commit b053359 into llvm:main Nov 29, 2023
@nickdesaulniers nickdesaulniers deleted the asm_rm_core4 branch November 29, 2023 16:19
nickdesaulniers added a commit to nickdesaulniers/llvm-project that referenced this pull request Dec 1, 2023
In commit b053359 ("[X86InstrInfo] support memfold on spillable inline asm
(llvm#70832)"), I had a last minute fix to update the memoperands.  I originally
did this in the parent foldInlineAsmMemOperand call, updated the mir test via
update_mir_test_checks.py, but then decided to move it to the child call of
foldInlineAsmMemOperand.

But I forgot to rerun update_mir_test_checks.py. That last minute change caused
the same memoperand to be added twice when recursion occurred (for tied
operands). I happened to get lucky that trailing content omitted from the
CHECK line doesn't result in test failure.

But rerunning update_mir_test_checks.py on the mir test added in that commit
produces updated output. This is resulting in updates to the test that:
1. conflate additions to the test in child commits with simply updating the
   test as it should have been when first committed.
2. look wrong because the same memoperand is specified twice (we don't
   deduplicate memoperands when added). Example:

    INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0)

Fix the bug, so that in child commits, we don't have additional unrelated test
changes (which would be wrong anyways) from simply running
update_mir_test_checks.py.

Link: llvm#20571
nickdesaulniers added a commit that referenced this pull request Dec 4, 2023
In commit b053359 ("[X86InstrInfo] support memfold on spillable inline asm
(#70832)"), I had a last minute fix to update the memoperands. I originally
did this in the parent foldInlineAsmMemOperand call, updated the mir test via
update_mir_test_checks.py, but then decided to move it to the child call of
foldInlineAsmMemOperand.

But I forgot to rerun update_mir_test_checks.py. That last minute change caused
the same memoperand to be added twice when recursion occurred (for tied
operands). I happened to get lucky that trailing content omitted from the
CHECK line doesn't result in test failure.

But rerunning update_mir_test_checks.py on the mir test added in that commit
produces updated output. This is resulting in updates to the test that:
1. conflate additions to the test in child commits with simply updating the
   test as it should have been when first committed.
2. look wrong because the same memoperand is specified twice (we don't
   deduplicate memoperands when added). Example:

    INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0)

Fix the bug, so that in child commits, we don't have additional unrelated test
changes (which would be wrong anyways) from simply running
update_mir_test_checks.py.

Link: #20571