Skip to content

Conversation

@fzou1
Copy link
Contributor

@fzou1 fzou1 commented Jul 4, 2024

For more details about this feature, please refer to latest Intel 64 and IA-32 Architectures Optimization Reference Manual Volume 1: https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html

For more details about this feature, please refer to latest Intel 64 and
IA-32 Architectures Optimization Reference Manual Volume 1:
https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jul 4, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 4, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: Feng Zou (fzou1)

Changes

For more details about this feature, please refer to latest Intel 64 and IA-32 Architectures Optimization Reference Manual Volume 1: https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html


Full diff: https://github.com/llvm/llvm-project/pull/97721.diff

5 Files Affected:

  • (modified) clang/lib/Basic/Targets/X86.cpp (+3)
  • (modified) clang/lib/Basic/Targets/X86.h (+1)
  • (modified) llvm/lib/Target/X86/X86.td (+10-3)
  • (modified) llvm/lib/Target/X86/X86MCInstLower.cpp (+24)
  • (added) llvm/test/CodeGen/X86/branch-hint.ll (+95)
diff --git a/clang/lib/Basic/Targets/X86.cpp b/clang/lib/Basic/Targets/X86.cpp
index 276d492955207..1f6fc842ddd95 100644
--- a/clang/lib/Basic/Targets/X86.cpp
+++ b/clang/lib/Basic/Targets/X86.cpp
@@ -457,6 +457,8 @@ bool X86TargetInfo::handleTargetFeatures(std::vector<std::string> &Features,
       HasCF = true;
     } else if (Feature == "+zu") {
       HasZU = true;
+    } else if (Feature == "+branch-hint") {
+      HasBranchHint = true;
     }
 
     X86SSEEnum Level = llvm::StringSwitch<X86SSEEnum>(Feature)
@@ -1292,6 +1294,7 @@ bool X86TargetInfo::hasFeature(StringRef Feature) const {
       .Case("nf", HasNF)
       .Case("cf", HasCF)
       .Case("zu", HasZU)
+      .Case("branch-hint", HasBranchHint)
       .Default(false);
 }
 
diff --git a/clang/lib/Basic/Targets/X86.h b/clang/lib/Basic/Targets/X86.h
index 5ce4953251bc3..a70711f4ae2bb 100644
--- a/clang/lib/Basic/Targets/X86.h
+++ b/clang/lib/Basic/Targets/X86.h
@@ -174,6 +174,7 @@ class LLVM_LIBRARY_VISIBILITY X86TargetInfo : public TargetInfo {
   bool HasCF = false;
   bool HasZU = false;
   bool HasInlineAsmUseGPR32 = false;
+  bool HasBranchHint = false;
 
 protected:
   llvm::X86::CPUKind CPU = llvm::X86::CK_None;
diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index 68b78c7c44771..fdd7d5f1ee0e7 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -749,6 +749,11 @@ def TuningUseGLMDivSqrtCosts
     : SubtargetFeature<"use-glm-div-sqrt-costs", "UseGLMDivSqrtCosts", "true",
         "Use Goldmont specific floating point div/sqrt costs">;
 
+// Starting with Redwood Cove architecture, the branch has branch taken hint
+// (i.e., instruction prefix 3EH).
+def TuningBranchHint: SubtargetFeature<"branch-hint", "HasBranchHint", "true",
+                                        "Target has branch hint feature">;
+
 //===----------------------------------------------------------------------===//
 // X86 CPU Families
 // TODO: Remove these - use general tuning features to determine codegen.
@@ -1124,6 +1129,8 @@ def ProcessorFeatures {
                                                   FeaturePREFETCHI];
   list<SubtargetFeature> GNRFeatures =
     !listconcat(SPRFeatures, GNRAdditionalFeatures);
+  list<SubtargetFeature> GNRAdditionalTuning = [TuningBranchHint];
+  list<SubtargetFeature> GNRTuning = !listconcat(SPRTuning, GNRAdditionalTuning);
 
   // Graniterapids D
   list<SubtargetFeature> GNRDAdditionalFeatures = [FeatureAMXCOMPLEX];
@@ -1815,12 +1822,12 @@ def : ProcModel<"pantherlake", AlderlakePModel,
 def : ProcModel<"clearwaterforest", AlderlakePModel,
                 ProcessorFeatures.CWFFeatures, ProcessorFeatures.ADLTuning>;
 def : ProcModel<"graniterapids", SapphireRapidsModel,
-                ProcessorFeatures.GNRFeatures, ProcessorFeatures.SPRTuning>;
+                ProcessorFeatures.GNRFeatures, ProcessorFeatures.GNRTuning>;
 def : ProcModel<"emeraldrapids", SapphireRapidsModel,
-                ProcessorFeatures.SPRFeatures, ProcessorFeatures.SPRTuning>;
+                ProcessorFeatures.SPRFeatures, ProcessorFeatures.GNRTuning>;
 foreach P = ["graniterapids-d", "graniterapids_d"] in {
 def : ProcModel<P, SapphireRapidsModel,
-                ProcessorFeatures.GNRDFeatures, ProcessorFeatures.SPRTuning>;
+                ProcessorFeatures.GNRDFeatures, ProcessorFeatures.GNRTuning>;
 }
 
 // AMD CPUs.
diff --git a/llvm/lib/Target/X86/X86MCInstLower.cpp b/llvm/lib/Target/X86/X86MCInstLower.cpp
index 00f58f9432e4d..34d95573585c9 100644
--- a/llvm/lib/Target/X86/X86MCInstLower.cpp
+++ b/llvm/lib/Target/X86/X86MCInstLower.cpp
@@ -25,6 +25,7 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringExtras.h"
+#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
 #include "llvm/CodeGen/MachineConstantPool.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineModuleInfoImpls.h"
@@ -54,6 +55,14 @@
 
 using namespace llvm;
 
+static cl::opt<bool> EnableBranchHint("branch-hint",
+                                      cl::desc("Enable branch hint."),
+                                      cl::init(false), cl::Hidden);
+static cl::opt<unsigned> BranchHintProbabilityThreshold(
+    "branch-hint-probability-threshold",
+    cl::desc("The probability threshold of enabling branch hint."),
+    cl::init(50), cl::Hidden);
+
 namespace {
 
 /// X86MCInstLower - This class is used to lower an MachineInstr into an MCInst.
@@ -2444,6 +2453,21 @@ void X86AsmPrinter::emitInstruction(const MachineInstr *MI) {
     if (IndCSPrefix && MI->hasRegisterImplicitUseOperand(X86::R11))
       EmitAndCountInstruction(MCInstBuilder(X86::CS_PREFIX));
     break;
+  case X86::JCC_1:
+    // Two instruction prefixes (2EH for branch not-taken and 3EH for branch
+    // taken) are used as branch hints. Here we add branch taken prefix for
+    // jump instruction with higher probability than threshold.
+    if (getSubtarget().hasBranchHint() && EnableBranchHint) {
+      const MachineBranchProbabilityInfo *MBPI =
+          &getAnalysis<MachineBranchProbabilityInfoWrapperPass>().getMBPI();
+      MachineBasicBlock *DestBB = MI->getOperand(0).getMBB();
+      BranchProbability EdgeProb =
+          MBPI->getEdgeProbability(MI->getParent(), DestBB);
+      BranchProbability Threshold(BranchHintProbabilityThreshold, 100);
+      if (EdgeProb > Threshold)
+        EmitAndCountInstruction(MCInstBuilder(X86::DS_PREFIX));
+    }
+    break;
   }
 
   MCInst TmpInst;
diff --git a/llvm/test/CodeGen/X86/branch-hint.ll b/llvm/test/CodeGen/X86/branch-hint.ll
new file mode 100644
index 0000000000000..7bef84c34d10f
--- /dev/null
+++ b/llvm/test/CodeGen/X86/branch-hint.ll
@@ -0,0 +1,95 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-linux -branch-hint -branch-hint-probability-threshold=60 -show-mc-encoding | FileCheck %s
+
+; Design: Insert "ds # encoding: [0x3e]" for condition branches who has high probability to take.
+; Source code:
+; int XX = 0;
+; int bar(int x);
+; void foo(int a, int b, char* c) {
+;   if (a == 0) {
+;     XX += 1;
+;   } else {
+;     if (c == 0)
+;       XX += 2;
+;   }
+;   for(int i=0; i<b; i++)
+;     bar(XX);
+; }
+
+@XX = dso_local local_unnamed_addr global i32 0, align 4
+
+; Function Attrs: nounwind
+define dso_local void @foo(i32 noundef %a, i32 noundef %b, i8* noundef readnone %c) local_unnamed_addr #0 {
+; CHECK-LABEL: foo:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    pushq %rbx # encoding: [0x53]
+; CHECK-NEXT:    movl %esi, %ebx # encoding: [0x89,0xf3]
+; CHECK-NEXT:    testl %edi, %edi # encoding: [0x85,0xff]
+; CHECK-NEXT:    je .LBB0_1 # encoding: [0x74,A]
+; CHECK-NEXT:    # fixup A - offset: 1, value: .LBB0_1-1, kind: FK_PCRel_1
+; CHECK-NEXT:  # %bb.2: # %if.else
+; CHECK-NEXT:    testq %rdx, %rdx # encoding: [0x48,0x85,0xd2]
+; CHECK-NEXT:    ds # encoding: [0x3e]
+; CHECK-NEXT:    jne .LBB0_5 # encoding: [0x75,A]
+; CHECK-NEXT:    # fixup A - offset: 1, value: .LBB0_5-1, kind: FK_PCRel_1
+; CHECK-NEXT:  # %bb.3:
+; CHECK-NEXT:    movl $2, %eax # encoding: [0xb8,0x02,0x00,0x00,0x00]
+; CHECK-NEXT:    jmp .LBB0_4 # encoding: [0xeb,A]
+; CHECK-NEXT:    # fixup A - offset: 1, value: .LBB0_4-1, kind: FK_PCRel_1
+; CHECK-NEXT:  .LBB0_1:
+; CHECK-NEXT:    movl $1, %eax # encoding: [0xb8,0x01,0x00,0x00,0x00]
+; CHECK-NEXT:  .LBB0_4: # %if.end4.sink.split
+; CHECK-NEXT:    addl %eax, XX(%rip) # encoding: [0x01,0x05,A,A,A,A]
+; CHECK-NEXT:    # fixup A - offset: 2, value: XX-4, kind: reloc_riprel_4byte
+; CHECK-NEXT:  .LBB0_5: # %if.end4
+; CHECK-NEXT:    testl %ebx, %ebx # encoding: [0x85,0xdb]
+; CHECK-NEXT:    jle .LBB0_7 # encoding: [0x7e,A]
+; CHECK-NEXT:    # fixup A - offset: 1, value: .LBB0_7-1, kind: FK_PCRel_1
+; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:  .LBB0_6: # %for.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    movl XX(%rip), %edi # encoding: [0x8b,0x3d,A,A,A,A]
+; CHECK-NEXT:    # fixup A - offset: 2, value: XX-4, kind: reloc_riprel_4byte_relax
+; CHECK-NEXT:    callq bar # encoding: [0xe8,A,A,A,A]
+; CHECK-NEXT:    # fixup A - offset: 1, value: bar-4, kind: reloc_branch_4byte_pcrel
+; CHECK-NEXT:    decl %ebx # encoding: [0xff,0xcb]
+; CHECK-NEXT:    ds # encoding: [0x3e]
+; CHECK-NEXT:    jne .LBB0_6 # encoding: [0x75,A]
+; CHECK-NEXT:    # fixup A - offset: 1, value: .LBB0_6-1, kind: FK_PCRel_1
+; CHECK-NEXT:  .LBB0_7: # %for.cond.cleanup
+; CHECK-NEXT:    popq %rbx # encoding: [0x5b]
+; CHECK-NEXT:    retq # encoding: [0xc3]
+entry:
+  %cmp = icmp eq i32 %a, 0
+  br i1 %cmp, label %if.end4.sink.split, label %if.else
+
+if.else:                                          ; preds = %entry
+  %cmp1 = icmp eq i8* %c, null
+  br i1 %cmp1, label %if.end4.sink.split, label %if.end4
+
+if.end4.sink.split:                               ; preds = %if.else, %entry
+  %.sink10 = phi i32 [ 1, %entry ], [ 2, %if.else ]
+  %0 = load i32, i32* @XX, align 4
+  %add3 = add nsw i32 %0, %.sink10
+  store i32 %add3, i32* @XX, align 4
+  br label %if.end4
+
+if.end4:                                          ; preds = %if.end4.sink.split, %if.else
+  %cmp58 = icmp sgt i32 %b, 0
+  br i1 %cmp58, label %for.body, label %for.cond.cleanup
+
+for.cond.cleanup:                                 ; preds = %for.body, %if.end4
+  ret void
+
+for.body:                                         ; preds = %if.end4, %for.body
+  %i.09 = phi i32 [ %inc, %for.body ], [ 0, %if.end4 ]
+  %1 = load i32, i32* @XX, align 4
+  %call = tail call i32 @bar(i32 noundef %1) #0
+  %inc = add nuw nsw i32 %i.09, 1
+  %exitcond.not = icmp eq i32 %inc, %b
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+
+declare dso_local i32 @bar(i32 noundef) local_unnamed_addr #0
+
+attributes #0 = { nounwind "target-features"="+branch-hint" }


// Starting with Redwood Cove architecture, the branch has branch taken hint
// (i.e., instruction prefix 3EH).
def TuningBranchHint: SubtargetFeature<"branch-hint", "HasBranchHint", "true",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have you added this as a Tuning bit and not a Feature bit? Tuning bits are guaranteed to work on all applicable CPUs - they just might not be performant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always add 0x3e to JCC in 64-bit mode. So it does work on all applicable CPUs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Intel SDM

— Branch hints1:
• 2EH—Branch not taken (used only with Jcc instructions).
• 3EH—Branch taken (used only with Jcc instructions).

1. Some earlier microarchitectures used these as branch hints, but recent generations have not and they are reserved for future hint usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KanRobert, Thank you for helping this.

@RKSimon RKSimon requested review from KanRobert and phoebewang July 4, 2024 13:14
@RKSimon RKSimon changed the title Support branch hint [X86] Support branch hint Jul 4, 2024
@XX = dso_local local_unnamed_addr global i32 0, align 4

; Function Attrs: nounwind
define dso_local void @foo(i32 noundef %a, i32 noundef %b, i8* noundef readnone %c) local_unnamed_addr #0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop dso_local local_unnamed_addr #0 ; Function Attrs: nounwind and add -mattr=+branch-hint to run line.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-linux -branch-hint -branch-hint-probability-threshold=60 -show-mc-encoding | FileCheck %s

; Design: Insert "ds # encoding: [0x3e]" for condition branches who has high probability to take.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

branches who has -> branch which has?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert "ds # encoding: [0x3e]" -> Add DS segment override prefix ?

br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}

declare dso_local i32 @bar(i32 noundef) local_unnamed_addr #0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop dso_local local_unnamed_addr #0

@@ -0,0 +1,95 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-linux -branch-hint -branch-hint-probability-threshold=60 -show-mc-encoding | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop -show-mc-encoding


using namespace llvm;

static cl::opt<bool> EnableBranchHint("branch-hint",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enable-branch-hint is more in line with our current naming habits

Copy link
Contributor

@KanRobert KanRobert Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is too verbose, not robust and miss coverage. I would suggest the following

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -mtriple=x86_64 -mattr=+branch-hint -branch-hint | FileCheck %s
; RUN: llc < %s -mtriple=x86_64 -mattr=+branch-hint -branch-hint -branch-hint-probability-threshold=50 | FileCheck %s
; RUN: llc < %s -mtriple=x86_64 -mattr=+branch-hint -branch-hint -branch-hint-probability-threshold=60 -tail-dup-placement=false | FileCheck --check-prefix=TH60 %s

define void @p51(i32 %x, ptr %p) {
; CHECK-LABEL: p51:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    testl %edi, %edi
; CHECK-NEXT:    ds
; CHECK-NEXT:    je .LBB0_2
; CHECK-NEXT:  # %bb.1: # %if.then
; CHECK-NEXT:    movl %edi, (%rsi)
; CHECK-NEXT:  .LBB0_2: # %if.end
; CHECK-NEXT:    retq
;
; TH60-LABEL: p51:
; TH60:       # %bb.0: # %entry
; TH60-NEXT:    testl %edi, %edi
; TH60-NEXT:    je .LBB0_2
; TH60-NEXT:  # %bb.1: # %if.then
; TH60-NEXT:    movl %edi, (%rsi)
; TH60-NEXT:  .LBB0_2: # %if.end
; TH60-NEXT:    retq
entry:
  %tobool.not = icmp eq i32 %x, 0
  br i1 %tobool.not, label %if.end, label %if.then, !prof !0

if.then:
  store i32 %x, ptr %p, align 4
  br label %if.end

if.end:
  ret void
}

define void @p61(i32 %x, ptr %p) {
; CHECK-LABEL: p61:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    testl %edi, %edi
; CHECK-NEXT:    jne .LBB1_1
; CHECK-NEXT:  # %bb.2: # %if.end
; CHECK-NEXT:    retq
; CHECK-NEXT:  .LBB1_1: # %if.then
; CHECK-NEXT:    movl %edi, (%rsi)
; CHECK-NEXT:    retq
;
; TH60-LABEL: p61:
; TH60:       # %bb.0: # %entry
; TH60-NEXT:    testl %edi, %edi
; TH60-NEXT:    ds
; TH60-NEXT:    je .LBB1_2
; TH60-NEXT:  # %bb.1: # %if.then
; TH60-NEXT:    movl %edi, (%rsi)
; TH60-NEXT:  .LBB1_2: # %if.end
; TH60-NEXT:    retq
entry:
  %tobool.not = icmp eq i32 %x, 0
  br i1 %tobool.not, label %if.end, label %if.then, !prof !1

if.then:
  store i32 %x, ptr %p, align 4
  br label %if.end

if.end:
  ret void
}

!0 = !{!"branch_weights", i32 51, i32 49}
!1 = !{!"branch_weights", i32 61, i32 39}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It's simpler. What's the metadata for "!prof !0" and "!prof !1"?

Copy link
Contributor

@KanRobert KanRobert Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. We should tell the branch probability explicitly instead of letting other passes to guess the probability.

Copy link
Contributor

@KanRobert KanRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@KanRobert KanRobert merged commit e603451 into llvm:main Jul 8, 2024
@fzou1 fzou1 deleted the branch-hint branch July 8, 2024 10:21
ProcessorFeatures.GNRFeatures, ProcessorFeatures.GNRTuning>;
def : ProcModel<"emeraldrapids", SapphireRapidsModel,
ProcessorFeatures.SPRFeatures, ProcessorFeatures.SPRTuning>;
ProcessorFeatures.SPRFeatures, ProcessorFeatures.GNRTuning>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't Emerald Rapids based on Raptor Lake microarchitecture (and not Redwood Cove)? If so, wouldn't branch hints be unsupported for this subtarget?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I'll correct it.

fzou1 added a commit to fzou1/llvm-project that referenced this pull request Jul 9, 2024
The tuning is introduced by llvm#97721.
KanRobert pushed a commit that referenced this pull request Jul 10, 2024
The incorrect tuning is introduced by #97721.
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
The incorrect tuning is introduced by llvm#97721.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:X86 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants