
[AArch64] Extend condition optimizer to support unsigned comparisons #144380

Open · wants to merge 4 commits into main

Conversation

@AZero13 (Contributor) commented Jun 16, 2025

We have to be extra careful not to allow unsigned wraps, however. This also required some adjusting of the logic in adjustCmp, as well as comparing the true immediate values with the add or sub encoding (CMN vs. CMP) taken into account.

Because SIGNED_MIN and SIGNED_MAX cannot be encoded as immediates, we do not need to worry about those edge cases when dealing with unsigned comparisons.
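
A minimal standalone sketch of the adjustment rule and the wrap guard this patch adds. This is a simplification, not the pass's actual code: the opcode handling is reduced to an IsCmn flag, and the Cmp/adjust names are illustrative.

#include <cassert>
#include <cstdlib>

// Simplified model of adjustCmp. A compare with a negative immediate is
// encoded as CMN (ADDS), so the effective comparison value is -Imm and
// crossing zero flips CMP <-> CMN. For signed predicates that flip is fine;
// for unsigned predicates it means the adjusted immediate wrapped around 0,
// so the original comparison is kept.
struct Cmp {
  int Imm;    // encoded, non-negative immediate
  bool IsCmn; // true: ADDS/CMN alias, compares against -Imm
};

static Cmp adjust(Cmp C, bool IsGreaterThan, bool IsSigned) {
  int Correction = IsGreaterThan ? 1 : -1; // GT/HI bump up, LT/LO bump down
  if (C.IsCmn)
    Correction = -Correction; // negate for comparison against -Imm
  bool Flips = (C.IsCmn && C.Imm == 1 && Correction == -1) || // cmn 1 -> cmp 0
               (!C.IsCmn && C.Imm == 0 && Correction == -1);  // cmp 0 -> cmn 1
  if (!IsSigned && Flips)
    return C; // unsigned wrap: bail out and keep the old cmp
  return {std::abs(C.Imm + Correction), Flips ? !C.IsCmn : C.IsCmn};
}

int main() {
  // Unsigned: "x >=u 0" (cmp 0, HS) must not become "x >u -1" (cmn 1, HI).
  Cmp U = adjust({0, false}, /*IsGreaterThan=*/false, /*IsSigned=*/false);
  assert(U.Imm == 0 && !U.IsCmn);
  // Signed: "x >= 0" (cmp 0, GE) may become "x > -1" (cmn 1, GT).
  Cmp S = adjust({0, false}, /*IsGreaterThan=*/false, /*IsSigned=*/true);
  assert(S.Imm == 1 && S.IsCmn);
  return 0;
}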

@llvmbot (Member) commented Jun 16, 2025

@llvm/pr-subscribers-backend-aarch64

Author: AZero13 (AZero13)

Changes

Patch is 29.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144380.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp (+54-16)
  • (modified) llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll (+662)
diff --git a/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
index 4c9f8c2723493..230dac417eae0 100644
--- a/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
@@ -231,6 +231,10 @@ static AArch64CC::CondCode getAdjustedCmp(AArch64CC::CondCode Cmp) {
   case AArch64CC::GE: return AArch64CC::GT;
   case AArch64CC::LT: return AArch64CC::LE;
   case AArch64CC::LE: return AArch64CC::LT;
+  case AArch64CC::HI: return AArch64CC::HS;
+  case AArch64CC::HS: return AArch64CC::HI;
+  case AArch64CC::LO: return AArch64CC::LS;
+  case AArch64CC::LS: return AArch64CC::LO;
   default:
     llvm_unreachable("Unexpected condition code");
   }
@@ -238,15 +242,20 @@ static AArch64CC::CondCode getAdjustedCmp(AArch64CC::CondCode Cmp) {
 
 // Transforms GT -> GE, GE -> GT, LT -> LE, LE -> LT by updating comparison
 // operator and condition code.
-AArch64ConditionOptimizer::CmpInfo AArch64ConditionOptimizer::adjustCmp(
-    MachineInstr *CmpMI, AArch64CC::CondCode Cmp) {
+AArch64ConditionOptimizer::CmpInfo
+AArch64ConditionOptimizer::adjustCmp(MachineInstr *CmpMI,
+                                     AArch64CC::CondCode Cmp) {
   unsigned Opc = CmpMI->getOpcode();
+  unsigned OldOpc = Opc;
+
+  bool isSigned = Cmp == AArch64CC::GT || Cmp == AArch64CC::GE ||
+                  Cmp == AArch64CC::LT || Cmp == AArch64CC::LE;
 
   // CMN (compare with negative immediate) is an alias to ADDS (as
   // "operand - negative" == "operand + positive")
   bool Negative = (Opc == AArch64::ADDSWri || Opc == AArch64::ADDSXri);
 
-  int Correction = (Cmp == AArch64CC::GT) ? 1 : -1;
+  int Correction = (Cmp == AArch64CC::GT || Cmp == AArch64CC::HI) ? 1 : -1;
   // Negate Correction value for comparison with negative immediate (CMN).
   if (Negative) {
     Correction = -Correction;
@@ -255,13 +264,23 @@ AArch64ConditionOptimizer::CmpInfo AArch64ConditionOptimizer::adjustCmp(
   const int OldImm = (int)CmpMI->getOperand(2).getImm();
   const int NewImm = std::abs(OldImm + Correction);
 
-  // Handle +0 -> -1 and -0 -> +1 (CMN with 0 immediate) transitions by
-  // adjusting compare instruction opcode.
-  if (OldImm == 0 && ((Negative && Correction == 1) ||
-                      (!Negative && Correction == -1))) {
+  // Handle cmn 1 -> cmp 0 transitions by adjusting compare instruction opcode.
+  if (OldImm == 1 && Negative && Correction == -1) {
+    // If we are adjusting from -1 to 0, we need to change the opcode.
+    Opc = getComplementOpc(Opc);
+  }
+
+  // Handle +0 -> -1 transitions by adjusting compare instruction opcode.
+  assert((OldImm != 0 || !Negative) && "Should not encounter cmn 0!");
+  if (OldImm == 0 && Correction == -1) {
     Opc = getComplementOpc(Opc);
   }
 
+  // If the opcode changed, the adjusted immediate crossed zero, which is an
+  // unsigned wrap, so return the old cmp unchanged.
+  if (!isSigned && Opc != OldOpc)
+    return CmpInfo(OldImm, OldOpc, Cmp);
+
   return CmpInfo(NewImm, Opc, getAdjustedCmp(Cmp));
 }
 
@@ -323,6 +342,14 @@ bool AArch64ConditionOptimizer::adjustTo(MachineInstr *CmpMI,
   return false;
 }
 
+static bool isGreaterThan(AArch64CC::CondCode Cmp) {
+  return Cmp == AArch64CC::GT || Cmp == AArch64CC::HI;
+}
+
+static bool isLessThan(AArch64CC::CondCode Cmp) {
+  return Cmp == AArch64CC::LT || Cmp == AArch64CC::LO;
+}
+
 bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
   LLVM_DEBUG(dbgs() << "********** AArch64 Conditional Compares **********\n"
                     << "********** Function: " << MF.getName() << '\n');
@@ -383,6 +410,9 @@ bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
     const int HeadImm = (int)HeadCmpMI->getOperand(2).getImm();
     const int TrueImm = (int)TrueCmpMI->getOperand(2).getImm();
 
+    int HeadImmTrueValue = HeadImm;
+    int TrueImmTrueValue = TrueImm;
+
     LLVM_DEBUG(dbgs() << "Head branch:\n");
     LLVM_DEBUG(dbgs() << "\tcondition: " << AArch64CC::getCondCodeName(HeadCmp)
                       << '\n');
@@ -393,9 +423,17 @@ bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
                       << '\n');
     LLVM_DEBUG(dbgs() << "\timmediate: " << TrueImm << '\n');
 
-    if (((HeadCmp == AArch64CC::GT && TrueCmp == AArch64CC::LT) ||
-         (HeadCmp == AArch64CC::LT && TrueCmp == AArch64CC::GT)) &&
-        std::abs(TrueImm - HeadImm) == 2) {
+    unsigned Opc = HeadCmpMI->getOpcode();
+    if (Opc == AArch64::ADDSWri || Opc == AArch64::ADDSXri)
+      HeadImmTrueValue = -HeadImmTrueValue;
+
+    Opc = TrueCmpMI->getOpcode();
+    if (Opc == AArch64::ADDSWri || Opc == AArch64::ADDSXri)
+      TrueImmTrueValue = -TrueImmTrueValue;
+
+    if (((isGreaterThan(HeadCmp) && isLessThan(TrueCmp)) ||
+         (isLessThan(HeadCmp) && isGreaterThan(TrueCmp))) &&
+        std::abs(TrueImmTrueValue - HeadImmTrueValue) == 2) {
       // This branch transforms machine instructions that correspond to
       //
       // 1) (a > {TrueImm} && ...) || (a < {HeadImm} && ...)
@@ -414,9 +452,9 @@ bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
         modifyCmp(TrueCmpMI, TrueCmpInfo);
         Changed = true;
       }
-    } else if (((HeadCmp == AArch64CC::GT && TrueCmp == AArch64CC::GT) ||
-                (HeadCmp == AArch64CC::LT && TrueCmp == AArch64CC::LT)) &&
-                std::abs(TrueImm - HeadImm) == 1) {
+    } else if (((isGreaterThan(HeadCmp) && isGreaterThan(TrueCmp)) ||
+                (isLessThan(HeadCmp) && isLessThan(TrueCmp))) &&
+               std::abs(TrueImmTrueValue - HeadImmTrueValue) == 1) {
       // This branch transforms machine instructions that correspond to
       //
       // 1) (a > {TrueImm} && ...) || (a > {HeadImm} && ...)
@@ -429,9 +467,9 @@ bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
 
       // GT -> GE transformation increases immediate value, so picking the
       // smaller one; LT -> LE decreases immediate value so invert the choice.
-      bool adjustHeadCond = (HeadImm < TrueImm);
-      if (HeadCmp == AArch64CC::LT) {
-          adjustHeadCond = !adjustHeadCond;
+      bool adjustHeadCond = (HeadImmTrueValue < TrueImmTrueValue);
+      if (isLessThan(HeadCmp)) {
+        adjustHeadCond = !adjustHeadCond;
       }
 
       if (adjustHeadCond) {
diff --git a/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll b/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll
index 6449c3e11d667..1a8e5bc249b26 100644
--- a/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll
+++ b/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll
@@ -845,6 +845,668 @@ return:                                           ; preds = %if.end, %land.lhs.t
   ret i32 %retval.0
 }
 
+; (a > 10 && b == c) || (a >= 10 && b == d)
+define i32 @combine_ugt_uge_10() #0 {
+; CHECK-LABEL: combine_ugt_uge_10:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    adrp x8, :got:a
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    cmp w8, #10
+; CHECK-NEXT:    adrp x8, :got:b
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT:    b.ls .LBB12_3
+; CHECK-NEXT:  // %bb.1: // %land.lhs.true
+; CHECK-NEXT:    adrp x9, :got:c
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT:    ldr w10, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w10, w9
+; CHECK-NEXT:    b.ne .LBB12_4
+; CHECK-NEXT:  // %bb.2:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB12_3: // %lor.lhs.false
+; CHECK-NEXT:    b.lo .LBB12_6
+; CHECK-NEXT:  .LBB12_4: // %land.lhs.true3
+; CHECK-NEXT:    adrp x9, :got:d
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    b.ne .LBB12_6
+; CHECK-NEXT:  // %bb.5:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB12_6: // %if.end
+; CHECK-NEXT:    mov w0, wzr
+; CHECK-NEXT:    ret
+entry:
+  %0 = load i32, ptr @a, align 4
+  %cmp = icmp ugt i32 %0, 10
+  br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:                                    ; preds = %entry
+  %1 = load i32, ptr @b, align 4
+  %2 = load i32, ptr @c, align 4
+  %cmp1 = icmp eq i32 %1, %2
+  br i1 %cmp1, label %return, label %land.lhs.true3
+
+lor.lhs.false:                                    ; preds = %entry
+  %cmp2 = icmp ugt i32 %0, 9
+  br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:                                   ; preds = %lor.lhs.false, %land.lhs.true
+  %3 = load i32, ptr @b, align 4
+  %4 = load i32, ptr @d, align 4
+  %cmp4 = icmp eq i32 %3, %4
+  br i1 %cmp4, label %return, label %if.end
+
+if.end:                                           ; preds = %land.lhs.true3, %lor.lhs.false
+  br label %return
+
+return:                                           ; preds = %if.end, %land.lhs.true3, %land.lhs.true
+  %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+  ret i32 %retval.0
+}
+
+; (a > 5 && b == c) || (a < 5 && b == d)
+define i32 @combine_ugt_ult_5() #0 {
+; CHECK-LABEL: combine_ugt_ult_5:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    adrp x8, :got:a
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    cmp w8, #5
+; CHECK-NEXT:    b.ls .LBB13_3
+; CHECK-NEXT:  // %bb.1: // %land.lhs.true
+; CHECK-NEXT:    adrp x8, :got:b
+; CHECK-NEXT:    adrp x9, :got:c
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    b.ne .LBB13_6
+; CHECK-NEXT:  // %bb.2:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB13_3: // %lor.lhs.false
+; CHECK-NEXT:    b.hs .LBB13_6
+; CHECK-NEXT:  // %bb.4: // %land.lhs.true3
+; CHECK-NEXT:    adrp x8, :got:b
+; CHECK-NEXT:    adrp x9, :got:d
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    b.ne .LBB13_6
+; CHECK-NEXT:  // %bb.5:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB13_6: // %if.end
+; CHECK-NEXT:    mov w0, wzr
+; CHECK-NEXT:    ret
+entry:
+  %0 = load i32, ptr @a, align 4
+  %cmp = icmp ugt i32 %0, 5
+  br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:                                    ; preds = %entry
+  %1 = load i32, ptr @b, align 4
+  %2 = load i32, ptr @c, align 4
+  %cmp1 = icmp eq i32 %1, %2
+  br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false:                                    ; preds = %entry
+  %cmp2 = icmp ult i32 %0, 5
+  br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:                                   ; preds = %lor.lhs.false
+  %3 = load i32, ptr @b, align 4
+  %4 = load i32, ptr @d, align 4
+  %cmp4 = icmp eq i32 %3, %4
+  br i1 %cmp4, label %return, label %if.end
+
+if.end:                                           ; preds = %land.lhs.true3, %lor.lhs.false, %land.lhs.true
+  br label %return
+
+return:                                           ; preds = %if.end, %land.lhs.true3, %land.lhs.true
+  %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+  ret i32 %retval.0
+}
+
+; (a < 5 && b == c) || (a <= 5 && b == d)
+define i32 @combine_ult_uge_5() #0 {
+; CHECK-LABEL: combine_ult_uge_5:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    adrp x8, :got:a
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    cmp w8, #5
+; CHECK-NEXT:    adrp x8, :got:b
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT:    b.hs .LBB14_3
+; CHECK-NEXT:  // %bb.1: // %land.lhs.true
+; CHECK-NEXT:    adrp x9, :got:c
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT:    ldr w10, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w10, w9
+; CHECK-NEXT:    b.ne .LBB14_4
+; CHECK-NEXT:  // %bb.2:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB14_3: // %lor.lhs.false
+; CHECK-NEXT:    b.hi .LBB14_6
+; CHECK-NEXT:  .LBB14_4: // %land.lhs.true3
+; CHECK-NEXT:    adrp x9, :got:d
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    b.ne .LBB14_6
+; CHECK-NEXT:  // %bb.5:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB14_6: // %if.end
+; CHECK-NEXT:    mov w0, wzr
+; CHECK-NEXT:    ret
+entry:
+  %0 = load i32, ptr @a, align 4
+  %cmp = icmp ult i32 %0, 5
+  br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:                                    ; preds = %entry
+  %1 = load i32, ptr @b, align 4
+  %2 = load i32, ptr @c, align 4
+  %cmp1 = icmp eq i32 %1, %2
+  br i1 %cmp1, label %return, label %land.lhs.true3
+
+lor.lhs.false:                                    ; preds = %entry
+  %cmp2 = icmp ult i32 %0, 6
+  br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:                                   ; preds = %lor.lhs.false, %land.lhs.true
+  %3 = load i32, ptr @b, align 4
+  %4 = load i32, ptr @d, align 4
+  %cmp4 = icmp eq i32 %3, %4
+  br i1 %cmp4, label %return, label %if.end
+
+if.end:                                           ; preds = %land.lhs.true3, %lor.lhs.false
+  br label %return
+
+return:                                           ; preds = %if.end, %land.lhs.true3, %land.lhs.true
+  %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+  ret i32 %retval.0
+}
+
+; (a < 5 && b == c) || (a > 5 && b == d)
+define i32 @combine_ult_ugt_5() #0 {
+; CHECK-LABEL: combine_ult_ugt_5:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    adrp x8, :got:a
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    cmp w8, #5
+; CHECK-NEXT:    b.hs .LBB15_3
+; CHECK-NEXT:  // %bb.1: // %land.lhs.true
+; CHECK-NEXT:    adrp x8, :got:b
+; CHECK-NEXT:    adrp x9, :got:c
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    b.ne .LBB15_6
+; CHECK-NEXT:  // %bb.2:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB15_3: // %lor.lhs.false
+; CHECK-NEXT:    b.ls .LBB15_6
+; CHECK-NEXT:  // %bb.4: // %land.lhs.true3
+; CHECK-NEXT:    adrp x8, :got:b
+; CHECK-NEXT:    adrp x9, :got:d
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    b.ne .LBB15_6
+; CHECK-NEXT:  // %bb.5:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB15_6: // %if.end
+; CHECK-NEXT:    mov w0, wzr
+; CHECK-NEXT:    ret
+entry:
+  %0 = load i32, ptr @a, align 4
+  %cmp = icmp ult i32 %0, 5
+  br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:                                    ; preds = %entry
+  %1 = load i32, ptr @b, align 4
+  %2 = load i32, ptr @c, align 4
+  %cmp1 = icmp eq i32 %1, %2
+  br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false:                                    ; preds = %entry
+  %cmp2 = icmp ugt i32 %0, 5
+  br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:                                   ; preds = %lor.lhs.false
+  %3 = load i32, ptr @b, align 4
+  %4 = load i32, ptr @d, align 4
+  %cmp4 = icmp eq i32 %3, %4
+  br i1 %cmp4, label %return, label %if.end
+
+if.end:                                           ; preds = %land.lhs.true3, %lor.lhs.false, %land.lhs.true
+  br label %return
+
+return:                                           ; preds = %if.end, %land.lhs.true3, %land.lhs.true
+  %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+  ret i32 %retval.0
+}
+
+; (a > -5 && b == c) || (a < -5 && b == d)
+define i32 @combine_ugt_ult_n5() #0 {
+; CHECK-LABEL: combine_ugt_ult_n5:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    adrp x8, :got:a
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    cmn w8, #5
+; CHECK-NEXT:    b.ls .LBB16_3
+; CHECK-NEXT:  // %bb.1: // %land.lhs.true
+; CHECK-NEXT:    adrp x8, :got:b
+; CHECK-NEXT:    adrp x9, :got:c
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    b.ne .LBB16_6
+; CHECK-NEXT:  // %bb.2:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB16_3: // %lor.lhs.false
+; CHECK-NEXT:    b.hs .LBB16_6
+; CHECK-NEXT:  // %bb.4: // %land.lhs.true3
+; CHECK-NEXT:    adrp x8, :got:b
+; CHECK-NEXT:    adrp x9, :got:d
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    b.ne .LBB16_6
+; CHECK-NEXT:  // %bb.5:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB16_6: // %if.end
+; CHECK-NEXT:    mov w0, wzr
+; CHECK-NEXT:    ret
+entry:
+  %0 = load i32, ptr @a, align 4
+  %cmp = icmp ugt i32 %0, -5
+  br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:                                    ; preds = %entry
+  %1 = load i32, ptr @b, align 4
+  %2 = load i32, ptr @c, align 4
+  %cmp1 = icmp eq i32 %1, %2
+  br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false:                                    ; preds = %entry
+  %cmp2 = icmp ult i32 %0, -5
+  br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:                                   ; preds = %lor.lhs.false
+  %3 = load i32, ptr @b, align 4
+  %4 = load i32, ptr @d, align 4
+  %cmp4 = icmp eq i32 %3, %4
+  br i1 %cmp4, label %return, label %if.end
+
+if.end:                                           ; preds = %land.lhs.true3, %lor.lhs.false, %land.lhs.true
+  br label %return
+
+return:                                           ; preds = %if.end, %land.lhs.true3, %land.lhs.true
+  %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+  ret i32 %retval.0
+}
+
+; (a < -5 && b == c) || (a > -5 && b == d)
+define i32 @combine_ult_ugt_n5() #0 {
+; CHECK-LABEL: combine_ult_ugt_n5:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    adrp x8, :got:a
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    cmn w8, #5
+; CHECK-NEXT:    b.hs .LBB17_3
+; CHECK-NEXT:  // %bb.1: // %land.lhs.true
+; CHECK-NEXT:    adrp x8, :got:b
+; CHECK-NEXT:    adrp x9, :got:c
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    b.ne .LBB17_6
+; CHECK-NEXT:  // %bb.2:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB17_3: // %lor.lhs.false
+; CHECK-NEXT:    b.ls .LBB17_6
+; CHECK-NEXT:  // %bb.4: // %land.lhs.true3
+; CHECK-NEXT:    adrp x8, :got:b
+; CHECK-NEXT:    adrp x9, :got:d
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, w9
+; CHECK-NEXT:    b.ne .LBB17_6
+; CHECK-NEXT:  // %bb.5:
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB17_6: // %if.end
+; CHECK-NEXT:    mov w0, wzr
+; CHECK-NEXT:    ret
+entry:
+  %0 = load i32, ptr @a, align 4
+  %cmp = icmp ult i32 %0, -5
+  br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:                                    ; preds = %entry
+  %1 = load i32, ptr @b, align 4
+  %2 = load i32, ptr @c, align 4
+  %cmp1 = icmp eq i32 %1, %2
+  br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false:                                    ; preds = %entry
+  %cmp2 = icmp ugt i32 %0, -5
+  br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:                                   ; preds = %lor.lhs.false
+  %3 = load i32, ptr @b, align 4
+  %4 = load i32, ptr @d, align 4
+  %cmp4 = icmp eq i32 %3...
[truncated]

@AZero13 changed the title from "[AArch64] Add support for unsigned comparisons" to "[AArch64] Add support for unsigned comparisons in the condition optimizer" Jun 16, 2025
@AZero13 (Contributor, Author) commented Jun 16, 2025

Future PR: I don't want a situation where we chain jumps, b.ls -> b.lo -> where we actually want to go.

Instead, because one condition implies the other (LO implies LS), each branch can jump straight to its destination:

b.lo blo-loc (when the b.ls check comes first)
b.ls bls-loc
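
As a rough sketch of the implication check such a follow-up might rely on. The CondCode enum and implies helper here are illustrative stand-ins, not part of this PR or of AArch64CC:

#include <cassert>

// Stand-in for AArch64CC::CondCode; values are illustrative only.
enum CondCode { HI, HS, LO, LS, GT, GE, LT, LE };

// Returns true when condition A holding guarantees condition B holds for the
// same NZCV flags, e.g. "strictly lower" implies "lower or same".
static bool implies(CondCode A, CondCode B) {
  if (A == B)
    return true;
  return (A == LO && B == LS) || (A == HI && B == HS) ||
         (A == LT && B == LE) || (A == GT && B == GE);
}

int main() {
  assert(implies(LO, LS));  // a b.lo can be hoisted ahead of a b.ls chain
  assert(!implies(LS, LO)); // the reverse does not hold
  return 0;
}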

@AZero13 (Contributor, Author) commented Jun 22, 2025

@topperc What do you think of this?

AZero13 added 2 commits June 22, 2025 19:22
@AZero13 changed the title from "[AArch64] Add support for unsigned comparisons in the condition optimizer" to "[AArch64] Extend condition optimizer to support unsigned comparisons" Jun 23, 2025
@AZero13 (Contributor, Author) commented Jun 23, 2025

@davemgreen I tried to write the tests myself. However, I found it best to just copy the original tests and change signed to unsigned, including variants that mix both signed and unsigned.

@AZero13 (Contributor, Author) commented Jun 23, 2025

So, basically everything was tested, which is why I made sure all cases were covered.

For example, x > 0 || x < -1 does not work when unsigned, so how do I check for that? By checking whether we switched between ADDS and SUBS, i.e. whether the adjusted immediate crossed zero.

I used helper functions to check both the signed and unsigned variants, and the real head immediate is tracked because it is important to know what we are truly comparing against (CMN negates the immediate).
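
A small sketch of that normalization, mirroring the HeadImmTrueValue/TrueImmTrueValue logic in the patch; the Opcode enum and the trueImmValue name are illustrative, not the pass's actual code:

#include <cstdlib>
#include <iostream>

// Stand-ins for the relevant compare opcodes (illustrative values only).
enum Opcode { SUBSWri, SUBSXri, ADDSWri, ADDSXri };

// cmp a, #imm is SUBS; cmn a, #imm is ADDS and really compares against -imm.
// Normalizing both immediates puts the head and true cmps on one number line.
static int trueImmValue(Opcode Opc, int Imm) {
  if (Opc == ADDSWri || Opc == ADDSXri)
    return -Imm;
  return Imm;
}

int main() {
  // cmp w8, #5 and cmn w8, #5 compare against +5 and -5 respectively,
  // so their distance is 10, not 0.
  int Head = trueImmValue(SUBSWri, 5);
  int True = trueImmValue(ADDSWri, 5);
  std::cout << std::abs(True - Head) << '\n'; // prints 10
  return 0;
}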

@AZero13 requested a review from davemgreen June 23, 2025 23:46
@AZero13 (Contributor, Author) commented Jul 9, 2025

@davemgreen ping

@AZero13 (Contributor, Author) commented Jul 10, 2025

@davemgreen If you'd like, I can explain every change I made and why this works. This was entirely coded by hand, grouping the types of checks and ensuring that the zero boundary cannot be crossed for unsigned comparisons.

@davemgreen (Collaborator) left a comment

Why can't we write simpler mir tests for this?

@AZero13 (Contributor, Author) commented Jul 11, 2025

Why can't we write simpler mir tests for this?

Because then they will get folded into conditional selects rather than branches.

@AZero13 (Contributor, Author) commented Jul 11, 2025

These are literally just copy-pastes of the old tests, modified to work with unsigned and mixed versions @davemgreen

There's nothing that complex about it.

@AZero13 requested review from topperc and davemgreen July 11, 2025 14:11
@AZero13 (Contributor, Author) commented Jul 11, 2025

These are literally just copy-pastes of the old tests, modified to work with unsigned and mixed versions @davemgreen

There's nothing that complex about it.

To prevent the conditions from being condensed into a conditional compare, the tests need to be complex enough that the conditions are implemented as branches in the asm. That is why these are complex.

@davemgreen (Collaborator) commented

Hi. Ideally we would be able to come up with a formal proof to ensure we are not missing anything. Like you said, some of the cases are difficult to reach, though. Something like the following is what I was thinking of for an MIR test; it is more concise but we can still use it for testing. Different variants can be made from it.

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# RUN: llc -mtriple=aarch64 -o - %s -run-pass=aarch64-condopt | FileCheck %s

---
name:            add_add_hi_hi
tracksRegLiveness: true
body:             |
  ; CHECK-LABEL: name: add_add_hi_hi
  ; CHECK: bb.0:
  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.2(0x40000000)
  ; CHECK-NEXT:   liveins: $w0
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:gpr32common = COPY $w0
  ; CHECK-NEXT:   [[ADDSWri:%[0-9]+]]:gpr32 = ADDSWri [[COPY]], 0, 0, implicit-def $nzcv
  ; CHECK-NEXT:   Bcc 12, %bb.1, implicit $nzcv
  ; CHECK-NEXT:   B %bb.2
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT: bb.1:
  ; CHECK-NEXT:   successors: %bb.2(0x80000000)
  ; CHECK-NEXT:   liveins: $w0
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT:   [[ADDSWri1:%[0-9]+]]:gpr32 = ADDSWri [[COPY]], 1, 0, implicit-def $nzcv
  ; CHECK-NEXT:   Bcc 12, %bb.2, implicit $nzcv
  ; CHECK-NEXT:   B %bb.2
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT: bb.2:
  ; CHECK-NEXT:   liveins: $w0
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT:   [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 1
  ; CHECK-NEXT:   $w0 = COPY [[MOVi32imm]]
  ; CHECK-NEXT:   RET undef $lr, implicit $w0
  bb.0:
    liveins: $w0
    %0:gpr32common = COPY $w0
    %4:gpr32 = ADDSWri %0:gpr32common, 0, 0, implicit-def $nzcv
    Bcc 12, %bb.2, implicit $nzcv
    B %bb.3

  bb.2:
    liveins: $w0
    %5:gpr32 = ADDSWri %0:gpr32common, 1, 0, implicit-def $nzcv
    Bcc 12, %bb.3, implicit $nzcv
    B %bb.3

  bb.3:
    liveins: $w0
    %1:gpr32 = MOVi32imm 1
    $w0 = COPY %1
    RET undef $lr, implicit $w0
...
