[AArch64] Extend condition optimizer to support unsigned comparisons #144380
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: AZero13 (AZero13)

Changes

We have to be extra careful not to allow unsigned wraps, however. This also required adjusting the logic in adjustCmp, as well as comparing the true immediate value with the add or sub taken into effect. Because SIGNED_MIN and SIGNED_MAX cannot be immediates, we do not need to worry about those edge cases when dealing with unsigned comparisons.

Patch is 29.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144380.diff

2 Files Affected:
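Concretely (a hand-written illustration, not taken from the patch): the immediate-adjustment trick carries over to the unsigned condition codes as long as the immediate does not cross zero.

    cmp w0, #10          // w0 >u 10, taken on b.hi
    cmp w0, #11          // w0 >=u 11, taken on b.hs -- the same predicate

    // At the zero boundary the rewrite is unsound: "w0 <u 0" (b.lo against
    // #0) is always false, but nudging the immediate down wraps it to
    // "w0 <=u 0xffffffff" (cmn w0, #1 with b.ls), which is always true.
    // That is the unsigned wrap this patch bails out on.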
diff --git a/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
index 4c9f8c2723493..230dac417eae0 100644
--- a/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
@@ -231,6 +231,10 @@ static AArch64CC::CondCode getAdjustedCmp(AArch64CC::CondCode Cmp) {
case AArch64CC::GE: return AArch64CC::GT;
case AArch64CC::LT: return AArch64CC::LE;
case AArch64CC::LE: return AArch64CC::LT;
+ case AArch64CC::HI: return AArch64CC::HS;
+ case AArch64CC::HS: return AArch64CC::HI;
+ case AArch64CC::LO: return AArch64CC::LS;
+ case AArch64CC::LS: return AArch64CC::LO;
default:
llvm_unreachable("Unexpected condition code");
}
@@ -238,15 +242,20 @@ static AArch64CC::CondCode getAdjustedCmp(AArch64CC::CondCode Cmp) {
// Transforms GT -> GE, GE -> GT, LT -> LE, LE -> LT by updating comparison
// operator and condition code.
-AArch64ConditionOptimizer::CmpInfo AArch64ConditionOptimizer::adjustCmp(
- MachineInstr *CmpMI, AArch64CC::CondCode Cmp) {
+AArch64ConditionOptimizer::CmpInfo
+AArch64ConditionOptimizer::adjustCmp(MachineInstr *CmpMI,
+ AArch64CC::CondCode Cmp) {
unsigned Opc = CmpMI->getOpcode();
+ unsigned OldOpc = Opc;
+
+ bool isSigned = Cmp == AArch64CC::GT || Cmp == AArch64CC::GE ||
+ Cmp == AArch64CC::LT || Cmp == AArch64CC::LE;
// CMN (compare with negative immediate) is an alias to ADDS (as
// "operand - negative" == "operand + positive")
bool Negative = (Opc == AArch64::ADDSWri || Opc == AArch64::ADDSXri);
- int Correction = (Cmp == AArch64CC::GT) ? 1 : -1;
+ int Correction = (Cmp == AArch64CC::GT || Cmp == AArch64CC::HI) ? 1 : -1;
// Negate Correction value for comparison with negative immediate (CMN).
if (Negative) {
Correction = -Correction;
@@ -255,13 +264,23 @@ AArch64ConditionOptimizer::CmpInfo AArch64ConditionOptimizer::adjustCmp(
const int OldImm = (int)CmpMI->getOperand(2).getImm();
const int NewImm = std::abs(OldImm + Correction);
- // Handle +0 -> -1 and -0 -> +1 (CMN with 0 immediate) transitions by
- // adjusting compare instruction opcode.
- if (OldImm == 0 && ((Negative && Correction == 1) ||
- (!Negative && Correction == -1))) {
+ // Handle cmn 1 -> cmp 0 transitions by adjusting compare instruction opcode.
+ if (OldImm == 1 && Negative && Correction == -1) {
+ // If we are adjusting from -1 to 0, we need to change the opcode.
+ Opc = getComplementOpc(Opc);
+ }
+
+ // Handle +0 -> -1 transitions by adjusting compare instruction opcode.
+ assert((OldImm != 0 || !Negative) && "Should not encounter cmn 0!");
+ if (OldImm == 0 && Correction == -1) {
Opc = getComplementOpc(Opc);
}
+ // If we change opcodes, this means we did an unsigned wrap, so return the old
+ // cmp.
+ if (!isSigned && Opc != OldOpc)
+ return CmpInfo(OldImm, OldOpc, Cmp);
+
return CmpInfo(NewImm, Opc, getAdjustedCmp(Cmp));
}
@@ -323,6 +342,14 @@ bool AArch64ConditionOptimizer::adjustTo(MachineInstr *CmpMI,
return false;
}
+static bool isGreaterThan(AArch64CC::CondCode Cmp) {
+ return Cmp == AArch64CC::GT || Cmp == AArch64CC::HI;
+}
+
+static bool isLessThan(AArch64CC::CondCode Cmp) {
+ return Cmp == AArch64CC::LT || Cmp == AArch64CC::LO;
+}
+
bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
LLVM_DEBUG(dbgs() << "********** AArch64 Conditional Compares **********\n"
<< "********** Function: " << MF.getName() << '\n');
@@ -383,6 +410,9 @@ bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
const int HeadImm = (int)HeadCmpMI->getOperand(2).getImm();
const int TrueImm = (int)TrueCmpMI->getOperand(2).getImm();
+ int HeadImmTrueValue = HeadImm;
+ int TrueImmTrueValue = TrueImm;
+
LLVM_DEBUG(dbgs() << "Head branch:\n");
LLVM_DEBUG(dbgs() << "\tcondition: " << AArch64CC::getCondCodeName(HeadCmp)
<< '\n');
@@ -393,9 +423,17 @@ bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
<< '\n');
LLVM_DEBUG(dbgs() << "\timmediate: " << TrueImm << '\n');
- if (((HeadCmp == AArch64CC::GT && TrueCmp == AArch64CC::LT) ||
- (HeadCmp == AArch64CC::LT && TrueCmp == AArch64CC::GT)) &&
- std::abs(TrueImm - HeadImm) == 2) {
+ unsigned Opc = HeadCmpMI->getOpcode();
+ if (Opc == AArch64::ADDSWri || Opc == AArch64::ADDSXri)
+ HeadImmTrueValue = -HeadImmTrueValue;
+
+ Opc = TrueCmpMI->getOpcode();
+ if (Opc == AArch64::ADDSWri || Opc == AArch64::ADDSXri)
+ TrueImmTrueValue = -TrueImmTrueValue;
+
+ if (((isGreaterThan(HeadCmp) && isLessThan(TrueCmp)) ||
+ (isLessThan(HeadCmp) && isGreaterThan(TrueCmp))) &&
+ std::abs(TrueImmTrueValue - HeadImmTrueValue) == 2) {
// This branch transforms machine instructions that correspond to
//
// 1) (a > {TrueImm} && ...) || (a < {HeadImm} && ...)
@@ -414,9 +452,9 @@ bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
modifyCmp(TrueCmpMI, TrueCmpInfo);
Changed = true;
}
- } else if (((HeadCmp == AArch64CC::GT && TrueCmp == AArch64CC::GT) ||
- (HeadCmp == AArch64CC::LT && TrueCmp == AArch64CC::LT)) &&
- std::abs(TrueImm - HeadImm) == 1) {
+ } else if (((isGreaterThan(HeadCmp) && isGreaterThan(TrueCmp)) ||
+ (isLessThan(HeadCmp) && isLessThan(TrueCmp))) &&
+ std::abs(TrueImmTrueValue - HeadImmTrueValue) == 1) {
// This branch transforms machine instructions that correspond to
//
// 1) (a > {TrueImm} && ...) || (a > {HeadImm} && ...)
@@ -429,9 +467,9 @@ bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
// GT -> GE transformation increases immediate value, so picking the
// smaller one; LT -> LE decreases immediate value so invert the choice.
- bool adjustHeadCond = (HeadImm < TrueImm);
- if (HeadCmp == AArch64CC::LT) {
- adjustHeadCond = !adjustHeadCond;
+ bool adjustHeadCond = (HeadImmTrueValue < TrueImmTrueValue);
+ if (isLessThan(HeadCmp)) {
+ adjustHeadCond = !adjustHeadCond;
}
if (adjustHeadCond) {
diff --git a/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll b/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll
index 6449c3e11d667..1a8e5bc249b26 100644
--- a/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll
+++ b/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll
@@ -845,6 +845,668 @@ return: ; preds = %if.end, %land.lhs.t
ret i32 %retval.0
}
+; (a > 10 && b == c) || (a >= 10 && b == d)
+define i32 @combine_ugt_uge_10() #0 {
+; CHECK-LABEL: combine_ugt_uge_10:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: adrp x8, :got:a
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: cmp w8, #10
+; CHECK-NEXT: adrp x8, :got:b
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT: b.ls .LBB12_3
+; CHECK-NEXT: // %bb.1: // %land.lhs.true
+; CHECK-NEXT: adrp x9, :got:c
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT: ldr w10, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w10, w9
+; CHECK-NEXT: b.ne .LBB12_4
+; CHECK-NEXT: // %bb.2:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB12_3: // %lor.lhs.false
+; CHECK-NEXT: b.lo .LBB12_6
+; CHECK-NEXT: .LBB12_4: // %land.lhs.true3
+; CHECK-NEXT: adrp x9, :got:d
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w8, w9
+; CHECK-NEXT: b.ne .LBB12_6
+; CHECK-NEXT: // %bb.5:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB12_6: // %if.end
+; CHECK-NEXT: mov w0, wzr
+; CHECK-NEXT: ret
+entry:
+ %0 = load i32, ptr @a, align 4
+ %cmp = icmp ugt i32 %0, 10
+ br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true: ; preds = %entry
+ %1 = load i32, ptr @b, align 4
+ %2 = load i32, ptr @c, align 4
+ %cmp1 = icmp eq i32 %1, %2
+ br i1 %cmp1, label %return, label %land.lhs.true3
+
+lor.lhs.false: ; preds = %entry
+ %cmp2 = icmp ugt i32 %0, 9
+ br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3: ; preds = %lor.lhs.false, %land.lhs.true
+ %3 = load i32, ptr @b, align 4
+ %4 = load i32, ptr @d, align 4
+ %cmp4 = icmp eq i32 %3, %4
+ br i1 %cmp4, label %return, label %if.end
+
+if.end: ; preds = %land.lhs.true3, %lor.lhs.false
+ br label %return
+
+return: ; preds = %if.end, %land.lhs.true3, %land.lhs.true
+ %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+ ret i32 %retval.0
+}
+
+; (a > 5 && b == c) || (a < 5 && b == d)
+define i32 @combine_ugt_ult_5() #0 {
+; CHECK-LABEL: combine_ugt_ult_5:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: adrp x8, :got:a
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: cmp w8, #5
+; CHECK-NEXT: b.ls .LBB13_3
+; CHECK-NEXT: // %bb.1: // %land.lhs.true
+; CHECK-NEXT: adrp x8, :got:b
+; CHECK-NEXT: adrp x9, :got:c
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w8, w9
+; CHECK-NEXT: b.ne .LBB13_6
+; CHECK-NEXT: // %bb.2:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB13_3: // %lor.lhs.false
+; CHECK-NEXT: b.hs .LBB13_6
+; CHECK-NEXT: // %bb.4: // %land.lhs.true3
+; CHECK-NEXT: adrp x8, :got:b
+; CHECK-NEXT: adrp x9, :got:d
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w8, w9
+; CHECK-NEXT: b.ne .LBB13_6
+; CHECK-NEXT: // %bb.5:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB13_6: // %if.end
+; CHECK-NEXT: mov w0, wzr
+; CHECK-NEXT: ret
+entry:
+ %0 = load i32, ptr @a, align 4
+ %cmp = icmp ugt i32 %0, 5
+ br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true: ; preds = %entry
+ %1 = load i32, ptr @b, align 4
+ %2 = load i32, ptr @c, align 4
+ %cmp1 = icmp eq i32 %1, %2
+ br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false: ; preds = %entry
+ %cmp2 = icmp ult i32 %0, 5
+ br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3: ; preds = %lor.lhs.false
+ %3 = load i32, ptr @b, align 4
+ %4 = load i32, ptr @d, align 4
+ %cmp4 = icmp eq i32 %3, %4
+ br i1 %cmp4, label %return, label %if.end
+
+if.end: ; preds = %land.lhs.true3, %lor.lhs.false, %land.lhs.true
+ br label %return
+
+return: ; preds = %if.end, %land.lhs.true3, %land.lhs.true
+ %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+ ret i32 %retval.0
+}
+
+; (a < 5 && b == c) || (a <= 5 && b == d)
+define i32 @combine_ult_uge_5() #0 {
+; CHECK-LABEL: combine_ult_uge_5:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: adrp x8, :got:a
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: cmp w8, #5
+; CHECK-NEXT: adrp x8, :got:b
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT: b.hs .LBB14_3
+; CHECK-NEXT: // %bb.1: // %land.lhs.true
+; CHECK-NEXT: adrp x9, :got:c
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT: ldr w10, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w10, w9
+; CHECK-NEXT: b.ne .LBB14_4
+; CHECK-NEXT: // %bb.2:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB14_3: // %lor.lhs.false
+; CHECK-NEXT: b.hi .LBB14_6
+; CHECK-NEXT: .LBB14_4: // %land.lhs.true3
+; CHECK-NEXT: adrp x9, :got:d
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w8, w9
+; CHECK-NEXT: b.ne .LBB14_6
+; CHECK-NEXT: // %bb.5:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB14_6: // %if.end
+; CHECK-NEXT: mov w0, wzr
+; CHECK-NEXT: ret
+entry:
+ %0 = load i32, ptr @a, align 4
+ %cmp = icmp ult i32 %0, 5
+ br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true: ; preds = %entry
+ %1 = load i32, ptr @b, align 4
+ %2 = load i32, ptr @c, align 4
+ %cmp1 = icmp eq i32 %1, %2
+ br i1 %cmp1, label %return, label %land.lhs.true3
+
+lor.lhs.false: ; preds = %entry
+ %cmp2 = icmp ult i32 %0, 6
+ br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3: ; preds = %lor.lhs.false, %land.lhs.true
+ %3 = load i32, ptr @b, align 4
+ %4 = load i32, ptr @d, align 4
+ %cmp4 = icmp eq i32 %3, %4
+ br i1 %cmp4, label %return, label %if.end
+
+if.end: ; preds = %land.lhs.true3, %lor.lhs.false
+ br label %return
+
+return: ; preds = %if.end, %land.lhs.true3, %land.lhs.true
+ %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+ ret i32 %retval.0
+}
+
+; (a < 5 && b == c) || (a > 5 && b == d)
+define i32 @combine_ult_ugt_5() #0 {
+; CHECK-LABEL: combine_ult_ugt_5:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: adrp x8, :got:a
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: cmp w8, #5
+; CHECK-NEXT: b.hs .LBB15_3
+; CHECK-NEXT: // %bb.1: // %land.lhs.true
+; CHECK-NEXT: adrp x8, :got:b
+; CHECK-NEXT: adrp x9, :got:c
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w8, w9
+; CHECK-NEXT: b.ne .LBB15_6
+; CHECK-NEXT: // %bb.2:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB15_3: // %lor.lhs.false
+; CHECK-NEXT: b.ls .LBB15_6
+; CHECK-NEXT: // %bb.4: // %land.lhs.true3
+; CHECK-NEXT: adrp x8, :got:b
+; CHECK-NEXT: adrp x9, :got:d
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w8, w9
+; CHECK-NEXT: b.ne .LBB15_6
+; CHECK-NEXT: // %bb.5:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB15_6: // %if.end
+; CHECK-NEXT: mov w0, wzr
+; CHECK-NEXT: ret
+entry:
+ %0 = load i32, ptr @a, align 4
+ %cmp = icmp ult i32 %0, 5
+ br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true: ; preds = %entry
+ %1 = load i32, ptr @b, align 4
+ %2 = load i32, ptr @c, align 4
+ %cmp1 = icmp eq i32 %1, %2
+ br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false: ; preds = %entry
+ %cmp2 = icmp ugt i32 %0, 5
+ br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3: ; preds = %lor.lhs.false
+ %3 = load i32, ptr @b, align 4
+ %4 = load i32, ptr @d, align 4
+ %cmp4 = icmp eq i32 %3, %4
+ br i1 %cmp4, label %return, label %if.end
+
+if.end: ; preds = %land.lhs.true3, %lor.lhs.false, %land.lhs.true
+ br label %return
+
+return: ; preds = %if.end, %land.lhs.true3, %land.lhs.true
+ %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+ ret i32 %retval.0
+}
+
+; (a > -5 && b == c) || (a < -5 && b == d)
+define i32 @combine_ugt_ult_n5() #0 {
+; CHECK-LABEL: combine_ugt_ult_n5:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: adrp x8, :got:a
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: cmn w8, #5
+; CHECK-NEXT: b.ls .LBB16_3
+; CHECK-NEXT: // %bb.1: // %land.lhs.true
+; CHECK-NEXT: adrp x8, :got:b
+; CHECK-NEXT: adrp x9, :got:c
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w8, w9
+; CHECK-NEXT: b.ne .LBB16_6
+; CHECK-NEXT: // %bb.2:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB16_3: // %lor.lhs.false
+; CHECK-NEXT: b.hs .LBB16_6
+; CHECK-NEXT: // %bb.4: // %land.lhs.true3
+; CHECK-NEXT: adrp x8, :got:b
+; CHECK-NEXT: adrp x9, :got:d
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w8, w9
+; CHECK-NEXT: b.ne .LBB16_6
+; CHECK-NEXT: // %bb.5:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB16_6: // %if.end
+; CHECK-NEXT: mov w0, wzr
+; CHECK-NEXT: ret
+entry:
+ %0 = load i32, ptr @a, align 4
+ %cmp = icmp ugt i32 %0, -5
+ br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true: ; preds = %entry
+ %1 = load i32, ptr @b, align 4
+ %2 = load i32, ptr @c, align 4
+ %cmp1 = icmp eq i32 %1, %2
+ br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false: ; preds = %entry
+ %cmp2 = icmp ult i32 %0, -5
+ br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3: ; preds = %lor.lhs.false
+ %3 = load i32, ptr @b, align 4
+ %4 = load i32, ptr @d, align 4
+ %cmp4 = icmp eq i32 %3, %4
+ br i1 %cmp4, label %return, label %if.end
+
+if.end: ; preds = %land.lhs.true3, %lor.lhs.false, %land.lhs.true
+ br label %return
+
+return: ; preds = %if.end, %land.lhs.true3, %land.lhs.true
+ %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+ ret i32 %retval.0
+}
+
+; (a < -5 && b == c) || (a > -5 && b == d)
+define i32 @combine_ult_ugt_n5() #0 {
+; CHECK-LABEL: combine_ult_ugt_n5:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: adrp x8, :got:a
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: cmn w8, #5
+; CHECK-NEXT: b.hs .LBB17_3
+; CHECK-NEXT: // %bb.1: // %land.lhs.true
+; CHECK-NEXT: adrp x8, :got:b
+; CHECK-NEXT: adrp x9, :got:c
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:c]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w8, w9
+; CHECK-NEXT: b.ne .LBB17_6
+; CHECK-NEXT: // %bb.2:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB17_3: // %lor.lhs.false
+; CHECK-NEXT: b.ls .LBB17_6
+; CHECK-NEXT: // %bb.4: // %land.lhs.true3
+; CHECK-NEXT: adrp x8, :got:b
+; CHECK-NEXT: adrp x9, :got:d
+; CHECK-NEXT: ldr x8, [x8, :got_lo12:b]
+; CHECK-NEXT: ldr x9, [x9, :got_lo12:d]
+; CHECK-NEXT: ldr w8, [x8]
+; CHECK-NEXT: ldr w9, [x9]
+; CHECK-NEXT: cmp w8, w9
+; CHECK-NEXT: b.ne .LBB17_6
+; CHECK-NEXT: // %bb.5:
+; CHECK-NEXT: mov w0, #1 // =0x1
+; CHECK-NEXT: ret
+; CHECK-NEXT: .LBB17_6: // %if.end
+; CHECK-NEXT: mov w0, wzr
+; CHECK-NEXT: ret
+entry:
+ %0 = load i32, ptr @a, align 4
+ %cmp = icmp ult i32 %0, -5
+ br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true: ; preds = %entry
+ %1 = load i32, ptr @b, align 4
+ %2 = load i32, ptr @c, align 4
+ %cmp1 = icmp eq i32 %1, %2
+ br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false: ; preds = %entry
+ %cmp2 = icmp ugt i32 %0, -5
+ br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3: ; preds = %lor.lhs.false
+ %3 = load i32, ptr @b, align 4
+ %4 = load i32, ptr @d, align 4
+ %cmp4 = icmp eq i32 %3...
[truncated]
Future PR: I don't want a situation where we jump via b.ls to a block that immediately does b.lo to where we want to go. Instead, when the b.ls comes first, we can just do b.lo straight to the final location.
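Hand-simplified from the first test above; the rewritten form is hypothetical, not something this patch emits:

        cmp  w8, #10
        b.ls .LBB12_3        // lower-or-same: hop to another flag test
        ...
    .LBB12_3:
        b.lo .LBB12_6        // lower: the real destination

    // Since ls == (lo || eq), the lo case could branch straight from the
    // compare, and only the eq remainder of ls needs the second block:
        cmp  w8, #10
        b.lo .LBB12_6        // jump directly where we want to go
        b.eq .LBB12_4        // the rest of the "ls" case is equality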
@topperc What do you think of this?
We have to be extra careful not to allow unsigned wraps, however. This also required adjusting the logic in adjustCmp, as well as comparing the true immediate value with the add or sub taken into effect. Because SIGNED_MIN and SIGNED_MAX cannot be immediates, we do not need to worry about those edge cases when dealing with unsigned comparisons.
@davemgreen I tried to make the tests. However, I found it best to just copy-paste the original tests and change signed to unsigned, including variants that mix both signed and unsigned.
So, basically everything was tested, which is why I made sure all cases were covered. For example: x > 0 || x < -1 does not work when unsigned, so how do I check this? By checking that we switched from adds to subs. I used a helper function to check both the signed and unsigned variants, and the real head immediate was needed because it is important to know what we are truly comparing against.
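A worked example of that adds/subs check (my own values, not from the tests):

    cmn w0, #1           // ADDS alias: compares w0 with -1
    b.hi .Ltarget        // w0 >u 0xffffffff: never true

    // Adjusting HI -> HS moves the immediate from -1 to 0, which turns the
    // cmn (ADDS) into a cmp (SUBS):
    cmp w0, #0
    b.hs .Ltarget        // w0 >=u 0: always true

    // The ADDS -> SUBS flip is exactly a zero crossing, so when the
    // comparison is unsigned and the opcode changed, adjustCmp returns the
    // original compare untouched.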
@davemgreen ping
@davemgreen If you'd like, I can explain every change I made and why this works. This was entirely coded by hand, grouping the types of checks and ensuring that the 0 boundary cannot be crossed on unsigned.
Why can't we write simpler MIR tests for this?
Because then they will get folded to conditional selects and not branches.
These are literally just copy-pastes of the old tests, modified to work with unsigned and mixed versions. @davemgreen There's nothing that complex about it.
To prevent the conditions from being condensed into a conditional compare, the tests need to be complex enough that these conditions are implemented as branches in the asm. That is why they are complex.
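For example, a minimal version like this (my illustration, not one of the committed tests) typically lowers to cmp + ccmp + cset with no branches at all, so the condition optimizer never sees it:

    define i32 @simple(i32 %a, i32 %b, i32 %c) {
    entry:
      %cmp1 = icmp ugt i32 %a, 10
      %cmp2 = icmp eq i32 %b, %c
      %and = and i1 %cmp1, %cmp2
      %r = zext i1 %and to i32
      ret i32 %r
    }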
Hi. Ideally we would be able to come up with a formal proof to ensure we are not missing anything. Like you said, some of the cases are difficult to reach, though. Something like this is what I was thinking of for a MIR test; it can be more concise but we can still use it for testing. Different variants can be made up.
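A rough, untested sketch of the shape such a MIR test might take — the pass name, opcode spellings, condition-code immediates, and CHECK lines below are my assumptions and would need verifying against the actual pass:

    # RUN: llc -mtriple=aarch64 -run-pass=aarch64-condopt -o - %s | FileCheck %s
    # Expect the second compare (a >u 9) to be rewritten to a >=u 10 so that
    # both compares use the same immediate (Bcc 8 is HI, Bcc 2 is HS).
    # CHECK: SUBSWri %0, 10
    # CHECK: Bcc 8
    # CHECK: SUBSWri %0, 10
    # CHECK: Bcc 2
    ---
    name:            combine_hi_imms
    tracksRegLiveness: true
    body:             |
      bb.0:
        successors: %bb.1, %bb.2
        liveins: $w0

        %0:gpr32common = COPY $w0
        ; a >u 10
        dead %1:gpr32 = SUBSWri %0, 10, 0, implicit-def $nzcv
        Bcc 8, %bb.1, implicit $nzcv
        B %bb.2

      bb.1:
        $w0 = MOVi32imm 1
        RET_ReallyLR implicit $w0

      bb.2:
        successors: %bb.3, %bb.4
        ; a >u 9
        dead %2:gpr32 = SUBSWri %0, 9, 0, implicit-def $nzcv
        Bcc 8, %bb.3, implicit $nzcv
        B %bb.4

      bb.3:
        $w0 = MOVi32imm 1
        RET_ReallyLR implicit $w0

      bb.4:
        $w0 = COPY $wzr
        RET_ReallyLR implicit $w0
    ...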