[NVPTX] Constant fold NVVM fmin and fmax #121966
Add constant-folding for nvvm float/double fmin + fmax intrinsics, including all combinations of xorsign.abs, nan-propagation, and ftz.
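The way the three modifiers combine for the f32 variants can be illustrated with a small stand-alone model. This is only a sketch that mirrors the folding logic in the diff below, not the LLVM implementation: it works on raw IEEE-754 binary32 bit patterns instead of APFloat, and every name in it (Variant, foldFMinFMax, orderKey, and so on) is hypothetical. It does not model the f64 (.d) intrinsics, which the patch leaves without NaN canonicalization, or the undef-operand path.

```cpp
#include <cstdint>

// Hypothetical model of the f32 folding rules; names are not from the patch.
struct Variant {
  bool IsMax;        // fmax rather than fmin
  bool FTZ;          // .ftz: flush subnormal inputs to sign-preserving zero
  bool PropagateNaN; // .nan: any NaN input yields the canonical NaN
  bool XorSignAbs;   // .xorsign.abs: compare |a|, |b|; sign = sign(a) ^ sign(b)
};

constexpr std::uint32_t SignBit = 0x80000000u;
constexpr std::uint32_t CanonicalNaN = 0x7fffffffu; // bit pattern used by the patch

constexpr bool isNaN(std::uint32_t X) { return (X & ~SignBit) > 0x7f800000u; }

// A zero exponent field means zero or subnormal: keep only the sign bit.
constexpr std::uint32_t ftz(std::uint32_t X) {
  return (X & 0x7f800000u) == 0 ? (X & SignBit) : X;
}

// Maps non-NaN bits to an unsigned key that sorts like the float values,
// with -0.0 ordered below +0.0.
constexpr std::uint32_t orderKey(std::uint32_t X) {
  return (X & SignBit) ? ~X : (X | SignBit);
}

constexpr std::uint32_t foldFMinFMax(Variant V, std::uint32_t A, std::uint32_t B) {
  if (V.FTZ) {
    A = ftz(A);
    B = ftz(B);
  }
  bool XorSign = false;
  if (V.XorSignAbs) {
    XorSign = ((A ^ B) & SignBit) != 0;
    A &= ~SignBit;
    B &= ~SignBit;
  }
  // Two NaNs, or one NaN in a NaN-propagating variant, give the canonical NaN.
  if (isNaN(A) && isNaN(B))
    return CanonicalNaN;
  if (V.PropagateNaN && (isNaN(A) || isNaN(B)))
    return CanonicalNaN;

  std::uint32_t Res = 0;
  if (isNaN(A)) {
    Res = B; // non-propagating: return the non-NaN operand
  } else if (isNaN(B)) {
    Res = A;
  } else {
    bool AGreater = orderKey(A) > orderKey(B);
    Res = (AGreater == V.IsMax) ? A : B;
  }
  // With xorsign.abs the operands were made non-negative, so just set the sign.
  if (V.XorSignAbs && XorSign)
    Res |= SignBit;
  return Res;
}
```

For example, with 1.25f = 0x3fa00000 and -2.0f = 0xc0000000, the plain fmax variant selects 1.25 while the xorsign.abs variant selects -2.0, which matches the CHECK lines in the test file.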
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-nvptx
Author: Lewis Crawford (LewisCrawford)
Changes
Add constant-folding for nvvm float/double fmin + fmax intrinsics, including all combinations of xorsign.abs, nan-propagation, and ftz.
Patch is 30.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121966.diff
3 Files Affected:
diff --git a/llvm/include/llvm/IR/NVVMIntrinsicUtils.h b/llvm/include/llvm/IR/NVVMIntrinsicUtils.h
index 8ca073ba822534..d533f944f90ff2 100644
--- a/llvm/include/llvm/IR/NVVMIntrinsicUtils.h
+++ b/llvm/include/llvm/IR/NVVMIntrinsicUtils.h
@@ -38,9 +38,8 @@ enum class TMAReductionOp : uint8_t {
XOR = 7,
};
-inline bool IntrinsicShouldFTZ(Intrinsic::ID IntrinsicID) {
+inline bool FloatToIntIntrinsicShouldFTZ(Intrinsic::ID IntrinsicID) {
switch (IntrinsicID) {
- // Float to i32 / i64 conversion intrinsics:
case Intrinsic::nvvm_f2i_rm_ftz:
case Intrinsic::nvvm_f2i_rn_ftz:
case Intrinsic::nvvm_f2i_rp_ftz:
@@ -171,6 +170,54 @@ IntrinsicGetRoundingMode(Intrinsic::ID IntrinsicID) {
return APFloat::roundingMode::Invalid;
}
+inline bool FMinFMaxShouldFTZ(Intrinsic::ID IntrinsicID) {
+  switch (IntrinsicID) {
+  case Intrinsic::nvvm_fmax_ftz_f:
+  case Intrinsic::nvvm_fmax_ftz_nan_f:
+  case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+  case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+
+  case Intrinsic::nvvm_fmin_ftz_f:
+  case Intrinsic::nvvm_fmin_ftz_nan_f:
+  case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+  case Intrinsic::nvvm_fmin_ftz_xorsign_abs_f:
+    return true;
+  }
+  return false;
+}
+
+inline bool FMinFMaxPropagatesNaNs(Intrinsic::ID IntrinsicID) {
+  switch (IntrinsicID) {
+  case Intrinsic::nvvm_fmax_ftz_nan_f:
+  case Intrinsic::nvvm_fmax_nan_f:
+  case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+  case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+
+  case Intrinsic::nvvm_fmin_ftz_nan_f:
+  case Intrinsic::nvvm_fmin_nan_f:
+  case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+  case Intrinsic::nvvm_fmin_nan_xorsign_abs_f:
+    return true;
+  }
+  return false;
+}
+
+inline bool FMinFMaxIsXorSignAbs(Intrinsic::ID IntrinsicID) {
+  switch (IntrinsicID) {
+  case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+  case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+  case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+  case Intrinsic::nvvm_fmax_xorsign_abs_f:
+
+  case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+  case Intrinsic::nvvm_fmin_ftz_xorsign_abs_f:
+  case Intrinsic::nvvm_fmin_nan_xorsign_abs_f:
+  case Intrinsic::nvvm_fmin_xorsign_abs_f:
+    return true;
+  }
+  return false;
+}
+
} // namespace nvvm
} // namespace llvm
#endif // LLVM_IR_NVVMINTRINSICUTILS_H
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index 031d675c330ec4..75150ed97aa7b4 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -1689,6 +1689,28 @@ bool llvm::canConstantFoldCallTo(const CallBase *Call, const Function *F) {
case Intrinsic::x86_avx512_cvttsd2usi64:
return !Call->isStrictFP();
+ // NVVM FMax intrinsics
+ case Intrinsic::nvvm_fmax_d:
+ case Intrinsic::nvvm_fmax_f:
+ case Intrinsic::nvvm_fmax_ftz_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_nan_f:
+ case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_xorsign_abs_f:
+
+ // NVVM FMin intrinsics
+ case Intrinsic::nvvm_fmin_d:
+ case Intrinsic::nvvm_fmin_f:
+ case Intrinsic::nvvm_fmin_ftz_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_nan_f:
+ case Intrinsic::nvvm_fmin_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_xorsign_abs_f:
+
// NVVM float/double to int32/uint32 conversion intrinsics
case Intrinsic::nvvm_f2i_rm:
case Intrinsic::nvvm_f2i_rn:
@@ -2432,7 +2454,7 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
return ConstantInt::get(Ty, 0);
APFloat::roundingMode RMode = nvvm::IntrinsicGetRoundingMode(IntrinsicID);
- bool IsFTZ = nvvm::IntrinsicShouldFTZ(IntrinsicID);
+ bool IsFTZ = nvvm::FloatToIntIntrinsicShouldFTZ(IntrinsicID);
bool IsSigned = nvvm::IntrinsicConvertsToSignedInteger(IntrinsicID);
APSInt ResInt(Ty->getIntegerBitWidth(), !IsSigned);
@@ -2892,12 +2914,49 @@ static Constant *ConstantFoldIntrinsicCall2(Intrinsic::ID IntrinsicID, Type *Ty,
case Intrinsic::minnum:
case Intrinsic::maximum:
case Intrinsic::minimum:
+ case Intrinsic::nvvm_fmax_d:
+ case Intrinsic::nvvm_fmin_d:
// If one argument is undef, return the other argument.
if (IsOp0Undef)
return Operands[1];
if (IsOp1Undef)
return Operands[0];
break;
+
+ case Intrinsic::nvvm_fmax_f:
+ case Intrinsic::nvvm_fmax_ftz_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_nan_f:
+ case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_xorsign_abs_f:
+
+ case Intrinsic::nvvm_fmin_f:
+ case Intrinsic::nvvm_fmin_ftz_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_nan_f:
+ case Intrinsic::nvvm_fmin_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_xorsign_abs_f:
+      // If one arg is undef, the other arg can be returned only if it is
+      // constant, as we may need to flush it to sign-preserving zero or
+      // canonicalize the NaN.
+      if (!IsOp0Undef && !IsOp1Undef)
+        break;
+      if (auto *Op = dyn_cast<ConstantFP>(Operands[IsOp0Undef ? 1 : 0])) {
+        if (Op->isNaN()) {
+          APInt NVCanonicalNaN(32, 0x7fffffff);
+          return ConstantFP::get(
+              Ty, APFloat(Ty->getFltSemantics(), NVCanonicalNaN));
+        }
+        if (nvvm::FMinFMaxShouldFTZ(IntrinsicID))
+          return ConstantFP::get(Ty, FTZPreserveSign(Op->getValueAPF()));
+        else
+          return Op;
+      }
+      break;
}
}
@@ -2955,6 +3014,79 @@ static Constant *ConstantFoldIntrinsicCall2(Intrinsic::ID IntrinsicID, Type *Ty,
return ConstantFP::get(Ty->getContext(), minimum(Op1V, Op2V));
case Intrinsic::maximum:
return ConstantFP::get(Ty->getContext(), maximum(Op1V, Op2V));
+
+ case Intrinsic::nvvm_fmax_d:
+ case Intrinsic::nvvm_fmax_f:
+ case Intrinsic::nvvm_fmax_ftz_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_nan_f:
+ case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_xorsign_abs_f:
+
+ case Intrinsic::nvvm_fmin_d:
+ case Intrinsic::nvvm_fmin_f:
+ case Intrinsic::nvvm_fmin_ftz_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_nan_f:
+ case Intrinsic::nvvm_fmin_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_xorsign_abs_f: {
+
+        bool ShouldCanonicalizeNaNs = IntrinsicID != Intrinsic::nvvm_fmax_d &&
+                                      IntrinsicID != Intrinsic::nvvm_fmin_d;
+        bool IsFTZ = nvvm::FMinFMaxShouldFTZ(IntrinsicID);
+        bool IsNaNPropagating = nvvm::FMinFMaxPropagatesNaNs(IntrinsicID);
+        bool IsXorSignAbs = nvvm::FMinFMaxIsXorSignAbs(IntrinsicID);
+
+        APFloat A = IsFTZ ? FTZPreserveSign(Op1V) : Op1V;
+        APFloat B = IsFTZ ? FTZPreserveSign(Op2V) : Op2V;
+
+        bool XorSign = false;
+        if (IsXorSignAbs) {
+          XorSign = A.isNegative() ^ B.isNegative();
+          A = abs(A);
+          B = abs(B);
+        }
+
+        bool IsFMax = false;
+        switch (IntrinsicID) {
+        case Intrinsic::nvvm_fmax_d:
+        case Intrinsic::nvvm_fmax_f:
+        case Intrinsic::nvvm_fmax_ftz_f:
+        case Intrinsic::nvvm_fmax_ftz_nan_f:
+        case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+        case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+        case Intrinsic::nvvm_fmax_nan_f:
+        case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+        case Intrinsic::nvvm_fmax_xorsign_abs_f:
+          IsFMax = true;
+          break;
+        }
+        APFloat Res = IsFMax ? maximum(A, B) : minimum(A, B);
+
+        if (ShouldCanonicalizeNaNs) {
+          APFloat NVCanonicalNaN(Res.getSemantics(), APInt(32, 0x7fffffff));
+          if (A.isNaN() && B.isNaN())
+            return ConstantFP::get(Ty, NVCanonicalNaN);
+          else if (IsNaNPropagating && (A.isNaN() || B.isNaN()))
+            return ConstantFP::get(Ty, NVCanonicalNaN);
+        }
+
+        if (A.isNaN() && B.isNaN())
+          return Operands[1];
+        else if (A.isNaN())
+          Res = B;
+        else if (B.isNaN())
+          Res = A;
+
+        if (IsXorSignAbs && XorSign != Res.isNegative())
+          Res.changeSign();
+
+        return ConstantFP::get(Ty->getContext(), Res);
+      }
}
if (!Ty->isHalfTy() && !Ty->isFloatTy() && !Ty->isDoubleTy())
diff --git a/llvm/test/Transforms/InstSimplify/const-fold-nvvm-fmin-fmax.ll b/llvm/test/Transforms/InstSimplify/const-fold-nvvm-fmin-fmax.ll
new file mode 100644
index 00000000000000..ab277483dbba5a
--- /dev/null
+++ b/llvm/test/Transforms/InstSimplify/const-fold-nvvm-fmin-fmax.ll
@@ -0,0 +1,614 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=instsimplify -march=nvptx64 --mcpu=sm_86 --mattr=+ptx72 -S | FileCheck %s
+
+; Check constant-folding for NVVM fmin fmax intrinsics
+
+;###############################################################
+;# FMax(1.25, -2.0) #
+;###############################################################
+
+define double @test_fmax_1_25_neg_2_d() {
+; CHECK-LABEL: define double @test_fmax_1_25_neg_2_d() {
+; CHECK-NEXT: ret double 1.250000e+00
+;
+ %res = call double @llvm.nvvm.fmax.d(double 1.25, double -2.0)
+ ret double %res
+}
+
+define float @test_fmax_1_25_neg_2_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_f() {
+; CHECK-NEXT: ret float 1.250000e+00
+;
+ %res = call float @llvm.nvvm.fmax.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_ftz_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_ftz_f() {
+; CHECK-NEXT: ret float 1.250000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_ftz_nan_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_ftz_nan_f() {
+; CHECK-NEXT: ret float 1.250000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_ftz_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_ftz_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float -2.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.xorsign.abs.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_ftz_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_ftz_xorsign_abs_f() {
+; CHECK-NEXT: ret float -2.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.xorsign.abs.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_nan_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_nan_f() {
+; CHECK-NEXT: ret float 1.250000e+00
+;
+ %res = call float @llvm.nvvm.fmax.nan.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float -2.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.nan.xorsign.abs.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_xorsign_abs_f() {
+; CHECK-NEXT: ret float -2.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.xorsign.abs.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+;###############################################################
+;# FMax(+Subnormal, NaN) #
+;###############################################################
+
+define double @test_fmax_pos_subnorm_nan_d() {
+; CHECK-LABEL: define double @test_fmax_pos_subnorm_nan_d() {
+; CHECK-NEXT: ret double 0x380FFFFFC0000000
+;
+ %res = call double @llvm.nvvm.fmax.d(double 0x380FFFFFC0000000, double 0x7fff444400000000)
+ ret double %res
+}
+
+define float @test_fmax_pos_subnorm_nan_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_ftz_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_ftz_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_ftz_nan_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_ftz_nan_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_ftz_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_ftz_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.xorsign.abs.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_ftz_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_ftz_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.xorsign.abs.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_nan_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_nan_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.xorsign.abs.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.xorsign.abs.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+;###############################################################
+;# FMax(subnorm, undef) #
+;###############################################################
+
+define double @test_fmax_subnorm_undef_d() {
+; CHECK-LABEL: define double @test_fmax_subnorm_undef_d() {
+; CHECK-NEXT: ret double 0x380FFFFFC0000000
+;
+ %res = call double @llvm.nvvm.fmax.d(double 0x380FFFFFC0000000, double undef)
+ ret double %res
+}
+
+define float @test_fmax_subnorm_undef_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_ftz_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_ftz_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_ftz_nan_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_ftz_nan_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_ftz_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_ftz_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.xorsign.abs.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_ftz_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_ftz_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.xorsign.abs.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_nan_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_nan_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.xorsign.abs.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.xorsign.abs.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+;###############################################################
+;# FMax(NaN, undef) #
+;###############################################################
+; Ensure we canonicalize the NaNs for f32
+
+define double @test_fmax_nan_undef_d() {
+; CHECK-LABEL: define double @test_fmax_nan_undef_d() {
+; CHECK-NEXT: ret double 0x7FF4444400000000
+;
+ %res = call double @llvm.nvvm.fmax.d(double 0x7ff4444400000000, double undef)
+ ret double %res
+}
+
+define float @test_fmax_nan_undef_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_ftz_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_ftz_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_ftz_nan_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_ftz_nan_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_ftz_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_ftz_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.xorsign.abs.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_ftz_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_ftz_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.xorsign.abs.f(float 0x7ffff4ff00000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_nan_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_nan_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.xorsign.abs.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.xorsign.abs.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+;############...
[truncated]
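A note on reading the CHECK lines above: the patch builds the f32 canonical NaN from the 32-bit pattern 0x7fffffff, yet the tests expect `float 0x7FFFFFFFE0000000`. LLVM IR prints hexadecimal `float` constants in the 64-bit double encoding of the same value, so both refer to the same NaN. The sketch below shows the widening on raw bits; `f32BitsToIRHex` is a hypothetical helper written for this illustration, not an LLVM API.

```cpp
#include <cstdint>

// Hypothetical helper: widen an IEEE-754 binary32 bit pattern to the binary64
// bit pattern of the same value, which is how LLVM IR spells float hex literals.
constexpr std::uint64_t f32BitsToIRHex(std::uint32_t Bits) {
  std::uint64_t Sign = static_cast<std::uint64_t>(Bits >> 31) << 63;
  std::uint32_t Exp = (Bits >> 23) & 0xffu;
  std::uint64_t Frac = Bits & 0x7fffffu;

  // Inf / NaN: all-ones exponent, payload moved to the top of the mantissa.
  if (Exp == 0xffu)
    return Sign | (0x7ffull << 52) | (Frac << 29);

  if (Exp == 0) {
    if (Frac == 0)
      return Sign; // signed zero
    // f32 subnormal: normalize, since every f32 subnormal is a normal f64.
    int Shift = 0;
    while ((Frac & 0x800000u) == 0) {
      Frac <<= 1;
      ++Shift;
    }
    // Unbiased exponent is -126 - Shift; the f64 bias is 1023.
    std::uint64_t DExp = static_cast<std::uint64_t>(897 - Shift);
    return Sign | (DExp << 52) | ((Frac & 0x7fffffu) << 29);
  }

  // Normal number: rebias the exponent (1023 - 127 = 896), widen the mantissa.
  return Sign | ((static_cast<std::uint64_t>(Exp) + 896) << 52) | (Frac << 29);
}
```

Under this mapping the largest f32 subnormal, 0x007fffff, becomes 0x380FFFFFC0000000, the subnormal operand used throughout the tests.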
|
@llvm/pr-subscribers-llvm-ir Author: Lewis Crawford (LewisCrawford) ChangesAdd constant-folding for nvvm float/double fmin + fmax intrinsics, including all combinations of xorsign.abs, nan-propagation, and ftz. Patch is 30.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121966.diff 3 Files Affected:
diff --git a/llvm/include/llvm/IR/NVVMIntrinsicUtils.h b/llvm/include/llvm/IR/NVVMIntrinsicUtils.h
index 8ca073ba822534..d533f944f90ff2 100644
--- a/llvm/include/llvm/IR/NVVMIntrinsicUtils.h
+++ b/llvm/include/llvm/IR/NVVMIntrinsicUtils.h
@@ -38,9 +38,8 @@ enum class TMAReductionOp : uint8_t {
XOR = 7,
};
-inline bool IntrinsicShouldFTZ(Intrinsic::ID IntrinsicID) {
+inline bool FloatToIntIntrinsicShouldFTZ(Intrinsic::ID IntrinsicID) {
switch (IntrinsicID) {
- // Float to i32 / i64 conversion intrinsics:
case Intrinsic::nvvm_f2i_rm_ftz:
case Intrinsic::nvvm_f2i_rn_ftz:
case Intrinsic::nvvm_f2i_rp_ftz:
@@ -171,6 +170,54 @@ IntrinsicGetRoundingMode(Intrinsic::ID IntrinsicID) {
return APFloat::roundingMode::Invalid;
}
+inline bool FMinFMaxShouldFTZ(Intrinsic::ID IntrinsicID) {
+ switch (IntrinsicID) {
+ case Intrinsic::nvvm_fmax_ftz_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+
+ case Intrinsic::nvvm_fmin_ftz_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_ftz_xorsign_abs_f:
+ return true;
+ }
+ return false;
+}
+
+inline bool FMinFMaxPropagatesNaNs(Intrinsic::ID IntrinsicID) {
+ switch (IntrinsicID) {
+ case Intrinsic::nvvm_fmax_ftz_nan_f:
+ case Intrinsic::nvvm_fmax_nan_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+
+ case Intrinsic::nvvm_fmin_ftz_nan_f:
+ case Intrinsic::nvvm_fmin_nan_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_nan_xorsign_abs_f:
+ return true;
+ }
+ return false;
+}
+
+inline bool FMinFMaxIsXorSignAbs(Intrinsic::ID IntrinsicID) {
+ switch (IntrinsicID) {
+ case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_xorsign_abs_f:
+
+ case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_xorsign_abs_f:
+ return true;
+ }
+ return false;
+}
+
} // namespace nvvm
} // namespace llvm
#endif // LLVM_IR_NVVMINTRINSICUTILS_H
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index 031d675c330ec4..75150ed97aa7b4 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -1689,6 +1689,28 @@ bool llvm::canConstantFoldCallTo(const CallBase *Call, const Function *F) {
case Intrinsic::x86_avx512_cvttsd2usi64:
return !Call->isStrictFP();
+ // NVVM FMax intrinsics
+ case Intrinsic::nvvm_fmax_d:
+ case Intrinsic::nvvm_fmax_f:
+ case Intrinsic::nvvm_fmax_ftz_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_nan_f:
+ case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_xorsign_abs_f:
+
+ // NVVM FMin intrinsics
+ case Intrinsic::nvvm_fmin_d:
+ case Intrinsic::nvvm_fmin_f:
+ case Intrinsic::nvvm_fmin_ftz_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_nan_f:
+ case Intrinsic::nvvm_fmin_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_xorsign_abs_f:
+
// NVVM float/double to int32/uint32 conversion intrinsics
case Intrinsic::nvvm_f2i_rm:
case Intrinsic::nvvm_f2i_rn:
@@ -2432,7 +2454,7 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
return ConstantInt::get(Ty, 0);
APFloat::roundingMode RMode = nvvm::IntrinsicGetRoundingMode(IntrinsicID);
- bool IsFTZ = nvvm::IntrinsicShouldFTZ(IntrinsicID);
+ bool IsFTZ = nvvm::FloatToIntIntrinsicShouldFTZ(IntrinsicID);
bool IsSigned = nvvm::IntrinsicConvertsToSignedInteger(IntrinsicID);
APSInt ResInt(Ty->getIntegerBitWidth(), !IsSigned);
@@ -2892,12 +2914,49 @@ static Constant *ConstantFoldIntrinsicCall2(Intrinsic::ID IntrinsicID, Type *Ty,
case Intrinsic::minnum:
case Intrinsic::maximum:
case Intrinsic::minimum:
+ case Intrinsic::nvvm_fmax_d:
+ case Intrinsic::nvvm_fmin_d:
// If one argument is undef, return the other argument.
if (IsOp0Undef)
return Operands[1];
if (IsOp1Undef)
return Operands[0];
break;
+
+ case Intrinsic::nvvm_fmax_f:
+ case Intrinsic::nvvm_fmax_ftz_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_nan_f:
+ case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_xorsign_abs_f:
+
+ case Intrinsic::nvvm_fmin_f:
+ case Intrinsic::nvvm_fmin_ftz_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_nan_f:
+ case Intrinsic::nvvm_fmin_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_xorsign_abs_f:
+ // If one arg is undef, the other arg can be returned only if it is
+ // constant, as we may need to flush it to sign-preserving zero or
+ // canonicalize the NaN.
+ if (!IsOp0Undef && !IsOp1Undef)
+ break;
+ if (auto *Op = dyn_cast<ConstantFP>(Operands[IsOp0Undef ? 1 : 0])) {
+ if (Op->isNaN()) {
+ APInt NVCanonicalNaN(32, 0x7fffffff);
+ return ConstantFP::get(
+ Ty, APFloat(Ty->getFltSemantics(), NVCanonicalNaN));
+ }
+ if (nvvm::FMinFMaxShouldFTZ(IntrinsicID))
+ return ConstantFP::get(Ty, FTZPreserveSign(Op->getValueAPF()));
+ else
+ return Op;
+ }
+ break;
}
}
@@ -2955,6 +3014,79 @@ static Constant *ConstantFoldIntrinsicCall2(Intrinsic::ID IntrinsicID, Type *Ty,
return ConstantFP::get(Ty->getContext(), minimum(Op1V, Op2V));
case Intrinsic::maximum:
return ConstantFP::get(Ty->getContext(), maximum(Op1V, Op2V));
+
+ case Intrinsic::nvvm_fmax_d:
+ case Intrinsic::nvvm_fmax_f:
+ case Intrinsic::nvvm_fmax_ftz_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_nan_f:
+ case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_xorsign_abs_f:
+
+ case Intrinsic::nvvm_fmin_d:
+ case Intrinsic::nvvm_fmin_f:
+ case Intrinsic::nvvm_fmin_ftz_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_f:
+ case Intrinsic::nvvm_fmin_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_nan_f:
+ case Intrinsic::nvvm_fmin_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmin_xorsign_abs_f: {
+
+ bool ShouldCanonicalizeNaNs = IntrinsicID != Intrinsic::nvvm_fmax_d &&
+ IntrinsicID != Intrinsic::nvvm_fmin_d;
+ bool IsFTZ = nvvm::FMinFMaxShouldFTZ(IntrinsicID);
+ bool IsNaNPropagating = nvvm::FMinFMaxPropagatesNaNs(IntrinsicID);
+ bool IsXorSignAbs = nvvm::FMinFMaxIsXorSignAbs(IntrinsicID);
+
+ APFloat A = IsFTZ ? FTZPreserveSign(Op1V) : Op1V;
+ APFloat B = IsFTZ ? FTZPreserveSign(Op2V) : Op2V;
+
+ bool XorSign = false;
+ if (IsXorSignAbs) {
+ XorSign = A.isNegative() ^ B.isNegative();
+ A = abs(A);
+ B = abs(B);
+ }
+
+ bool IsFMax = false;
+ switch (IntrinsicID) {
+ case Intrinsic::nvvm_fmax_d:
+ case Intrinsic::nvvm_fmax_f:
+ case Intrinsic::nvvm_fmax_ftz_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_f:
+ case Intrinsic::nvvm_fmax_ftz_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_ftz_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_nan_f:
+ case Intrinsic::nvvm_fmax_nan_xorsign_abs_f:
+ case Intrinsic::nvvm_fmax_xorsign_abs_f:
+ IsFMax = true;
+ break;
+ }
+ APFloat Res = IsFMax ? maximum(A, B) : minimum(A, B);
+
+ if (ShouldCanonicalizeNaNs) {
+ APFloat NVCanonicalNaN(Res.getSemantics(), APInt(32, 0x7fffffff));
+ if (A.isNaN() && B.isNaN())
+ return ConstantFP::get(Ty, NVCanonicalNaN);
+ else if (IsNaNPropagating && (A.isNaN() || B.isNaN()))
+ return ConstantFP::get(Ty, NVCanonicalNaN);
+ }
+
+ if (A.isNaN() && B.isNaN())
+ return Operands[1];
+ else if (A.isNaN())
+ Res = B;
+ else if (B.isNaN())
+ Res = A;
+
+ if (IsXorSignAbs && XorSign != Res.isNegative())
+ Res.changeSign();
+
+ return ConstantFP::get(Ty->getContext(), Res);
+ }
}
if (!Ty->isHalfTy() && !Ty->isFloatTy() && !Ty->isDoubleTy())
diff --git a/llvm/test/Transforms/InstSimplify/const-fold-nvvm-fmin-fmax.ll b/llvm/test/Transforms/InstSimplify/const-fold-nvvm-fmin-fmax.ll
new file mode 100644
index 00000000000000..ab277483dbba5a
--- /dev/null
+++ b/llvm/test/Transforms/InstSimplify/const-fold-nvvm-fmin-fmax.ll
@@ -0,0 +1,614 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=instsimplify -march=nvptx64 --mcpu=sm_86 --mattr=+ptx72 -S | FileCheck %s
+
+; Check constant-folding for NVVM fmin fmax intrinsics
+
+;###############################################################
+;# FMax(1.25, -2.0) #
+;###############################################################
+
+define double @test_fmax_1_25_neg_2_d() {
+; CHECK-LABEL: define double @test_fmax_1_25_neg_2_d() {
+; CHECK-NEXT: ret double 1.250000e+00
+;
+ %res = call double @llvm.nvvm.fmax.d(double 1.25, double -2.0)
+ ret double %res
+}
+
+define float @test_fmax_1_25_neg_2_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_f() {
+; CHECK-NEXT: ret float 1.250000e+00
+;
+ %res = call float @llvm.nvvm.fmax.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_ftz_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_ftz_f() {
+; CHECK-NEXT: ret float 1.250000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_ftz_nan_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_ftz_nan_f() {
+; CHECK-NEXT: ret float 1.250000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_ftz_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_ftz_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float -2.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.xorsign.abs.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_ftz_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_ftz_xorsign_abs_f() {
+; CHECK-NEXT: ret float -2.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.xorsign.abs.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_nan_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_nan_f() {
+; CHECK-NEXT: ret float 1.250000e+00
+;
+ %res = call float @llvm.nvvm.fmax.nan.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float -2.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.nan.xorsign.abs.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+define float @test_fmax_1_25_neg_2_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_1_25_neg_2_xorsign_abs_f() {
+; CHECK-NEXT: ret float -2.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.xorsign.abs.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+;###############################################################
+;# FMax(+Subnormal, NaN) #
+;###############################################################
+
+define double @test_fmax_pos_subnorm_nan_d() {
+; CHECK-LABEL: define double @test_fmax_pos_subnorm_nan_d() {
+; CHECK-NEXT: ret double 0x380FFFFFC0000000
+;
+ %res = call double @llvm.nvvm.fmax.d(double 0x380FFFFFC0000000, double 0x7fff444400000000)
+ ret double %res
+}
+
+define float @test_fmax_pos_subnorm_nan_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_ftz_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_ftz_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_ftz_nan_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_ftz_nan_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_ftz_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_ftz_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.xorsign.abs.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_ftz_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_ftz_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.xorsign.abs.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_nan_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_nan_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.xorsign.abs.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+define float @test_fmax_pos_subnorm_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_pos_subnorm_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.xorsign.abs.f(float 0x380FFFFFC0000000, float 0x7fff444400000000)
+ ret float %res
+}
+
+;###############################################################
+;# FMax(subnorm, undef) #
+;###############################################################
+
+define double @test_fmax_subnorm_undef_d() {
+; CHECK-LABEL: define double @test_fmax_subnorm_undef_d() {
+; CHECK-NEXT: ret double 0x380FFFFFC0000000
+;
+ %res = call double @llvm.nvvm.fmax.d(double 0x380FFFFFC0000000, double undef)
+ ret double %res
+}
+
+define float @test_fmax_subnorm_undef_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_ftz_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_ftz_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_ftz_nan_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_ftz_nan_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_ftz_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_ftz_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.xorsign.abs.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_ftz_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_ftz_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0.000000e+00
+;
+ %res = call float @llvm.nvvm.fmax.ftz.xorsign.abs.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_nan_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_nan_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.xorsign.abs.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_subnorm_undef_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_subnorm_undef_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x380FFFFFC0000000
+;
+ %res = call float @llvm.nvvm.fmax.xorsign.abs.f(float 0x380FFFFFC0000000, float undef)
+ ret float %res
+}
+
+;###############################################################
+;# FMax(NaN, undef) #
+;###############################################################
+; Ensure we canonicalize the NaNs for f32
+
+define double @test_fmax_nan_undef_d() {
+; CHECK-LABEL: define double @test_fmax_nan_undef_d() {
+; CHECK-NEXT: ret double 0x7FF4444400000000
+;
+ %res = call double @llvm.nvvm.fmax.d(double 0x7ff4444400000000, double undef)
+ ret double %res
+}
+
+define float @test_fmax_nan_undef_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_ftz_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_ftz_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_ftz_nan_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_ftz_nan_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_ftz_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_ftz_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.xorsign.abs.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_ftz_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_ftz_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.ftz.xorsign.abs.f(float 0x7ffff4ff00000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_nan_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_nan_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_nan_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_nan_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.nan.xorsign.abs.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+define float @test_fmax_nan_undef_xorsign_abs_f() {
+; CHECK-LABEL: define float @test_fmax_nan_undef_xorsign_abs_f() {
+; CHECK-NEXT: ret float 0x7FFFFFFFE0000000
+;
+ %res = call float @llvm.nvvm.fmax.xorsign.abs.f(float 0x7fff444400000000, float undef)
+ ret float %res
+}
+
+;############...
[truncated]
You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 7dd34baf5505d689161c3a8678322a394d7a2929 738f51ac1d6e5f6f8c896155486e76b223d4ad15 llvm/test/Transforms/InstSimplify/const-fold-nvvm-fmin-fmax.ll llvm/include/llvm/IR/NVVMIntrinsicUtils.h llvm/lib/Analysis/ConstantFolding.cpp
The following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. In tests, avoid using undef.
For example, this is considered a bad practice:
define void @fn() {
  ...
  br i1 undef, ...
}
Please use the following instead:
define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}
Please refer to the Undefined Behavior Manual for more information.
LGTM overall.
@@ -38,9 +38,8 @@ enum class TMAReductionOp : uint8_t {
 XOR = 7,
 };
-inline bool IntrinsicShouldFTZ(Intrinsic::ID IntrinsicID) {
+inline bool FloatToIntIntrinsicShouldFTZ(Intrinsic::ID IntrinsicID) {
Nit. The new name implies that the function is intended for f2i intrinsics only, yet it will still happily accept other intrinsics.
If it is intended to be restricted to a particular subset of intrinsics (as opposed to telling us whether a given intrinsic flushes to zero), then it should handle all entries in the set and have an assertion in case an unexpected intrinsic is passed.
I'd just revert to the old name and possibly extend the list of intrinsics if we're missing any other FTZ variants.
This is mostly in response to @AlexMaclean's comment on #118965 (#118965 (comment)), where he suggested it would be better to have helper functions for each class of intrinsic, rather than one big switch statement with 176 unrelated intrinsics to check for rounding modes and 144 unrelated intrinsics with the FTZ modifier. Here, I'm adding a new FMinFMaxShouldFTZ function and renaming IntrinsicShouldFTZ to FloatToIntIntrinsicShouldFTZ instead of combining both into a single switch statement.
I've changed these helper functions all to have explicit checks for all entries in the sets now, which means there are roughly double the number of case statements needed in this file.
Hopefully these rather unwieldy case statements can help motivate the use of flag arguments in future intrinsics, such as in #121507
I do agree with Alex's suggestion; I'm just pointing out that the name of the function should match what it does. Right now it does not.
"I've changed these helper functions all to have explicit checks for all entries in the sets now, which means there are roughly double the number of case statements needed in this file."
I would say that those would be there for a reason. They would also make it possible to add an assertion that we're not using them unintentionally for an intrinsic they were not intended for, which is too easy to do with intrinsic identifiers.
Ah, I see what you mean now. I've changed it to FPToInteger now, which hopefully implies a broader range of both floating-point and integer types, rather than just specifically float and int.
On a side note, would it make sense to autoupgrade some of the min/max intrinsics to their LLVM counterpart, so LLVM can optimize them?
Make all the helper functions in NVVMIntrinsicUtils.h explicitly accept all valid intrinsics, and call llvm_unreachable for any unexpected intrinsics. Rename some f2i/d2i helpers to make it clearer their scope is for finite groups of intrinsics, rather than all intrinsics.
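A minimal sketch of the pattern this commit message describes, written as standalone C++ so it compiles without LLVM. Everything here is hypothetical: `ToyIntrinsic` stands in for a handful of `Intrinsic::ID` values, and `toyUnreachable` throws an exception where the real helpers call `llvm_unreachable`, so the failure path can be observed in a test.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for a few intrinsic IDs; not LLVM's real enum.
enum class ToyIntrinsic { FMaxF, FMaxFtzF, FMinF, FMinFtzF, F2iRn };

// Stand-in for llvm_unreachable: throws so callers can detect misuse.
[[noreturn]] inline void toyUnreachable(const char *Msg) {
  throw std::logic_error(Msg);
}

// Every intrinsic in the fmin/fmax group is listed explicitly, with either
// a true or a false answer. Anything outside the group is treated as a bug.
inline bool toyFMinFMaxShouldFTZ(ToyIntrinsic ID) {
  switch (ID) {
  case ToyIntrinsic::FMaxFtzF:
  case ToyIntrinsic::FMinFtzF:
    return true;
  case ToyIntrinsic::FMaxF:
  case ToyIntrinsic::FMinF:
    return false;
  case ToyIntrinsic::F2iRn:
    break; // Not an fmin/fmax intrinsic: fall through to the error below.
  }
  toyUnreachable("Checking FTZ flag for invalid fmin/fmax intrinsic");
}
```

The cost is that each helper lists the non-FTZ members too, which is the "roughly double the number of case statements" mentioned earlier in the thread. The benefit is that passing an unrelated intrinsic no longer silently returns `false`.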
I believe the issue with auto-upgrading these (even the ones without ftz/xorsignabs/nan modifiers) is that the semantics are slightly different than the existing
Try to avoid the connotation that FloatToInt was restricted to f2i intrinsics, in the hopes that FPToInteger covers a broader range of floating-point types (float + double), and both signed and unsigned integers with both 32 and 64 bits (rather than just specifically C-style floats and ints).
; CHECK-LABEL: define float @test_fmax_1_25_neg_2_ftz_f() {
; CHECK-NEXT: ret float 1.250000e+00
;
%res = call float @llvm.nvvm.fmax.ftz.f(float 1.25, float -2.0)
Do we have any test cases that demonstrate that FTZ is actually in effect for those intrinsics (i.e. cases where a non-FTZ variant of the intrinsic would return a different value for the same arguments)?
Yes, I added the following cases for all the NVVM fmax/fmin intrinsics:
- FMax(1.25, -2.0)
- FMax(+Subnormal, NaN)
- FMax(+Subnormal, undef)
- FMax(NaN, undef)
- FMin(1.25, -2.0)
- FMin(+Subnormal, NaN)
- FMin(+Subnormal, undef)
- FMin(NaN, undef)
In the cases with Subnormal, the FTZ version returns 0.0, and the non-FTZ version returns the unmodified subnormal value 0x380FFFFFC0000000
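The difference between the two variants for the (+Subnormal, NaN) input can be modelled with a small standalone sketch. It is not the constant-folding code from the patch: `flushToZero` and `modelFMax` are hypothetical helper names, the sketch assumes FTZ flushes subnormal inputs to a zero of the same sign, and it uses `std::fmax` for the variant without the nan modifier because `std::fmax` returns the non-NaN operand when exactly one input is NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Flush a subnormal to a zero of the same sign; other values pass through.
inline float flushToZero(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// Model of fmax without the nan modifier. When FTZ is set, subnormal
// inputs are flushed before the comparison.
inline float modelFMax(float A, float B, bool FTZ) {
  if (FTZ) {
    A = flushToZero(A);
    B = flushToZero(B);
  }
  // std::fmax ignores a single NaN input and returns the other operand.
  return std::fmax(A, B);
}
```

With a subnormal such as `std::numeric_limits<float>::denorm_min()` and a quiet NaN as inputs, the FTZ form of this model yields +0.0 and the non-FTZ form yields the subnormal unchanged, while (1.25, -2.0) gives 1.25 in both.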
Let me rephrase. For the tests `@test_fmax_1_25_neg_2_f` and `@test_fmax_1_25_neg_2_ftz_f`, is there a set of input values that would produce different results in those two tests?
Right now the behavior of the ftz/non-ftz variants of the intrinsics in the tests is indistinguishable.
If they can behave differently, the tests should cover that.
The inputs `(1.25, -2.0)` are only there to test that the intrinsics function as expected for regular (non-subnormal) floats, so the FTZ and non-FTZ variants are expected to behave identically for those inputs.
The FTZ behaviour should be covered already by the inputs `(+Subnormal, NaN)` and `(+Subnormal, undef)`.
I've added an additional set of inputs for `(+Subnormal, -Subnormal)` now to increase the coverage there even further, and showcase the behaviour of `-0.0 < +0.0` when the FTZ modifier makes the subnormals into zeroes.
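The signed-zero ordering mentioned here can be illustrated with a short sketch. All names in it (`flushToZero`, `lessWithSignedZero`, `modelFMinFTZ`, `modelFMaxFTZ`) are hypothetical, NaN inputs are deliberately not handled, and it assumes the flush preserves the sign of the subnormal.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Flush a subnormal to a zero of the same sign; other values pass through.
inline float flushToZero(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// Ordering that treats -0.0 as smaller than +0.0; all other pairs use the
// ordinary comparison. NaN inputs are out of scope for this sketch.
inline bool lessWithSignedZero(float A, float B) {
  if (A == 0.0f && B == 0.0f)
    return std::signbit(A) && !std::signbit(B);
  return A < B;
}

// FTZ models of fmin/fmax: flush both inputs, then pick by the ordering.
inline float modelFMinFTZ(float A, float B) {
  A = flushToZero(A);
  B = flushToZero(B);
  return lessWithSignedZero(B, A) ? B : A;
}

inline float modelFMaxFTZ(float A, float B) {
  A = flushToZero(A);
  B = flushToZero(B);
  return lessWithSignedZero(A, B) ? B : A;
}
```

For the inputs (+Subnormal, -Subnormal), this model gives -0.0 for the minimum and +0.0 for the maximum, regardless of argument order.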
Thank you.
> The inputs (1.25, -2.0) are only there to test that the intrinsics function as expected for regular (non-subnormal) floats, so the FTZ and non-FTZ variants are expected to behave identically for those inputs.
Agreed. It is useful, but it's the corner cases I would like to check. The tests you've already added cover most of the interesting combinations, but we still seem to be missing the tests for `{normal FP, Subnormal}` combinations.
E.g. `fmax(0.0, +Subnormal)`. The FTZ variant would return 0.0, while the non-FTZ variant would presumably pass through `+Subnormal`.
I've added cases for `fmax(+Subnormal, 0.0)` and `fmin(-Subnormal, 0.0)` now too.
The tests for fmax/fmin.ftz.nan were missing the .nan modifier, so were erroneously testing fmax/fmin.ftz twice. This patch adds the missing modifier, and updates the expected values to NaN where the instruction should propagate nan inputs.
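The difference that the missing modifier makes to the expected values can be sketched as below. This is an illustrative model only, with hypothetical function names `modelFMax` and `modelFMaxNaN`; it ignores FTZ and signed zeros and shows just the NaN handling the commit message refers to.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Without the nan modifier: a single NaN input is ignored and the other
// operand is returned.
inline float modelFMax(float A, float B) {
  if (std::isnan(A))
    return B;
  if (std::isnan(B))
    return A;
  return A > B ? A : B;
}

// With the nan modifier: any NaN input makes the result NaN.
inline float modelFMaxNaN(float A, float B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<float>::quiet_NaN();
  return A > B ? A : B;
}
```

A test that omits the modifier exercises the first function twice, so a NaN input never shows up in the expected result.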
Add test cases for:
- fmax(+Subnormal, 0.0)
- fmin(-Subnormal, 0.0)
LGTM. Thank you!
Add constant-folding for nvvm float/double fmin + fmax intrinsics, including all combinations of xorsign.abs, nan-propagation, and ftz.