[Flang] Add -ffast-real-mod and direct code for MOD on REAL types #160660

mjklemm · 2025-09-25T08:22:05Z

This patch adds direct code-gen support for a faster MOD intrinsic for REAL types. Flang has maintained, and continues to maintain, a high-precision implementation of the MOD intrinsic as part of the Fortran runtime. With the -ffast-real-mod flag, users can opt out of the call into the Fortran runtime and instead trigger code-gen that produces faster inline code, at the expense of risking bit cancellation because the compiler uses the MOD formula as specified by ISO Fortran.
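
To make the trade-off concrete outside the compiler, here is a minimal C++ sketch of the arithmetic the inlined path performs in the f64 case, `a - INT(a / p) * p`, next to `std::fmod`. The names `fast_real_mod` and `precise_real_mod` are hypothetical and do not appear in the patch, and using `std::fmod` as a stand-in for the precise runtime path is an assumption, since the runtime implementation is not part of this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper mirroring the inlined sequence: a - INT(a / p) * p.
double fast_real_mod(double a, double p) {
  double quotient = a / p;
  // Converting an out-of-range value to an integer is undefined in C++,
  // so this sketch only accepts quotients that fit in a signed 64-bit integer.
  assert(std::fabs(quotient) < 9.0e18);
  auto truncated = static_cast<std::int64_t>(quotient); // rounds toward zero
  return a - static_cast<double>(truncated) * p;
}

// Assumed stand-in for the precise path: std::fmod returns the exact
// remainder, with the sign of a.
double precise_real_mod(double a, double p) { return std::fmod(a, p); }
```

For exactly representable inputs such as `fast_real_mod(7.5, 2.0)` and `precise_real_mod(7.5, 2.0)`, both return 1.5. The two can disagree once `a / p` has to be rounded, because the rounding error in the quotient is multiplied back by `p` before the subtraction.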

llvmbot · 2025-09-25T08:22:36Z

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-flang-fir-hlfir

Author: Michael Klemm (mjklemm)

Full diff: https://github.com/llvm/llvm-project/pull/160660.diff

7 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+1)
(modified) clang/lib/Driver/ToolChains/Flang.cpp (+3)
(modified) flang/include/flang/Support/LangOptions.def (+2-1)
(modified) flang/lib/Frontend/CompilerInvocation.cpp (+4)
(modified) flang/lib/Frontend/FrontendActions.cpp (+8)
(modified) flang/lib/Optimizer/Builder/IntrinsicCall.cpp (+34-3)
(added) flang/test/Lower/Intrinsics/fast-real-mod.f90 (+75)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a7c514e809aa9..4dc4acd5603cb 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2750,6 +2750,7 @@ def fno_unsafe_math_optimizations : Flag<["-"], "fno-unsafe-math-optimizations">
   Group<f_Group>;
 def fassociative_math : Flag<["-"], "fassociative-math">, Visibility<[ClangOption, FlangOption]>, Group<f_Group>;
 def fno_associative_math : Flag<["-"], "fno-associative-math">, Visibility<[ClangOption, FlangOption]>, Group<f_Group>;
+def ffast_real_mod : Flag<["-"], "ffast-real-mod">, Visibility<[FlangOption, FC1Option]>, Group<f_Group>;
 defm reciprocal_math : BoolFOption<"reciprocal-math",
   LangOpts<"AllowRecip">, DefaultFalse,
   PosFlag<SetTrue, [], [ClangOption, CC1Option, FC1Option, FlangOption],
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 1535f4cebf436..fbaa083d204b8 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -766,6 +766,9 @@ static void addFloatingPointOptions(const Driver &D, const ArgList &Args,
 
   if (ReciprocalMath)
     CmdArgs.push_back("-freciprocal-math");
+
+  if (Args.hasArg(options::OPT_ffast_real_mod))
+    CmdArgs.push_back("-ffast-real-mod");
 }
 
 static void renderRemarksOptions(const ArgList &Args, ArgStringList &CmdArgs,
diff --git a/flang/include/flang/Support/LangOptions.def b/flang/include/flang/Support/LangOptions.def
index ba72d7b4b7212..e310ecf37a52d 100644
--- a/flang/include/flang/Support/LangOptions.def
+++ b/flang/include/flang/Support/LangOptions.def
@@ -60,7 +60,8 @@ LANGOPT(OpenMPNoThreadState, 1, 0)
 LANGOPT(OpenMPNoNestedParallelism, 1, 0)
 /// Use SIMD only OpenMP support.
 LANGOPT(OpenMPSimd, 1, false)
-
+/// Enable fast MOD operations for REAL
+LANGOPT(FastRealMod, 1, false)
 LANGOPT(VScaleMin, 32, 0)  ///< Minimum vscale range value
 LANGOPT(VScaleMax, 32, 0)  ///< Maximum vscale range value
 
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 6295a58b1bdad..5b3f64971013e 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -1424,6 +1424,10 @@ static bool parseFloatingPointArgs(CompilerInvocation &invoc,
     opts.setFPContractMode(Fortran::common::LangOptions::FPM_Fast);
   }
 
+  if (args.hasArg(clang::driver::options::OPT_ffast_real_mod)) {
+    opts.FastRealMod = true;
+  }
+
   return true;
 }
 
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 3bef6b1c31825..d22124bc0bdeb 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -277,6 +277,14 @@ bool CodeGenAction::beginSourceFileAction() {
                               ci.getInvocation().getLangOpts().OpenMPVersion);
   }
 
+  if (ci.getInvocation().getLangOpts().FastRealMod) {
+    auto mod = lb.getModule();
+    mod.getOperation()->setAttr(
+        mlir::StringAttr::get(mod.getContext(),
+                              llvm::Twine{"fir.fast_real_mod"}),
+        mlir::BoolAttr::get(mod.getContext(), true));
+  }
+
   // Create a parse tree and lower it to FIR
   parseAndLowerTree(ci, lb);
 
diff --git a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
index ce1376fd209cc..5e0e4fbf81717 100644
--- a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
+++ b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
@@ -7009,8 +7009,30 @@ mlir::Value IntrinsicLibrary::genMergeBits(mlir::Type resultType,
 }
 
 // MOD
+static mlir::Value genFastMod(fir::FirOpBuilder &builder, mlir::Location loc,
+                              mlir::Value a, mlir::Value p) {
+  auto fastmathFlags = mlir::arith::FastMathFlags::contract;
+  auto fastmathAttr =
+      mlir::arith::FastMathFlagsAttr::get(builder.getContext(), fastmathFlags);
+  mlir::Value divResult = mlir::arith::DivFOp::create(builder, loc, a, p, fastmathAttr);
+  mlir::Type intType = builder.getIntegerType(
+      a.getType().getIntOrFloatBitWidth(), /*signed=*/true);
+  mlir::Value intResult = builder.createConvert(loc, intType, divResult);
+  mlir::Value cnvResult = builder.createConvert(loc, a.getType(), intResult);
+  mlir::Value mulResult =
+      mlir::arith::MulFOp::create(builder, loc, cnvResult, p, fastmathAttr);
+  mlir::Value subResult =
+      mlir::arith::SubFOp::create(builder, loc, a, mulResult, fastmathAttr);
+  return subResult;
+}
+
 mlir::Value IntrinsicLibrary::genMod(mlir::Type resultType,
                                      llvm::ArrayRef<mlir::Value> args) {
+  auto mod = builder.getModule();
+  bool useFastRealMod = false;
+  if (auto attr = mod->getAttrOfType<mlir::BoolAttr>("fir.fast_real_mod"))
+    useFastRealMod = attr.getValue();
+
   assert(args.size() == 2);
   if (resultType.isUnsignedInteger()) {
     mlir::Type signlessType = mlir::IntegerType::get(
@@ -7022,9 +7044,18 @@ mlir::Value IntrinsicLibrary::genMod(mlir::Type resultType,
   if (mlir::isa<mlir::IntegerType>(resultType))
     return mlir::arith::RemSIOp::create(builder, loc, args[0], args[1]);
 
-  // Use runtime.
-  return builder.createConvert(
-      loc, resultType, fir::runtime::genMod(builder, loc, args[0], args[1]));
+  if (useFastRealMod) {
+    // If fast MOD for REAL has been requested, generate less precise,
+    // but faster code directly.
+    assert(resultType.isFloat() &&
+           "non floating-point type hit for fast real MOD");
+    return builder.createConvert(loc, resultType,
+                                 genFastMod(builder, loc, args[0], args[1]));
+  } else {
+    // Use runtime.
+    return builder.createConvert(
+        loc, resultType, fir::runtime::genMod(builder, loc, args[0], args[1]));
+  }
 }
 
 // MODULO
diff --git a/flang/test/Lower/Intrinsics/fast-real-mod.f90 b/flang/test/Lower/Intrinsics/fast-real-mod.f90
new file mode 100644
index 0000000000000..00607fa5c30d1
--- /dev/null
+++ b/flang/test/Lower/Intrinsics/fast-real-mod.f90
@@ -0,0 +1,75 @@
+! RUN: %flang_fc1 -ffast-real-mod -emit-mlir -o - %s | FileCheck %s --check-prefixes=CHECK%if target=x86_64{{.*}} %{,CHECK-KIND10%}%if flang-supports-f128-math %{,CHECK-KIND16%}
+
+! CHECK: module attributes {{{.*}}fir.fast_real_mod = true{{.*}}}
+
+! CHECK-LABEL: @_QPmod_real4
+subroutine mod_real4(r, a, p)
+    implicit none
+    real(kind=4) :: r, a, p
+! CHECK: %[[A:.*]] = fir.declare{{.*}}a"
+! CHECK: %[[P:.*]] = fir.declare{{.*}}p"
+! CHECK: %[[R:.*]] = fir.declare{{.*}}r"
+! CHECK: %[[A_LOAD:.*]] = fir.load %[[A]]
+! CHECK: %[[P_LOAD:.*]] = fir.load %[[P]]
+! CHECK: %[[DIV:.*]] = arith.divf %[[A_LOAD]], %[[P_LOAD]] fastmath<contract> : f32
+! CHECK: %[[CV1:.*]] = fir.convert %[[DIV]] : (f32) -> si32
+! CHECK: %[[CV2:.*]] = fir.convert %[[CV1]] : (si32) -> f32
+! CHECK: %[[MUL:.*]] = arith.mulf %[[CV2]], %[[P_LOAD]] fastmath<contract> : f32
+! CHECK: %[[SUB:.*]] = arith.subf %[[A_LOAD]], %[[MUL]] fastmath<contract> : f32
+! CHECK: fir.store %[[SUB]] to %[[R]] : !fir.ref<f32>
+    r = mod(a, p)
+end subroutine mod_real4
+
+! CHECK-LABEL: @_QPmod_real8
+subroutine mod_real8(r, a, p)
+    implicit none
+    real(kind=8) :: r, a, p
+! CHECK: %[[A:.*]] = fir.declare{{.*}}a"
+! CHECK: %[[P:.*]] = fir.declare{{.*}}p"
+! CHECK: %[[R:.*]] = fir.declare{{.*}}r"
+! CHECK: %[[A_LOAD:.*]] = fir.load %[[A]]
+! CHECK: %[[P_LOAD:.*]] = fir.load %[[P]]
+! CHECK: %[[DIV:.*]] = arith.divf %[[A_LOAD]], %[[P_LOAD]] fastmath<contract> : f64
+! CHECK: %[[CV1:.*]] = fir.convert %[[DIV]] : (f64) -> si64
+! CHECK: %[[CV2:.*]] = fir.convert %[[CV1]] : (si64) -> f64
+! CHECK: %[[MUL:.*]] = arith.mulf %[[CV2]], %[[P_LOAD]] fastmath<contract> : f64
+! CHECK: %[[SUB:.*]] = arith.subf %[[A_LOAD]], %[[MUL]] fastmath<contract> : f64
+! CHECK: fir.store %[[SUB]] to %[[R]] : !fir.ref<f64>
+    r = mod(a, p)
+end subroutine mod_real8
+
+! CHECK-LABEL: @_QPmod_real10
+subroutine mod_real10(r, a, p)
+    implicit none
+    real(kind=10) :: r, a, p
+! CHECK-KIND10: %[[A:.*]] = fir.declare{{.*}}a"
+! CHECK-KIND10: %[[P:.*]] = fir.declare{{.*}}p"
+! CHECK-KIND10: %[[R:.*]] = fir.declare{{.*}}r"
+! CHECK-KIND10: %[[A_LOAD:.*]] = fir.load %[[A]]
+! CHECK-KIND10: %[[P_LOAD:.*]] = fir.load %[[P]]
+! CHECK-KIND10: %[[DIV:.*]] = arith.divf %[[A_LOAD]], %[[P_LOAD]] fastmath<contract> : f80
+! CHECK-KIND10: %[[CV1:.*]] = fir.convert %[[DIV]] : (f80) -> si80
+! CHECK-KIND10: %[[CV2:.*]] = fir.convert %[[CV1]] : (si80) -> f80
+! CHECK-KIND10: %[[MUL:.*]] = arith.mulf %[[CV2]], %[[P_LOAD]] fastmath<contract> : f80
+! CHECK-KIND10: %[[SUB:.*]] = arith.subf %[[A_LOAD]], %[[MUL]] fastmath<contract> : f80
+! CHECK-KIND10: fir.store %[[SUB]] to %[[R]] : !fir.ref<f80>
+    r = mod(a, p)
+end subroutine mod_real10
+
+! CHECK-LABEL: @_QPmod_real16
+subroutine mod_real16(r, a, p)
+    implicit none
+    real(kind=16) :: r, a, p
+! CHECK-KIND16: %[[A:.*]] = fir.declare{{.*}}a"
+! CHECK-KIND16: %[[P:.*]] = fir.declare{{.*}}p"
+! CHECK-KIND16: %[[R:.*]] = fir.declare{{.*}}r"
+! CHECK-KIND16: %[[A_LOAD:.*]] = fir.load %[[A]]
+! CHECK-KIND16: %[[P_LOAD:.*]] = fir.load %[[P]]
+! CHECK-KIND16: %[[DIV:.*]] = arith.divf %[[A_LOAD]], %[[P_LOAD]] fastmath<contract> : f128
+! CHECK-KIND16: %[[CV1:.*]] = fir.convert %[[DIV]] : (f128) -> si128
+! CHECK-KIND16: %[[CV2:.*]] = fir.convert %[[CV1]] : (si128) -> f128
+! CHECK-KIND16: %[[MUL:.*]] = arith.mulf %[[CV2]], %[[P_LOAD]] fastmath<contract> : f128
+! CHECK-KIND16: %[[SUB:.*]] = arith.subf %[[A_LOAD]], %[[MUL]] fastmath<contract> : f128
+! CHECK-KIND16: fir.store %[[SUB]] to %[[R]] : !fir.ref<f128>
+    r = mod(a, p)
+end subroutine mod_real16

llvmbot · 2025-09-25T08:22:38Z

@llvm/pr-subscribers-flang-driver

github-actions · 2025-09-25T08:26:50Z

✅ With the latest revision this PR passed the C/C++ code formatter.

tblah

LGTM overall, just some minor comments. Also see the clang-format and CI failures.

flang/lib/Optimizer/Builder/IntrinsicCall.cpp

tblah · 2025-09-25T09:23:24Z

flang/test/Lower/Intrinsics/fast-real-mod.f90

@@ -0,0 +1,75 @@
+! RUN: %flang_fc1 -ffast-real-mod -emit-mlir -o - %s | FileCheck %s --check-prefixes=CHECK%if target=x86_64{{.*}} %{,CHECK-KIND10%}%if flang-supports-f128-math %{,CHECK-KIND16%}


I think using kinds which are not enabled for a particular target is a compile error. Please could you put the checks for kind 10 and kind 16 in separate files and skip the whole thing if the types are not supported.

I was trying to follow what mod.f90 did. Alas, I failed to do everything that test did. Now the test in this PR does it the same way. I hope that is an acceptable way of addressing your comment.

That test is using lines like

    integer, parameter :: kind16 = merge(16, 4, selected_real_kind(p=33).eq.16)
    real(kind16) :: r, a, p

This makes sure that on systems that don't support real(16) it will instead use real(4).

tarunprabhu

Just some minor comments.

It would be good to have a test that explicitly checks that the driver passes -ffast-real-mod to fc1.

flang/lib/Frontend/FrontendActions.cpp

flang/lib/Frontend/CompilerInvocation.cpp

vzakhari

Thank you for the changes! I have a couple of comments inlined.

vzakhari · 2025-09-25T16:25:21Z

flang/lib/Optimizer/Builder/IntrinsicCall.cpp

-  // Use runtime.
-  return builder.createConvert(
-      loc, resultType, fir::runtime::genMod(builder, loc, args[0], args[1]));
+  if (useFastRealMod && resultType.isFloat()) {


I wonder if we can enable the fast MOD under the afn FastMathFlag. Is there a reason to control it via a separate option?

I was doing that to avoid issues with precision loss that could have come in with the other math functions. After I spoke to the engineers who are working on the applications, it seems they'd be fine with moving this under the afn flag and doing the optimization as part of -ffast-math, which is enabled for this application.

However, I'm tempted to keep this as a separate flag and simply enable this MOD optimization when -ffast-math is present. That makes it more controllable w.r.t. the other math optimizations. What do you think?

It sounds okay to me. I would recommend enabling this optimization whenever afn is set (i.e. under -ffast-math or under -fapprox-func), and allowing users to override this with -fno-fast-real-mod.
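
One way to read the suggestion in this thread, sketched in C++ under stated assumptions: `MathFlags` and `useFastRealMod` are hypothetical names, and the precedence shown (an explicit -ffast-real-mod or -fno-fast-real-mod wins, otherwise the decision follows afn) is an interpretation of the comment above, not the merged driver code.

```cpp
#include <optional>

// Hypothetical model of the options discussed in this thread.
struct MathFlags {
  // afn: assumed to be set by -ffast-math or -fapprox-func.
  bool approxFunc = false;
  // Explicit -ffast-real-mod (true) or -fno-fast-real-mod (false), if given.
  std::optional<bool> fastRealMod;
};

// Assumed precedence: an explicit flag overrides the afn default.
bool useFastRealMod(const MathFlags &flags) {
  if (flags.fastRealMod.has_value())
    return *flags.fastRealMod;
  return flags.approxFunc;
}
```

Under this reading, -ffast-math alone enables the inline MOD, and adding -fno-fast-real-mod restores the runtime call while leaving the other fast-math optimizations in place.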

vzakhari · 2025-09-25T16:25:46Z

clang/include/clang/Driver/Options.td

  Group<f_Group>;
 def fassociative_math : Flag<["-"], "fassociative-math">, Visibility<[ClangOption, FlangOption]>, Group<f_Group>;
 def fno_associative_math : Flag<["-"], "fno-associative-math">, Visibility<[ClangOption, FlangOption]>, Group<f_Group>;
+def ffast_real_mod : Flag<["-"], "ffast-real-mod">, Visibility<[FlangOption, FC1Option]>, Group<f_Group>;


Can you please also add -fno-fast-real-mod?

I have added it to my local copy for now and will push it with my next update.

tblah

Thanks for the updates. LGTM once Slava and Tarun are happy

vzakhari

Thank you!

mjklemm added 9 commits September 15, 2025 16:02

Add first rough implementation of -ffast-real-mod

8f9868d

Add command line flag

ecae88c

Pass -ffast-real-mod via MLIR module attribute to code-gen

85b14eb

Clean up code

d32863a

Add test

d06a1ad

Improve test and add kind=16 test

313abd0

Don't use hard-coded register numbers

2572cc0

Honor -ffast-math when present

7cc56df

Remove unwanted changes

ed68857

mjklemm requested review from agozillon, kiranchandramohan, kparzysz and tblah September 25, 2025 08:22

mjklemm self-assigned this Sep 25, 2025

llvmbot added the clang, clang:driver, flang:driver, flang, and flang:fir-hlfir labels Sep 25, 2025

tblah reviewed Sep 25, 2025

Format code

afc2063

tarunprabhu reviewed Sep 25, 2025

mjklemm added 3 commits September 25, 2025 15:49

Follow suit of the test in flang/Lower/Intrinsics/mod.f90

3b392c8

Address reviewer comments

6d5836c

Add Flang driver check for -ffast-real-mod

5c8304d

vzakhari reviewed Sep 25, 2025

Add -fno-fast-real-mod

52c48db

tblah approved these changes Sep 30, 2025

tarunprabhu approved these changes Sep 30, 2025

Put the MOD optimization under AFN and add -fno-fast-real-mod

d7beb16

vzakhari approved these changes Sep 30, 2025

mjklemm merged commit 8aa64ed into llvm:main Oct 2, 2025
9 checks passed
