Merged
21 commits
6677b2f
Add pass which forwards unimplemented math builtins / libcalls to the…
AlexVlx May 15, 2025
a341315
Fix formatting.
AlexVlx May 15, 2025
a833e95
Add missing whitespace.
AlexVlx May 16, 2025
19194c5
Merge branch 'main' of https://github.com/llvm/llvm-project into hips…
AlexVlx May 16, 2025
057a86a
Less `auto`.
AlexVlx May 16, 2025
c598938
Auto-generate test.
AlexVlx May 20, 2025
c7f7886
Remove spurious use of `move`.
AlexVlx May 20, 2025
807f048
Refactor prefix replacement.
AlexVlx May 20, 2025
375c2f4
Use structured bindings.
AlexVlx May 22, 2025
c51a091
Refactor test.
AlexVlx May 22, 2025
db13ca2
Merge branch 'main' of https://github.com/llvm/llvm-project into hips…
AlexVlx Jun 2, 2025
7887edb
Clean up more noise.
AlexVlx Jun 2, 2025
6aa13e2
Merge branch 'main' of https://github.com/llvm/llvm-project into hips…
AlexVlx Jun 10, 2025
2eac122
Merge branch 'main' of https://github.com/llvm/llvm-project into hips…
AlexVlx Jun 13, 2025
7d5a52e
Merge branch 'main' of https://github.com/llvm/llvm-project into hips…
AlexVlx Jun 17, 2025
7aaa719
Merge branch 'main' of https://github.com/llvm/llvm-project into hips…
AlexVlx Jul 15, 2025
1167cc0
Add missing `modf` test.
AlexVlx Jul 15, 2025
e40a70a
Add test for globals used as args to replaced fn.
AlexVlx Jul 15, 2025
5084371
Merge branch 'main' into hipstdpar_math_forwarding
AlexVlx Jul 23, 2025
2767e5c
Update HipStdPar.cpp
AlexVlx Jul 23, 2025
6cd719c
Merge branch 'main' into hipstdpar_math_forwarding
AlexVlx Jul 28, 2025
7 changes: 7 additions & 0 deletions llvm/include/llvm/Transforms/HipStdPar/HipStdPar.h
@@ -41,6 +41,13 @@ class HipStdParAllocationInterpositionPass
static bool isRequired() { return true; }
};

class HipStdParMathFixupPass : public PassInfoMixin<HipStdParMathFixupPass> {
public:
PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);

static bool isRequired() { return true; }
};

} // namespace llvm

#endif // LLVM_TRANSFORMS_HIPSTDPAR_HIPSTDPAR_H
1 change: 1 addition & 0 deletions llvm/lib/Passes/PassRegistry.def
@@ -84,6 +84,7 @@ MODULE_PASS("global-merge-func", GlobalMergeFuncPass())
MODULE_PASS("globalopt", GlobalOptPass())
MODULE_PASS("globalsplit", GlobalSplitPass())
MODULE_PASS("hipstdpar-interpose-alloc", HipStdParAllocationInterpositionPass())
MODULE_PASS("hipstdpar-math-fixup", HipStdParMathFixupPass())
MODULE_PASS("hipstdpar-select-accelerator-code",
HipStdParAcceleratorCodeSelectionPass())
MODULE_PASS("hotcoldsplit", HotColdSplittingPass())
8 changes: 6 additions & 2 deletions llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -836,8 +836,10 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
// When we are not using -fgpu-rdc, we can run accelerator code
// selection relatively early, but still after linking to prevent
// eager removal of potentially reachable symbols.
-if (EnableHipStdPar)
+if (EnableHipStdPar) {
+PM.addPass(HipStdParMathFixupPass());
PM.addPass(HipStdParAcceleratorCodeSelectionPass());
+}
PM.addPass(AMDGPUPrintfRuntimeBindingPass());
}

@@ -916,8 +918,10 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
// selection after linking to prevent, otherwise we end up removing
// potentially reachable symbols that were exported as external in other
// modules.
-if (EnableHipStdPar)
+if (EnableHipStdPar) {
+PM.addPass(HipStdParMathFixupPass());
PM.addPass(HipStdParAcceleratorCodeSelectionPass());
+}
// We want to support the -lto-partitions=N option as "best effort".
// For that, we need to lower LDS earlier in the pipeline before the
// module is partitioned for codegen.
118 changes: 118 additions & 0 deletions llvm/lib/Transforms/HipStdPar/HipStdPar.cpp
@@ -37,6 +37,16 @@
// memory that ends up in one of the runtime equivalents, since this can
// happen if e.g. a library that was compiled without interposition returns
// an allocation that can be validly passed to `free`.
//
// 3. MathFixup (required): Some accelerators might have an incomplete
Contributor

We really need to fix our whole library usage strategy (I'm working on a write-up of how).

// implementation for the intrinsics used to implement some of the math
// functions in <cmath> / their corresponding libcall lowerings. Since this
// can vary quite significantly between accelerators, we replace calls to a
// set of intrinsics / lib functions known to be problematic with calls to a
// HIPSTDPAR specific forwarding layer, which gives a uniform interface for
// accelerators to implement in their own runtime components. This pass
// should run before AcceleratorCodeSelection so as to prevent the spurious
// removal of the HIPSTDPAR specific forwarding functions.
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/HipStdPar/HipStdPar.h"
@@ -49,6 +59,7 @@
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"

@@ -519,3 +530,110 @@ HipStdParAllocationInterpositionPass::run(Module &M, ModuleAnalysisManager&) {

return PreservedAnalyses::none();
}

static constexpr std::pair<StringLiteral, StringLiteral> MathLibToHipStdPar[]{
{"acosh", "__hipstdpar_acosh_f64"},
{"acoshf", "__hipstdpar_acosh_f32"},
{"asinh", "__hipstdpar_asinh_f64"},
{"asinhf", "__hipstdpar_asinh_f32"},
{"atanh", "__hipstdpar_atanh_f64"},
{"atanhf", "__hipstdpar_atanh_f32"},
{"cbrt", "__hipstdpar_cbrt_f64"},
{"cbrtf", "__hipstdpar_cbrt_f32"},
{"erf", "__hipstdpar_erf_f64"},
{"erff", "__hipstdpar_erf_f32"},
{"erfc", "__hipstdpar_erfc_f64"},
{"erfcf", "__hipstdpar_erfc_f32"},
{"fdim", "__hipstdpar_fdim_f64"},
{"fdimf", "__hipstdpar_fdim_f32"},
{"expm1", "__hipstdpar_expm1_f64"},
{"expm1f", "__hipstdpar_expm1_f32"},
{"hypot", "__hipstdpar_hypot_f64"},
{"hypotf", "__hipstdpar_hypot_f32"},
{"ilogb", "__hipstdpar_ilogb_f64"},
{"ilogbf", "__hipstdpar_ilogb_f32"},
{"lgamma", "__hipstdpar_lgamma_f64"},
{"lgammaf", "__hipstdpar_lgamma_f32"},
{"log1p", "__hipstdpar_log1p_f64"},
{"log1pf", "__hipstdpar_log1p_f32"},
{"logb", "__hipstdpar_logb_f64"},
{"logbf", "__hipstdpar_logb_f32"},
{"nextafter", "__hipstdpar_nextafter_f64"},
{"nextafterf", "__hipstdpar_nextafter_f32"},
{"nexttoward", "__hipstdpar_nexttoward_f64"},
{"nexttowardf", "__hipstdpar_nexttoward_f32"},
{"remainder", "__hipstdpar_remainder_f64"},
{"remainderf", "__hipstdpar_remainder_f32"},
{"remquo", "__hipstdpar_remquo_f64"},
{"remquof", "__hipstdpar_remquo_f32"},
{"scalbln", "__hipstdpar_scalbln_f64"},
{"scalblnf", "__hipstdpar_scalbln_f32"},
{"scalbn", "__hipstdpar_scalbn_f64"},
{"scalbnf", "__hipstdpar_scalbn_f32"},
{"tgamma", "__hipstdpar_tgamma_f64"},
{"tgammaf", "__hipstdpar_tgamma_f32"}};

PreservedAnalyses HipStdParMathFixupPass::run(Module &M,
ModuleAnalysisManager &) {
if (M.empty())
return PreservedAnalyses::all();

SmallVector<std::pair<Function *, std::string>> ToReplace;
for (auto &&F : M) {
Contributor

Why &&?

No auto as well

Contributor Author

Because it's a forwarding ref (NOT an rvalue ref), and this is somewhat idiomatic usage in range-for loops. It's also used elsewhere in this file already, so it's self consistent.
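
To illustrate the forwarding-reference point, here is a minimal standalone sketch (not part of the patch). It assumes a plain `std::vector<int>` rather than an LLVM `Module`, and `bumpAll` is a hypothetical name. When the range yields lvalues, `auto &&` collapses to an lvalue reference, so nothing is moved or copied:

```cpp
#include <type_traits>
#include <vector>

// Hypothetical helper, not from the patch: increments every element in place.
inline void bumpAll(std::vector<int> &Values) {
  for (auto &&V : Values) {
    // Iterating an lvalue vector yields int&, so auto&& collapses to int&.
    static_assert(std::is_same_v<decltype(V), int &>);
    ++V;
  }
}
```

The `static_assert` checks the deduced type at compile time; the same loop would bind to a temporary if the range produced prvalues (for example proxy objects), which is why the form is often used generically.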

if (!F.hasName())
continue;
Comment on lines +583 to +584
Contributor

Don't think anonymous functions need special handling

Contributor Author

It saves on doing the lookup for the not_intrinsic case.
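
The lookup being discussed can be sketched outside LLVM roughly as follows. This is an approximation under stated assumptions: `std::string_view` stands in for `StringLiteral` / `StringRef`, only three entries of the patch's table are reproduced, and `MathLibSubset` and `lookupForwardingName` are hypothetical names.

```cpp
#include <algorithm>
#include <iterator>
#include <string_view>
#include <utility>

// Three entries copied from the patch's MathLibToHipStdPar table; the real
// table is longer.
inline constexpr std::pair<std::string_view, std::string_view> MathLibSubset[]{
    {"cbrt", "__hipstdpar_cbrt_f64"},
    {"cbrtf", "__hipstdpar_cbrt_f32"},
    {"erf", "__hipstdpar_erf_f64"}};

// Hypothetical helper: returns the forwarding name, or an empty view when the
// name is not in the table (which includes the empty name of an unnamed
// function).
inline std::string_view lookupForwardingName(std::string_view Name) {
  auto It = std::find_if(std::begin(MathLibSubset), std::end(MathLibSubset),
                         [&](const auto &Entry) { return Entry.first == Name; });
  return It == std::end(MathLibSubset) ? std::string_view{} : It->second;
}
```

In this sketch an empty name simply fails the linear search, so an early exit for unnamed functions only avoids the scan; it does not change the result.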


StringRef N = F.getName();
Intrinsic::ID ID = F.getIntrinsicID();

switch (ID) {
case Intrinsic::not_intrinsic: {
auto It =
find_if(MathLibToHipStdPar, [&](auto &&M) { return M.first == N; });
if (It == std::cend(MathLibToHipStdPar))
Comment on lines +591 to +593
Contributor

Ideally we would cross reference this with the TargetLibraryInfo for whether this is a recognized call. That will always fail for AMDGPU since we have to say we have no library calls.

Failing that, should at least make an effort to respect nobuiltin

continue;
ToReplace.emplace_back(&F, It->second);
break;
}
case Intrinsic::acos:
case Intrinsic::asin:
case Intrinsic::atan:
case Intrinsic::atan2:
case Intrinsic::cosh:
case Intrinsic::modf:
case Intrinsic::sinh:
case Intrinsic::tan:
case Intrinsic::tanh:
break;
default: {
if (F.getReturnType()->isDoubleTy()) {
Contributor

Doesn't handle vectors, which is the main plus of the intrinsics

Contributor Author

Right, but this is (at least for the time being) intended to cover the standard library, and neither the C nor the C++ one have vector support at the moment.

switch (ID) {
case Intrinsic::cos:
case Intrinsic::exp:
case Intrinsic::exp2:
case Intrinsic::log:
case Intrinsic::log10:
case Intrinsic::log2:
case Intrinsic::pow:
case Intrinsic::sin:
break;
default:
continue;
}
break;
}
Comment on lines +608 to +624
Contributor

I'm not sure why not just take the inner switch out and merge it with the outer one? I don't think they are semantically different.

Contributor Author

These are only missing for FP64, so it would have to check per case rather than once per switch. I don't know that one is significantly more readable than the other.
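
For readers following this exchange, the decision that the nested switch encodes can be written as a standalone predicate. This is only a sketch: `MathOp` is a hypothetical stand-in for a handful of `Intrinsic::ID` values, `shouldForward` is a hypothetical name, and `Sqrt` is included purely as an example of an operation that falls through to the default.

```cpp
// Hypothetical stand-in for a few Intrinsic::ID values; not LLVM's enum.
enum class MathOp { Tan, Tanh, Cos, Sin, Sqrt };

// Returns true when a call would be forwarded under the rule discussed above.
inline bool shouldForward(MathOp Op, bool ReturnsDouble) {
  switch (Op) {
  case MathOp::Tan:
  case MathOp::Tanh:
    return true; // forwarded regardless of the return type
  case MathOp::Cos:
  case MathOp::Sin:
    return ReturnsDouble; // forwarded only for FP64
  default:
    return false; // everything else is left alone
  }
}
```

With early returns, the FP64 test is attached to one group of cases. The patch instead keeps a `break` / `continue` structure and tests the return type once before an inner switch.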

continue;
}
}

ToReplace.emplace_back(&F, N);
llvm::replace(ToReplace.back().second, '.', '_');
StringRef Prefix = "llvm";
ToReplace.back().second.replace(0, Prefix.size(), "__hipstdpar");
}
for (auto &&[F, NewF] : ToReplace)
F->replaceAllUsesWith(
M.getOrInsertFunction(NewF, F->getFunctionType()).getCallee());

return PreservedAnalyses::none();
}
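
The renaming step applied to intrinsic names at the end of the loop above can be reproduced with the standard library alone. The sketch below assumes the input begins with `llvm`, as intrinsic names do, and uses `std::replace` in place of `llvm::replace`; `toHipStdParName` is a hypothetical name.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Hypothetical standalone equivalent of the renaming step in the patch.
// Assumes IntrinsicName starts with "llvm", as intrinsic names do.
inline std::string toHipStdParName(std::string_view IntrinsicName) {
  std::string Result(IntrinsicName);
  // Dots become underscores: "llvm.sin.f64" -> "llvm_sin_f64".
  std::replace(Result.begin(), Result.end(), '.', '_');
  // The leading "llvm" is swapped for the forwarding-layer prefix.
  constexpr std::string_view Prefix = "llvm";
  Result.replace(0, Prefix.size(), "__hipstdpar");
  return Result;
}
```

Under these assumptions, `llvm.sin.f64` maps to `__hipstdpar_sin_f64`, which follows the same `__hipstdpar_<name>_<type>` shape as the entries in the libcall table.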