Skip to content

Commit dd14eb8

Browse files
authored
[RISCV] Introduce pass to promote double constants to a global array (#160536)
As discussed in #153402, we have inefficiences in handling constant pool access that are difficult to address. Using an IR pass to promote double constants to a global allows a higher degree of control of code generation for these accesses, resulting in improved performance on benchmarks that might otherwise have high register pressure due to accessing constant pool values separately rather than via a common base. Directly promoting double constants to separate global values and relying on the global merger to do a sensible thing would be one potential avenue to explore, but it is _not_ done in this version of the patch because: * The global merger pass needs fixes. For instance it claims to be a function pass, yet all of the work is done in initialisation. This means that attempts by backends to schedule it after a given module pass don't actually work as expected. * The heuristics used can impact codegen unexpectedly, so I worry that tweaking it to get the behaviour desired for promoted constants may lead to other issues. This may be completely tractable though. Now that #159352 has landed, the impact on terms if dynamically executed instructions is slightly smaller (as we are starting from a better baseline), but still worthwhile in lbm and nab from SPEC. Results below are for rva22u64: ``` Benchmark Baseline This PR Diff (%) ============================================================ ============================================================ 500.perlbench_r 180668945687 180666122417 -0.00% 502.gcc_r 221274522161 221277565086 0.00% 505.mcf_r 134656204033 134656204066 0.00% 508.namd_r 217646645332 216699783858 -0.44% 510.parest_r 291731988950 291916190776 0.06% 511.povray_r 30983594866 31107718817 0.40% 519.lbm_r 91217999812 87405361395 -4.18% 520.omnetpp_r 137699867177 137674535853 -0.02% 523.xalancbmk_r 284730719514 284734023366 0.00% 525.x264_r 379107521547 379100250568 -0.00% 526.blender_r 659391437610 659447919505 0.01% 531.deepsjeng_r 350038121654 350038121656 0.00% 538.imagick_r 238568674979 238560772162 -0.00% 541.leela_r 405660852855 405654701346 -0.00% 544.nab_r 398215801848 391352111262 -1.72% 557.xz_r 129832192047 129832192055 0.00% ``` --- Notes for reviewers: * As discussed at the sync-up meeting, the suggestion is to try to land an incremental improvement to the status quo even if there is more work to be done around the general issue of constant pool handling. We can discuss here if that is actually the best next step or not, but I just wanted to clarify that's why this is being posted with a somewhat narrow scope. * I've disabled transformations both for RV32 and on systems without D as both cases saw some regressions.
1 parent 5fedb7c commit dd14eb8

File tree

6 files changed

+370
-0
lines changed

6 files changed

+370
-0
lines changed

llvm/lib/Target/RISCV/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -58,6 +58,7 @@ add_llvm_target(RISCVCodeGen
5858
RISCVMoveMerger.cpp
5959
RISCVOptWInstrs.cpp
6060
RISCVPostRAExpandPseudoInsts.cpp
61+
RISCVPromoteConstant.cpp
6162
RISCVPushPopOptimizer.cpp
6263
RISCVRedundantCopyElimination.cpp
6364
RISCVRegisterInfo.cpp

llvm/lib/Target/RISCV/RISCV.h

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@
2020
namespace llvm {
2121
class FunctionPass;
2222
class InstructionSelector;
23+
class ModulePass;
2324
class PassRegistry;
2425
class RISCVRegisterBankInfo;
2526
class RISCVSubtarget;
@@ -111,6 +112,9 @@ void initializeRISCVO0PreLegalizerCombinerPass(PassRegistry &);
111112
FunctionPass *createRISCVPreLegalizerCombiner();
112113
void initializeRISCVPreLegalizerCombinerPass(PassRegistry &);
113114

115+
ModulePass *createRISCVPromoteConstantPass();
116+
void initializeRISCVPromoteConstantPass(PassRegistry &);
117+
114118
FunctionPass *createRISCVVLOptimizerPass();
115119
void initializeRISCVVLOptimizerPass(PassRegistry &);
116120

Lines changed: 213 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,213 @@
1+
//==- RISCVPromoteConstant.cpp - Promote constant fp to global for RISC-V --==//
2+
//
3+
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4+
// See https://llvm.org/LICENSE.txt for license information.
5+
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6+
//
7+
//===----------------------------------------------------------------------===//
8+
9+
#include "RISCV.h"
10+
#include "RISCVSubtarget.h"
11+
#include "llvm/ADT/DenseMap.h"
12+
#include "llvm/ADT/SmallVector.h"
13+
#include "llvm/ADT/Statistic.h"
14+
#include "llvm/CodeGen/TargetLowering.h"
15+
#include "llvm/CodeGen/TargetPassConfig.h"
16+
#include "llvm/IR/BasicBlock.h"
17+
#include "llvm/IR/Constant.h"
18+
#include "llvm/IR/Constants.h"
19+
#include "llvm/IR/Function.h"
20+
#include "llvm/IR/GlobalValue.h"
21+
#include "llvm/IR/GlobalVariable.h"
22+
#include "llvm/IR/IRBuilder.h"
23+
#include "llvm/IR/InstIterator.h"
24+
#include "llvm/IR/Instruction.h"
25+
#include "llvm/IR/Instructions.h"
26+
#include "llvm/IR/IntrinsicInst.h"
27+
#include "llvm/IR/Module.h"
28+
#include "llvm/IR/Type.h"
29+
#include "llvm/InitializePasses.h"
30+
#include "llvm/Pass.h"
31+
#include "llvm/Support/Casting.h"
32+
#include "llvm/Support/Debug.h"
33+
34+
using namespace llvm;
35+
36+
#define DEBUG_TYPE "riscv-promote-const"
37+
#define RISCV_PROMOTE_CONSTANT_NAME "RISC-V Promote Constants"
38+
39+
STATISTIC(NumPromoted, "Number of constant literals promoted to globals");
40+
STATISTIC(NumPromotedUses, "Number of uses of promoted literal constants");
41+
42+
namespace {
43+
44+
class RISCVPromoteConstant : public ModulePass {
45+
public:
46+
static char ID;
47+
RISCVPromoteConstant() : ModulePass(ID) {}
48+
49+
StringRef getPassName() const override { return RISCV_PROMOTE_CONSTANT_NAME; }
50+
51+
void getAnalysisUsage(AnalysisUsage &AU) const override {
52+
AU.addRequired<TargetPassConfig>();
53+
AU.setPreservesCFG();
54+
}
55+
56+
/// Iterate over the functions and promote the double fp constants that
57+
/// would otherwise go into the constant pool to a constant array.
58+
bool runOnModule(Module &M) override {
59+
if (skipModule(M))
60+
return false;
61+
// TargetMachine and Subtarget are needed to query isFPImmlegal.
62+
const TargetPassConfig &TPC = getAnalysis<TargetPassConfig>();
63+
const TargetMachine &TM = TPC.getTM<TargetMachine>();
64+
bool Changed = false;
65+
for (Function &F : M) {
66+
const RISCVSubtarget &ST = TM.getSubtarget<RISCVSubtarget>(F);
67+
const RISCVTargetLowering *TLI = ST.getTargetLowering();
68+
Changed |= runOnFunction(F, TLI);
69+
}
70+
return Changed;
71+
}
72+
73+
private:
74+
bool runOnFunction(Function &F, const RISCVTargetLowering *TLI);
75+
};
76+
} // end anonymous namespace
77+
78+
char RISCVPromoteConstant::ID = 0;
79+
80+
INITIALIZE_PASS(RISCVPromoteConstant, DEBUG_TYPE, RISCV_PROMOTE_CONSTANT_NAME,
81+
false, false)
82+
83+
ModulePass *llvm::createRISCVPromoteConstantPass() {
84+
return new RISCVPromoteConstant();
85+
}
86+
87+
bool RISCVPromoteConstant::runOnFunction(Function &F,
88+
const RISCVTargetLowering *TLI) {
89+
if (F.hasOptNone() || F.hasOptSize())
90+
return false;
91+
92+
// Bail out and make no transformation if the target doesn't support
93+
// doubles, or if we're not targeting RV64 as we currently see some
94+
// regressions for those targets.
95+
if (!TLI->isTypeLegal(MVT::f64) || !TLI->isTypeLegal(MVT::i64))
96+
return false;
97+
98+
// Collect all unique double constants and their uses in the function. Use
99+
// MapVector to preserve insertion order.
100+
MapVector<ConstantFP *, SmallVector<Use *, 8>> ConstUsesMap;
101+
102+
for (Instruction &I : instructions(F)) {
103+
for (Use &U : I.operands()) {
104+
auto *C = dyn_cast<ConstantFP>(U.get());
105+
if (!C || !C->getType()->isDoubleTy())
106+
continue;
107+
// Do not promote if it wouldn't be loaded from the constant pool.
108+
if (TLI->isFPImmLegal(C->getValueAPF(), MVT::f64,
109+
/*ForCodeSize=*/false))
110+
continue;
111+
// Do not promote a constant if it is used as an immediate argument
112+
// for an intrinsic.
113+
if (auto *II = dyn_cast<IntrinsicInst>(U.getUser())) {
114+
Function *IntrinsicFunc = II->getFunction();
115+
unsigned OperandIdx = U.getOperandNo();
116+
if (IntrinsicFunc && IntrinsicFunc->getAttributes().hasParamAttr(
117+
OperandIdx, Attribute::ImmArg)) {
118+
LLVM_DEBUG(dbgs() << "Skipping promotion of constant in: " << *II
119+
<< " because operand " << OperandIdx
120+
<< " must be an immediate.\n");
121+
continue;
122+
}
123+
}
124+
// Note: FP args to inline asm would be problematic if we had a
125+
// constraint that required an immediate floating point operand. At the
126+
// time of writing LLVM doesn't recognise such a constraint.
127+
ConstUsesMap[C].push_back(&U);
128+
}
129+
}
130+
131+
int PromotableConstants = ConstUsesMap.size();
132+
LLVM_DEBUG(dbgs() << "Found " << PromotableConstants
133+
<< " promotable constants in " << F.getName() << "\n");
134+
// Bail out if no promotable constants found, or if only one is found.
135+
if (PromotableConstants < 2) {
136+
LLVM_DEBUG(dbgs() << "Performing no promotions as insufficient promotable "
137+
"constants found\n");
138+
return false;
139+
}
140+
141+
NumPromoted += PromotableConstants;
142+
143+
// Create a global array containing the promoted constants.
144+
Module *M = F.getParent();
145+
Type *DoubleTy = Type::getDoubleTy(M->getContext());
146+
147+
SmallVector<Constant *, 16> ConstantVector;
148+
for (auto const &Pair : ConstUsesMap)
149+
ConstantVector.push_back(Pair.first);
150+
151+
ArrayType *ArrayTy = ArrayType::get(DoubleTy, ConstantVector.size());
152+
Constant *GlobalArrayInitializer =
153+
ConstantArray::get(ArrayTy, ConstantVector);
154+
155+
auto *GlobalArray = new GlobalVariable(
156+
*M, ArrayTy,
157+
/*isConstant=*/true, GlobalValue::InternalLinkage, GlobalArrayInitializer,
158+
".promoted_doubles." + F.getName());
159+
160+
// A cache to hold the loaded value for a given constant within a basic block.
161+
DenseMap<std::pair<ConstantFP *, BasicBlock *>, Value *> LocalLoads;
162+
163+
// Replace all uses with the loaded value.
164+
unsigned Idx = 0;
165+
for (auto const &Pair : ConstUsesMap) {
166+
ConstantFP *Const = Pair.first;
167+
const SmallVector<Use *, 8> &Uses = Pair.second;
168+
169+
for (Use *U : Uses) {
170+
Instruction *UserInst = cast<Instruction>(U->getUser());
171+
BasicBlock *InsertionBB;
172+
173+
// If the user is a PHI node, we must insert the load in the
174+
// corresponding predecessor basic block. Otherwise, it's inserted into
175+
// the same block as the use.
176+
if (auto *PN = dyn_cast<PHINode>(UserInst))
177+
InsertionBB = PN->getIncomingBlock(*U);
178+
else
179+
InsertionBB = UserInst->getParent();
180+
181+
if (isa<CatchSwitchInst>(InsertionBB->getTerminator())) {
182+
LLVM_DEBUG(dbgs() << "Bailing out: catchswitch means thre is no valid "
183+
"insertion point.\n");
184+
return false;
185+
}
186+
187+
auto CacheKey = std::make_pair(Const, InsertionBB);
188+
Value *LoadedVal = nullptr;
189+
190+
// Re-use a load if it exists in the insertion block.
191+
if (LocalLoads.count(CacheKey)) {
192+
LoadedVal = LocalLoads.at(CacheKey);
193+
} else {
194+
// Otherwise, create a new GEP and Load at the correct insertion point.
195+
// It is always safe to insert in the first insertion point in the BB,
196+
// so do that and let other passes reorder.
197+
IRBuilder<> Builder(InsertionBB, InsertionBB->getFirstInsertionPt());
198+
Value *ElementPtr = Builder.CreateConstInBoundsGEP2_64(
199+
GlobalArray->getValueType(), GlobalArray, 0, Idx, "double.addr");
200+
LoadedVal = Builder.CreateLoad(DoubleTy, ElementPtr, "double.val");
201+
202+
// Cache the newly created load for this block.
203+
LocalLoads[CacheKey] = LoadedVal;
204+
}
205+
206+
U->set(LoadedVal);
207+
++NumPromotedUses;
208+
}
209+
++Idx;
210+
}
211+
212+
return true;
213+
}

llvm/lib/Target/RISCV/RISCVTargetMachine.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -139,6 +139,7 @@ extern "C" LLVM_ABI LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTarget() {
139139
initializeRISCVExpandAtomicPseudoPass(*PR);
140140
initializeRISCVRedundantCopyEliminationPass(*PR);
141141
initializeRISCVAsmPrinterPass(*PR);
142+
initializeRISCVPromoteConstantPass(*PR);
142143
}
143144

144145
static Reloc::Model getEffectiveRelocModel(std::optional<Reloc::Model> RM) {
@@ -462,6 +463,8 @@ void RISCVPassConfig::addIRPasses() {
462463
}
463464

464465
bool RISCVPassConfig::addPreISel() {
466+
if (TM->getOptLevel() != CodeGenOptLevel::None)
467+
addPass(createRISCVPromoteConstantPass());
465468
if (TM->getOptLevel() != CodeGenOptLevel::None) {
466469
// Add a barrier before instruction selection so that we will not get
467470
// deleted block address after enabling default outlining. See D99707 for

llvm/test/CodeGen/RISCV/O3-pipeline.ll

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,7 @@
7575
; CHECK-NEXT: CodeGen Prepare
7676
; CHECK-NEXT: Dominator Tree Construction
7777
; CHECK-NEXT: Exception handling preparation
78+
; CHECK-NEXT: RISC-V Promote Constants
7879
; CHECK-NEXT: A No-Op Barrier Pass
7980
; CHECK-NEXT: FunctionPass Manager
8081
; CHECK-NEXT: Merge internal globals

0 commit comments

Comments
 (0)