-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). #117506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
c274eea
241c47c
e3d3754
9c652ec
3a064e7
ba15842
9cae5f7
45042a4
6c169e0
b082242
b4670f6
72c1a60
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -2760,6 +2760,15 @@ LoopVectorizationCostModel::getVectorIntrinsicCost(CallInst *CI, | |||||
return TTI.getIntrinsicInstrCost(CostAttrs, CostKind); | ||||||
} | ||||||
|
||||||
static VPBasicBlock *getHeaderForMainVectorLoop(VPlan &Plan, | ||||||
VPDominatorTree &VPDT) { | ||||||
return find_singleton<VPBasicBlock>( | ||||||
vp_depth_first_shallow(Plan.getEntry()), [&VPDT](VPBlockBase *VPB, bool) { | ||||||
auto *VPBB = dyn_cast<VPBasicBlock>(VPB); | ||||||
return VPBB && VPBB->isHeader(VPDT) ? VPBB : nullptr; | ||||||
}); | ||||||
} | ||||||
|
||||||
void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) { | ||||||
// Fix widened non-induction PHIs by setting up the PHI operands. | ||||||
if (EnableVPlanNativePath) | ||||||
|
@@ -2778,13 +2787,13 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) { | |||||
PSE.getSE()->forgetLoop(OrigLoop); | ||||||
PSE.getSE()->forgetBlockAndLoopDispositions(); | ||||||
|
||||||
// Don't apply optimizations below when no vector region remains, as they all | ||||||
// require a vector loop at the moment. | ||||||
if (!State.Plan->getVectorLoopRegion()) | ||||||
// Don't apply optimizations below when no vector loop remains, as they all | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
this no longer checks if the loop is vector or not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated thanks |
||||||
// require one at the moment. | ||||||
VPBasicBlock *HeaderVPBB = | ||||||
getHeaderForMainVectorLoop(*State.Plan, State.VPDT); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that here we could ask if LI->isHeader() instead as we'll be calling LI->getLoopFor() next, i.e., rely on the IRBB's generated from VPBB's and their mappings in VPBB2IRBB and LI. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could, although it wouldnt' be much simpler? I left it as is for now |
||||||
if (!HeaderVPBB) | ||||||
return; | ||||||
|
||||||
VPRegionBlock *VectorRegion = State.Plan->getVectorLoopRegion(); | ||||||
VPBasicBlock *HeaderVPBB = VectorRegion->getEntryBasicBlock(); | ||||||
BasicBlock *HeaderBB = State.CFG.VPBB2IRBB[HeaderVPBB]; | ||||||
|
||||||
// Remove redundant induction instructions. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could cse() be applied as a VPlan2VPlan transformation rather than to the generated IR below? Possibly even before dismantling regions, when header blocks are easier to find. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, this should also be moved to a VPlan-transform. The limitation to the loop region isn't really material for the VPlan version. |
||||||
|
@@ -2809,7 +2818,7 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) { | |||||
} | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could setProfileInfoAfterUnrolling() be applied to the branch recipe as part of its VPIRMetadata, rather than above to the branch instruction it generates? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, but it would also need adding metadata to the branch in the scalar loop. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Which calls for expanding VPlan's scope... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, although it will require modifying the original scalar loop in VPlan, which probably needs more thought what kind of modifications we need. Maybe adding metadata would be sufficient? |
||||||
|
||||||
void InnerLoopVectorizer::fixNonInductionPHIs(VPTransformState &State) { | ||||||
auto Iter = vp_depth_first_deep(Plan.getEntry()); | ||||||
auto Iter = vp_depth_first_shallow(Plan.getEntry()); | ||||||
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) { | ||||||
for (VPRecipeBase &P : VPBB->phis()) { | ||||||
VPWidenPHIRecipe *VPPhi = dyn_cast<VPWidenPHIRecipe>(&P); | ||||||
|
@@ -7799,6 +7808,9 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan( | |||||
BestVPlan, BestVF, | ||||||
TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector)); | ||||||
VPlanTransforms::removeDeadRecipes(BestVPlan); | ||||||
|
||||||
VPBasicBlock *MiddleVPBB = | ||||||
BestVPlan.getVectorLoopRegion() ? BestVPlan.getMiddleBlock() : nullptr; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Introduce closer to use - why hoist (from line 7964 below)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is done here, just before disolving the region. To sink it, we would need to find the middle-block w/o region. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, right. Deserves a comment. Also worth noting phase ordering aspects of disolveLoopRegions() - intentionally after optimizeForVFAndUF()? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done thanks. |
||||||
VPlanTransforms::convertToConcreteRecipes(BestVPlan, | ||||||
*Legal->getWidestInductionType()); | ||||||
|
||||||
|
@@ -7894,14 +7906,14 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan( | |||||
// 2.6. Maintain Loop Hints | ||||||
// Keep all loop hints from the original loop on the vector loop (we'll | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could the Loop Hints be captured in VPIRMetadata? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That should be possible, can do separately, thanks |
||||||
// replace the vectorizer-specific hints below). | ||||||
if (auto *LoopRegion = BestVPlan.getVectorLoopRegion()) { | ||||||
VPBasicBlock *HeaderVPBB = getHeaderForMainVectorLoop(BestVPlan, State.VPDT); | ||||||
if (HeaderVPBB) { | ||||||
MDNode *OrigLoopID = OrigLoop->getLoopID(); | ||||||
|
||||||
std::optional<MDNode *> VectorizedLoopID = | ||||||
makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll, | ||||||
LLVMLoopVectorizeFollowupVectorized}); | ||||||
|
||||||
VPBasicBlock *HeaderVPBB = LoopRegion->getEntryBasicBlock(); | ||||||
Loop *L = LI->getLoopFor(State.CFG.VPBB2IRBB[HeaderVPBB]); | ||||||
if (VectorizedLoopID) { | ||||||
L->setLoopID(*VectorizedLoopID); | ||||||
|
@@ -7947,8 +7959,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan( | |||||
ILV.printDebugTracesAtEnd(); | ||||||
|
||||||
// 4. Adjust branch weight of the branch in the middle block. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could this branch weight be captured in VPIRMetadata? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, will look into that separately, thanks |
||||||
if (BestVPlan.getVectorLoopRegion()) { | ||||||
auto *MiddleVPBB = BestVPlan.getMiddleBlock(); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why hoist setting MiddleVPBB from here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As above, currently it relies on the region. Could also do a bit more work to find it w/o region if desired |
||||||
if (HeaderVPBB) { | ||||||
auto *MiddleTerm = | ||||||
cast<BranchInst>(State.CFG.VPBB2IRBB[MiddleVPBB]->getTerminator()); | ||||||
if (MiddleTerm->isConditional() && | ||||||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -207,6 +207,11 @@ VPBlockBase *VPBlockBase::getEnclosingBlockWithPredecessors() { | |||||||||||||||||
return Parent->getEnclosingBlockWithPredecessors(); | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
bool VPBasicBlock::isHeader(const VPDominatorTree &VPDT) const { | ||||||||||||||||||
return getPredecessors().size() == 2 && | ||||||||||||||||||
VPDT.dominates(this, getPredecessors()[1]); | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps BlockUtils should provide an Another option to identify header blocks after dismantling regions (and possibly also before), rather than depending on external control flow nor dominance info, is by depending on the block's internal content - a block is a header iff its first recipe is a header phi. If VPWidenPHIRecipes and/or VPPhi VPInstructions may reside in header and need to be identified as such (in addition to VPHeaderPhiRecipes), they could be defined as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Region headers for the top region will always have canonical phis, but the header block of the inner loop when doing outer-loop vectorization would not have one. I moved it to VPBlockUtils for now. Not sure if it is worth adding the extra abstractions, just to make it slightly easier to identify header blocks independent of the CFG? Would there be other benefits? |
||||||||||||||||||
VPBasicBlock::iterator VPBasicBlock::getFirstNonPhi() { | ||||||||||||||||||
iterator It = begin(); | ||||||||||||||||||
while (It != end() && It->isPhi()) | ||||||||||||||||||
|
@@ -424,14 +429,18 @@ void VPBasicBlock::connectToPredecessors(VPTransformState &State) { | |||||||||||||||||
for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) { | ||||||||||||||||||
VPBasicBlock *PredVPBB = PredVPBlock->getExitingBasicBlock(); | ||||||||||||||||||
auto &PredVPSuccessors = PredVPBB->getHierarchicalSuccessors(); | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note: getHierarchical*() are still needed due to replicate regions, when VF>1 (but removed when unrolling only, with UF>1, VF=1?) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep |
||||||||||||||||||
BasicBlock *PredBB = CFG.VPBB2IRBB[PredVPBB]; | ||||||||||||||||||
BasicBlock *PredBB = CFG.VPBB2IRBB.lookup(PredVPBB); | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can retrain
Suggested change
? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Restored, thanks |
||||||||||||||||||
if (!PredBB) | ||||||||||||||||||
continue; | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: early-continue asap
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Documented, this is when executing the header. Made the check more explicit and moved out of the loop. |
||||||||||||||||||
|
||||||||||||||||||
assert(PredBB && "Predecessor basic-block not found building successor."); | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: can alternatively introduce an |
||||||||||||||||||
auto *PredBBTerminator = PredBB->getTerminator(); | ||||||||||||||||||
LLVM_DEBUG(dbgs() << "LV: draw edge from" << PredBB->getName() << '\n'); | ||||||||||||||||||
|
||||||||||||||||||
auto *TermBr = dyn_cast<BranchInst>(PredBBTerminator); | ||||||||||||||||||
if (isa<UnreachableInst>(PredBBTerminator)) { | ||||||||||||||||||
if (PredVPSuccessors.size() == 2) | ||||||||||||||||||
continue; | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add comment explaining this newly ignored case? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Check not needed for the latest version, removed thanks. |
||||||||||||||||||
assert(PredVPSuccessors.size() == 1 && | ||||||||||||||||||
"Predecessor ending w/o branch must have single successor."); | ||||||||||||||||||
DebugLoc DL = PredBBTerminator->getDebugLoc(); | ||||||||||||||||||
|
@@ -487,11 +496,25 @@ void VPBasicBlock::execute(VPTransformState *State) { | |||||||||||||||||
bool Replica = bool(State->Lane); | ||||||||||||||||||
BasicBlock *NewBB = State->CFG.PrevBB; // Reuse it if possible. | ||||||||||||||||||
|
||||||||||||||||||
if (isHeader(State->VPDT)) { | ||||||||||||||||||
// Create and register the new vector loop. | ||||||||||||||||||
Loop *PrevParentLoop = State->CurrentParentLoop; | ||||||||||||||||||
State->CurrentParentLoop = State->LI->AllocateLoop(); | ||||||||||||||||||
|
||||||||||||||||||
// Insert the new loop into the loop nest and register the new basic blocks | ||||||||||||||||||
// before calling any utilities such as SCEV that require valid LoopInfo. | ||||||||||||||||||
if (PrevParentLoop) | ||||||||||||||||||
PrevParentLoop->addChildLoop(State->CurrentParentLoop); | ||||||||||||||||||
else | ||||||||||||||||||
State->LI->addTopLevelLoop(State->CurrentParentLoop); | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
auto IsReplicateRegion = [](VPBlockBase *BB) { | ||||||||||||||||||
auto *R = dyn_cast_or_null<VPRegionBlock>(BB); | ||||||||||||||||||
return R && R->isReplicator(); | ||||||||||||||||||
assert((!R || R->isReplicator()) && | ||||||||||||||||||
"only replicate region blocks should remain"); | ||||||||||||||||||
return R; | ||||||||||||||||||
}; | ||||||||||||||||||
|
||||||||||||||||||
// 1. Create an IR basic block. | ||||||||||||||||||
if ((Replica && this == getParent()->getEntry()) || | ||||||||||||||||||
IsReplicateRegion(getSingleHierarchicalPredecessor())) { | ||||||||||||||||||
|
@@ -514,6 +537,14 @@ void VPBasicBlock::execute(VPTransformState *State) { | |||||||||||||||||
|
||||||||||||||||||
// 2. Fill the IR basic block with IR instructions. | ||||||||||||||||||
executeRecipes(State, NewBB); | ||||||||||||||||||
|
||||||||||||||||||
// If this block is a latch, update CurrentParentLoop. | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Checking if current canonical block There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Simplified thanks There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Wrapping in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done thanks |
||||||||||||||||||
if (any_of(getSuccessors(), [State, this](VPBlockBase *Succ) { | ||||||||||||||||||
auto *VPBB = dyn_cast<VPBasicBlock>(Succ); | ||||||||||||||||||
return VPBB && VPBB->isHeader(State->VPDT) && | ||||||||||||||||||
State->VPDT.dominates(Succ, this); | ||||||||||||||||||
})) | ||||||||||||||||||
State->CurrentParentLoop = State->CurrentParentLoop->getParentLoop(); | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
VPBasicBlock *VPBasicBlock::clone() { | ||||||||||||||||||
|
@@ -725,35 +756,13 @@ VPRegionBlock *VPRegionBlock::clone() { | |||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
void VPRegionBlock::execute(VPTransformState *State) { | ||||||||||||||||||
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> | ||||||||||||||||||
RPOT(Entry); | ||||||||||||||||||
|
||||||||||||||||||
if (!isReplicator()) { | ||||||||||||||||||
// Create and register the new vector loop. | ||||||||||||||||||
Loop *PrevParentLoop = State->CurrentParentLoop; | ||||||||||||||||||
State->CurrentParentLoop = State->LI->AllocateLoop(); | ||||||||||||||||||
|
||||||||||||||||||
// Insert the new loop into the loop nest and register the new basic blocks | ||||||||||||||||||
// before calling any utilities such as SCEV that require valid LoopInfo. | ||||||||||||||||||
if (PrevParentLoop) | ||||||||||||||||||
PrevParentLoop->addChildLoop(State->CurrentParentLoop); | ||||||||||||||||||
else | ||||||||||||||||||
State->LI->addTopLevelLoop(State->CurrentParentLoop); | ||||||||||||||||||
|
||||||||||||||||||
// Visit the VPBlocks connected to "this", starting from it. | ||||||||||||||||||
for (VPBlockBase *Block : RPOT) { | ||||||||||||||||||
LLVM_DEBUG(dbgs() << "LV: VPBlock in RPO " << Block->getName() << '\n'); | ||||||||||||||||||
Block->execute(State); | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
State->CurrentParentLoop = PrevParentLoop; | ||||||||||||||||||
return; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
assert(isReplicator() && | ||||||||||||||||||
"Loop regions should have been lowered to plain CFG"); | ||||||||||||||||||
assert(!State->Lane && "Replicating a Region with non-null instance."); | ||||||||||||||||||
|
||||||||||||||||||
// Enter replicating mode. | ||||||||||||||||||
assert(!State->VF.isScalable() && "VF is assumed to be non scalable."); | ||||||||||||||||||
|
||||||||||||||||||
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT( | ||||||||||||||||||
Entry); | ||||||||||||||||||
State->Lane = VPLane(0); | ||||||||||||||||||
for (unsigned Lane = 0, VF = State->VF.getKnownMinValue(); Lane < VF; | ||||||||||||||||||
++Lane) { | ||||||||||||||||||
|
@@ -847,6 +856,22 @@ void VPRegionBlock::print(raw_ostream &O, const Twine &Indent, | |||||||||||||||||
} | ||||||||||||||||||
#endif | ||||||||||||||||||
|
||||||||||||||||||
void VPRegionBlock::removeRegion() { | ||||||||||||||||||
auto *Header = cast<VPBasicBlock>(getEntry()); | ||||||||||||||||||
VPBlockBase *Preheader = getSinglePredecessor(); | ||||||||||||||||||
auto *Exiting = cast<VPBasicBlock>(getExiting()); | ||||||||||||||||||
|
||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done thanks |
||||||||||||||||||
VPBlockBase *Middle = getSingleSuccessor(); | ||||||||||||||||||
VPBlockUtils::disconnectBlocks(Preheader, this); | ||||||||||||||||||
VPBlockUtils::disconnectBlocks(this, Middle); | ||||||||||||||||||
|
||||||||||||||||||
for (VPBlockBase *VPB : vp_depth_first_shallow(Entry)) | ||||||||||||||||||
VPB->setParent(nullptr); | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The new parent should conceptually be the removed region's parent, or Preheader/Middle's parent? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done, thanks |
||||||||||||||||||
|
||||||||||||||||||
VPBlockUtils::connectBlocks(Preheader, Header); | ||||||||||||||||||
VPBlockUtils::connectBlocks(Exiting, Middle); | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should the backedge be added too? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, moved here, thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated to |
||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
VPlan::VPlan(Loop *L) { | ||||||||||||||||||
setEntry(createVPIRBasicBlock(L->getLoopPreheader())); | ||||||||||||||||||
ScalarHeader = createVPIRBasicBlock(L->getHeader()); | ||||||||||||||||||
|
@@ -956,57 +981,57 @@ void VPlan::execute(VPTransformState *State) { | |||||||||||||||||
for (VPBlockBase *Block : RPOT) | ||||||||||||||||||
Block->execute(State); | ||||||||||||||||||
|
||||||||||||||||||
State->CFG.DTU.flush(); | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No need to flush now? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is independent, will submit separately, thanks! |
||||||||||||||||||
|
||||||||||||||||||
auto *LoopRegion = getVectorLoopRegion(); | ||||||||||||||||||
if (!LoopRegion) | ||||||||||||||||||
return; | ||||||||||||||||||
|
||||||||||||||||||
VPBasicBlock *LatchVPBB = LoopRegion->getExitingBasicBlock(); | ||||||||||||||||||
BasicBlock *VectorLatchBB = State->CFG.VPBB2IRBB[LatchVPBB]; | ||||||||||||||||||
|
||||||||||||||||||
// Fix the latch value of canonical, reduction and first-order recurrences | ||||||||||||||||||
// phis in the vector loop. | ||||||||||||||||||
VPBasicBlock *Header = LoopRegion->getEntryBasicBlock(); | ||||||||||||||||||
for (VPRecipeBase &R : Header->phis()) { | ||||||||||||||||||
// Skip phi-like recipes that generate their backedege values themselves. | ||||||||||||||||||
if (isa<VPWidenPHIRecipe>(&R)) | ||||||||||||||||||
for (VPBasicBlock *Header : | ||||||||||||||||||
VPBlockUtils::blocksOnly<VPBasicBlock>(vp_depth_first_shallow(Entry))) { | ||||||||||||||||||
if (!Header->isHeader(State->VPDT)) | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If a single header is expected, get it? Helps reduce the diff/indent changes, possibly leaving the extension to multiple headers for a follow-up patch, if/when needed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, moved getSingleLoopHeader to VPlanUtils and used here, thanks |
||||||||||||||||||
continue; | ||||||||||||||||||
for (VPRecipeBase &R : Header->phis()) { | ||||||||||||||||||
if (isa<VPWidenPHIRecipe>(&R)) | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. retain?
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should remain unchanged now |
||||||||||||||||||
continue; | ||||||||||||||||||
|
||||||||||||||||||
if (isa<VPWidenInductionRecipe>(&R)) { | ||||||||||||||||||
PHINode *Phi = nullptr; | ||||||||||||||||||
if (isa<VPWidenIntOrFpInductionRecipe>(&R)) { | ||||||||||||||||||
Phi = cast<PHINode>(State->get(R.getVPSingleValue())); | ||||||||||||||||||
} else { | ||||||||||||||||||
auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R); | ||||||||||||||||||
assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) && | ||||||||||||||||||
"recipe generating only scalars should have been replaced"); | ||||||||||||||||||
auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi)); | ||||||||||||||||||
Phi = cast<PHINode>(GEP->getPointerOperand()); | ||||||||||||||||||
auto *LatchVPBB = cast<VPBasicBlock>(Header->getPredecessors()[1]); | ||||||||||||||||||
BasicBlock *VectorLatchBB = State->CFG.VPBB2IRBB[LatchVPBB]; | ||||||||||||||||||
|
||||||||||||||||||
if (isa<VPWidenInductionRecipe>(&R)) { | ||||||||||||||||||
PHINode *Phi = nullptr; | ||||||||||||||||||
if (isa<VPWidenIntOrFpInductionRecipe>(&R)) { | ||||||||||||||||||
Phi = cast<PHINode>(State->get(R.getVPSingleValue())); | ||||||||||||||||||
} else { | ||||||||||||||||||
auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R); | ||||||||||||||||||
assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) && | ||||||||||||||||||
"recipe generating only scalars should have been replaced"); | ||||||||||||||||||
auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi)); | ||||||||||||||||||
Phi = cast<PHINode>(GEP->getPointerOperand()); | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
Phi->setIncomingBlock(1, VectorLatchBB); | ||||||||||||||||||
|
||||||||||||||||||
// Move the last step to the end of the latch block. This ensures | ||||||||||||||||||
// consistent placement of all induction updates. | ||||||||||||||||||
Instruction *Inc = cast<Instruction>(Phi->getIncomingValue(1)); | ||||||||||||||||||
Inc->moveBefore( | ||||||||||||||||||
std::prev(VectorLatchBB->getTerminator()->getIterator())); | ||||||||||||||||||
|
||||||||||||||||||
// Use the steps for the last part as backedge value for the induction. | ||||||||||||||||||
if (auto *IV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R)) | ||||||||||||||||||
Inc->setOperand(0, State->get(IV->getLastUnrolledPartOperand())); | ||||||||||||||||||
continue; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
Phi->setIncomingBlock(1, VectorLatchBB); | ||||||||||||||||||
|
||||||||||||||||||
// Move the last step to the end of the latch block. This ensures | ||||||||||||||||||
// consistent placement of all induction updates. | ||||||||||||||||||
Instruction *Inc = cast<Instruction>(Phi->getIncomingValue(1)); | ||||||||||||||||||
Inc->moveBefore(std::prev(VectorLatchBB->getTerminator()->getIterator())); | ||||||||||||||||||
|
||||||||||||||||||
// Use the steps for the last part as backedge value for the induction. | ||||||||||||||||||
if (auto *IV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R)) | ||||||||||||||||||
Inc->setOperand(0, State->get(IV->getLastUnrolledPartOperand())); | ||||||||||||||||||
continue; | ||||||||||||||||||
auto *PhiR = cast<VPSingleDefRecipe>(&R); | ||||||||||||||||||
// VPInstructions currently model scalar Phis only. | ||||||||||||||||||
bool NeedsScalar = isa<VPInstruction>(PhiR) || | ||||||||||||||||||
(isa<VPReductionPHIRecipe>(PhiR) && | ||||||||||||||||||
cast<VPReductionPHIRecipe>(PhiR)->isInLoop()); | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use dyn_cast instead of isa/cast There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. dyn_cast would need a separate assignment I think? Left as is for now, as this just moves the existing code There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It doesn't even move the existing code, merely indents it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changes should be gone now, thanks |
||||||||||||||||||
|
||||||||||||||||||
Value *Phi = State->get(PhiR, NeedsScalar); | ||||||||||||||||||
// VPHeaderPHIRecipe supports getBackedgeValue() but VPInstruction does | ||||||||||||||||||
// not. | ||||||||||||||||||
Value *Val = State->get(PhiR->getOperand(1), NeedsScalar); | ||||||||||||||||||
cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB); | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
auto *PhiR = cast<VPSingleDefRecipe>(&R); | ||||||||||||||||||
// VPInstructions currently model scalar Phis only. | ||||||||||||||||||
bool NeedsScalar = isa<VPInstruction>(PhiR) || | ||||||||||||||||||
(isa<VPReductionPHIRecipe>(PhiR) && | ||||||||||||||||||
cast<VPReductionPHIRecipe>(PhiR)->isInLoop()); | ||||||||||||||||||
Value *Phi = State->get(PhiR, NeedsScalar); | ||||||||||||||||||
// VPHeaderPHIRecipe supports getBackedgeValue() but VPInstruction does not. | ||||||||||||||||||
Value *Val = State->get(PhiR->getOperand(1), NeedsScalar); | ||||||||||||||||||
cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB); | ||||||||||||||||||
} | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
|
@@ -1365,16 +1390,16 @@ void VPlanPrinter::dumpRegion(const VPRegionBlock *Region) { | |||||||||||||||||
|
||||||||||||||||||
#endif | ||||||||||||||||||
|
||||||||||||||||||
/// Returns true if there is a vector loop region and \p VPV is defined in a | ||||||||||||||||||
/// loop region. | ||||||||||||||||||
static bool isDefinedInsideLoopRegions(const VPValue *VPV) { | ||||||||||||||||||
const VPRecipeBase *DefR = VPV->getDefiningRecipe(); | ||||||||||||||||||
return DefR && (!DefR->getParent()->getPlan()->getVectorLoopRegion() || | ||||||||||||||||||
DefR->getParent()->getEnclosingLoopRegion()); | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
bool VPValue::isDefinedOutsideLoopRegions() const { | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
whether loop is represented as a region or not? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated, thanks |
||||||||||||||||||
return !isDefinedInsideLoopRegions(this); | ||||||||||||||||||
auto *DefR = getDefiningRecipe(); | ||||||||||||||||||
if (!DefR) | ||||||||||||||||||
return true; | ||||||||||||||||||
|
||||||||||||||||||
const VPBasicBlock *DefVPBB = DefR->getParent(); | ||||||||||||||||||
auto *Plan = DefVPBB->getPlan(); | ||||||||||||||||||
if (Plan->getVectorLoopRegion()) | ||||||||||||||||||
return !DefR->getParent()->getEnclosingLoopRegion(); | ||||||||||||||||||
return DefVPBB == Plan->getEntry(); | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The only out-of-loop place that values can be defined in when regions are gone - is in Plan's entry block? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. All relevant hoisting should already have happened before disolving the regions; updated to only check non-live-ins if the top-level loop region still exists. |
||||||||||||||||||
} | ||||||||||||||||||
void VPValue::replaceAllUsesWith(VPValue *New) { | ||||||||||||||||||
replaceUsesWithIf(New, [](VPUser &, unsigned) { return true; }); | ||||||||||||||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -3415,6 +3415,9 @@ class VPBasicBlock : public VPBlockBase { | |||||
/// second predecessor is the exiting block of the region. | ||||||
const VPBasicBlock *getCFGPredecessor(unsigned Idx) const; | ||||||
|
||||||
/// Returns true if the block is a loop header in a plain-CFG VPlan. | ||||||
bool isHeader(const VPDominatorTree &VPDT) const; | ||||||
|
||||||
protected: | ||||||
/// Execute the recipes in the IR basic block \p BB. | ||||||
void executeRecipes(VPTransformState *State, BasicBlock *BB); | ||||||
|
@@ -3566,6 +3569,10 @@ class VPRegionBlock : public VPBlockBase { | |||||
/// Clone all blocks in the single-entry single-exit region of the block and | ||||||
/// their recipes without updating the operands of the cloned recipes. | ||||||
VPRegionBlock *clone() override; | ||||||
|
||||||
/// Remove the current region from its VPlan, connecting its predecessor to | ||||||
/// its entry and exiting block to its successor. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated ,thanks |
||||||
void removeRegion(); | ||||||
}; | ||||||
|
||||||
/// VPlan models a candidate for vectorization, encoding various decisions take | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? As this is regardless of loop being vector, main or otherwise. I.e., works until epilog and/or scalar remainder/fallback loops join main vector loop in same Plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to vputils, now this utility is also used for native path, where there may be multiple headers; Adjusted the name to
getTopLevelVectorLoopHeader