Skip to content

[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). #117506

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
May 24, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 21 additions & 10 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2760,6 +2760,15 @@ LoopVectorizationCostModel::getVectorIntrinsicCost(CallInst *CI,
return TTI.getIntrinsicInstrCost(CostAttrs, CostKind);
}

static VPBasicBlock *getHeaderForMainVectorLoop(VPlan &Plan,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static VPBasicBlock *getHeaderForMainVectorLoop(VPlan &Plan,
static VPBasicBlock *getSingleLoopHeader(VPlan &Plan,

? As this is regardless of loop being vector, main or otherwise. I.e., works until epilog and/or scalar remainder/fallback loops join main vector loop in same Plan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to vputils, now this utility is also used for native path, where there may be multiple headers; Adjusted the name to getTopLevelVectorLoopHeader

VPDominatorTree &VPDT) {
return find_singleton<VPBasicBlock>(
vp_depth_first_shallow(Plan.getEntry()), [&VPDT](VPBlockBase *VPB, bool) {
auto *VPBB = dyn_cast<VPBasicBlock>(VPB);
return VPBB && VPBB->isHeader(VPDT) ? VPBB : nullptr;
});
}

void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
// Fix widened non-induction PHIs by setting up the PHI operands.
if (EnableVPlanNativePath)
Expand All @@ -2778,13 +2787,13 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
PSE.getSE()->forgetLoop(OrigLoop);
PSE.getSE()->forgetBlockAndLoopDispositions();

// Don't apply optimizations below when no vector region remains, as they all
// require a vector loop at the moment.
if (!State.Plan->getVectorLoopRegion())
// Don't apply optimizations below when no vector loop remains, as they all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Don't apply optimizations below when no vector loop remains, as they all
// Don't apply optimizations below when no (vector) loop remains, as they all

this no longer checks if the loop is vector or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated thanks

// require one at the moment.
VPBasicBlock *HeaderVPBB =
getHeaderForMainVectorLoop(*State.Plan, State.VPDT);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that here we could ask if LI->isHeader() instead as we'll be calling LI->getLoopFor() next, i.e., rely on the IRBB's generated from VPBB's and their mappings in VPBB2IRBB and LI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, although it wouldnt' be much simpler? I left it as is for now

if (!HeaderVPBB)
return;

VPRegionBlock *VectorRegion = State.Plan->getVectorLoopRegion();
VPBasicBlock *HeaderVPBB = VectorRegion->getEntryBasicBlock();
BasicBlock *HeaderBB = State.CFG.VPBB2IRBB[HeaderVPBB];

// Remove redundant induction instructions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could cse() be applied as a VPlan2VPlan transformation rather than to the generated IR below? Possibly even before dismantling regions, when header blocks are easier to find.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this should also be moved to a VPlan-transform. The limitation to the loop region isn't really material for the VPlan version.

Expand All @@ -2809,7 +2818,7 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could setProfileInfoAfterUnrolling() be applied to the branch recipe as part of its VPIRMetadata, rather than above to the branch instruction it generates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but it would also need adding metadata to the branch in the scalar loop.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which calls for expanding VPlan's scope...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, although it will require modifying the original scalar loop in VPlan, which probably needs more thought what kind of modifications we need. Maybe adding metadata would be sufficient?


void InnerLoopVectorizer::fixNonInductionPHIs(VPTransformState &State) {
auto Iter = vp_depth_first_deep(Plan.getEntry());
auto Iter = vp_depth_first_shallow(Plan.getEntry());
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
for (VPRecipeBase &P : VPBB->phis()) {
VPWidenPHIRecipe *VPPhi = dyn_cast<VPWidenPHIRecipe>(&P);
Expand Down Expand Up @@ -7799,6 +7808,9 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
BestVPlan, BestVF,
TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
VPlanTransforms::removeDeadRecipes(BestVPlan);

VPBasicBlock *MiddleVPBB =
BestVPlan.getVectorLoopRegion() ? BestVPlan.getMiddleBlock() : nullptr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce closer to use - why hoist (from line 7964 below)?
If set to null, will its uses still work ok?
Should getMiddleBlock() return null in the absence of a vector loop region?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done here, just before disolving the region. To sink it, we would need to find the middle-block w/o region.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Deserves a comment.

Also worth noting phase ordering aspects of disolveLoopRegions() - intentionally after optimizeForVFAndUF()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks.

VPlanTransforms::convertToConcreteRecipes(BestVPlan,
*Legal->getWidestInductionType());

Expand Down Expand Up @@ -7894,14 +7906,14 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
// 2.6. Maintain Loop Hints
// Keep all loop hints from the original loop on the vector loop (we'll
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the Loop Hints be captured in VPIRMetadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be possible, can do separately, thanks

// replace the vectorizer-specific hints below).
if (auto *LoopRegion = BestVPlan.getVectorLoopRegion()) {
VPBasicBlock *HeaderVPBB = getHeaderForMainVectorLoop(BestVPlan, State.VPDT);
if (HeaderVPBB) {
MDNode *OrigLoopID = OrigLoop->getLoopID();

std::optional<MDNode *> VectorizedLoopID =
makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
LLVMLoopVectorizeFollowupVectorized});

VPBasicBlock *HeaderVPBB = LoopRegion->getEntryBasicBlock();
Loop *L = LI->getLoopFor(State.CFG.VPBB2IRBB[HeaderVPBB]);
if (VectorizedLoopID) {
L->setLoopID(*VectorizedLoopID);
Expand Down Expand Up @@ -7947,8 +7959,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
ILV.printDebugTracesAtEnd();

// 4. Adjust branch weight of the branch in the middle block.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this branch weight be captured in VPIRMetadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will look into that separately, thanks

if (BestVPlan.getVectorLoopRegion()) {
auto *MiddleVPBB = BestVPlan.getMiddleBlock();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hoist setting MiddleVPBB from here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, currently it relies on the region. Could also do a bit more work to find it w/o region if desired

if (HeaderVPBB) {
auto *MiddleTerm =
cast<BranchInst>(State.CFG.VPBB2IRBB[MiddleVPBB]->getTerminator());
if (MiddleTerm->isConditional() &&
Expand Down
191 changes: 108 additions & 83 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,11 @@ VPBlockBase *VPBlockBase::getEnclosingBlockWithPredecessors() {
return Parent->getEnclosingBlockWithPredecessors();
}

bool VPBasicBlock::isHeader(const VPDominatorTree &VPDT) const {
return getPredecessors().size() == 2 &&
VPDT.dominates(this, getPredecessors()[1]);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps BlockUtils should provide an isHeader(VPBasicBlock& VPBB, const VPDominatorTree &VPDT), which could first check if VPBB is inside a loop region? I.e., works in both HCFG and CFG modes.

Another option to identify header blocks after dismantling regions (and possibly also before), rather than depending on external control flow nor dominance info, is by depending on the block's internal content - a block is a header iff its first recipe is a header phi. If VPWidenPHIRecipes and/or VPPhi VPInstructions may reside in header and need to be identified as such (in addition to VPHeaderPhiRecipes), they could be defined as VPWidenMuRecipes and VPMu's instead, respectively, to check if isa<VPHeaderPhiRecipe, VPWidenMuRecipe, VPMu>. And/or have a VPMuAccessors trait which adds start & backedge values API to VPPhiAccessors. Sounds reasonable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Region headers for the top region will always have canonical phis, but the header block of the inner loop when doing outer-loop vectorization would not have one.

I moved it to VPBlockUtils for now.

Not sure if it is worth adding the extra abstractions, just to make it slightly easier to identify header blocks independent of the CFG? Would there be other benefits?

VPBasicBlock::iterator VPBasicBlock::getFirstNonPhi() {
iterator It = begin();
while (It != end() && It->isPhi())
Expand Down Expand Up @@ -424,14 +429,18 @@ void VPBasicBlock::connectToPredecessors(VPTransformState &State) {
for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) {
VPBasicBlock *PredVPBB = PredVPBlock->getExitingBasicBlock();
auto &PredVPSuccessors = PredVPBB->getHierarchicalSuccessors();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: getHierarchical*() are still needed due to replicate regions, when VF>1 (but removed when unrolling only, with UF>1, VF=1?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

BasicBlock *PredBB = CFG.VPBB2IRBB[PredVPBB];
BasicBlock *PredBB = CFG.VPBB2IRBB.lookup(PredVPBB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can retrain

Suggested change
BasicBlock *PredBB = CFG.VPBB2IRBB.lookup(PredVPBB);
BasicBlock *PredBB = CFG.VPBB2IRBB[PredVPBB];

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored, thanks

if (!PredBB)
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: early-continue asap

Suggested change
BasicBlock *PredBB = CFG.VPBB2IRBB.lookup(PredVPBB);
if (!PredBB)
continue;
BasicBlock *PredBB = CFG.VPBB2IRBB.lookup(PredVPBB);
// Explain when PredBB may be absent?
if (!PredBB)
continue;
auto &PredVPSuccessors = PredVPBB->getHierarchicalSuccessors();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented, this is when executing the header. Made the check more explicit and moved out of the loop.


assert(PredBB && "Predecessor basic-block not found building successor.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can alternatively introduce an assert(CFG.VPBB2IRBB.contains(PredVPBB) && ... before the CFG.VPBB2IRBB[PredVPBB].

auto *PredBBTerminator = PredBB->getTerminator();
LLVM_DEBUG(dbgs() << "LV: draw edge from" << PredBB->getName() << '\n');

auto *TermBr = dyn_cast<BranchInst>(PredBBTerminator);
if (isa<UnreachableInst>(PredBBTerminator)) {
if (PredVPSuccessors.size() == 2)
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment explaining this newly ignored case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check not needed for the latest version, removed thanks.

assert(PredVPSuccessors.size() == 1 &&
"Predecessor ending w/o branch must have single successor.");
DebugLoc DL = PredBBTerminator->getDebugLoc();
Expand Down Expand Up @@ -487,11 +496,25 @@ void VPBasicBlock::execute(VPTransformState *State) {
bool Replica = bool(State->Lane);
BasicBlock *NewBB = State->CFG.PrevBB; // Reuse it if possible.

if (isHeader(State->VPDT)) {
// Create and register the new vector loop.
Loop *PrevParentLoop = State->CurrentParentLoop;
State->CurrentParentLoop = State->LI->AllocateLoop();

// Insert the new loop into the loop nest and register the new basic blocks
// before calling any utilities such as SCEV that require valid LoopInfo.
if (PrevParentLoop)
PrevParentLoop->addChildLoop(State->CurrentParentLoop);
else
State->LI->addTopLevelLoop(State->CurrentParentLoop);
}

auto IsReplicateRegion = [](VPBlockBase *BB) {
auto *R = dyn_cast_or_null<VPRegionBlock>(BB);
return R && R->isReplicator();
assert((!R || R->isReplicator()) &&
"only replicate region blocks should remain");
return R;
};

// 1. Create an IR basic block.
if ((Replica && this == getParent()->getEntry()) ||
IsReplicateRegion(getSingleHierarchicalPredecessor())) {
Expand All @@ -514,6 +537,14 @@ void VPBasicBlock::execute(VPTransformState *State) {

// 2. Fill the IR basic block with IR instructions.
executeRecipes(State, NewBB);

// If this block is a latch, update CurrentParentLoop.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking if current canonical block isLatch could be done by checking if its 2nd successor isHeader?
(Preheader has a single successor whereas latch of canonicalized vector loop has two, with header as 2nd).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If this block is a latch, update CurrentParentLoop.
// If this block is a latch, update CurrentParentLoop. The second successor of a latch is a header, with the other successor being an exit/middle.

Wrapping in VPBlockUtils::isLatch() may be clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

if (any_of(getSuccessors(), [State, this](VPBlockBase *Succ) {
auto *VPBB = dyn_cast<VPBasicBlock>(Succ);
return VPBB && VPBB->isHeader(State->VPDT) &&
State->VPDT.dominates(Succ, this);
}))
State->CurrentParentLoop = State->CurrentParentLoop->getParentLoop();
}

VPBasicBlock *VPBasicBlock::clone() {
Expand Down Expand Up @@ -725,35 +756,13 @@ VPRegionBlock *VPRegionBlock::clone() {
}

void VPRegionBlock::execute(VPTransformState *State) {
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>>
RPOT(Entry);

if (!isReplicator()) {
// Create and register the new vector loop.
Loop *PrevParentLoop = State->CurrentParentLoop;
State->CurrentParentLoop = State->LI->AllocateLoop();

// Insert the new loop into the loop nest and register the new basic blocks
// before calling any utilities such as SCEV that require valid LoopInfo.
if (PrevParentLoop)
PrevParentLoop->addChildLoop(State->CurrentParentLoop);
else
State->LI->addTopLevelLoop(State->CurrentParentLoop);

// Visit the VPBlocks connected to "this", starting from it.
for (VPBlockBase *Block : RPOT) {
LLVM_DEBUG(dbgs() << "LV: VPBlock in RPO " << Block->getName() << '\n');
Block->execute(State);
}

State->CurrentParentLoop = PrevParentLoop;
return;
}

assert(isReplicator() &&
"Loop regions should have been lowered to plain CFG");
assert(!State->Lane && "Replicating a Region with non-null instance.");

// Enter replicating mode.
assert(!State->VF.isScalable() && "VF is assumed to be non scalable.");

ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
Entry);
State->Lane = VPLane(0);
for (unsigned Lane = 0, VF = State->VF.getKnownMinValue(); Lane < VF;
++Lane) {
Expand Down Expand Up @@ -847,6 +856,22 @@ void VPRegionBlock::print(raw_ostream &O, const Twine &Indent,
}
#endif

void VPRegionBlock::removeRegion() {
auto *Header = cast<VPBasicBlock>(getEntry());
VPBlockBase *Preheader = getSinglePredecessor();
auto *Exiting = cast<VPBasicBlock>(getExiting());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

VPBlockBase *Middle = getSingleSuccessor();
VPBlockUtils::disconnectBlocks(Preheader, this);
VPBlockUtils::disconnectBlocks(this, Middle);

for (VPBlockBase *VPB : vp_depth_first_shallow(Entry))
VPB->setParent(nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new parent should conceptually be the removed region's parent, or Preheader/Middle's parent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks


VPBlockUtils::connectBlocks(Preheader, Header);
VPBlockUtils::connectBlocks(Exiting, Middle);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the backedge be added too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, moved here, thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Exiting be called Latch? It is admittedly both, so could also be ExitingLatch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to ExitingLatch, thanks!

}

VPlan::VPlan(Loop *L) {
setEntry(createVPIRBasicBlock(L->getLoopPreheader()));
ScalarHeader = createVPIRBasicBlock(L->getHeader());
Expand Down Expand Up @@ -956,57 +981,57 @@ void VPlan::execute(VPTransformState *State) {
for (VPBlockBase *Block : RPOT)
Block->execute(State);

State->CFG.DTU.flush();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to flush now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is independent, will submit separately, thanks!


auto *LoopRegion = getVectorLoopRegion();
if (!LoopRegion)
return;

VPBasicBlock *LatchVPBB = LoopRegion->getExitingBasicBlock();
BasicBlock *VectorLatchBB = State->CFG.VPBB2IRBB[LatchVPBB];

// Fix the latch value of canonical, reduction and first-order recurrences
// phis in the vector loop.
VPBasicBlock *Header = LoopRegion->getEntryBasicBlock();
for (VPRecipeBase &R : Header->phis()) {
// Skip phi-like recipes that generate their backedege values themselves.
if (isa<VPWidenPHIRecipe>(&R))
for (VPBasicBlock *Header :
VPBlockUtils::blocksOnly<VPBasicBlock>(vp_depth_first_shallow(Entry))) {
if (!Header->isHeader(State->VPDT))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a single header is expected, get it? Helps reduce the diff/indent changes, possibly leaving the extension to multiple headers for a follow-up patch, if/when needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, moved getSingleLoopHeader to VPlanUtils and used here, thanks

continue;
for (VPRecipeBase &R : Header->phis()) {
if (isa<VPWidenPHIRecipe>(&R))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retain?

Suggested change
if (isa<VPWidenPHIRecipe>(&R))
// Skip phi-like recipes that generate their backedege values themselves.
if (isa<VPWidenPHIRecipe>(&R))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should remain unchanged now

continue;

if (isa<VPWidenInductionRecipe>(&R)) {
PHINode *Phi = nullptr;
if (isa<VPWidenIntOrFpInductionRecipe>(&R)) {
Phi = cast<PHINode>(State->get(R.getVPSingleValue()));
} else {
auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) &&
"recipe generating only scalars should have been replaced");
auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi));
Phi = cast<PHINode>(GEP->getPointerOperand());
auto *LatchVPBB = cast<VPBasicBlock>(Header->getPredecessors()[1]);
BasicBlock *VectorLatchBB = State->CFG.VPBB2IRBB[LatchVPBB];

if (isa<VPWidenInductionRecipe>(&R)) {
PHINode *Phi = nullptr;
if (isa<VPWidenIntOrFpInductionRecipe>(&R)) {
Phi = cast<PHINode>(State->get(R.getVPSingleValue()));
} else {
auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) &&
"recipe generating only scalars should have been replaced");
auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi));
Phi = cast<PHINode>(GEP->getPointerOperand());
}

Phi->setIncomingBlock(1, VectorLatchBB);

// Move the last step to the end of the latch block. This ensures
// consistent placement of all induction updates.
Instruction *Inc = cast<Instruction>(Phi->getIncomingValue(1));
Inc->moveBefore(
std::prev(VectorLatchBB->getTerminator()->getIterator()));

// Use the steps for the last part as backedge value for the induction.
if (auto *IV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R))
Inc->setOperand(0, State->get(IV->getLastUnrolledPartOperand()));
continue;
}

Phi->setIncomingBlock(1, VectorLatchBB);

// Move the last step to the end of the latch block. This ensures
// consistent placement of all induction updates.
Instruction *Inc = cast<Instruction>(Phi->getIncomingValue(1));
Inc->moveBefore(std::prev(VectorLatchBB->getTerminator()->getIterator()));

// Use the steps for the last part as backedge value for the induction.
if (auto *IV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R))
Inc->setOperand(0, State->get(IV->getLastUnrolledPartOperand()));
continue;
auto *PhiR = cast<VPSingleDefRecipe>(&R);
// VPInstructions currently model scalar Phis only.
bool NeedsScalar = isa<VPInstruction>(PhiR) ||
(isa<VPReductionPHIRecipe>(PhiR) &&
cast<VPReductionPHIRecipe>(PhiR)->isInLoop());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use dyn_cast instead of isa/cast

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dyn_cast would need a separate assignment I think? Left as is for now, as this just moves the existing code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't even move the existing code, merely indents it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes should be gone now, thanks


Value *Phi = State->get(PhiR, NeedsScalar);
// VPHeaderPHIRecipe supports getBackedgeValue() but VPInstruction does
// not.
Value *Val = State->get(PhiR->getOperand(1), NeedsScalar);
cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB);
}

auto *PhiR = cast<VPSingleDefRecipe>(&R);
// VPInstructions currently model scalar Phis only.
bool NeedsScalar = isa<VPInstruction>(PhiR) ||
(isa<VPReductionPHIRecipe>(PhiR) &&
cast<VPReductionPHIRecipe>(PhiR)->isInLoop());
Value *Phi = State->get(PhiR, NeedsScalar);
// VPHeaderPHIRecipe supports getBackedgeValue() but VPInstruction does not.
Value *Val = State->get(PhiR->getOperand(1), NeedsScalar);
cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB);
}
}

Expand Down Expand Up @@ -1365,16 +1390,16 @@ void VPlanPrinter::dumpRegion(const VPRegionBlock *Region) {

#endif

/// Returns true if there is a vector loop region and \p VPV is defined in a
/// loop region.
static bool isDefinedInsideLoopRegions(const VPValue *VPV) {
const VPRecipeBase *DefR = VPV->getDefiningRecipe();
return DefR && (!DefR->getParent()->getPlan()->getVectorLoopRegion() ||
DefR->getParent()->getEnclosingLoopRegion());
}

bool VPValue::isDefinedOutsideLoopRegions() const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool VPValue::isDefinedOutsideLoopRegions() const {
bool VPValue::isDefinedOutsideLoop() const {

whether loop is represented as a region or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

return !isDefinedInsideLoopRegions(this);
auto *DefR = getDefiningRecipe();
if (!DefR)
return true;

const VPBasicBlock *DefVPBB = DefR->getParent();
auto *Plan = DefVPBB->getPlan();
if (Plan->getVectorLoopRegion())
return !DefR->getParent()->getEnclosingLoopRegion();
return DefVPBB == Plan->getEntry();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only out-of-loop place that values can be defined in when regions are gone - is in Plan's entry block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All relevant hoisting should already have happened before disolving the regions; updated to only check non-live-ins if the top-level loop region still exists.

}
void VPValue::replaceAllUsesWith(VPValue *New) {
replaceUsesWithIf(New, [](VPUser &, unsigned) { return true; });
Expand Down
7 changes: 7 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -3415,6 +3415,9 @@ class VPBasicBlock : public VPBlockBase {
/// second predecessor is the exiting block of the region.
const VPBasicBlock *getCFGPredecessor(unsigned Idx) const;

/// Returns true if the block is a loop header in a plain-CFG VPlan.
bool isHeader(const VPDominatorTree &VPDT) const;

protected:
/// Execute the recipes in the IR basic block \p BB.
void executeRecipes(VPTransformState *State, BasicBlock *BB);
Expand Down Expand Up @@ -3566,6 +3569,10 @@ class VPRegionBlock : public VPBlockBase {
/// Clone all blocks in the single-entry single-exit region of the block and
/// their recipes without updating the operands of the cloned recipes.
VPRegionBlock *clone() override;

/// Remove the current region from its VPlan, connecting its predecessor to
/// its entry and exiting block to its successor.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// its entry and exiting block to its successor.
/// its entry, and its exiting block to its successor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated ,thanks

void removeRegion();
};

/// VPlan models a candidate for vectorization, encoding various decisions take
Expand Down
Loading
Loading