
[VPlan] Delay adding canonical IV increment. #82270


Status: Open. Wants to merge 11 commits into base: main.
Changes from 5 commits
39 changes: 18 additions & 21 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7639,6 +7639,14 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(

// TODO: Move to VPlan transform stage once the transition to the VPlan-based
// cost model is complete for better cost estimates.
Collaborator: What does this comment refer to? Should it move?

Contributor Author: It refers to the VPlan transforms below; I moved it down and clarified.

TailFoldingStyle Style = CM.getTailFoldingStyle(
!isIndvarOverflowCheckKnownFalse(&CM, BestVF, BestUF));
// When not folding the tail, we know that the induction increment will not
// overflow.
Collaborator: Otherwise, when folding the tail, may the induction increment always overflow? Perhaps consider the isIndvarOverflowCheckKnownFalse() above?

Contributor Author: The current code just moves the existing logic. Put up #111758 to improve this separately.

Contributor Author: That PR has now landed; updated the code here.

bool HasNUW = Style == TailFoldingStyle::None;
bool WithoutRuntimeCheck =
Style == TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck;
VPlanTransforms::lowerCanonicalIV(BestVPlan, HasNUW, WithoutRuntimeCheck);
VPlanTransforms::unrollByUF(BestVPlan, BestUF,
OrigLoop->getHeader()->getContext());
VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
@@ -8768,29 +8776,25 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
}
}

// Add the necessary canonical IV and branch recipes required to control the
// loop.
static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW,
DebugLoc DL) {
// Add the required canonical IV.
Collaborator suggested change:
- // Add the required canonical IV.
+ // Add the required canonical IV along with its loop branch, but w/o its increment - which is introduced later.

Contributor Author: Done, thanks.

static void addCanonicalIV(VPlan &Plan, Type *IdxTy, DebugLoc DL) {
Value *StartIdx = ConstantInt::get(IdxTy, 0);
auto *StartV = Plan.getOrAddLiveIn(StartIdx);

// Add a VPCanonicalIVPHIRecipe starting at 0 to the header.
// TODO: Introduce a separate scalar phi recipe that can be used for codegen,
// turning VPCanonicalIVPHIRecipe into an 'abstract' recipe which cannot be
Collaborator: VPCanonicalIVPHIRecipe, as a phi recipe in the header block, and BranchOnCount, as a recipe in the latch, seem to retain their individual concrete semantics, even if the increment is introduced later between them.

// executed directly.
auto *CanonicalIVPHI = new VPCanonicalIVPHIRecipe(StartV, DL);
VPRegionBlock *TopRegion = Plan.getVectorLoopRegion();
VPBasicBlock *Header = TopRegion->getEntryBasicBlock();
Header->insert(CanonicalIVPHI, Header->begin());

VPBuilder Builder(TopRegion->getExitingBasicBlock());
// Add a VPInstruction to increment the scalar canonical IV by VF * UF.
auto *CanonicalIVIncrement = Builder.createOverflowingOp(
Instruction::Add, {CanonicalIVPHI, &Plan.getVFxUF()}, {HasNUW, false}, DL,
"index.next");
CanonicalIVPHI->addOperand(CanonicalIVIncrement);

// Add the BranchOnCount VPInstruction to the latch.
VPBuilder Builder(TopRegion->getExitingBasicBlock());
// TODO: introduce branch-on-count during VPlan final (pre-codegen) lowering.
Builder.createNaryOp(VPInstruction::BranchOnCount,
{CanonicalIVIncrement, &Plan.getVectorTripCount()}, DL);
{CanonicalIVPHI, &Plan.getVectorTripCount()}, DL);
Collaborator (on lines +8933 to +8935): How about having BranchOnCount also be responsible for bumping the IV, at least initially? (Inspired by PowerPC's bdnz instruction; it could be called BranchOnIncrementCount.) It could then feed the canonical IV phi back across the back-edge, and possibly be split into a separate Add later to simplify codegen. Unclear if doing so helps achieve the desired simplification.

}
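As a rough illustration of the restructuring under review, the header phi and latch branch can be created first and the increment spliced in between later. The following is a toy model with made-up types; it is not the real VPlan API, and the recipe names are illustrative only:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for VPlan recipes; the string names below are illustrative
// and do not correspond to real LLVM classes.
struct Recipe {
  std::string Name;
  std::vector<Recipe *> Ops;
};

struct Block {
  std::vector<std::unique_ptr<Recipe>> Recipes;
  Recipe *terminator() { return Recipes.back().get(); }
};

// Mirrors the shape of addCanonicalIV: create the phi and the
// branch-on-count, with the branch temporarily taking the phi itself as its
// counted operand.
void addCanonicalIV(Block &B, Recipe *VectorTripCount) {
  auto Phi = std::make_unique<Recipe>(Recipe{"canonical-iv-phi", {}});
  auto Br = std::make_unique<Recipe>(
      Recipe{"branch-on-count", {Phi.get(), VectorTripCount}});
  B.Recipes.push_back(std::move(Phi));
  B.Recipes.push_back(std::move(Br));
}

// Mirrors the later lowering step in the non-tail-folded case: introduce the
// increment before the terminator, wire it up as the phi's backedge value,
// and retarget the branch at it.
void lowerCanonicalIV(Block &B) {
  Recipe *Phi = B.Recipes.front().get();
  Recipe *Br = B.terminator();
  auto Inc = std::make_unique<Recipe>(Recipe{"index.next", {Phi}});
  Phi->Ops.push_back(Inc.get()); // backedge value
  Br->Ops[0] = Inc.get();        // branch now counts the increment
  B.Recipes.insert(B.Recipes.end() - 1, std::move(Inc));
}
```

The point of the sketch is that the plan stays structurally complete between the two steps; only the branch's operand and the phi's backedge are rewired when the increment appears.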

/// Create resume phis in the scalar preheader for first-order recurrences and
@@ -9038,10 +9042,7 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {

DebugLoc DL = getDebugLocFromInstOrOperands(Legal->getPrimaryInduction());
TailFoldingStyle Style = CM.getTailFoldingStyle(IVUpdateMayOverflow);
// When not folding the tail, we know that the induction increment will not
// overflow.
bool HasNUW = Style == TailFoldingStyle::None;
addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, DL);
addCanonicalIV(*Plan, Legal->getWidestInductionType(), DL);

VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, Legal, CM, PSE, Builder);

@@ -9279,11 +9280,7 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
Plan->getVectorLoopRegion()->getExitingBasicBlock()->getTerminator();
Term->eraseFromParent();

// Tail folding is not supported for outer loops, so the induction increment
// is guaranteed to not wrap.
bool HasNUW = true;
addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW,
DebugLoc());
addCanonicalIV(*Plan, Legal->getWidestInductionType(), DebugLoc());
assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
return Plan;
}
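The executePlan hunk above derives two booleans from the tail-folding style before calling lowerCanonicalIV. A minimal standalone sketch of that derivation, using a simplified stand-in enum rather than the real LLVM header:

```cpp
#include <cassert>

// Simplified stand-in for llvm::TailFoldingStyle; only the members relevant
// to the hunk above are modeled.
enum class TailFoldingStyle {
  None,
  Data,
  DataAndControlFlow,
  DataAndControlFlowWithoutRuntimeCheck,
};

struct CanonicalIVLoweringFlags {
  bool HasNUW;              // no tail folding: the increment cannot wrap
  bool WithoutRuntimeCheck; // lane mask compares against TC minus step
};

// Mirrors the two boolean derivations moved into executePlan().
CanonicalIVLoweringFlags deriveFlags(TailFoldingStyle Style) {
  return {Style == TailFoldingStyle::None,
          Style == TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck};
}
```

Note that HasNUW is true only when the tail is not folded at all, while WithoutRuntimeCheck singles out the one folding style that avoids the overflow guard.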
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -20,6 +20,7 @@
#include "LoopVectorizationPlanner.h"
#include "VPlanCFG.h"
#include "VPlanPatternMatch.h"
#include "VPlanVerifier.h"
#include "VPlanTransforms.h"
#include "VPlanUtils.h"
#include "llvm/ADT/PostOrderIterator.h"
3 changes: 2 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3122,7 +3122,8 @@ class VPCanonicalIVPHIRecipe : public VPHeaderPHIRecipe {

VPCanonicalIVPHIRecipe *clone() override {
auto *R = new VPCanonicalIVPHIRecipe(getOperand(0), getDebugLoc());
R->addOperand(getBackedgeValue());
if (getNumOperands() == 2)
Collaborator: Seems reasonable to expect every header phi recipe to always have two operands, implementing getStartValue() and getBackedgeValue() of VPHeaderPHIRecipe?

R->addOperand(getBackedgeValue());
return R;
}

137 changes: 83 additions & 54 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1289,19 +1289,12 @@ void VPlanTransforms::optimize(VPlan &Plan) {
// %Negated = Not %ALM
// branch-on-cond %Negated
//
static VPActiveLaneMaskPHIRecipe *addVPLaneMaskPhiAndUpdateExitBranch(
VPlan &Plan, bool DataAndControlFlowWithoutRuntimeCheck) {
static VPActiveLaneMaskPHIRecipe *createActiveLaneMaskPhi(VPlan &Plan) {
Collaborator: The documentation above should be kept in sync.

Contributor Author: Updated, thanks.

VPRegionBlock *TopRegion = Plan.getVectorLoopRegion();
VPBasicBlock *EB = TopRegion->getExitingBasicBlock();
auto *CanonicalIVPHI = Plan.getCanonicalIV();
VPValue *StartV = CanonicalIVPHI->getStartValue();

auto *CanonicalIVIncrement =
cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
// TODO: Check if dropping the flags is needed if
// !DataAndControlFlowWithoutRuntimeCheck.
CanonicalIVIncrement->dropPoisonGeneratingFlags();
DebugLoc DL = CanonicalIVIncrement->getDebugLoc();
DebugLoc DL = CanonicalIVPHI->getDebugLoc();
Collaborator: This is for an increment placed in the preheader, so is it OK to use the DL of the phi instead of that of the in-loop/backedge-value increment?

Contributor Author: Yes, I think so; the DL from the canonical IV should be the closest accurate debug location.

// We can't use StartV directly in the ActiveLaneMask VPInstruction, since
// we have to take unrolling into account. Each part needs to start at
// Part * VF
@@ -1311,21 +1304,6 @@
// Create the ActiveLaneMask instruction using the correct start values.
VPValue *TC = Plan.getTripCount();

VPValue *TripCount, *IncrementValue;
if (!DataAndControlFlowWithoutRuntimeCheck) {
// When the loop is guarded by a runtime overflow check for the loop
// induction variable increment by VF, we can increment the value before
// the get.active.lane mask and use the unmodified tripcount.
IncrementValue = CanonicalIVIncrement;
TripCount = TC;
} else {
// When avoiding a runtime check, the active.lane.mask inside the loop
// uses a modified trip count and the induction variable increment is
// done after the active.lane.mask intrinsic is called.
IncrementValue = CanonicalIVPHI;
TripCount = Builder.createNaryOp(VPInstruction::CalculateTripCountMinusVF,
{TC}, DL);
}
auto *EntryIncrement = Builder.createOverflowingOp(
VPInstruction::CanonicalIVIncrementForPart, {StartV}, {false, false}, DL,
"index.part.next");
@@ -1339,24 +1317,6 @@
// preheader ActiveLaneMask instruction.
auto *LaneMaskPhi = new VPActiveLaneMaskPHIRecipe(EntryALM, DebugLoc());
LaneMaskPhi->insertAfter(CanonicalIVPHI);

// Create the active lane mask for the next iteration of the loop before the
// original terminator.
VPRecipeBase *OriginalTerminator = EB->getTerminator();
Builder.setInsertPoint(OriginalTerminator);
auto *InLoopIncrement =
Builder.createOverflowingOp(VPInstruction::CanonicalIVIncrementForPart,
{IncrementValue}, {false, false}, DL);
auto *ALM = Builder.createNaryOp(VPInstruction::ActiveLaneMask,
{InLoopIncrement, TripCount}, DL,
"active.lane.mask.next");
LaneMaskPhi->addOperand(ALM);

// Replace the original terminator with BranchOnCond. We have to invert the
// mask here because a true condition means jumping to the exit block.
auto *NotMask = Builder.createNot(ALM, DL);
Builder.createNaryOp(VPInstruction::BranchOnCond, {NotMask}, DL);
OriginalTerminator->eraseFromParent();
return LaneMaskPhi;
}

@@ -1422,8 +1382,7 @@ void VPlanTransforms::addActiveLaneMask(
cast<VPWidenCanonicalIVRecipe>(*FoundWidenCanonicalIVUser);
VPSingleDefRecipe *LaneMask;
if (UseActiveLaneMaskForControlFlow) {
LaneMask = addVPLaneMaskPhiAndUpdateExitBranch(
Plan, DataAndControlFlowWithoutRuntimeCheck);
LaneMask = createActiveLaneMaskPhi(Plan);
} else {
VPBuilder B = VPBuilder::getToInsertAfter(WideCanonicalIV);
LaneMask = B.createNaryOp(VPInstruction::ActiveLaneMask,
@@ -1575,6 +1534,7 @@ bool VPlanTransforms::tryAddExplicitVectorLength(

auto *CanonicalIVPHI = Plan.getCanonicalIV();
VPValue *StartV = CanonicalIVPHI->getStartValue();
VPBasicBlock *Latch = Plan.getVectorLoopRegion()->getExitingBasicBlock();

// Create the ExplicitVectorLengthPhi recipe in the main loop.
auto *EVLPhi = new VPEVLBasedIVPHIRecipe(StartV, DebugLoc());
@@ -1593,30 +1553,26 @@
auto *VPEVL = Builder.createNaryOp(VPInstruction::ExplicitVectorLength, AVL,
DebugLoc());

auto *CanonicalIVIncrement =
cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue());
VPSingleDefRecipe *OpVPEVL = VPEVL;
VPRecipeBase *LatchTerm = Latch->getTerminator();
Collaborator: Term may be ambiguous. Spell it out as LatchTerminator, or rename it LatchEnd or LatchBranch; it currently expects (and replaces) a BranchOnCount?

Contributor Author: Updated, thanks.

if (unsigned IVSize = CanonicalIVPHI->getScalarType()->getScalarSizeInBits();
IVSize != 32) {
OpVPEVL = new VPScalarCastRecipe(IVSize < 32 ? Instruction::Trunc
: Instruction::ZExt,
OpVPEVL, CanonicalIVPHI->getScalarType());
OpVPEVL->insertBefore(CanonicalIVIncrement);
OpVPEVL->insertBefore(LatchTerm);
}
auto *NextEVLIV =
new VPInstruction(Instruction::Add, {OpVPEVL, EVLPhi},
{CanonicalIVIncrement->hasNoUnsignedWrap(),
CanonicalIVIncrement->hasNoSignedWrap()},
CanonicalIVIncrement->getDebugLoc(), "index.evl.next");
NextEVLIV->insertBefore(CanonicalIVIncrement);
new VPInstruction(Instruction::Add, {OpVPEVL, EVLPhi}, {false, false},
Collaborator: Wrap flags initialized to false may later be turned on?

Contributor Author: Yes, at this point we don't know whether it wraps. Without #111758, the flags for the EVL increment would always be false, I think, as it always folds the tail. With it, we could update them when introducing the increment (currently not done).

Collaborator: Worth a comment and/or TODO?

Contributor Author: Added, thanks.

Collaborator: Thanks. Possibly worth commenting on what each false stands for.

CanonicalIVPHI->getDebugLoc(), "index.evl.next");
NextEVLIV->insertBefore(LatchTerm);
EVLPhi->addOperand(NextEVLIV);

transformRecipestoEVLRecipes(Plan, *VPEVL);

// Replace all uses of VPCanonicalIVPHIRecipe by
// VPEVLBasedIVPHIRecipe except for the canonical IV increment.
Collaborator: Should all uses be replaced now except for BranchOnCount?

Contributor Author: Yes, updated, thanks.

// VPEVLBasedIVPHIRecipe.
CanonicalIVPHI->replaceAllUsesWith(EVLPhi);
CanonicalIVIncrement->setOperand(0, CanonicalIVPHI);
// TODO: support unroll factor > 1.
Plan.setUF(1);
return true;
@@ -1794,3 +1750,76 @@ void VPlanTransforms::createInterleaveGroups(
}
}
}

void VPlanTransforms::lowerCanonicalIV(
VPlan &Plan, bool HasNUW, bool DataAndControlFlowWithoutRuntimeCheck) {
auto *CanIV = Plan.getCanonicalIV();

VPBasicBlock *EB = Plan.getVectorLoopRegion()->getExitingBasicBlock();
Collaborator suggested change:
- VPBasicBlock *EB = Plan.getVectorLoopRegion()->getExitingBasicBlock();
+ VPBasicBlock *Latch = Plan.getVectorLoopRegion()->getExitingBasicBlock();

Contributor Author: Updated, thanks.

auto *Term = EB->getTerminator();
Collaborator suggested change:
- auto *Term = EB->getTerminator();
+ auto *LatchBranch = Latch->getTerminator();

Contributor Author: Updated, thanks.

VPBuilder Builder(Term);
DebugLoc DL = CanIV->getDebugLoc();
// Add a VPInstruction to increment the scalar canonical IV by VF * UF.
auto *CanonicalIVIncrement =
Builder.createOverflowingOp(Instruction::Add, {CanIV, &Plan.getVFxUF()},
{HasNUW, false}, DL, "index.next");

CanIV->addOperand(CanonicalIVIncrement);

auto FoundLaneMaskPhi = find_if(
Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis(),
[](VPRecipeBase &P) { return isa<VPActiveLaneMaskPHIRecipe>(P); });

if (FoundLaneMaskPhi ==
Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis().end()) {

// Update BranchOnCount VPInstruction in the latch to use increment.
// TODO: Should have separate opcodes for separate semantics.
Term->setOperand(0, CanonicalIVIncrement);
return;
}

// Now introduce a conditional branch to control the loop until the lane mask
// is exhausted.
auto *LaneMaskPhi = cast<VPActiveLaneMaskPHIRecipe>(&*FoundLaneMaskPhi);
auto *VecPreheader =
cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
Builder.setInsertPoint(VecPreheader);

VPValue *TC = Plan.getTripCount();

// TODO: Check if dropping the flags is needed if
// !DataAndControlFlowWithoutRuntimeCheck.
CanonicalIVIncrement->dropPoisonGeneratingFlags();
VPValue *TripCount, *IncrementValue;
Collaborator: This is either the pre-incremented or the post-incremented IV.

Suggested change:
- VPValue *TripCount, *IncrementValue;
+ VPValue *BoundToCheck, *IVToCheck;

("IncrementValue" itself is VF*UF; "TripCount" may be TripCount-VF(*UF).)

if (!DataAndControlFlowWithoutRuntimeCheck) {
// When the loop is guarded by a runtime overflow check for the loop
// induction variable increment by VF, we can increment the value before
Collaborator suggested change:
- // induction variable increment by VF, we can increment the value before
+ // induction variable increment by VFxUF, we can increment the value before

// the get.active.lane mask and use the unmodified tripcount.
Collaborator (on lines +2050 to +2052): Worth clarifying the distinction between HasNUW and DataAndControlFlowWithoutRuntimeCheck; the former (also) seems to imply/suffice for the condition here?

IncrementValue = CanonicalIVIncrement;
TripCount = TC;
} else {
// When avoiding a runtime check, the active.lane.mask inside the loop
// uses a modified trip count and the induction variable increment is
// done after the active.lane.mask intrinsic is called.
IncrementValue = CanIV;
TripCount = Builder.createNaryOp(VPInstruction::CalculateTripCountMinusVF,
Collaborator: (Independent) This should be CalculateTripCountMinusVFxUF.

{TC}, DL);
}
// Create the active lane mask for the next iteration of the loop before the
// original terminator.
Builder.setInsertPoint(EB);
auto *InLoopIncrement =
Builder.createOverflowingOp(VPInstruction::CanonicalIVIncrementForPart,
{IncrementValue}, {false, false}, DL);
Collaborator: Worth clarifying the distinction between CanonicalIVIncrement and CanonicalIVIncrementForPart: the former is the scalar value CanIV + VF * UF, which is uniform across VF * UF and belongs to the scalar canonical IV chain, whereas the latter is the scalar value CanIV + (VF * UF) + P * UF, which is uniform across VF and uses the canonical IV chain (either pre- or post-increment), as in "build scalar steps" per part P. In particular, the latter is useful only for UF > 1, i.e., when unrolling. Right?

auto *ALM = Builder.createNaryOp(VPInstruction::ActiveLaneMask,
{InLoopIncrement, TripCount}, DL,
"active.lane.mask.next");
LaneMaskPhi->addOperand(ALM);

// Replace the original terminator with BranchOnCond. We have to invert the
Collaborator: This now introduces the terminator rather than replacing it. Perhaps a terminal branch-on-count recipe should be introduced along with the abstract canonical IV (it could conceptually check the pre-bumped IV against TC-step), delaying only the introduction of the canonical IV's increment between them for later? The canonical IV still remains abstract until this increment is added, but the VPlan continues to be "valid" without updating verify().

Contributor Author: Updated to still introduce the branch early initially. At the moment it still uses the same opcode, but we would probably need to introduce a separate one for the different semantics (or define them conditionally on whether lowering has been finalized).

// mask here because a true condition means jumping to the exit block.
auto *NotMask = Builder.createNot(ALM, DL);
Builder.createNaryOp(VPInstruction::BranchOnCond, {NotMask}, DL);
Collaborator: BranchOnCond operates on a single condition bit, so NotMask (only the last part thereof?) should effectively be reduced using an AnyOf reduction, although its execute/generate seems to look at the first lane only, rather than the last?

Term->eraseFromParent();
Collaborator: The existing terminator is already a branch-on-cond; reset its operand?

Contributor Author: The current version leaves the BranchOnCount in place until we replace it here.

}
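The two branches of lowerCanonicalIV above choose between checking the post-incremented IV against the full trip count and checking the pre-incremented IV against a saturating trip-count-minus-step. Under the usual get.active.lane.mask semantics (lane i is active iff base + i < n), both choices run the loop the same number of times. The following is a small scalar model of that equivalence, a sketch under stated assumptions rather than a transcription of the real recipes:

```cpp
#include <cassert>
#include <cstdint>

// Assumed semantics of get.active.lane.mask(base, n): at least one lane is
// active exactly when base < n.
static bool anyLaneActive(uint64_t Base, uint64_t N) { return Base < N; }

// "With runtime check": bump the IV first, then form the next mask against
// the unmodified trip count.
uint64_t itersPostIncrement(uint64_t TC, uint64_t Step) {
  uint64_t Index = 0, Iters = 0;
  while (anyLaneActive(Index, TC)) { // mask carried in on the phi
    ++Iters;
    Index += Step;                   // index.next
    if (!anyLaneActive(Index, TC))   // active.lane.mask.next
      break;
  }
  return Iters;
}

// "Without runtime check": keep the pre-incremented IV but compare against a
// saturating TC - Step (CalculateTripCountMinusVF lowers to llvm.usub.sat in
// the tests touched by this patch).
uint64_t itersPreIncrement(uint64_t TC, uint64_t Step) {
  uint64_t TCMinusStep = TC > Step ? TC - Step : 0; // usub.sat
  uint64_t Index = 0, Iters = 0;
  while (anyLaneActive(Index, TC)) {
    ++Iters;
    if (!anyLaneActive(Index, TCMinusStep))
      break;
    Index += Step;
  }
  return Iters;
}
```

Both functions return the ceiling of TC/Step; the second form trades the preheader overflow guard for the subtracted bound, which is the trade-off the DataAndControlFlowWithoutRuntimeCheck style encodes.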
5 changes: 5 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -123,6 +123,11 @@ struct VPlanTransforms {

/// Remove dead recipes from \p Plan.
static void removeDeadRecipes(VPlan &Plan);

/// Finalize \p Plan by introducing explicit increments for the canonical
/// induction.
Collaborator suggested change:
- /// Finalize \p Plan by introducing explicit increments for the canonical
- /// induction.
+ /// Complete the canonical IV by introducing an explicit increment between its header phi and latch branch-on-count.

This is one step in getting the Plan ready for codegen, not necessarily the last one after which the Plan has been "finalized".

Contributor Author: Good point; we need to do this before unrolling. Adjusted, thanks!

static void lowerCanonicalIV(VPlan &Plan, bool HasNUW,
Collaborator: Instead of "lower" we could use "convert", to be consistent with convertToConcreteRecipes(); but for now this is more of a complementing/completing process rather than a lowering/conversion one?

Contributor Author: Updated, thanks.

bool DataAndControlFlowWithoutRuntimeCheck);
};

} // namespace llvm
2 changes: 1 addition & 1 deletion llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -91,7 +91,7 @@ bool vputils::isUniformAcrossVFsAndUFs(VPValue *V) {

auto *CanonicalIV = R->getParent()->getPlan()->getCanonicalIV();
// Canonical IV chain is uniform.
if (V == CanonicalIV || V == CanonicalIV->getBackedgeValue())
if (V == CanonicalIV) // || V == CanonicalIV->getBackedgeValue())
Collaborator: Continue to consider the backedge value uniform, if it exists?

Contributor Author: Done, thanks!

return true;

return TypeSwitch<const VPRecipeBase *, bool>(R)
@@ -141,7 +141,6 @@ define float @fadd_strict(ptr noalias nocapture readonly %a, i64 %n) #0 {
; CHECK-ORDERED-TF-NEXT: [[TMP7:%.*]] = sub i64 [[N]], [[TMP6]]
; CHECK-ORDERED-TF-NEXT: [[TMP8:%.*]] = icmp ugt i64 [[N]], [[TMP6]]
; CHECK-ORDERED-TF-NEXT: [[TMP9:%.*]] = select i1 [[TMP8]], i64 [[TMP7]], i64 0
; CHECK-ORDERED-TF-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 [[N]])
; CHECK-ORDERED-TF-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK-ORDERED-TF: vector.body:
; CHECK-ORDERED-TF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -653,11 +652,11 @@ define void @fadd_strict_interleave(ptr noalias nocapture readonly %a, ptr noali
; CHECK-ORDERED-TF-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-ORDERED-TF-NEXT: [[TMP7:%.*]] = mul i64 [[TMP6]], 4
; CHECK-ORDERED-TF-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-ORDERED-TF-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 [[TMP2]])
; CHECK-ORDERED-TF-NEXT: [[TMP9:%.*]] = mul i64 [[TMP8]], 4
; CHECK-ORDERED-TF-NEXT: [[TMP10:%.*]] = sub i64 [[TMP2]], [[TMP9]]
; CHECK-ORDERED-TF-NEXT: [[TMP11:%.*]] = icmp ugt i64 [[TMP2]], [[TMP9]]
; CHECK-ORDERED-TF-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i64 [[TMP10]], i64 0
; CHECK-ORDERED-TF-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 [[TMP2]])
; CHECK-ORDERED-TF-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK-ORDERED-TF: vector.body:
; CHECK-ORDERED-TF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -915,7 +914,6 @@ define float @fadd_of_sum(ptr noalias nocapture readonly %a, ptr noalias nocaptu
; CHECK-ORDERED-TF-NEXT: [[TMP8:%.*]] = sub i64 [[N]], [[TMP7]]
; CHECK-ORDERED-TF-NEXT: [[TMP9:%.*]] = icmp ugt i64 [[N]], [[TMP7]]
; CHECK-ORDERED-TF-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i64 [[TMP8]], i64 0
; CHECK-ORDERED-TF-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 [[N]])
; CHECK-ORDERED-TF-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK-ORDERED-TF: vector.body:
; CHECK-ORDERED-TF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -1154,7 +1152,6 @@ define float @fadd_conditional(ptr noalias nocapture readonly %a, ptr noalias no
; CHECK-ORDERED-TF-NEXT: [[TMP7:%.*]] = sub i64 [[N]], [[TMP6]]
; CHECK-ORDERED-TF-NEXT: [[TMP8:%.*]] = icmp ugt i64 [[N]], [[TMP6]]
; CHECK-ORDERED-TF-NEXT: [[TMP9:%.*]] = select i1 [[TMP8]], i64 [[TMP7]], i64 0
; CHECK-ORDERED-TF-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 [[N]])
; CHECK-ORDERED-TF-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK-ORDERED-TF: vector.body:
; CHECK-ORDERED-TF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -53,12 +53,12 @@ define void @simple_memset(i32 %val, ptr %ptr, i64 %n) #0 {
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP14:%.*]] = mul i64 [[TMP13]], 4
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 [[UMAX]])
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 4
; CHECK-NEXT: [[TMP7:%.*]] = sub i64 [[UMAX]], [[TMP6]]
; CHECK-NEXT: [[TMP8:%.*]] = icmp ugt i64 [[UMAX]], [[TMP6]]
; CHECK-NEXT: [[TMP9:%.*]] = select i1 [[TMP8]], i64 [[TMP7]], i64 0
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 [[UMAX]])
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[VAL:%.*]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
@@ -78,10 +78,10 @@ define void @can_overflow_i64_induction_var(ptr noalias %dst, ptr readonly %src,
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP6]], 2
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 [[N]])
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.usub.sat.i64(i64 [[N]], i64 [[TMP1]])
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 [[N]])
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]