Skip to content

[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). #117506

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
May 24, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 16 additions & 10 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2781,13 +2781,13 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
PSE.getSE()->forgetLoop(OrigLoop);
PSE.getSE()->forgetBlockAndLoopDispositions();

// Don't apply optimizations below when no vector region remains, as they all
// require a vector loop at the moment.
if (!State.Plan->getVectorLoopRegion())
// Don't apply optimizations below when no (vector) loop remains, as they all
// require one at the moment.
VPBasicBlock *HeaderVPBB =
vputils::getFirstLoopHeader(*State.Plan, State.VPDT);
if (!HeaderVPBB)
return;

VPRegionBlock *VectorRegion = State.Plan->getVectorLoopRegion();
VPBasicBlock *HeaderVPBB = VectorRegion->getEntryBasicBlock();
BasicBlock *HeaderBB = State.CFG.VPBB2IRBB[HeaderVPBB];

// Remove redundant induction instructions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could cse() be applied as a VPlan2VPlan transformation rather than to the generated IR below? Possibly even before dismantling regions, when header blocks are easier to find.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this should also be moved to a VPlan-transform. The limitation to the loop region isn't really material for the VPlan version.

Expand All @@ -2812,7 +2812,7 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could setProfileInfoAfterUnrolling() be applied to the branch recipe as part of its VPIRMetadata, rather than above to the branch instruction it generates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but it would also need adding metadata to the branch in the scalar loop.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which calls for expanding VPlan's scope...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, although it will require modifying the original scalar loop in VPlan, which probably needs more thought what kind of modifications we need. Maybe adding metadata would be sufficient?


void InnerLoopVectorizer::fixNonInductionPHIs(VPTransformState &State) {
auto Iter = vp_depth_first_deep(Plan.getEntry());
auto Iter = vp_depth_first_shallow(Plan.getEntry());
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
for (VPRecipeBase &P : VPBB->phis()) {
VPWidenPHIRecipe *VPPhi = dyn_cast<VPWidenPHIRecipe>(&P);
Expand Down Expand Up @@ -7623,6 +7623,13 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
BestVPlan, BestVF,
TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
VPlanTransforms::removeDeadRecipes(BestVPlan);

// Retrieve and store the middle block before dissolving regions. Regions are
// dissolved after optimizing for VF and UF, which completely removes unneeded
// loop regions first.
VPBasicBlock *MiddleVPBB =
BestVPlan.getVectorLoopRegion() ? BestVPlan.getMiddleBlock() : nullptr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce closer to use - why hoist (from line 7964 below)?
If set to null, will its uses still work ok?
Should getMiddleBlock() return null in the absence of a vector loop region?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done here, just before disolving the region. To sink it, we would need to find the middle-block w/o region.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Deserves a comment.

Also worth noting phase ordering aspects of disolveLoopRegions() - intentionally after optimizeForVFAndUF()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks.

VPlanTransforms::dissolveLoopRegions(BestVPlan);
VPlanTransforms::convertToConcreteRecipes(BestVPlan,
*Legal->getWidestInductionType());
// Perform the actual loop transformation.
Expand Down Expand Up @@ -7720,14 +7727,14 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
// 2.6. Maintain Loop Hints
// Keep all loop hints from the original loop on the vector loop (we'll
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the Loop Hints be captured in VPIRMetadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be possible, can do separately, thanks

// replace the vectorizer-specific hints below).
if (auto *LoopRegion = BestVPlan.getVectorLoopRegion()) {
VPBasicBlock *HeaderVPBB = vputils::getFirstLoopHeader(BestVPlan, State.VPDT);
if (HeaderVPBB) {
MDNode *OrigLoopID = OrigLoop->getLoopID();

std::optional<MDNode *> VectorizedLoopID =
makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
LLVMLoopVectorizeFollowupVectorized});

VPBasicBlock *HeaderVPBB = LoopRegion->getEntryBasicBlock();
Loop *L = LI->getLoopFor(State.CFG.VPBB2IRBB[HeaderVPBB]);
if (VectorizedLoopID) {
L->setLoopID(*VectorizedLoopID);
Expand Down Expand Up @@ -7773,8 +7780,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
ILV.printDebugTracesAtEnd();

// 4. Adjust branch weight of the branch in the middle block.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this branch weight be captured in VPIRMetadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will look into that separately, thanks

if (BestVPlan.getVectorLoopRegion()) {
auto *MiddleVPBB = BestVPlan.getMiddleBlock();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hoist setting MiddleVPBB from here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, currently it relies on the region. Could also do a bit more work to find it w/o region if desired

if (HeaderVPBB) {
auto *MiddleTerm =
cast<BranchInst>(State.CFG.VPBB2IRBB[MiddleVPBB]->getTerminator());
if (MiddleTerm->isConditional() &&
Expand Down
121 changes: 84 additions & 37 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,32 @@ VPBlockBase *VPBlockBase::getEnclosingBlockWithPredecessors() {
return Parent->getEnclosingBlockWithPredecessors();
}

bool VPBlockUtils::isHeader(const VPBlockBase *VPB,
const VPDominatorTree &VPDT) {
auto *VPBB = dyn_cast<VPBasicBlock>(VPB);
if (!VPBB)
return false;

// If VPBB is in a region R, VPBB is a loop header if R is a loop region with
// VPBB as its entry, i.e., free of predecessors.
if (auto *R = VPBB->getParent())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (auto *R = VPBB->getParent())
// If VPBB is in a region R, VPBB is a loop header if R is a loop region with VPBB as its entry, i.e., free of predecessors.
if (auto *R = VPBB->getParent())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

return !R->isReplicator() && VPBB->getNumPredecessors() == 0;

// A header dominates its second predecessor (the latch), with the other
// predecessor being the preheader
return VPB->getPredecessors().size() == 2 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return VPB->getPredecessors().size() == 2 &&
// A header dominates its second predecessor (the latch), with the other predecessor being the preheader.
return VPB->getPredecessors().size() == 2 &&

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

VPDT.dominates(VPB, VPB->getPredecessors()[1]);
}

bool VPBlockUtils::isLatch(const VPBlockBase *VPB,
const VPDominatorTree &VPDT) {
// A latch has a header as its second successor, with its other successor
// leaving the loop. A preheader OTOH has a header as its first (and only)
// successor.
return VPB->getNumSuccessors() == 2 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return VPB->getNumSuccessors() == 2 &&
// A latch has a header as its second successor, with its other successor leaving the loop.
// A preheader OTOH has a header as its first (and only) successor.
return VPB->getNumSuccessors() == 2 &&

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated thanks

VPBlockUtils::isHeader(VPB->getSuccessors()[1], VPDT);
}

VPBasicBlock::iterator VPBasicBlock::getFirstNonPhi() {
iterator It = begin();
while (It != end() && It->isPhi())
Expand Down Expand Up @@ -424,13 +450,21 @@ void VPBasicBlock::connectToPredecessors(VPTransformState &State) {
if (ParentLoop && !State.LI->getLoopFor(NewBB))
ParentLoop->addBasicBlockToLoop(NewBB, *State.LI);

SmallVector<VPBlockBase *> Preds;
if (VPBlockUtils::isHeader(this, State.VPDT)) {
// There's no block for the latch yet, connect to the preheader only.
Preds = {getPredecessors()[0]};
} else {
Preds = to_vector(getPredecessors());
}

// Hook up the new basic block to its predecessors.
for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) {
for (VPBlockBase *PredVPBlock : Preds) {
VPBasicBlock *PredVPBB = PredVPBlock->getExitingBasicBlock();
auto &PredVPSuccessors = PredVPBB->getHierarchicalSuccessors();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: getHierarchical*() are still needed due to replicate regions, when VF>1 (but removed when unrolling only, with UF>1, VF=1?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

assert(CFG.VPBB2IRBB.contains(PredVPBB) &&
"Predecessor basic-block not found building successor.");
BasicBlock *PredBB = CFG.VPBB2IRBB[PredVPBB];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of accessing may create nullptr entry in CFG.VPBB2IRBB. Is this ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use lookup, thanks~


assert(PredBB && "Predecessor basic-block not found building successor.");
auto *PredBBTerminator = PredBB->getTerminator();
LLVM_DEBUG(dbgs() << "LV: draw edge from" << PredBB->getName() << '\n');

Expand Down Expand Up @@ -491,11 +525,25 @@ void VPBasicBlock::execute(VPTransformState *State) {
bool Replica = bool(State->Lane);
BasicBlock *NewBB = State->CFG.PrevBB; // Reuse it if possible.

if (VPBlockUtils::isHeader(this, State->VPDT)) {
// Create and register the new vector loop.
Loop *PrevParentLoop = State->CurrentParentLoop;
State->CurrentParentLoop = State->LI->AllocateLoop();

// Insert the new loop into the loop nest and register the new basic blocks
// before calling any utilities such as SCEV that require valid LoopInfo.
if (PrevParentLoop)
PrevParentLoop->addChildLoop(State->CurrentParentLoop);
else
State->LI->addTopLevelLoop(State->CurrentParentLoop);
}

auto IsReplicateRegion = [](VPBlockBase *BB) {
auto *R = dyn_cast_or_null<VPRegionBlock>(BB);
return R && R->isReplicator();
assert((!R || R->isReplicator()) &&
"only replicate region blocks should remain");
return R;
};

// 1. Create an IR basic block.
if ((Replica && this == getParent()->getEntry()) ||
IsReplicateRegion(getSingleHierarchicalPredecessor())) {
Expand All @@ -518,6 +566,10 @@ void VPBasicBlock::execute(VPTransformState *State) {

// 2. Fill the IR basic block with IR instructions.
executeRecipes(State, NewBB);

// If this block is a latch, update CurrentParentLoop.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking if current canonical block isLatch could be done by checking if its 2nd successor isHeader?
(Preheader has a single successor whereas latch of canonicalized vector loop has two, with header as 2nd).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If this block is a latch, update CurrentParentLoop.
// If this block is a latch, update CurrentParentLoop. The second successor of a latch is a header, with the other successor being an exit/middle.

Wrapping in VPBlockUtils::isLatch() may be clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

if (VPBlockUtils::isLatch(this, State->VPDT))
State->CurrentParentLoop = State->CurrentParentLoop->getParentLoop();
}

VPBasicBlock *VPBasicBlock::clone() {
Expand Down Expand Up @@ -729,35 +781,13 @@ VPRegionBlock *VPRegionBlock::clone() {
}

void VPRegionBlock::execute(VPTransformState *State) {
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>>
RPOT(Entry);

if (!isReplicator()) {
// Create and register the new vector loop.
Loop *PrevParentLoop = State->CurrentParentLoop;
State->CurrentParentLoop = State->LI->AllocateLoop();

// Insert the new loop into the loop nest and register the new basic blocks
// before calling any utilities such as SCEV that require valid LoopInfo.
if (PrevParentLoop)
PrevParentLoop->addChildLoop(State->CurrentParentLoop);
else
State->LI->addTopLevelLoop(State->CurrentParentLoop);

// Visit the VPBlocks connected to "this", starting from it.
for (VPBlockBase *Block : RPOT) {
LLVM_DEBUG(dbgs() << "LV: VPBlock in RPO " << Block->getName() << '\n');
Block->execute(State);
}

State->CurrentParentLoop = PrevParentLoop;
return;
}

assert(isReplicator() &&
"Loop regions should have been lowered to plain CFG");
assert(!State->Lane && "Replicating a Region with non-null instance.");

// Enter replicating mode.
assert(!State->VF.isScalable() && "VF is assumed to be non scalable.");

ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
Entry);
State->Lane = VPLane(0);
for (unsigned Lane = 0, VF = State->VF.getKnownMinValue(); Lane < VF;
++Lane) {
Expand Down Expand Up @@ -851,6 +881,22 @@ void VPRegionBlock::print(raw_ostream &O, const Twine &Indent,
}
#endif

void VPRegionBlock::dissolveToCFGLoop() {
auto *Header = cast<VPBasicBlock>(getEntry());
VPBlockBase *Preheader = getSinglePredecessor();
auto *ExitingLatch = cast<VPBasicBlock>(getExiting());
VPBlockBase *Middle = getSingleSuccessor();
VPBlockUtils::disconnectBlocks(Preheader, this);
VPBlockUtils::disconnectBlocks(this, Middle);

for (VPBlockBase *VPB : vp_depth_first_shallow(Entry))
VPB->setParent(getParent());

VPBlockUtils::connectBlocks(Preheader, Header);
VPBlockUtils::connectBlocks(ExitingLatch, Middle);
VPBlockUtils::connectBlocks(ExitingLatch, Header);
}

VPlan::VPlan(Loop *L) {
setEntry(createVPIRBasicBlock(L->getLoopPreheader()));
ScalarHeader = createVPIRBasicBlock(L->getHeader());
Expand Down Expand Up @@ -962,16 +1008,15 @@ void VPlan::execute(VPTransformState *State) {

State->CFG.DTU.flush();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to flush now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is independent, will submit separately, thanks!


auto *LoopRegion = getVectorLoopRegion();
if (!LoopRegion)
VPBasicBlock *Header = vputils::getFirstLoopHeader(*this, State->VPDT);
if (!Header)
return;

VPBasicBlock *LatchVPBB = LoopRegion->getExitingBasicBlock();
auto *LatchVPBB = cast<VPBasicBlock>(Header->getPredecessors()[1]);
BasicBlock *VectorLatchBB = State->CFG.VPBB2IRBB[LatchVPBB];

// Fix the latch value of canonical, reduction and first-order recurrences
// phis in the vector loop.
VPBasicBlock *Header = LoopRegion->getEntryBasicBlock();
for (VPRecipeBase &R : Header->phis()) {
// Skip phi-like recipes that generate their backedege values themselves.
if (isa<VPWidenPHIRecipe>(&R))
Expand Down Expand Up @@ -1007,8 +1052,10 @@ void VPlan::execute(VPTransformState *State) {
bool NeedsScalar = isa<VPInstruction>(PhiR) ||
(isa<VPReductionPHIRecipe>(PhiR) &&
cast<VPReductionPHIRecipe>(PhiR)->isInLoop());

Value *Phi = State->get(PhiR, NeedsScalar);
// VPHeaderPHIRecipe supports getBackedgeValue() but VPInstruction does not.
// VPHeaderPHIRecipe supports getBackedgeValue() but VPInstruction does
// not.
Value *Val = State->get(PhiR->getOperand(1), NeedsScalar);
cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB);
}
Expand Down
4 changes: 4 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -3872,6 +3872,10 @@ class VPRegionBlock : public VPBlockBase {
/// Clone all blocks in the single-entry single-exit region of the block and
/// their recipes without updating the operands of the cloned recipes.
VPRegionBlock *clone() override;

/// Remove the current region from its VPlan, connecting its predecessor to
/// its entry, and its exiting block to its successor.
void dissolveToCFGLoop();
};

/// VPlan models a candidate for vectorization, encoding various decisions take
Expand Down
57 changes: 22 additions & 35 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -462,6 +462,26 @@ Value *VPInstruction::generatePerLane(VPTransformState &State,
State.get(getOperand(1), Lane), Name);
}

/// Create a conditional branch using \p Cond branching to the successors of \p
/// VPBB. Note that the first successor is always forward (i.e. not created yet)
/// while the second successor may already have been created (if it is a header
/// block and VPBB is a latch).
static BranchInst *createCondBranch(Value *Cond, VPBasicBlock *VPBB,
VPTransformState &State) {
// Replace the temporary unreachable terminator with a new conditional
// branch, hooking it up to backward destination (header) for latch blocks
// now, and to forward destination(s) later when they are created.
// Second successor may be backwards - iff it is already in VPBB2IRBB.
VPBasicBlock *SecondVPSucc = cast<VPBasicBlock>(VPBB->getSuccessors()[1]);
BasicBlock *SecondIRSucc = State.CFG.VPBB2IRBB.lookup(SecondVPSucc);
BasicBlock *IRBB = State.CFG.VPBB2IRBB[VPBB];
BranchInst *CondBr = State.Builder.CreateCondBr(Cond, IRBB, SecondIRSucc);
// First successor is always forward, reset it to nullptr
CondBr->setSuccessor(0, nullptr);
IRBB->getTerminator()->eraseFromParent();
return CondBr;
}

Value *VPInstruction::generate(VPTransformState &State) {
IRBuilderBase &Builder = State.Builder;

Expand Down Expand Up @@ -581,43 +601,14 @@ Value *VPInstruction::generate(VPTransformState &State) {
}
case VPInstruction::BranchOnCond: {
Value *Cond = State.get(getOperand(0), VPLane(0));
// Replace the temporary unreachable terminator with a new conditional
// branch, hooking it up to backward destination for exiting blocks now and
// to forward destination(s) later when they are created.
BranchInst *CondBr =
Builder.CreateCondBr(Cond, Builder.GetInsertBlock(), nullptr);
CondBr->setSuccessor(0, nullptr);
Builder.GetInsertBlock()->getTerminator()->eraseFromParent();

if (!getParent()->isExiting())
return CondBr;

VPRegionBlock *ParentRegion = getParent()->getParent();
VPBasicBlock *Header = ParentRegion->getEntryBasicBlock();
CondBr->setSuccessor(1, State.CFG.VPBB2IRBB[Header]);
return CondBr;
return createCondBranch(Cond, getParent(), State);
}
case VPInstruction::BranchOnCount: {
// First create the compare.
Value *IV = State.get(getOperand(0), /*IsScalar*/ true);
Value *TC = State.get(getOperand(1), /*IsScalar*/ true);
Value *Cond = Builder.CreateICmpEQ(IV, TC);

// Now create the branch.
auto *Plan = getParent()->getPlan();
VPRegionBlock *TopRegion = Plan->getVectorLoopRegion();
VPBasicBlock *Header = TopRegion->getEntry()->getEntryBasicBlock();

// Replace the temporary unreachable terminator with a new conditional
// branch, hooking it up to backward destination (the header) now and to the
// forward destination (the exit/middle block) later when it is created.
// Note that CreateCondBr expects a valid BB as first argument, so we need
// to set it to nullptr later.
BranchInst *CondBr = Builder.CreateCondBr(Cond, Builder.GetInsertBlock(),
State.CFG.VPBB2IRBB[Header]);
CondBr->setSuccessor(0, nullptr);
Builder.GetInsertBlock()->getTerminator()->eraseFromParent();
return CondBr;
return createCondBranch(Cond, getParent(), State);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!
(Wonder if it's also possible to drop from one case down into the other, but seems messy.)

}
case VPInstruction::Broadcast: {
return Builder.CreateVectorSplat(
Expand Down Expand Up @@ -1127,10 +1118,6 @@ void VPInstructionWithType::print(raw_ostream &O, const Twine &Indent,

void VPPhi::execute(VPTransformState &State) {
State.setDebugLocFrom(getDebugLoc());
assert(getParent() ==
getParent()->getPlan()->getVectorLoopRegion()->getEntry() &&
"VPInstructions with PHI opcodes must be used for header phis only "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another hint to clearly distinguish between the two, rather than drop the assert for all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, VPPhi can only be in header blocks; could assert that this is so here still, if desired? But the general accessors don't require this restriction

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, the assert is essentially still valid, but harder to detect - if VPPhi is in a header or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

"at the moment");
BasicBlock *VectorPH = State.CFG.VPBB2IRBB.at(getIncomingBlock(0));
Value *Start = State.get(getIncomingValue(0), VPLane(0));
PHINode *Phi = State.Builder.CreatePHI(Start->getType(), 2, getName());
Expand Down
12 changes: 12 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2506,6 +2506,18 @@ void VPlanTransforms::createInterleaveGroups(
}
}

void VPlanTransforms::dissolveLoopRegions(VPlan &Plan) {
// Replace loop regions with explicity CFG.
SmallVector<VPRegionBlock *> LoopRegions;
for (VPRegionBlock *R : VPBlockUtils::blocksOnly<VPRegionBlock>(
vp_depth_first_deep(Plan.getEntry()))) {
if (!R->isReplicator())
LoopRegions.push_back(R);
}
for (VPRegionBlock *R : LoopRegions)
R->dissolveToCFGLoop();
}

// Expand VPExtendedReductionRecipe to VPWidenCastRecipe + VPReductionRecipe.
static void expandVPExtendedReduction(VPExtendedReductionRecipe *ExtRed) {
VPWidenCastRecipe *Ext;
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.h
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,9 @@ struct VPlanTransforms {
VPBasicBlock *LatchVPBB,
VFRange &Range);

/// Replace loop regions with explicit CFG.
static void dissolveLoopRegions(VPlan &Plan);

/// Lower abstract recipes to concrete ones, that can be codegen'd. Use \p
/// CanonicalIVTy as type for all un-typed live-ins in VPTypeAnalysis.
static void convertToConcreteRecipes(VPlan &Plan, Type &CanonicalIVTy);
Expand Down
Loading