-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[SimplifyIndVar] Push more users to worklist for simplifyUsers #93598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[SimplifyIndVar] Push more users to worklist for simplifyUsers #93598
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
Two potential problems with this approach:
CC @fhahn |
To answer your first question, it seems the Users are expected to be within the loop (test-suite fails with the assert My goal was to allow simplification of a IV comparison in an exit block. |
The following breaks: target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
define void @foo(
i32 noundef %n_real_zones,
ptr noundef %real_zones
) {
entry:
br label %for.cond
for.cond: ; preds = %for.cond.cleanup3, %entry
%j.0 = phi i32 [ 0, %entry ], [ %inc7, %for.cond.cleanup3 ]
%n_real_zones.addr.0 = phi i32 [ %n_real_zones, %entry ], [ %n_real_zones.addr.1, %for.cond.cleanup3 ]
%cmp = icmp ult i32 %j.0, 128
br label %for.cond1
for.cond1: ; preds = %for.body4, %for.cond
%n_real_zones.addr.1 = phi i32 [ %n_real_zones.addr.0, %for.cond ], [ %inc, %for.body4 ]
%i.0 = phi i32 [ 0, %for.cond ], [ %inc5, %for.body4 ]
%cmp2 = icmp ult i32 %i.0, 128
br i1 %cmp2, label %for.body4, label %for.cond.cleanup3
for.cond.cleanup3: ; preds = %for.cond1
%inc7 = add nuw nsw i32 %j.0, 1
br label %for.cond
for.body4: ; preds = %for.cond1
%idxprom = sext i32 %n_real_zones.addr.1 to i64
%arrayidx = getelementptr inbounds i32, ptr %real_zones, i64 %idxprom
store i32 %j.0, ptr %arrayidx, align 4
%inc = add nsw i32 %n_real_zones.addr.1, 1
%inc5 = add nuw nsw i32 %i.0, 1
br label %for.cond1
} I don't know why it seems to depend on |
I was able to pass the test-suite with the new changes. |
@@ -511,6 +511,7 @@ bool SimplifyIndvar::eliminateTrunc(TruncInst *TI) { | |||
ICmpInst *ICI = dyn_cast<ICmpInst>(U); | |||
if (!ICI) return false; | |||
assert(L->contains(ICI->getParent()) && "LCSSA form broken?"); | |||
// assert(L->contains(ICI->getParent()) && "LCSSA form broken?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, i think this code assume we run the loop close SSA pass first. Could you include a LIT test to demonstrate the problem? I am not sure this is the right fix to your problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check will break if there is an exit (or dominated) block with a use-chain trunc-icmp
such as the following one from Bitcode/Regression/fft/x86_17.06.19_halide_runtime.bc halide_memoization_cache_lookup
:
for.cond82.preheader: ; preds = %_ZNK15halide_buffer_t13size_in_bytesEv.exit
%48 = trunc i64 %indvars.iv214 to i32
%cmp83195 = icmp sgt i32 %48, 0
br i1 %cmp83195, label %for.body85.preheader, label %cleanup111
or such as:
define i1 @foo(i32 %n) {
entry:
br label %loop
loop:
%i = phi i64 [0, %entry], [%i.inc, %loop]
%i.inc = add i64 %i, 1
%i.ult.Icnt = icmp ult i64 %i, 128
br i1 %i.ult.Icnt, label %loop, label %loop.exit
loop.exit:
%i.32 = trunc i64 %i to i32
%i.32.ult.n = icmp ult i32 %i.32, %n
ret i1 %i.32.ult.n
}
I've got to take a closer look at the code to understand why LCSSA is required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me the utility class SimplifyIndVar
don't have to be fed LCSSA-normalised loop but the loop pass IndVarSimplify
needs it (I don't know exactly all the reasons, just spotted it uses llvm::rewriteExitValue
which simply uses the Backedge Info + SCEV to replace the LCSSA phis with SCEV Constant if the phi is a SCEV Add Rec)
Maybe I'm making a mistake but eliminateTrunc
checks all the trunc
users are icmp
that are eligible to simplifications. I don't see why the LCSSA is required here (except to ensure staying inside the loop).
So I added a cl opt to limit the traversal block-wise. I'm trying to benchmark to see if it hurts performance. |
@efriedma-quic The simplification algorithm is almost greedy forward. It stops descending as soon as there is no possible simplification. There is one big exception with instructions representing a SCEV Add Rec associated to the current analysed loop which are always followed, but it's restricting and the user will be traversed only during the analysis of the right loop. The quadratic behaviour could in theory appear when a simplification is always available. One way to avoid going too far is to keep a counter when traversing the use chain which is incremented everytime we jump to a out-of-loop block and reset when we go back inside the loop. It's a fast way to compute a overapproximation of I ran |
Ping. |
ping I could add a mechanism that will specify a radius of action for the simplification depending on the optimisation level. |
If you want a review, move the PR out of draft. |
248e441
to
3af0c12
Compare
@llvm/pr-subscribers-llvm-transforms Author: None (v01dXYZ) ChangesInstead of considering only users that are inside the loop, consider all the users with parent dominated by the loop header. This is a draft PR to discuss about the code before doing some QA. @etherzhhb @efriedma-quic Do you think my code is valid ? Provides a solution to #90417. Full diff: https://github.com/llvm/llvm-project/pull/93598.diff 3 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h b/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h
index bd1718dd8cad7..dbf2bdd18467d 100644
--- a/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h
+++ b/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h
@@ -52,12 +52,11 @@ class IVVisitor {
/// where the first entry indicates that the function makes changes and the
/// second entry indicates that it introduced new opportunities for loop
/// unswitching.
-std::pair<bool, bool> simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE,
- DominatorTree *DT, LoopInfo *LI,
- const TargetTransformInfo *TTI,
- SmallVectorImpl<WeakTrackingVH> &Dead,
- SCEVExpander &Rewriter,
- IVVisitor *V = nullptr);
+std::pair<bool, bool>
+simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE, DominatorTree *DT,
+ LoopInfo *LI, const TargetTransformInfo *TTI,
+ SmallVectorImpl<WeakTrackingVH> &Dead, SCEVExpander &Rewriter,
+ unsigned MaxDepthOutOfLoop = 1, IVVisitor *V = nullptr);
/// SimplifyLoopIVs - Simplify users of induction variables within this
/// loop. This does not actually change or add IVs.
diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index dd7c89034ca09..ea0f7a986e4b9 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -124,6 +124,12 @@ static cl::opt<bool>
AllowIVWidening("indvars-widen-indvars", cl::Hidden, cl::init(true),
cl::desc("Allow widening of indvars to eliminate s/zext"));
+static cl::opt<unsigned> MaxDepthOutOfLoop(
+ "indvars-max-depth-out-of-loop", cl::Hidden, cl::init(1),
+ cl::desc(
+ "Strict upper bound for the number of successive out-of-loop blocks "
+ "when traversing use-def chains. 0 enables full traversal"));
+
namespace {
class IndVarSimplify {
@@ -624,8 +630,9 @@ bool IndVarSimplify::simplifyAndExtend(Loop *L,
// Information about sign/zero extensions of CurrIV.
IndVarSimplifyVisitor Visitor(CurrIV, SE, TTI, DT);
- const auto &[C, U] = simplifyUsersOfIV(CurrIV, SE, DT, LI, TTI, DeadInsts,
- Rewriter, &Visitor);
+ const auto &[C, U] =
+ simplifyUsersOfIV(CurrIV, SE, DT, LI, TTI, DeadInsts, Rewriter,
+ MaxDepthOutOfLoop, &Visitor);
Changed |= C;
RunUnswitching |= U;
diff --git a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
index 912c02c2ed3ae..d41298fcfa6a5 100644
--- a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
@@ -62,13 +62,18 @@ namespace {
bool Changed = false;
bool RunUnswitching = false;
+ // When following the def-use chains, it can go outside the loop.
+ // Strict upper bound on number of traversed out-of-loop blocks.
+ unsigned MaxDepthOutOfLoop;
+
public:
SimplifyIndvar(Loop *Loop, ScalarEvolution *SE, DominatorTree *DT,
LoopInfo *LI, const TargetTransformInfo *TTI,
SCEVExpander &Rewriter,
- SmallVectorImpl<WeakTrackingVH> &Dead)
+ SmallVectorImpl<WeakTrackingVH> &Dead,
+ unsigned MaxDepthOutOfLoop = 1)
: L(Loop), LI(LI), SE(SE), DT(DT), TTI(TTI), Rewriter(Rewriter),
- DeadInsts(Dead) {
+ DeadInsts(Dead), MaxDepthOutOfLoop(MaxDepthOutOfLoop) {
assert(LI && "IV simplification requires LoopInfo");
}
@@ -509,8 +514,8 @@ bool SimplifyIndvar::eliminateTrunc(TruncInst *TI) {
!DT->isReachableFromEntry(cast<Instruction>(U)->getParent()))
continue;
ICmpInst *ICI = dyn_cast<ICmpInst>(U);
- if (!ICI) return false;
- assert(L->contains(ICI->getParent()) && "LCSSA form broken?");
+ if (!ICI)
+ return false;
if (!(ICI->getOperand(0) == TI && L->isLoopInvariant(ICI->getOperand(1))) &&
!(ICI->getOperand(1) == TI && L->isLoopInvariant(ICI->getOperand(0))))
return false;
@@ -839,10 +844,12 @@ bool SimplifyIndvar::strengthenRightShift(BinaryOperator *BO,
}
/// Add all uses of Def to the current IV's worklist.
-static void pushIVUsers(
- Instruction *Def, Loop *L,
- SmallPtrSet<Instruction*,16> &Simplified,
- SmallVectorImpl< std::pair<Instruction*,Instruction*> > &SimpleIVUsers) {
+static void
+pushIVUsers(Instruction *Def, Loop *L, DominatorTree *DT,
+ SmallPtrSet<Instruction *, 16> &Simplified,
+ SmallVectorImpl<std::tuple<Instruction *, Instruction *, unsigned>>
+ &SimpleIVUsers,
+ unsigned OutOfLoopChainCounter, unsigned MaxOutOfLoopChainCounter) {
for (User *U : Def->users()) {
Instruction *UI = cast<Instruction>(U);
@@ -854,16 +861,23 @@ static void pushIVUsers(
if (UI == Def)
continue;
- // Only change the current Loop, do not change the other parts (e.g. other
- // Loops).
- if (!L->contains(UI))
+ // Avoid adding Defs that SCEV expand to themselves, e.g. the LoopPhis
+ // of the outter loops.
+ if (!DT->dominates(L->getHeader(), UI->getParent()))
continue;
// Do not push the same instruction more than once.
if (!Simplified.insert(UI).second)
continue;
- SimpleIVUsers.push_back(std::make_pair(UI, Def));
+ unsigned Counter =
+ L->contains(UI)
+ ? 0 // reset depth if we go back inside the loop.
+ : OutOfLoopChainCounter + (UI->getParent() != Def->getParent());
+
+ if (!MaxOutOfLoopChainCounter || Counter < MaxOutOfLoopChainCounter) {
+ SimpleIVUsers.push_back(std::make_tuple(UI, Def, Counter));
+ }
}
}
@@ -908,17 +922,17 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
SmallPtrSet<Instruction*,16> Simplified;
// Use-def pairs if IV users waiting to be processed for CurrIV.
- SmallVector<std::pair<Instruction*, Instruction*>, 8> SimpleIVUsers;
+ SmallVector<std::tuple<Instruction *, Instruction *, unsigned>, 8>
+ SimpleIVUsers;
// Push users of the current LoopPhi. In rare cases, pushIVUsers may be
// called multiple times for the same LoopPhi. This is the proper thing to
// do for loop header phis that use each other.
- pushIVUsers(CurrIV, L, Simplified, SimpleIVUsers);
+ pushIVUsers(CurrIV, L, DT, Simplified, SimpleIVUsers, 0, MaxDepthOutOfLoop);
while (!SimpleIVUsers.empty()) {
- std::pair<Instruction*, Instruction*> UseOper =
- SimpleIVUsers.pop_back_val();
- Instruction *UseInst = UseOper.first;
+ auto [UseInst, IVOperand, OutOfLoopChainCounter] =
+ SimpleIVUsers.pop_back_val();
// If a user of the IndVar is trivially dead, we prefer just to mark it dead
// rather than try to do some complex analysis or transformation (such as
@@ -942,11 +956,11 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
if ((isa<PtrToIntInst>(UseInst)) || (isa<TruncInst>(UseInst)))
for (Use &U : UseInst->uses()) {
Instruction *User = cast<Instruction>(U.getUser());
- if (replaceIVUserWithLoopInvariant(User))
+ if (DT->dominates(L->getHeader(), User->getParent()) &&
+ replaceIVUserWithLoopInvariant(User))
break; // done replacing
}
- Instruction *IVOperand = UseOper.second;
for (unsigned N = 0; IVOperand; ++N) {
assert(N <= Simplified.size() && "runaway iteration");
(void) N;
@@ -960,7 +974,8 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
continue;
if (eliminateIVUser(UseInst, IVOperand)) {
- pushIVUsers(IVOperand, L, Simplified, SimpleIVUsers);
+ pushIVUsers(IVOperand, L, DT, Simplified, SimpleIVUsers,
+ OutOfLoopChainCounter, MaxDepthOutOfLoop);
continue;
}
@@ -968,14 +983,16 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
if (strengthenBinaryOp(BO, IVOperand)) {
// re-queue uses of the now modified binary operator and fall
// through to the checks that remain.
- pushIVUsers(IVOperand, L, Simplified, SimpleIVUsers);
+ pushIVUsers(IVOperand, L, DT, Simplified, SimpleIVUsers,
+ OutOfLoopChainCounter, MaxDepthOutOfLoop);
}
}
// Try to use integer induction for FPToSI of float induction directly.
if (replaceFloatIVWithIntegerIV(UseInst)) {
// Re-queue the potentially new direct uses of IVOperand.
- pushIVUsers(IVOperand, L, Simplified, SimpleIVUsers);
+ pushIVUsers(IVOperand, L, DT, Simplified, SimpleIVUsers,
+ OutOfLoopChainCounter, MaxDepthOutOfLoop);
continue;
}
@@ -985,7 +1002,8 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
continue;
}
if (isSimpleIVUser(UseInst, L, SE)) {
- pushIVUsers(UseInst, L, Simplified, SimpleIVUsers);
+ pushIVUsers(UseInst, L, DT, Simplified, SimpleIVUsers,
+ OutOfLoopChainCounter, MaxDepthOutOfLoop);
}
}
}
@@ -999,13 +1017,13 @@ void IVVisitor::anchor() { }
/// Returns a pair where the first entry indicates that the function makes
/// changes and the second entry indicates that it introduced new opportunities
/// for loop unswitching.
-std::pair<bool, bool> simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE,
- DominatorTree *DT, LoopInfo *LI,
- const TargetTransformInfo *TTI,
- SmallVectorImpl<WeakTrackingVH> &Dead,
- SCEVExpander &Rewriter, IVVisitor *V) {
+std::pair<bool, bool>
+simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE, DominatorTree *DT,
+ LoopInfo *LI, const TargetTransformInfo *TTI,
+ SmallVectorImpl<WeakTrackingVH> &Dead, SCEVExpander &Rewriter,
+ unsigned MaxDepthOutOfLoop, IVVisitor *V) {
SimplifyIndvar SIV(LI->getLoopFor(CurrIV->getParent()), SE, DT, LI, TTI,
- Rewriter, Dead);
+ Rewriter, Dead, MaxDepthOutOfLoop);
SIV.simplifyUsers(CurrIV, V);
return {SIV.hasChanged(), SIV.runUnswitching()};
}
|
ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test in llvm/test/Transforms/IndVarSimplify.
simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE, DominatorTree *DT, | ||
LoopInfo *LI, const TargetTransformInfo *TTI, | ||
SmallVectorImpl<WeakTrackingVH> &Dead, SCEVExpander &Rewriter, | ||
unsigned MaxDepthOutOfLoop = 1, IVVisitor *V = nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the MaxDepthOutOfLoop parameter? Can you move the cl::opt into SimplifyIndVar and directly use it instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transforms/Utils/LoopUnroll.cpp
uses it via simplifyLoopIVs
. I didn't want to add it here as it could create regressions in passes I'm not interested in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes sense. But as implemented, aren't you still calling this with the default of 1 from LoopUnroll? Shouldn't the default be 0 instead?
Could you please rebase over 6e3725d so we at least don't have to pass through this parameter everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question you raise is actually quite important as this code is disabled with this current default config. I disabled it as I wanted to make it optimisation level dependent but I needed some input to calibrate it according to the level.
It is set to 1
which means the set of BBs that will be affected are dist(Loop, BB) < 1 => BB in Loop
(the strict bound is on purpose). The 0
value is a special value for an +infinity
bound.
I don't know if this encoding is satisfying. Another one is to use -1
but I don't know if cl::opt<unsigned>
supports it.
+1 for making pushIVUsers
a method instead of a function.
I think I'll put it to 0
now as it will show performance regressions and allow discussing about that. I'll add some tests before though.
SimpleIVUsers.push_back(std::make_pair(UI, Def)); | ||
unsigned Counter = | ||
L->contains(UI) | ||
? 0 // reset depth if we go back inside the loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
If we want to compute a overapproximation of the distance in the CFG dist(Loop, BB)
, this helps to tighten the approximation to the real distance.
unsigned Counter = | ||
L->contains(UI) | ||
? 0 // reset depth if we go back inside the loop. | ||
: OutOfLoopChainCounter + (UI->getParent() != Def->getParent()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a limit on the depth of the blocks rather than instructions? You can have a very long chain of instructions within a single block, or a very short one across blocks. It seems like that would be the more relevant quantity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, I implemented a instruction-wise depth but I changed that as I visualised the algorithm at the CFG level, as accepting optimising basic blocks that are at given distance of the loop dist(Loop, BB) < limit
.
Also, I didn't like the idea to have partially optimised basic-blocks.
Instead of considering only users that are inside the loop, consider all the users with parent dominated by the loop header.
3af0c12
to
aeac9e8
Compare
I changed the default config which disabled the traversal of users that are out of loop. I expect performance regressions with reenabling full traversal. |
I don't understand why the CI pipeline failed. Is there a way to relaunch it for linux without repushing the last commit ? |
Ping |
It looks like the updated patch miscompiles clang: http://llvm-compile-time-tracker.com/show_error.php?commit=9236a1de08a0f522a4c123b3818ea1dcb0361c3c There are crashes when building llvm-test-suite with stage2 clang. |
Can you please also add a test for your last fix? We do get a working stage2 build now: https://llvm-compile-time-tracker.com/compare.php?from=3c50cbfda4fc3ad85349167132f7ed809ecc685a&to=8c47a1722b8b303e3262a77f992fea606a0a305c&stat=instructions:u |
Out of loop, an IV user is loop invariant. Using L->isLoopInvariant is no more valid to distinguish which operand is the loop-constant one from the IV user.
2276def
to
8a92def
Compare
I want to add another point to the ones mentioned by @efriedma-quic:
|
Instead of considering only users that are inside the loop, consider all the users with parent dominated by the loop header.
This is a draft PR to discuss about the code before doing some QA. @etherzhhb @efriedma-quic Do you think my code is valid ?
Provides a solution to #90417.