Skip to content

[SimplifyIndVar] Push more users to worklist for simplifyUsers #93598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

v01dXYZ
Copy link
Contributor

@v01dXYZ v01dXYZ commented May 28, 2024

Instead of considering only users that are inside the loop, consider all the users with parent dominated by the loop header.

This is a draft PR to discuss about the code before doing some QA. @etherzhhb @efriedma-quic Do you think my code is valid ?

Provides a solution to #90417.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@efriedma-quic
Copy link
Collaborator

Two potential problems with this approach:

  • Some places might implicitly assume IV users are inside the loop.
  • I'm a bit concerned about compile-time. If you recursively visit every value derived from the exit value of the loop, you can easily end up with quadratic compile-time growth: if you have N successive loops, you visit instructions in the last loop N^2 times.

CC @fhahn

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 29, 2024

To answer your first question, it seems the Users are expected to be within the loop (test-suite fails with the assert L->contains(ICI->getParent()) && "LCSSA form broken?").

My goal was to allow simplification of a IV comparison in an exit block.

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 30, 2024

The following breaks:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"

define void @foo(
       i32 noundef %n_real_zones,
        ptr noundef %real_zones
) {
entry:
  br label %for.cond

for.cond:                                         ; preds = %for.cond.cleanup3, %entry
  %j.0 = phi i32 [ 0, %entry ], [ %inc7, %for.cond.cleanup3 ]
  %n_real_zones.addr.0 = phi i32 [ %n_real_zones, %entry ], [ %n_real_zones.addr.1, %for.cond.cleanup3 ]
  %cmp = icmp ult i32 %j.0, 128
  br label %for.cond1

for.cond1:                                        ; preds = %for.body4, %for.cond
  %n_real_zones.addr.1 = phi i32 [ %n_real_zones.addr.0, %for.cond ], [ %inc, %for.body4 ]
  %i.0 = phi i32 [ 0, %for.cond ], [ %inc5, %for.body4 ]
  %cmp2 = icmp ult i32 %i.0, 128
  br i1 %cmp2, label %for.body4, label %for.cond.cleanup3

for.cond.cleanup3:                                ; preds = %for.cond1
  %inc7 = add nuw nsw i32 %j.0, 1
  br label %for.cond

for.body4:                                        ; preds = %for.cond1
  %idxprom = sext i32 %n_real_zones.addr.1 to i64
  %arrayidx = getelementptr inbounds i32, ptr %real_zones, i64 %idxprom
  store i32 %j.0, ptr %arrayidx, align 4
  %inc = add nsw i32 %n_real_zones.addr.1, 1
  %inc5 = add nuw nsw i32 %i.0, 1
  br label %for.cond1
}

I don't know why it seems to depend on datalayout (especially if n64).

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 31, 2024

I was able to pass the test-suite with the new changes.

@@ -511,6 +511,7 @@ bool SimplifyIndvar::eliminateTrunc(TruncInst *TI) {
ICmpInst *ICI = dyn_cast<ICmpInst>(U);
if (!ICI) return false;
assert(L->contains(ICI->getParent()) && "LCSSA form broken?");
// assert(L->contains(ICI->getParent()) && "LCSSA form broken?");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, i think this code assume we run the loop close SSA pass first. Could you include a LIT test to demonstrate the problem? I am not sure this is the right fix to your problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check will break if there is an exit (or dominated) block with a use-chain trunc-icmp such as the following one from Bitcode/Regression/fft/x86_17.06.19_halide_runtime.bc halide_memoization_cache_lookup:

for.cond82.preheader:                             ; preds = %_ZNK15halide_buffer_t13size_in_bytesEv.exit
  %48 = trunc i64 %indvars.iv214 to i32
  %cmp83195 = icmp sgt i32 %48, 0
  br i1 %cmp83195, label %for.body85.preheader, label %cleanup111

or such as:

define i1 @foo(i32 %n) {
entry:
    br label %loop

loop:
    %i = phi i64 [0, %entry], [%i.inc, %loop]
    %i.inc = add i64 %i, 1

    %i.ult.Icnt = icmp ult i64 %i, 128

    br i1 %i.ult.Icnt, label %loop, label %loop.exit

loop.exit:
    %i.32 = trunc i64 %i to i32
    %i.32.ult.n = icmp ult i32 %i.32, %n
    ret i1 %i.32.ult.n
}

I've got to take a closer look at the code to understand why LCSSA is required here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me the utility class SimplifyIndVar don't have to be fed LCSSA-normalised loop but the loop pass IndVarSimplify needs it (I don't know exactly all the reasons, just spotted it uses llvm::rewriteExitValue which simply uses the Backedge Info + SCEV to replace the LCSSA phis with SCEV Constant if the phi is a SCEV Add Rec)

Maybe I'm making a mistake but eliminateTrunc checks all the trunc users are icmp that are eligible to simplifications. I don't see why the LCSSA is required here (except to ensure staying inside the loop).

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented Jun 3, 2024

So I added a cl opt to limit the traversal block-wise. I'm trying to benchmark to see if it hurts performance.

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented Jun 4, 2024

@efriedma-quic The simplification algorithm is almost greedy forward. It stops descending as soon as there is no possible simplification. There is one big exception with instructions representing a SCEV Add Rec associated to the current analysed loop which are always followed, but it's restricting and the user will be traversed only during the analysis of the right loop.

The quadratic behaviour could in theory appear when a simplification is always available. One way to avoid going too far is to keep a counter when traversing the use chain which is incremented everytime we jump to a out-of-loop block and reset when we go back inside the loop. It's a fast way to compute a overapproximation of dist(Loop, CurrentBlock).

I ran test-suite for compile_time but I don't see significant regressions. I would like to have some advice about performance regressions as I fear I could make a dumb mistake (a short sanity check list).

@fhahn
Copy link
Contributor

fhahn commented Jun 6, 2024

FWIW it looks like we can handle the motivating case from #90417 in constraint-elimination: #94610

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented Jun 11, 2024

Ping.

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented Jun 17, 2024

ping

I could add a mechanism that will specify a radius of action for the simplification depending on the optimisation level.

@nikic
Copy link
Contributor

nikic commented Jun 17, 2024

If you want a review, move the PR out of draft.

@v01dXYZ v01dXYZ force-pushed the 90417-indvars-simplify-users-out-of-loop branch from 248e441 to 3af0c12 Compare June 18, 2024 14:37
@v01dXYZ v01dXYZ marked this pull request as ready for review June 18, 2024 14:39
@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (v01dXYZ)

Changes

Instead of considering only users that are inside the loop, consider all the users with parent dominated by the loop header.

This is a draft PR to discuss about the code before doing some QA. @etherzhhb @efriedma-quic Do you think my code is valid ?

Provides a solution to #90417.


Full diff: https://github.com/llvm/llvm-project/pull/93598.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h (+5-6)
  • (modified) llvm/lib/Transforms/Scalar/IndVarSimplify.cpp (+9-2)
  • (modified) llvm/lib/Transforms/Utils/SimplifyIndVar.cpp (+47-29)
diff --git a/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h b/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h
index bd1718dd8cad7..dbf2bdd18467d 100644
--- a/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h
+++ b/llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h
@@ -52,12 +52,11 @@ class IVVisitor {
 /// where the first entry indicates that the function makes changes and the
 /// second entry indicates that it introduced new opportunities for loop
 /// unswitching.
-std::pair<bool, bool> simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE,
-                                        DominatorTree *DT, LoopInfo *LI,
-                                        const TargetTransformInfo *TTI,
-                                        SmallVectorImpl<WeakTrackingVH> &Dead,
-                                        SCEVExpander &Rewriter,
-                                        IVVisitor *V = nullptr);
+std::pair<bool, bool>
+simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE, DominatorTree *DT,
+                  LoopInfo *LI, const TargetTransformInfo *TTI,
+                  SmallVectorImpl<WeakTrackingVH> &Dead, SCEVExpander &Rewriter,
+                  unsigned MaxDepthOutOfLoop = 1, IVVisitor *V = nullptr);
 
 /// SimplifyLoopIVs - Simplify users of induction variables within this
 /// loop. This does not actually change or add IVs.
diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index dd7c89034ca09..ea0f7a986e4b9 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -124,6 +124,12 @@ static cl::opt<bool>
 AllowIVWidening("indvars-widen-indvars", cl::Hidden, cl::init(true),
                 cl::desc("Allow widening of indvars to eliminate s/zext"));
 
+static cl::opt<unsigned> MaxDepthOutOfLoop(
+    "indvars-max-depth-out-of-loop", cl::Hidden, cl::init(1),
+    cl::desc(
+        "Strict upper bound for the number of successive out-of-loop blocks "
+        "when traversing use-def chains. 0 enables full traversal"));
+
 namespace {
 
 class IndVarSimplify {
@@ -624,8 +630,9 @@ bool IndVarSimplify::simplifyAndExtend(Loop *L,
       // Information about sign/zero extensions of CurrIV.
       IndVarSimplifyVisitor Visitor(CurrIV, SE, TTI, DT);
 
-      const auto &[C, U] = simplifyUsersOfIV(CurrIV, SE, DT, LI, TTI, DeadInsts,
-                                             Rewriter, &Visitor);
+      const auto &[C, U] =
+          simplifyUsersOfIV(CurrIV, SE, DT, LI, TTI, DeadInsts, Rewriter,
+                            MaxDepthOutOfLoop, &Visitor);
 
       Changed |= C;
       RunUnswitching |= U;
diff --git a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
index 912c02c2ed3ae..d41298fcfa6a5 100644
--- a/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
@@ -62,13 +62,18 @@ namespace {
     bool Changed = false;
     bool RunUnswitching = false;
 
+    // When following the def-use chains, it can go outside the loop.
+    // Strict upper bound on number of traversed out-of-loop blocks.
+    unsigned MaxDepthOutOfLoop;
+
   public:
     SimplifyIndvar(Loop *Loop, ScalarEvolution *SE, DominatorTree *DT,
                    LoopInfo *LI, const TargetTransformInfo *TTI,
                    SCEVExpander &Rewriter,
-                   SmallVectorImpl<WeakTrackingVH> &Dead)
+                   SmallVectorImpl<WeakTrackingVH> &Dead,
+                   unsigned MaxDepthOutOfLoop = 1)
         : L(Loop), LI(LI), SE(SE), DT(DT), TTI(TTI), Rewriter(Rewriter),
-          DeadInsts(Dead) {
+          DeadInsts(Dead), MaxDepthOutOfLoop(MaxDepthOutOfLoop) {
       assert(LI && "IV simplification requires LoopInfo");
     }
 
@@ -509,8 +514,8 @@ bool SimplifyIndvar::eliminateTrunc(TruncInst *TI) {
         !DT->isReachableFromEntry(cast<Instruction>(U)->getParent()))
       continue;
     ICmpInst *ICI = dyn_cast<ICmpInst>(U);
-    if (!ICI) return false;
-    assert(L->contains(ICI->getParent()) && "LCSSA form broken?");
+    if (!ICI)
+      return false;
     if (!(ICI->getOperand(0) == TI && L->isLoopInvariant(ICI->getOperand(1))) &&
         !(ICI->getOperand(1) == TI && L->isLoopInvariant(ICI->getOperand(0))))
       return false;
@@ -839,10 +844,12 @@ bool SimplifyIndvar::strengthenRightShift(BinaryOperator *BO,
 }
 
 /// Add all uses of Def to the current IV's worklist.
-static void pushIVUsers(
-  Instruction *Def, Loop *L,
-  SmallPtrSet<Instruction*,16> &Simplified,
-  SmallVectorImpl< std::pair<Instruction*,Instruction*> > &SimpleIVUsers) {
+static void
+pushIVUsers(Instruction *Def, Loop *L, DominatorTree *DT,
+            SmallPtrSet<Instruction *, 16> &Simplified,
+            SmallVectorImpl<std::tuple<Instruction *, Instruction *, unsigned>>
+                &SimpleIVUsers,
+            unsigned OutOfLoopChainCounter, unsigned MaxOutOfLoopChainCounter) {
 
   for (User *U : Def->users()) {
     Instruction *UI = cast<Instruction>(U);
@@ -854,16 +861,23 @@ static void pushIVUsers(
     if (UI == Def)
       continue;
 
-    // Only change the current Loop, do not change the other parts (e.g. other
-    // Loops).
-    if (!L->contains(UI))
+    // Avoid adding Defs that SCEV expand to themselves, e.g. the LoopPhis
+    // of the outter loops.
+    if (!DT->dominates(L->getHeader(), UI->getParent()))
       continue;
 
     // Do not push the same instruction more than once.
     if (!Simplified.insert(UI).second)
       continue;
 
-    SimpleIVUsers.push_back(std::make_pair(UI, Def));
+    unsigned Counter =
+        L->contains(UI)
+            ? 0 // reset depth if we go back inside the loop.
+            : OutOfLoopChainCounter + (UI->getParent() != Def->getParent());
+
+    if (!MaxOutOfLoopChainCounter || Counter < MaxOutOfLoopChainCounter) {
+      SimpleIVUsers.push_back(std::make_tuple(UI, Def, Counter));
+    }
   }
 }
 
@@ -908,17 +922,17 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
   SmallPtrSet<Instruction*,16> Simplified;
 
   // Use-def pairs if IV users waiting to be processed for CurrIV.
-  SmallVector<std::pair<Instruction*, Instruction*>, 8> SimpleIVUsers;
+  SmallVector<std::tuple<Instruction *, Instruction *, unsigned>, 8>
+      SimpleIVUsers;
 
   // Push users of the current LoopPhi. In rare cases, pushIVUsers may be
   // called multiple times for the same LoopPhi. This is the proper thing to
   // do for loop header phis that use each other.
-  pushIVUsers(CurrIV, L, Simplified, SimpleIVUsers);
+  pushIVUsers(CurrIV, L, DT, Simplified, SimpleIVUsers, 0, MaxDepthOutOfLoop);
 
   while (!SimpleIVUsers.empty()) {
-    std::pair<Instruction*, Instruction*> UseOper =
-      SimpleIVUsers.pop_back_val();
-    Instruction *UseInst = UseOper.first;
+    auto [UseInst, IVOperand, OutOfLoopChainCounter] =
+        SimpleIVUsers.pop_back_val();
 
     // If a user of the IndVar is trivially dead, we prefer just to mark it dead
     // rather than try to do some complex analysis or transformation (such as
@@ -942,11 +956,11 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
     if ((isa<PtrToIntInst>(UseInst)) || (isa<TruncInst>(UseInst)))
       for (Use &U : UseInst->uses()) {
         Instruction *User = cast<Instruction>(U.getUser());
-        if (replaceIVUserWithLoopInvariant(User))
+        if (DT->dominates(L->getHeader(), User->getParent()) &&
+            replaceIVUserWithLoopInvariant(User))
           break; // done replacing
       }
 
-    Instruction *IVOperand = UseOper.second;
     for (unsigned N = 0; IVOperand; ++N) {
       assert(N <= Simplified.size() && "runaway iteration");
       (void) N;
@@ -960,7 +974,8 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
       continue;
 
     if (eliminateIVUser(UseInst, IVOperand)) {
-      pushIVUsers(IVOperand, L, Simplified, SimpleIVUsers);
+      pushIVUsers(IVOperand, L, DT, Simplified, SimpleIVUsers,
+                  OutOfLoopChainCounter, MaxDepthOutOfLoop);
       continue;
     }
 
@@ -968,14 +983,16 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
       if (strengthenBinaryOp(BO, IVOperand)) {
         // re-queue uses of the now modified binary operator and fall
         // through to the checks that remain.
-        pushIVUsers(IVOperand, L, Simplified, SimpleIVUsers);
+        pushIVUsers(IVOperand, L, DT, Simplified, SimpleIVUsers,
+                    OutOfLoopChainCounter, MaxDepthOutOfLoop);
       }
     }
 
     // Try to use integer induction for FPToSI of float induction directly.
     if (replaceFloatIVWithIntegerIV(UseInst)) {
       // Re-queue the potentially new direct uses of IVOperand.
-      pushIVUsers(IVOperand, L, Simplified, SimpleIVUsers);
+      pushIVUsers(IVOperand, L, DT, Simplified, SimpleIVUsers,
+                  OutOfLoopChainCounter, MaxDepthOutOfLoop);
       continue;
     }
 
@@ -985,7 +1002,8 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
       continue;
     }
     if (isSimpleIVUser(UseInst, L, SE)) {
-      pushIVUsers(UseInst, L, Simplified, SimpleIVUsers);
+      pushIVUsers(UseInst, L, DT, Simplified, SimpleIVUsers,
+                  OutOfLoopChainCounter, MaxDepthOutOfLoop);
     }
   }
 }
@@ -999,13 +1017,13 @@ void IVVisitor::anchor() { }
 ///  Returns a pair where the first entry indicates that the function makes
 ///  changes and the second entry indicates that it introduced new opportunities
 ///  for loop unswitching.
-std::pair<bool, bool> simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE,
-                                        DominatorTree *DT, LoopInfo *LI,
-                                        const TargetTransformInfo *TTI,
-                                        SmallVectorImpl<WeakTrackingVH> &Dead,
-                                        SCEVExpander &Rewriter, IVVisitor *V) {
+std::pair<bool, bool>
+simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE, DominatorTree *DT,
+                  LoopInfo *LI, const TargetTransformInfo *TTI,
+                  SmallVectorImpl<WeakTrackingVH> &Dead, SCEVExpander &Rewriter,
+                  unsigned MaxDepthOutOfLoop, IVVisitor *V) {
   SimplifyIndvar SIV(LI->getLoopFor(CurrIV->getParent()), SE, DT, LI, TTI,
-                     Rewriter, Dead);
+                     Rewriter, Dead, MaxDepthOutOfLoop);
   SIV.simplifyUsers(CurrIV, V);
   return {SIV.hasChanged(), SIV.runUnswitching()};
 }

@v01dXYZ v01dXYZ changed the title (Draft) [SimplifyIndVar] Push more users to worklist for simplifyUsers [SimplifyIndVar] Push more users to worklist for simplifyUsers Jun 18, 2024
@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented Jun 24, 2024

ping

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test in llvm/test/Transforms/IndVarSimplify.

simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE, DominatorTree *DT,
LoopInfo *LI, const TargetTransformInfo *TTI,
SmallVectorImpl<WeakTrackingVH> &Dead, SCEVExpander &Rewriter,
unsigned MaxDepthOutOfLoop = 1, IVVisitor *V = nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the MaxDepthOutOfLoop parameter? Can you move the cl::opt into SimplifyIndVar and directly use it instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transforms/Utils/LoopUnroll.cpp uses it via simplifyLoopIVs. I didn't want to add it here as it could create regressions in passes I'm not interested in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that makes sense. But as implemented, aren't you still calling this with the default of 1 from LoopUnroll? Shouldn't the default be 0 instead?

Could you please rebase over 6e3725d so we at least don't have to pass through this parameter everywhere?

Copy link
Contributor Author

@v01dXYZ v01dXYZ Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question you raise is actually quite important as this code is disabled with this current default config. I disabled it as I wanted to make it optimisation level dependent but I needed some input to calibrate it according to the level.

It is set to 1 which means the set of BBs that will be affected are dist(Loop, BB) < 1 => BB in Loop (the strict bound is on purpose). The 0 value is a special value for an +infinity bound.

I don't know if this encoding is satisfying. Another one is to use -1 but I don't know if cl::opt<unsigned> supports it.

+1 for making pushIVUsers a method instead of a function.

I think I'll put it to 0 now as it will show performance regressions and allow discussing about that. I'll add some tests before though.

SimpleIVUsers.push_back(std::make_pair(UI, Def));
unsigned Counter =
L->contains(UI)
? 0 // reset depth if we go back inside the loop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.

If we want to compute a overapproximation of the distance in the CFG dist(Loop, BB), this helps to tighten the approximation to the real distance.

unsigned Counter =
L->contains(UI)
? 0 // reset depth if we go back inside the loop.
: OutOfLoopChainCounter + (UI->getParent() != Def->getParent());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a limit on the depth of the blocks rather than instructions? You can have a very long chain of instructions within a single block, or a very short one across blocks. It seems like that would be the more relevant quantity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, I implemented a instruction-wise depth but I changed that as I visualised the algorithm at the CFG level, as accepting optimising basic blocks that are at given distance of the loop dist(Loop, BB) < limit.

Also, I didn't like the idea to have partially optimised basic-blocks.

v01dxyz added 4 commits June 27, 2024 00:16
Instead of considering only users that are inside the loop,
consider all the users with parent dominated by the loop header.
@v01dXYZ v01dXYZ force-pushed the 90417-indvars-simplify-users-out-of-loop branch from 3af0c12 to aeac9e8 Compare June 26, 2024 22:27
@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented Jun 26, 2024

No compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=e5a41f0afc152cc24b8fef3aa177ef53b2e77c43&to=96dd00646ea8346b9eb4efc0b5b5c78375e8189b&stat=instructions:u

I changed the default config which disabled the traversal of users that are out of loop.

I expect performance regressions with reenabling full traversal.

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented Jun 28, 2024

I don't understand why the CI pipeline failed. Is there a way to relaunch it for linux without repushing the last commit ?

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented Jul 1, 2024

Ping

@nikic
Copy link
Contributor

nikic commented Jul 1, 2024

It looks like the updated patch miscompiles clang: http://llvm-compile-time-tracker.com/show_error.php?commit=9236a1de08a0f522a4c123b3818ea1dcb0361c3c There are crashes when building llvm-test-suite with stage2 clang.

@nikic
Copy link
Contributor

nikic commented Jul 2, 2024

v01dxyz added 2 commits July 3, 2024 16:31
Out of loop, an IV user is loop invariant. Using L->isLoopInvariant is
no more valid to distinguish which operand is the loop-constant one from
the IV user.
@v01dXYZ v01dXYZ force-pushed the 90417-indvars-simplify-users-out-of-loop branch from 2276def to 8a92def Compare July 3, 2024 14:41
@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented Jul 3, 2024

I want to add another point to the ones mentioned by @efriedma-quic:

  • the optimisations done in-loop are much more susceptible to improve the perf compared to out-of-loop. It's more profitable compared to colder paths. It could be wiser to use a light version instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants