[SystemZ] Fix compile time regression in adjustInliningThreshold(). #137527

Conversation
@llvm/pr-subscribers-backend-systemz

Author: Jonas Paulsson (JonPsson1)

Changes

Instead of always iterating over all GlobalVariables in the Module to find the case where both Caller and Callee use the same GV heavily, first scan the Callee (only if it has fewer than 200 instructions) for all GVs used more than 10 times, and then do the counting in the Caller for just those relevant GVs.

The limit of 200 instructions makes sense since this heuristic aims to inline a relatively small function that uses a GV more than 10 times. The limit changed only 7 files across 3 SPEC benchmarks. Previously only perlbench performance was affected, and perl is not among these 3 changed benchmarks, so there should not be any difference to consider here. SPEC runs seem to confirm this ("full/home-dir").

Compile time across SPEC shows no difference compared to main. This does, however, resolve the compile-time problem with zig: on main (compared to removing the heuristic) it is a 380% increase, but with this change only a 2.4% increase (total user compile time with opt).

Fixes #134714.

Full diff: https://github.com/llvm/llvm-project/pull/137527.diff

1 File Affected:
diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
index ee142ccd20e20..78f5154229f55 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
@@ -80,7 +80,6 @@ unsigned SystemZTTIImpl::adjustInliningThreshold(const CallBase *CB) const {
const Function *Callee = CB->getCalledFunction();
if (!Callee)
return 0;
- const Module *M = Caller->getParent();
// Increase the threshold if an incoming argument is used only as a memcpy
// source.
@@ -92,29 +91,38 @@ unsigned SystemZTTIImpl::adjustInliningThreshold(const CallBase *CB) const {
}
}
- // Give bonus for globals used much in both caller and callee.
- std::set<const GlobalVariable *> CalleeGlobals;
- std::set<const GlobalVariable *> CallerGlobals;
- for (const GlobalVariable &Global : M->globals())
- for (const User *U : Global.users())
- if (const Instruction *User = dyn_cast<Instruction>(U)) {
- if (User->getParent()->getParent() == Callee)
- CalleeGlobals.insert(&Global);
- if (User->getParent()->getParent() == Caller)
- CallerGlobals.insert(&Global);
+ // Give bonus for globals used much in both caller and a relatively small
+ // callee.
+ if (Callee->getInstructionCount() < 200) {
+ std::map<const Value *, unsigned> Ptr2NumUses;
+ for (auto &BB : *Callee)
+ for (auto &I : BB) {
+ if (const auto *SI = dyn_cast<StoreInst>(&I)) {
+ if (!SI->isVolatile())
+ Ptr2NumUses[SI->getPointerOperand()]++;
+ } else if (const auto *LI = dyn_cast<LoadInst>(&I)) {
+ if (!LI->isVolatile())
+ Ptr2NumUses[LI->getPointerOperand()]++;
+ } else if (const auto *GEP = dyn_cast<GetElementPtrInst>(&I)) {
+ unsigned NumStores = 0, NumLoads = 0;
+ countNumMemAccesses(GEP, NumStores, NumLoads, Callee);
+ Ptr2NumUses[GEP->getPointerOperand()] += NumLoads + NumStores;
+ }
}
- for (auto *GV : CalleeGlobals)
- if (CallerGlobals.count(GV)) {
- unsigned CalleeStores = 0, CalleeLoads = 0;
- unsigned CallerStores = 0, CallerLoads = 0;
- countNumMemAccesses(GV, CalleeStores, CalleeLoads, Callee);
- countNumMemAccesses(GV, CallerStores, CallerLoads, Caller);
- if ((CalleeStores + CalleeLoads) > 10 &&
- (CallerStores + CallerLoads) > 10) {
- Bonus = 1000;
- break;
+
+ for (auto I : Ptr2NumUses) {
+ const Value *Ptr = I.first;
+ unsigned NumCalleeUses = I.second;
+ if (NumCalleeUses > 10 && isa<GlobalVariable>(Ptr)) {
+ unsigned CallerStores = 0, CallerLoads = 0;
+ countNumMemAccesses(Ptr, CallerStores, CallerLoads, Caller);
+ if (CallerStores + CallerLoads > 10) {
+ Bonus = 1000;
+ break;
+ }
}
}
+ }
// Give bonus when Callee accesses an Alloca of Caller heavily.
unsigned NumStores = 0;
@alexrp Does this help on your side as well?
CallerGlobals.insert(&Global);
// Give bonus for globals used much in both caller and a relatively small
// callee.
if (Callee->getInstructionCount() < 200) {
getInstructionCount() is going to iterate over the whole function to determine the instruction count. It's usually better to always do the analysis and just bail out if it inspects too many instructions.
ok, using a counter instead.
} else if (const auto *GEP = dyn_cast<GetElementPtrInst>(&I)) {
  unsigned NumStores = 0, NumLoads = 0;
  countNumMemAccesses(GEP, NumStores, NumLoads, Callee);
  Ptr2NumUses[GEP->getPointerOperand()] += NumLoads + NumStores;
It seems like nothing actually cares about loads and stores separately? Make things simpler by having countNumMemAccesses return the total count?
There actually is one a bit further down: separate checks for the number of loads and stores. It may be that these could be combined there as well, but that should probably wait and be benchmarked after this.
@nikic thanks for review! I followed your suggestions and indeed see a further speedup: zig.bc is now regressing <0.5% with this heuristic (compared to removing it), so nearly gone. And average over SPEC for InlinerPass is now 2% better than with main (1.46% vs 1.49%).
// callee.
unsigned InstrCount = 0;
SmallDenseMap<const Value *, unsigned> Ptr2NumUses;
for (auto &I : instructions(Callee)) {
Do we need to watch out for debug instructions here? We don't want the count == 200 check to differ between a debug and a non-debug compile ...
I started out with that actually, but @nikic suggested it is not needed with the new debug refs, which were committed for SystemZ a month ago. (Haven't double-checked though.)
This is due to the switch from debug intrinsics to debug records. It's a separate change from debug instr ref (which is a backend thing).
I see, thanks. Would this still be the case for a LLVM 20 backport? (Not sure when this change came in ...)
Yes, it's already in LLVM 20.
LGTM
LGTM as well, thanks! This should also go into LLVM 20 to fix the regression.
/cherry-pick 98b895d
/pull-request #137628
…lvm#137527) Instead of always iterating over all GlobalVariable:s in the Module to find the case where both Caller and Callee is using the same GV heavily, first scan Callee (only if less than 200 instructions) for all GVs used more than 10 times, and then do the counting for the Caller for just those relevant GVs. The limit of 200 instructions makes sense as this aims to inline a relatively small function using a GV +10 times. This resolves the compile time problem with zig where it is on main (compared to removing the heuristic) a 380% increase, but with this change <0.5% increase (total user compile time with opt). Fixes llvm#134714. (cherry picked from commit 98b895d)