[AMDGPU][NPM] Port SIPreEmitPeephole to NPM #130065

Merged
optimisan merged 2 commits into main from users/optimisan/preemit/port-si-pre-emit-peephole
Apr 8, 2025

Conversation

optimisan
Contributor

No description provided.

github-actions bot commented Mar 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@optimisan optimisan force-pushed the users/optimisan/preemit/port-si-pre-emit-peephole branch from 1864597 to bedfcb1 on March 10, 2025 04:42
@optimisan optimisan force-pushed the users/optimisan/preemit/port-amdgpu-set-wave-priority branch from 147e7af to 192871b on March 10, 2025 04:44
@optimisan optimisan force-pushed the users/optimisan/preemit/port-amdgpu-set-wave-priority branch from 192871b to 031fd29 on March 10, 2025 05:22
@optimisan optimisan force-pushed the users/optimisan/preemit/port-si-pre-emit-peephole branch from bedfcb1 to 586bcbc on March 10, 2025 05:22
@optimisan optimisan marked this pull request as ready for review March 10, 2025 05:22
@llvmbot
Member

llvmbot commented Mar 10, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Akshat Oke (optimisan)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/130065.diff

7 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+8-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+3-4)
  • (modified) llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp (+23-8)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-handle-flat-vmem-ds.mir (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/remove-short-exec-branches-special-instructions.mir (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/set-gpr-idx-peephole.mir (+1)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 27ae6d42ec21d..b8f5d85ef0b9a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -210,7 +210,7 @@ extern char &SIWholeQuadModeID;
 void initializeSILowerControlFlowLegacyPass(PassRegistry &);
 extern char &SILowerControlFlowLegacyID;
 
-void initializeSIPreEmitPeepholePass(PassRegistry &);
+void initializeSIPreEmitPeepholeLegacyPass(PassRegistry &);
 extern char &SIPreEmitPeepholeID;
 
 void initializeSILateBranchLoweringLegacyPass(PassRegistry &);
@@ -392,6 +392,13 @@ class SILateBranchLoweringPass
   static bool isRequired() { return true; }
 };
 
+class SIPreEmitPeepholePass : public PassInfoMixin<SIPreEmitPeepholePass> {
+public:
+  PreservedAnalyses run(MachineFunction &MF,
+                        MachineFunctionAnalysisManager &MFAM);
+  static bool isRequired() { return true; }
+};
+
 class AMDGPUSetWavePriorityPass
     : public PassInfoMixin<AMDGPUSetWavePriorityPass> {
 public:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 4956897d22fde..f14499d0d3146 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -125,6 +125,7 @@ MACHINE_FUNCTION_PASS("si-optimize-exec-masking-pre-ra", SIOptimizeExecMaskingPr
 MACHINE_FUNCTION_PASS("si-peephole-sdwa", SIPeepholeSDWAPass())
 MACHINE_FUNCTION_PASS("si-post-ra-bundler", SIPostRABundlerPass())
 MACHINE_FUNCTION_PASS("si-pre-allocate-wwm-regs", SIPreAllocateWWMRegsPass())
+MACHINE_FUNCTION_PASS("si-pre-emit-peephole", SIPreEmitPeepholePass())
 MACHINE_FUNCTION_PASS("si-shrink-instructions", SIShrinkInstructionsPass())
 MACHINE_FUNCTION_PASS("si-wqm", SIWholeQuadModePass())
 #undef MACHINE_FUNCTION_PASS
@@ -133,7 +134,6 @@ MACHINE_FUNCTION_PASS("si-wqm", SIWholeQuadModePass())
 DUMMY_MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass())
 DUMMY_MACHINE_FUNCTION_PASS("amdgpu-rewrite-partial-reg-uses", GCNRewritePartialRegUsesPass())
 
-DUMMY_MACHINE_FUNCTION_PASS("si-pre-emit-peephole", SIPreEmitPeepholePass())
 // TODO: Move amdgpu-preload-kern-arg-prolog to MACHINE_FUNCTION_PASS since it
 // already exists.
 DUMMY_MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 857af30b348cb..05eb609956199 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -539,7 +539,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeSIModeRegisterLegacyPass(*PR);
   initializeSIWholeQuadModeLegacyPass(*PR);
   initializeSILowerControlFlowLegacyPass(*PR);
-  initializeSIPreEmitPeepholePass(*PR);
+  initializeSIPreEmitPeepholeLegacyPass(*PR);
   initializeSILateBranchLoweringLegacyPass(*PR);
   initializeSIMemoryLegalizerLegacyPass(*PR);
   initializeSIOptimizeExecMaskingLegacyPass(*PR);
@@ -2166,9 +2166,8 @@ void AMDGPUCodeGenPassBuilder::addPreEmitPass(AddMachinePass &addPass) const {
   if (isPassEnabled(EnableSetWavePriority, CodeGenOptLevel::Less))
     addPass(AMDGPUSetWavePriorityPass());
 
-  if (TM.getOptLevel() > CodeGenOptLevel::None) {
-    // TODO: addPass(SIPreEmitPeepholePass());
-  }
+  if (TM.getOptLevel() > CodeGenOptLevel::None)
+    addPass(SIPreEmitPeepholePass());
 
   // The hazard recognizer that runs as part of the post-ra scheduler does not
   // guarantee to be able handle all hazards correctly. This is because if there
diff --git a/llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp b/llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp
index 2bb70c138a50c..9db2118f2997b 100644
--- a/llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPreEmitPeephole.cpp
@@ -24,7 +24,7 @@ using namespace llvm;
 
 namespace {
 
-class SIPreEmitPeephole : public MachineFunctionPass {
+class SIPreEmitPeephole {
 private:
   const SIInstrInfo *TII = nullptr;
   const SIRegisterInfo *TRI = nullptr;
@@ -40,24 +40,31 @@ class SIPreEmitPeephole : public MachineFunctionPass {
                              const MachineBasicBlock &To) const;
   bool removeExeczBranch(MachineInstr &MI, MachineBasicBlock &SrcMBB);
 
+public:
+  bool run(MachineFunction &MF);
+};
+
+class SIPreEmitPeepholeLegacy : public MachineFunctionPass {
 public:
   static char ID;
 
-  SIPreEmitPeephole() : MachineFunctionPass(ID) {
-    initializeSIPreEmitPeepholePass(*PassRegistry::getPassRegistry());
+  SIPreEmitPeepholeLegacy() : MachineFunctionPass(ID) {
+    initializeSIPreEmitPeepholeLegacyPass(*PassRegistry::getPassRegistry());
   }
 
-  bool runOnMachineFunction(MachineFunction &MF) override;
+  bool runOnMachineFunction(MachineFunction &MF) override {
+    return SIPreEmitPeephole().run(MF);
+  }
 };
 
 } // End anonymous namespace.
 
-INITIALIZE_PASS(SIPreEmitPeephole, DEBUG_TYPE,
+INITIALIZE_PASS(SIPreEmitPeepholeLegacy, DEBUG_TYPE,
                 "SI peephole optimizations", false, false)
 
-char SIPreEmitPeephole::ID = 0;
+char SIPreEmitPeepholeLegacy::ID = 0;
 
-char &llvm::SIPreEmitPeepholeID = SIPreEmitPeephole::ID;
+char &llvm::SIPreEmitPeepholeID = SIPreEmitPeepholeLegacy::ID;
 
 bool SIPreEmitPeephole::optimizeVccBranch(MachineInstr &MI) const {
   // Match:
@@ -410,7 +417,15 @@ bool SIPreEmitPeephole::removeExeczBranch(MachineInstr &MI,
   return true;
 }
 
-bool SIPreEmitPeephole::runOnMachineFunction(MachineFunction &MF) {
+PreservedAnalyses
+llvm::SIPreEmitPeepholePass::run(MachineFunction &MF,
+                                 MachineFunctionAnalysisManager &MFAM) {
+  if (!SIPreEmitPeephole().run(MF))
+    return PreservedAnalyses::all();
+  return getMachineFunctionPassPreservedAnalyses();
+}
+
+bool SIPreEmitPeephole::run(MachineFunction &MF) {
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
   TII = ST.getInstrInfo();
   TRI = &TII->getRegisterInfo();
diff --git a/llvm/test/CodeGen/AMDGPU/insert-handle-flat-vmem-ds.mir b/llvm/test/CodeGen/AMDGPU/insert-handle-flat-vmem-ds.mir
index d89f306c96a36..2e8c8ca9c7a6c 100644
--- a/llvm/test/CodeGen/AMDGPU/insert-handle-flat-vmem-ds.mir
+++ b/llvm/test/CodeGen/AMDGPU/insert-handle-flat-vmem-ds.mir
@@ -1,5 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -mtriple=amdgcn -mcpu=polaris10 -run-pass si-pre-emit-peephole -verify-machineinstrs %s -o - | FileCheck %s
+# RUN: llc -mtriple=amdgcn -mcpu=polaris10 -passes si-pre-emit-peephole -verify-machineinstrs %s -o - | FileCheck %s
 
 ---
 
diff --git a/llvm/test/CodeGen/AMDGPU/remove-short-exec-branches-special-instructions.mir b/llvm/test/CodeGen/AMDGPU/remove-short-exec-branches-special-instructions.mir
index 20de119471ba3..92a9a195fc4c7 100644
--- a/llvm/test/CodeGen/AMDGPU/remove-short-exec-branches-special-instructions.mir
+++ b/llvm/test/CodeGen/AMDGPU/remove-short-exec-branches-special-instructions.mir
@@ -1,5 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=si-pre-emit-peephole -verify-machineinstrs  %s -o - | FileCheck %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -passes=si-pre-emit-peephole -verify-machineinstrs  %s -o - | FileCheck %s
 # Make sure mandatory skips are not removed around mode defs.
 
 ---
diff --git a/llvm/test/CodeGen/AMDGPU/set-gpr-idx-peephole.mir b/llvm/test/CodeGen/AMDGPU/set-gpr-idx-peephole.mir
index 796a70cfe8a39..1d0a6db36ea3b 100644
--- a/llvm/test/CodeGen/AMDGPU/set-gpr-idx-peephole.mir
+++ b/llvm/test/CodeGen/AMDGPU/set-gpr-idx-peephole.mir
@@ -1,5 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -mtriple=amdgcn -mcpu=gfx900 -run-pass si-pre-emit-peephole -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s -implicit-check-not=S_SET_GPR_IDX
+# RUN: llc -mtriple=amdgcn -mcpu=gfx900 -passes si-pre-emit-peephole -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s -implicit-check-not=S_SET_GPR_IDX
 
 ---
 name:            simple
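
Note on the structure of this change: the diff above moves the peephole logic into a plain `SIPreEmitPeephole` class with a `run(MachineFunction &)` method, and both the legacy pass and the new pass manager (NPM) pass delegate to it. The NPM wrapper reports `PreservedAnalyses::all()` when nothing changed. The sketch below illustrates only that delegation shape, without depending on LLVM. Every type in it (`ToyFunction`, `ToyPreserved`, `ToyPeephole`, `ToyPeepholeLegacy`, `ToyPeepholePass`) is a hypothetical stand-in invented for illustration, and the "remove `nop` entries" transform is an assumption that has nothing to do with what the real pass does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine function: just a list of opcode names.
struct ToyFunction {
  std::vector<std::string> Instrs;
};

// Hypothetical stand-in for the two results the NPM wrapper in the diff
// returns: "everything preserved" versus "the default set for a machine
// function pass that changed something".
enum class ToyPreserved { All, MachinePassDefault };

// Shared implementation, independent of either pass manager.
class ToyPeephole {
public:
  // Removes every "nop" entry. Returns true if the function was modified.
  bool run(ToyFunction &F) {
    const auto OldSize = F.Instrs.size();
    F.Instrs.erase(
        std::remove(F.Instrs.begin(), F.Instrs.end(), std::string("nop")),
        F.Instrs.end());
    return F.Instrs.size() != OldSize;
  }
};

// Legacy-style wrapper: forwards the "changed" flag unchanged.
class ToyPeepholeLegacy {
public:
  bool runOnFunction(ToyFunction &F) { return ToyPeephole().run(F); }
};

// NPM-style wrapper: maps the "changed" flag to a preserved-analyses result.
class ToyPeepholePass {
public:
  ToyPreserved run(ToyFunction &F) {
    if (!ToyPeephole().run(F))
      return ToyPreserved::All;
    return ToyPreserved::MachinePassDefault;
  }
};
```

Keeping the transform in one class means the legacy and NPM entry points cannot drift apart, which is presumably why each test in this PR gains a `-passes` RUN line that checks against the same expectations as the existing `-run-pass` line.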

@optimisan optimisan requested a review from arsenm March 10, 2025 05:23
@optimisan optimisan requested review from cdevadas and vikramRH March 10, 2025 05:23
Comment on lines 424 to 426
return PreservedAnalyses::all();
return getMachineFunctionPassPreservedAnalyses();
Collaborator

Suggested change
return PreservedAnalyses::all();
return getMachineFunctionPassPreservedAnalyses();
return PreservedAnalyses::all();
return getMachineFunctionPassPreservedAnalyses();

@@ -1,5 +1,6 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=si-pre-emit-peephole -verify-machineinstrs %s -o - | FileCheck %s
# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -passes=si-pre-emit-peephole -verify-machineinstrs %s -o - | FileCheck %s
Collaborator

Suggested change
- # RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -passes=si-pre-emit-peephole -verify-machineinstrs %s -o - | FileCheck %s
+ # RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -passes=si-pre-emit-peephole %s -o - | FileCheck %s

@@ -1,5 +1,6 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn -mcpu=gfx900 -run-pass si-pre-emit-peephole -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s -implicit-check-not=S_SET_GPR_IDX
# RUN: llc -mtriple=amdgcn -mcpu=gfx900 -passes si-pre-emit-peephole -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s -implicit-check-not=S_SET_GPR_IDX
Collaborator

Suggested change
- # RUN: llc -mtriple=amdgcn -mcpu=gfx900 -passes si-pre-emit-peephole -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s -implicit-check-not=S_SET_GPR_IDX
+ # RUN: llc -mtriple=amdgcn -mcpu=gfx900 -passes si-pre-emit-peephole -o - %s | FileCheck -check-prefix=GCN %s -implicit-check-not=S_SET_GPR_IDX

@optimisan optimisan force-pushed the users/optimisan/preemit/port-amdgpu-set-wave-priority branch from 031fd29 to 46ced15 on March 11, 2025 09:42
@optimisan optimisan force-pushed the users/optimisan/preemit/port-si-pre-emit-peephole branch 2 times, most recently from 5334985 to 4f0da14 on March 11, 2025 09:44
@optimisan optimisan force-pushed the users/optimisan/preemit/port-amdgpu-set-wave-priority branch from 46ced15 to 75e1e52 on March 26, 2025 08:14
@optimisan optimisan force-pushed the users/optimisan/preemit/port-amdgpu-set-wave-priority branch from 75e1e52 to 15a6c1f on April 2, 2025 09:55
Base automatically changed from users/optimisan/preemit/port-amdgpu-set-wave-priority to main April 2, 2025 10:58
@optimisan optimisan force-pushed the users/optimisan/preemit/port-si-pre-emit-peephole branch from 4f0da14 to ce8d79b on April 2, 2025 11:03
@optimisan optimisan merged commit fcaefc2 into main Apr 8, 2025
10 of 12 checks passed
@optimisan optimisan deleted the users/optimisan/preemit/port-si-pre-emit-peephole branch April 8, 2025 12:28
qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Apr 29, 2025
Local branch origin/amd-gfx 25b4073 Merged main:ac42b083f104 into origin/amd-gfx:160270860d83
Remote branch main fcaefc2 [AMDGPU][NPM] Port SIPreEmitPeephole to NPM (llvm#130065)
4 participants