Skip to content

[SystemZ] Consider VST/VL as SimpleBDXStore/Load #135623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 14, 2025

Conversation

dominik-steenken
Copy link
Contributor

Previously vst and vl were not considered "simple" BDX stores and loads, leading to, among other things, some opportunities for mvc optimization to be missed.

This PR addresses this and updates some tests to account for additional mvc instructions being emitted.

This is observed to have a neutral or slightly beneficial effect performance-wise.

@llvmbot
Copy link
Member

llvmbot commented Apr 14, 2025

@llvm/pr-subscribers-backend-systemz

Author: Dominik Steenken (dominik-steenken)

Changes

Previously vst and vl were not considered "simple" BDX stores and loads, leading to, among other things, some opportunities for mvc optimization to be missed.

This PR addresses this and updates some tests to account for additional mvc instructions being emitted.

This is observed to have a neutral or slightly beneficial effect performance-wise.


Full diff: https://github.com/llvm/llvm-project/pull/135623.diff

2 Files Affected:

  • (modified) llvm/lib/Target/SystemZ/SystemZInstrVector.td (+4-2)
  • (modified) llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll (+27-41)
diff --git a/llvm/lib/Target/SystemZ/SystemZInstrVector.td b/llvm/lib/Target/SystemZ/SystemZInstrVector.td
index e1fe7edc4cb08..d8c48239ac633 100644
--- a/llvm/lib/Target/SystemZ/SystemZInstrVector.td
+++ b/llvm/lib/Target/SystemZ/SystemZInstrVector.td
@@ -105,7 +105,8 @@ let Predicates = [FeatureVector] in {
 
 let Predicates = [FeatureVector] in {
   // Load.
-  defm VL : UnaryVRXAlign<"vl", 0xE706>;
+  let SimpleBDXLoad = 1 in
+      defm VL : UnaryVRXAlign<"vl", 0xE706>;
 
   // Load to block boundary.  The number of loaded bytes is only known
   // at run time.  The instruction is really polymorphic, but v128b matches
@@ -213,7 +214,8 @@ defm : ReplicatePeephole<VLREPG, v2f64, z_load, f64>;
 
 let Predicates = [FeatureVector] in {
   // Store.
-  defm VST : StoreVRXAlign<"vst", 0xE70E>;
+  let SimpleBDXStore = 1 in
+      defm VST : StoreVRXAlign<"vst", 0xE70E>;
 
   // Store with length.  The number of stored bytes is only known at run time.
   def VSTL : StoreLengthVRSb<"vstl", 0xE73F, int_s390_vstl, 0>;
diff --git a/llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll b/llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll
index e0818ea3da294..96d91953b9f83 100644
--- a/llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll
+++ b/llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll
@@ -440,10 +440,9 @@ define void @constrained_vector_frem_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    ld %f0, 0(%r1)
 ; SZ13-NEXT:    ldr %f2, %f8
 ; SZ13-NEXT:    brasl %r14, fmod@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 304(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -1473,11 +1472,10 @@ define void @constrained_vector_pow_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    ldr %f2, %f9
 ; SZ13-NEXT:    brasl %r14, pow@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 200(%r15) # 8-byte Reload
 ; SZ13-NEXT:    ld %f9, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 312(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -1829,9 +1827,8 @@ define void @constrained_vector_powi_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    ld %f0, 0(%r1)
 ; SZ13-NEXT:    lghi %r2, 3
 ; SZ13-NEXT:    brasl %r14, __powidf2@PLT
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
 ; SZ13-NEXT:    lmg %r13, %r15, 280(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -2155,10 +2152,9 @@ define void @constrained_vector_sin_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Folded Spill
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    brasl %r14, sin@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 304(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -2472,10 +2468,9 @@ define void @constrained_vector_cos_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Folded Spill
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    brasl %r14, cos@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 304(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -2789,10 +2784,9 @@ define void @constrained_vector_exp_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Folded Spill
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    brasl %r14, exp@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 304(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -3106,10 +3100,9 @@ define void @constrained_vector_exp2_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Folded Spill
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    brasl %r14, exp2@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 304(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -3423,10 +3416,9 @@ define void @constrained_vector_log_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Folded Spill
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    brasl %r14, log@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 304(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -3740,10 +3732,9 @@ define void @constrained_vector_log10_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Folded Spill
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    brasl %r14, log10@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 304(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -4057,10 +4048,9 @@ define void @constrained_vector_log2_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Folded Spill
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    brasl %r14, log2@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 304(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -4788,10 +4778,9 @@ define void @constrained_vector_log10_maxnum_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Folded Spill
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    brasl %r14, fmax@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 304(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -5165,11 +5154,10 @@ define void @constrained_vector_minnum_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    ldr %f2, %f9
 ; SZ13-NEXT:    brasl %r14, fmin@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 200(%r15) # 8-byte Reload
 ; SZ13-NEXT:    ld %f9, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 312(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -6612,10 +6600,9 @@ define void @constrained_vector_tan_v3f64(ptr %a) #0 {
 ; SZ13-NEXT:    vst %v0, 160(%r15), 3 # 16-byte Folded Spill
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    brasl %r14, tan@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 192(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 304(%r15)
 ; SZ13-NEXT:    br %r14
 entry:
@@ -6977,11 +6964,10 @@ define void @constrained_vector_atan2_v3f64(ptr %a, ptr %b) #0 {
 ; SZ13-NEXT:    ldr %f0, %f8
 ; SZ13-NEXT:    ldr %f2, %f9
 ; SZ13-NEXT:    brasl %r14, atan2@PLT
-; SZ13-NEXT:    std %f0, 16(%r13)
-; SZ13-NEXT:    vl %v0, 160(%r15), 3 # 16-byte Folded Reload
+; SZ13-NEXT:    mvc 0(16,%r13), 160(%r15) # 16-byte Folded Reload
 ; SZ13-NEXT:    ld %f8, 216(%r15) # 8-byte Reload
 ; SZ13-NEXT:    ld %f9, 208(%r15) # 8-byte Reload
-; SZ13-NEXT:    vst %v0, 0(%r13), 4
+; SZ13-NEXT:    std %f0, 16(%r13)
 ; SZ13-NEXT:    lmg %r13, %r15, 328(%r15)
 ; SZ13-NEXT:    br %r14
 entry:

@dominik-steenken
Copy link
Contributor Author

I saw the test fail, am investigating

Previously VST and VL were not considered "simple" BDX stores and loads,
leading to, among other things, some opportunities for mvc optimization
to be missed.

This commit addresses this and updates some tests to account for additional
mvc instructions being emitted.

This is observed to have a neutral or slightly beneficial effect
performance-wise.
Copy link
Member

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@uweigand uweigand merged commit e071233 into llvm:main Apr 14, 2025
10 of 11 checks passed
bcardosolopes added a commit to bcardosolopes/llvm-project that referenced this pull request Apr 14, 2025
* origin/main: (199 commits)
  [NFC][AsmPrinter] Refactor AsmPrinter and AArch64AsmPrinter to prepare for jump table partitions on aarch64 (llvm#125993)
  [HEXAGON] Fix corner cases for hwloops pass (llvm#135439)
  [flang] Handle volatility in lowering and codegen (llvm#135311)
  [MLIR][Shape] Support >2 args in `shape.broadcast` folder (llvm#126808)
  [DirectX] Use scalar arguments for @llvm.dx.dot intrinsics (llvm#134570)
  Remove the redundant check for "WeakPtr" in isSmartPtrClass to fix the issue 135612. (llvm#135629)
  [BOLT] Support relative vtable (llvm#135449)
  [flang] Fix linking to libMLIR (llvm#135483)
  [AsmPrinter] Link .section_sizes to the correct section (llvm#135583)
  [ctxprof] Handle instrumenting functions with `musttail` calls (llvm#135121)
  [SystemZ] Consider VST/VL as SimpleBDXStore/Load (llvm#135623)
  [libc++][CI] Pin the XCode version. (llvm#135412)
  [lldb-dap] Fix win32 build. (llvm#135638)
  [Interp] Mark inline-virtual.cpp as unsupported with ASan (llvm#135402)
  [libc++] Removes the _LIBCPP_VERBOSE_ABORT_NOT_NOEXCEPT macro. (llvm#135494)
  [CVP] Add tests for ucmp/scmp with switch (NFC)
  [mlir][tosa] Align AbsOp example variable names (llvm#135268)
  [mlir][tosa] Align AddOp examples to spec (llvm#135266)
  [mlir][tosa] Align RFFT2d and FFT2d operator examples (llvm#135261)
  [flang][OpenMP][HLFIR] Support vector subscripted array sections for DEPEND (llvm#133892)
  ...
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
Previously `vst` and `vl` were not considered "simple" BDX stores and
loads, leading to, among other things, some opportunities for `mvc`
optimization to be missed.

This PR addresses this and updates some tests to account for additional
`mvc` instructions being emitted.

This is observed to have a neutral or slightly beneficial effect
performance-wise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants