[GlobalIsel][AArch64] Add ADDO combine to the postlegalizer combiner #101327

Open · wants to merge 1 commit into main from the narrow-add-to-half branch

Conversation


@KRM7 KRM7 commented Jul 31, 2024

No description provided.


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository, in which case you can instead tag reviewers by
name in a comment using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR: add a comment containing "Ping". The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Jul 31, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: None (KRM7)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/101327.diff

4 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h (+2)
  • (modified) llvm/include/llvm/Target/GlobalISel/Combine.td (+9-1)
  • (modified) llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp (+63)
  • (added) llvm/test/CodeGen/AArch64/GlobalISel/narrow-add.mir (+114)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 05d7e882f5135..f4abdc2dc22ea 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -714,6 +714,8 @@ class CombinerHelper {
   /// (G_*SUBE x, y, 0) -> (G_*SUBO x, y)
   bool matchAddEToAddO(MachineInstr &MI, BuildFnTy &MatchInfo);
 
+  bool matchAddWithKnownZeroLowerHalfBits(MachineInstr &MI, BuildFnTy &MatchInfo);
+
   /// Transform (fadd x, fneg(y)) -> (fsub x, y)
   ///           (fadd fneg(x), y) -> (fsub y, x)
   ///           (fsub x, fneg(y)) -> (fadd x, y)
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 2246e20ecc1dc..b12a36e48f94b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1142,6 +1142,13 @@ def adde_to_addo: GICombineRule<
          [{ return Helper.matchAddEToAddO(*${root}, ${matchinfo}); }]),
   (apply [{ Helper.applyBuildFnNoErase(*${root}, ${matchinfo}); }])>;
 
+def narrow_add_to_half: GICombineRule<
+  (defs root:$root, build_fn_matchinfo:$matchinfo),
+  (match (wip_match_opcode G_ADD):$root,
+    [{ return Helper.matchAddWithKnownZeroLowerHalfBits(*${root}, ${matchinfo}); }]),
+  (apply [{ Helper.applyBuildFn(*${root}, ${matchinfo}); }])
+>;
+
 def mulh_to_lshr : GICombineRule<
   (defs root:$root),
   (match (wip_match_opcode G_UMULH):$root,
@@ -1829,7 +1836,8 @@ def known_bits_simplifications : GICombineGroup<[
   sext_inreg_to_zext_inreg]>;
 
 def width_reduction_combines : GICombineGroup<[reduce_shl_of_extend,
-                                               narrow_binop_feeding_and]>;
+                                               narrow_binop_feeding_and,
+                                               narrow_add_to_half]>;
 
 def phi_combines : GICombineGroup<[extend_through_phis]>;
 
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index d930ab2984629..afa202a0a8e4c 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -5099,6 +5099,69 @@ bool CombinerHelper::matchAddEToAddO(MachineInstr &MI, BuildFnTy &MatchInfo) {
   return true;
 }
 
+bool CombinerHelper::matchAddWithKnownZeroLowerHalfBits(MachineInstr &MI,
+                                                        BuildFnTy &MatchInfo) {
+  assert(MI.getOpcode() == TargetOpcode::G_ADD);
+
+  const Register DstReg = MI.getOperand(0).getReg();
+  const LLT DstTy = MRI.getType(DstReg);
+
+  if (!DstTy.isScalar()) {
+    return false;
+  }
+
+  const std::uint64_t FullSize = DstTy.getSizeInBits();
+  const std::uint64_t HalfSize = (FullSize + 1) / 2;
+
+  MachineFunction &MF = *MI.getMF();
+  const DataLayout &DL = MF.getDataLayout();
+
+  if (DL.isLegalInteger(FullSize) || !DL.isLegalInteger(HalfSize)) {
+    return false;
+  }
+
+  const Register LhsReg = MI.getOperand(1).getReg();
+  const Register RhsReg = MI.getOperand(2).getReg();
+
+  const KnownBits LhsKnownBits = KB->getKnownBits(LhsReg);
+  const KnownBits LhsLoBits = LhsKnownBits.extractBits(HalfSize, 0);
+
+  const KnownBits RhsKnownBits = KB->getKnownBits(RhsReg);
+  const KnownBits RhsLoBits = RhsKnownBits.extractBits(HalfSize, 0);
+
+  const bool LhsHasLoZeros =
+      LhsLoBits.isConstant() && LhsLoBits.getConstant().isZero();
+  const bool RhsHasLoZeros =
+      RhsLoBits.isConstant() && RhsLoBits.getConstant().isZero();
+
+  if (!LhsHasLoZeros && !RhsHasLoZeros) {
+    return false;
+  }
+
+  const auto Flags = MI.getFlags();
+
+  MatchInfo = [=](MachineIRBuilder &MIRBuilder) {
+    const LLT HalfTy = LLT::scalar(HalfSize);
+
+    const auto LhsSubRegs = MIRBuilder.buildUnmerge(HalfTy, LhsReg);
+    const auto RhsSubRegs = MIRBuilder.buildUnmerge(HalfTy, RhsReg);
+
+    const Register ResHiReg = MRI.createGenericVirtualRegister(HalfTy);
+
+    MIRBuilder.buildAdd(ResHiReg, LhsSubRegs.getReg(1), RhsSubRegs.getReg(1),
+                        Flags);
+
+    if (LhsHasLoZeros) {
+      MIRBuilder.buildMergeLikeInstr(DstReg, {RhsSubRegs.getReg(0), ResHiReg});
+    } else {
+      assert(RhsHasLoZeros);
+      MIRBuilder.buildMergeLikeInstr(DstReg, {LhsSubRegs.getReg(0), ResHiReg});
+    }
+  };
+
+  return true;
+}
+
 bool CombinerHelper::matchSubAddSameReg(MachineInstr &MI,
                                         BuildFnTy &MatchInfo) {
   assert(MI.getOpcode() == TargetOpcode::G_SUB);
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/narrow-add.mir b/llvm/test/CodeGen/AArch64/GlobalISel/narrow-add.mir
new file mode 100644
index 0000000000000..10701da868bb3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/narrow-add.mir
@@ -0,0 +1,114 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+# RUN: llc -mtriple aarch64 -global-isel -run-pass=aarch64-prelegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name:            add_s128_unknown_bits
+body:             |
+  bb.0:
+    liveins: $q0, $q1
+    ; CHECK-LABEL: name: add_s128_unknown_bits
+    ; CHECK: liveins: $q0, $q1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %lhs:_(s128) = COPY $q0
+    ; CHECK-NEXT: %rhs:_(s128) = COPY $q1
+    ; CHECK-NEXT: %res:_(s128) = G_ADD %lhs, %rhs
+    ; CHECK-NEXT: $q0 = COPY %res(s128)
+    %lhs:_(s128) = COPY $q0
+    %rhs:_(s128) = COPY $q1
+    %res:_(s128) = G_ADD %lhs, %rhs
+    $q0 = COPY %res(s128)
+...
+
+---
+name:            add_s64_low32_known_zero_bits
+body:             |
+  bb.0:
+    liveins: $x0, $x1
+    ; CHECK-LABEL: name: add_s64_low32_known_zero_bits
+    ; CHECK: liveins: $x0, $x1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %a:_(s64) = COPY $x0
+    ; CHECK-NEXT: %rhs:_(s64) = COPY $x1
+    ; CHECK-NEXT: %mask:_(s64) = G_CONSTANT i64 -4294967296
+    ; CHECK-NEXT: %lhs:_(s64) = G_AND %a, %mask
+    ; CHECK-NEXT: %res:_(s64) = G_ADD %lhs, %rhs
+    ; CHECK-NEXT: $x0 = COPY %res(s64)
+    %a:_(s64) = COPY $x0
+    %rhs:_(s64) = COPY $x1
+    %mask:_(s64) = G_CONSTANT i64 -4294967296
+    %lhs:_(s64) = G_AND %a, %mask
+    %res:_(s64) = G_ADD %lhs, %rhs
+    $x0 = COPY %res(s64)
+...
+
+---
+name:            add_s128_low64_known_nonzero_bits
+body:             |
+  bb.0:
+    liveins: $q0, $q1
+    ; CHECK-LABEL: name: add_s128_low64_known_nonzero_bits
+    ; CHECK: liveins: $q0, $q1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %a:_(s128) = COPY $q0
+    ; CHECK-NEXT: %rhs:_(s128) = COPY $q1
+    ; CHECK-NEXT: %mask:_(s128) = G_CONSTANT i128 18446744073709551615
+    ; CHECK-NEXT: %lhs:_(s128) = G_OR %a, %mask
+    ; CHECK-NEXT: %res:_(s128) = G_ADD %lhs, %rhs
+    ; CHECK-NEXT: $q0 = COPY %res(s128)
+    %a:_(s128) = COPY $q0
+    %rhs:_(s128) = COPY $q1
+    %mask:_(s128) = G_CONSTANT i128 18446744073709551615
+    %lhs:_(s128) = G_OR %a, %mask
+    %res:_(s128) = G_ADD %lhs, %rhs
+    $q0 = COPY %res(s128)
+...
+
+---
+name:            add_s128_lhs_low64_known_zero_bits
+body:             |
+  bb.0:
+    liveins: $q0, $q1
+    ; CHECK-LABEL: name: add_s128_lhs_low64_known_zero_bits
+    ; CHECK: liveins: $q0, $q1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %a:_(s128) = COPY $q0
+    ; CHECK-NEXT: %rhs:_(s128) = COPY $q1
+    ; CHECK-NEXT: %mask:_(s128) = G_CONSTANT i128 -18446744073709551616
+    ; CHECK-NEXT: %lhs:_(s128) = G_AND %a, %mask
+    ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s64), [[UV1:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %lhs(s128)
+    ; CHECK-NEXT: [[UV2:%[0-9]+]]:_(s64), [[UV3:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %rhs(s128)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[UV1]], [[UV3]]
+    ; CHECK-NEXT: %res:_(s128) = G_MERGE_VALUES [[UV2]](s64), [[ADD]](s64)
+    ; CHECK-NEXT: $q0 = COPY %res(s128)
+    %a:_(s128) = COPY $q0
+    %rhs:_(s128) = COPY $q1
+    %mask:_(s128) = G_CONSTANT i128 -18446744073709551616
+    %lhs:_(s128) = G_AND %a, %mask
+    %res:_(s128) = G_ADD %lhs, %rhs
+    $q0 = COPY %res(s128)
+...
+
+---
+name:            add_s128_rhs_low64_known_zero_bits
+body:             |
+  bb.0:
+    liveins: $q0, $q1
+    ; CHECK-LABEL: name: add_s128_rhs_low64_known_zero_bits
+    ; CHECK: liveins: $q0, $q1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %lhs:_(s128) = COPY $q0
+    ; CHECK-NEXT: %b:_(s128) = COPY $q1
+    ; CHECK-NEXT: %mask:_(s128) = G_CONSTANT i128 -18446744073709551616
+    ; CHECK-NEXT: %rhs:_(s128) = G_AND %b, %mask
+    ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s64), [[UV1:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %lhs(s128)
+    ; CHECK-NEXT: [[UV2:%[0-9]+]]:_(s64), [[UV3:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %rhs(s128)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[UV1]], [[UV3]]
+    ; CHECK-NEXT: %res:_(s128) = G_MERGE_VALUES [[UV]](s64), [[ADD]](s64)
+    ; CHECK-NEXT: $q0 = COPY %res(s128)
+    %lhs:_(s128) = COPY $q0
+    %b:_(s128) = COPY $q1
+    %mask:_(s128) = G_CONSTANT i128 -18446744073709551616
+    %rhs:_(s128) = G_AND %b, %mask
+    %res:_(s128) = G_ADD %lhs, %rhs
+    $q0 = COPY %res(s128)
+...

@spaits spaits requested review from Pierre-vh, jayfoad and arsenm July 31, 2024 12:47
@davemgreen
Collaborator

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

@KRM7 KRM7 force-pushed the narrow-add-to-half branch from edfabea to f48b0df on August 1, 2024 11:02
@KRM7
Author

KRM7 commented Aug 1, 2024

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

@KRM7 KRM7 requested review from arsenm and tschuett August 1, 2024 12:16
@davemgreen
Collaborator

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

If an i128 add with the bottom 64 bits known to be zero is split into an i64 addo + i64 adde, then the i64 addo should be able to see that one of its inputs is zero and fold away to the other input.

Do we have some missing post-legalizer combines for G_UADDO? (And G_UADDE if the carry is known to be 0).
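
For context, a rough sketch of the MIR the legalizer is expected to produce for an s128 G_ADD on AArch64 (hand-written for illustration; the value names and the s32 carry type are assumptions, not output copied from an actual legalizer run):

    %lhs_lo:_(s64), %lhs_hi:_(s64) = G_UNMERGE_VALUES %lhs(s128)
    %rhs_lo:_(s64), %rhs_hi:_(s64) = G_UNMERGE_VALUES %rhs(s128)
    ; low half: low 64 bits of the sum plus a carry-out
    %sum_lo:_(s64), %carry:_(s32) = G_UADDO %lhs_lo, %rhs_lo
    ; high half: consumes the carry from the low half
    %sum_hi:_(s64), %c_out:_(s32) = G_UADDE %lhs_hi, %rhs_hi, %carry
    %res:_(s128) = G_MERGE_VALUES %sum_lo(s64), %sum_hi(s64)

If %lhs_lo is known to be all zeros, the G_UADDO cannot overflow: %sum_lo is just %rhs_lo and %carry is 0, which should leave a single s64 G_ADD for the high halves.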

@tschuett

tschuett commented Aug 1, 2024

We have both combines, independent of pre- or post-legalization. I outlined a harsh combine for G_UADDE in
#97194, but I haven't had the bandwidth for it yet.

@topperc
Collaborator

topperc commented Aug 1, 2024

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

The legalizer will split the ADD into G_UADDO and G_UADDE. A post-legalizer combine should then be able to see that the G_UADDO created for the lower half has an all-zero input. That's how it's handled in SelectionDAG. Maybe such a post-legalizer combine is missing for G_UADDO?

@tschuett

tschuett commented Aug 1, 2024

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

The legalizer will split the ADD into G_UADDO and G_UADDE. A post-legalizer combine should then be able to see that the G_UADDO created for the lower half has an all-zero input. That's how it's handled in SelectionDAG. Maybe such a post-legalizer combine is missing for G_UADDO?

We rely on

bool CombinerHelper::matchAddOverflow(MachineInstr &MI, BuildFnTy &MatchInfo) {
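
Concretely, the fold one would hope for here is roughly the following (a hand-written sketch of the intent, with assumed value names; not the literal output of matchAddOverflow):

    ; low half of the wide add, where the low bits of the LHS are known to be zero
    %zero:_(s64) = G_CONSTANT i64 0
    %sum_lo:_(s64), %carry:_(s32) = G_UADDO %zero, %rhs_lo
    ; the addition cannot overflow, so it should fold to
    ;   %sum_lo -> %rhs_lo, %carry -> 0
    ; letting the high half's G_UADDE collapse into a plain G_ADD:
    %sum_hi:_(s64) = G_ADD %lhs_hi, %rhs_hi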

@KRM7 KRM7 force-pushed the narrow-add-to-half branch from f48b0df to ba6a80d on August 1, 2024 15:14
@topperc
Collaborator

topperc commented Aug 1, 2024

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

The legalizer will split the ADD into G_UADDO and G_UADDE. A post-legalizer combine should then be able to see that the G_UADDO created for the lower half has an all-zero input. That's how it's handled in SelectionDAG. Maybe such a post-legalizer combine is missing for G_UADDO?

We rely on

bool CombinerHelper::matchAddOverflow(MachineInstr &MI, BuildFnTy &MatchInfo) {

So we need this change?

diff --git a/llvm/lib/Target/AArch64/AArch64Combine.td b/llvm/lib/Target/AArch64/AArch64Combine.td
index 3f717c8a6005..90cb004dcebd 100644
--- a/llvm/lib/Target/AArch64/AArch64Combine.td
+++ b/llvm/lib/Target/AArch64/AArch64Combine.td
@@ -322,5 +322,5 @@ def AArch64PostLegalizerCombiner
                         select_to_minmax, or_to_bsp, combine_concat_vector,
                         commute_constant_to_rhs,
                         push_freeze_to_prevent_poison_from_propagating,
-                        combine_mul_cmlt]> {
+                        combine_mul_cmlt, match_addos]> {
 }
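
For anyone who wants to check this locally, a post-legalizer MIR test along these lines should exercise the enabled combine. This is only a sketch: the pass name is assumed to be aarch64-postlegalizer-combiner, the flags mirror the test in this PR, the input is written directly in the legalized G_UADDO form, and the CHECK lines would be generated with update_mir_test_checks.py.

    # RUN: llc -mtriple aarch64 -global-isel -run-pass=aarch64-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
    ---
    name:            uaddo_lhs_known_zero
    body:             |
      bb.0:
        liveins: $x0
        ; one G_UADDO input is a known zero, so the add cannot overflow and
        ; should fold away to the other input with a false carry-out
        %zero:_(s64) = G_CONSTANT i64 0
        %rhs:_(s64) = COPY $x0
        %sum:_(s64), %carry:_(s32) = G_UADDO %zero, %rhs
        $x0 = COPY %sum(s64)
    ...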

@tschuett

tschuett commented Aug 1, 2024

Nice find!

@KRM7 KRM7 force-pushed the narrow-add-to-half branch from ba6a80d to 13d62d7 on August 2, 2024 14:02
@KRM7 KRM7 changed the title from "[GlobalIsel][AArch64] Replace N bit G_ADD with N/2 bit G_ADD if the lower bits are known to be zeros" to "[GlobalIsel][AArch64] Add ADDO combine to the postlegalizer combiner" on Aug 2, 2024
@KRM7
Author

KRM7 commented Aug 2, 2024

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

The legalizer will split the ADD into G_UADDO and G_UADDE. A post-legalizer combine should then be able to see that the G_UADDO created for the lower half has an all-zero input. That's how it's handled in SelectionDAG. Maybe such a post-legalizer combine is missing for G_UADDO?

We rely on

bool CombinerHelper::matchAddOverflow(MachineInstr &MI, BuildFnTy &MatchInfo) {

So we need this change?

diff --git a/llvm/lib/Target/AArch64/AArch64Combine.td b/llvm/lib/Target/AArch64/AArch64Combine.td
index 3f717c8a6005..90cb004dcebd 100644
--- a/llvm/lib/Target/AArch64/AArch64Combine.td
+++ b/llvm/lib/Target/AArch64/AArch64Combine.td
@@ -322,5 +322,5 @@ def AArch64PostLegalizerCombiner
                         select_to_minmax, or_to_bsp, combine_concat_vector,
                         commute_constant_to_rhs,
                         push_freeze_to_prevent_poison_from_propagating,
-                        combine_mul_cmlt]> {
+                        combine_mul_cmlt, match_addos]> {
 }

Yep, adding this was enough.

@KRM7 KRM7 force-pushed the narrow-add-to-half branch from 13d62d7 to 82c4169 on August 3, 2024 10:11
@KRM7 KRM7 requested a review from arsenm August 5, 2024 07:33
%lhs:_(s128) = G_AND %a, %mask
%res:_(s128) = G_ADD %lhs, %rhs
$q0 = COPY %res(s128)
...
Contributor

Can you include an end-to-end IR test that shows an improvement?


See #85961 and related PR.
