[GlobalIsel][AArch64] Add ADDO combine to the postlegalizer combiner #101327

Open · wants to merge 1 commit into main from the narrow-add-to-half branch

Conversation


@KRM7 KRM7 commented Jul 31, 2024

No description provided.


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository, in which case you can instead tag reviewers by
name in a comment using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR: add a comment containing "Ping". The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Jul 31, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: None (KRM7)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/101327.diff

4 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h (+2)
  • (modified) llvm/include/llvm/Target/GlobalISel/Combine.td (+9-1)
  • (modified) llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp (+63)
  • (added) llvm/test/CodeGen/AArch64/GlobalISel/narrow-add.mir (+114)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 05d7e882f5135..f4abdc2dc22ea 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -714,6 +714,8 @@ class CombinerHelper {
   /// (G_*SUBE x, y, 0) -> (G_*SUBO x, y)
   bool matchAddEToAddO(MachineInstr &MI, BuildFnTy &MatchInfo);
 
+  bool matchAddWithKnownZeroLowerHalfBits(MachineInstr &MI, BuildFnTy &MatchInfo);
+
   /// Transform (fadd x, fneg(y)) -> (fsub x, y)
   ///           (fadd fneg(x), y) -> (fsub y, x)
   ///           (fsub x, fneg(y)) -> (fadd x, y)
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 2246e20ecc1dc..b12a36e48f94b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1142,6 +1142,13 @@ def adde_to_addo: GICombineRule<
          [{ return Helper.matchAddEToAddO(*${root}, ${matchinfo}); }]),
   (apply [{ Helper.applyBuildFnNoErase(*${root}, ${matchinfo}); }])>;
 
+def narrow_add_to_half: GICombineRule<
+  (defs root:$root, build_fn_matchinfo:$matchinfo),
+  (match (wip_match_opcode G_ADD):$root,
+    [{ return Helper.matchAddWithKnownZeroLowerHalfBits(*${root}, ${matchinfo}); }]),
+  (apply [{ Helper.applyBuildFn(*${root}, ${matchinfo}); }])
+>;
+
 def mulh_to_lshr : GICombineRule<
   (defs root:$root),
   (match (wip_match_opcode G_UMULH):$root,
@@ -1829,7 +1836,8 @@ def known_bits_simplifications : GICombineGroup<[
   sext_inreg_to_zext_inreg]>;
 
 def width_reduction_combines : GICombineGroup<[reduce_shl_of_extend,
-                                               narrow_binop_feeding_and]>;
+                                               narrow_binop_feeding_and,
+                                               narrow_add_to_half]>;
 
 def phi_combines : GICombineGroup<[extend_through_phis]>;
 
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index d930ab2984629..afa202a0a8e4c 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -5099,6 +5099,69 @@ bool CombinerHelper::matchAddEToAddO(MachineInstr &MI, BuildFnTy &MatchInfo) {
   return true;
 }
 
+bool CombinerHelper::matchAddWithKnownZeroLowerHalfBits(MachineInstr &MI,
+                                                        BuildFnTy &MatchInfo) {
+  assert(MI.getOpcode() == TargetOpcode::G_ADD);
+
+  const Register DstReg = MI.getOperand(0).getReg();
+  const LLT DstTy = MRI.getType(DstReg);
+
+  if (!DstTy.isScalar()) {
+    return false;
+  }
+
+  const std::uint64_t FullSize = DstTy.getSizeInBits();
+  const std::uint64_t HalfSize = (FullSize + 1) / 2;
+
+  MachineFunction &MF = *MI.getMF();
+  const DataLayout &DL = MF.getDataLayout();
+
+  if (DL.isLegalInteger(FullSize) || !DL.isLegalInteger(HalfSize)) {
+    return false;
+  }
+
+  const Register LhsReg = MI.getOperand(1).getReg();
+  const Register RhsReg = MI.getOperand(2).getReg();
+
+  const KnownBits LhsKnownBits = KB->getKnownBits(LhsReg);
+  const KnownBits LhsLoBits = LhsKnownBits.extractBits(HalfSize, 0);
+
+  const KnownBits RhsKnownBits = KB->getKnownBits(RhsReg);
+  const KnownBits RhsLoBits = RhsKnownBits.extractBits(HalfSize, 0);
+
+  const bool LhsHasLoZeros =
+      LhsLoBits.isConstant() && LhsLoBits.getConstant().isZero();
+  const bool RhsHasLoZeros =
+      RhsLoBits.isConstant() && RhsLoBits.getConstant().isZero();
+
+  if (!LhsHasLoZeros && !RhsHasLoZeros) {
+    return false;
+  }
+
+  const auto Flags = MI.getFlags();
+
+  MatchInfo = [=](MachineIRBuilder &MIRBuilder) {
+    const LLT HalfTy = LLT::scalar(HalfSize);
+
+    const auto LhsSubRegs = MIRBuilder.buildUnmerge(HalfTy, LhsReg);
+    const auto RhsSubRegs = MIRBuilder.buildUnmerge(HalfTy, RhsReg);
+
+    const Register ResHiReg = MRI.createGenericVirtualRegister(HalfTy);
+
+    MIRBuilder.buildAdd(ResHiReg, LhsSubRegs.getReg(1), RhsSubRegs.getReg(1),
+                        Flags);
+
+    if (LhsHasLoZeros) {
+      MIRBuilder.buildMergeLikeInstr(DstReg, {RhsSubRegs.getReg(0), ResHiReg});
+    } else {
+      assert(RhsHasLoZeros);
+      MIRBuilder.buildMergeLikeInstr(DstReg, {LhsSubRegs.getReg(0), ResHiReg});
+    }
+  };
+
+  return true;
+}
+
 bool CombinerHelper::matchSubAddSameReg(MachineInstr &MI,
                                         BuildFnTy &MatchInfo) {
   assert(MI.getOpcode() == TargetOpcode::G_SUB);
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/narrow-add.mir b/llvm/test/CodeGen/AArch64/GlobalISel/narrow-add.mir
new file mode 100644
index 0000000000000..10701da868bb3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/narrow-add.mir
@@ -0,0 +1,114 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+# RUN: llc -mtriple aarch64 -global-isel -run-pass=aarch64-prelegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name:            add_s128_unknown_bits
+body:             |
+  bb.0:
+    liveins: $q0, $q1
+    ; CHECK-LABEL: name: add_s128_unknown_bits
+    ; CHECK: liveins: $q0, $q1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %lhs:_(s128) = COPY $q0
+    ; CHECK-NEXT: %rhs:_(s128) = COPY $q1
+    ; CHECK-NEXT: %res:_(s128) = G_ADD %lhs, %rhs
+    ; CHECK-NEXT: $q0 = COPY %res(s128)
+    %lhs:_(s128) = COPY $q0
+    %rhs:_(s128) = COPY $q1
+    %res:_(s128) = G_ADD %lhs, %rhs
+    $q0 = COPY %res(s128)
+...
+
+---
+name:            add_s64_low32_known_zero_bits
+body:             |
+  bb.0:
+    liveins: $x0, $x1
+    ; CHECK-LABEL: name: add_s64_low32_known_zero_bits
+    ; CHECK: liveins: $x0, $x1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %a:_(s64) = COPY $x0
+    ; CHECK-NEXT: %rhs:_(s64) = COPY $x1
+    ; CHECK-NEXT: %mask:_(s64) = G_CONSTANT i64 -4294967296
+    ; CHECK-NEXT: %lhs:_(s64) = G_AND %a, %mask
+    ; CHECK-NEXT: %res:_(s64) = G_ADD %lhs, %rhs
+    ; CHECK-NEXT: $x0 = COPY %res(s64)
+    %a:_(s64) = COPY $x0
+    %rhs:_(s64) = COPY $x1
+    %mask:_(s64) = G_CONSTANT i64 -4294967296
+    %lhs:_(s64) = G_AND %a, %mask
+    %res:_(s64) = G_ADD %lhs, %rhs
+    $x0 = COPY %res(s64)
+...
+
+---
+name:            add_s128_low64_known_nonzero_bits
+body:             |
+  bb.0:
+    liveins: $q0, $q1
+    ; CHECK-LABEL: name: add_s128_low64_known_nonzero_bits
+    ; CHECK: liveins: $q0, $q1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %a:_(s128) = COPY $q0
+    ; CHECK-NEXT: %rhs:_(s128) = COPY $q1
+    ; CHECK-NEXT: %mask:_(s128) = G_CONSTANT i128 18446744073709551615
+    ; CHECK-NEXT: %lhs:_(s128) = G_OR %a, %mask
+    ; CHECK-NEXT: %res:_(s128) = G_ADD %lhs, %rhs
+    ; CHECK-NEXT: $q0 = COPY %res(s128)
+    %a:_(s128) = COPY $q0
+    %rhs:_(s128) = COPY $q1
+    %mask:_(s128) = G_CONSTANT i128 18446744073709551615
+    %lhs:_(s128) = G_OR %a, %mask
+    %res:_(s128) = G_ADD %lhs, %rhs
+    $q0 = COPY %res(s128)
+...
+
+---
+name:            add_s128_lhs_low64_known_zero_bits
+body:             |
+  bb.0:
+    liveins: $q0, $q1
+    ; CHECK-LABEL: name: add_s128_lhs_low64_known_zero_bits
+    ; CHECK: liveins: $q0, $q1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %a:_(s128) = COPY $q0
+    ; CHECK-NEXT: %rhs:_(s128) = COPY $q1
+    ; CHECK-NEXT: %mask:_(s128) = G_CONSTANT i128 -18446744073709551616
+    ; CHECK-NEXT: %lhs:_(s128) = G_AND %a, %mask
+    ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s64), [[UV1:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %lhs(s128)
+    ; CHECK-NEXT: [[UV2:%[0-9]+]]:_(s64), [[UV3:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %rhs(s128)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[UV1]], [[UV3]]
+    ; CHECK-NEXT: %res:_(s128) = G_MERGE_VALUES [[UV2]](s64), [[ADD]](s64)
+    ; CHECK-NEXT: $q0 = COPY %res(s128)
+    %a:_(s128) = COPY $q0
+    %rhs:_(s128) = COPY $q1
+    %mask:_(s128) = G_CONSTANT i128 -18446744073709551616
+    %lhs:_(s128) = G_AND %a, %mask
+    %res:_(s128) = G_ADD %lhs, %rhs
+    $q0 = COPY %res(s128)
+...
+
+---
+name:            add_s128_rhs_low64_known_zero_bits
+body:             |
+  bb.0:
+    liveins: $q0, $q1
+    ; CHECK-LABEL: name: add_s128_rhs_low64_known_zero_bits
+    ; CHECK: liveins: $q0, $q1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %lhs:_(s128) = COPY $q0
+    ; CHECK-NEXT: %b:_(s128) = COPY $q1
+    ; CHECK-NEXT: %mask:_(s128) = G_CONSTANT i128 -18446744073709551616
+    ; CHECK-NEXT: %rhs:_(s128) = G_AND %b, %mask
+    ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s64), [[UV1:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %lhs(s128)
+    ; CHECK-NEXT: [[UV2:%[0-9]+]]:_(s64), [[UV3:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %rhs(s128)
+    ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[UV1]], [[UV3]]
+    ; CHECK-NEXT: %res:_(s128) = G_MERGE_VALUES [[UV]](s64), [[ADD]](s64)
+    ; CHECK-NEXT: $q0 = COPY %res(s128)
+    %lhs:_(s128) = COPY $q0
+    %b:_(s128) = COPY $q1
+    %mask:_(s128) = G_CONSTANT i128 -18446744073709551616
+    %rhs:_(s128) = G_AND %b, %mask
+    %res:_(s128) = G_ADD %lhs, %rhs
+    $q0 = COPY %res(s128)
+...

@spaits spaits requested review from Pierre-vh, jayfoad and arsenm July 31, 2024 12:47
@davemgreen
Collaborator

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

@KRM7 KRM7 force-pushed the narrow-add-to-half branch from edfabea to f48b0df on August 1, 2024 11:02
@KRM7
Author

KRM7 commented Aug 1, 2024

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

@KRM7 KRM7 requested review from arsenm and tschuett August 1, 2024 12:16
@davemgreen
Collaborator

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

If an i128 add with the bottom 64 bits known to be zero is split into an i64 addo + i64 adde, then the i64 addo should be able to see that one of its inputs is zero and fold away to the other input.

Do we have some missing post-legalizer combines for G_UADDO? (And G_UADDE if the carry is known to be 0).
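
For context, a rough sketch of the MIR the legalizer is expected to produce for an s128 G_ADD on AArch64 (hand-written for illustration; the value names and the s32 carry type are assumptions, not output copied from an actual legalizer run):

    %lhs_lo:_(s64), %lhs_hi:_(s64) = G_UNMERGE_VALUES %lhs(s128)
    %rhs_lo:_(s64), %rhs_hi:_(s64) = G_UNMERGE_VALUES %rhs(s128)
    ; low half: low 64 bits of the sum plus a carry-out
    %sum_lo:_(s64), %carry:_(s32) = G_UADDO %lhs_lo, %rhs_lo
    ; high half: consumes the carry from the low half
    %sum_hi:_(s64), %c_out:_(s32) = G_UADDE %lhs_hi, %rhs_hi, %carry
    %res:_(s128) = G_MERGE_VALUES %sum_lo(s64), %sum_hi(s64)

If %lhs_lo is known to be all zeros, the G_UADDO cannot overflow: %sum_lo is just %rhs_lo and %carry is 0, which should leave a single s64 G_ADD for the high halves.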

@tschuett

tschuett commented Aug 1, 2024

We have both combines, independent of pre- or post-legalization. I outlined a harsh combine for G_UADDE in
#97194, but I haven't had the bandwidth for it yet.

@topperc
Collaborator

topperc commented Aug 1, 2024

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

The legalizer will split the ADD into G_UADDO and G_UADDE. A post-legalizer combine should then be able to see that the G_UADDO created for the lower half has an all-zero input. That's how it's handled in SelectionDAG. Maybe such a post-legalizer combine is missing for G_UADDO?

@tschuett

tschuett commented Aug 1, 2024

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

The legalizer will split the ADD into G_UADDO and G_UADDE. A post-legalizer combine should then be able to see that the G_UADDO created for the lower half has an all-zero input. That's how it's handled in SelectionDAG. Maybe such a post-legalizer combine is missing for G_UADDO?

We rely on

bool CombinerHelper::matchAddOverflow(MachineInstr &MI, BuildFnTy &MatchInfo) {
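
Concretely, the fold one would hope for here is roughly the following (a hand-written sketch of the intent, with assumed value names; not the literal output of matchAddOverflow):

    ; low half of the wide add, where the low bits of the LHS are known to be zero
    %zero:_(s64) = G_CONSTANT i64 0
    %sum_lo:_(s64), %carry:_(s32) = G_UADDO %zero, %rhs_lo
    ; the addition cannot overflow, so it should fold to
    ;   %sum_lo -> %rhs_lo, %carry -> 0
    ; letting the high half's G_UADDE collapse into a plain G_ADD:
    %sum_hi:_(s64) = G_ADD %lhs_hi, %rhs_hi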

@KRM7 KRM7 force-pushed the narrow-add-to-half branch from f48b0df to ba6a80d on August 1, 2024 15:14
@topperc
Collaborator

topperc commented Aug 1, 2024

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

The legalizer will split the ADD into G_UADDO and G_UADDE. A post-legalizer combine should then be able to see that the G_UADDO created for the lower half has an all-zero input. That's how it's handled in SelectionDAG. Maybe such a post-legalizer combine is missing for G_UADDO?

We rely on

bool CombinerHelper::matchAddOverflow(MachineInstr &MI, BuildFnTy &MatchInfo) {

So we need this change?

diff --git a/llvm/lib/Target/AArch64/AArch64Combine.td b/llvm/lib/Target/AArch64/AArch64Combine.td
index 3f717c8a6005..90cb004dcebd 100644
--- a/llvm/lib/Target/AArch64/AArch64Combine.td
+++ b/llvm/lib/Target/AArch64/AArch64Combine.td
@@ -322,5 +322,5 @@ def AArch64PostLegalizerCombiner
                         select_to_minmax, or_to_bsp, combine_concat_vector,
                         commute_constant_to_rhs,
                         push_freeze_to_prevent_poison_from_propagating,
-                        combine_mul_cmlt]> {
+                        combine_mul_cmlt, match_addos]> {
 }
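
For anyone who wants to check this locally, a post-legalizer MIR test along these lines should exercise the enabled combine. This is only a sketch: the pass name is assumed to be aarch64-postlegalizer-combiner, the flags mirror the test in this PR, the input is written directly in the legalized G_UADDO form, and the CHECK lines would be generated with update_mir_test_checks.py.

    # RUN: llc -mtriple aarch64 -global-isel -run-pass=aarch64-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
    ---
    name:            uaddo_lhs_known_zero
    body:             |
      bb.0:
        liveins: $x0
        ; one G_UADDO input is a known zero, so the add cannot overflow and
        ; should fold away to the other input with a false carry-out
        %zero:_(s64) = G_CONSTANT i64 0
        %rhs:_(s64) = COPY $x0
        %sum:_(s64), %carry:_(s32) = G_UADDO %zero, %rhs
        $x0 = COPY %sum(s64)
    ...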

@tschuett

tschuett commented Aug 1, 2024

Nice find!

@KRM7 KRM7 force-pushed the narrow-add-to-half branch from ba6a80d to 13d62d7 on August 2, 2024 14:02
@KRM7 KRM7 changed the title from "[GlobalIsel][AArch64] Replace N bit G_ADD with N/2 bit G_ADD if the lower bits are known to be zeros" to "[GlobalIsel][AArch64] Add ADDO combine to the postlegalizer combiner" on Aug 2, 2024
@KRM7
Author

KRM7 commented Aug 2, 2024

This sounds like an interesting optimization. Why doesn't it happen already after legalizing the add?

Where/why would you expect this to already happen?

The legalizer will split the ADD into G_UADDO and G_UADDE. A post-legalizer combine should then be able to see that the G_UADDO created for the lower half has an all-zero input. That's how it's handled in SelectionDAG. Maybe such a post-legalizer combine is missing for G_UADDO?

We rely on

bool CombinerHelper::matchAddOverflow(MachineInstr &MI, BuildFnTy &MatchInfo) {

So we need this change?

diff --git a/llvm/lib/Target/AArch64/AArch64Combine.td b/llvm/lib/Target/AArch64/AArch64Combine.td
index 3f717c8a6005..90cb004dcebd 100644
--- a/llvm/lib/Target/AArch64/AArch64Combine.td
+++ b/llvm/lib/Target/AArch64/AArch64Combine.td
@@ -322,5 +322,5 @@ def AArch64PostLegalizerCombiner
                         select_to_minmax, or_to_bsp, combine_concat_vector,
                         commute_constant_to_rhs,
                         push_freeze_to_prevent_poison_from_propagating,
-                        combine_mul_cmlt]> {
+                        combine_mul_cmlt, match_addos]> {
 }

Yep, adding this was enough.

@KRM7 KRM7 force-pushed the narrow-add-to-half branch from 13d62d7 to 82c4169 on August 3, 2024 10:11
@KRM7 KRM7 requested a review from arsenm August 5, 2024 07:33
%lhs:_(s128) = G_AND %a, %mask
%res:_(s128) = G_ADD %lhs, %rhs
$q0 = COPY %res(s128)
...
Contributor

Can you include an end-to-end IR test that shows an improvement?


See #85961 and related PR.
