[GISel][CombinerHelper] Combine op(trunc(x), trunc(y)) -> trunc(op(x, y)) #89023

Merged 4 commits on Jun 3, 2024
16 changes: 16 additions & 0 deletions llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -3158,6 +3158,22 @@ bool CombinerHelper::matchHoistLogicOpWithSameOpcodeHands(
    // Match: logic (ext X), (ext Y) --> ext (logic X, Y)
    break;
  }
  case TargetOpcode::G_TRUNC: {
    // Match: logic (trunc X), (trunc Y) -> trunc (logic X, Y)
Contributor

The DAG combine treats the extend and truncate cases a bit differently. The truncate case checks whether the extend and the truncate are free, and skips the fold if they are. Arguably this could be handled by the target not adding the combine, but I don't think we have a way to add combines for specific types right now.

Contributor Author

I will look into this.

Contributor Author

I was able to do this using TLI. Please have a look.

    const MachineFunction *MF = MI.getMF();
    const DataLayout &DL = MF->getDataLayout();
    LLVMContext &Ctx = MF->getFunction().getContext();

    LLT DstTy = MRI.getType(Dst);
    const TargetLowering &TLI = getTargetLowering();

    // Be extra careful sinking truncate. If it's free, there's no benefit in
    // widening a binop.
    if (TLI.isZExtFree(DstTy, XTy, DL, Ctx) &&
        TLI.isTruncateFree(XTy, DstTy, DL, Ctx))
      return false;
    break;
  }
  case TargetOpcode::G_AND:
  case TargetOpcode::G_ASHR:
  case TargetOpcode::G_LSHR:
315 changes: 315 additions & 0 deletions llvm/test/CodeGen/AArch64/GlobalISel/combine-op-trunc.mir
@@ -0,0 +1,315 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -o - -mtriple=aarch64-unknown-unknown -run-pass=aarch64-prelegalizer-combiner -verify-machineinstrs %s | FileCheck %s

# Truncs with a single use get folded.

# and(trunc(x), trunc(y)) -> trunc(and(x, y))
---
name: and_trunc
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: and_trunc
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
; CHECK-NEXT: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY]], [[COPY1]]
; CHECK-NEXT: $w0 = COPY [[AND]](s32)
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w1
%2:_(s16) = G_TRUNC %0
%3:_(s16) = G_TRUNC %1
%4:_(s16) = G_AND %2, %3
%5:_(s32) = G_ANYEXT %4
$w0 = COPY %5
...
---
name: and_trunc_vector
body: |
bb.0:
liveins: $q0, $q1
; CHECK-LABEL: name: and_trunc_vector
; CHECK: liveins: $q0, $q1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<4 x s32>) = COPY $q0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<4 x s32>) = COPY $q1
; CHECK-NEXT: [[AND:%[0-9]+]]:_(<4 x s32>) = G_AND [[COPY]], [[COPY1]]
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(<4 x s16>) = G_TRUNC [[AND]](<4 x s32>)
; CHECK-NEXT: $x0 = COPY [[TRUNC]](<4 x s16>)
%0:_(<4 x s32>) = COPY $q0
%1:_(<4 x s32>) = COPY $q1
%2:_(<4 x s16>) = G_TRUNC %0
%3:_(<4 x s16>) = G_TRUNC %1
%4:_(<4 x s16>) = G_AND %2, %3
$x0 = COPY %4
...

# or(trunc(x), trunc(y)) -> trunc(or(x, y))
---
name: or_trunc
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: or_trunc
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
; CHECK-NEXT: [[OR:%[0-9]+]]:_(s32) = G_OR [[COPY]], [[COPY1]]
; CHECK-NEXT: $w0 = COPY [[OR]](s32)
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w1
%2:_(s16) = G_TRUNC %0
%3:_(s16) = G_TRUNC %1
%4:_(s16) = G_OR %2, %3
%5:_(s32) = G_ANYEXT %4
$w0 = COPY %5
...
---
name: or_trunc_vector
body: |
bb.0:
liveins: $q0, $q1
; CHECK-LABEL: name: or_trunc_vector
; CHECK: liveins: $q0, $q1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<4 x s32>) = COPY $q0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<4 x s32>) = COPY $q1
; CHECK-NEXT: [[OR:%[0-9]+]]:_(<4 x s32>) = G_OR [[COPY]], [[COPY1]]
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(<4 x s16>) = G_TRUNC [[OR]](<4 x s32>)
; CHECK-NEXT: $x0 = COPY [[TRUNC]](<4 x s16>)
%0:_(<4 x s32>) = COPY $q0
%1:_(<4 x s32>) = COPY $q1
%2:_(<4 x s16>) = G_TRUNC %0
%3:_(<4 x s16>) = G_TRUNC %1
%4:_(<4 x s16>) = G_OR %2, %3
$x0 = COPY %4
...

# xor(trunc(x), trunc(y)) -> trunc(xor(x, y))
---
name: xor_trunc
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: xor_trunc
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s32) = G_XOR [[COPY]], [[COPY1]]
; CHECK-NEXT: $w0 = COPY [[XOR]](s32)
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w1
%2:_(s16) = G_TRUNC %0
%3:_(s16) = G_TRUNC %1
%4:_(s16) = G_XOR %2, %3
%5:_(s32) = G_ANYEXT %4
$w0 = COPY %5
...
---
name: xor_trunc_vector
body: |
bb.0:
liveins: $q0, $q1
; CHECK-LABEL: name: xor_trunc_vector
; CHECK: liveins: $q0, $q1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<4 x s32>) = COPY $q0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<4 x s32>) = COPY $q1
; CHECK-NEXT: [[XOR:%[0-9]+]]:_(<4 x s32>) = G_XOR [[COPY]], [[COPY1]]
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(<4 x s16>) = G_TRUNC [[XOR]](<4 x s32>)
; CHECK-NEXT: $x0 = COPY [[TRUNC]](<4 x s16>)
%0:_(<4 x s32>) = COPY $q0
%1:_(<4 x s32>) = COPY $q1
%2:_(<4 x s16>) = G_TRUNC %0
%3:_(<4 x s16>) = G_TRUNC %1
%4:_(<4 x s16>) = G_XOR %2, %3
$x0 = COPY %4
...

# Truncs with multiple uses do not get folded.
---
name: or_trunc_multiuse_1
body: |
bb.0:
liveins: $w0, $w1, $x2
; CHECK-LABEL: name: or_trunc_multiuse_1
; CHECK: liveins: $w0, $w1, $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(p0) = COPY $x2
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
; CHECK-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32)
; CHECK-NEXT: G_STORE [[TRUNC]](s16), [[COPY2]](p0) :: (store (s16))
; CHECK-NEXT: [[OR:%[0-9]+]]:_(s16) = G_OR [[TRUNC]], [[TRUNC1]]
; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[OR]](s16)
; CHECK-NEXT: $w0 = COPY [[ANYEXT]](s32)
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w1
%5:_(p0) = COPY $x2
%2:_(s16) = G_TRUNC %0
%3:_(s16) = G_TRUNC %1
G_STORE %2, %5 :: (store (s16))
%4:_(s16) = G_OR %2, %3
%6:_(s32) = G_ANYEXT %4
$w0 = COPY %6
...
---
name: and_trunc_multiuse_2
body: |
bb.0:
liveins: $w0, $w1, $x2
; CHECK-LABEL: name: and_trunc_multiuse_2
; CHECK: liveins: $w0, $w1, $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(p0) = COPY $x2
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
; CHECK-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32)
; CHECK-NEXT: G_STORE [[TRUNC]](s16), [[COPY2]](p0) :: (store (s16))
; CHECK-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[TRUNC1]]
; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[AND]](s16)
; CHECK-NEXT: $w0 = COPY [[ANYEXT]](s32)
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w1
%5:_(p0) = COPY $x2
%2:_(s16) = G_TRUNC %0
%3:_(s16) = G_TRUNC %1
G_STORE %2, %5 :: (store (s16))
%4:_(s16) = G_AND %2, %3
%6:_(s32) = G_ANYEXT %4
$w0 = COPY %6
...
---
name: xor_trunc_vector_multiuse
body: |
bb.0:
liveins: $q0, $q1, $x2
; CHECK-LABEL: name: xor_trunc_vector_multiuse
; CHECK: liveins: $q0, $q1, $x2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<4 x s32>) = COPY $q0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<4 x s32>) = COPY $q1
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(p0) = COPY $x2
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(<4 x s16>) = G_TRUNC [[COPY]](<4 x s32>)
; CHECK-NEXT: [[TRUNC1:%[0-9]+]]:_(<4 x s16>) = G_TRUNC [[COPY1]](<4 x s32>)
; CHECK-NEXT: G_STORE [[TRUNC]](<4 x s16>), [[COPY2]](p0) :: (store (<4 x s16>))
; CHECK-NEXT: [[XOR:%[0-9]+]]:_(<4 x s16>) = G_XOR [[TRUNC]], [[TRUNC1]]
; CHECK-NEXT: $x0 = COPY [[XOR]](<4 x s16>)
%0:_(<4 x s32>) = COPY $q0
%1:_(<4 x s32>) = COPY $q1
%5:_(p0) = COPY $x2
%2:_(<4 x s16>) = G_TRUNC %0
%3:_(<4 x s16>) = G_TRUNC %1
G_STORE %2, %5 :: (store (<4 x s16>))
%4:_(<4 x s16>) = G_XOR %2, %3
$x0 = COPY %4
...

# Freezes should get pushed through truncs.
Contributor Author

These folds will trigger once #90618 lands.

Contributor Author

This is done now.


# This optimizes the pattern where `select(cond, T, 0)` gets converted to
# `and(cond, freeze(T))`.

# and(freeze(trunc(x)), trunc(y)) -> trunc(and(freeze(x), y))
---
name: and_trunc_freeze
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: and_trunc_freeze
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
; CHECK-NEXT: [[FREEZE:%[0-9]+]]:_(s32) = G_FREEZE [[COPY]]
; CHECK-NEXT: [[AND:%[0-9]+]]:_(s32) = G_AND [[FREEZE]], [[COPY1]]
; CHECK-NEXT: $w0 = COPY [[AND]](s32)
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w1
%2:_(s16) = G_TRUNC %0
%3:_(s16) = G_TRUNC %1
%6:_(s16) = G_FREEZE %2
%4:_(s16) = G_AND %6, %3
%5:_(s32) = G_ANYEXT %4
$w0 = COPY %5
...

# and(freeze(trunc(x)), freeze(trunc(y))) -> trunc(and(freeze(x), freeze(y)))
---
name: and_trunc_freeze_both
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: and_trunc_freeze_both
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
; CHECK-NEXT: [[FREEZE:%[0-9]+]]:_(s32) = G_FREEZE [[COPY]]
; CHECK-NEXT: [[FREEZE1:%[0-9]+]]:_(s32) = G_FREEZE [[COPY1]]
; CHECK-NEXT: [[AND:%[0-9]+]]:_(s32) = G_AND [[FREEZE]], [[FREEZE1]]
; CHECK-NEXT: $w0 = COPY [[AND]](s32)
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w1
%2:_(s16) = G_TRUNC %0
%3:_(s16) = G_TRUNC %1
%6:_(s16) = G_FREEZE %2
%7:_(s16) = G_FREEZE %3
%4:_(s16) = G_AND %6, %7
%5:_(s32) = G_ANYEXT %4
$w0 = COPY %5
...

# The freeze fold is less important for G_OR and G_XOR, however it can still
# trigger.
---
name: or_trunc_freeze
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: or_trunc_freeze
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
; CHECK-NEXT: [[FREEZE:%[0-9]+]]:_(s32) = G_FREEZE [[COPY]]
; CHECK-NEXT: [[OR:%[0-9]+]]:_(s32) = G_OR [[FREEZE]], [[COPY1]]
; CHECK-NEXT: $w0 = COPY [[OR]](s32)
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w1
%2:_(s16) = G_TRUNC %0
%3:_(s16) = G_TRUNC %1
%6:_(s16) = G_FREEZE %2
%4:_(s16) = G_OR %6, %3
%5:_(s32) = G_ANYEXT %4
$w0 = COPY %5
...
---
name: xor_trunc_freeze_both
body: |
bb.0:
liveins: $w0, $w1
; CHECK-LABEL: name: xor_trunc_freeze_both
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
; CHECK-NEXT: [[FREEZE:%[0-9]+]]:_(s32) = G_FREEZE [[COPY]]
; CHECK-NEXT: [[FREEZE1:%[0-9]+]]:_(s32) = G_FREEZE [[COPY1]]
; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s32) = G_XOR [[FREEZE]], [[FREEZE1]]
; CHECK-NEXT: $w0 = COPY [[XOR]](s32)
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w1
%2:_(s16) = G_TRUNC %0
%3:_(s16) = G_TRUNC %1
%6:_(s16) = G_FREEZE %2
%7:_(s16) = G_FREEZE %3
%4:_(s16) = G_XOR %6, %7
%5:_(s32) = G_ANYEXT %4
$w0 = COPY %5
...