
[X86] Improve KnownBits for X86ISD::PSADBW nodes #83830


Merged: 1 commit merged into llvm:main on Mar 6, 2024

Conversation

@RKSimon (Collaborator) commented on Mar 4, 2024

Don't just return the known-zero upper bits; compute the absdiff KnownBits and perform the horizontal sum.

Add implementations that handle both the X86ISD::PSADBW nodes and the INTRINSIC_WO_CHAIN intrinsics (pre-legalization).
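For readers new to the instruction, here is a minimal scalar model of what PSADBW computes per 64-bit lane: eight byte-wise absolute differences followed by a horizontal sum, which is exactly the structure the KnownBits code below mirrors. This is an illustrative sketch, not part of the patch; the name psadbw_lane is hypothetical.

    #include <cstdint>
    #include <cstdlib>

    // Reference model of one PSADBW lane: the sum of eight byte-wise
    // absolute differences. The result is at most 8 * 255 = 2040, so only
    // the low 16 bits of each i64 lane can ever be set.
    uint64_t psadbw_lane(const uint8_t a[8], const uint8_t b[8]) {
      uint64_t sum = 0;
      for (int i = 0; i < 8; ++i)
        sum += (uint64_t)std::abs((int)a[i] - (int)b[i]);
      return sum;
    }

    int main() {
      const uint8_t a[8] = {3, 1, 4, 1, 5, 9, 2, 6};
      const uint8_t z[8] = {0};
      return psadbw_lane(a, z) == 31 ? 0 : 1; // 3+1+4+1+5+9+2+6 = 31
    }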

@llvmbot (Member) commented on Mar 4, 2024

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

Don't just return the known-zero upper bits; compute the absdiff KnownBits and perform the horizontal sum.


Full diff: https://github.com/llvm/llvm-project/pull/83830.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+21-4)
  • (modified) llvm/test/CodeGen/X86/psadbw.ll (+1-4)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index b87e3121838dcc..5076ac5e347e9f 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -36836,12 +36836,23 @@ void X86TargetLowering::computeKnownBitsForTargetNode(const SDValue Op,
     break;
   }
   case X86ISD::PSADBW: {
+    SDValue LHS = Op.getOperand(0);
+    SDValue RHS = Op.getOperand(1);
     assert(VT.getScalarType() == MVT::i64 &&
-           Op.getOperand(0).getValueType().getScalarType() == MVT::i8 &&
+           LHS.getValueType() == RHS.getValueType() &&
+           LHS.getValueType().getScalarType() == MVT::i8 &&
            "Unexpected PSADBW types");
 
-    // PSADBW - fills low 16 bits and zeros upper 48 bits of each i64 result.
-    Known.Zero.setBitsFrom(16);
+    KnownBits Known2;
+    unsigned NumSrcElts = LHS.getValueType().getVectorNumElements();
+    APInt DemandedSrcElts = APIntOps::ScaleBitMask(DemandedElts, NumSrcElts);
+    Known = DAG.computeKnownBits(RHS, DemandedSrcElts, Depth + 1);
+    Known2 = DAG.computeKnownBits(LHS, DemandedSrcElts, Depth + 1);
+    Known = KnownBits::absdiff(Known, Known2).zext(16);
+    Known = KnownBits::computeForAddSub(true, true, Known, Known);
+    Known = KnownBits::computeForAddSub(true, true, Known, Known);
+    Known = KnownBits::computeForAddSub(true, true, Known, Known);
+    Known = Known.zext(64);
     break;
   }
   case X86ISD::PCMPGT:
@@ -54853,6 +54864,7 @@ static SDValue combineSub(SDNode *N, SelectionDAG &DAG,
 }
 
 static SDValue combineVectorCompare(SDNode *N, SelectionDAG &DAG,
+                                    TargetLowering::DAGCombinerInfo &DCI,
                                     const X86Subtarget &Subtarget) {
   MVT VT = N->getSimpleValueType(0);
   SDLoc DL(N);
@@ -54864,6 +54876,11 @@ static SDValue combineVectorCompare(SDNode *N, SelectionDAG &DAG,
       return DAG.getConstant(0, DL, VT);
   }
 
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  if (TLI.SimplifyDemandedBits(
+          SDValue(N, 0), APInt::getAllOnes(VT.getScalarSizeInBits()), DCI))
+    return SDValue(N, 0);
+
   return SDValue();
 }
 
@@ -56587,7 +56604,7 @@ SDValue X86TargetLowering::PerformDAGCombine(SDNode *N,
   case ISD::MGATHER:
   case ISD::MSCATTER:       return combineGatherScatter(N, DAG, DCI);
   case X86ISD::PCMPEQ:
-  case X86ISD::PCMPGT:      return combineVectorCompare(N, DAG, Subtarget);
+  case X86ISD::PCMPGT:      return combineVectorCompare(N, DAG, DCI, Subtarget);
   case X86ISD::PMULDQ:
   case X86ISD::PMULUDQ:     return combinePMULDQ(N, DAG, DCI, Subtarget);
   case X86ISD::VPMADDUBSW:
diff --git a/llvm/test/CodeGen/X86/psadbw.ll b/llvm/test/CodeGen/X86/psadbw.ll
index 8141b22d321f4d..8044472b13e3a8 100644
--- a/llvm/test/CodeGen/X86/psadbw.ll
+++ b/llvm/test/CodeGen/X86/psadbw.ll
@@ -70,10 +70,7 @@ define <2 x i64> @combine_psadbw_cmp_knownbits(<16 x i8> %a0) nounwind {
 ;
 ; AVX2-LABEL: combine_psadbw_cmp_knownbits:
 ; AVX2:       # %bb.0:
-; AVX2-NEXT:    vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
-; AVX2-NEXT:    vpxor %xmm1, %xmm1, %xmm1
-; AVX2-NEXT:    vpsadbw %xmm1, %xmm0, %xmm0
-; AVX2-NEXT:    vpcmpgtq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; AVX2-NEXT:    vxorps %xmm0, %xmm0, %xmm0
 ; AVX2-NEXT:    retq
   %mask = and <16 x i8> %a0, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
   %sad = tail call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %mask, <16 x i8> zeroinitializer)
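
As a sanity check on why the AVX2 test above now folds to an all-zeros vector (an illustrative sketch, not from the PR; the compare constant is truncated above): with every byte masked to at most 3, each absolute difference against zero is at most 3, so the per-lane sum of eight bytes is at most 24 and bits 5..63 of each i64 lane are known zero, which lets the new SimplifyDemandedBits call in combineVectorCompare constant-fold the signed compare.

    #include <cassert>
    #include <cstdint>

    int main() {
      // Enumerate all 8 masked bytes of one lane: 2 unknown bits each.
      for (uint32_t bits = 0; bits < (1u << 16); ++bits) {
        uint64_t sum = 0;
        for (int i = 0; i < 8; ++i)
          sum += (bits >> (2 * i)) & 3; // absdiff of a masked byte vs. 0
        assert(sum <= 24 && (sum >> 5) == 0); // bits 5+ provably zero
      }
      return 0;
    }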

@RKSimon force-pushed the knownbits-psadbw branch from f5bb6c3 to cf8f19b on March 4, 2024 14:41
@RKSimon force-pushed the knownbits-psadbw branch from 2bb14ec to 31c9bde on March 6, 2024 12:24 (commit: Don't just return the known zero upperbits, compute the absdiff Knownbits and perform the horizontal sum)
@RKSimon force-pushed the knownbits-psadbw branch from 31c9bde to 8073a10 on March 6, 2024 12:42
Contributor commented on the repeated computeForAddSub chain:

    Known = KnownBits::computeForAddSub(/*Add=*/true, /*NSW=*/true, /*NUW=*/true,
                                        Known, Known);
    Known = KnownBits::computeForAddSub(/*Add=*/true, /*NSW=*/true, /*NUW=*/true,
                                        Known, Known);
Could this be

    Known = KnownBits::shl(Known, KnownBits::makeConstant(APInt(Known.getBitWidth(), 8)), /*NSW=*/true, /*NUW=*/true);

instead? Do the 3x adds do a better job or something? I think our shl and add impls are both optimal.

Contributor commented:

Other than this, it all looks good.

@RKSimon (Collaborator, Author) commented:

No, I was just trying to match the PSADBW expansion as closely as possible, but I can just replace it with a shl by 3 (not 8).

Fun fact: KnownBits::shl doesn't need the shift amount to be the same bitwidth as the shift value :)

Collaborator commented:

Would KnownBits::shl make the lower 3 bits of Known.Zero true? That would be wrong.

Contributor commented:

Ah you're right :)

@RKSimon (Collaborator, Author) commented:

Yup, I noticed that when I tried it, so I used the computeForAddSub chain instead.
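
To make the pitfall concrete (an illustrative sketch, not from the PR): x << 3 always has its low three bits zero, but a horizontal sum of eight independent byte values can set those bits freely, so modelling the sum as a shift would wrongly grow Known.Zero. The repeated computeForAddSub(Known, Known) calls stay conservative precisely because they do not assume the two addends are the same value.

    #include <cassert>
    #include <cstdint>

    int main() {
      // Seven bytes of 1 plus one byte of 0: a perfectly legal PSADBW input.
      uint64_t bytes[8] = {1, 1, 1, 1, 1, 1, 1, 0};
      uint64_t sum = 0;
      for (uint64_t b : bytes)
        sum += b;               // horizontal add of independent values
      assert(sum == 7);         // low 3 bits are 0b111, not zero,
      assert((sum & 7) != 0);   // so "shl by 3" would mis-model the sum
      return 0;
    }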

@RKSimon merged commit 0bd9255 into llvm:main on Mar 6, 2024
@RKSimon deleted the knownbits-psadbw branch on March 6, 2024 17:23