[X86] Improve KnownBits for X86ISD::PSADBW nodes #81765

RKSimon · 2024-02-14T17:12:28Z

X86TargetLowering::computeKnownBitsForTargetNode currently just sets the upper 48-bits to be zero and makes no attempt to determine the remaining active bits.

typedef uint8_t __v2qu __attribute__((__vector_size__(2)));
auto sum_of_bits(__m128i x) {
  x = _mm_and_si128(x, _mm_set1_epi8(1));
  x = _mm_sad_epu8(x, _mm_setzero_si128());
  return __builtin_convertvector(x, __v2qu);
}

https://godbolt.org/z/74bYbTMh8

sum_of_bits(long long vector[2]):                   # @sum_of_bits(long long vector[2])
        pand    .LCPI0_0(%rip), %xmm0    # v16i8 values: 0-1
        pxor    %xmm1, %xmm1
        psadbw  %xmm0, %xmm1             # v2i64 values: 0-8
        pand    .LCPI0_1(%rip), %xmm1    # unneccessary
        packuswb        %xmm1, %xmm1
        packuswb        %xmm1, %xmm1
        packuswb        %xmm1, %xmm1
        movd    %xmm1, %eax
        retq

llvmbot · 2024-02-14T17:12:43Z

@llvm/issue-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

X86TargetLowering::computeKnownBitsForTargetNode currently just sets the upper 48-bits to be zero and makes no attempt to determine the remaining active bits.

typedef uint8_t __v2qu __attribute__((__vector_size__(2)));
auto sum_of_bits(__m128i x) {
  x = _mm_and_si128(x, _mm_set1_epi8(1));
  x = _mm_sad_epu8(x, _mm_setzero_si128());
  return __builtin_convertvector(x, __v2qu);
}

https://godbolt.org/z/74bYbTMh8

sum_of_bits(long long vector[2]):                   # @<!-- -->sum_of_bits(long long vector[2])
        pand    .LCPI0_0(%rip), %xmm0    # v16i8 values: 0-1
        pxor    %xmm1, %xmm1
        psadbw  %xmm0, %xmm1             # v2i64 values: 0-8
        pand    .LCPI0_1(%rip), %xmm1    # unneccessary
        packuswb        %xmm1, %xmm1
        packuswb        %xmm1, %xmm1
        packuswb        %xmm1, %xmm1
        movd    %xmm1, %eax
        retq

Test cases demonstrating poor value tracking of PSADBW results

… unsigned values Equivalent to "umax(A, B) - umin(A, B)" First step towards adding knownbits support for absdiff patterns for llvm#81765

… unsigned values (#82255) Equivalent to "umax(A, B) - umin(A, B)" First step towards adding knownbits support for absdiff patterns for #81765

RKSimon · 2024-02-20T17:14:34Z

Another example:

define <2 x double> @sum_of_bits(<16 x i8> noundef %x) {
  %mask = and <16 x i8> %x, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  %sad = tail call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %mask, <16 x i8> zeroinitializer)
  %cvt = sitofp <2 x i64> %sad to <2 x double>
  ret <2 x double> %cvt
}
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>)

sum_of_bits(char vector[16]):                  # @sum_of_bits(char vector[16])
        pand    .LCPI0_0(%rip), %xmm0
        pxor    %xmm1, %xmm1
        psadbw  %xmm0, %xmm1
        movd    %xmm1, %eax
        xorps   %xmm0, %xmm0
        cvtsi2sd        %eax, %xmm0
        pextrq  $1, %xmm1, %rax
        xorps   %xmm1, %xmm1
        cvtsi2sd        %eax, %xmm1
        unpcklpd        %xmm1, %xmm0                    # xmm0 = xmm0[0],xmm1[0]
        retq

Late expansion of the intrinsic into X86ISD::PSADBW means we don't know ANY knownbits when we lower sitofp before then

goldsteinn · 2024-02-20T17:19:31Z

@RKSimon Are you working on implementing this? If not I can create a patch.

RKSimon · 2024-02-20T17:30:51Z

Slowly working through it - that's what #82354 is to help with - the late expansion of target intrinsics is annoying though, and I'm not sure how much of a rathole that might turn into...

goldsteinn · 2024-02-20T17:36:47Z

Slowly working through it - that's what #82354 is to help with - the late expansion of target intrinsics is annoying though, and I'm not sure how much of a rathole that might turn into...

Where you planning on doing more than expand the:

  case X86ISD::PSADBW: {
    assert(VT.getScalarType() == MVT::i64 &&
           Op.getOperand(0).getValueType().getScalarType() == MVT::i8 &&
           "Unexpected PSADBW types");

    // PSADBW - fills low 16 bits and zeros upper 48 bits of each i64 result.
    Known.Zero.setBitsFrom(16);
    break;
  }

case of computeKnownBitsForTargetNode?

RKSimon · 2024-02-20T17:41:28Z

As a first step that will become:

  case X86ISD::PSADBW: {
    SDValue LHS = Op.getOperand(0);
    SDValue RHS = Op.getOperand(1);
    assert(VT.getScalarType() == MVT::i64 &&
           LHS.getValueType() == RHS.getValueType() &&
           LHS.getValueType().getScalarType() == MVT::i8 &&
           "Unexpected PSADBW types");

    KnownBits Known2;
    unsigned NumSrcElts = LHS.getValueType().getVectorNumElements();
    APInt DemandedSrcElts = APIntOps::ScaleBitMask(DemandedElts, NumSrcElts);
    Known = DAG.computeKnownBits(RHS, DemandedSrcElts, Depth + 1);
    Known2 = DAG.computeKnownBits(LHS, DemandedSrcElts, Depth + 1);
    Known = KnownBits::absdiff(Known, Known2).zext(16);
    Known = KnownBits::computeForAddSub(true, false, Known, Known);
    Known = KnownBits::computeForAddSub(true, false, Known, Known);
    Known = KnownBits::computeForAddSub(true, false, Known, Known);
    Known = Known.zext(64);
    break;
  }

goldsteinn · 2024-02-20T17:44:49Z

Think you can have nsw for the firs ttwoo computeForAddSub, but yeah that was essentially what I was planning to do.

goldsteinn · 2024-02-24T17:39:48Z

@RKSimon you planning to post a patch?

RKSimon · 2024-02-24T17:41:21Z

Yes, @nikic has asked for some changes to #82354 which I will need to deal with first.

goldsteinn · 2024-02-24T18:25:43Z

Yes, @nikic has asked for some changes to #82354 which I will need to deal with first.

Ah, didnt see that. My email filter was lacking

… for PSADBW intrinsics Waiting for intrinsics to be lowered to ISD target nodes is causing some poor combine decisions - at the very least we need better value tracking handling. As an initial example I've added support for the PSADBW intrinsics (which can be expanded along with the ISD node in llvm#81765) as this is a good example of an intrinsic that we need to handle as early as possible.

RKSimon · 2024-03-07T11:48:19Z

Fixed by #83830 / 0bd9255

RKSimon added the backend:X86 label Feb 14, 2024

RKSimon added a commit that referenced this issue Feb 19, 2024

[X86] Add some basic test coverage for #81765

8d8bb35

Test cases demonstrating poor value tracking of PSADBW results

RKSimon mentioned this issue Feb 19, 2024

[APInt] Add APIntOps::absdiff to compute the absolute difference of 2 unsigned values #82255

Merged

RKSimon added a commit that referenced this issue Feb 19, 2024

[APInt] Add APIntOps::absdiff to compute the absolute difference of 2…

73f76cd

… unsigned values (#82255) Equivalent to "umax(A, B) - umin(A, B)" First step towards adding knownbits support for absdiff patterns for #81765

RKSimon self-assigned this Feb 22, 2024

RKSimon mentioned this issue Mar 1, 2024

[X86] computeKnownBitsForTargetNode - add INTRINSIC_WO_CHAIN handling for PSADBW intrinsics #83580

Closed

RKSimon closed this as completed Mar 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[X86] Improve KnownBits for X86ISD::PSADBW nodes #81765

[X86] Improve KnownBits for X86ISD::PSADBW nodes #81765

RKSimon commented Feb 14, 2024

llvmbot commented Feb 14, 2024

Uh oh!

RKSimon commented Feb 20, 2024

Uh oh!

goldsteinn commented Feb 20, 2024

Uh oh!

RKSimon commented Feb 20, 2024

Uh oh!

goldsteinn commented Feb 20, 2024

Uh oh!

RKSimon commented Feb 20, 2024

Uh oh!

goldsteinn commented Feb 20, 2024

Uh oh!

goldsteinn commented Feb 24, 2024

Uh oh!

RKSimon commented Feb 24, 2024

Uh oh!

goldsteinn commented Feb 24, 2024

Uh oh!

RKSimon commented Mar 7, 2024

Uh oh!

[X86] Improve KnownBits for X86ISD::PSADBW nodes #81765

[X86] Improve KnownBits for X86ISD::PSADBW nodes #81765

Comments

RKSimon commented Feb 14, 2024

llvmbot commented Feb 14, 2024

Uh oh!

RKSimon commented Feb 20, 2024

Uh oh!

goldsteinn commented Feb 20, 2024

Uh oh!

RKSimon commented Feb 20, 2024

Uh oh!

goldsteinn commented Feb 20, 2024

Uh oh!

RKSimon commented Feb 20, 2024

Uh oh!

goldsteinn commented Feb 20, 2024

Uh oh!

goldsteinn commented Feb 24, 2024

Uh oh!

RKSimon commented Feb 24, 2024

Uh oh!

goldsteinn commented Feb 24, 2024

Uh oh!

RKSimon commented Mar 7, 2024

Uh oh!