Skip to content

[X86] Improve KnownBits for X86ISD::PSADBW nodes #81765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RKSimon opened this issue Feb 14, 2024 · 11 comments
Closed

[X86] Improve KnownBits for X86ISD::PSADBW nodes #81765

RKSimon opened this issue Feb 14, 2024 · 11 comments
Assignees

Comments

@RKSimon
Copy link
Collaborator

RKSimon commented Feb 14, 2024

X86TargetLowering::computeKnownBitsForTargetNode currently just sets the upper 48-bits to be zero and makes no attempt to determine the remaining active bits.

typedef uint8_t __v2qu __attribute__((__vector_size__(2)));
auto sum_of_bits(__m128i x) {
  x = _mm_and_si128(x, _mm_set1_epi8(1));
  x = _mm_sad_epu8(x, _mm_setzero_si128());
  return __builtin_convertvector(x, __v2qu);
}

https://godbolt.org/z/74bYbTMh8

sum_of_bits(long long vector[2]):                   # @sum_of_bits(long long vector[2])
        pand    .LCPI0_0(%rip), %xmm0    # v16i8 values: 0-1
        pxor    %xmm1, %xmm1
        psadbw  %xmm0, %xmm1             # v2i64 values: 0-8
        pand    .LCPI0_1(%rip), %xmm1    # unneccessary
        packuswb        %xmm1, %xmm1
        packuswb        %xmm1, %xmm1
        packuswb        %xmm1, %xmm1
        movd    %xmm1, %eax
        retq
@llvmbot
Copy link
Member

llvmbot commented Feb 14, 2024

@llvm/issue-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

X86TargetLowering::computeKnownBitsForTargetNode currently just sets the upper 48-bits to be zero and makes no attempt to determine the remaining active bits.
typedef uint8_t __v2qu __attribute__((__vector_size__(2)));
auto sum_of_bits(__m128i x) {
  x = _mm_and_si128(x, _mm_set1_epi8(1));
  x = _mm_sad_epu8(x, _mm_setzero_si128());
  return __builtin_convertvector(x, __v2qu);
}

https://godbolt.org/z/74bYbTMh8

sum_of_bits(long long vector[2]):                   # @<!-- -->sum_of_bits(long long vector[2])
        pand    .LCPI0_0(%rip), %xmm0    # v16i8 values: 0-1
        pxor    %xmm1, %xmm1
        psadbw  %xmm0, %xmm1             # v2i64 values: 0-8
        pand    .LCPI0_1(%rip), %xmm1    # unneccessary
        packuswb        %xmm1, %xmm1
        packuswb        %xmm1, %xmm1
        packuswb        %xmm1, %xmm1
        movd    %xmm1, %eax
        retq

RKSimon added a commit that referenced this issue Feb 19, 2024
Test cases demonstrating poor value tracking of PSADBW results
RKSimon added a commit to RKSimon/llvm-project that referenced this issue Feb 19, 2024
… unsigned values

Equivalent to "umax(A, B) - umin(A, B)"

First step towards adding knownbits support for absdiff patterns for llvm#81765
RKSimon added a commit that referenced this issue Feb 19, 2024
… unsigned values (#82255)

Equivalent to "umax(A, B) - umin(A, B)"

First step towards adding knownbits support for absdiff patterns for #81765
@RKSimon
Copy link
Collaborator Author

RKSimon commented Feb 20, 2024

Another example:

define <2 x double> @sum_of_bits(<16 x i8> noundef %x) {
  %mask = and <16 x i8> %x, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  %sad = tail call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %mask, <16 x i8> zeroinitializer)
  %cvt = sitofp <2 x i64> %sad to <2 x double>
  ret <2 x double> %cvt
}
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>)
sum_of_bits(char vector[16]):                  # @sum_of_bits(char vector[16])
        pand    .LCPI0_0(%rip), %xmm0
        pxor    %xmm1, %xmm1
        psadbw  %xmm0, %xmm1
        movd    %xmm1, %eax
        xorps   %xmm0, %xmm0
        cvtsi2sd        %eax, %xmm0
        pextrq  $1, %xmm1, %rax
        xorps   %xmm1, %xmm1
        cvtsi2sd        %eax, %xmm1
        unpcklpd        %xmm1, %xmm0                    # xmm0 = xmm0[0],xmm1[0]
        retq

Late expansion of the intrinsic into X86ISD::PSADBW means we don't know ANY knownbits when we lower sitofp before then

@goldsteinn
Copy link
Contributor

@RKSimon Are you working on implementing this? If not I can create a patch.

@RKSimon
Copy link
Collaborator Author

RKSimon commented Feb 20, 2024

Slowly working through it - that's what #82354 is to help with - the late expansion of target intrinsics is annoying though, and I'm not sure how much of a rathole that might turn into...

@goldsteinn
Copy link
Contributor

Slowly working through it - that's what #82354 is to help with - the late expansion of target intrinsics is annoying though, and I'm not sure how much of a rathole that might turn into...

Where you planning on doing more than expand the:

  case X86ISD::PSADBW: {
    assert(VT.getScalarType() == MVT::i64 &&
           Op.getOperand(0).getValueType().getScalarType() == MVT::i8 &&
           "Unexpected PSADBW types");

    // PSADBW - fills low 16 bits and zeros upper 48 bits of each i64 result.
    Known.Zero.setBitsFrom(16);
    break;
  }

case of computeKnownBitsForTargetNode?

@RKSimon
Copy link
Collaborator Author

RKSimon commented Feb 20, 2024

As a first step that will become:

  case X86ISD::PSADBW: {
    SDValue LHS = Op.getOperand(0);
    SDValue RHS = Op.getOperand(1);
    assert(VT.getScalarType() == MVT::i64 &&
           LHS.getValueType() == RHS.getValueType() &&
           LHS.getValueType().getScalarType() == MVT::i8 &&
           "Unexpected PSADBW types");

    KnownBits Known2;
    unsigned NumSrcElts = LHS.getValueType().getVectorNumElements();
    APInt DemandedSrcElts = APIntOps::ScaleBitMask(DemandedElts, NumSrcElts);
    Known = DAG.computeKnownBits(RHS, DemandedSrcElts, Depth + 1);
    Known2 = DAG.computeKnownBits(LHS, DemandedSrcElts, Depth + 1);
    Known = KnownBits::absdiff(Known, Known2).zext(16);
    Known = KnownBits::computeForAddSub(true, false, Known, Known);
    Known = KnownBits::computeForAddSub(true, false, Known, Known);
    Known = KnownBits::computeForAddSub(true, false, Known, Known);
    Known = Known.zext(64);
    break;
  }

@goldsteinn
Copy link
Contributor

Think you can have nsw for the firs ttwoo computeForAddSub, but yeah that was essentially what I was planning to do.

@RKSimon RKSimon self-assigned this Feb 22, 2024
@goldsteinn
Copy link
Contributor

@RKSimon you planning to post a patch?

@RKSimon
Copy link
Collaborator Author

RKSimon commented Feb 24, 2024

Yes, @nikic has asked for some changes to #82354 which I will need to deal with first.

@goldsteinn
Copy link
Contributor

Yes, @nikic has asked for some changes to #82354 which I will need to deal with first.

Ah, didnt see that. My email filter was lacking

RKSimon added a commit to RKSimon/llvm-project that referenced this issue Mar 1, 2024
… for PSADBW intrinsics

Waiting for intrinsics to be lowered to ISD target nodes is causing some poor combine decisions - at the very least we need better value tracking handling.

As an initial example I've added support for the PSADBW intrinsics (which can be expanded along with the ISD node in llvm#81765) as this is a good example of an intrinsic that we need to handle as early as possible.
RKSimon added a commit to RKSimon/llvm-project that referenced this issue Mar 4, 2024
… for PSADBW intrinsics

Waiting for intrinsics to be lowered to ISD target nodes is causing some poor combine decisions - at the very least we need better value tracking handling.

As an initial example I've added support for the PSADBW intrinsics (which can be expanded along with the ISD node in llvm#81765) as this is a good example of an intrinsic that we need to handle as early as possible.
@RKSimon
Copy link
Collaborator Author

RKSimon commented Mar 7, 2024

Fixed by #83830 / 0bd9255

@RKSimon RKSimon closed this as completed Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants