-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[X86] Improve KnownBits for X86ISD::PSADBW nodes #81765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-backend-x86 Author: Simon Pilgrim (RKSimon)
X86TargetLowering::computeKnownBitsForTargetNode currently just sets the upper 48-bits to be zero and makes no attempt to determine the remaining active bits.
typedef uint8_t __v2qu __attribute__((__vector_size__(2)));
auto sum_of_bits(__m128i x) {
x = _mm_and_si128(x, _mm_set1_epi8(1));
x = _mm_sad_epu8(x, _mm_setzero_si128());
return __builtin_convertvector(x, __v2qu);
} https://godbolt.org/z/74bYbTMh8 sum_of_bits(long long vector[2]): # @<!-- -->sum_of_bits(long long vector[2])
pand .LCPI0_0(%rip), %xmm0 # v16i8 values: 0-1
pxor %xmm1, %xmm1
psadbw %xmm0, %xmm1 # v2i64 values: 0-8
pand .LCPI0_1(%rip), %xmm1 # unneccessary
packuswb %xmm1, %xmm1
packuswb %xmm1, %xmm1
packuswb %xmm1, %xmm1
movd %xmm1, %eax
retq |
Test cases demonstrating poor value tracking of PSADBW results
… unsigned values Equivalent to "umax(A, B) - umin(A, B)" First step towards adding knownbits support for absdiff patterns for llvm#81765
Another example: define <2 x double> @sum_of_bits(<16 x i8> noundef %x) {
%mask = and <16 x i8> %x, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
%sad = tail call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %mask, <16 x i8> zeroinitializer)
%cvt = sitofp <2 x i64> %sad to <2 x double>
ret <2 x double> %cvt
}
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>) sum_of_bits(char vector[16]): # @sum_of_bits(char vector[16])
pand .LCPI0_0(%rip), %xmm0
pxor %xmm1, %xmm1
psadbw %xmm0, %xmm1
movd %xmm1, %eax
xorps %xmm0, %xmm0
cvtsi2sd %eax, %xmm0
pextrq $1, %xmm1, %rax
xorps %xmm1, %xmm1
cvtsi2sd %eax, %xmm1
unpcklpd %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[0]
retq Late expansion of the intrinsic into X86ISD::PSADBW means we don't know ANY knownbits when we lower sitofp before then |
@RKSimon Are you working on implementing this? If not I can create a patch. |
Slowly working through it - that's what #82354 is to help with - the late expansion of target intrinsics is annoying though, and I'm not sure how much of a rathole that might turn into... |
Where you planning on doing more than expand the:
case of |
As a first step that will become: case X86ISD::PSADBW: {
SDValue LHS = Op.getOperand(0);
SDValue RHS = Op.getOperand(1);
assert(VT.getScalarType() == MVT::i64 &&
LHS.getValueType() == RHS.getValueType() &&
LHS.getValueType().getScalarType() == MVT::i8 &&
"Unexpected PSADBW types");
KnownBits Known2;
unsigned NumSrcElts = LHS.getValueType().getVectorNumElements();
APInt DemandedSrcElts = APIntOps::ScaleBitMask(DemandedElts, NumSrcElts);
Known = DAG.computeKnownBits(RHS, DemandedSrcElts, Depth + 1);
Known2 = DAG.computeKnownBits(LHS, DemandedSrcElts, Depth + 1);
Known = KnownBits::absdiff(Known, Known2).zext(16);
Known = KnownBits::computeForAddSub(true, false, Known, Known);
Known = KnownBits::computeForAddSub(true, false, Known, Known);
Known = KnownBits::computeForAddSub(true, false, Known, Known);
Known = Known.zext(64);
break;
} |
Think you can have |
@RKSimon you planning to post a patch? |
… for PSADBW intrinsics Waiting for intrinsics to be lowered to ISD target nodes is causing some poor combine decisions - at the very least we need better value tracking handling. As an initial example I've added support for the PSADBW intrinsics (which can be expanded along with the ISD node in llvm#81765) as this is a good example of an intrinsic that we need to handle as early as possible.
… for PSADBW intrinsics Waiting for intrinsics to be lowered to ISD target nodes is causing some poor combine decisions - at the very least we need better value tracking handling. As an initial example I've added support for the PSADBW intrinsics (which can be expanded along with the ISD node in llvm#81765) as this is a good example of an intrinsic that we need to handle as early as possible.
X86TargetLowering::computeKnownBitsForTargetNode currently just sets the upper 48-bits to be zero and makes no attempt to determine the remaining active bits.
https://godbolt.org/z/74bYbTMh8
The text was updated successfully, but these errors were encountered: