[AArch64][SVE] Improve fixed-length addressing modes. #129732

Conversation
When compiling VLS SVE, the compiler often replaces VL-based offsets with immediate-based ones. This leads to a mismatch during isel since SVE loads/stores generally expect immediate offsets relative to VL. For example, given:

```c
svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x+svcntd());
}
```

When compiled with `-msve-vector-bits=128`, we currently generate:

```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0, x8, lsl #3]
        ret
```

In practice, we could instead be generating:

```gas
foo:
        ldr     z0, [x0, #1, mul vl]
        ret
```

Likewise for other types, stores, and other VLS lengths.
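For concreteness, the arithmetic behind the desired form can be written as a small compile-time check (a sketch only; the constants below simply restate the `-msve-vector-bits=128` assumption):

```cpp
// At -msve-vector-bits=128, vscale == 1, so svcntd() == 2 doubles and the
// pointer bump "x + svcntd()" is a 16-byte offset -- exactly one vector
// length -- which is why it encodes as "[x0, #1, mul vl]".
constexpr unsigned VLBits = 128;                        // assumed VLS width
constexpr unsigned VScale = VLBits / 128;               // == 1
constexpr unsigned CntD = VScale * 2;                   // svcntd()
constexpr unsigned ByteOffset = CntD * sizeof(double);  // == 16
constexpr unsigned VLBytes = VLBits / 8;                // == 16
static_assert(ByteOffset % VLBytes == 0 && ByteOffset / VLBytes == 1,
              "the offset is exactly #1, mul vl");
```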
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-clang

Author: Ricardo Jesus (rj-jesus)

Changes

This is based on the discussion in #127500 (comment) and subsequent comments.

When compiling VLS SVE, the compiler often replaces VL-based offsets with immediate-based ones. This leads to a mismatch in the allowed addressing modes due to SVE loads/stores generally expecting immediate offsets relative to VL. For example, given:

```c
svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x+svcntd());
}
```

When compiled with `-msve-vector-bits=128`, we currently generate:

```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0, x8, lsl #3]
        ret
```

Instead, we could be generating:

```gas
foo:
        ldr     z0, [x0, #1, mul vl]
        ret
```

Likewise for other types, stores, and other VLS lengths.

This patch attempts to achieve the above by extending `SelectAddrModeIndexedSVE` to let constants through when `vscale` is known.

Full diff: https://github.com/llvm/llvm-project/pull/129732.diff

4 Files Affected:
diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
index 0ed14b4b3b793..1391a1b09fbd1 100644
--- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
+++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
@@ -13,12 +13,9 @@
void func(int *restrict a, int *restrict b) {
// CHECK-LABEL: func
-// CHECK256-COUNT-1: str
-// CHECK256-COUNT-7: st1w
-// CHECK512-COUNT-1: str
-// CHECK512-COUNT-3: st1w
-// CHECK1024-COUNT-1: str
-// CHECK1024-COUNT-1: st1w
+// CHECK256-COUNT-8: str
+// CHECK512-COUNT-4: str
+// CHECK1024-COUNT-2: str
// CHECK2048-COUNT-1: st1w
#pragma clang loop vectorize(enable)
for (int i = 0; i < 64; ++i)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 3ca9107cb2ce5..2459b17e68c36 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -7379,13 +7379,27 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
if (N.getOpcode() != ISD::ADD)
return false;
- SDValue VScale = N.getOperand(1);
- if (VScale.getOpcode() != ISD::VSCALE)
+ int64_t MulImm = std::numeric_limits<int64_t>::max();
+ if (SDValue VScale = N.getOperand(1); VScale.getOpcode() == ISD::VSCALE)
+ MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
+ else if (auto C = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
+ int64_t ByteOffset = C->getSExtValue();
+ constexpr auto SVEBitsPerBlock = AArch64::SVEBitsPerBlock;
+ auto MinVScale = Subtarget->getMinSVEVectorSizeInBits() / SVEBitsPerBlock;
+ auto MaxVScale = Subtarget->getMaxSVEVectorSizeInBits() / SVEBitsPerBlock;
+
+ if (!MaxVScale || MinVScale != MaxVScale || ByteOffset % MaxVScale != 0)
+ return false;
+
+ MulImm = ByteOffset / MaxVScale;
+ } else
return false;
+ assert(MulImm != std::numeric_limits<int64_t>::max() &&
+ "Uninitialized MulImm.");
+
TypeSize TS = MemVT.getSizeInBits();
int64_t MemWidthBytes = static_cast<int64_t>(TS.getKnownMinValue()) / 8;
- int64_t MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
if ((MulImm % MemWidthBytes) != 0)
return false;
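To spell out what the hunk above does, here is a standalone sketch of the same logic with hypothetical names; the real code lives in `SelectAddrModeIndexedSVE` and takes its immediate range from the instruction form being selected, so treat the extra parameters as illustrative assumptions rather than the actual signature:

```cpp
#include <cstdint>
#include <optional>

// Sketch: rewrite a plain constant byte offset as a VL-relative immediate,
// which is only possible when the SVE vector length is known exactly
// (MinSVEBits == MaxSVEBits, i.e. vscale is a compile-time constant).
std::optional<int64_t> byteOffsetToMulVLImm(int64_t ByteOffset,
                                            unsigned MinSVEBits,
                                            unsigned MaxSVEBits,
                                            int64_t MemWidthBytes,
                                            int64_t MinImm, int64_t MaxImm) {
  unsigned MinVScale = MinSVEBits / 128;   // 128 == SVEBitsPerBlock
  unsigned MaxVScale = MaxSVEBits / 128;
  if (!MaxVScale || MinVScale != MaxVScale || ByteOffset % MaxVScale != 0)
    return std::nullopt;

  // MulImm now plays the same role as the operand of an ISD::VSCALE node:
  // a byte offset expressed per unit of vscale.
  int64_t MulImm = ByteOffset / MaxVScale;
  if (MulImm % MemWidthBytes != 0)
    return std::nullopt;

  int64_t Offset = MulImm / MemWidthBytes;  // the "#N, mul vl" immediate
  if (Offset < MinImm || Offset > MaxImm)
    return std::nullopt;
  return Offset;
}
```

As a sanity check against the tests below: for `nxv16i8` at 256-bit vectors, a byte offset of 256 with a 16-byte known-minimum memory width gives MulImm = 128 and an immediate of 8, matching the expected `ldr z0, [x0, #8, mul vl]`.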
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
new file mode 100644
index 0000000000000..6b25ce3abdc8c
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
@@ -0,0 +1,187 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048
+
+define void @nxv16i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv16i8:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: mov w8, #256 // =0x100
+; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-NEXT: st1b { z0.b }, p0, [x1, x8]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv16i8:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT: str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv16i8:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT: str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv16i8:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT: str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv16i8:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT: str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv16i8:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT: str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256
+ %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256
+ %x = load <vscale x 16 x i8>, ptr %ldoff, align 1
+ store <vscale x 16 x i8> %x, ptr %stoff, align 1
+ ret void
+}
+
+define void @nxv8i16(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv8i16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.h
+; CHECK-NEXT: mov x8, #128 // =0x80
+; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-NEXT: st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv8i16:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT: str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv8i16:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT: str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv8i16:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT: str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv8i16:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT: str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv8i16:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT: str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i16, ptr %ldptr, i64 128
+ %stoff = getelementptr inbounds nuw i16, ptr %stptr, i64 128
+ %x = load <vscale x 8 x i16>, ptr %ldoff, align 2
+ store <vscale x 8 x i16> %x, ptr %stoff, align 2
+ ret void
+}
+
+define void @nxv4i32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: mov x8, #64 // =0x40
+; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT: st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv4i32:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT: str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv4i32:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT: str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv4i32:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT: str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv4i32:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT: str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv4i32:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT: str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i32, ptr %ldptr, i64 64
+ %stoff = getelementptr inbounds nuw i32, ptr %stptr, i64 64
+ %x = load <vscale x 4 x i32>, ptr %ldoff, align 4
+ store <vscale x 4 x i32> %x, ptr %stoff, align 4
+ ret void
+}
+
+define void @nxv2i64(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv2i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: mov x8, #32 // =0x20
+; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
+; CHECK-NEXT: st1d { z0.d }, p0, [x1, x8, lsl #3]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv2i64:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT: str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv2i64:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT: str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv2i64:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT: str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv2i64:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT: str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv2i64:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT: str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i64, ptr %ldptr, i64 32
+ %stoff = getelementptr inbounds nuw i64, ptr %stptr, i64 32
+ %x = load <vscale x 2 x i64>, ptr %ldoff, align 8
+ store <vscale x 2 x i64> %x, ptr %stoff, align 8
+ ret void
+}
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll
index e33bc8da97c05..2d4cdfa7278b9 100644
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll
@@ -30,64 +30,64 @@ define void @crash_when_lowering_extract_shuffle(ptr %dst, i1 %cond) vscale_rang
; CHECK-NEXT: // %bb.1: // %vector.body
; CHECK-NEXT: mov z0.b, #0 // =0x0
; CHECK-NEXT: ptrue p0.s
-; CHECK-NEXT: mov x9, #8 // =0x8
-; CHECK-NEXT: mov x10, #24 // =0x18
+; CHECK-NEXT: mov x9, #24 // =0x18
; CHECK-NEXT: umov w8, v0.b[8]
-; CHECK-NEXT: mov v1.16b, v0.16b
-; CHECK-NEXT: mov v1.b[1], v0.b[1]
-; CHECK-NEXT: fmov s2, w8
-; CHECK-NEXT: mov x8, #16 // =0x10
-; CHECK-NEXT: mov v2.b[1], v0.b[9]
-; CHECK-NEXT: mov v1.b[2], v0.b[2]
-; CHECK-NEXT: mov v2.b[2], v0.b[10]
-; CHECK-NEXT: mov v1.b[3], v0.b[3]
-; CHECK-NEXT: mov v2.b[3], v0.b[11]
-; CHECK-NEXT: mov v1.b[4], v0.b[4]
-; CHECK-NEXT: mov v2.b[4], v0.b[12]
-; CHECK-NEXT: mov v1.b[5], v0.b[5]
-; CHECK-NEXT: mov v2.b[5], v0.b[13]
-; CHECK-NEXT: mov v1.b[6], v0.b[6]
-; CHECK-NEXT: mov v2.b[6], v0.b[14]
-; CHECK-NEXT: mov v1.b[7], v0.b[7]
-; CHECK-NEXT: mov v2.b[7], v0.b[15]
-; CHECK-NEXT: ext z0.b, z0.b, z0.b, #16
-; CHECK-NEXT: uunpklo z1.h, z1.b
-; CHECK-NEXT: ext v3.16b, v0.16b, v0.16b, #8
-; CHECK-NEXT: uunpklo z0.h, z0.b
+; CHECK-NEXT: mov v2.16b, v0.16b
+; CHECK-NEXT: mov z3.d, z0.d
+; CHECK-NEXT: mov v2.b[1], v0.b[1]
+; CHECK-NEXT: ext z3.b, z3.b, z0.b, #16
+; CHECK-NEXT: fmov s1, w8
+; CHECK-NEXT: mov x8, #8 // =0x8
+; CHECK-NEXT: ext v4.16b, v3.16b, v3.16b, #8
+; CHECK-NEXT: mov v1.b[1], v0.b[9]
+; CHECK-NEXT: mov v2.b[2], v0.b[2]
+; CHECK-NEXT: mov v1.b[2], v0.b[10]
+; CHECK-NEXT: mov v2.b[3], v0.b[3]
+; CHECK-NEXT: mov v1.b[3], v0.b[11]
+; CHECK-NEXT: mov v2.b[4], v0.b[4]
+; CHECK-NEXT: mov v1.b[4], v0.b[12]
+; CHECK-NEXT: mov v2.b[5], v0.b[5]
+; CHECK-NEXT: mov v1.b[5], v0.b[13]
+; CHECK-NEXT: mov v2.b[6], v0.b[6]
+; CHECK-NEXT: mov v1.b[6], v0.b[14]
+; CHECK-NEXT: mov v2.b[7], v0.b[7]
+; CHECK-NEXT: mov v1.b[7], v0.b[15]
; CHECK-NEXT: uunpklo z2.h, z2.b
-; CHECK-NEXT: uunpklo z1.s, z1.h
-; CHECK-NEXT: uunpklo z3.h, z3.b
-; CHECK-NEXT: uunpklo z0.s, z0.h
+; CHECK-NEXT: uunpklo z0.h, z1.b
+; CHECK-NEXT: uunpklo z1.h, z3.b
+; CHECK-NEXT: uunpklo z3.h, z4.b
; CHECK-NEXT: uunpklo z2.s, z2.h
-; CHECK-NEXT: lsl z1.s, z1.s, #31
+; CHECK-NEXT: uunpklo z0.s, z0.h
+; CHECK-NEXT: uunpklo z1.s, z1.h
; CHECK-NEXT: uunpklo z3.s, z3.h
-; CHECK-NEXT: lsl z0.s, z0.s, #31
-; CHECK-NEXT: asr z1.s, z1.s, #31
; CHECK-NEXT: lsl z2.s, z2.s, #31
-; CHECK-NEXT: asr z0.s, z0.s, #31
-; CHECK-NEXT: and z1.s, z1.s, #0x1
+; CHECK-NEXT: lsl z0.s, z0.s, #31
+; CHECK-NEXT: lsl z1.s, z1.s, #31
; CHECK-NEXT: lsl z3.s, z3.s, #31
; CHECK-NEXT: asr z2.s, z2.s, #31
-; CHECK-NEXT: and z0.s, z0.s, #0x1
-; CHECK-NEXT: cmpne p4.s, p0/z, z1.s, #0
-; CHECK-NEXT: ld1w { z1.s }, p0/z, [x0]
+; CHECK-NEXT: asr z0.s, z0.s, #31
+; CHECK-NEXT: asr z1.s, z1.s, #31
; CHECK-NEXT: asr z3.s, z3.s, #31
; CHECK-NEXT: and z2.s, z2.s, #0x1
-; CHECK-NEXT: cmpne p1.s, p0/z, z0.s, #0
-; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT: and z0.s, z0.s, #0x1
+; CHECK-NEXT: and z1.s, z1.s, #0x1
; CHECK-NEXT: and z3.s, z3.s, #0x1
-; CHECK-NEXT: cmpne p2.s, p0/z, z2.s, #0
-; CHECK-NEXT: ld1w { z2.s }, p0/z, [x0, x9, lsl #2]
-; CHECK-NEXT: mov z1.s, p4/m, #0 // =0x0
+; CHECK-NEXT: cmpne p4.s, p0/z, z2.s, #0
+; CHECK-NEXT: ld1w { z2.s }, p0/z, [x0]
+; CHECK-NEXT: cmpne p1.s, p0/z, z0.s, #0
+; CHECK-NEXT: cmpne p2.s, p0/z, z1.s, #0
; CHECK-NEXT: cmpne p3.s, p0/z, z3.s, #0
-; CHECK-NEXT: ld1w { z3.s }, p0/z, [x0, x10, lsl #2]
+; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT: ld1w { z1.s }, p0/z, [x0, #1, mul vl]
+; CHECK-NEXT: ld1w { z3.s }, p0/z, [x0, x9, lsl #2]
+; CHECK-NEXT: mov z2.s, p4/m, #0 // =0x0
; CHECK-NEXT: mov z0.s, p1/m, #0 // =0x0
-; CHECK-NEXT: mov z2.s, p2/m, #0 // =0x0
-; CHECK-NEXT: st1w { z1.s }, p0, [x0]
-; CHECK-NEXT: st1w { z0.s }, p0, [x0, x8, lsl #2]
+; CHECK-NEXT: mov z1.s, p2/m, #0 // =0x0
; CHECK-NEXT: mov z3.s, p3/m, #0 // =0x0
-; CHECK-NEXT: st1w { z2.s }, p0, [x0, x9, lsl #2]
-; CHECK-NEXT: st1w { z3.s }, p0, [x0, x10, lsl #2]
+; CHECK-NEXT: st1w { z2.s }, p0, [x0]
+; CHECK-NEXT: st1w { z0.s }, p0, [x0, x8, lsl #2]
+; CHECK-NEXT: st1w { z1.s }, p0, [x0, #1, mul vl]
+; CHECK-NEXT: st1w { z3.s }, p0, [x0, x9, lsl #2]
; CHECK-NEXT: .LBB1_2: // %exit
; CHECK-NEXT: ret
%broadcast.splat = shufflevector <32 x i1> zeroinitializer, <32 x i1> zeroinitializer, <32 x i32> zeroinitializer
|
Hi @paulwalker-arm, in the other PR we had discussed this not being high-priority, and I agree it isn't. Having said that, I asked around and a few colleagues spotted these cases in generated assembly and would appreciate it if we could improve them. It's also something that GCC seems to get right (https://godbolt.org/z/ns3neaMs9). Do you think doing this in `SelectAddrModeIndexedSVE` is a reasonable approach?
Very nice. Does this mean there's new hope for teaching
Yep, that was also my thinking. I think it should work now.
I think making `MulImm` `std::optional` helps make the intent of the if-else chain clearer. If anyone doesn't agree, please let me know and I'll undo this.
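For reference, a minimal sketch of the `std::optional`-based shape described here, using simplified stand-ins for the DAG nodes (the names and parameters are illustrative, not the actual code):

```cpp
#include <cstdint>
#include <optional>

// An empty optional directly encodes "no usable multiplier was found",
// replacing the sentinel-plus-assert pattern from the earlier revision.
std::optional<int64_t> pickMulImm(std::optional<int64_t> VScaleMulImm,
                                  std::optional<int64_t> ConstByteOffset,
                                  int64_t KnownVScale) {
  if (VScaleMulImm)
    return *VScaleMulImm;                   // offset already scaled by vscale
  if (ConstByteOffset && KnownVScale && *ConstByteOffset % KnownVScale == 0)
    return *ConstByteOffset / KnownVScale;  // rescale a plain constant offset
  return std::nullopt;                      // caller bails out of this mode
}
```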
LGTM!
Other than a potentially erroneous assert this looks good to me.
if (MaxSVEVectorSizeInBits &&
    MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits)
This can be just `MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits`, given that returning zero when both are zero is valid?
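A minimal sketch of why the simplified condition works, assuming the helper under review returns the exact vector length in bits or 0 when it is unknown (the helper name below is hypothetical):

```cpp
// With "MinBits == MaxBits" alone, the fully-unconstrained case (0 == 0)
// still returns 0, which callers already treat as "vector length unknown",
// so the extra "MaxBits != 0" guard is redundant.
unsigned getFixedSVEVectorSizeInBits(unsigned MinBits, unsigned MaxBits) {
  return MinBits == MaxBits ? MaxBits : 0;
}
```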
Thanks very much for the suggestion; that looks much better.
Should we let through the case `!MinSVEVectorSizeInBits && MaxSVEVectorSizeInBits == 128` too?
Not sure if this can happen outside of handwritten IR but otherwise it sounds like a reasonable suggestion to me.
I've left it as is for now due to the lack of a motivating example and to keep it consistent with `getMinSVEVectorSizeInBits`/`getMaxSVEVectorSizeInBits`, which presumably could return 128/2048 as the architecture bounds and avoid this problem altogether. Please let me know if you'd rather I add it.
I'll let the tests run and, assuming they are OK, merge it.
This PR appears to have broken the clang-aarch64-sve-vls buildbot. Here is the failing build: https://lab.llvm.org/buildbot/#/builders/143/builds/5952