
[AArch64][SVE] Improve fixed-length addressing modes. #129732


Merged

Conversation

Contributor

@rj-jesus rj-jesus commented Mar 4, 2025

This is based on the discussion in #127500 (comment) and subsequent comments.

When compiling VLS SVE code, the compiler often replaces VL-based offsets with immediate-based ones. This leads to a mismatch in the allowed addressing modes, because SVE loads/stores generally expect immediate offsets that are multiples of VL. For example, given:

```c
svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x+svcntd());
}
```

When compiled with -msve-vector-bits=128, we currently generate:

```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0, x8, lsl #3]
        ret
```

Instead, we could be generating:

```gas
foo:
        ldr     z0, [x0, #1, mul vl]
        ret
```

Likewise for other types, stores, and other VLS lengths.

This patch attempts to achieve the above by extending `SelectAddrModeIndexedSVE` to let constants through when `vscale` is known.
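To make the mapping concrete, here is a minimal sketch (not the patch itself; `byteOffsetToMulVL` is a hypothetical helper) of how a plain byte offset can be re-expressed as a `mul vl` immediate once the vector length is a known constant:

```c++
#include <cstdint>
#include <iostream>
#include <optional>

// Illustrative only: with -msve-vector-bits=N the vector length is fixed, so a
// byte offset that is an exact multiple of the vector length in bytes can use
// the reg+imm "mul vl" addressing mode.
std::optional<int64_t> byteOffsetToMulVL(int64_t ByteOffset,
                                         unsigned FixedVectorBits) {
  const int64_t VLBytes = FixedVectorBits / 8; // e.g. 16 bytes at VL=128
  if (VLBytes == 0 || ByteOffset % VLBytes != 0)
    return std::nullopt; // not representable as "#imm, mul vl"
  return ByteOffset / VLBytes;
}

int main() {
  // x + svcntd() at VL=128 is a 16-byte offset, i.e. "#1, mul vl".
  if (auto Imm = byteOffsetToMulVL(16, 128))
    std::cout << "ldr z0, [x0, #" << *Imm << ", mul vl]\n";
}
```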

rj-jesus added 2 commits March 4, 2025 03:22
When compiling VLS SVE, the compiler often replaces VL-based offsets
with immediate-based ones. This leads to a mismatch during isel since
SVE loads/stores generally expect immediate offsets relative to VL.

For example, given:
```c

svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x+svcntd());
}
```

When compiled with `-msve-vector-bits=128`, we currently generate:
```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0, x8, lsl #3]
        ret
```

In practice, we could instead be generating:
```gas
foo:
        ldr     z0, [x0, #1, mul vl]
        ret
```

Likewise for other types, stores, and other VLS lengths.
@llvmbot llvmbot added the clang and backend:AArch64 labels Mar 4, 2025
Member

llvmbot commented Mar 4, 2025

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-clang

Author: Ricardo Jesus (rj-jesus)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/129732.diff

4 Files Affected:

  • (modified) clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c (+3-6)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+17-3)
  • (added) llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll (+187)
  • (modified) llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll (+45-45)
diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
index 0ed14b4b3b793..1391a1b09fbd1 100644
--- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
+++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
@@ -13,12 +13,9 @@
 
 void func(int *restrict a, int *restrict b) {
 // CHECK-LABEL: func
-// CHECK256-COUNT-1: str
-// CHECK256-COUNT-7: st1w
-// CHECK512-COUNT-1: str
-// CHECK512-COUNT-3: st1w
-// CHECK1024-COUNT-1: str
-// CHECK1024-COUNT-1: st1w
+// CHECK256-COUNT-8: str
+// CHECK512-COUNT-4: str
+// CHECK1024-COUNT-2: str
 // CHECK2048-COUNT-1: st1w
 #pragma clang loop vectorize(enable)
   for (int i = 0; i < 64; ++i)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 3ca9107cb2ce5..2459b17e68c36 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -7379,13 +7379,27 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
   if (N.getOpcode() != ISD::ADD)
     return false;
 
-  SDValue VScale = N.getOperand(1);
-  if (VScale.getOpcode() != ISD::VSCALE)
+  int64_t MulImm = std::numeric_limits<int64_t>::max();
+  if (SDValue VScale = N.getOperand(1); VScale.getOpcode() == ISD::VSCALE)
+    MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
+  else if (auto C = dyn_cast<ConstantSDNode>(N.getOperand(1))) {
+    int64_t ByteOffset = C->getSExtValue();
+    constexpr auto SVEBitsPerBlock = AArch64::SVEBitsPerBlock;
+    auto MinVScale = Subtarget->getMinSVEVectorSizeInBits() / SVEBitsPerBlock;
+    auto MaxVScale = Subtarget->getMaxSVEVectorSizeInBits() / SVEBitsPerBlock;
+
+    if (!MaxVScale || MinVScale != MaxVScale || ByteOffset % MaxVScale != 0)
+      return false;
+
+    MulImm = ByteOffset / MaxVScale;
+  } else
     return false;
 
+  assert(MulImm != std::numeric_limits<int64_t>::max() &&
+         "Uninitialized MulImm.");
+
   TypeSize TS = MemVT.getSizeInBits();
   int64_t MemWidthBytes = static_cast<int64_t>(TS.getKnownMinValue()) / 8;
-  int64_t MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
 
   if ((MulImm % MemWidthBytes) != 0)
     return false;
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
new file mode 100644
index 0000000000000..6b25ce3abdc8c
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
@@ -0,0 +1,187 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048
+
+define void @nxv16i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv16i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.b
+; CHECK-NEXT:    mov w8, #256 // =0x100
+; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-NEXT:    st1b { z0.b }, p0, [x1, x8]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv16i8:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:    str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv16i8:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:    str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv16i8:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:    str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv16i8:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:    str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv16i8:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:    str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256
+  %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256
+  %x = load <vscale x 16 x i8>, ptr %ldoff, align 1
+  store <vscale x 16 x i8> %x, ptr %stoff, align 1
+  ret void
+}
+
+define void @nxv8i16(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv8i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.h
+; CHECK-NEXT:    mov x8, #128 // =0x80
+; CHECK-NEXT:    ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-NEXT:    st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv8i16:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:    str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv8i16:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:    str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv8i16:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:    str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv8i16:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:    str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv8i16:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:    str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i16, ptr %ldptr, i64 128
+  %stoff = getelementptr inbounds nuw i16, ptr %stptr, i64 128
+  %x = load <vscale x 8 x i16>, ptr %ldoff, align 2
+  store <vscale x 8 x i16> %x, ptr %stoff, align 2
+  ret void
+}
+
+define void @nxv4i32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    mov x8, #64 // =0x40
+; CHECK-NEXT:    ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT:    st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv4i32:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:    str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv4i32:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:    str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv4i32:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:    str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv4i32:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:    str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv4i32:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:    str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i32, ptr %ldptr, i64 64
+  %stoff = getelementptr inbounds nuw i32, ptr %stptr, i64 64
+  %x = load <vscale x 4 x i32>, ptr %ldoff, align 4
+  store <vscale x 4 x i32> %x, ptr %stoff, align 4
+  ret void
+}
+
+define void @nxv2i64(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv2i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    mov x8, #32 // =0x20
+; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
+; CHECK-NEXT:    st1d { z0.d }, p0, [x1, x8, lsl #3]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv2i64:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:    str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv2i64:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:    str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv2i64:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:    str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv2i64:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:    str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv2i64:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:    str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i64, ptr %ldptr, i64 32
+  %stoff = getelementptr inbounds nuw i64, ptr %stptr, i64 32
+  %x = load <vscale x 2 x i64>, ptr %ldoff, align 8
+  store <vscale x 2 x i64> %x, ptr %stoff, align 8
+  ret void
+}
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll
index e33bc8da97c05..2d4cdfa7278b9 100644
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-shuffles.ll
@@ -30,64 +30,64 @@ define void @crash_when_lowering_extract_shuffle(ptr %dst, i1 %cond) vscale_rang
 ; CHECK-NEXT:  // %bb.1: // %vector.body
 ; CHECK-NEXT:    mov z0.b, #0 // =0x0
 ; CHECK-NEXT:    ptrue p0.s
-; CHECK-NEXT:    mov x9, #8 // =0x8
-; CHECK-NEXT:    mov x10, #24 // =0x18
+; CHECK-NEXT:    mov x9, #24 // =0x18
 ; CHECK-NEXT:    umov w8, v0.b[8]
-; CHECK-NEXT:    mov v1.16b, v0.16b
-; CHECK-NEXT:    mov v1.b[1], v0.b[1]
-; CHECK-NEXT:    fmov s2, w8
-; CHECK-NEXT:    mov x8, #16 // =0x10
-; CHECK-NEXT:    mov v2.b[1], v0.b[9]
-; CHECK-NEXT:    mov v1.b[2], v0.b[2]
-; CHECK-NEXT:    mov v2.b[2], v0.b[10]
-; CHECK-NEXT:    mov v1.b[3], v0.b[3]
-; CHECK-NEXT:    mov v2.b[3], v0.b[11]
-; CHECK-NEXT:    mov v1.b[4], v0.b[4]
-; CHECK-NEXT:    mov v2.b[4], v0.b[12]
-; CHECK-NEXT:    mov v1.b[5], v0.b[5]
-; CHECK-NEXT:    mov v2.b[5], v0.b[13]
-; CHECK-NEXT:    mov v1.b[6], v0.b[6]
-; CHECK-NEXT:    mov v2.b[6], v0.b[14]
-; CHECK-NEXT:    mov v1.b[7], v0.b[7]
-; CHECK-NEXT:    mov v2.b[7], v0.b[15]
-; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #16
-; CHECK-NEXT:    uunpklo z1.h, z1.b
-; CHECK-NEXT:    ext v3.16b, v0.16b, v0.16b, #8
-; CHECK-NEXT:    uunpklo z0.h, z0.b
+; CHECK-NEXT:    mov v2.16b, v0.16b
+; CHECK-NEXT:    mov z3.d, z0.d
+; CHECK-NEXT:    mov v2.b[1], v0.b[1]
+; CHECK-NEXT:    ext z3.b, z3.b, z0.b, #16
+; CHECK-NEXT:    fmov s1, w8
+; CHECK-NEXT:    mov x8, #8 // =0x8
+; CHECK-NEXT:    ext v4.16b, v3.16b, v3.16b, #8
+; CHECK-NEXT:    mov v1.b[1], v0.b[9]
+; CHECK-NEXT:    mov v2.b[2], v0.b[2]
+; CHECK-NEXT:    mov v1.b[2], v0.b[10]
+; CHECK-NEXT:    mov v2.b[3], v0.b[3]
+; CHECK-NEXT:    mov v1.b[3], v0.b[11]
+; CHECK-NEXT:    mov v2.b[4], v0.b[4]
+; CHECK-NEXT:    mov v1.b[4], v0.b[12]
+; CHECK-NEXT:    mov v2.b[5], v0.b[5]
+; CHECK-NEXT:    mov v1.b[5], v0.b[13]
+; CHECK-NEXT:    mov v2.b[6], v0.b[6]
+; CHECK-NEXT:    mov v1.b[6], v0.b[14]
+; CHECK-NEXT:    mov v2.b[7], v0.b[7]
+; CHECK-NEXT:    mov v1.b[7], v0.b[15]
 ; CHECK-NEXT:    uunpklo z2.h, z2.b
-; CHECK-NEXT:    uunpklo z1.s, z1.h
-; CHECK-NEXT:    uunpklo z3.h, z3.b
-; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    uunpklo z0.h, z1.b
+; CHECK-NEXT:    uunpklo z1.h, z3.b
+; CHECK-NEXT:    uunpklo z3.h, z4.b
 ; CHECK-NEXT:    uunpklo z2.s, z2.h
-; CHECK-NEXT:    lsl z1.s, z1.s, #31
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
 ; CHECK-NEXT:    uunpklo z3.s, z3.h
-; CHECK-NEXT:    lsl z0.s, z0.s, #31
-; CHECK-NEXT:    asr z1.s, z1.s, #31
 ; CHECK-NEXT:    lsl z2.s, z2.s, #31
-; CHECK-NEXT:    asr z0.s, z0.s, #31
-; CHECK-NEXT:    and z1.s, z1.s, #0x1
+; CHECK-NEXT:    lsl z0.s, z0.s, #31
+; CHECK-NEXT:    lsl z1.s, z1.s, #31
 ; CHECK-NEXT:    lsl z3.s, z3.s, #31
 ; CHECK-NEXT:    asr z2.s, z2.s, #31
-; CHECK-NEXT:    and z0.s, z0.s, #0x1
-; CHECK-NEXT:    cmpne p4.s, p0/z, z1.s, #0
-; CHECK-NEXT:    ld1w { z1.s }, p0/z, [x0]
+; CHECK-NEXT:    asr z0.s, z0.s, #31
+; CHECK-NEXT:    asr z1.s, z1.s, #31
 ; CHECK-NEXT:    asr z3.s, z3.s, #31
 ; CHECK-NEXT:    and z2.s, z2.s, #0x1
-; CHECK-NEXT:    cmpne p1.s, p0/z, z0.s, #0
-; CHECK-NEXT:    ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT:    and z0.s, z0.s, #0x1
+; CHECK-NEXT:    and z1.s, z1.s, #0x1
 ; CHECK-NEXT:    and z3.s, z3.s, #0x1
-; CHECK-NEXT:    cmpne p2.s, p0/z, z2.s, #0
-; CHECK-NEXT:    ld1w { z2.s }, p0/z, [x0, x9, lsl #2]
-; CHECK-NEXT:    mov z1.s, p4/m, #0 // =0x0
+; CHECK-NEXT:    cmpne p4.s, p0/z, z2.s, #0
+; CHECK-NEXT:    ld1w { z2.s }, p0/z, [x0]
+; CHECK-NEXT:    cmpne p1.s, p0/z, z0.s, #0
+; CHECK-NEXT:    cmpne p2.s, p0/z, z1.s, #0
 ; CHECK-NEXT:    cmpne p3.s, p0/z, z3.s, #0
-; CHECK-NEXT:    ld1w { z3.s }, p0/z, [x0, x10, lsl #2]
+; CHECK-NEXT:    ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT:    ld1w { z1.s }, p0/z, [x0, #1, mul vl]
+; CHECK-NEXT:    ld1w { z3.s }, p0/z, [x0, x9, lsl #2]
+; CHECK-NEXT:    mov z2.s, p4/m, #0 // =0x0
 ; CHECK-NEXT:    mov z0.s, p1/m, #0 // =0x0
-; CHECK-NEXT:    mov z2.s, p2/m, #0 // =0x0
-; CHECK-NEXT:    st1w { z1.s }, p0, [x0]
-; CHECK-NEXT:    st1w { z0.s }, p0, [x0, x8, lsl #2]
+; CHECK-NEXT:    mov z1.s, p2/m, #0 // =0x0
 ; CHECK-NEXT:    mov z3.s, p3/m, #0 // =0x0
-; CHECK-NEXT:    st1w { z2.s }, p0, [x0, x9, lsl #2]
-; CHECK-NEXT:    st1w { z3.s }, p0, [x0, x10, lsl #2]
+; CHECK-NEXT:    st1w { z2.s }, p0, [x0]
+; CHECK-NEXT:    st1w { z0.s }, p0, [x0, x8, lsl #2]
+; CHECK-NEXT:    st1w { z1.s }, p0, [x0, #1, mul vl]
+; CHECK-NEXT:    st1w { z3.s }, p0, [x0, x9, lsl #2]
 ; CHECK-NEXT:  .LBB1_2: // %exit
 ; CHECK-NEXT:    ret
   %broadcast.splat = shufflevector <32 x i1> zeroinitializer, <32 x i1> zeroinitializer, <32 x i32> zeroinitializer

Contributor Author

rj-jesus commented Mar 4, 2025

Hi @paulwalker-arm, in the other PR we had discussed this not being high priority, and I agree it isn't. Having said that, I asked around and a few colleagues spotted these cases in generated assembly and would appreciate it if we could improve them. It's also something that GCC seems to get right (https://godbolt.org/z/ns3neaMs9). Do you think doing this in `SelectAddrModeIndexedSVE` is a sensible approach?

@paulwalker-arm
Collaborator

Very nice. Does this mean there's new hope for teaching AArch64LoadStoreOpt to convert contiguous SVE fill/spill instructions to NEON ldp/stp?

Contributor Author

rj-jesus commented Mar 4, 2025

Very nice. Does this mean there's new hope for teaching AArch64LoadStoreOpt to convert contiguous SVE fill/spill instructions to NEON ldp/stp?

Yep, that was also my thinking. I think it should work now.
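For illustration, a hedged sketch (not output produced by this patch) of the kind of AArch64LoadStoreOpt rewrite being discussed, assuming the vector length is known to be 128 bits so that an SVE Z register spans exactly one NEON Q register:

```gas
// Before: two contiguous SVE fill accesses at VL=128
        ldr     z0, [x0]
        ldr     z1, [x0, #1, mul vl]
// After: a hypothetical rewrite into a single NEON load pair
        ldp     q0, q1, [x0]
```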

rj-jesus added 2 commits March 5, 2025 00:53
I think making `MulImm` std::optional helps make the intent of the
if-else chain clearer.

If anyone doesn't agree, please let me know and I'll undo this.
Contributor

@david-arm david-arm left a comment

LGTM!

Collaborator

@paulwalker-arm paulwalker-arm left a comment

Other than a potentially erroneous assert, this looks good to me.

Comment on lines 413 to 414
if (MaxSVEVectorSizeInBits &&
MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits)
Collaborator

This can be just `MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits`, given that returning zero when both are zero is valid?
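For context, a minimal, self-contained sketch of the equivalence being pointed out; the `SubtargetSketch` struct and method names below are hypothetical stand-ins, and only the quoted two-line condition comes from the patch:

```c++
#include <cassert>

// Hypothetical stand-in for the subtarget, just to make the point runnable.
struct SubtargetSketch {
  unsigned MinSVEVectorSizeInBits = 0;
  unsigned MaxSVEVectorSizeInBits = 0;

  // Before: explicitly guard against the unknown-VL case (both bounds zero).
  unsigned exactVScaleBefore() const {
    if (MaxSVEVectorSizeInBits &&
        MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits)
      return MaxSVEVectorSizeInBits / 128;
    return 0;
  }

  // After: when both bounds are zero the equality still holds and 0/128 == 0,
  // so dropping the extra non-zero check gives the same result.
  unsigned exactVScaleAfter() const {
    return MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits
               ? MaxSVEVectorSizeInBits / 128
               : 0;
  }
};

int main() {
  SubtargetSketch Unknown{};          // VL unknown: both bounds zero
  SubtargetSketch Fixed512{512, 512}; // e.g. -msve-vector-bits=512
  assert(Unknown.exactVScaleBefore() == Unknown.exactVScaleAfter());   // both 0
  assert(Fixed512.exactVScaleBefore() == Fixed512.exactVScaleAfter()); // both 4
}
```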

Contributor Author

Thanks very much for the suggestion; that looks much better.
Should we also let through the case `!MinSVEVectorSizeInBits && MaxSVEVectorSizeInBits == 128`?

Collaborator

Not sure if this can happen outside of handwritten IR, but otherwise it sounds like a reasonable suggestion to me.

Contributor Author

@rj-jesus rj-jesus Mar 5, 2025


I've left it as-is for now, due to the lack of a motivating example and to keep it consistent with `getMinSVEVectorSizeInBits`/`getMaxSVEVectorSizeInBits`, which presumably could return 128/2048 as the architecture bounds and avoid this problem altogether. Please let me know if you'd rather I add it.

I'll let the tests run and, assuming they are OK, merge it.

@rj-jesus rj-jesus merged commit f01e760 into llvm:main Mar 6, 2025
11 checks passed
@rj-jesus rj-jesus deleted the rjj/aarch64-sve-vls-imm-addressing-modes branch March 6, 2025 09:47
Contributor

omjavaid commented Mar 6, 2025

This PR appears to have broken the clang-aarch64-sve-vls buildbot. Here is the failing build: https://lab.llvm.org/buildbot/#/builders/143/builds/5952

rj-jesus added a commit that referenced this pull request Mar 19, 2025
…)" (#130625)

The original patch from #129732 exposed a bug in `getMemVTFromNode`, which was returning incorrect types for fixed length vectors.
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
When compiling VLS SVE, the compiler often replaces VL-based offsets
with immediate-based ones. This leads to a mismatch in the allowed
addressing modes due to SVE loads/stores generally expecting immediate
offsets relative to VL. For example, given:
```c

svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x+svcntd());
}
```

When compiled with `-msve-vector-bits=128`, we currently generate:
```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0, x8, lsl #3]
        ret
```

Instead, we could be generating:
```gas
foo:
        ldr     z0, [x0, #1, mul vl]
        ret
```

Likewise for other types, stores, and other VLS lengths.

This patch achieves the above by extending `SelectAddrModeIndexedSVE`
to let constants through when `vscale` is known.