Skip to content

[AArch64][SME] Allow spills of ZT0 around SME ABI routines again #136726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 25, 2025

Conversation

MacDue
Copy link
Member

@MacDue MacDue commented Apr 22, 2025

In #132722 spills of ZT0 were disabled around all SME ABI routines to avoid a case where ZT0 is spilled before ZA is enabled (resulting in a crash).

It turns out that the ABI does not promise that routines will preserve ZT0 (however in practice they do), so generally disabling ZT0 spills for ABI routines is not correct.

The case where a crash was possible was "aarch64_new_zt0" functions with ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this case, ZT0 will be undefined at the call to __arm_tpidr2_save, so this patch avoids the ZT0 spill by marking the callsite with "aarch64_zt0_undef". This attribute only applies to callsites and marks that at the point the call is made ZT0 is not defined, so does not need preserving.

In llvm#132722 spills of ZT0 were disabled around all SME ABI routines to
avoid a case where ZT0 is spilled before ZA is enabled (resulting in a
crash).

It turns out that the ABI does not promise that routines will preserve
ZT0 (however in practice they do), so generally disabling ZT0 spills for
ABI routines is not correct.

The case where a crash was possible was "aarch64_new_zt0" functions with
ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this
case, ZT0 will be undefined at the call to __arm_tpidr2_save, so we can
mark the call as preserving ZT0 (whether it does or not) to avoid the
ZT0 spills.
@llvmbot
Copy link
Member

llvmbot commented Apr 22, 2025

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-aarch64

Author: Benjamin Maxwell (MacDue)

Changes

In #132722 spills of ZT0 were disabled around all SME ABI routines to avoid a case where ZT0 is spilled before ZA is enabled (resulting in a crash).

It turns out that the ABI does not promise that routines will preserve ZT0 (however in practice they do), so generally disabling ZT0 spills for ABI routines is not correct.

The case where a crash was possible was "aarch64_new_zt0" functions with ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this case, ZT0 will be undefined at the call to __arm_tpidr2_save, so we can mark the call as preserving ZT0 (whether it does or not) to avoid the ZT0 spills.


Full diff: https://github.com/llvm/llvm-project/pull/136726.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AArch64/SMEABIPass.cpp (+12-4)
  • (modified) llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h (+1-2)
  • (added) llvm/test/CodeGen/AArch64/sme-new-zt0-function.ll (+14)
  • (modified) llvm/test/CodeGen/AArch64/sme-zt0-state.ll (+37-4)
diff --git a/llvm/lib/Target/AArch64/SMEABIPass.cpp b/llvm/lib/Target/AArch64/SMEABIPass.cpp
index bb885d86392fe..440bbb2a941ab 100644
--- a/llvm/lib/Target/AArch64/SMEABIPass.cpp
+++ b/llvm/lib/Target/AArch64/SMEABIPass.cpp
@@ -54,14 +54,22 @@ FunctionPass *llvm::createSMEABIPass() { return new SMEABI(); }
 //===----------------------------------------------------------------------===//
 
 // Utility function to emit a call to __arm_tpidr2_save and clear TPIDR2_EL0.
-void emitTPIDR2Save(Module *M, IRBuilder<> &Builder) {
+void emitTPIDR2Save(Module *M, IRBuilder<> &Builder, bool ZT0IsUndef = false) {
+  auto &Ctx = M->getContext();
   auto *TPIDR2SaveTy =
       FunctionType::get(Builder.getVoidTy(), {}, /*IsVarArgs=*/false);
-  auto Attrs = AttributeList().addFnAttribute(M->getContext(),
-                                              "aarch64_pstate_sm_compatible");
+  auto Attrs =
+      AttributeList().addFnAttribute(Ctx, "aarch64_pstate_sm_compatible");
   FunctionCallee Callee =
       M->getOrInsertFunction("__arm_tpidr2_save", TPIDR2SaveTy, Attrs);
   CallInst *Call = Builder.CreateCall(Callee);
+
+  // If ZT0 is undefined (i.e. we're at the entry of a "new_zt0" function), mark
+  // __arm_tpidr2_save as preserving ZT0. This prevents an unnecessary spill of
+  // ZT0 that can occur before ZA is enabled.
+  if (ZT0IsUndef)
+    Call->addFnAttr(Attribute::get(Ctx, "aarch64_preserves_zt0"));
+
   Call->setCallingConv(
       CallingConv::AArch64_SME_ABI_Support_Routines_PreserveMost_From_X0);
 
@@ -119,7 +127,7 @@ bool SMEABI::updateNewStateFunctions(Module *M, Function *F,
 
     // Create a call __arm_tpidr2_save, which commits the lazy save.
     Builder.SetInsertPoint(&SaveBB->back());
-    emitTPIDR2Save(M, Builder);
+    emitTPIDR2Save(M, Builder, /*ZT0IsUndef=*/FnAttrs.isNewZT0());
 
     // Enable pstate.za at the start of the function.
     Builder.SetInsertPoint(&OrigBB->front());
diff --git a/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h b/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h
index a3ebf764a6e0c..fb093da70c46b 100644
--- a/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h
+++ b/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h
@@ -133,8 +133,7 @@ class SMEAttrs {
   bool hasZT0State() const { return isNewZT0() || sharesZT0(); }
   bool requiresPreservingZT0(const SMEAttrs &Callee) const {
     return hasZT0State() && !Callee.sharesZT0() &&
-           !Callee.hasAgnosticZAInterface() &&
-           !(Callee.Bitmask & SME_ABI_Routine);
+           !Callee.hasAgnosticZAInterface();
   }
   bool requiresDisablingZABeforeCall(const SMEAttrs &Callee) const {
     return hasZT0State() && !hasZAState() && Callee.hasPrivateZAInterface() &&
diff --git a/llvm/test/CodeGen/AArch64/sme-new-zt0-function.ll b/llvm/test/CodeGen/AArch64/sme-new-zt0-function.ll
new file mode 100644
index 0000000000000..715122d0fa4b4
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sme-new-zt0-function.ll
@@ -0,0 +1,14 @@
+; RUN: opt -S -mtriple=aarch64-linux-gnu -aarch64-sme-abi %s | FileCheck %s
+
+declare void @callee();
+
+define void @private_za() "aarch64_new_zt0" {
+  call void @callee()
+  ret void
+}
+
+; CHECK: call aarch64_sme_preservemost_from_x0 void @__arm_tpidr2_save() #[[TPIDR2_SAVE_CALL_ATTR:[0-9]+]]
+; CHECK: declare void @__arm_tpidr2_save() #[[TPIDR2_SAVE_DECL_ATTR:[0-9]+]]
+
+; CHECK: attributes #[[TPIDR2_SAVE_DECL_ATTR]] = { "aarch64_pstate_sm_compatible" }
+; CHECK: attributes #[[TPIDR2_SAVE_CALL_ATTR]] = { "aarch64_preserves_zt0" }
diff --git a/llvm/test/CodeGen/AArch64/sme-zt0-state.ll b/llvm/test/CodeGen/AArch64/sme-zt0-state.ll
index 500fff4eb20db..7361e850d713e 100644
--- a/llvm/test/CodeGen/AArch64/sme-zt0-state.ll
+++ b/llvm/test/CodeGen/AArch64/sme-zt0-state.ll
@@ -167,6 +167,39 @@ define void @zt0_new_caller_zt0_new_callee() "aarch64_new_zt0" nounwind {
   ret void;
 }
 
+; Expect commit of lazy-save if ZA is dormant
+; Expect smstart ZA & clear ZT0
+; No spill & fill of ZT0 around __arm_tpidr2_save
+; Expect spill & fill of ZT0 around __arm_sme_state call
+; Before return, expect smstop ZA
+define i64 @zt0_new_caller_abi_routine_callee() "aarch64_new_zt0" nounwind {
+; CHECK-LABEL: zt0_new_caller_abi_routine_callee:
+; CHECK:       // %bb.0: // %prelude
+; CHECK-NEXT:    sub sp, sp, #80
+; CHECK-NEXT:    stp x30, x19, [sp, #64] // 16-byte Folded Spill
+; CHECK-NEXT:    mrs x8, TPIDR2_EL0
+; CHECK-NEXT:    cbz x8, .LBB7_2
+; CHECK-NEXT:  // %bb.1: // %save.za
+; CHECK-NEXT:    bl __arm_tpidr2_save
+; CHECK-NEXT:    msr TPIDR2_EL0, xzr
+; CHECK-NEXT:  .LBB7_2:
+; CHECK-NEXT:    smstart za
+; CHECK-NEXT:    zero { zt0 }
+; CHECK-NEXT:    mov x19, sp
+; CHECK-NEXT:    str zt0, [x19]
+; CHECK-NEXT:    bl __arm_sme_state
+; CHECK-NEXT:    ldr zt0, [x19]
+; CHECK-NEXT:    smstop za
+; CHECK-NEXT:    ldp x30, x19, [sp, #64] // 16-byte Folded Reload
+; CHECK-NEXT:    add sp, sp, #80
+; CHECK-NEXT:    ret
+  %res = call {i64, i64} @__arm_sme_state()
+  %res.0 = extractvalue {i64, i64} %res, 0
+  ret i64 %res.0
+}
+
+declare {i64, i64} @__arm_sme_state()
+
 ;
 ; New-ZA Caller
 ;
@@ -179,11 +212,11 @@ define void @zt0_new_caller() "aarch64_new_zt0" nounwind {
 ; CHECK:       // %bb.0: // %prelude
 ; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT:    mrs x8, TPIDR2_EL0
-; CHECK-NEXT:    cbz x8, .LBB7_2
+; CHECK-NEXT:    cbz x8, .LBB8_2
 ; CHECK-NEXT:  // %bb.1: // %save.za
 ; CHECK-NEXT:    bl __arm_tpidr2_save
 ; CHECK-NEXT:    msr TPIDR2_EL0, xzr
-; CHECK-NEXT:  .LBB7_2:
+; CHECK-NEXT:  .LBB8_2:
 ; CHECK-NEXT:    smstart za
 ; CHECK-NEXT:    zero { zt0 }
 ; CHECK-NEXT:    bl callee
@@ -202,11 +235,11 @@ define void @new_za_zt0_caller() "aarch64_new_za" "aarch64_new_zt0" nounwind {
 ; CHECK:       // %bb.0: // %prelude
 ; CHECK-NEXT:    str x30, [sp, #-16]! // 8-byte Folded Spill
 ; CHECK-NEXT:    mrs x8, TPIDR2_EL0
-; CHECK-NEXT:    cbz x8, .LBB8_2
+; CHECK-NEXT:    cbz x8, .LBB9_2
 ; CHECK-NEXT:  // %bb.1: // %save.za
 ; CHECK-NEXT:    bl __arm_tpidr2_save
 ; CHECK-NEXT:    msr TPIDR2_EL0, xzr
-; CHECK-NEXT:  .LBB8_2:
+; CHECK-NEXT:  .LBB9_2:
 ; CHECK-NEXT:    smstart za
 ; CHECK-NEXT:    zero {za}
 ; CHECK-NEXT:    zero { zt0 }

@MacDue MacDue changed the title [AArch64][SME] Allow spills of ZT0 arounds SME ABI routines again [AArch64][SME] Allow spills of ZT0 around SME ABI routines again Apr 22, 2025
// __arm_tpidr2_save as preserving ZT0. This prevents an unnecessary spill of
// ZT0 that can occur before ZA is enabled.
if (ZT0IsUndef)
Call->addFnAttr(Attribute::get(Ctx, "aarch64_preserves_zt0"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ACLE attribute __arm_preserves("zt0") maps to LLVM attribute aarch64_preserves_zt0.
__arm_preserves("zt0") means that the function has a "Shared-ZA" interface, which the SME ABI routines do not. I'm worried that we'd be abusing this attribute for a purpose that means something different, so I suggest introducing a new attribute for this instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I've added a new attribute aarch64_zt0_undef, which does not result in a "Shared-ZA" interface. I've also added a few extra tests and limited this attribute to only apply to callsites (as I'm not sure it'd make sense if applied to an entire function).

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- llvm/lib/IR/Verifier.cpp llvm/lib/Target/AArch64/SMEABIPass.cpp llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.cpp llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h llvm/unittests/Target/AArch64/SMEAttributesTest.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h b/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h
index 1691d4fec..3cfb27984 100644
--- a/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h
+++ b/llvm/lib/Target/AArch64/Utils/AArch64SMEAttributes.h
@@ -43,7 +43,7 @@ public:
     SM_Body = 1 << 2,         // aarch64_pstate_sm_body
     SME_ABI_Routine = 1 << 3, // Used for SME ABI routines to avoid lazy saves
     ZA_State_Agnostic = 1 << 4,
-    ZT0_Undef = 1 << 5,       // Use to mark ZT0 as undef to avoid spills
+    ZT0_Undef = 1 << 5, // Use to mark ZT0 as undef to avoid spills
     ZA_Shift = 6,
     ZA_Mask = 0b111 << ZA_Shift,
     ZT0_Shift = 9,

@MacDue
Copy link
Member Author

MacDue commented Apr 24, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️
You can test this locally with the following command:

View the diff from clang-format here.

This is deliberate to match the formatting for other enum members.

This allows this to be independent from new/preserves/shares ZT0.
return hasZT0State() && !Callee.sharesZT0() &&
!Callee.hasAgnosticZAInterface() &&
!(Callee.Bitmask & SME_ABI_Routine);
return hasZT0State() && !Callee.isUndefZT0() && !Callee.sharesZT0() &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed this offline, but just putting it out here in case someone else is curious about this; && !Callee.isUndefZT0() isn't really correct, because isUndefZT0() should be a question to ask the caller and is specific to that call-site. Hence why #137239 is in progress to fix that.

@MacDue MacDue merged commit 8c7a2ce into llvm:main Apr 25, 2025
10 of 11 checks passed
@MacDue MacDue deleted the zt0_fix branch April 25, 2025 12:33
MacDue added a commit to MacDue/llvm-project that referenced this pull request Apr 28, 2025
…m#136726)

In llvm#132722 spills of ZT0 were disabled around all SME ABI routines to
avoid a case where ZT0 is spilled before ZA is enabled (resulting in a
crash).

It turns out that the ABI does not promise that routines will preserve
ZT0 (however in practice they do), so generally disabling ZT0 spills for
ABI routines is not correct.

The case where a crash was possible was "aarch64_new_zt0" functions with
ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this
case, ZT0 will be undefined at the call to __arm_tpidr2_save, so this
patch avoids the ZT0 spill by marking the callsite with
"aarch64_zt0_undef". This attribute only applies to callsites and marks
that at the point the call is made ZT0 is not defined, so does not need
preserving.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…m#136726)

In llvm#132722 spills of ZT0 were disabled around all SME ABI routines to
avoid a case where ZT0 is spilled before ZA is enabled (resulting in a
crash).

It turns out that the ABI does not promise that routines will preserve
ZT0 (however in practice they do), so generally disabling ZT0 spills for
ABI routines is not correct.

The case where a crash was possible was "aarch64_new_zt0" functions with
ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this
case, ZT0 will be undefined at the call to __arm_tpidr2_save, so this
patch avoids the ZT0 spill by marking the callsite with
"aarch64_zt0_undef". This attribute only applies to callsites and marks
that at the point the call is made ZT0 is not defined, so does not need
preserving.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…m#136726)

In llvm#132722 spills of ZT0 were disabled around all SME ABI routines to
avoid a case where ZT0 is spilled before ZA is enabled (resulting in a
crash).

It turns out that the ABI does not promise that routines will preserve
ZT0 (however in practice they do), so generally disabling ZT0 spills for
ABI routines is not correct.

The case where a crash was possible was "aarch64_new_zt0" functions with
ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this
case, ZT0 will be undefined at the call to __arm_tpidr2_save, so this
patch avoids the ZT0 spill by marking the callsite with
"aarch64_zt0_undef". This attribute only applies to callsites and marks
that at the point the call is made ZT0 is not defined, so does not need
preserving.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…m#136726)

In llvm#132722 spills of ZT0 were disabled around all SME ABI routines to
avoid a case where ZT0 is spilled before ZA is enabled (resulting in a
crash).

It turns out that the ABI does not promise that routines will preserve
ZT0 (however in practice they do), so generally disabling ZT0 spills for
ABI routines is not correct.

The case where a crash was possible was "aarch64_new_zt0" functions with
ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this
case, ZT0 will be undefined at the call to __arm_tpidr2_save, so this
patch avoids the ZT0 spill by marking the callsite with
"aarch64_zt0_undef". This attribute only applies to callsites and marks
that at the point the call is made ZT0 is not defined, so does not need
preserving.
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
…m#136726)

In llvm#132722 spills of ZT0 were disabled around all SME ABI routines to
avoid a case where ZT0 is spilled before ZA is enabled (resulting in a
crash).

It turns out that the ABI does not promise that routines will preserve
ZT0 (however in practice they do), so generally disabling ZT0 spills for
ABI routines is not correct.

The case where a crash was possible was "aarch64_new_zt0" functions with
ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this
case, ZT0 will be undefined at the call to __arm_tpidr2_save, so this
patch avoids the ZT0 spill by marking the callsite with
"aarch64_zt0_undef". This attribute only applies to callsites and marks
that at the point the call is made ZT0 is not defined, so does not need
preserving.
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request May 9, 2025
…m#136726)

In llvm#132722 spills of ZT0 were disabled around all SME ABI routines to
avoid a case where ZT0 is spilled before ZA is enabled (resulting in a
crash).

It turns out that the ABI does not promise that routines will preserve
ZT0 (however in practice they do), so generally disabling ZT0 spills for
ABI routines is not correct.

The case where a crash was possible was "aarch64_new_zt0" functions with
ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this
case, ZT0 will be undefined at the call to __arm_tpidr2_save, so this
patch avoids the ZT0 spill by marking the callsite with
"aarch64_zt0_undef". This attribute only applies to callsites and marks
that at the point the call is made ZT0 is not defined, so does not need
preserving.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants