[AArch64] Fix STG instruction being moved past memcpy #117191


Merged: 3 commits into llvm:main, Dec 3, 2024

Conversation

ostannard (Collaborator)

When merging STG instructions used for AArch64 stack tagging, we were stopping on reaching a load or store instruction, but not calls, so it was possible for an STG to be moved past a call to memcpy.

This test case (reduced from fuzzer-generated C code) was the result of StackColoring merging allocas A and B into one stack slot, and StackSafetyAnalysis proving that B does not need tagging, so we end up with tagged and untagged objects in the same stack slot. The tagged object (A) is live first, so it is important that its memory is restored to the background tag before it gets reused to hold B.

llvmbot (Member) commented Nov 21, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Oliver Stannard (ostannard)



Full diff: https://github.com/llvm/llvm-project/pull/117191.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (+1-1)
  • (added) llvm/test/CodeGen/AArch64/stack-tagging-merge-past-memcpy.mir (+103)
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 216244950ba9ee..320c36b4c54cb1 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -4572,7 +4572,7 @@ MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
       break;
 
     // Reject anything that may alias the collected instructions.
-    if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
+    if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
       break;
   }
 
diff --git a/llvm/test/CodeGen/AArch64/stack-tagging-merge-past-memcpy.mir b/llvm/test/CodeGen/AArch64/stack-tagging-merge-past-memcpy.mir
new file mode 100644
index 00000000000000..45f6bfe80ac2b3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/stack-tagging-merge-past-memcpy.mir
@@ -0,0 +1,103 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=aarch64 -mattr=+mte -run-pass=prologepilog %s -o - | FileCheck %s
+--- |
+  target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+  target triple = "aarch64-unknown-none-elf"
+
+  @glob = global [8 x i32] zeroinitializer, align 4
+
+  declare dso_local void @F78(ptr %B)
+
+  define void @F55() sanitize_memtag "target-features"="+mte,+strict-align" {
+  entry:
+    %basetag = call ptr @llvm.aarch64.irg.sp(i64 0)
+    %A = alloca i32, i32 8, align 16
+    %A.tag = call ptr @llvm.aarch64.tagp.p0(ptr %A, ptr %basetag, i64 0)
+    %B = alloca i32, i32 8, align 4
+    %C = alloca i32, i32 8, align 16
+    %C.tag = call ptr @llvm.aarch64.tagp.p0(ptr %C, ptr %basetag, i64 1)
+    call void @llvm.aarch64.settag(ptr %C.tag, i64 32)
+    call void @F56(ptr %C.tag)
+    call void @llvm.lifetime.start.p0(i64 32, ptr %A)
+    call void @llvm.aarch64.settag(ptr %A.tag, i64 32)
+    call void @F56(ptr %A.tag)
+    call void @llvm.aarch64.settag(ptr %A, i64 32)
+    call void @llvm.lifetime.end.p0(i64 32, ptr %A)
+    call void @llvm.lifetime.start.p0(i64 32, ptr %A)
+    call void @llvm.memcpy.p0.p0.i64(ptr align 4 %A, ptr align 4 @glob, i64 32, i1 false)
+    call void @F78(ptr %A)
+    call void @llvm.lifetime.end.p0(i64 32, ptr %A)
+    call void @llvm.aarch64.settag(ptr %C, i64 32)
+    ret void
+  }
+
+  declare void @F56(ptr)
+...
+---
+name:            F55
+frameInfo:
+  adjustsStack:    true
+stack:
+  - { id: 0, name: A, type: default, offset: 0, size: 32, alignment: 16,
+      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
+      local-offset: -32, debug-info-variable: '', debug-info-expression: '',
+      debug-info-location: '' }
+  - { id: 2, name: C, type: default, offset: 0, size: 32, alignment: 16,
+      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
+      local-offset: -64, debug-info-variable: '', debug-info-expression: '',
+      debug-info-location: '' }
+body:             |
+  bb.0.entry:
+    ; CHECK-LABEL: name: F55
+    ; CHECK: liveins: $x19, $lr
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: frame-setup EMITMTETAGGED
+    ; CHECK-NEXT: $sp = frame-setup SUBXri $sp, 80, 0
+    ; CHECK-NEXT: frame-setup STPXi killed $lr, killed $x19, $sp, 8 :: (store (s64) into %stack.3), (store (s64) into %stack.2)
+    ; CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 80
+    ; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $w19, -8
+    ; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $w30, -16
+    ; CHECK-NEXT: renamable $x0 = IRGstack $sp, $xzr
+    ; CHECK-NEXT: renamable $x19 = TAGPstack $x0, 2, renamable $x0, 1
+    ; CHECK-NEXT: ST2Gi renamable $x0, renamable $x0, 0 :: (store (s256) into %ir.C.tag, align 16)
+    ; CHECK-NEXT: BL @F56, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+    ; CHECK-NEXT: ST2Gi renamable $x19, renamable $x19, 0 :: (store (s256) into %ir.A.tag, align 16)
+    ; CHECK-NEXT: $x0 = COPY killed renamable $x19
+    ; CHECK-NEXT: BL @F56, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+    ; CHECK-NEXT: ST2Gi $sp, $sp, 2 :: (store (s256) into %ir.A, align 16)
+    ; CHECK-NEXT: renamable $x1 = LOADgot target-flags(aarch64-got) @glob
+    ; CHECK-NEXT: $x0 = ADDXri $sp, 32, 0
+    ; CHECK-NEXT: dead $w2 = MOVi32imm 32, implicit-def $x2
+    ; CHECK-NEXT: BL &memcpy, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit $x2, implicit-def $sp, implicit-def dead $x0
+    ; CHECK-NEXT: $x0 = ADDXri $sp, 32, 0
+    ; CHECK-NEXT: BL @F78, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+    ; CHECK-NEXT: ST2Gi $sp, $sp, 0 :: (store (s256) into %ir.C, align 16)
+    ; CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 64, 0
+    ; CHECK-NEXT: early-clobber $sp, $lr, $x19 = frame-destroy LDPXpost $sp, 2 :: (load (s64) from %stack.3), (load (s64) from %stack.2)
+    ; CHECK-NEXT: RET_ReallyLR
+    renamable $x0 = IRGstack $sp, $xzr
+    renamable $x19 = TAGPstack %stack.0.A, 0, renamable $x0, 1
+    ST2Gi renamable $x0, renamable $x0, 0 :: (store (s256) into %ir.C.tag, align 16)
+    ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
+    BL @F56, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+    ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
+    ST2Gi renamable $x19, renamable $x19, 0 :: (store (s256) into %ir.A.tag, align 16)
+    ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
+    $x0 = COPY killed renamable $x19
+    BL @F56, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+    ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
+    ST2Gi $sp, %stack.0.A, 0 :: (store (s256) into %ir.A, align 16)
+    ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
+    renamable $x1 = LOADgot target-flags(aarch64-got) @glob
+    $x0 = ADDXri %stack.0.A, 0, 0
+    dead $w2 = MOVi32imm 32, implicit-def $x2
+    BL &memcpy, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit $x2, implicit-def $sp, implicit-def dead $x0
+    ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
+    ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
+    $x0 = ADDXri %stack.0.A, 0, 0
+    BL @F78, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+    ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
+    ST2Gi $sp, %stack.2.C, 0 :: (store (s256) into %ir.C, align 16)
+    RET_ReallyLR
+
+...

davemgreen (Collaborator) left a comment


There is an isLoadFoldBarrier that this might be able to use too.

@ostannard ostannard merged commit 7d72525 into llvm:main Dec 3, 2024
11 checks passed