[AArch64] Fix STG instruction being moved past memcpy #117191
Conversation
When merging STG instructions used for AArch64 stack tagging, we were stopping when we reached a load or store instruction, but not at calls, so it was possible for an STG to be moved past a call to memcpy. This test case (reduced from fuzzer-generated C code) is the result of StackColoring merging allocas A and B into one stack slot, and StackSafetyAnalysis proving that B does not need tagging, so we end up with tagged and untagged objects in the same stack slot. The tagged object (A) is live first, so it is important that its memory is restored to the background tag before the slot is reused to hold B.
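For intuition, here is a hypothetical C-level shape that could lead to this situation (the actual fuzzer input is not shown in this PR; all names below are invented): two arrays with disjoint lifetimes that StackColoring can place in one slot, where only the first needs tagging.

// Hypothetical reduction, not the actual fuzzer-generated input.
extern int glob[8];
void use(int *p);

void f(void) {
  {
    int a[8];   // tagged object: lives first in the shared stack slot
    use(a);     // a is dead after this block
  }
  {
    int b[8];   // StackSafetyAnalysis can prove b needs no tagging, and
                // StackColoring may reuse a's slot to hold it
    __builtin_memcpy(b, glob, sizeof(b));  // accesses the slot through an
                                           // untagged pointer, so a's tags
                                           // must be reset before this call
    use(b);
  }
}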
@llvm/pr-subscribers-backend-aarch64 Author: Oliver Stannard (ostannard). Full diff: https://github.com/llvm/llvm-project/pull/117191.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 216244950ba9ee..320c36b4c54cb1 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -4572,7 +4572,7 @@ MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
break;
// Reject anything that may alias the collected instructions.
- if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
+ if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
break;
}
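For context, here is a simplified sketch of the forward scan in tryMergeAdjacentSTG that this hunk changes (condensed and approximate, not the verbatim source; the real loop in AArch64FrameLowering.cpp also has a scan limit and a frame-setup/destroy check):

// Approximate sketch of the STG merge scan.
SmallVector<TagStoreInstr, 4> Instrs;
Instrs.emplace_back(&MI, Offset, Size);
for (MachineBasicBlock::iterator E = MBB->end(); NextI != E; ++NextI) {
  MachineInstr &MI = *NextI;
  // Keep collecting mergeable tag-store instructions (STG, ST2G, ...).
  if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
    Instrs.emplace_back(&MI, Offset, Size);
    continue;
  }
  // Reject anything that may alias the collected instructions. Before this
  // patch a call such as BL &memcpy had none of these properties set, so the
  // scan walked past it and the merged tag stores landed after the call.
  if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
    break;
}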
diff --git a/llvm/test/CodeGen/AArch64/stack-tagging-merge-past-memcpy.mir b/llvm/test/CodeGen/AArch64/stack-tagging-merge-past-memcpy.mir
new file mode 100644
index 00000000000000..45f6bfe80ac2b3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/stack-tagging-merge-past-memcpy.mir
@@ -0,0 +1,103 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=aarch64 -mattr=+mte -run-pass=prologepilog %s -o - | FileCheck %s
+--- |
+ target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+ target triple = "aarch64-unknown-none-elf"
+
+ @glob = global [8 x i32] zeroinitializer, align 4
+
+ declare dso_local void @F78(ptr %B)
+
+ define void @F55() sanitize_memtag "target-features"="+mte,+strict-align" {
+ entry:
+ %basetag = call ptr @llvm.aarch64.irg.sp(i64 0)
+ %A = alloca i32, i32 8, align 16
+ %A.tag = call ptr @llvm.aarch64.tagp.p0(ptr %A, ptr %basetag, i64 0)
+ %B = alloca i32, i32 8, align 4
+ %C = alloca i32, i32 8, align 16
+ %C.tag = call ptr @llvm.aarch64.tagp.p0(ptr %C, ptr %basetag, i64 1)
+ call void @llvm.aarch64.settag(ptr %C.tag, i64 32)
+ call void @F56(ptr %C.tag)
+ call void @llvm.lifetime.start.p0(i64 32, ptr %A)
+ call void @llvm.aarch64.settag(ptr %A.tag, i64 32)
+ call void @F56(ptr %A.tag)
+ call void @llvm.aarch64.settag(ptr %A, i64 32)
+ call void @llvm.lifetime.end.p0(i64 32, ptr %A)
+ call void @llvm.lifetime.start.p0(i64 32, ptr %A)
+ call void @llvm.memcpy.p0.p0.i64(ptr align 4 %A, ptr align 4 @glob, i64 32, i1 false)
+ call void @F78(ptr %A)
+ call void @llvm.lifetime.end.p0(i64 32, ptr %A)
+ call void @llvm.aarch64.settag(ptr %C, i64 32)
+ ret void
+ }
+
+ declare void @F56(ptr)
+...
+---
+name: F55
+frameInfo:
+ adjustsStack: true
+stack:
+ - { id: 0, name: A, type: default, offset: 0, size: 32, alignment: 16,
+ stack-id: default, callee-saved-register: '', callee-saved-restored: true,
+ local-offset: -32, debug-info-variable: '', debug-info-expression: '',
+ debug-info-location: '' }
+ - { id: 2, name: C, type: default, offset: 0, size: 32, alignment: 16,
+ stack-id: default, callee-saved-register: '', callee-saved-restored: true,
+ local-offset: -64, debug-info-variable: '', debug-info-expression: '',
+ debug-info-location: '' }
+body: |
+ bb.0.entry:
+ ; CHECK-LABEL: name: F55
+ ; CHECK: liveins: $x19, $lr
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: frame-setup EMITMTETAGGED
+ ; CHECK-NEXT: $sp = frame-setup SUBXri $sp, 80, 0
+ ; CHECK-NEXT: frame-setup STPXi killed $lr, killed $x19, $sp, 8 :: (store (s64) into %stack.3), (store (s64) into %stack.2)
+ ; CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 80
+ ; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $w19, -8
+ ; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $w30, -16
+ ; CHECK-NEXT: renamable $x0 = IRGstack $sp, $xzr
+ ; CHECK-NEXT: renamable $x19 = TAGPstack $x0, 2, renamable $x0, 1
+ ; CHECK-NEXT: ST2Gi renamable $x0, renamable $x0, 0 :: (store (s256) into %ir.C.tag, align 16)
+ ; CHECK-NEXT: BL @F56, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+ ; CHECK-NEXT: ST2Gi renamable $x19, renamable $x19, 0 :: (store (s256) into %ir.A.tag, align 16)
+ ; CHECK-NEXT: $x0 = COPY killed renamable $x19
+ ; CHECK-NEXT: BL @F56, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+ ; CHECK-NEXT: ST2Gi $sp, $sp, 2 :: (store (s256) into %ir.A, align 16)
+ ; CHECK-NEXT: renamable $x1 = LOADgot target-flags(aarch64-got) @glob
+ ; CHECK-NEXT: $x0 = ADDXri $sp, 32, 0
+ ; CHECK-NEXT: dead $w2 = MOVi32imm 32, implicit-def $x2
+ ; CHECK-NEXT: BL &memcpy, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit $x2, implicit-def $sp, implicit-def dead $x0
+ ; CHECK-NEXT: $x0 = ADDXri $sp, 32, 0
+ ; CHECK-NEXT: BL @F78, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+ ; CHECK-NEXT: ST2Gi $sp, $sp, 0 :: (store (s256) into %ir.C, align 16)
+ ; CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 64, 0
+ ; CHECK-NEXT: early-clobber $sp, $lr, $x19 = frame-destroy LDPXpost $sp, 2 :: (load (s64) from %stack.3), (load (s64) from %stack.2)
+ ; CHECK-NEXT: RET_ReallyLR
+ renamable $x0 = IRGstack $sp, $xzr
+ renamable $x19 = TAGPstack %stack.0.A, 0, renamable $x0, 1
+ ST2Gi renamable $x0, renamable $x0, 0 :: (store (s256) into %ir.C.tag, align 16)
+ ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
+ BL @F56, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+ ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
+ ST2Gi renamable $x19, renamable $x19, 0 :: (store (s256) into %ir.A.tag, align 16)
+ ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
+ $x0 = COPY killed renamable $x19
+ BL @F56, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+ ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
+ ST2Gi $sp, %stack.0.A, 0 :: (store (s256) into %ir.A, align 16)
+ ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
+ renamable $x1 = LOADgot target-flags(aarch64-got) @glob
+ $x0 = ADDXri %stack.0.A, 0, 0
+ dead $w2 = MOVi32imm 32, implicit-def $x2
+ BL &memcpy, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit $x2, implicit-def $sp, implicit-def dead $x0
+ ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
+ ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
+ $x0 = ADDXri %stack.0.A, 0, 0
+ BL @F78, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
+ ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
+ ST2Gi $sp, %stack.2.C, 0 :: (store (s256) into %ir.C, align 16)
+ RET_ReallyLR
+
+...
There is an isLoadFoldBarrier predicate that this might be able to use too.
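For reference, MachineInstr::isLoadFoldBarrier is (to my understanding of current LLVM) roughly mayStore() || isCall() || hasUnmodeledSideEffects(), so the check could plausibly be spelled as the sketch below; mayLoad() stays explicit because the barrier predicate does not cover plain loads.

// Possible alternative using the existing predicate (a sketch, not the
// committed change): isLoadFoldBarrier() already covers stores, calls and
// unmodeled side effects, so only loads need a separate test.
if (MI.mayLoad() || MI.isLoadFoldBarrier())
  break;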