
Commit 06a4c85

Use v16i8 rather than v2i64 as the VT for memset expansion on AArch64.
This allows the instruction selector to realize that it can directly broadcast the low byte of the memset value, rather than replicating it to a 64-bit GPR before broadcasting.

This fixes PR50985.

Differential Revision: https://reviews.llvm.org/D108354
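To make the difference concrete, the C++ sketch below models the two ways of filling a 16-byte vector with the memset value. It is not LLVM code, and the function names are hypothetical. The old path is modeled as masking the argument to its low byte, multiplying by 0x0101010101010101 to replicate it across a 64-bit GPR, and then duplicating that 64-bit value into both lanes. This assumes it mirrors the zero-extend-and-multiply trick SelectionDAG uses to widen a memset byte to a wider integer element. The new path simply broadcasts the byte.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of the old v2i64 path: zero-extend the fill value to 64
// bits (the masking step), multiply by 0x0101010101010101 so every byte of the
// GPR holds the fill byte, then duplicate that 64-bit value into both lanes.
std::array<std::uint8_t, 16> splatViaI64(std::uint32_t arg) {
  std::uint64_t byte = arg & 0xFFu;                   // keep only the low byte
  std::uint64_t wide = byte * 0x0101010101010101ULL;  // replicate across 8 bytes
  std::array<std::uint8_t, 16> vec{};
  std::memcpy(vec.data(), &wide, sizeof wide);                // lane 0
  std::memcpy(vec.data() + sizeof wide, &wide, sizeof wide);  // lane 1
  return vec;
}

// Hypothetical model of the new v16i8 path: the vector element is already a
// byte, so the low byte of the argument is broadcast directly to all 16 lanes.
std::array<std::uint8_t, 16> splatViaI8(std::uint32_t arg) {
  std::array<std::uint8_t, 16> vec{};
  vec.fill(static_cast<std::uint8_t>(arg));  // truncation keeps the low byte
  return vec;
}
```

Both functions produce identical bytes for any argument. The point of the change is that the second form needs no masking or multiply before the broadcast.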
1 parent 94e1442 commit 06a4c85


2 files changed: 20 additions, 2 deletions


llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Lines changed: 2 additions & 2 deletions
```diff
@@ -12091,8 +12091,8 @@ EVT AArch64TargetLowering::getOptimalMemOpType(
   };
 
   if (CanUseNEON && Op.isMemset() && !IsSmallMemset &&
-      AlignmentIsAcceptable(MVT::v2i64, Align(16)))
-    return MVT::v2i64;
+      AlignmentIsAcceptable(MVT::v16i8, Align(16)))
+    return MVT::v16i8;
   if (CanUseFP && !IsSmallMemset && AlignmentIsAcceptable(MVT::f128, Align(16)))
     return MVT::f128;
   if (Op.size() >= 8 && AlignmentIsAcceptable(MVT::i64, Align(8)))
```
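The hunk is one step in a chain that picks the widest store type usable for the expansion. The C++ sketch below models only the branches visible in the hunk. The enum, struct, and function names are hypothetical. Two pieces are assumptions because the hunk does not show them: the `size < 32` definition of IsSmallMemset, and the `Other` fallback, which stands in for whatever the function does after the i64 case. A single flag also stands in for both 16-byte alignment checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the MVTs that appear in the hunk.
enum class MemOpVT { v16i8, f128, i64, Other };

// Hypothetical bundle of the facts the hunk consults.
struct MemOpQuery {
  std::uint64_t size;      // Op.size()
  bool isMemset;           // Op.isMemset()
  bool canUseNEON;         // CanUseNEON
  bool canUseFP;           // CanUseFP
  bool align16Acceptable;  // both AlignmentIsAcceptable(..., Align(16)) checks
  bool align8Acceptable;   // AlignmentIsAcceptable(MVT::i64, Align(8))
};

MemOpVT chooseMemOpVT(const MemOpQuery &q) {
  // Assumption: "small" means a memset shorter than 32 bytes.
  const bool isSmallMemset = q.isMemset && q.size < 32;
  // After this commit the NEON memset case asks for v16i8 (previously v2i64).
  if (q.canUseNEON && q.isMemset && !isSmallMemset && q.align16Acceptable)
    return MemOpVT::v16i8;
  if (q.canUseFP && !isSmallMemset && q.align16Acceptable)
    return MemOpVT::f128;
  if (q.size >= 8 && q.align8Acceptable)
    return MemOpVT::i64;
  // Placeholder for the remaining cases, which the hunk does not show.
  return MemOpVT::Other;
}
```

Under this model, the 64-byte memset in the test below selects v16i8, so the expansion needs four 16-byte stores. That is consistent with the two `stp` instructions the test expects, if each `stp` stores a pair of 128-bit registers.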

llvm/test/CodeGen/AArch64/memset.ll

Lines changed: 18 additions & 0 deletions
```diff
@@ -0,0 +1,18 @@
+; RUN: llc < %s | FileCheck %s
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
+
+; CHECK: memset_call:
+; CHECK-NOT: and
+; CHECK: dup
+; CHECK-NEXT: stp
+; CHECK-NEXT: stp
+; CHECK-NEXT: ret
+define void @memset_call(i8* %0, i32 %1) {
+  %3 = trunc i32 %1 to i8
+  call void @llvm.memset.p0i8.i64(i8* %0, i8 %3, i64 64, i1 false)
+  ret void
+}
+
+declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i1 immarg)
+
```
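The test relies on FileCheck ordering rules. `CHECK-NOT: and` fails if `and` appears between the `memset_call:` match and the `dup` match, presumably to catch the masking of the truncated i8 argument on the old path. Each `CHECK-NEXT` must match on the line right after the previous match. The C++ sketch below is a simplified, line-based model of those three directives, using plain substrings only. Real FileCheck matches within lines and supports regexes and variables. This sketch also ignores a CHECK-NOT that is not followed by a positive directive. All names in it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of three FileCheck directives.
enum class Kind { Check, CheckNext, CheckNot };

struct Directive {
  Kind kind;
  std::string pattern;
};

static bool contains(const std::string &line, const std::string &pat) {
  return line.find(pat) != std::string::npos;
}

bool matches(const std::vector<std::string> &lines,
             const std::vector<Directive> &dirs) {
  long lastMatch = -1;                  // line of the previous positive match
  std::vector<std::string> pendingNot;  // CHECK-NOTs waiting for the next match
  for (const Directive &d : dirs) {
    if (d.kind == Kind::CheckNot) {
      pendingNot.push_back(d.pattern);
      continue;
    }
    long found = -1;
    if (d.kind == Kind::CheckNext) {
      // CHECK-NEXT: the pattern must be on the line right after the last match.
      long next = lastMatch + 1;
      if (lastMatch >= 0 && next < static_cast<long>(lines.size()) &&
          contains(lines[static_cast<std::size_t>(next)], d.pattern))
        found = next;
    } else {
      // CHECK: the first later line containing the pattern.
      for (long i = lastMatch + 1; i < static_cast<long>(lines.size()); ++i)
        if (contains(lines[static_cast<std::size_t>(i)], d.pattern)) {
          found = i;
          break;
        }
    }
    if (found < 0)
      return false;
    // CHECK-NOT: forbidden strictly between the previous match and this one.
    for (const std::string &pat : pendingNot)
      for (long i = lastMatch + 1; i < found; ++i)
        if (contains(lines[static_cast<std::size_t>(i)], pat))
          return false;
    pendingNot.clear();
    lastMatch = found;
  }
  return true;  // a trailing CHECK-NOT is ignored in this sketch
}
```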
