[AArch64] Prefer to fold dup into fmul/fma as opposed to ld1r
There is a fold that creates LD1DUPpost from dup(load) when the load can be
post-incremented. If the dup is used by a "by element" operation such as fmul
or fma, it can be slightly better to fold the dup into the fmul instead, which
produces slightly faster code:
  ld1r { v1.4s }, [x0], #4
  fmul v0.4s, v1.4s, v0.4s
vs
  ldr s1, [x0], #4
  fmul v0.4s, v0.4s, v1.s[0]
This could also be done with integer operations such as smull/umull, so long
as the load/dup gets correctly combined into the mul operation. Currently this
only operates on floating point types.
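
For illustration only, the equivalence of the two sequences can be modelled in
portable scalar C. This is a rough sketch, not code from the patch: the
function names mul_via_dup and mul_by_element are hypothetical, and the loops
only mimic what the 4 x f32 vector instructions compute and how the pointer is
post-incremented.

```c
#include <stddef.h>

/* Hypothetical model of:
 *   ld1r { v1.4s }, [x0], #4
 *   fmul v0.4s, v1.4s, v0.4s
 * The loaded scalar is broadcast to all four lanes, then multiplied lane-wise. */
static void mul_via_dup(float acc[4], const float **p) {
    float splat[4];
    for (size_t i = 0; i < 4; ++i)
        splat[i] = **p;            /* ld1r: load one float, replicate it */
    *p += 1;                       /* post-increment by 4 bytes */
    for (size_t i = 0; i < 4; ++i)
        acc[i] = splat[i] * acc[i];
}

/* Hypothetical model of:
 *   ldr s1, [x0], #4
 *   fmul v0.4s, v0.4s, v1.s[0]
 * The scalar stays in lane 0 and the multiply reads that lane for every element. */
static void mul_by_element(float acc[4], const float **p) {
    float s = **p;                 /* ldr: plain scalar load */
    *p += 1;                       /* post-increment by 4 bytes */
    for (size_t i = 0; i < 4; ++i)
        acc[i] = acc[i] * s;       /* "by element" multiply with lane 0 */
}
```

Both functions leave the same values in acc and advance the pointer by one
float; the difference the patch cares about is only which instructions are
selected.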
Differential Revision: https://reviews.llvm.org/D145184