
Commit 3393155

mauri870 authored and gopherbot committed
internal/bytealg: optimize Count/CountString in amd64
Another optimization by aligning a hot loop.

```
                   │   sec/op    │   sec/op     vs base                │
Count/10-16          11.29n ± 1%   10.50n ± 1%   -7.04% (p=0.000 n=10)
Count/32-16          11.06n ± 1%   11.36n ± 2%   +2.76% (p=0.000 n=10)
Count/4K-16          2.852µ ± 1%   1.953µ ± 1%  -31.52% (p=0.000 n=10)
Count/4M-16          2.884m ± 1%   1.958m ± 1%  -32.11% (p=0.000 n=10)
Count/64M-16         46.27m ± 1%   30.86m ± 0%  -33.31% (p=0.000 n=10)
CountEasy/10-16      9.873n ± 1%   9.669n ± 1%   -2.07% (p=0.000 n=10)
CountEasy/32-16      11.07n ± 1%   11.23n ± 1%   +1.49% (p=0.000 n=10)
CountEasy/4K-16      73.47n ± 1%   54.20n ± 0%  -26.22% (p=0.000 n=10)
CountEasy/4M-16      61.12µ ± 1%   49.42µ ± 0%  -19.15% (p=0.000 n=10)
CountEasy/64M-16     1.303m ± 3%   1.082m ± 4%  -16.97% (p=0.000 n=10)
CountSingle/10-16    4.150n ± 1%   3.679n ± 1%  -11.36% (p=0.000 n=10)
CountSingle/32-16    4.815n ± 1%   4.588n ± 1%   -4.71% (p=0.000 n=10)
CountSingle/4M-16    72.18µ ± 2%   75.38µ ± 1%   +4.44% (p=0.000 n=10)
CountHard3-16        462.6µ ± 1%   484.4µ ± 1%   +4.73% (p=0.000 n=10)

                   │   old.txt    │               new.txt                │
                   │     B/s      │     B/s       vs base                │
Count/10-16          844.1Mi ± 1%   908.3Mi ± 1%   +7.60% (p=0.000 n=10)
Count/32-16          2.695Gi ± 1%   2.623Gi ± 2%   -2.66% (p=0.000 n=10)
Count/4K-16          1.337Gi ± 1%   1.953Gi ± 1%  +46.06% (p=0.000 n=10)
Count/4M-16          1.355Gi ± 1%   1.995Gi ± 1%  +47.29% (p=0.000 n=10)
Count/64M-16         1.351Gi ± 1%   2.026Gi ± 0%  +49.95% (p=0.000 n=10)
CountEasy/10-16      965.9Mi ± 1%   986.3Mi ± 1%   +2.11% (p=0.000 n=10)
CountEasy/32-16      2.693Gi ± 1%   2.653Gi ± 1%   -1.48% (p=0.000 n=10)
CountEasy/4K-16      51.93Gi ± 1%   70.38Gi ± 0%  +35.54% (p=0.000 n=10)
CountEasy/4M-16      63.91Gi ± 1%   79.05Gi ± 0%  +23.68% (p=0.000 n=10)
CountEasy/64M-16     47.97Gi ± 3%   57.77Gi ± 4%  +20.44% (p=0.000 n=10)
CountSingle/10-16    2.244Gi ± 1%   2.532Gi ± 1%  +12.80% (p=0.000 n=10)
CountSingle/32-16    6.190Gi ± 1%   6.496Gi ± 1%   +4.94% (p=0.000 n=10)
CountSingle/4M-16    54.12Gi ± 2%   51.82Gi ± 1%   -4.25% (p=0.000 n=10)
```

Change-Id: I847b36125d2b11e2a88d31f48f6c160f041b3624
GitHub-Last-Rev: faacba6
GitHub-Pull-Request: #61793
Reviewed-on: https://go-review.googlesource.com/c/go/+/516455
TryBot-Result: Gopher Robot <[email protected]>
Auto-Submit: Keith Randall <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Run-TryBot: Keith Randall <[email protected]>
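
As a reading aid for the benchmark figures in the commit message, the sketch below recomputes the relative time change and the throughput for the `Count/4K-16` row. The helper names `pctChange` and `gibPerSec` are hypothetical, and the 4096-byte buffer size is an assumption about what "4K" means. Because the inputs are the rounded values printed in the table, the results may differ from benchstat's in the last digit.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
// Negative values mean the "after" measurement is smaller (faster).
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

// gibPerSec converts a buffer size in bytes and a per-operation time in
// seconds into throughput in GiB/s.
func gibPerSec(bytes, secPerOp float64) float64 {
	return bytes / secPerOp / (1 << 30)
}

func main() {
	// Count/4K-16 row: 2.852µs before and 1.953µs after.
	// Assumption: "4K" is a 4096-byte input.
	const size = 4096
	before, after := 2.852e-6, 1.953e-6

	fmt.Printf("time change: %.1f%%\n", pctChange(before, after))
	fmt.Printf("throughput: %.2f GiB/s -> %.2f GiB/s\n",
		gibPerSec(size, before), gibPerSec(size, after))
}
```

The same arithmetic shows why a time reduction of about a third corresponds to a throughput gain of nearly a half: throughput scales with the inverse of time per operation.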
1 parent b9153f6 commit 3393155

1 file changed: +2 -0 lines changed


src/internal/bytealg/count_amd64.s

Lines changed: 2 additions & 0 deletions
@@ -57,6 +57,7 @@ sse:
 	LEAQ	-16(SI)(BX*1), AX	// AX = address of last 16 bytes
 	JMP	sseloopentry
 
+	PCALIGN $16
 sseloop:
 	// Move the next 16-byte chunk of the data into X1.
 	MOVOU	(DI), X1
@@ -163,6 +164,7 @@ avx2:
 	MOVD AX, X0
 	LEAQ -32(SI)(BX*1), R11
 	VPBROADCASTB  X0, Y1
+	PCALIGN $32
 avx2_loop:
 	VMOVDQU (DI), Y2
 	VPCMPEQB Y1, Y2, Y3
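
The two added `PCALIGN` directives ask the assembler to start the following loop label on a 16-byte or 32-byte boundary. A minimal sketch of the padding arithmetic this implies, written in Go for illustration only: the function `padding` and the sample offsets are hypothetical and are not taken from the toolchain or from the actual layout of `count_amd64.s`.

```go
package main

import "fmt"

// padding returns how many filler bytes must precede an instruction at
// offset pc so that it starts on a multiple of align.
// align must be a power of two.
func padding(pc, align uint64) uint64 {
	return (align - pc&(align-1)) & (align - 1)
}

func main() {
	// Hypothetical code offsets, chosen only to show the arithmetic.
	for _, pc := range []uint64{0x40, 0x4a, 0x5f} {
		fmt.Printf("pc=%#x pad16=%d pad32=%d\n",
			pc, padding(pc, 16), padding(pc, 32))
	}
}
```

An offset that is already aligned needs no padding, so the directive costs at most `align-1` bytes of code size per use. In the SSE path the padding sits after an unconditional `JMP`, so it is never executed there.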
