You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Optimize `is_ascii` for `str` and `[u8]` further
Replace the existing optimized function with one that enables auto-vectorization.
This is especially beneficial on x86-64 as `pmovmskb` can be emitted with careful structuring of the code. The instruction can detect non-ASCII characters one vector register width at a time instead of the current `usize` at a time check.
The resulting implementation is completely safe.
`case00_libcore` is the current implementation, `case04_while_loop` is this PR.
```
benchmarks:
ascii::is_ascii_slice::long::case00_libcore 22.25/iter +/- 1.09
ascii::is_ascii_slice::long::case04_while_loop 6.78/iter +/- 0.92
ascii::is_ascii_slice::medium::case00_libcore 2.81/iter +/- 0.39
ascii::is_ascii_slice::medium::case04_while_loop 1.56/iter +/- 0.78
ascii::is_ascii_slice::short::case00_libcore 5.55/iter +/- 0.85
ascii::is_ascii_slice::short::case04_while_loop 3.75/iter +/- 0.22
ascii::is_ascii_slice::unaligned_both_long::case00_libcore 26.59/iter +/- 0.66
ascii::is_ascii_slice::unaligned_both_long::case04_while_loop 5.78/iter +/- 0.16
ascii::is_ascii_slice::unaligned_both_medium::case00_libcore 2.97/iter +/- 0.32
ascii::is_ascii_slice::unaligned_both_medium::case04_while_loop 2.41/iter +/- 0.10
ascii::is_ascii_slice::unaligned_head_long::case00_libcore 23.71/iter +/- 0.79
ascii::is_ascii_slice::unaligned_head_long::case04_while_loop 7.83/iter +/- 1.31
ascii::is_ascii_slice::unaligned_head_medium::case00_libcore 3.69/iter +/- 0.54
ascii::is_ascii_slice::unaligned_head_medium::case04_while_loop 7.05/iter +/- 0.32
ascii::is_ascii_slice::unaligned_tail_long::case00_libcore 24.44/iter +/- 1.41
ascii::is_ascii_slice::unaligned_tail_long::case04_while_loop 5.12/iter +/- 0.18
ascii::is_ascii_slice::unaligned_tail_medium::case00_libcore 3.24/iter +/- 0.40
ascii::is_ascii_slice::unaligned_tail_medium::case04_while_loop 2.86/iter +/- 0.14
```
`unaligned_head_medium` is the main regression in the benchmarks. It is a 32 byte string being sliced `bytes[1..]`.
The first commit can be used to run the benchmarks against the current core implementation.
Previous implementation was done in rust-lang#74066
---
Two potential drawbacks of this implementation are that it increases instruction count and may regress other platforms/architectures. The benches here may also be too artificial to glean much insight from.
https://rust.godbolt.org/z/G9znGfY36
0 commit comments