faster rand(1:n) by outlining unlikely branch #58089
Conversation
So they actually consistently fail. For example, at line 497 of the "test/cmdlineargs.jl" file, there is

The 2nd "DA:" line does not match: it's "4,1" in
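For context on the failing comparison above: the "DA:" entries come from the LCOV tracefile format used for coverage reference files, where each record is `DA:<line number>,<execution count>`. A hedged illustration (these exact lines are invented for the example, not taken from the actual reference file):

```
SF:test/cmdlineargs.jl
DA:496,1
DA:497,4
end_of_record
```

So a mismatch like "4,1" means the expected and actual execution counts for that line differ, which is why the reference file needed updating.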
@rfourquet do you need help fixing the test? I don't want this branch to be forgotten, as it is quite a nice perf win.
e2ccf9d to 1df417e
Thanks for the bump, it looks like (most of) the tests pass now, with the reference file changed.

thanks!
It's hard to measure the improvement with single calls, but this change substantially improves the situation in JuliaLang#50509, such that these new versions of `randperm` etc. are almost always faster (even for big n). Here are some example benchmarks. Note that the biggest ranges, like `UInt(0):UInt(2)^64-2`, are the ones exercising the "unlikely" branch the most:

```julia
julia> const xx = Xoshiro(); using Chairmarks

julia> rands(rng, ns) = for i=ns rand(rng, zero(i):i) end

julia> rands(ns) = for i=ns rand(zero(i):i) end

julia> @b rand(xx, 1:100), rand(xx, UInt(0):UInt(2)^63), rand(xx, UInt(0):UInt(2)^64-3), rand(xx, UInt(0):UInt(2)^64-2), rand(xx, UInt(0):UInt(2)^64-1)
(1.968 ns, 8.000 ns, 3.321 ns, 3.321 ns, 2.152 ns) # PR
(2.151 ns, 7.284 ns, 2.151 ns, 2.151 ns, 2.151 ns) # master

julia> @b rand(1:100), rand(UInt(0):UInt(2)^63), rand(UInt(0):UInt(2)^64-3), rand(UInt(0):UInt(2)^64-2), rand(UInt(0):UInt(2)^64-1) # with TaskLocalRNG
(2.148 ns, 7.837 ns, 3.317 ns, 3.085 ns, 1.957 ns) # PR
(3.128 ns, 8.275 ns, 3.324 ns, 3.324 ns, 1.955 ns) # master

julia> rands(xx, 1:100), rands(xx, UInt(2)^62:UInt(2)^59:UInt(2)^64-1), rands(xx, UInt(2)^64-4:UInt(2)^64-2)
(95.315 ns, 132.144 ns, 7.486 ns) # PR
(217.169 ns, 143.519 ns, 8.065 ns) # master

julia> rands(1:100), rands(UInt(2)^62:UInt(2)^59:UInt(2)^64-1), rands(UInt(2)^64-4:UInt(2)^64-2)
(235.882 ns, 162.809 ns, 10.603 ns) # PR
(202.524 ns, 132.869 ns, 7.631 ns) # master
```

So it's a bit tricky: with an explicit RNG, `rands(xx, 1:100)` becomes much faster, but without one, `rands(1:100)` becomes slower.

Assuming JuliaLang#50509 was merged, `shuffle` is a good function to benchmark `rand(1:n)`, and the changes here consistently improve performance, as shown by this graph (when `TaskLocalRNG` is mentioned, it means *no* RNG argument was passed to the function): 

So although there can be slowdowns, I think this change is overall a win.
In #58089, this method took a small performance hit in some contexts. It turns out that this hit can be recovered by outlining the unlikely branch which throws on empty ranges. In #50509 (comment), a graph of the performance improvement of the "speed-up randperm by using our current rand(1:n)" change was posted, but I realized it was only true when calls to `rand(1:n)` were prefixed by `@inline`; without `@inline` it was overall slower for `TaskLocalRNG()` for very big arrays (but still faster otherwise). An alternative to these `@inline` annotations is to outline `throw` as done here, with benefits equivalent to `@inline` in that `randperm` PR. Assuming that PR is merged, this PR improves performance by roughly 2x for `TaskLocalRNG()` (no change for other RNGs): 

While at it, I outlined a bunch of other unlikely throwing branches. After that, #50509 can probably be merged, finally!
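The outlining trick described above can be sketched as follows. Note this is a minimal illustration, not the PR's actual code: `_throw_empty_range` and `sample_sketch` are hypothetical names, and the sampler is a simplified (biased) modulo draw rather than the real rejection sampler. The point is that moving the `throw` (and the error-object construction it drags in) into a separate `@noinline` function keeps the hot path small, so the compiler is more willing to inline the sampler into callers like `randperm`:

```julia
# Cold path: outlined with @noinline so the ArgumentError construction
# does not bloat the caller and hurt its inlining heuristics.
@noinline _throw_empty_range() = throw(ArgumentError("range must be non-empty"))

function sample_sketch(r::UnitRange{Int})
    isempty(r) && _throw_empty_range()  # unlikely branch is now just a call
    # Hot path: simplified modulo sampler (illustrative only; the real
    # implementation avoids modulo bias).
    return first(r) + Int(mod(rand(UInt64), UInt64(length(r))))
end

sample_sketch(1:100)  # some value in 1:100
```

Compared with annotating every call site `@inline rand(1:n)`, outlining the throw fixes the inlining cost once at the definition, which is why it gives equivalent benefits without touching callers.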