-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Perf regression with LLVM 16 for slice::sort
#111559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Dinging into it, I think the relevant code that has this impact is in the LLVM 15: LLVM 16: This can be fixed in code by using 'proper' branchless code: let to_copy = if is_less(&*right, &**left) {
get_and_increment(&mut right)
} else {
get_and_increment(left)
};
ptr::copy_nonoverlapping(to_copy, get_and_increment(out), 1); -> let is_l = is_less(&*right, &**left);
let to_copy = if is_l { right } else { *left };
ptr::copy_nonoverlapping(to_copy, get_and_increment(out), 1);
right = right.add(is_l as usize);
*left = left.add(!is_l as usize); This would fix the stdlib, but it would still mean a regression for any other code out there that relies on this. |
I feel like I should attach the usual disclaimer that this kind of optimization can never be considered fully stable, specifically LLVM has various heuristics that affect whether it compiles a segment of code to use a "branchless" CMOV-alike or use a jump, however that doesn't mean there's nothing we can do here. @rustbot label: +A-LLVM +A-codegen +regression-from-stable-to-beta +I-slow |
WG-prioritization assigning priority (Zulip discussion). @rustbot label -I-prioritize +P-low +T-compiler |
@apiraino out of curiosity. What is the future for a perf regression marked as P-low? Is there any work planned on this? Personally I was surprised that it even generated branchless code in the first place. |
A P-high or P-critical compiler issue means that it will be reviewed regularly by T-compiler until it is resolved, where "regularly" is a much higher frequency for P-critical. So a P-low implicitly means "this isn't so important that we need to keep putting it on the agenda". Though I should note that periodically, all T-compiler issues do get reviewed and triaged, just on a more irregular basis. Performance work is one of those "never done" things so it's rare for it to be high priority in general, which isn't the same as "no one will work on it". It just means it would be addressed in a somewhat irregular fashion. |
I'm not sure I understand the difference between "never done" and "no one will work on it". |
@Voultapher "Never done" as in "never finished" not "never worked on" |
Ohh, thanks for clarifying. Anyway from my perspective me and other people I've talked to were surprised this even generated branchless code in the first place, I'd be fine with closing this issue as won't fix. |
since you @Voultapher have opened a PR (thanks a lot for that!), it makes totally sense to try fixing it, no matter the priority we assign to issues. |
I think the issue can be split into two parts: A) A significant regression in I think A should definitely be fixed, and that's what my PR is for. B however I'd argue doesn't really need fixing, because the code in question was not obviously branchless. |
Until LLVM 16 the code in the `slice::sort` `merge` function generated branchless code. With the upgrade to LLVM 16 in rustc 1.70 this property was broken. See rust-lang/rust#111559 for more info. It's annoying for comparison reasons to have the same code but with significantly different code-gen with newer versions. The updated merge function generates pretty much the same code as the previous one did with older rustc versions.
Code
I tried this code, sorting 30 million random
u64
:On this machine:
Using the vendored version of
slice::sort
from mid 2022 see here https://github.com/Voultapher/sort-research-rs/blob/d3bab202e18a2010e26674677e90ae3048721232/src/stable/rust_std.rsI expected to see this happen: Runtime ~1.45s
Instead, this happened: Runtime ~2s
Version it worked on
Version with regression
I bisected the issue down to 0c61c7a with
cargo bisect-rustc --start=2022-12-07 --end=2023-05-13 --script=./test.sh
.The same happens if you do
v.sort()
instead ofsort_comp::stable::rust_std::sort(&mut v)
. So this also affects the current implementation which is very similar to my vendored version.The text was updated successfully, but these errors were encountered: