Following `manual_clamp` suggestion results in slower code #12826
Comments
I will look into this from the upstream compiler. This really shouldn't happen.
Thanks. https://rust.godbolt.org/z/3rnY8d94v
I'm currently working on a patch for the upstream compiler; it should fix this behaviour (rather than a bug in Clippy, it's more of a possible optimization in the standard library). Note that a difference this big only happens with
Okis, the PR has been merged; once that lands we should see an improvement (I'll test in a few on nightly). I think that this issue can now be closed, as the new inline What do you think?
Make `clamp` inline. Context: rust-lang/rust-clippy#12826. This results in slightly more optimized assembly. (And most importantly, it's now fewer lines than just manually clamping a value)
Rollup merge of rust-lang#125455 - blyxyas:opt-clamp, r=joboet. Make `clamp` inline. Context: rust-lang/rust-clippy#12826. This results in slightly more optimized assembly. (And most importantly, it's now fewer lines than just manually clamping a value)
That sounds good, thanks for doing that. I reported the "bug" here because when I inlined the clamp definition, it was still producing the selects. I assumed that with the way It's definitely more of a performance optimization upstream and probably most related to this specific saturating truncation case. Hopefully clamp and manual clamp can produce equivalent results soon for this. I agree, it makes more sense to file an issue upstream so it can be tracked and closed there, or closed by regression tests being added if there isn't an improvement.
Summary
I noticed when clamping and casting from `i32` to `u8`, using `clamp(0, 255) as u8` produces unnecessary instructions compared to `.max(0).min(255) as u8`. If a loop is auto-vectorized, the branches in `clamp` result in slower code than manual clamping.
I couldn't find a label for this, but it would be akin to `I-suggestion-causes-perf-regression`.
Currently, the lint is set to `warn` but following the suggestion inhibits optimization. I don't believe it should fire on the "branchless" patterns which are semantically different.
Lint Name
`manual_clamp`
Lint Description
I also had a small issue with the wording in the current description.
I slightly disagree with the reasoning here.
I understand the user doesn't have to add any control flow, but the control flow within the clamp implementation is different enough to affect performance in some cases. It is not strictly a "better" clamping method than manually clamping, especially for primitive integers.
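For readers without the original snippet at hand, here is a minimal sketch of the two patterns being compared. The function names are hypothetical and this is not the reproducer from the report; it only illustrates the `clamp(0, 255) as u8` and `.max(0).min(255) as u8` forms mentioned in the summary, applied over a slice so the loop is a candidate for auto-vectorization.

```rust
// Hypothetical helper names; this is an illustration, not the original reproducer.

/// Saturating i32 -> u8 conversion using `Ord::clamp`, the form the lint suggests.
pub fn to_u8_clamp(values: &[i32]) -> Vec<u8> {
    values.iter().map(|&v| v.clamp(0, 255) as u8).collect()
}

/// The same conversion written with chained `max`/`min`, the form the lint flags.
pub fn to_u8_max_min(values: &[i32]) -> Vec<u8> {
    values.iter().map(|&v| v.max(0).min(255) as u8).collect()
}

fn main() {
    let input = [-300, -1, 0, 128, 255, 256, 1000];
    // With constant bounds where min <= max, both forms return the same values.
    assert_eq!(to_u8_clamp(&input), to_u8_max_min(&input));
    println!("{:?}", to_u8_clamp(&input)); // prints [0, 0, 0, 128, 255, 255, 255]
}
```

Both functions agree on their results here, so any difference shows up only in the generated assembly. One semantic difference to keep in mind is that `Ord::clamp` panics if `min > max`, while the chained `max`/`min` form does not; with the constant bounds `0` and `255` that check can never fire.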
Reproducer
Assembly output - https://rust.godbolt.org/z/rdoh97d3v (1.78, but same output on nightly)
The main difference is in the label `.LBB0_4`, where extra work is being done by the clamp code.
Version
Additional Labels
No response