Skip to content

i32.clamp() suggested by Clippy produces worse code than i32.min().max() #141915

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Shnatsel opened this issue Jun 2, 2025 · 7 comments
Open
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Shnatsel
Copy link
Member

Shnatsel commented Jun 2, 2025

On this code in image-webp, following the Clippy lint to replace .max(0).min(255) with .clamp(0,255) on an i32 value causes a performance regression:

https://github.com/image-rs/image-webp/blob/93baf7de7df50977a1fcb3a0bb53036d4780bff3/src/vp8.rs#L994-L999

It's unfortunate that .min().max() and .clamp() are not equivalent, and doubly so when Clippy nags us to rewrite the code in a way that makes it slower.

I've posted a self-contained sample that reproduces the issue on godbolt:

Generated assembly for .min().max(): https://rust.godbolt.org/z/zr7PK8vz3
Generated assembly for .clamp(): https://rust.godbolt.org/z/b898M45vo

You can see that the .clamp() version results in far more assembly; the vectorized loop is roughly twice the amount of instructions.

I've confirmed that the issue exists in rustc 1.75, 1.82 and 1.87 which is the latest as of this writing.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 2, 2025
@Noratrieb Noratrieb added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 2, 2025
@folkertdev
Copy link
Contributor

The issue here is the assert!(min <= max), when it is removed both versions generate the same code.

    fn clamp(self, min: Self, max: Self) -> Self
    where
        Self: Sized,
    {
        assert!(min <= max);
        if self < min {
            min
        } else if self > max {
            max
        } else {
            self
        }
    }

Unfortunately, the panic is documented behavior https://doc.rust-lang.org/std/cmp/trait.Ord.html#method.clamp, so I'm not sure what can actually be done here.

@Shnatsel
Copy link
Member Author

Shnatsel commented Jun 2, 2025

Oh, I can confirm that's the case on godbolt. That's weird; I expect the panic to be removed by constant propagation, and it is, but it messes with optimization anyway.

@JonathanBrouwer
Copy link
Contributor

JonathanBrouwer commented Jun 2, 2025

This is a simplified example:
https://rust.godbolt.org/z/avTqGq4Kj
The assert actually does not matter in this simple code (uncommenting it does not change the generated code)
I've also discovered that although the same amount of instructions is generated, the instructions themselves are different, possibly optimizing worse

@samueltardieu
Copy link
Contributor

samueltardieu commented Jun 2, 2025

@JonathanBrouwer Inverting the order in which min and max are compared makes clamp_impl() generate the same code as clip1(), even with the assertion enabled.

@okaneco
Copy link
Contributor

okaneco commented Jun 2, 2025

Related/duplicate of #125738

I reported the lint to clippy back when it was in the beta, I believe.
It was decided to close that issue and open an issue in this repo
rust-lang/rust-clippy#12826

LLVM upstream issue - llvm/llvm-project#104875
The PR there appears to have stalled out

@hkBst
Copy link
Member

hkBst commented Jun 5, 2025

@Shnatsel if the issue is the order in which clamp does the comparisons, does manually exchanging the max and min calls in the manual clamp also result in a performance difference?

@Shnatsel
Copy link
Member Author

Shnatsel commented Jun 5, 2025

There is no performance difference between .min().max() and .max().min(), and the assembly differences are minimal as well: https://rust.godbolt.org/z/1v857sa1K

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants