rustc_codegen_ssa: Use `llvm.invariant` intrinsics on arguments that are immutable references; off by default. #103070
Some changes occurred in compiler/rustc_codegen_gcc. cc @antoyo
r? @TaKO8Ki (rust-highfive has picked a reviewer for you, use r? to override)
d65e579 to 8997e72
Another benefit of the conservatism here is that it avoids this PR depending too much on the semantics of unsafe code. Memory pointed to by immutable function parameter references to freezable memory is clearly immutable for the lifetime of the function regardless of what semantics we decide on for unsafe code; if it weren't, it would very probably be UB already given that we mark such parameters as
Optimization failures around reloads and memcpy optimizations are frequently traceable to LLVM's failure to prove that memory can't be mutated by a call or store. This problem is especially acute in Rust, where large values tend to be memcpy'd more often than in C++. Thankfully, Rust has stronger guarantees on mutability available than C++ does, via the strong immutability of `&` references. This should allow LLVM to prove that memory can't be modified by stores and calls in more cases.

We're already using LLVM's `readonly` parameter attribute on such calls. However, the semantics of `readonly` are akin to `const` in C++, in that they only promise to LLVM that the function won't mutate the parameter *through that pointer*, not that the pointed-to memory is immutable for the entire duration of the function. These weak semantics limit the applicability of `readonly` to LLVM's alias analysis. Instead of `readonly`, the correct way to express strong immutability guarantees on memory is through the `llvm.invariant.start` and `llvm.invariant.end` intrinsics. These enable a frontend like `rustc` to describe immutability of memory regions in an expressive, flow-sensitive manner.

Unfortunately, LLVM doesn't use the `llvm.invariant.start` and `llvm.invariant.end` intrinsics for much at the moment. They're only used in one optimization in loop-invariant code motion at this time. Follow-up work will need to be done in LLVM to integrate these intrinsics into alias analysis. Possibly there will need to be some sort of "MemoryInvarianceAnalysis" that uses graph reachability algorithms to analyze the extent of the guarantees provided by these intrinsics to the control flow graph.

Regardless, this front-end work needs to happen as a prerequisite for any LLVM work, so that the improvements to LLVM can be measured and tested. So this commit makes `rustc` use `llvm.invariant` in a minimal way: on immutable references to "freeze" types (i.e. not transitively containing `UnsafeCell`) passed directly as parameters to functions. This is off by default, gated behind the non-default `-Z emit-invariant-markers=yes` flag.

Obviously, a lot more can be done to use `llvm.invariant` more liberally in the future, but this can be added over time, especially once more LLVM optimization passes use that infrastructure. This is simply the bare minimum for now. Once LLVM uses those intrinsics for more optimizations, the effects of more `llvm.invariant` use can be measured more precisely.
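As a rough illustration of the "freeze" distinction the description relies on, the sketch below contrasts a `&u32` parameter with a `&Cell<u32>` parameter. The function names are made up for this example and are not part of the PR; the snippet only shows the source-level guarantee, not the IR that `rustc` emits or the effect of the flag.

```rust
use std::cell::Cell;

/// `u32` is a "freeze" type (it contains no `UnsafeCell`), so safe code
/// cannot change the value behind `x` while this function is running.
/// The second load could in principle be folded into the first.
fn read_twice_frozen(x: &u32, f: impl Fn()) -> (u32, u32) {
    let before = *x;
    f(); // opaque call: cannot legally write to `*x`
    let after = *x;
    (before, after)
}

/// `Cell<u32>` wraps an `UnsafeCell`, so it is not "freeze": the call to
/// `f` may mutate the value through another shared reference, and the
/// second load has to be performed for real.
fn read_twice_cell(x: &Cell<u32>, f: impl Fn()) -> (u32, u32) {
    let before = x.get();
    f(); // opaque call: may call `set` on the same cell
    let after = x.get();
    (before, after)
}
```

Calling `read_twice_cell(&c, || c.set(5))` on a cell that starts at 1 observes two different values, which is why the PR restricts the markers to references whose pointee does not transitively contain `UnsafeCell`.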
8997e72 to 5327e78
I'd suggest doing a perf run with this enabled by default.

TBH, I'm not sure this is a viable way forward. I expect that this is going to regress optimization potential in practice, because invariant.start is essentially unused for optimization purposes (the only thing clang uses it for is to convert globals into constants after evaluating global ctors). At the same time, invariant.start will introduce a ref effect and invariant.end a modref effect, and from a quick look it doesn't seem like code is able to look past these effectively.

I think it's unlikely that use of invariant.start and invariant.end for optimization in LLVM will increase much in the future, because their control-flow dependence would make this expensive in terms of compile-time.

The more widely used way to represent invariance is invariant.group metadata in conjunction with launder.invariant.group intrinsics. The way to use these in this context would be to use launder.invariant.group to create a new pointer for the argument, and then annotate all uses of it using

However, I believe that for Rust's use case, the way to get the most mileage out of this is to add an
Actually, now that I think about it again: Doesn't the combination of
No, it doesn't. See: https://godbolt.org/z/Wxn3WWWfa LLVM doesn't know that
I'd like to be able to use this on references that aren't directly function parameters. e.g.
should be optimized to avoid the redundant
I'm not saying it currently handles this (it doesn't), but that it would be easy to make it handle it, without introducing new IR concepts or changes on the rustc side.
OK, I've added that check to
I can do this, but do we need to launder the pointer first? The LangRef suggests that
The necessary change isn't quite that simple, and this will lead to miscompiles in some cases. Maybe it's good enough to get some performance data though.
The inliner does not insert launder.invariant.group intrinsics; these need to be added explicitly.
OK, here's a rough proposed change: pcwalton/llvm-project@87a571a I changed Does this approach seem reasonable? If so I'll continue cleaning it up, adding tests, etc. Having two bool parameters seems a bit ugly, admittedly, but I wanted to follow the existing code.
The new logic looks correct to me. I don't think we'd want to land this with the OrInvariant parameter, but it's a reasonable starting point for review.
Wait, if we launder pointers every time we enter a function, doesn't that mean that redundant loads can't be eliminated across inlined functions? e.g.
If codegen applies a launder intrinsic to David Goldblatt suggested an
I've given this some thought, and I think the right way to model this is by making the return value a ModRefInfo rather than bool, which is the upper bound on observable memory effects for this location. This is NoModRef for globally invariant memory and Ref for locally invariant memory. Users can then just
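To make the "upper bound on observable memory effects" idea concrete, here is a small Rust model of a mod/ref lattice. It is only a sketch: the `Invariance` enum, `effect_upper_bound`, and `intersect` are hypothetical names for this illustration and are not LLVM's API, and since the comment above is cut off before it says what users would do with the value, intersecting it with an instruction's effect is an assumption.

```rust
/// Toy model of a mod/ref lattice: bit 0 means "may read",
/// bit 1 means "may write".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ModRefInfo {
    NoModRef = 0b00,
    Ref = 0b01,
    Mod = 0b10,
    ModRef = 0b11,
}

impl ModRefInfo {
    fn from_bits(bits: u8) -> ModRefInfo {
        match bits & 0b11 {
            0b00 => ModRefInfo::NoModRef,
            0b01 => ModRefInfo::Ref,
            0b10 => ModRefInfo::Mod,
            _ => ModRefInfo::ModRef,
        }
    }

    /// Keep only the effects allowed by both operands.
    fn intersect(self, other: ModRefInfo) -> ModRefInfo {
        ModRefInfo::from_bits(self as u8 & other as u8)
    }
}

/// Hypothetical classification of a memory location.
enum Invariance {
    NotInvariant,
    Local,  // invariant only within the current function
    Global, // invariant everywhere
}

/// Upper bound on the effects anything can have on the location:
/// nothing observable for globally invariant memory, reads only for
/// locally invariant memory, anything otherwise.
fn effect_upper_bound(inv: Invariance) -> ModRefInfo {
    match inv {
        Invariance::Global => ModRefInfo::NoModRef,
        Invariance::Local => ModRefInfo::Ref,
        Invariance::NotInvariant => ModRefInfo::ModRef,
    }
}
```

Under this model, a call that may read and write intersected with the bound for locally invariant memory leaves only `Ref`, so the write half of the effect is discarded.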
Without commenting on the rest, I don't think llvm.invariant.start can be replaced, because the thing it's primarily used for is to declare that the object cannot be modified past a certain point (i.e. invariant.start without invariant.end). In particular this is used to turn globals into constants after global ctor evaluation, in which case having a "constantified" SSA value doesn't really help us -- we need the invariant.start on the global independently of how it will later be used.
I think you are better suited to review this PR. r? @nikic
I will close this PR after the relevant LLVM patches are upstream, unless you'd like me to close it now.
I think it's safe to close this, I doubt we would want to pursue the particular approach implemented here.
Introduce deduced parameter attributes, and use them for deducing `readonly` on indirect immutable freeze by-value function parameters.

Right now, `rustc` only examines function signatures and the platform ABI when determining the LLVM attributes to apply to parameters. This results in missed optimizations, because there are some attributes that can be determined via analysis of the MIR making up the function body. In particular, `readonly` could be applied to most indirectly-passed by-value function arguments (specifically, those that are freeze and are observed not to be mutated), but it currently is not.

This patch introduces the machinery that allows `rustc` to determine those attributes. It consists of a query, `deduced_param_attrs`, that, when evaluated, analyzes the MIR of the function to determine supplementary attributes. The results of this query for each function are written into the crate metadata so that the deduced parameter attributes can be applied to cross-crate functions. In this patch, we simply check the parameter for mutations to determine whether the `readonly` attribute should be applied to parameters that are indirect immutable freeze by-value. More attributes could conceivably be deduced in the future: `nocapture` and `noalias` come to mind.

Adding `readonly` to indirect function parameters where applicable enables some potential optimizations in LLVM that are discussed in [issue 103103] and [PR 103070] around avoiding stack-to-stack memory copies that appear in functions like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These functions pass a large structure unchanged by value to a subfunction that also doesn't mutate it. Since the structure in this case is passed as an indirect parameter, it's a pointer from LLVM's perspective. As a result, the intermediate copy of the structure that our codegen emits could be optimized away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly nocapture noalias` in both the caller and callee. We already pass `nocapture noalias`, but we're missing `readonly`, as we can't determine whether a by-value parameter is mutated by examining the signature in Rust. I didn't have much success with having LLVM infer the `readonly` attribute, even with fat LTO; it seems that deducing it at the MIR level is necessary.

No large benefits should be expected from this optimization *now*; LLVM needs some changes (discussed in [PR 103070]) to more aggressively use the `noalias nocapture readonly` combination in its alias analysis. I have some LLVM patches for these optimizations and have had them looked over. With all the patches applied locally, I enabled LLVM to remove all the `memcpy`s from the following code:

```rust
fn main() {
    println!("Hello {}", 3);
}
```

which is a significant codegen improvement over the status quo. I expect that if this optimization kicks in in multiple places even for such a simple program, then it will apply to Rust code all over the place.

[issue 103103]: rust-lang#103103
[PR 103070]: rust-lang#103070
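The mutation check described in the commit message can be pictured with a toy model. The sketch below is not the `deduced_param_attrs` query and does not use rustc's MIR types; `Stmt`, `ParamInfo`, and `deduce_readonly` are hypothetical names, and the "body" is just a flat list of statements that read, write, or mutably borrow a parameter.

```rust
/// Index of a parameter in the toy function signature.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Param(usize);

/// Hypothetical, heavily simplified stand-in for statements in a body.
enum Stmt {
    Read(Param),
    Write(Param),
    MutBorrow(Param),
}

/// What the signature and ABI alone tell us about each parameter.
struct ParamInfo {
    is_freeze: bool,
    passed_indirectly: bool,
}

/// A parameter is a `readonly` candidate if it is freeze, passed
/// indirectly, and the body never writes to it or borrows it mutably.
fn deduce_readonly(params: &[ParamInfo], body: &[Stmt]) -> Vec<bool> {
    let mut readonly: Vec<bool> = params
        .iter()
        .map(|p| p.is_freeze && p.passed_indirectly)
        .collect();
    for stmt in body {
        match stmt {
            Stmt::Write(Param(i)) | Stmt::MutBorrow(Param(i)) => {
                // Any observed mutation disqualifies the parameter.
                if let Some(slot) = readonly.get_mut(*i) {
                    *slot = false;
                }
            }
            Stmt::Read(_) => {}
        }
    }
    readonly
}
```

The point of the model is that the answer depends on the function body and not only on the signature, which is why the results have to be computed per function and, as the commit message notes, stored in crate metadata for cross-crate callers.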
I submitted https://reviews.llvm.org/D136659 to implement the suggested LLVM change.