Skip to content

Support calling functions with SIMD vectors that couldn't be used in the caller #132865

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
RalfJung opened this issue Nov 10, 2024 · 2 comments
Labels
A-ABI Area: Concerning the application binary interface (ABI) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-SIMD Area: SIMD (Single Instruction Multiple Data) C-feature-request Category: A feature request, i.e: not implemented / a PR. T-opsem Relevant to the opsem team WG-llvm Working group: LLVM backend code generation

Comments

@RalfJung
Copy link
Member

RalfJung commented Nov 10, 2024

We now lint and will eventually error on this program:

use std::mem::transmute;
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[target_feature(enable = "avx")]
#[allow(improper_ctypes_definitions)]
unsafe extern "C" fn with_target_feature(x: __m256) {
    let val = unsafe { transmute::<_, [u32; 8]>(x) };
    dbg!(val);
}

fn main() {
    assert!(is_x86_feature_detected!("avx"));
    // SAFETY: we checked that the `avx` feature is present.
    unsafe {
        with_target_feature(transmute([1; 8])); //~ ERROR: missing `avx` target feature
    }
}
warning: ABI error: this function call uses a vector type that requires the `avx` target feature, which is not enabled in the caller
  --> test.rs:18:9
   |
18 |         with_target_feature(transmute([1; 8]));
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ function called here
   |
   = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
   = note: for more information, see issue #116558 <https://github.com/rust-lang/rust/issues/116558>
   = help: consider enabling it globally (`-C target-feature=+avx`) or locally (`#[target_feature(enable="avx")]`)
   = note: `#[warn(abi_unsupported_vector_types)]` on by default

The lint is necessary because the way we codegen this function would be unsound (and indeed, if you run this on the playground you can see that the argument value gets corrupted). See #116558 for more context.

However, there's no fundamental reason that we couldn't compile this code! We "just" need to generate the call to with_target_feature using its proper ABI, i.e., using the AVX registers. This is sound because the function anyway requires that target feature, so the caller must have already ensured that this target feature is available.

The problem is that LLVM currently simply has no way to express such a call. So we have three options:

  • error (the easiest one, and what we are currently working towards)
  • fix this in LLVM (also see Inlining loses ABI-relevant target feature information at call operations llvm/llvm-project#70563) -- I am told this is quite hard
  • generate a shim that uses the Rust ABI (so it is not affected by these ABI shenanigans), and has the avx feature gate, and calls the actual callee -- not a pretty solution since the extra function call is bad for performance, and performance is the reason people manually write SIMD code to begin with

Lucky enough, this only affects non-Rust ABIs, so users should only rarely run into this.

Cc @rust-lang/wg-llvm @rust-lang/opsem @chorman0773 @veluca93

@RalfJung RalfJung added A-ABI Area: Concerning the application binary interface (ABI) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-SIMD Area: SIMD (Single Instruction Multiple Data) C-feature-request Category: A feature request, i.e: not implemented / a PR. T-opsem Relevant to the opsem team WG-llvm Working group: LLVM backend code generation labels Nov 10, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 10, 2024
@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 10, 2024
@amluto
Copy link

amluto commented Feb 28, 2025

I found this issue via the LKML discussion, and I wanted to add a word or two of caution. I think you've covered two bases:

  1. ABI mismatch: it's unsound to call a function if the caller and callee disagree on the ABI.

  2. Using a nonexistent register or instruction will crash. If you are running on bare metal in usermode with no funny business, I think you've approximately covered the bases: either the instruction and register exist or they don't, and if they do exist, they always have existed and always will exist.

But the actual environments that software runs in are tricky and nasty, and register state management is especially tricky. Some machines (fortunately not x86, or at least not any x86 that Intel or AMD will admit to) are heterogenous, and threads can context switch to and from CPUs with different capabilities. And the register state themselves is generally managed by a horrible mechanism that needs to be babysat.

In the context of Linux, this means that one needs permission to use various register sets, including x87 and SSE. On older systems, this may have meant twiddling the CR0.TS bit (although this could be sort of automatic in some contexts, and I admit I don't fully recall exactly how Linux did this, but I was the one who did much of the work to delete that code...). On current systems, in general, the contents of XMM registers may belong to userspace because XSAVE and friends are expensive and are done only as needed, and writing to XMM registers at the wrong time is egregiously unsound and, worse, won't even reliably affect the kernel -- it will just leak information to user code and may crash user code!

So what the system actually wants is (psuedocode, names don't match Linux, etc):

start_using_xmm()
use XMM regs;
stop_using_xmm()

and this is awful because it's very error-prone.

I don't have a great solution for Rust. Maybe the act of crossing a target feature boundary (depending on direction?) should be unsafe and also be implementable in a way that has some kind of explicit barrier so that instructions and register allocation don't leak across the boundary? I can imagine calling a (safe) function that does the transition and calls a closure:

run_xmm_code(func)

and somehow has the right type to make it sound.

Or I can imagine Rust helping out less and merely enabling code to work in approximately the same unpleasant way that C works.

But please be very cautious with any sort of analysis along the lines of "I can prove that this code is going to touch XMM regs, and therefore it's okay to touch XMM regs now."

FWIW, this isn't necessarily restricted to kernel code. User code that does lightweight threading can switch GPRs right away and defer switching FPU regs until it feels like it. (And, in fact, I think x86's CR0.TS mechanism dates back to the 80286/287 era and was used when switching "tasks".)

@RalfJung
Copy link
Member Author

RalfJung commented Mar 3, 2025

Thanks for raising these points! You will be happy to hear that both of these cases have been raised in discussions before, but I did not include an essay of everything that was ever said about target features in the issue description. :) However, I think they are unrelated to this issue. This issue is very specifically about what it says in the OP, whereas both points you raised are much wider discussions affecting not just the specific ability to invoke with_target_feature from without_target_feature. with_target_feature already today has a clear precondition of "AVX must be available"; if you call it in an environment where that is not the case (e.g. because the current CPU core does not have AVX, or because AVX has been disabled again in the kernel), you already have UB today. We are thus fully in our right to assume that AVX is available just before the call. (But not 5 lines before in the code! Only immediately before the function invocation.)

threads can context switch to and from CPUs with different capabilities

I think the general consensus is that these are cursed environments, and hardly any system actually supports that. Rust currently only supports platforms where all CPUs present the same capabilities. Maybe smaller cores can use slower emulations for large vector types or so, but a target feature is only considered available if it is available on all cores that the process may be scheduled on. If you are playing funny games with heterogeneous systems, it is your responsibility to ensure this remains true.

writing to XMM registers at the wrong time is egregiously unsound

The issues of enabling and disabling target features, e.g. inside the Linux kernel, has been discussed elsewhere and is orthogonal to this one. In particular, by stabilizing #134090 in its current form, we have decided that Rust will by default assume that a target feature, once available, will always remain available. Using target features the way the kernel does requires extra care -- basically, stop_using_xmm is unsafe and has a precondition of "ensure no code out there still assumes that SSE is available"; all function pointers or closures produced inside functions with the target feature are suspect as they could safely escape that function. This makes things slightly trickier in the kernel, but a lot easier for everyone else who's working in a more sane environment. We did ensure the kernel case remains possible though. If you want to re-hash that discussion, please do that in a new issue. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ABI Area: Concerning the application binary interface (ABI) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-SIMD Area: SIMD (Single Instruction Multiple Data) C-feature-request Category: A feature request, i.e: not implemented / a PR. T-opsem Relevant to the opsem team WG-llvm Working group: LLVM backend code generation
Projects
None yet
Development

No branches or pull requests

4 participants