Skip to content

std::simd scatter miscompilation #1439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pnevyk opened this issue Dec 23, 2023 · 9 comments · Fixed by rust-lang/rust#119278
Closed

std::simd scatter miscompilation #1439

pnevyk opened this issue Dec 23, 2023 · 9 comments · Fixed by rust-lang/rust#119278
Labels
A-core-arch Area: Necessary for full core::arch support C-bug Category: This is a bug.

Comments

@pnevyk
Copy link

pnevyk commented Dec 23, 2023

The example for Simd::scatter

#![feature(portable_simd)]

use std::simd::Simd;

fn main() {
    let mut vec: Vec<i32> = vec![10, 11, 12, 13, 14, 15, 16, 17, 18];
    let idxs = Simd::from_array([9, 3, 0, 0]); // Note the duplicate index.
    let vals = Simd::from_array([-27, 82, -41, 124]);

    vals.scatter(&mut vec, idxs); // two logical writes means the last wins.
    assert_eq!(vec, vec![124, 11, 12, 82, 14, 15, 16, 17, 18]);
}

results in

thread 'main' panicked at src/main.rs:11:5:
assertion `left == right` failed
  left: [-1, -1, 12, -1, -1, 15, 16, 17, 18]
 right: [124, 11, 12, 82, 14, 15, 16, 17, 18]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
free(): invalid pointer
[1]    483914 IOT instruction (core dumped)  cargo run

Sort of a minimal reproducer is the following code:

#![feature(portable_simd)]

use std::simd::Simd;

fn main() {
    let mut vec: Vec<u8> = vec![0];
    let idxs = Simd::from_array([0]);
    let vals = Simd::from_array([1]);

    vals.scatter(&mut vec, idxs);
    assert_eq!(vec, vec![1]);
}

which results in

thread 'main' panicked at src/main.rs:11:5:
assertion `left == right` failed
  left: [255]
 right: [1]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

(Interestingly, the second example does not cause free(): invalid pointer error.)

Running the examples with standard LLVM backend works as expected.

$ rustc --version
rustc 1.77.0-nightly (d6d7a9386 2023-12-22)

If I can provide more information I will happily do so. I tried to get generated assembly using cargo-show-asm, but I got

error: Unknown option `-x86-asm-syntax`

Looking at the actual outputs, it seems that the problem manifests itself by scatter putting value -1 at index i and i + 1 for every (valid) i from the idxs vector.

// Only 0 and 3 indices are valid in the example.
let idxs = Simd::from_array([9, 3, 0, 0]);

//    index: 0,  1,  2,  3,  4,  5,  6,  7,  8
//   left: [-1, -1, 12, -1, -1, 15, 16, 17, 18]

I would be interested in trying to fix this bug if I get some pointers to where to start. I have some basic knowledge of compilers in general and cranelift in particular.

@bjorn3
Copy link
Member

bjorn3 commented Dec 23, 2023

Thanks for the report! If you want to look into this you can pass --emit llvm-ir to rustc when compiling. Despite what the name suggests this will emit clif ir. You can find it in the output directory in the <crate_name>.clif directory. For every function there are three files with .unopt.clif, .opt.clif and .vcode as extensions. These are the clif ir before optimizations, after optimizations, the vcode (very close to the final assembly, but jump threading happens during final emission of object code and some instructions in vcode expand to multiple real instructions.) respectively.

The codegen for the simd_scatter intrinsic can be found at

sym::simd_scatter => {
intrinsic_args!(fx, args => (val, ptr, mask); intrinsic);
let (val_lane_count, _val_lane_ty) = val.layout().ty.simd_size_and_type(fx.tcx);
let (ptr_lane_count, _ptr_lane_ty) = ptr.layout().ty.simd_size_and_type(fx.tcx);
let (mask_lane_count, _mask_lane_ty) = mask.layout().ty.simd_size_and_type(fx.tcx);
assert_eq!(val_lane_count, ptr_lane_count);
assert_eq!(val_lane_count, mask_lane_count);
for lane_idx in 0..ptr_lane_count {
let val_lane = val.value_lane(fx, lane_idx).load_scalar(fx);
let ptr_lane = ptr.value_lane(fx, lane_idx).load_scalar(fx);
let mask_lane = mask.value_lane(fx, lane_idx).load_scalar(fx);
let if_enabled = fx.bcx.create_block();
let next = fx.bcx.create_block();
fx.bcx.ins().brif(mask_lane, if_enabled, &[], next, &[]);
fx.bcx.seal_block(if_enabled);
fx.bcx.switch_to_block(if_enabled);
fx.bcx.ins().store(MemFlags::trusted(), val_lane, ptr_lane, 0);
fx.bcx.ins().jump(next, &[]);
fx.bcx.seal_block(next);
fx.bcx.switch_to_block(next);
}
}

I can take a look at this too tomorrow.

@bjorn3 bjorn3 added C-bug Category: This is a bug. A-core-arch Area: Necessary for full core::arch support labels Dec 23, 2023
@pnevyk
Copy link
Author

pnevyk commented Dec 24, 2023

I made some progress in investigation. I reduced the reproducer to this:

#![feature(portable_simd)]

use std::simd::{Mask, Simd};

fn main() {
    let mut vec = [0; 1];
    let vals = Simd::from_array([1]);
    unsafe {
        vals.scatter_select_ptr(Simd::from_array([vec.as_mut_ptr()]), Mask::splat(true));
    }
    assert_eq!(vec, [1]);
}

Here is the clif for scatter_select_ptr:

Simd::scatter_select_ptr.unopt.clif
set opt_level=none
set tls_model=elf_gd
set libcall_call_conv=isa_default
set probestack_size_log2=12
set probestack_strategy=inline
set bb_padding_log2_minus_one=0
set regalloc_checker=0
set regalloc_verbose_logs=0
set enable_alias_analysis=1
set enable_verifier=0
set enable_pcc=0
set is_pic=1
set use_colocated_libcalls=0
set enable_float=1
set enable_nan_canonicalization=0
set enable_pinned_reg=0
set enable_atomics=1
set enable_safepoints=0
set enable_llvm_abi_extensions=1
set unwind_info=1
set preserve_frame_pointers=0
set machine_code_cfg_info=0
set enable_probestack=1
set probestack_func_adjusts_sp=0
set enable_jump_tables=1
set enable_heap_access_spectre_mitigation=1
set enable_table_access_spectre_mitigation=1
set enable_incremental_compilation_cache_checks=0
target x86_64 has_sse3=1 has_ssse3=1 has_sse41=1 has_sse42=1 has_avx=0 has_avx2=0 has_fma=0 has_avx512bitalg=0 has_avx512dq=0 has_avx512vl=0 has_avx512vbmi=0 has_avx512f=0 has_popcnt=1 has_bmi1=0 has_bmi2=0 has_lzcnt=0


function u0:21(i64, i64, i64) system_v {
; symbol _ZN4core9core_simd6vector17Simd$LT$T$C$_$GT$18scatter_select_ptr17h86829f5d5f23eba0E
; instance Instance { def: Item(DefId(2:21980 ~ core[2b98]::core_simd::vector::{impl#0}::scatter_select_ptr)), args: [i32, 1_usize] }
; abi FnAbi { args: [ArgAbi { layout: TyAndLayout { ty: std::simd::Simd<i32, 1>, layout: Layout { size: Size(4 bytes), align: AbiAndPrefAlign { abi: Align(4 bytes), pref: Align(4 bytes) }, abi: Vector { element: Initialized { value: Int(I32, true), valid_range: 0..=4294967295 }, count: 1 }, fields: Arbitrary { offsets: [Size(0 bytes)], memory_index: [0] }, largest_niche: None, variants: Single { index: 0 }, max_repr_align: None, unadjusted_abi_align: Align(4 bytes) } }, mode: Indirect { attrs: ArgAttributes { regular: NoAlias | NoCapture | NonNull | NoUndef, arg_ext: None, pointee_size: Size(4 bytes), pointee_align: Some(Align(4 bytes)) }, meta_attrs: None, on_stack: false } }, ArgAbi { layout: TyAndLayout { ty: std::simd::Simd<*mut i32, 1>, layout: Layout { size: Size(8 bytes), align: AbiAndPrefAlign { abi: Align(8 bytes), pref: Align(8 bytes) }, abi: Vector { element: Initialized { value: Pointer(AddressSpace(0)), valid_range: 0..=18446744073709551615 }, count: 1 }, fields: Arbitrary { offsets: [Size(0 bytes)], memory_index: [0] }, largest_niche: None, variants: Single { index: 0 }, max_repr_align: None, unadjusted_abi_align: Align(8 bytes) } }, mode: Indirect { attrs: ArgAttributes { regular: NoAlias | NoCapture | NonNull | NoUndef, arg_ext: None, pointee_size: Size(8 bytes), pointee_align: Some(Align(8 bytes)) }, meta_attrs: None, on_stack: false } }, ArgAbi { layout: TyAndLayout { ty: std::simd::Mask<isize, 1>, layout: Layout { size: Size(8 bytes), align: AbiAndPrefAlign { abi: Align(8 bytes), pref: Align(8 bytes) }, abi: Vector { element: Initialized { value: Int(I64, true), valid_range: 0..=18446744073709551615 }, count: 1 }, fields: Arbitrary { offsets: [Size(0 bytes)], memory_index: [0] }, largest_niche: None, variants: Single { index: 0 }, max_repr_align: None, unadjusted_abi_align: Align(8 bytes) } }, mode: Indirect { attrs: ArgAttributes { regular: NoAlias | NoCapture | NonNull | NoUndef, arg_ext: None, pointee_size: Size(8 bytes), pointee_align: Some(Align(8 bytes)) }, meta_attrs: None, on_stack: false } }], ret: ArgAbi { layout: TyAndLayout { ty: (), layout: Layout { size: Size(0 bytes), align: AbiAndPrefAlign { abi: Align(1 bytes), pref: Align(8 bytes) }, abi: Aggregate { sized: true }, fields: Arbitrary { offsets: [], memory_index: [] }, largest_niche: None, variants: Single { index: 0 }, max_repr_align: None, unadjusted_abi_align: Align(1 bytes) } }, mode: Ignore }, c_variadic: false, fixed_count: 3, conv: Rust, can_unwind: true }

; kind  loc.idx   param    pass mode                            ty
; zst   _0    ()                                0b 1, 8              align=8,offset=
; ret   _0      -          Ignore                               ()
; arg   _1      = v0       Indirect { attrs: ArgAttributes { regular: NoAlias | NoCapture | NonNull | NoUndef, arg_ext: None, pointee_size: Size(4 bytes), pointee_align: Some(Align(4 bytes)) }, meta_attrs: None, on_stack: false } std::simd::Simd<i32, 1_usize>
; arg   _2      = v1       Indirect { attrs: ArgAttributes { regular: NoAlias | NoCapture | NonNull | NoUndef, arg_ext: None, pointee_size: Size(8 bytes), pointee_align: Some(Align(8 bytes)) }, meta_attrs: None, on_stack: false } std::simd::Simd<*mut i32, 1_usize>
; arg   _3      = v2       Indirect { attrs: ArgAttributes { regular: NoAlias | NoCapture | NonNull | NoUndef, arg_ext: None, pointee_size: Size(8 bytes), pointee_align: Some(Align(8 bytes)) }, meta_attrs: None, on_stack: false } std::simd::Mask<isize, 1_usize>

; kind  local ty                              size align (abi,pref)
; reuse _1    std::simd::Simd<i32, 1_usize>     4b 4, 4              storage=v0
; reuse _2    std::simd::Simd<*mut i32, 1_usize>    8b 8, 8              storage=v1
; reuse _3    std::simd::Mask<isize, 1_usize>    8b 8, 8              storage=v2
; stack _4    std::simd::Simd<isize, 1_usize>    8b 8, 8              storage=ss0
; stack _5    core::core_simd::masks::mask_impl::Mask<isize, 1_usize>    8b 8, 8              storage=ss1

    ss0 = explicit_slot 16
    ss1 = explicit_slot 16

block0(v0: i64, v1: i64, v2: i64):
    nop
    jump block1

block1:
    nop
; _5 = (_3.0: core::core_simd::masks::mask_impl::Mask<isize, N>)
; write_cvalue: Addr(Pointer { base: Stack(ss1), offset: Offset32(0) }, None): core::core_simd::masks::mask_impl::Mask<isize, 1_usize> <- ByRef(Pointer { base: Addr(v2), offset: Offset32(0) }, None): core::core_simd::masks::mask_impl::Mask<isize, 1_usize>
    v3 = stack_addr.i64 ss1
    v4 = load.i64 notrap aligned v2
    store notrap aligned v4, v3
; _4 = (_5.0: std::simd::Simd<isize, N>)
; write_cvalue: Addr(Pointer { base: Stack(ss0), offset: Offset32(0) }, None): std::simd::Simd<isize, 1_usize> <- ByRef(Pointer { base: Stack(ss1), offset: Offset32(0) }, None): std::simd::Simd<isize, 1_usize>
    v5 = stack_addr.i64 ss1
    v6 = stack_addr.i64 ss0
    v7 = load.i64 notrap aligned v5
    store notrap aligned v7, v6
;
; _0 = core::core_simd::intrinsics::simd_scatter::<std::simd::Simd<T, N>, std::simd::Simd<*mut T, N>, std::simd::Simd<isize, N>>(move _1, move _2, move _4)
    v8 = stack_load.i64 ss0
    v9 = load.i64 notrap v1
    v10 = load.i32 notrap v0
    brif v10, block3, block4

block3:
    store.i64 notrap aligned v8, v9
    jump block4

block4:
    jump block2

block2:
    nop
;
; return
    return
}

If I didn't make a mistake in my clif analysis, the problem is that the generated code confuses enable mask and vals arguments (see my annotations):

; (pnevyk): vals, dest, enable
function u0:21(i64, i64, i64) system_v {

; ...

; stack _4    std::simd::Simd<isize, 1_usize>    8b 8, 8              storage=ss0
; stack _5    core::core_simd::masks::mask_impl::Mask<isize, 1_usize>    8b 8, 8              storage=ss1

    ss0 = explicit_slot 16
    ss1 = explicit_slot 16

block0(v0: i64, v1: i64, v2: i64):

; ...

; _0 = core::core_simd::intrinsics::simd_scatter::<std::simd::Simd<T, N>, std::simd::Simd<*mut T, N>, std::simd::Simd<isize, N>>(move _1, move _2, move _4)
; (pnevyk): v8 = enable[0]
    v8 = stack_load.i64 ss0
; (pnevyk): v9 = dest[0]
    v9 = load.i64 notrap v1
; (pnevyk): v10 = vals[0]
    v10 = load.i32 notrap v0
; (pnevyk): if vals[0] then jump block3 else jump block4
    brif v10, block3, block4

block3:
; (pnevyk): store enable[0] to dest[0]
    store.i64 notrap aligned v8, v9
    jump block4

This is in agreement with the behavior of storing -1 (i.e., true in Mask) instead of the value from vals. If I initialize vals with [0], then the brif is false and the array is untouched.

However, the implementation of the simd_scatter intrinsic looks correct to me.

For testing I am still using the version of cranelift distributed via rustup in version rustc 1.77.0-nightly (d6d7a9386 2023-12-22). I will try to build the cranelift backend on my machine and debug further.

@pnevyk
Copy link
Author

pnevyk commented Dec 24, 2023

rustc_codegen_cranelift built from source works correctly. Also the clif generated code is correct:

; _0 = core::core_simd::intrinsics::simd_scatter::<std::simd::Simd<T, N>, std::simd::Simd<*mut T, N>, std::simd::Simd<isize, N>>(move _1, move _2, move _4)
; (pnevyk): v8 = vals[0]
    v8 = load.i32 notrap v0
; (pnevyk): v9 = dest[0]
    v9 = load.i64 notrap v1
; (pnevyk): v10 = enable[0]
    v10 = stack_load.i64 ss0
    brif v10, block3, block4

block3:
; (pnevyk): store vals[0] to dest[0]
    store.i32 notrap aligned v8, v9
    jump block4

@bjorn3
Copy link
Member

bjorn3 commented Dec 24, 2023

8ab225d flipped these arguments, but in ace694c I believe I had to flip them back because of test failures [^1]. Both commits should be included in nightly-2023-12-22 due to a subtree sync on the 19th, but somehow only the first commit shows up in the git log of the rust repo.

The test failures
running 38 tests
i................................FFFFF
failures:

---- crates/core_simd/src/vector.rs - core_simd::vector::Simd<T,N>::scatter (line 496) stdout ----
Test executable failed (signal: 6 (SIGABRT) (core dumped)).

stderr:
thread 'main' panicked at crates/core_simd/src/vector.rs:10:1:
assertion `left == right` failed
  left: [-1, -1, 12, -1, -1, 15, 16, 17, 18]
 right: [124, 11, 12, 82, 14, 15, 16, 17, 18]
stack backtrace:
   0: rust_begin_unwind
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/core/src/panic.rs:106:9
   3: core::panicking::assert_failed
   4: rust_out::main::_doctest_main_crates_core_simd_src_vector_rs_496_0
   5: rust_out::main
   6: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


---- crates/core_simd/src/vector.rs - core_simd::vector::Simd<T,N>::scatter_ptr (line 603) stdout ----
Test executable failed (signal: 6 (SIGABRT) (core dumped)).

stderr:
thread 'main' panicked at crates/core_simd/src/vector.rs:13:1:
assertion `left == right` failed
  left: [-1, -1, -1, -1]
 right: [7, 5, 3, 6]
stack backtrace:
   0: rust_begin_unwind
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/core/src/panic.rs:106:9
   3: core::panicking::assert_failed
   4: rust_out::main::_doctest_main_crates_core_simd_src_vector_rs_603_0
   5: rust_out::main
   6: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


---- crates/core_simd/src/vector.rs - core_simd::vector::Simd<T,N>::scatter_select_ptr (line 630) stdout ----
Test executable failed (signal: 6 (SIGABRT) (core dumped)).

stderr:
thread 'main' panicked at crates/core_simd/src/vector.rs:14:1:
assertion `left == right` failed
  left: [0, 0, 0, -1]
 right: [0, 0, 3, 6]
stack backtrace:
   0: rust_begin_unwind
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/core/src/panic.rs:106:9
   3: core::panicking::assert_failed
   4: rust_out::main::_doctest_main_crates_core_simd_src_vector_rs_630_0
   5: rust_out::main
   6: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


---- crates/core_simd/src/vector.rs - core_simd::vector::Simd<T,N>::scatter_select (line 518) stdout ----
Test executable failed (signal: 6 (SIGABRT) (core dumped)).

stderr:
thread 'main' panicked at crates/core_simd/src/vector.rs:15:1:
assertion `left == right` failed
  left: [0, 0, 12, -1, -1, 15, 16, 17, 18]
 right: [-41, 11, 12, 82, 14, 15, 16, 17, 18]
stack backtrace:
   0: rust_begin_unwind
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/core/src/panic.rs:106:9
   3: core::panicking::assert_failed
   4: rust_out::main::_doctest_main_crates_core_simd_src_vector_rs_518_0
   5: rust_out::main
   6: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


---- crates/core_simd/src/vector.rs - core_simd::vector::Simd<T,N>::scatter_select_unchecked (line 549) stdout ----
Test executable failed (signal: 6 (SIGABRT) (core dumped)).

stderr:
thread 'main' panicked at crates/core_simd/src/vector.rs:19:1:
assertion `left == right` failed
  left: [0, 0, 12, -1, -1, 15, 16, 17, 18]
 right: [-41, 11, 12, 82, 14, 15, 16, 17, 18]
stack backtrace:
   0: rust_begin_unwind
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
             at /home/gh-bjorn3/cg_clif/build/stdlib/library/core/src/panic.rs:106:9
   3: core::panicking::assert_failed
   4: rust_out::main::_doctest_main_crates_core_simd_src_vector_rs_549_0
   5: rust_out::main
   6: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.



failures:
    crates/core_simd/src/vector.rs - core_simd::vector::Simd<T,N>::scatter (line 496)
    crates/core_simd/src/vector.rs - core_simd::vector::Simd<T,N>::scatter_ptr (line 603)
    crates/core_simd/src/vector.rs - core_simd::vector::Simd<T,N>::scatter_select (line 518)
    crates/core_simd/src/vector.rs - core_simd::vector::Simd<T,N>::scatter_select_ptr (line 630)
    crates/core_simd/src/vector.rs - core_simd::vector::Simd<T,N>::scatter_select_unchecked (line 549)

test result: FAILED. 32 passed; 5 failed; 1 ignored; 0 measured; 3 filtered out; finished in 1.58s

error: doctest failed, to rerun pass `-p core_simd --doc`

@bjorn3
Copy link
Member

bjorn3 commented Dec 24, 2023

but somehow only the first commit shows up in the git log of the rust repo.

This may have the same root cause as #1385.

@bjorn3
Copy link
Member

bjorn3 commented Dec 24, 2023

rust-lang/rust#119278 should fix this.

@bjorn3 bjorn3 linked a pull request Dec 24, 2023 that will close this issue
@bjorn3
Copy link
Member

bjorn3 commented Dec 24, 2023

Should be fixed in the next nightly.

@bjorn3
Copy link
Member

bjorn3 commented Dec 25, 2023

@pnevyk Can you confirm that the issue is fixed with the latest nightly?

@pnevyk
Copy link
Author

pnevyk commented Dec 26, 2023

Can you confirm that the issue is fixed with the latest nightly?

Yes, it works 🎉

@pnevyk pnevyk closed this as completed Dec 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core-arch Area: Necessary for full core::arch support C-bug Category: This is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants