Skip to content

Vec::resize for bytes should be a single memset #120050

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 34 additions & 22 deletions library/alloc/src/vec/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,12 @@ use self::spec_extend::SpecExtend;
#[cfg(not(no_global_oom_handling))]
mod spec_extend;

#[cfg(not(no_global_oom_handling))]
use self::spec_extend_elem::SpecExtendElem;

#[cfg(not(no_global_oom_handling))]
mod spec_extend_elem;

/// A contiguous growable array type, written as `Vec<T>`, short for 'vector'.
///
/// # Examples
Expand Down Expand Up @@ -2871,7 +2877,7 @@ impl<T: Clone, A: Allocator> Vec<T, A> {
let len = self.len();

if new_len > len {
self.extend_with(new_len - len, value)
self.extend_elem(new_len - len, value)
} else {
self.truncate(new_len);
}
Expand Down Expand Up @@ -2985,32 +2991,38 @@ impl<T, A: Allocator, const N: usize> Vec<[T; N], A> {

impl<T: Clone, A: Allocator> Vec<T, A> {
#[cfg(not(no_global_oom_handling))]
/// Extend the vector by `n` clones of value.
fn extend_with(&mut self, n: usize, value: T) {
#[inline]
/// Extend the vector by `n` clones of `value`.
///
/// Uses specialization to dispatch to `extend_elem_copy` if possible.
fn extend_elem(&mut self, n: usize, value: T) {
self.spec_extend_elem(n, value);
}

#[cfg(not(no_global_oom_handling))]
#[inline]
/// Extend the vector by `n` *copies* of `value`.
fn extend_elem_copy(&mut self, n: usize, value: T)
where
T: Copy,
{
self.reserve(n);

// SAFETY: We now have space for all the elements, so the pointer math
// is all in-bounds. Because `T: Copy`, there's no user code being run
// here (no `clone` nor any `drop` in the assignment).
unsafe {
let mut ptr = self.as_mut_ptr().add(self.len());
// Use SetLenOnDrop to work around bug where compiler
// might not realize the store through `ptr` through self.set_len()
// don't alias.
let mut local_len = SetLenOnDrop::new(&mut self.len);

// Write all elements except the last one
for _ in 1..n {
ptr::write(ptr, value.clone());
ptr = ptr.add(1);
// Increment the length in every step in case clone() panics
local_len.increment_len(1);
}

if n > 0 {
// We can write the last element directly without cloning needlessly
ptr::write(ptr, value);
local_len.increment_len(1);
// Because there's no user code being run here, we can skip it for ZSTs.
// That helps tests in debug mode that do things like `vec![(); HUGE]`.
// See <https://github.com/rust-lang/rust/pull/118094>
if !T::IS_ZST {
Comment on lines +3015 to +3018
Copy link
Member Author

@scottmcm scottmcm Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @JarvisCraft in case you have thoughts on this approach, since I removed your specialization from #118094 in this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tagging me! I've commented below on what I am unsure about, though the optimization really looks promising.

let ptr = self.as_mut_ptr().add(self.len);
for i in 0..n {
*ptr.add(i) = value;
}
}

// len set by scope guard
self.len += n;
}
}
}
Expand Down
27 changes: 27 additions & 0 deletions library/alloc/src/vec/spec_extend_elem.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
use core::iter::repeat_n;

use super::Vec;
use crate::alloc::Allocator;

// Specialization trait used for Vec::extend_elem
pub(super) trait SpecExtendElem<T> {
fn spec_extend_elem(&mut self, n: usize, value: T);
}

impl<T, A: Allocator> SpecExtendElem<T> for Vec<T, A>
where
T: Clone,
{
default fn spec_extend_elem(&mut self, n: usize, value: T) {
self.extend_trusted(repeat_n(value, n))
}
}

impl<T, A: Allocator> SpecExtendElem<T> for Vec<T, A>
where
T: Copy,
{
fn spec_extend_elem(&mut self, n: usize, value: T) {
self.extend_elem_copy(n, value)
Copy link
Contributor

@JarvisCraft JarvisCraft Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this specialization is correct for all cases of vec![ZST; N].

At the moment:

  1. impl Copy for Foo permits <Foo as Clone>::clone to have side-effects.
  2. The docs of vec! explicitly mention that it works via Clone.

Which means that at the moment for vec![ZST; N] any effects of ZST::clone are observed N times as can be seen in the example. With this change, they won't since the specialization skips any calls to this method.

This is the reason why I've only implemented the specialization for () previously.

As mentioned in #118094 (comment), this kind of change is still allowed, although I expect that there must be an explicit proof that this is a valid optimization and an update to vec!'s doc is probably required to explicitly state that such optimization may occur.

An alternative may be to add a perma-unstable fn to Clone like is_trivially_cloneable() defaulting to false and only overridable by rustc on derive, but this of course is more tedious.

}
}
19 changes: 2 additions & 17 deletions library/alloc/src/vec/spec_from_elem.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ pub(super) trait SpecFromElem: Sized {
impl<T: Clone> SpecFromElem for T {
default fn from_elem<A: Allocator>(elem: Self, n: usize, alloc: A) -> Vec<Self, A> {
let mut v = Vec::with_capacity_in(n, alloc);
v.extend_with(n, elem);
v.extend_elem(n, elem);
v
}
}
Expand All @@ -24,7 +24,7 @@ impl<T: Clone + IsZero> SpecFromElem for T {
return Vec { buf: RawVec::with_capacity_zeroed_in(n, alloc), len: n };
}
let mut v = Vec::with_capacity_in(n, alloc);
v.extend_with(n, elem);
v.extend_elem(n, elem);
v
}
}
Expand Down Expand Up @@ -58,18 +58,3 @@ impl SpecFromElem for u8 {
v
}
}

// A better way would be to implement this for all ZSTs which are `Copy` and have trivial `Clone`
// but the latter cannot be detected currently
impl SpecFromElem for () {
#[inline]
fn from_elem<A: Allocator>(_elem: (), n: usize, alloc: A) -> Vec<(), A> {
let mut v = Vec::with_capacity_in(n, alloc);
// SAFETY: the capacity has just been set to `n`
// and `()` is a ZST with trivial `Clone` implementation
unsafe {
v.set_len(n);
}
v
}
}
23 changes: 23 additions & 0 deletions tests/codegen/vec-of-bytes-memset.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
//@ compile-flags: -O
//@ only-64bit

#![crate_type = "lib"]

// CHECK-LABEL: @resize_bytes_is_one_memset
#[no_mangle]
pub fn resize_bytes_is_one_memset(x: &mut Vec<u8>) {
// CHECK: call void @llvm.memset.p0.i64({{.+}}, i8 123, i64 456789, i1 false)
let new_len = x.len() + 456789;
x.resize(new_len, 123);
}

#[derive(Copy, Clone)]
struct ByteNewtype(i8);

// CHECK-LABEL: @from_elem_is_one_memset
#[no_mangle]
pub fn from_elem_is_one_memset() -> Vec<ByteNewtype> {
// CHECK: %[[P:.+]] = tail call{{.+}}@__rust_alloc(i64 noundef 123456, i64 noundef 1)
// CHECK: call void @llvm.memset.p0.i64({{.+}} %[[P]], i8 42, i64 123456, i1 false)
vec![ByteNewtype(42); 123456]
}
Loading