remove reserve_for_push #104668
Conversation
r? @thomcc (rustbot has picked a reviewer for you, use `r?` to override)
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any … Examples of …
It was introduced ~a year ago (#91352), and at the time it was a compile-time win. Let's see what happens now :)

@bors try
@rust-timer queue
⌛ Trying commit 48cb20e with merge 5c3b88615f267d31914a465c4f0efe1b142a7467...
Thanks! If this makes compile times slower (I guess due to inlining), then I have an alternative idea: use …
Well, runtime benchmarks are still WIP, so we can only measure compile-time performance + runtime performance of …
A simple benchmark with criterion 0.4, pushing 1<<18 numbers into the vec:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

pub fn push(c: &mut Criterion) {
    c.bench_function("push", |b| {
        b.iter(|| {
            let mut v = Vec::new();
            for i in black_box(1..1 << 18) {
                v.push(i);
            }
            black_box(v);
        })
    });
}

criterion_group!(benches, push);
criterion_main!(benches);
```

In case criterion didn't like my changes, I also made a much simpler benchmark that got -50% improvements:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

#[test]
fn foo_bench() {
    let mut durs = vec![Duration::ZERO; 10000];
    let mut start = Instant::now();
    for dur in &mut durs {
        // `foo` is the function being timed; its body was not shown
        // in the original comment.
        black_box(foo());
        let next = Instant::now();
        *dur = next - start;
        start = next;
    }
    dbg!(durs.into_iter().sum::<Duration>() / 10000);
}
```

I couldn't believe these numbers, so I rebuilt stage2 from master and got the same results as nightly. I'm not sure if there's a fault with my benchmark methodology here. This is running on a MacBook Pro 13" M1.
A theory could be that the improvements had a compounding effect within criterion, since they must internally use …

Another theory could be that my new code just doesn't work correctly, is UB, and gets optimised away (this seems extremely unlikely though).
It doesn't have to be optimized away just because it's UB. Now that the code isn't hidden behind …
☀️ Try build successful - checks-actions
Ok, I think the inlining was quite aggressive. I moved the vec push loop from the benchmark `iter` into a library function, and called that instead. This code shows a ~5-15% reduction in time. The asm of the library function isn't different from before, apart from calling … The gains seem to be from …
(It is generally preferable to paste text, not images.) Some of these benchmarks look encouraging, but I am suspicious because this is all microbenchmarking. In a microbenchmark the optimal inlining heuristic is often "yes", but in a real program, whether or not cold paths are inlined often has a cascading effect on later inlining decisions. So in addition to these benchmarks and whatever rustc-perf says, I'd want to know: what is the code size …
Finished benchmarking commit (5c3b88615f267d31914a465c4f0efe1b142a7467): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count: this is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Alloc has a bunch of built-in benches; you can run them with …

Edit: ah, nevermind, I just saw the perf.rlo results.
I couldn't find many that do a Vec::push underneath. Running the benchmarks against master: …

branch: …

This looks like a regression to me. I don't understand where it comes from though, as the outputs in the String::push asm look identical to me, and I don't understand why …
As the linked guide says, you'd have to make sure it's not noise. But even if it isn't, it's not an improvement, the perf.rlo benchmarks don't look good either, and binary sizes have increased too.
Yeah, in its current state I don't see evidence that this helps.

@rustbot author
Based on a quick investigation around extend_one, I don't believe the reserve_for_push abstraction is beneficial. Quick benchmarks locally show a small improvement; would be interested in a try build too.