
[perf experiment] A MIR pass dedicated to optimizing common iterators #136745



Closed
wants to merge 1 commit into from


FractalFir
Contributor

This PR is a perf experiment and is not meant to be accepted. I am creating it to request a perf run.

Motivation

Currently, many commonly used iterators don't get inlined during MIR optimization, which increases the amount of LLVM IR that gets generated.

Since those iterators are also generic, they are unlikely to be inlined anytime soon.
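To make the motivation concrete: a `for` loop over a slice is lowered into repeated calls to `Iterator::next` on a `std::slice::Iter`. The sketch below writes that expansion out by hand. It is a simplified approximation of the `for` desugaring described in the Rust Reference, not the exact form rustc produces, and the function names are illustrative only. If that `next` call is not inlined during MIR optimization, as this PR reports, each such loop reaches LLVM as a call into a separate monomorphized `next`.

```rust
/// The ordinary `for` loop.
pub fn sum_for(slice: &[u32]) -> u32 {
    let mut total = 0;
    for x in slice {
        total += *x;
    }
    total
}

/// Roughly what the loop above expands to (simplified).
pub fn sum_desugared(slice: &[u32]) -> u32 {
    let mut total = 0;
    // For `&[u32]`, `IntoIterator::into_iter` returns a `std::slice::Iter<'_, u32>`.
    let mut iter: std::slice::Iter<'_, u32> = IntoIterator::into_iter(slice);
    loop {
        // Every iteration goes through this `next` call; whether it gets
        // inlined before codegen is what this PR is about.
        match Iterator::next(&mut iter) {
            Some(x) => total += *x,
            None => break,
        }
    }
    total
}
```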

Optimizing slice iterators

This PR adds an experimental pass which replaces two commonly used iterators (`std::slice::Iter` and `std::slice::IterMut`) with inline implementations.

Should this pass show potential for performance gains, I will work on an improved version which also handles other common iterators from `core` (e.g. `Range`, `Enumerate`).

A proper implementation will require other, bigger changes (e.g. *maybe* marking certain iterators as lang items for quicker lookup).

Because of that, I am asking for a perf run, to see if that effort will be worth it.

@rustbot
Collaborator

rustbot commented Feb 8, 2025

r? @fmease

rustbot has assigned @fmease.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 8, 2025
@rustbot
Collaborator

rustbot commented Feb 8, 2025

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@FractalFir
Contributor Author

@bors try @rust-timer queue

@rust-timer


@bors
Collaborator

bors commented Feb 8, 2025

@FractalFir: 🔑 Insufficient privileges: not in try users

@matthiaskrgr
Member

@bors try @rust-timer queue

@rust-timer


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 8, 2025
@bors
Collaborator

bors commented Feb 8, 2025

⌛ Trying commit 1d4d571 with merge 37e77e9...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 8, 2025
ty::Array(_, _) => false,
ty::Never | ty::FnDef(..) => false,
ty::Adt(def, args) => match def.adt_kind() {
AdtKind::Enum => def.variants().len() > 1,
Member


Hehehe, very familiar function... I have made this mistake too: this will blow up for enums with uninhabited variants.

If it helps you can steal from here: 2c19fc6#diff-d5c6f94594e21dda2a9e3717a8b245c481fd461d8eece9c7bc960693cb0c368aR795
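To illustrate the reviewer's point: the quoted `AdtKind::Enum => def.variants().len() > 1` arm appears to treat any enum with more than one variant as non-zero-sized, but an enum whose extra variants are uninhabited can still be a ZST. The sketch checks this with `std::mem::size_of`. Rust does not guarantee these layouts, so the zero sizes are an observation about current rustc rather than a language rule. `OneLiveVariant` and `naive_not_zst` are hypothetical names, not code from the PR.

```rust
use std::convert::Infallible;
use std::mem::size_of;

/// Two variants, but `Never` can never be constructed because
/// `Infallible` has no values. (Hypothetical example type.)
#[allow(dead_code)]
pub enum OneLiveVariant {
    Live,
    Never(Infallible),
}

/// Mirrors the quoted heuristic: "more than one variant => not a ZST".
/// (Hypothetical helper, for illustration only.)
pub fn naive_not_zst(variant_count: usize) -> bool {
    variant_count > 1
}

/// True when the heuristic claims "not a ZST" but the actual layout is zero-sized.
pub fn heuristic_is_wrong_for_one_live_variant() -> bool {
    naive_not_zst(2) && size_of::<OneLiveVariant>() == 0
}
```

The uninhabited variant is dropped from the layout, leaving a single field-less variant, which is why the size comes out as zero.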

@rust-log-analyzer
Collaborator

The job x86_64-gnu-llvm-18 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
#22 exporting to docker image format
#22 sending tarball 27.1s done
#22 DONE 33.1s
##[endgroup]
Setting extra environment values for docker:  --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-18]
debug: `DISABLE_CI_RUSTC_IF_INCOMPATIBLE` configured.
---
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure: 
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-18', '--enable-llvm-link-shared', '--set', 'rust.randomize-layout=true', '--set', 'rust.thin-lto-import-instr-limit=10', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-18/bin/llvm-config
configure: llvm.link-shared     := True
configure: rust.randomize-layout := True
configure: rust.thin-lto-import-instr-limit := 10
---
failures:

---- [mir-opt] tests/mir-opt/slice_iter.rs stdout ----

thread '[mir-opt] tests/mir-opt/slice_iter.rs' panicked at src/tools/compiletest/src/runtest/mir_opt.rs:72:21:
Output file `/checkout/obj/build/x86_64-unknown-linux-gnu/test/mir-opt/slice_iter/slice_iter.built.after.mir` from test does not exist, available files are in `/checkout/obj/build/x86_64-unknown-linux-gnu/test/mir-opt/slice_iter`


failures:
    [mir-opt] tests/mir-opt/slice_iter.rs

@bors
Collaborator

bors commented Feb 8, 2025

☀️ Try build successful - checks-actions
Build commit: 37e77e9 (37e77e9a3d5e8a5385dd4735f806ccacda215f14)

@rust-timer


@rust-timer
Collaborator

Finished benchmarking commit (37e77e9): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

|  | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.7% | [3.7%, 3.7%] | 1 |
| Regressions ❌ (secondary) | 5.0% | [0.6%, 9.3%] | 2 |
| Improvements ✅ (primary) | -0.5% | [-2.4%, -0.2%] | 188 |
| Improvements ✅ (secondary) | -0.5% | [-1.1%, -0.1%] | 72 |
| All ❌✅ (primary) | -0.5% | [-2.4%, 3.7%] | 189 |
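As a sanity check on how the rows relate: if the "All" row is the mean over every individual benchmark result (an assumption about how the bot aggregates, not something the comment states), it equals the count-weighted average of the group means. Because the group means are rounded to one decimal, this reconstruction is only approximate. `combined_mean` is a hypothetical helper.

```rust
/// Combine per-group means into an overall mean, weighting each group by its
/// number of benchmarks. `groups` holds `(mean_percent, count)` pairs.
/// Returns `None` when there are no results at all.
pub fn combined_mean(groups: &[(f64, usize)]) -> Option<f64> {
    let total: usize = groups.iter().map(|&(_, n)| n).sum();
    if total == 0 {
        return None;
    }
    let weighted_sum: f64 = groups.iter().map(|&(mean, n)| mean * n as f64).sum();
    Some(weighted_sum / total as f64)
}
```

For the primary instruction-count rows this gives (3.7 × 1 + (-0.5) × 188) / 189 ≈ -0.48, consistent with the reported -0.5%; the primary cycles rows give (3.5 × 1 + (-0.8) × 2) / 3 ≈ 0.63, consistent with 0.6%.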

Max RSS (memory usage)

Results (primary -2.9%, secondary 0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.6% | [2.6%, 2.6%] | 1 |
| Improvements ✅ (primary) | -2.9% | [-2.9%, -2.9%] | 1 |
| Improvements ✅ (secondary) | -2.3% | [-2.3%, -2.3%] | 1 |
| All ❌✅ (primary) | -2.9% | [-2.9%, -2.9%] | 1 |

Cycles

Results (primary 0.6%, secondary -0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.5% | [3.5%, 3.5%] | 1 |
| Regressions ❌ (secondary) | 7.1% | [4.6%, 9.6%] | 2 |
| Improvements ✅ (primary) | -0.8% | [-1.0%, -0.6%] | 2 |
| Improvements ✅ (secondary) | -2.4% | [-2.9%, -1.3%] | 7 |
| All ❌✅ (primary) | 0.6% | [-1.0%, 3.5%] | 3 |

Binary size

Results (primary -0.2%, secondary -0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.1%] | 19 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-1.0%, -0.0%] | 30 |
| Improvements ✅ (secondary) | -0.3% | [-0.4%, -0.1%] | 75 |
| All ❌✅ (primary) | -0.2% | [-1.0%, 0.1%] | 49 |

Bootstrap: 780.482s -> 778.114s (-0.30%)
Artifact size: 329.09 MiB -> 329.14 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 8, 2025
Comment on lines +43 to +44
for bid in (0..(bbs.len())).into_iter().map(BasicBlock::from_usize) {
let mut bb = &bbs[bid];
Member


@@ -0,0 +1,5 @@
#[no_mangle]
// EMIT_MIR slice_iter.built.after.mir
Member


I think you forgot to bless this to show the new MIR? (Or forgot to add it to the commit?)

Comment on lines +300 to +305
// 2. Check that the `func` of the call is known.
let func = func.constant()?;
// 3. Check that the `func` is FnDef
let ty::FnDef(defid, generic_args) = func.ty().kind() else {
return None;
};
Member


true
}
}
fn not_zst<'tcx>(t: Ty<'tcx>, tcx: TyCtxt<'tcx>) -> bool {
Member


You really don't want to ever remake layout. If you depend on a layout question, use a query that calls layout. (If it's too generic to get an answer, and doesn't optimize, that's ok. At most do something like look through the fields for anything with known non-ZST layout, IMHO.)
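A toy model of the conservative fallback suggested here: answer "definitely not a ZST" only when some field is known to be non-zero-sized, and otherwise answer "ZST or unknown" rather than recomputing layout. `ToyTy` and `known_non_zst` are hypothetical stand-ins, not rustc's `Ty` API; a real version would ask the layout query or inspect actual field types. Enums are deliberately treated as unknown, since variant count alone does not rule out a zero-sized layout.

```rust
/// A hypothetical, heavily simplified type model (not rustc's `Ty`).
pub enum ToyTy {
    U8,
    U32,
    Unit,
    /// A type parameter whose layout is not known yet.
    Generic,
    /// Structs and tuples: their fields never overlap.
    Tuple(Vec<ToyTy>),
    /// `[elem; len]`
    Array(Box<ToyTy>, usize),
    /// Each inner `Vec` holds one variant's fields.
    Enum(Vec<Vec<ToyTy>>),
}

/// Returns `true` only when the type is *known* not to be zero-sized.
/// `false` means "ZST or unknown", so a caller must skip the optimization.
pub fn known_non_zst(ty: &ToyTy) -> bool {
    match ty {
        ToyTy::U8 | ToyTy::U32 => true,
        ToyTy::Unit | ToyTy::Generic => false,
        // Fields don't overlap, so one non-zero-sized field makes the whole type non-zero-sized.
        ToyTy::Tuple(fields) => fields.iter().any(known_non_zst),
        ToyTy::Array(elem, len) => *len > 0 && known_non_zst(elem),
        // Counting variants proves nothing: uninhabited variants can vanish from the layout.
        ToyTy::Enum(_) => false,
    }
}
```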

let rejoin = Terminator { kind: TerminatorKind::Goto { target }, source_info };
let mut some_block = BasicBlockData::new(Some(rejoin.clone()), false);
let mut none_block = BasicBlockData::new(Some(rejoin), false);
// Create the None value
Member


Reminder that Option, OptionNone, and OptionSome are all lang items already, if you need them: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/lang_items/enum.LangItem.html#variant.Option

Member

@scottmcm scottmcm left a comment


> A proper implementation will require other, bigger changes

I think I'm overall skeptical of this current approach. This goes in the direction of having way more stuff done in mir-opt code instead of in the library, which makes it harder to review and harder for people to contribute to. (I really don't want to have to debug unsoundness caused by the Rust implementation and the mir-opt implementation subtly diverging.)

Is there perhaps a way this could be done by leveraging normal Rust code more, with some passes that pick up particular patterns instead?

Spitballing:

  • How much of the value here comes from a dedicated ZST check that's known more often than full layout information? Would a dedicated `is_zst` intrinsic to expose that be worth it? Or a `NullOp` we could introduce in place of `Eq(SizeOf, 0)`?
  • What if this were specialization on a compiler-builtin `IsZst` trait? That ought to have the same kind of "you have the implementation you need once you know the generic specifically enough" property that this is doing...
  • Could this be done by "normal" inlining via some tweaks or new attributes of some kind? What if `next` were implemented as a const check that calls one of two different inherent methods, and `next` were marked with some new `#[rustc_no_mir_inline_into_this]` attribute, so that the const `if` would almost certainly get inlined into the caller, giving it a better chance to get folded away, after which the one remaining sub-call could be inlined?
  • What if the current bonus for `if const { ... }` were higher? Especially if it's the first terminator? Or if the cost checker were smarter about not counting both paths, since they can't both be taken?

TerminatorKind::SwitchInt { discr, targets } => {
if discr.constant().is_some() {
// Not only will this become a `Goto`, but likely other
// things will be removable as unreachable.
self.bonus += CONST_SWITCH_BONUS;
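A sketch of the "const check that calls two different inherent methods" idea from the third bullet, written as ordinary library Rust. `ToyIter`, `next_zst`, and `next_non_zst` are hypothetical names; this is a simplified model loosely in the spirit of how `core`'s slice iterators special-case ZSTs, not their actual code, and the proposed `#[rustc_no_mir_inline_into_this]` attribute is not used because it does not exist. The `if const { ... }` condition (inline `const` blocks, stable since Rust 1.79) becomes a constant once `T` is known, so the untaken branch is dead code in each monomorphization.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical simplified slice iterator (not `core::slice::Iter`).
/// - non-ZST `T`: `end_or_len` is the one-past-the-end pointer.
/// - ZST `T`: `end_or_len` stores the number of remaining items as an address.
pub struct ToyIter<'a, T> {
    ptr: NonNull<T>,
    end_or_len: *const T,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> ToyIter<'a, T> {
    pub fn new(slice: &'a [T]) -> Self {
        let end_or_len = if const { size_of::<T>() == 0 } {
            slice.len() as *const T
        } else {
            slice.as_ptr_range().end
        };
        ToyIter { ptr: NonNull::from(slice).cast::<T>(), end_or_len, _marker: PhantomData }
    }

    #[inline]
    fn next_zst(&mut self) -> Option<&'a T> {
        let remaining = self.end_or_len as usize;
        if remaining == 0 {
            return None;
        }
        self.end_or_len = (remaining - 1) as *const T;
        // SAFETY: `T` is zero-sized and `ptr` is the slice's non-null,
        // aligned base pointer, so a reference through it is valid.
        Some(unsafe { self.ptr.as_ref() })
    }

    #[inline]
    fn next_non_zst(&mut self) -> Option<&'a T> {
        if self.ptr.as_ptr() as *const T == self.end_or_len {
            return None;
        }
        // SAFETY: `ptr` is before the end, so it points at a live element of
        // the borrowed slice, and `ptr + 1` is at most one past the end.
        unsafe {
            let item = self.ptr.as_ref();
            self.ptr = NonNull::new_unchecked(self.ptr.as_ptr().add(1));
            Some(item)
        }
    }
}

impl<'a, T> Iterator for ToyIter<'a, T> {
    type Item = &'a T;

    #[inline]
    fn next(&mut self) -> Option<&'a T> {
        // Constant for each concrete `T`, so only one call survives after folding.
        if const { size_of::<T>() == 0 } {
            self.next_zst()
        } else {
            self.next_non_zst()
        }
    }
}
```

Keeping both strategies as separate inherent methods keeps `next` itself tiny, which is the property the bullet hopes would let the inliner's constant-switch bonus pay off.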

@scottmcm
Member

scottmcm commented Feb 9, 2025

Looking at the perf results here, the thing I'm most surprised by is that `check` builds improved so much on icount: https://perf.rust-lang.org/compare.html?start=8ad2c9724d983cfb116baab0bb800edd17f31644&end=37e77e9a3d5e8a5385dd4735f806ccacda215f14&stat=instructions%3Au&debug=false&opt=false&doc=false

Is check actually generating runtime MIR? I'd have thought it would only need analysis MIR, and thus this PR wouldn't affect check perf, since I wouldn't expect it to change the actual post-optimization machine code of the compiler (especially after PGO and BOLT).

@FractalFir
Contributor Author

Closing this in favour of #136771, which seems to achieve the same goal without the need for a MIR pass (and also seems to be a bit faster).

@FractalFir FractalFir closed this Feb 9, 2025
@fmease fmease removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 11, 2025
Labels
perf-regression Performance regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

9 participants