[perf experiment] A MIR pass dedicated to optimizing common iterators #136745

FractalFir · 2025-02-08T17:51:40Z

This PR is a perf experiment and is not meant to be accepted. I am creating it to request a perf run.

Motivation

Currently, many commonly used iterators don't get inlined during MIR optimization, which leads to an increased amount of LLVM IR.

Since those iterators are also generic, they are unlikely to be inlined anytime soon.

Optimizing slice iterators

This PR adds an experimental pass which replaces 2 commonly used iterators (std::slice::Iter and std::slice::IterMut) with inline implementations.

Should this pass show potential for performance gains, I will work on an improved version, which will also handle other common iterators from core (e.g. Range, Enumerate).

A proper implementation will require other, bigger changes (e.g. maybe marking certain iterators as lang items for quicker lookup).

Because of that, I am asking for a perf run, to see if that effort will be worth it.

rustbot · 2025-02-08T17:51:49Z

r? @fmease

rustbot has assigned @fmease.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot · 2025-02-08T17:51:51Z

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

FractalFir · 2025-02-08T17:52:04Z

@bors try @rust-timer queue

bors · 2025-02-08T17:52:07Z

@FractalFir: 🔑 Insufficient privileges: not in try users

matthiaskrgr · 2025-02-08T17:56:56Z

@bors try @rust-timer queue

bors · 2025-02-08T17:58:10Z

⌛ Trying commit 1d4d571 with merge 37e77e9...


the8472 · 2025-02-08T18:10:37Z

compiler/rustc_mir_transform/src/streamline_iter.rs

+        ty::Array(_, _) => false,
+        ty::Never | ty::FnDef(..) => false,
+        ty::Adt(def, args) => match def.adt_kind() {
+            AdtKind::Enum => def.variants().len() > 1,


Hehehe, very familiar function... I have made this mistake too: this will blow up for enums with uninhabited variants.

If it helps you can steal from here: 2c19fc6#diff-d5c6f94594e21dda2a9e3717a8b245c481fd461d8eece9c7bc960693cb0c368aR795
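To make the concern concrete, here is a minimal sketch using only the standard library. `naive_enum_is_not_zst` is a hypothetical, simplified stand-in for the variant-count heuristic in the diff, not the PR's actual function. `Result<(), Infallible>` has two variants, yet current rustc lays it out as zero-sized because the `Err` variant is uninhabited. That size is observed behavior rather than a documented language guarantee.

```rust
use std::convert::Infallible;
use std::mem::size_of;

/// Hypothetical, simplified stand-in for the heuristic in the diff:
/// "an enum with more than one variant is never zero-sized".
fn naive_enum_is_not_zst(variant_count: usize) -> bool {
    variant_count > 1
}

/// Ground truth, as reported by the compiler's own layout computation.
fn is_zst<T>() -> bool {
    size_of::<T>() == 0
}

/// Two variants, but `Err(Infallible)` can never be constructed.
type TwoVariantsOneUninhabited = Result<(), Infallible>;
```

Comparing `naive_enum_is_not_zst(2)` with `is_zst::<TwoVariantsOneUninhabited>()` shows the two answers disagreeing, which is the case the heuristic would get wrong.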

rust-log-analyzer · 2025-02-08T18:14:13Z

The job x86_64-gnu-llvm-18 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

#22 exporting to docker image format
#22 sending tarball 27.1s done
#22 DONE 33.1s
##[endgroup]
Setting extra environment values for docker:  --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-18]
debug: `DISABLE_CI_RUSTC_IF_INCOMPATIBLE` configured.
---
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure: 
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-18', '--enable-llvm-link-shared', '--set', 'rust.randomize-layout=true', '--set', 'rust.thin-lto-import-instr-limit=10', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-18/bin/llvm-config
configure: llvm.link-shared     := True
configure: rust.randomize-layout := True
configure: rust.thin-lto-import-instr-limit := 10
---
failures:

---- [mir-opt] tests/mir-opt/slice_iter.rs stdout ----

thread '[mir-opt] tests/mir-opt/slice_iter.rs' panicked at src/tools/compiletest/src/runtest/mir_opt.rs:72:21:
Output file `/checkout/obj/build/x86_64-unknown-linux-gnu/test/mir-opt/slice_iter/slice_iter.built.after.mir` from test does not exist, available files are in `/checkout/obj/build/x86_64-unknown-linux-gnu/test/mir-opt/slice_iter`


failures:
    [mir-opt] tests/mir-opt/slice_iter.rs

bors · 2025-02-08T19:47:15Z

☀️ Try build successful - checks-actions
Build commit: 37e77e9 (37e77e9a3d5e8a5385dd4735f806ccacda215f14)

rust-timer · 2025-02-08T21:29:45Z

Finished benchmarking commit (37e77e9): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.7% | [3.7%, 3.7%] | 1 |
| Regressions ❌ (secondary) | 5.0% | [0.6%, 9.3%] | 2 |
| Improvements ✅ (primary) | -0.5% | [-2.4%, -0.2%] | 188 |
| Improvements ✅ (secondary) | -0.5% | [-1.1%, -0.1%] | 72 |
| All ❌✅ (primary) | -0.5% | [-2.4%, 3.7%] | 189 |

Max RSS (memory usage)

Results (primary -2.9%, secondary 0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.6% | [2.6%, 2.6%] | 1 |
| Improvements ✅ (primary) | -2.9% | [-2.9%, -2.9%] | 1 |
| Improvements ✅ (secondary) | -2.3% | [-2.3%, -2.3%] | 1 |
| All ❌✅ (primary) | -2.9% | [-2.9%, -2.9%] | 1 |

Cycles

Results (primary 0.6%, secondary -0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.5% | [3.5%, 3.5%] | 1 |
| Regressions ❌ (secondary) | 7.1% | [4.6%, 9.6%] | 2 |
| Improvements ✅ (primary) | -0.8% | [-1.0%, -0.6%] | 2 |
| Improvements ✅ (secondary) | -2.4% | [-2.9%, -1.3%] | 7 |
| All ❌✅ (primary) | 0.6% | [-1.0%, 3.5%] | 3 |

Binary size

Results (primary -0.2%, secondary -0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.1%] | 19 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-1.0%, -0.0%] | 30 |
| Improvements ✅ (secondary) | -0.3% | [-0.4%, -0.1%] | 75 |
| All ❌✅ (primary) | -0.2% | [-1.0%, 0.1%] | 49 |

Bootstrap: 780.482s -> 778.114s (-0.30%)
Artifact size: 329.09 MiB -> 329.14 MiB (0.01%)

scottmcm · 2025-02-09T07:36:54Z

compiler/rustc_mir_transform/src/streamline_iter.rs

+        for bid in (0..(bbs.len())).into_iter().map(BasicBlock::from_usize) {
+            let mut bb = &bbs[bid];


https://doc.rust-lang.org/nightly/nightly-rustc/rustc_index/vec/struct.IndexVec.html#method.into_iter_enumerated if possible (if not, try https://doc.rust-lang.org/nightly/nightly-rustc/rustc_index/vec/struct.IndexVec.html#method.indices).
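The shape of this suggestion can be sketched outside the compiler with plain slices. `BlockId` and `enumerate_blocks` below are hypothetical names for illustration and are not the `rustc_index` API. The idea is to let the iterator hand out each typed index together with its element, instead of building indices from a `0..len` range and indexing back into the container.

```rust
/// Hypothetical typed index, standing in for something like `BasicBlock`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BlockId(usize);

/// Yields `(index, element)` pairs, so callers never construct an index
/// by hand and never index back into the slice.
fn enumerate_blocks<'a, T>(blocks: &'a [T]) -> impl Iterator<Item = (BlockId, &'a T)> {
    blocks
        .iter()
        .enumerate()
        .map(|(i, block)| (BlockId(i), block))
}
```

A loop then reads `for (bid, bb) in enumerate_blocks(&blocks) { ... }`, which keeps the index and the element in sync by construction.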

scottmcm · 2025-02-09T07:37:45Z

tests/mir-opt/slice_iter.rs

@@ -0,0 +1,5 @@
+#[no_mangle]
+// EMIT_MIR slice_iter.built.after.mir


I think you forgot to bless this to show the new MIR? (Or forgot to add it to the commit?)

scottmcm · 2025-02-09T07:39:57Z

compiler/rustc_mir_transform/src/streamline_iter.rs

+    // 2. Check that the `func` of the call is known.
+    let func = func.constant()?;
+    // 3. Check that the `func` is FnDef
+    let ty::FnDef(defid, generic_args) = func.ty().kind() else {
+        return None;
+    };


nit: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.Operand.html#method.const_fn_def

scottmcm · 2025-02-09T07:47:12Z

compiler/rustc_mir_transform/src/streamline_iter.rs

+        true
+    }
+}
+fn not_zst<'tcx>(t: Ty<'tcx>, tcx: TyCtxt<'tcx>) -> bool {


You really don't want to ever remake layout. If you depend on a layout question, use a query that calls layout. (If it's too generic to get an answer, and doesn't optimize, that's ok. At most do something like look through the fields for anything with known non-ZST layout, IMHO.)

scottmcm · 2025-02-09T07:49:37Z

compiler/rustc_mir_transform/src/streamline_iter.rs

+            let rejoin = Terminator { kind: TerminatorKind::Goto { target }, source_info };
+            let mut some_block = BasicBlockData::new(Some(rejoin.clone()), false);
+            let mut none_block = BasicBlockData::new(Some(rejoin), false);
+            // Create the None value


Reminder that Option, OptionNone, and OptionSome are all lang items already, if you need them: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/lang_items/enum.LangItem.html#variant.Option

scottmcm

> A proper implementation will require other, bigger changes

I think I'm overall skeptical of this current approach. This goes in the direction of having way more stuff done in mir-opt code instead of the library, which makes it harder to review and harder for people to contribute to. (I really don't want to even debug unsoundness from the rust code implementation and the mir-opt implementation diverging subtly.)

Is there perhaps a way that this could be done more leveraging normal rust code, with some passes that can pick up particular patterns instead?

Spitballing:

- How much of the value here is from a dedicated ZST check that's known more often than full layout information? Would it be worth a dedicated is_zst intrinsic to expose that? Or a NullOp we could introduce in place of Eq(SizeOf, 0)?
- What if this was specialization on a compiler-builtin IsZst trait? That ought to have the same kind of "you have the implementation you need once you know the generic specifically enough" that this is doing...
- Could this be done by "normal" inlining via some tweaks or new attributes of some kind? What if next was implemented as the const check which calls two different inherent methods, and next was marked with some new #[rustc_no_mir_inline_into_this] attribute so the const if would get inlined into the caller almost certainly, giving it a better chance to get folded away and then inline the one sub-call?
- What if the current bonus for if const { ... } were higher? Especially if it's the first terminator? Or if the cost checker was smarter about not counting both paths since they're not both possible?

rust/compiler/rustc_mir_transform/src/cost_checker.rs

Lines 145 to 149 in 43ca9d1

    
    TerminatorKind::SwitchInt { discr, targets } => {
        if discr.constant().is_some() {
            // Not only will this become a `Goto`, but likely other
            // things will be removable as unreachable.
            self.bonus += CONST_SWITCH_BONUS;
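As a rough illustration of the "const check which calls two different inherent methods" idea from the list above, here is a minimal sketch in safe Rust. `SketchIter`, `IS_ZST`, `next_zst`, and `next_sized` are hypothetical names, and this is not how `core::slice::Iter` is implemented. The `#[rustc_no_mir_inline_into_this]` attribute floated above does not exist and is not used here. An associated constant plays the role of the `if const { ... }` condition.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a slice iterator; not the real `core::slice::Iter`.
struct SketchIter<'a, T> {
    slice: &'a [T],
    pos: usize,
}

impl<'a, T> SketchIter<'a, T> {
    /// Known at compile time once `T` is concrete.
    const IS_ZST: bool = size_of::<T>() == 0;

    fn new(slice: &'a [T]) -> Self {
        Self { slice, pos: 0 }
    }

    /// Path for zero-sized `T`: only a counter advances.
    #[inline]
    fn next_zst(&mut self) -> Option<&'a T> {
        let item = self.slice.get(self.pos)?;
        self.pos += 1;
        Some(item)
    }

    /// Path for every other `T`: shrink the remaining slice from the front.
    #[inline]
    fn next_sized(&mut self) -> Option<&'a T> {
        let (first, rest) = self.slice.split_first()?;
        self.slice = rest;
        Some(first)
    }
}

impl<'a, T> Iterator for SketchIter<'a, T> {
    type Item = &'a T;

    /// Thin dispatcher: after monomorphization the condition is a constant,
    /// so only one of the two calls can remain.
    #[inline]
    fn next(&mut self) -> Option<&'a T> {
        if Self::IS_ZST {
            self.next_zst()
        } else {
            self.next_sized()
        }
    }
}
```

With `next` reduced to a constant branch plus one call, inlining it into a caller where `T` is known leaves a single sub-call to consider, which is the effect the comment hopes ordinary inlining could achieve without a dedicated MIR pass.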

scottmcm · 2025-02-09T08:14:14Z

Looking at the perf results here, I think the thing I'm most surprised by is that check is so much improved on icount: https://perf.rust-lang.org/compare.html?start=8ad2c9724d983cfb116baab0bb800edd17f31644&end=37e77e9a3d5e8a5385dd4735f806ccacda215f14&stat=instructions%3Au&debug=false&opt=false&doc=false

Is check actually generating runtime MIR? I'd have thought it would only need analysis MIR, and thus this PR wouldn't affect check perf, since I wouldn't expect it to change the actual post-optimization machine code of the compiler (especially after PGO and BOLT).

FractalFir · 2025-02-09T15:38:26Z

Closing this in favour of #136771, which seems to achieve the same goal without the need for a MIR pass (and also seems to be a bit faster).

Experiment with inling iterators

1d4d571

rustbot assigned fmease Feb 8, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 8, 2025

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 8, 2025

the8472 reviewed Feb 8, 2025

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 8, 2025

scottmcm reviewed Feb 9, 2025

scottmcm requested changes Feb 9, 2025

FractalFir closed this Feb 9, 2025

fmease removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 11, 2025
