-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[perf experiment] A MIR pass dedicated to optimizing common iterators #136745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
@FractalFir: 🔑 Insufficient privileges: not in try users |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[perf experiment] A MIR pass dedicated to optimizing common iterators **This PR is a perf experiment, and is not meant to be accepted. I am creating it to request a perf run** # Motivation Currently, many commonly used iterators don't get inlined during MIR optimization: this leads to increased ammount of LLVM-IR. Since those iterators are also generic, they are unlikely to be inlined anytime soon. # Optimizing slice iteraotrs This PR adds an experimental pass which replaces 2 commonly used iterators(`std::slice::Iter` and `std::slice::IterMut`) with inline implementations. Should this pass show potential for performance gains, I will work on an improved version, which will also handle other common iterators from `core`(eg. `Range`, `Enumerate`). A proper implementation will require other, bigger changes(e.g. *maybe* marking certain iterators as lang items for quicker lookup). Because of that, I am asking for a perf run, to see if that effort will be worth it.
ty::Array(_, _) => false, | ||
ty::Never | ty::FnDef(..) => false, | ||
ty::Adt(def, args) => match def.adt_kind() { | ||
AdtKind::Enum => def.variants().len() > 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehehe, very familiar function... I have made this mistake too, this will blow up for enums with uninhabited variants.
If it helps you can steal from here: 2c19fc6#diff-d5c6f94594e21dda2a9e3717a8b245c481fd461d8eece9c7bc960693cb0c368aR795
The job Click to see the possible cause of the failure (guessed by this bot)
|
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (37e77e9): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary -2.9%, secondary 0.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary 0.6%, secondary -0.3%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.2%, secondary -0.3%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 780.482s -> 778.114s (-0.30%) |
for bid in (0..(bbs.len())).into_iter().map(BasicBlock::from_usize) { | ||
let mut bb = &bbs[bid]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,5 @@ | |||
#[no_mangle] | |||
// EMIT_MIR slice_iter.built.after.mir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you forgot to bless this to show the new MIR? (Or forgot to add it to the commit?)
// 2. Check that the `func` of the call is known. | ||
let func = func.constant()?; | ||
// 3. Check that the `func` is FnDef | ||
let ty::FnDef(defid, generic_args) = func.ty().kind() else { | ||
return None; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true | ||
} | ||
} | ||
fn not_zst<'tcx>(t: Ty<'tcx>, tcx: TyCtxt<'tcx>) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You really don't want to ever remake layout. If you depend on a layout question, use a query that calls layout. (If it's too generic to get an answer, and doesn't optimize, that's ok. At most do something like look through the fields for anything with known non-ZST layout, IMHO.)
let rejoin = Terminator { kind: TerminatorKind::Goto { target }, source_info }; | ||
let mut some_block = BasicBlockData::new(Some(rejoin.clone()), false); | ||
let mut none_block = BasicBlockData::new(Some(rejoin), false); | ||
// Create the None value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder that Option
, OptionNone
, and OptionSome
are all lang items already, if you need them: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/lang_items/enum.LangItem.html#variant.Option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A proper implementation will require other, bigger changes
I think I'm overall skeptical of this current approach. This goes in the direction of having way more stuff done in mir-opt code instead of the library, which makes it harder to review and harder for people to contribute to. (I really don't want to even debug unsoundness from the rust code implementation and the mir-opt implementation diverging subtly.)
Is there perhaps a way that this could be done more leveraging normal rust code, with some passes that can pick up particular patterns instead?
Spitballing:
- How much of the value here is from a dedicated ZST check that's known more often than full layout information? Would it be worth a dedicated
is_zst
intrinsic to expose that? Or aNullOp
we could introduce in place ofEq(SizeOf, 0)
? - What if this was specialization on a compiler-builtin
IsZst
trait? That ought to have the same kind of "you have the implementation you need once you know the generic specifically enough" that this is doing... - Could this be done by "normal" inlining via some tweaks or new attributes of some kind? What if
next
was implemented as theconst
check which calls two different inherent methods, andnext
was marked with some new#[rustc_no_mir_inline_into_this]
attribute so theconst if
would get inlined into the caller almost certainly, giving it a better chance to get folded away and then inline the one sub-call? - What if the current bonus for
if const { ... }
were higher? Especially if it's the first terminator? Or if the cost checker was smarter about not counting both paths since they're not both possible?
rust/compiler/rustc_mir_transform/src/cost_checker.rs
Lines 145 to 149 in 43ca9d1
TerminatorKind::SwitchInt { discr, targets } => { | |
if discr.constant().is_some() { | |
// Not only will this become a `Goto`, but likely other | |
// things will be removable as unreachable. | |
self.bonus += CONST_SWITCH_BONUS; |
Looking at the perf results here, I think the thing I'm most surprised is that Is |
Closing this in favour of #136771, which seems to achieve the same goal without the need for a MIR pass(and also seems to be a bit faster). |
This PR is a perf experiment, and is not meant to be accepted. I am creating it to request a perf run
Motivation
Currently, many commonly used iterators don't get inlined during MIR optimization: this leads to increased ammount of LLVM-IR.
Since those iterators are also generic, they are unlikely to be inlined anytime soon.
Optimizing slice iteraotrs
This PR adds an experimental pass which replaces 2 commonly used iterators(
std::slice::Iter
andstd::slice::IterMut
) with inline implementations.Should this pass show potential for performance gains, I will work on an improved version, which will also handle other common iterators from
core
(eg.Range
,Enumerate
).A proper implementation will require other, bigger changes(e.g. maybe marking certain iterators as lang items for quicker lookup).
Because of that, I am asking for a perf run, to see if that effort will be worth it.