-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Optimize miri checking of integer array/slices #53903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
I think if you make is_range_defined return a result with the position of the first undefined byte in the error variant, and bubble that up, you should be able to recompute the index |
@oli-obk I've tried to implement what you suggested, but I've run into a bigger issue: The field in question is
Before this optimization, we would've caught the usage of |
} => { | ||
let len = dest.len(self)?; | ||
|
||
match self.memory.read_bytes(dest.ptr, Size::from_bytes(len)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is len
the length in bytes or in number of elements? If it is the former, you also need to multiply it by the element type's size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell from the source, it's the count
of elements in the array (or of the slice). I'll see what I can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then that's the source of your test failure. You're only testing the first 4 bytes, instead of all the 4 elements's bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the diagnostics now exactly like the old ones? Or why are there no test changes?
If so, that's great!
r? oli-obk |
I did my best and I think they should match the old ones. At least for the unit tests they match. Otherwise, they should still be pretty relevant. r? @oli-obk |
let's give it a perf run @bors try |
⌛ Trying commit 403477d319dd8bf91ed2f9ac264f77c73484eb9a with merge c4f54ce2096dd4d5263978b4dddea0637839f447... |
☀️ Test successful - status-travis |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@rust-timer build c4f54ce2096dd4d5263978b4dddea0637839f447 |
Success: Queued c4f54ce2096dd4d5263978b4dddea0637839f447 with parent 0be2c30, comparison URL. |
We actually lost performance. Can you do a local perf run and see whether it improves if you inline the function? |
@oli-obk I've managed to get rustc-perf to work locally. Inlining the function seems to have had some improvement on my computer, but I don't know if it's enough to regain the loss:
|
very weird. I would have assumed we'd get enormous benefits from not recomputing each field. Do we actually have perf tests for large arrays of integers? And is your code actually hit by those? |
@oli-obk I have no clue why there's worse performance. I've experimented around a bit with the code, and the only way to get about the same performance as
I've checked out the source behind |
I guess there's our problem. we're not perf testing the sanity checks. can you just locally perf |
@oli-obk Wow, you were right! It did make a huge difference. Compilation time with latest nightly: Benchmark #1: rustc ./src/test/run-pass/mir_heavy_promoted.rs
Time (mean ± σ): 10.361 s ± 0.110 s [User: 10.149 s, System: 0.217 s]
Range (min … max): 10.276 s … 10.596 s Compilation time with this PR: Benchmark #1: ./build/x86_64-unknown-linux-gnu/stage2/bin/rustc ./src/test/run-pass/mir_heavy_promoted.rs
Time (mean ± σ): 2.538 s ± 0.053 s [User: 2.358 s, System: 0.187 s]
Range (min … max): 2.477 s … 2.617 s 2.5s vs 10.3s... that's 4x improvement! |
src/librustc/mir/interpret/mod.rs
Outdated
/// | ||
/// Returns `Ok(())` if it's defined. Otherwise returns the index of the byte | ||
/// at which the first undefined access begins. | ||
pub fn is_range_defined(&self, start: Size, end: Size) -> Result<(), Size> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried inlining this, too?
I'll accept the 1% regression and take the 4x speedup. Can you just do a quick check whether more inlining in this code can help? |
@oli-obk yep, checking right now. I assume you're interested in the results on |
@oli-obk The perf results are in: It seems the performance regression is much lower now. Results of the stress test compared to
The |
Instead of checking every element, we can check the whole memory range at once.
Wonderful! @bors r+ |
📌 Commit 82cde90 has been approved by |
☀️ Test successful - status-appveyor, status-travis |
This pull request implements the optimization described in #53845 (the
E-easy
part of that issue, not the refactoring). Instead of checking every element of an integral array, we can check the whole memory range at once.r? @RalfJung