-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Back InitMask by IntervalSet #94450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Back InitMask by IntervalSet #94450
Conversation
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 8f368a15c9d3ce6379271799985472132b6e6cfd with merge 05028ceb468ace681da1478ce8b2d521c433e20e... |
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
Queued 05028ceb468ace681da1478ce8b2d521c433e20e with parent edda7e9, future comparison URL. |
This extends the functionality available on IntervalSet to provide more flexible access needed for the InitMask abstraction (and, likely, more generally useful as code wishes to migrate between the various Idx-based set abstractions available in rustc_index).
This avoids trouble with I = usize on 64-bit systems, where previously we would either assert or truncate when converting to u32.
The worst case scenarios that came up last time in this context have been array where elements contain padding, for example: pub static TWODARRAY: [[(u64, u8); COLS]; ROWS] = [[(0, 0); COLS]; ROWS]; |
24010be
to
18c3cb2
Compare
Yeah, we'll do much worse on such arrays, it's a good point. I'm not sure what the right tradeoff is -- in practice, for the example you suggest we still end up with something like the IntervalSet (just without random insert ability) in the compression routine. The IntervalSet has an up to 64x increase in memory usage over the bitset approach in the worst case, I think, since every other byte being init/uninit is represented in bytes/8 in the bitset and bytes/2*16 (64x larger) in the IntervalSet. I can switch the implementation over to a ChunkedBitSet, which would work better in the all-init and all-uninit cases (with 2048 bit granularity of choice), and so potentially is a more scalable solution -- it could easily enough support all the APIs we want, though compression of some kind likely still be desirable. It probably makes more sense to prioritize reducing worst-case extra memory (i.e., the every other bit case, for example, which a bitset of some kind is good at) rather than best-case memory where ~all bytes are the same init state. Some part of me also wants to consider implementing some kind of compression scheme atop IntervalSet, but it's probably both complicated and not worthwhile. |
Finished benchmarking commit (05028ceb468ace681da1478ce8b2d521c433e20e): comparison url. Summary: This benchmark run shows 4 relevant improvements 🎉 but 42 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never |
This (unfortunately) doesn't seem to be the case--in practice we always pass All constants we pass into LLVM are duplicated because they're interned in a StringMap: https://github.com/llvm/llvm-project/blob/988dae653f4166020f28028cc149dd641c04dae7/llvm/lib/IR/Constants.cpp#L2982-L3002 (see also https://llvm.org/doxygen/classllvm_1_1Constant.html#details) |
☔ The latest upstream changes (presumably #94597) made this pull request unmergeable. Please resolve the merge conflicts. |
cleanup: remove unused ability to have LLVM null-terminate const strings (and the copied function in rustc_codegen_gcc) Noticed this while writing rust-lang#94450 (comment). r? `@nagisa`
cleanup: remove unused ability to have LLVM null-terminate const strings (and the copied function in rustc_codegen_gcc) Noticed this while writing rust-lang/rust#94450 (comment). r? `@nagisa`
On the example below (sourced from #93215), this cuts ~2 GB of memory from the total memory usage at no significant delta or a slight improvement in instruction counts; will run perf across the full suite of benchmarks to evaluate any further need for improvements. The example moves from using 33 to 31 GB, roughly, which makes sense: the array is 16*10,000*100,000 = 14.9 GB, which corresponds to ~1.86 GB for the initialization bitset. (The full array is initialized in this case).
The remaining memory usage is due to the two copies of the array stored in memory: currently all constants we pass into LLVM are duplicated because LLVM seems to require a trailing null byte (which we don't provide), requiring a reallocation by LLVM. See LLVMConstStringInContext calls, if interested in more detail. This is likely fixable but will require some further work that's largely orthogonal from this PR, and the wins in this PR will remain useful.