Back InitMask by IntervalSet #94450

Mark-Simulacrum · 2022-02-28T14:25:12Z

On the example below (sourced from #93215), this cuts ~2 GB of memory from the total memory usage at no significant delta or a slight improvement in instruction counts; will run perf across the full suite of benchmarks to evaluate any further need for improvements. The example moves from using 33 to 31 GB, roughly, which makes sense: the array is 16*10,000*100,000 = 14.9 GB, which corresponds to ~1.86 GB for the initialization bitset. (The full array is initialized in this case).

The remaining memory usage is due to the two copies of the array stored in memory: currently all constants we pass into LLVM are duplicated because LLVM seems to require a trailing null byte (which we don't provide), requiring a reallocation by LLVM. See LLVMConstStringInContext calls, if interested in more detail. This is likely fixable but will require some further work that's largely orthogonal from this PR, and the wins in this PR will remain useful.

const ROWS: usize = 100000;
const COLS: usize = 10000;

static TWODARRAY: [[u128; COLS]; ROWS] = [[0; COLS]; ROWS];

fn main() {}

Mark-Simulacrum · 2022-02-28T14:25:40Z

@bors try @rust-timer queue

rust-timer · 2022-02-28T14:25:41Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-02-28T14:25:48Z

⌛ Trying commit 8f368a15c9d3ce6379271799985472132b6e6cfd with merge 05028ceb468ace681da1478ce8b2d521c433e20e...

bors · 2022-02-28T15:57:09Z

☀️ Try build successful - checks-actions
Build commit: 05028ceb468ace681da1478ce8b2d521c433e20e (05028ceb468ace681da1478ce8b2d521c433e20e)

rust-timer · 2022-02-28T15:57:10Z

Queued 05028ceb468ace681da1478ce8b2d521c433e20e with parent edda7e9, future comparison URL.

This extends the functionality available on IntervalSet to provide more flexible access needed for the InitMask abstraction (and, likely, more generally useful as code wishes to migrate between the various Idx-based set abstractions available in rustc_index).

This avoids trouble with I = usize on 64-bit systems, where previously we would either assert or truncate when converting to u32.

tmiasko · 2022-02-28T16:12:14Z

The worst case scenarios that came up last time in this context have been array where elements contain padding, for example:

pub static TWODARRAY: [[(u64, u8); COLS]; ROWS] = [[(0, 0); COLS]; ROWS];

Mark-Simulacrum · 2022-02-28T17:26:15Z

Yeah, we'll do much worse on such arrays, it's a good point. I'm not sure what the right tradeoff is -- in practice, for the example you suggest we still end up with something like the IntervalSet (just without random insert ability) in the compression routine.

The IntervalSet has an up to 64x increase in memory usage over the bitset approach in the worst case, I think, since every other byte being init/uninit is represented in bytes/8 in the bitset and bytes/2*16 (64x larger) in the IntervalSet. I can switch the implementation over to a ChunkedBitSet, which would work better in the all-init and all-uninit cases (with 2048 bit granularity of choice), and so potentially is a more scalable solution -- it could easily enough support all the APIs we want, though compression of some kind likely still be desirable.

It probably makes more sense to prioritize reducing worst-case extra memory (i.e., the every other bit case, for example, which a bitset of some kind is good at) rather than best-case memory where ~all bytes are the same init state.

Some part of me also wants to consider implementing some kind of compression scheme atop IntervalSet, but it's probably both complicated and not worthwhile.

rust-timer · 2022-02-28T17:27:40Z

Finished benchmarking commit (05028ceb468ace681da1478ce8b2d521c433e20e): comparison url.

Summary: This benchmark run shows 4 relevant improvements 🎉 but 42 relevant regressions 😿 to instruction counts.

Arithmetic mean of relevant regressions: 0.8%
Arithmetic mean of relevant improvements: -22.7%
Arithmetic mean of all relevant changes: -1.3%
Largest improvement in instruction counts: -29.6% on incr-unchanged builds of ctfe-stress-4 check
Largest regression in instruction counts: 1.5% on full builds of keccak check

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

erikdesjardins · 2022-03-05T04:59:38Z

currently all constants we pass into LLVM are duplicated because LLVM seems to require a trailing null byte (which we don't provide), requiring a reallocation by LLVM.

This (unfortunately) doesn't seem to be the case--in practice we always pass DontNullTerminate = true to LLVMConstStringInContext.

All constants we pass into LLVM are duplicated because they're interned in a StringMap: https://github.com/llvm/llvm-project/blob/988dae653f4166020f28028cc149dd641c04dae7/llvm/lib/IR/Constants.cpp#L2982-L3002 (see also https://llvm.org/doxygen/classllvm_1_1Constant.html#details)

bors · 2022-03-07T01:54:49Z

☔ The latest upstream changes (presumably #94597) made this pull request unmergeable. Please resolve the merge conflicts.

cleanup: remove unused ability to have LLVM null-terminate const strings (and the copied function in rustc_codegen_gcc) Noticed this while writing rust-lang#94450 (comment). r? `@nagisa`

cleanup: remove unused ability to have LLVM null-terminate const strings (and the copied function in rustc_codegen_gcc) Noticed this while writing rust-lang/rust#94450 (comment). r? `@nagisa`

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 28, 2022

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 28, 2022

This comment has been minimized.

Sign in to view

Mark-Simulacrum added 4 commits February 28, 2022 10:59

Expand IntervalSet APIs

99695a3

This extends the functionality available on IntervalSet to provide more flexible access needed for the InitMask abstraction (and, likely, more generally useful as code wishes to migrate between the various Idx-based set abstractions available in rustc_index).

Use IntervalSet in InitMask rather than the custom bitset impl

e540587

Move IntervalSet to store I directly

7dee111

This avoids trouble with I = usize on 64-bit systems, where previously we would either assert or truncate when converting to u32.

Avoid compressing InitMasks for copy_repeatedly

60a95f1

Bless mir-opt tests for new InitMask Debug impl

18c3cb2

Mark-Simulacrum force-pushed the initmask-refactor branch from 24010be to 18c3cb2 Compare February 28, 2022 17:04

Mark-Simulacrum added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Feb 28, 2022

rustbot added perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 28, 2022

Mark-Simulacrum removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 28, 2022

erikdesjardins mentioned this pull request Mar 5, 2022

cleanup: remove unused ability to have LLVM null-terminate const strings #94638

Merged

Mark-Simulacrum closed this Mar 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Back InitMask by IntervalSet #94450

Back InitMask by IntervalSet #94450

Mark-Simulacrum commented Feb 28, 2022 •

edited

Loading

Mark-Simulacrum commented Feb 28, 2022

rust-timer commented Feb 28, 2022

bors commented Feb 28, 2022

This comment has been minimized.

bors commented Feb 28, 2022

rust-timer commented Feb 28, 2022

tmiasko commented Feb 28, 2022

Mark-Simulacrum commented Feb 28, 2022

rust-timer commented Feb 28, 2022

erikdesjardins commented Mar 5, 2022

bors commented Mar 7, 2022

Back InitMask by IntervalSet #94450

Back InitMask by IntervalSet #94450

Conversation

Mark-Simulacrum commented Feb 28, 2022 • edited Loading

Mark-Simulacrum commented Feb 28, 2022

rust-timer commented Feb 28, 2022

bors commented Feb 28, 2022

This comment has been minimized.

bors commented Feb 28, 2022

rust-timer commented Feb 28, 2022

tmiasko commented Feb 28, 2022

Mark-Simulacrum commented Feb 28, 2022

rust-timer commented Feb 28, 2022

erikdesjardins commented Mar 5, 2022

bors commented Mar 7, 2022

Mark-Simulacrum commented Feb 28, 2022 •

edited

Loading