Codegen for a large static array uses a lot of memory #30083

sanxiyn · 2015-11-27T13:56:20Z

The following two lines of Rust code can take more than 200 MB of memory to compile.

const L: usize = 1 << 25;
pub static S: [u32; L] = [1; L];

$ rustc --crate-type lib -Z time-passes test.rs | grep translation
time: 0.710; rss: 209MB translation

Reported in a users.rust-lang.org post. See also #23600.


eefriedman · 2015-11-27T20:30:58Z

200MB is possibly slightly excessive... but it's naturally going to take a lot of memory to generate a 30MB object file.

eefriedman · 2015-11-27T20:33:35Z

Wait, no, I miscalculated; it's actually a 130MB object file.
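For reference, the corrected figure can be checked with a little arithmetic: 2^25 elements of 4 bytes each. The sketch below only computes the raw size of the array data; the helper name `static_size_bytes` is made up for illustration, and the real object file will be slightly larger because of headers and symbol tables.

```rust
use std::mem::size_of;

/// Number of elements in the array from the original report.
const L: usize = 1 << 25;

/// Raw size in bytes of `[u32; L]` (hypothetical helper, not part of rustc).
/// `size_of` works on the type alone, so nothing is allocated here.
fn static_size_bytes() -> usize {
    size_of::<[u32; L]>()
}

fn main() {
    let bytes = static_size_bytes();
    // 2^25 * 4 = 134_217_728 bytes, i.e. 128 MiB (about 134 MB).
    println!("{} bytes = {} MiB", bytes, bytes / (1024 * 1024));
}
```

So "130MB" is a rounded figure for 128 MiB of array data.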

Aatch · 2015-11-28T03:52:07Z

This is more of an LLVM issue, as I believe that we have to generate essentially [1, 1, 1, 1, ...., 1] in this case, which means LLVM ends up tracking 2^25 values. I'm actually surprised that it doesn't end up using 2^25 * 8 bytes of memory for 2^25 pointers.

Note that this isn't an issue for [0; L], as LLVM has a zeroinitializer value that represents "all zero bits" for any type.

Looking at the issue from users.rust-lang.org, that case should be able to use a zeroinitializer, since the value is all zeros, but for some reason it doesn't. Some investigation suggests this happens for element types larger than a word (so [(u32, f32); L] is fine, [(u64, f32); L] is not). I haven't checked the relevant code yet, though.
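To make the "larger than a word" observation concrete, the sketch below prints the sizes of the two element types mentioned above. It assumes "word" means 8 bytes, as on a typical 64-bit target; the comment does not say which target was used, and the helper name `pair_sizes` is made up for illustration.

```rust
use std::mem::size_of;

/// Sizes in bytes of the two element types from the comment
/// (hypothetical helper for illustration).
fn pair_sizes() -> (usize, usize) {
    (size_of::<(u32, f32)>(), size_of::<(u64, f32)>())
}

fn main() {
    let (small, large) = pair_sizes();
    // (u32, f32): two 4-byte fields with 4-byte alignment.
    println!("(u32, f32): {} bytes", small);
    // (u64, f32): 12 bytes of fields, padded up to the 8-byte alignment of u64.
    println!("(u64, f32): {} bytes", large);
}
```

Under that assumption, the first type fits in one 8-byte word and the second does not, which matches the reported behavior.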

eddyb · 2015-12-21T14:16:43Z

It would be nice if LLVM was doing RLE for these kinds of things.
Although I don't recall C being able to generate anything large other than zeroed arrays (which is what the zeroinitializer is for).

Aatch · 2016-08-01T05:41:14Z

One issue shown by #35154 is that very large arrays can cause ICEs, because we try to create absurdly large vectors to hand to LLVM for constants.

bstrie · 2020-02-20T15:58:17Z

Triage: time-passes no longer has a step named translation as in the original issue, but as of Rust 1.43 the RSS peaks at 322 MB during the following phases:

time: 0.212; rss: 322MB codegen_to_LLVM_IR
time: 0.000; rss: 322MB assert_dep_graph
time: 0.000; rss: 322MB serialize_dep_graph
time: 0.214; rss: 322MB codegen_crate

That said, I think this issue could use a better title, because I would intuitively expect a large static array to use at least a linear-ish amount of memory when compiling. What is the specific issue that we should address? (eddyb uses the term "RLE" above, which I have no expansion for.)

eddyb · 2020-02-20T16:08:03Z

@bstrie "trans(lation)" is the old jargon for "codegen", maybe I should've retroactively replaced all of the mentions of it to avoid confusion.

RLE means "Run-length encoding", one of the common primitives of compression algorithms.

In this specific case, LLVM could represent constant blocks of data as N repeats of a (smaller) block, which would always allow us to represent [e; N] values (even if we might have to detect it on the fly, like a compression algorithm, or also support it in miri).
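As a rough illustration of the "N repeats of a smaller block" idea, here is a minimal sketch in plain Rust. It is not how LLVM or rustc represent constants; the `Repeated` type and the `detect_repeat` and `sample_bytes` helpers are hypothetical names invented for this example, and the detection only handles a caller-supplied block length.

```rust
/// Hypothetical run-length representation: `count` repeats of `block`.
#[derive(Debug, PartialEq)]
struct Repeated {
    block: Vec<u8>,
    count: usize,
}

/// Try to describe `data` as repeats of its first `block_len` bytes.
/// Returns `None` if the data is not an exact repetition of that block.
fn detect_repeat(data: &[u8], block_len: usize) -> Option<Repeated> {
    if block_len == 0 || data.is_empty() || data.len() % block_len != 0 {
        return None;
    }
    let block = &data[..block_len];
    if data.chunks_exact(block_len).all(|chunk| chunk == block) {
        Some(Repeated {
            block: block.to_vec(),
            count: data.len() / block_len,
        })
    } else {
        None
    }
}

/// Small stand-in for `[1u32; n]`: `n` little-endian copies of `1u32`.
fn sample_bytes(n: usize) -> Vec<u8> {
    let mut data = Vec::with_capacity(n * 4);
    for _ in 0..n {
        data.extend_from_slice(&1u32.to_le_bytes());
    }
    data
}

fn main() {
    let data = sample_bytes(1024);
    match detect_repeat(&data, 4) {
        Some(rle) => println!("{} repeats of {:02x?}", rle.count, rle.block),
        None => println!("not a repeated block"),
    }
}
```

The compact form stores one 4-byte block and a count regardless of the array length. For a `[e; N]` expression the compiler already knows `e` and `N`, so the detection step would only be needed for data that arrives already expanded.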

> That said, I think this issue could use a better title, because I would intuitively expect a large static array to use at least a linear-ish amount of memory when compiling.

It's not a bad intuition, but at no point do you actually have a linear amount of entropy: you can represent the array as "1 aka 1u32 aka 01 00 00 00, etc. repeated 2²⁵ times" all the way down to emitting the binary (which you don't have to do in memory either).
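The last point, that emitting the bytes does not need the whole array in memory, can be sketched as a loop that writes one block at a time to any `std::io::Write` sink. This is only an illustration under the assumption that the output is a plain byte stream; `emit_repeated` is a hypothetical helper, not an existing rustc or LLVM function.

```rust
use std::io::{self, Write};

/// Write `count` copies of `block` to `out`.
/// Only a single block is held by this function at any time.
fn emit_repeated<W: Write>(out: &mut W, block: &[u8], count: usize) -> io::Result<()> {
    for _ in 0..count {
        out.write_all(block)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // A small count keeps the demo cheap; the report used 1 << 25 repeats.
    // A Vec is used as the sink here only so the result can be printed.
    let mut sink = Vec::new();
    emit_repeated(&mut sink, &1u32.to_le_bytes(), 4)?;
    println!("{:02x?}", sink);
    Ok(())
}
```

With a buffered file writer as the sink instead of a `Vec`, the memory held by the writer stays roughly constant however large `count` is.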

steveklabnik added the A-LLVM label (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.) Jun 6, 2016

Aatch mentioned this issue Aug 1, 2016

Rustc runs out of memory when dealing with extremely large array literals in macros #35154

Closed

Mark-Simulacrum added the C-enhancement (Category: An issue proposing an enhancement or a PR with one.) and I-compilemem (Issue: Problems and improvements with respect to memory usage during compilation.) labels Jul 24, 2017

eddyb changed the title ~~Translating a large static array uses a lot of memory~~ Codegen for a large static array uses a lot of memory Feb 20, 2020

jonas-schievink added the T-compiler label (Relevant to the compiler team, which will review and decide on the PR/issue.) Apr 22, 2020

oli-obk mentioned this issue Jan 20, 2021

rustc hangs when generating very large arrays #81188

Closed
