Questions Related to the "Abnormal" Memory Usage of Binaryen #6239

mobsceneZ · 2024-01-25T13:07:01Z

Hi developers, I am incorporating Binaryen as third-party library into my fuzzing framework. In short, Binaryen is used to parse input Wasm module, do some mutation and generation work and emit a new testcase back in this case.

Everything went fine until I deployed it on a cloud server. The memory usage of my fuzzing framework kept going on and eventually kernel's OOM Killer killed my fuzzing process.

It really took me a lot of effort to figure out the reason: during the mutation or generation stage, our work will generate block/loop names for BinaryenBlock()/BinaryenLoop() C API, which will further call IString::interned to update some static variables:

binaryen/src/support/istring.h

Line 36 in 6453fd5

static std::string_view interned(std::string_view s, bool reuse = true);

The problem is that fuzzing may generate different block/loop names for different input Wasm module, and these names become useless when we have finished the mutation/generation stage for this specific module, but the corresponding block/loop names are not erased from these static variables accordingly! Eventually, it results in excessive memory occupation for maintaining these meaningless names.

So, I wonder if there is any way to erase elements inside these static/global variables on a per-module basis, so that the overall memory usage is affordable.

The text was updated successfully, but these errors were encountered:

kripken · 2024-01-25T16:40:47Z

Atm Binaryen is optimized for short-running processes, like calling wasm-opt from the commandline to optimize a module. After the optimization the process ends, which is how toolchains (Emscripten, J2CL, Kotlin, etc.) use Binaryen. The specific issue is that we intern strings for speed, and we never remove them (until the process ends). That is simple and efficient, but if you have a very long running process then it will accumulate memory.

Perhaps we could add an option to not intern strings. That would just need a replacement file for istring.h. It would be slower, but avoid this overhead. However, if you can restart your fuzzer process periodically that would be simpler.

tlively · 2024-01-25T17:44:04Z

We similarly intern types and never delete them, but we do have a function called destroyAllTypesForTestingPurposesOnly that deletes all the interned types. It's wildly unsafe to use, but I could see it being useful to call between independent fuzzer iterations. Perhaps we could have a similar function for the interned strings?

mobsceneZ · 2024-01-26T03:15:52Z

Atm Binaryen is optimized for short-running processes, like calling wasm-opt from the commandline to optimize a module. After the optimization the process ends, which is how toolchains (Emscripten, J2CL, Kotlin, etc.) use Binaryen. The specific issue is that we intern strings for speed, and we never remove them (until the process ends). That is simple and efficient, but if you have a very long running process then it will accumulate memory.

Perhaps we could add an option to not intern strings. That would just need a replacement file for istring.h. It would be slower, but avoid this overhead. However, if you can restart your fuzzer process periodically that would be simpler.

Hey kripken, I think it may not be necessary to implement a new option that support not interning strings. Instead, like @tlively said in the comment, I wonder if it's acceptable to export a C API to clear up these interned types or strings which could make things easier.
Besides, I'm afraid that restarting fuzzer process may not be a good idea for the effectiveness of fuzzing. Another option for me is to maintain a block/loop name pool where each name can be reused across modules/functions, which as I presume could effectively reduce the memory usage. But it will depend on whether "exporting a C API" option is acceptable for Binaryen community :).

mobsceneZ · 2024-01-26T03:24:23Z

We similarly intern types and never delete them, but we do have a function called destroyAllTypesForTestingPurposesOnly that deletes all the interned types. It's wildly unsafe to use, but I could see it being useful to call between independent fuzzer iterations. Perhaps we could have a similar function for the interned strings?

Yes, functions like destroyAllTypesForTestingPurposesOnly are exactly what I need. I'm curious about why this function is "wildly unsafe to use", is is related to the thread safety?

kripken · 2024-01-26T03:43:09Z

I'm not opposed to such an API.

tlively · 2024-01-26T16:36:40Z

We similarly intern types and never delete them, but we do have a function called destroyAllTypesForTestingPurposesOnly that deletes all the interned types. It's wildly unsafe to use, but I could see it being useful to call between independent fuzzer iterations. Perhaps we could have a similar function for the interned strings?

Yes, functions like destroyAllTypesForTestingPurposesOnly are exactly what I need. I'm curious about why this function is "wildly unsafe to use", is is related to the thread safety?

Yeah, it's not safe for any thread to be doing anything with types while another thread is calling this function. But more generally, any types from before the call cannot be used after the call, which is a lifetime issue that code using types normally wouldn't need to think about.

MaxGraey · 2024-01-26T20:19:08Z

How about to use arena or bump allocator for interner? I guess generated names could be quite long and continuously grow in "converge" mode, which may caused to OOM, especially for wasm32 binaryen builds. It would be nice to be able to do a reset for interner state.

mobsceneZ · 2024-01-27T05:45:13Z

We similarly intern types and never delete them, but we do have a function called destroyAllTypesForTestingPurposesOnly that deletes all the interned types. It's wildly unsafe to use, but I could see it being useful to call between independent fuzzer iterations. Perhaps we could have a similar function for the interned strings?

Yes, functions like destroyAllTypesForTestingPurposesOnly are exactly what I need. I'm curious about why this function is "wildly unsafe to use", is is related to the thread safety?

Yeah, it's not safe for any thread to be doing anything with types while another thread is calling this function. But more generally, any types from before the call cannot be used after the call, which is a lifetime issue that code using types normally wouldn't need to think about.

You're right, if there are multiple threads then things could get messy. I do not use multiple threads in my fuzzing framework, so a clean up function is perfectly suitable for me. If that's ok for you to have this kind of C API, I will try adding one and issue a pull request ;).

mobsceneZ · 2024-01-27T06:20:53Z

How about to use arena or bump allocator for interner? I guess generated names could be quite long and continuously grow in "converge" mode, which may caused to OOM especially on for wasm32 binaryen builds. It would be nice to be able to do a reset for interner state.

Hey MaxGraey, I think for this case arena allocator is a pretty nice choice, but I'm not an expert in designing such memory allocators so it may depends on developers.

tlively · 2024-01-29T05:36:49Z

@mobsceneZ, sounds good! I look forward to the PR.

kripken · 2024-01-29T18:27:33Z

A bump allocator is separately a good idea here. We can use the existing MixedArena which gives exactly what we want. A separate PR with that would also be welcome here.

tlively · 2024-12-13T20:53:42Z

Closing because the questions were answered and it looks like there was a good solution found, mentioned in #6298.

MaxGraey mentioned this issue Jan 25, 2024

Compiler failing at optimization step with a big .ts file AssemblyScript/assemblyscript#2810

Closed

mobsceneZ mentioned this issue Feb 11, 2024

C API: Add support for clearing up interned objects #6298

Closed

tlively closed this as completed Dec 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Questions Related to the "Abnormal" Memory Usage of Binaryen #6239

Questions Related to the "Abnormal" Memory Usage of Binaryen #6239

mobsceneZ commented Jan 25, 2024

kripken commented Jan 25, 2024

tlively commented Jan 25, 2024

mobsceneZ commented Jan 26, 2024

mobsceneZ commented Jan 26, 2024

kripken commented Jan 26, 2024

tlively commented Jan 26, 2024

MaxGraey commented Jan 26, 2024 •

edited

Loading

mobsceneZ commented Jan 27, 2024

mobsceneZ commented Jan 27, 2024

tlively commented Jan 29, 2024

kripken commented Jan 29, 2024

tlively commented Dec 13, 2024

Questions Related to the "Abnormal" Memory Usage of Binaryen #6239

Questions Related to the "Abnormal" Memory Usage of Binaryen #6239

Comments

mobsceneZ commented Jan 25, 2024

kripken commented Jan 25, 2024

tlively commented Jan 25, 2024

mobsceneZ commented Jan 26, 2024

mobsceneZ commented Jan 26, 2024

kripken commented Jan 26, 2024

tlively commented Jan 26, 2024

MaxGraey commented Jan 26, 2024 • edited Loading

mobsceneZ commented Jan 27, 2024

mobsceneZ commented Jan 27, 2024

tlively commented Jan 29, 2024

kripken commented Jan 29, 2024

tlively commented Dec 13, 2024

MaxGraey commented Jan 26, 2024 •

edited

Loading