Skip to content

Fixing asm2wasm crashing on \r when built with MinGW #986

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jerstlouis
Copy link

Also enlarging stack size for MinGW

@@ -26,7 +26,8 @@ T wasm::read_file(const std::string &filename, Flags::BinaryOption binary, Flags
if (debug == Flags::Debug) std::cerr << "Loading '" << filename << "'..." << std::endl;
std::ifstream infile;
std::ios_base::openmode flags = std::ifstream::in;
if (binary == Flags::Binary) flags |= std::ifstream::binary;
// asm2wasm.exe built with MinGW would crash with files containing '\r' when opened without the binary flag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does anyone know why mingw is special here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that it's not MinGW specific but also an issue when built with Visual Studio?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but the reason I am guessing it's mingw-specific is that people have built and used this with vs. At least I think so, I didn't do it myself...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the code already always handles skipping \r and works on Linux which does not have a non-binary mode, then there's no reason why the file should be opened without the binary flag, and I can't see how it would work better on Visual Studio.

There's other reports of asm2wasm crashing, see for example: #327 which crashes at exactly the same spot.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are \r even generated though? Are they only generated when done on Windows? It doesn't make sense to me to generate files meant to be used across platforms differently whether they're created on one platform or another. The 'text' mode and \r\n really needs to disappear anyway, it's such a horrible relic of the past.

Copy link
Member

@kripken kripken May 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when we generate the asm.js text, on windows the text-handling python code in windows just emits \rs automatically. @juj knows more.

If there is no danger here, then opening as binary sounds ok, but I don't know enough about windows and text handling there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again opening as binary when outputting the asm.js will avoid those \r's. This will be an issue trying to get reproducible output across systems. Really, the 'text' mode has no reason to be.

@@ -148,6 +148,7 @@ ELSE()
ADD_COMPILE_FLAG("-Wno-unused-parameter")
ADD_COMPILE_FLAG("-fno-omit-frame-pointer")
IF(WIN32)
ADD_LINK_FLAG("-Wl,--stack,8388608")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment for this. i assume it's that unix stack sizes are normally 8MB but we need to add this for mingw?

Copy link
Author

@jerstlouis jerstlouis Apr 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was just making it the same as the MSVC section. MinGW defaults to 2 MB I think.

@kripken
Copy link
Member

kripken commented Jul 11, 2017

ping @juj, see question higher up

@juj
Copy link
Collaborator

juj commented Jul 12, 2017

Sorry for missing this before!

Before we fix this, we really need to have a test case in Emscripten suite about this, since these types of items are critical and we don't routinely test MinGW compiler at the moment. Currently we run the Binaryen suite on VS2015 on each commit, but that is not catching the issue at the moment. @griffin2000, @jerstlouis: do you know of a short test case to repro to see the crash? I'm now building with Binaryen to test this out.

@juj
Copy link
Collaborator

juj commented Jul 12, 2017

Investigated this in closer detail, I see that building Emscripten and Binaryen with MinGW 7.1.0, it does fail e.g. in Emscripten's test other.test_binaryen_debug.

The problem indeed is MinGW specific and does not appear in Visual Studio. However this is due to a bug in function wasm::read_file that only manifests in MinGW libstdc++ and not Visual Studio's c++ library.

The behavior that text file reading on Windows (in both VS and MinGW) has is that if a file has a byte sequence \r\n on disk, then reading those bytes in to memory in text mode should result in \r\n getting converted on the fly to \n in memory. MSDN: "Also, in text mode, carriage return–linefeed combinations are translated into single linefeeds on input, and linefeed characters are translated to carriage return–linefeed combinations on output.".

This has the effect that the size on disk of a file is larger in bytes than the file size in memory, since \r\n -> \n contracts the size by one byte when going to memory. The function ifstream::tellg() used in wasm::read_file always returns the size of the file in disk bytes, and that is the size that is allocated to memory. When the file is later read as text, there are more bytes allocated to it in memory than are needed, since the representation shrinks as \r bytes are removed.

The behavioral difference that is observed here between Visual Studio and MinGW is that on Visual Studio, calling infile.read(&input[0], insize);, where insize is the disk size (and not the memory size), the excess bytes are not touched and they all end up being \0s. This is benign for parser.h, and the code then constructs a std::string that just has excess \0s at the end, one for each removed \r character. However on MinGW it looks that the excess memory ends up getting garbage characters, most likely due to an in-memory contraction algorithm that removes \rs in one go after a binary read. This is the root source of the crash, i.e. wasm::read_file returns a std::string that has garbage characters in the end, which then later fails in parser.h, when it is rolling through the garbage characters.

The bug is in wasm::read_file as opposed to MinGW I believe. One should not expect that infile.read() would read exactly as many characters in as is requested in bytes. However there is no good C++ fstream API to ask how many characters it was able to read. Therefore instead, PR #1088 shows how to fix using C file API, which works on MinGW.

The difference with the fix here and #1088 is that unconditionally passing binary flag to file open means that, on Windows, the \r characters will begin to appear in the in memory representations of files opened in text mode. In parser.h function isSpace, those are currently taken into account, so for the parser at least, that is not a problem. (bool isSpace(char x) { return x == 32 || x == 9 || x == 10 || x == 13; } /* space, tab, linefeed/newline, or return */). However it should be made certain that all of the \r characters are efficiently consumed, since this PR now has an asymmetricity that text files will then be opened in binary mode, but can still be written in text mode, which would mean that if any \r\n sequences are retained in memory, then those would get duplicated to \r\r\n if they are later written out as text. I'm not sure if that can ever happen though.

The PR #1088 retains the current property that text content in memory only ever has \n newlines and never \r\n, though it does revert to C API to be able to do that, since fstreams aren't too flexible for this purpose. Ultimately, I am ambivalent as to which way to go to fix. This PR is ok if we know that \rs in memory are always ok and expected (are there locations in code beyond parser.h that would need to be audited for this?), and that it will not cause incorrect \r\r\n expansion later due to this asymmetric nature.

@juj
Copy link
Collaborator

juj commented Jul 12, 2017

does anyone know why mingw is special here?

The difference seems to be as to how VS and MinGW perform the contraction of \r characters. VS seems to discard \r characters as it is going through the read, whereas MinGW seems to first read the whole file in memory in binary, and then use an in-memory filter algorithm to weed out \rs, which resulted in garbage chars (leftovers of the input file) being observed as opposed to \0s.

@juj
Copy link
Collaborator

juj commented Jul 12, 2017

Why are \r even generated though? Are they only generated when done on Windows?

Emscripten has a linker flag --output_eol linux|windows that can be used to choose the desired output line ending mode, on both Windows and Linux hosts. It defaults to the host choice, so on Windows, you can pass --output_eol linux linker flag to get \n line endings.

@jerstlouis
Copy link
Author

http://www.cplusplus.com/reference/istream/istream/read/

"The number of characters successfully read and stored by this function can be accessed by calling member gcount."

@jerstlouis
Copy link
Author

My approach to parsing text files that may or may not contain \r is always to ignore \r. Simpler that way. I would recommend doing the same and avoiding all these "text" mode caveats. Through I think gcount should still be used here.

@juj juj mentioned this pull request Jul 17, 2017
@juj
Copy link
Collaborator

juj commented Jul 17, 2017

Rewrote the PR to use gcount.

@kripken
Copy link
Member

kripken commented Jul 18, 2017

I believe this has been fixed by #1088, thanks everyone.

@juj
Copy link
Collaborator

juj commented Jul 19, 2017

There was that unrelated change in this PR, added #1103 for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants