-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Deprecate MEM_INIT_METHOD 2 #5401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lgtm to me. Let's leave this up for a few more days to see if there are any concerns, before we merge. |
Can you please point out the benefit of the deprecation? That other PR is pretty long, a summary would be appreciated. It is far from obvious to me how the two relate. Short history recap from my point of view. When I introduced MEM_INIT_METHOD==2 in #3326, my goal was quite similar to #5296: getting all relevant resources into a single file, without causing the excessive overhead of an array literal with all bytes spelled out in decimal. Base64 came to my mind, but the discussion in #2188 had pointed out that it compresses rather poorly, so I looked for alternatives. For a while I had been following an idea I called tinylut, where a lookup table (LUT) establishes a mapping from bytes (or codepoints) in the string literal to bytes in the memory image. The idea was to use printable characters only, but to keep the number of escape sequences low while maintaining the compressibility. There is a sample file containing decode loop and sample input, and the corresponding encoder implementation. However, in #3326 (comment) @chadaustin argued for a simpler decoding logic avoiding that map, so I rewrote my PR to do that, and it got merged. For a while we had issues with the way Windows handled bytes 00 and 1A, but #3854 by @juj fixed that. After that, I kind of lost interest in advocating the thing any further. I had been using it for my own projects, and had assumed it would make its way to established default based on its own merits, but apparently it never did. Likely too few people knew about it. The way I see it, there are several competing goals here:
Base64 handles 4 perfectly, and 3 is good if you can delegate decoding to something. Goals 1 and 2 are handled better than with the decimal literal, but not as good as some of the discussed alternatives. In contrast to this, method 2 as it got merged from my PR does 1 and 2 quite nicely, being very close to raw binary representation. But 4 has been a source of problems, and while the CRC32 checksum helps to at least catch them early on, that adds some complexity. As do the various places where certain characters have to get escaped correctly. I don't mind my code getting deprecated and eventually deleted if that's what's best for the project. But I saw and still see a certain demand for small sizes, which base64 can't deliver that well. I am not sure how much of a problem unprintable characters are these days, with Windows and various versions of various uglification processors. I'm pretty sure they can be made to work fine in controlled environments (likely linux and no closure compiler), and if they do, they are probably preferable. I also believe that any compatibility issues remaining can be solved with moderate amounts of work if the person doing said work has direct access to an environment reproducing the issue. I'm no fan of having two (or even three) distinct memory initialization methods. It would be great if the current methods 0 and 2 could converge towards something that suits everyone. Personally I have the impression that method 2 might be closer to satisfying all needs, but I might be biased. If you change method 0 to use base64, then change that to optimize better for size, how much would you expect the outcome of that to differ from method 2, or alternatives like tinylut? |
I won't weigh in on what's best of emscripten (leaving that to @kripken and @juj, as I'm impartial), but thanks for the detailed analysis! That was very interesting. Depending on what's decided here, I would have no problem with either changing my new stuff too use your encoding or adding it as a separate option. As far as the relationship between my PR and your mem init encoding scheme: in addition to enabling embedding all subresources within the JS, my PR replaces the previous mem init option 0 implementation with a hook into the new |
Good points. From my perspective,
The single-file PR seems good both because it uses a simpler method, and that it simplifies the logic for embedding files so it works for all the files we need to embed. Those are two separate benefits, though, in theory we could use the mem init 2 method with it. So perhaps we should
Thoughts? |
That sounds like it makes a lot of sense to me. I don't want to spend too much more time on #5296 and it's already a pretty big PR as-is, but someone else sorting out whether and how to swap out the base64 encoding with string literal encoding sounds like a great idea. |
When I was there, IMVU was a strong proponent of mem init mode 2 (or really anything that could reduce binary size or parse time), but I don't know if it's in use. I've reached out to some people on the 3D team there to see if they still use it. Also cc @waywardmonkeys I still think it's a good idea. :) But I can't comment on the maintenance burden. |
I haven't read the above mentioned threads yet. IMVU hasn't used this method for a long time as it didn't work on Windows and the issue was pretty obscure to track down as being a problem. |
I don't know how much of a problem method 2 still is on Windows, since I don't have one. I thought that all the issues I had heard of had gotten resolved at some point, but I can't make any guarantees. Looking at gagern@5dbd71f58c2fcf933f2d656657ff2300094b0491I get the impression that I had wanted to remove the warning and reactivate the tests. Perhaps I had wanted to test this on Windows before opening a pull request, and then forgot about it before I got an opportunity. |
I think the desire here was to not have to maintain multiple embedding methods side by side. Though that does not feel too costly in this case, MEM_INIT_MODE=2 does not seem to take up too much code.
When I fixed the Windows issues, I did not want to remove that comment, because there were two things that were seen: first is that the Closure compiler project is not able to read in Given that PR #5296 does not conflict with MEM_INIT_MODE=2 but preserves it as enabled (correct me if I got that wrong?), is there a strong need to deprecate MEM_INIT_MODE=2, especially if @gagern and IMVU are using it? The maintenance cost does not seem high either, and the good bit is that developers who don't use it don't pay any runtime cost on compiled output size, it's only a couple of added functions in the compiler side. |
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant. |
Pulled out of #5296 for separate discussion. Related comment threads: #5296 (comment), #5296 (comment), #5296 (comment)
CC @kripken, @juj, @gagern