Evaluate hex encoding for embedding files? #7211

Open
buu700 opened this issue Oct 2, 2018 · 18 comments
@buu700
Contributor

buu700 commented Oct 2, 2018

It was just pointed out to me in a Hacker News thread that hex might be more efficient than base64 after gzip/brotli compression for SINGLE_FILE's embedding.

His test showed that a hex-encoded wasm file was 65% the size of a base64-encoded wasm file after compression. My own test with a random 4.7 MB png I had sitting around showed that it didn't make a difference — with brotli they were all about the same (hex having the edge over base64 by 215 B and over the compressed original file by ~2.5 KB) and gzip was similar (in this case hex was ~600 KB bigger) — so it's unclear what the impact would be.
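A comparison like the one described above can be reproduced with a short script. This is only a sketch: it uses Python's `zlib` (deflate, the algorithm behind gzip) and leaves brotli out because it is not in the standard library. The `compressed_sizes` helper and the synthetic payload are assumptions made for illustration, not the files tested in this thread, and the outcome depends heavily on the input.

```python
import base64
import binascii
import random
import zlib


def compressed_sizes(data: bytes, level: int = 9) -> dict:
    """Return the deflate-compressed size of the raw, hex, and base64 forms."""
    encodings = {
        "raw": data,
        "hex": binascii.hexlify(data),
        "base64": base64.b64encode(data),
    }
    return {name: len(zlib.compress(blob, level)) for name, blob in encodings.items()}


# Hypothetical stand-in for a wasm binary: short byte sequences of varying
# length, repeated in random order at arbitrary offsets (seeded for repeatability).
rng = random.Random(0)
vocabulary = [
    bytes(rng.randrange(256) for _ in range(rng.randint(2, 7))) for _ in range(32)
]
payload = b"".join(rng.choice(vocabulary) for _ in range(20000))

sizes = compressed_sizes(payload)
for name, size in sizes.items():
    print(f"{name:>6}: {size} bytes after deflate")
```

To test a real file, replace `payload` with the file's bytes, for example `open(path, "rb").read()`.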

Just calling attention to this on the off chance that it would actually be an improvement, as I don't recall hex being considered when originally writing the SINGLE_FILE PR, and the previous discussion on encoding efficiencies (#3326 (comment)) has no mention of hex.

balls.png

@curiousdannii
Contributor

Well, a PNG file isn't a good example, as it's already highly compressed. As @jedisct1 said in #7213, base64 encoding obscures repeated opcode sequences, but with a PNG image there shouldn't be any repetitions left.
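The way base64 obscures repeats can be seen directly. Base64 maps 3 input bytes to 4 characters, so the same byte sequence encodes differently depending on its offset modulo 3, while hex maps each byte to 2 characters regardless of position. The sketch below uses an arbitrary 6-byte `pattern` (a hypothetical stand-in for an opcode sequence) surrounded by zero bytes:

```python
import base64
import binascii

# Hypothetical 6-byte sequence standing in for a repeated opcode run.
pattern = bytes([0x20, 0x00, 0x41, 0x01, 0x6A, 0x21])
hex_pattern = binascii.hexlify(pattern)
b64_pattern = base64.b64encode(pattern)  # 6 bytes -> 8 characters, no padding


def visibility(offset: int) -> tuple:
    """Return (found_in_hex, found_in_base64) for the pattern at a byte offset."""
    data = bytes(offset) + pattern + bytes(1)  # zero bytes before and after
    in_hex = hex_pattern in binascii.hexlify(data)
    in_b64 = b64_pattern in base64.b64encode(data)
    return in_hex, in_b64


results = {offset: visibility(offset) for offset in range(6)}
for offset, (in_hex, in_b64) in results.items():
    print(f"offset {offset}: hex match={in_hex}, base64 match={in_b64}")
```

The hex form contains the encoded pattern at every offset. The base64 form contains it only when the offset is a multiple of 3, so an LZ-style compressor sees up to three different spellings of the same bytes.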

I hadn't thought of this before, but it makes sense. Patterns in the original are preserved, Huffman encoding means that using only 16 of 256 symbols won't be a big problem, and decoding should be simple, as you can pre-allocate a buffer half the string length. Use lowercase a-f to get the most benefit from Huffman encoding. Or be extra crazy and use "etnris" for optimal Huffman benefit. (This is from an old analysis we did of the most used characters in jQuery for UglifyJS. If you wanted to try such an approach, we could do a fresh analysis of Emscripten output.)
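The decoding step can be sketched as follows. This is an illustration in Python rather than the JavaScript that Emscripten would emit, and the `encode_nibbles` and `decode_nibbles` names are hypothetical. The `alphabet` parameter is an assumption that allows any 16 distinct characters in place of `0-9a-f`, which is what a frequency-tuned alphabet would require.

```python
HEX_ALPHABET = "0123456789abcdef"


def _check_alphabet(alphabet: str) -> None:
    if len(alphabet) != 16 or len(set(alphabet)) != 16:
        raise ValueError("alphabet must contain 16 distinct characters")


def encode_nibbles(data: bytes, alphabet: str = HEX_ALPHABET) -> str:
    """Encode each byte as two characters: high nibble first, then low nibble."""
    _check_alphabet(alphabet)
    return "".join(alphabet[b >> 4] + alphabet[b & 0x0F] for b in data)


def decode_nibbles(text: str, alphabet: str = HEX_ALPHABET) -> bytes:
    """Decode two characters per byte into a pre-allocated output buffer."""
    _check_alphabet(alphabet)
    if len(text) % 2:
        raise ValueError("encoded text must have an even length")
    value_of = {ch: i for i, ch in enumerate(alphabet)}
    out = bytearray(len(text) // 2)  # output size is known up front
    for i in range(len(out)):
        out[i] = (value_of[text[2 * i]] << 4) | value_of[text[2 * i + 1]]
    return bytes(out)


blob = bytes([0x00, 0x61, 0x73, 0x6D])  # the wasm magic number, "\0asm"
assert decode_nibbles(encode_nibbles(blob)) == blob
```

Because the output length is exactly half the input length, the decoder needs no resizing or padding logic, unlike base64, which must account for `=` padding.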

@stale

stale bot commented Oct 3, 2019

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Oct 3, 2019
@jedisct1

jedisct1 commented Oct 9, 2019

Booh, don't close me!

@stale stale bot removed the wontfix label Oct 9, 2019
@stale

stale bot commented Oct 9, 2020

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Oct 9, 2020
@jedisct1

jedisct1 commented Oct 9, 2020

🥕

@stale stale bot removed the wontfix label Oct 9, 2020
@stale

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Apr 17, 2022
@jedisct1

🐰

@stale stale bot removed the wontfix label Apr 17, 2022
@vadimkantorov

If this can easily reduce SINGLE_FILE sizes after gzip, it would be a nice thing to test!

@sbc100
Collaborator

sbc100 commented Nov 21, 2023

Out of interest, why do folks here want to use SINGLE_FILE in the first place? It breaks compiled module caching as well as streaming compilation, both significant downsides.

@vadimkantorov

vadimkantorov commented Nov 21, 2023

I am using this to distribute relatively small self-contained apps that a lay user can open in the browser simply by clicking on app.html, without requiring a web server. The file is already on the user's machine, so this avoids forcing the user to set up a web server and configure MIME types, bypasses the issues with file:///, and simplifies distribution at the same time.

Another use case is personal-use apps, accessible via some my.github.io/repo/app.html (possibly deployed for their private use by the users themselves). A real-world example is tiddlywiki, which is distributed as a single-file HTML page. I'm also planning at some point to support such a mode for my busytex project (an extremely simple client-side LaTeX editor for either local files or GitHub repos).

Regarding caching and streaming compilation: can this limitation be bypassed by automatically storing the compiled wasm in the browser's local storage?

@buu700
Contributor Author

buu700 commented Nov 21, 2023

I use it in pqcrypto.js and libsodium.js to provide a single JS file that works in any JS environment and opportunistically takes advantage of WebAssembly where available (with asm.js fallback).

This is helpful for some contexts where relying on an external module would be problematic, and simplifies subresource integrity verification.

@sbc100
Collaborator

sbc100 commented Nov 21, 2023

Thanks for the feedback!

@suzukieng

@sbc100 I use it to distribute a barcode scanning library (https://strich.io, it's commercial). Basically, for reasons similar to @vadimkantorov's, it's easier for apps to consume that way: they usually just bundle the .js file with their app and don't need to think about deploying an extra .wasm file in their assets. The decoupling of the .js and .wasm files has also caused some headaches for me in the past due to caching (version mismatch between the .js and its companion .wasm).
But I will eventually have to move away from SINGLE_FILE once I support SIMD. Bundling both the SIMD and non-SIMD WASM builds into the .js and switching between the two based on the browser's SIMD support would waste too much space.

@vadimkantorov

@sbc100 Here is a toy example of this kind of distribution: https://github.com/vadimkantorov/wasm-iconv

It might be nice if there were ways of pre-transforming the wasm bytes to make them more compressible by gzip (or maybe making the file self-decompressing, to allow using something better than gzip).

@sbc100
Collaborator

sbc100 commented Nov 27, 2023

@vadimkantorov wouldn't it make sense to use a separate wasm file for https://vadimkantorov.github.io/wasm-iconv/? The downsides of SINGLE_FILE are significant enough that I feel like we should be warning folks against it unless they really need it for a given use case.

@jedisct1

libsodium.js originally used a separate wasm file, but there were significant integration issues [1]. SINGLE_FILE was very helpful in solving them.

@vadimkantorov

vadimkantorov commented Nov 28, 2023

@sbc100 For this particular wasm-iconv case, I'd like to be able to open iconv.html from file:/// as a simple single-file webapp. A single file also helps with versioning and distributing the wrapper and wasm together: there is no need to unpack an archive or to remember to always move the two files together. If there is a better-suited, widely working format for distributing local apps (not requiring internet or network access), e.g. some modern analogue of https://en.wikipedia.org/wiki/HTML_Application or a web analogue of snap/appx, I'd be happy to use it.

Maybe this means that there is a need for a new such web standard.

@sbc100
Collaborator

sbc100 commented Nov 28, 2023

> @sbc100 For this particular wasm-iconv case, I'd like to be able to open iconv.html from file:/// as a simple single-file webapp (and also single-file helps for versioning of the wrapper+wasm together / distribution - no need to unpack the archive / or always remember to move the two files together). If there is some more suited widely-working format for distribution of local apps (not requiring internet / network) - e.g. some modern analogue of https://en.wikipedia.org/wiki/HTML_Application or web-analogue of snap/appx, I'd be happy to use it.
>
> Maybe this means that there is a need for a new such web standard.

Indeed, that sounds like a case where we should look into addressing the reasons you are being pushed into using SINGLE_FILE. I'm not aware of the best way to package local web apps like that, but I imagine such a thing could exist.
Honestly, this is the first time I'm hearing this reason for choosing SINGLE_FILE. Most users seem happy to run a server and deal with multiple assets, but it would be good to find a solution for your requirements that doesn't depend on SINGLE_FILE.
