
Preload and precompress static assets in CorePlugin #708


Merged
nfelt merged 3 commits into tensorflow:master from gzip-asset-precompression on Nov 28, 2017

Conversation

@nfelt (Contributor) commented Nov 8, 2017

This changes CorePlugin to load its static assets into memory at startup and precompress them with gzip. The two static assets in this case are "index.html" (the main vulcanized TensorBoard HTML blob) and "trace_viewer_index.html". Since "index.html" at a minimum is always served when using TensorBoard, this spends roughly 1 MB of extra RAM to move the cost of preparing it to startup-time so that it can be served directly from memory.
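For illustration, here is a minimal sketch of the startup-time precompression step (the helper name and exact structure are hypothetical, not the actual CorePlugin code):

import gzip
import io

def _gzip_compress(content):
  # Compress a bytes blob once at startup so it can later be served
  # straight from memory with no per-request recompression.
  out = io.BytesIO()
  with gzip.GzipFile(fileobj=out, mode='wb', compresslevel=9) as f:
    f.write(content)
  return out.getvalue()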

Prior to this change, a typical request for those assets would involve 1) a disk read to open the assets_zip_provider webfiles.zip file, 2) unzipping that zip file, 3) extracting the index.html file from the zip archive, and 4) recompressing the file with gzip (about 40 ms to compress 2.6 MB down to roughly 0.7 MB) to send it back to the client.

Now it's just served as precompressed gzip data directly from memory. From some basic testing on my workstation with a local TensorBoard, this changes the TTFB (Time To First Byte [1]) for index.html from about 100ms average to less than 5ms, and the average delay to DOMContentLoaded firing correspondingly drops from about 1000ms to 900ms.

Modern browsers pretty much all send Accept-Encoding: gzip, but in case the user doesn't send that header, this change also adds support for decompressing precompressed data before responding. In theory we could avoid the decompression delay as well by also storing uncompressed assets in memory, but I opted not to: it complicates the Respond() API, and the argument in favor is much weaker, since decompression is typically faster than compression (about 20 ms for index.html in testing) and we expect gzip-supporting requests to be the vast majority, so it would spend more memory to save less time, less often.

[1] https://developers.google.com/web/tools/chrome-devtools/network-performance/reference#timing-explanation
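As a rough sketch of that fallback path (the request/precompressed/headers names here are hypothetical, not the exact Respond() code), the idea is:

import gzip
import six

def _choose_body(request, precompressed, headers):
  # Serve the in-memory gzip blob as-is when the client accepts gzip;
  # otherwise decompress it on demand before responding.
  if 'gzip' in request.headers.get('Accept-Encoding', ''):
    headers.append(('Content-Encoding', 'gzip'))
    return precompressed
  with gzip.GzipFile(fileobj=six.BytesIO(precompressed), mode='rb') as f:
    return f.read()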

@nfelt requested a review from jart on November 8, 2017 at 01:56
@jart (Contributor) left a comment

Great optimization!

headers = []
# Automatically unzip precompressed data if not accepted.
if content_encoding == 'gzip' and not gzip_accepted:
  with gzip.GzipFile(fileobj=six.BytesIO(content), mode='rb') as f:
@jart (Contributor) commented:

[optional and probably not worth doing]

If you care about shaving tens of milliseconds or so for the 0.001% of requests that don't accept gzip, you should be able to say: content = gzip.GzipFile(...); [...]; return wrappers.Response(response=content, direct_passthrough=True). Please note that in order for this to work with HTTP/1.1 you would need to get rid of the content-length and make sure Werkzeug is using chunked transfer encoding under the hood. Or say Connection: close.
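A rough sketch of that suggestion with Werkzeug (variable and function names assumed; this illustrates the idea, not necessarily the code that landed):

import gzip
import six
from werkzeug import wrappers

def _streaming_response(precompressed, content_type):
  # Hand Werkzeug a file-like object so decompressed bytes are streamed
  # to the client rather than materialized in memory first. With no
  # Content-Length header, HTTP/1.1 then relies on chunked transfer
  # encoding (or Connection: close) to delimit the body.
  gzip_file = gzip.GzipFile(fileobj=six.BytesIO(precompressed), mode='rb')
  return wrappers.Response(
      response=gzip_file, content_type=content_type,
      direct_passthrough=True)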

@nfelt (Contributor, Author) replied:

PTAL - I actually decided to implement this, thanks for the tip! Now it does a streaming gunzip, and indeed that eliminates the 20 millisecond delay in TTFB when doing on-demand decompression - it's <5 ms to the first byte returned now when decompressing on demand :)

The Content-Length hassles would have been a dealbreaker IMO, but as it so happens gzip files encode the length of the original uncompressed content in their last four bytes (well, strictly speaking they record the lower 32 bits of that length, but I think we can rely on not having pre-compressed blobs larger than 2^32 bytes = 4GB). So we can extract those and use them to send the Content-Length header before we've decompressed a single byte.
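Concretely, that trick looks something like this (a sketch; the helper name is made up, and gzip_bytes stands for the precompressed blob):

import struct

def _uncompressed_length(gzip_bytes):
  # The last four bytes of a gzip stream (the ISIZE field) hold the
  # original uncompressed length modulo 2**32, little-endian, so the
  # Content-Length header can be set before decompressing anything.
  return struct.unpack('<I', gzip_bytes[-4:])[0]

For example, headers.append(('Content-Length', str(_uncompressed_length(content)))) can then run before any streaming decompression starts.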

@jart (Contributor) replied:

That's such a great detail you uncovered!

@nfelt merged commit b22234c into tensorflow:master on Nov 28, 2017
@nfelt deleted the gzip-asset-precompression branch on November 28, 2017 at 19:11