Preload and precompress static assets in CorePlugin #708
Conversation
Great optimization!
tensorboard/backend/http_util.py (outdated diff):

    headers = []
    # Automatically unzip precompressed data if not accepted.
    if content_encoding == 'gzip' and not gzip_accepted:
      with gzip.GzipFile(fileobj=six.BytesIO(content), mode='rb') as f:
[optional and probably not worth doing]
If you care about shaving tens of milliseconds or so for the 0.001% of requests that don't accept gzip, you should be able to say `content = gzip.GzipFile(...); [...]; return wrappers.Response(response=content, direct_passthrough=True)`. Please note that in order for this to work with HTTP/1.1, you would need to get rid of the Content-Length header and make sure Werkzeug is using chunked transfer encoding under the hood, or send `Connection: close`.
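A minimal sketch of the suggested streaming approach, assuming Werkzeug's `wrappers.Response` API; the `streaming_gunzip` helper and its wiring are illustrative, not the patch's actual code:

```python
import gzip

import six
from werkzeug import wrappers


def streaming_gunzip(gz_bytes, chunk_size=8192):
    """Yield decompressed chunks of an in-memory gzip blob."""
    with gzip.GzipFile(fileobj=six.BytesIO(gz_bytes), mode='rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def gunzip_response(gz_bytes):
    # direct_passthrough tells Werkzeug to hand the iterable straight
    # to the WSGI server without buffering it. With no Content-Length
    # header set, an HTTP/1.1 server must use chunked transfer encoding
    # (or the response can say Connection: close instead).
    return wrappers.Response(
        response=streaming_gunzip(gz_bytes), direct_passthrough=True)
```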
PTAL - I actually decided to implement this, thanks for the tip! Now it does a streaming gunzip, and indeed that eliminates the ~20 millisecond delay in TTFB: when decompressing on demand, it's now less than 5 ms to the first byte returned. :)
The Content-Length hassles would have been a dealbreaker IMO, but as it so happens gzip files encode the length of the original uncompressed content in their last four bytes (well, strictly speaking they record the lower 32 bits of that length, but I think we can rely on not having pre-compressed blobs larger than 2^32 bytes = 4GB). So we can extract those and use them to send the Content-Length header before we've decompressed a single byte.
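For reference, a sketch of reading that trailer field (per RFC 1952, the last four bytes of a gzip stream, ISIZE, hold the original length modulo 2^32 as a little-endian uint32); the helper name here is hypothetical:

```python
import struct


def gzip_uncompressed_size(gz_bytes):
    """Read ISIZE from the gzip trailer: original length mod 2**32."""
    return struct.unpack('<I', gz_bytes[-4:])[0]
```

The Content-Length header can then be computed from `gzip_uncompressed_size(content)` before any decompressed bytes are produced.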
That's such a great detail you uncovered!
This changes CorePlugin to load its static assets into memory at startup and precompress them with gzip. The two static assets in this case are "index.html" (the main vulcanized TensorBoard HTML blob) and "trace_viewer_index.html". Since "index.html" at a minimum is always served when using TensorBoard, this spends roughly 1 MB of extra RAM to move the cost of preparing it to startup time, so that it can be served directly from memory.
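As a rough sketch of that startup-time step (the helper is hypothetical; the actual change lives in CorePlugin and http_util.py), in the six-based Python 2/3 style the codebase uses:

```python
import gzip

import six


def _gzip_bytes(data, compresslevel=9):
    """Gzip-compress a bytestring entirely in memory."""
    out = six.BytesIO()
    with gzip.GzipFile(fileobj=out, mode='wb',
                       compresslevel=compresslevel) as f:
        f.write(data)
    return out.getvalue()

# At plugin startup, e.g.:
#   index_html_gz = _gzip_bytes(read_asset('index.html'))
```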
Prior to this change, a typical request for those assets would involve 1) a disk read to open the assets_zip_provider webfiles.zip file, 2) unzipping that zip file, 3) extracting the index.html file from the archive, and 4) recompressing the file with gzip (about 40 ms to compress 2.6 MB down to roughly 0.7 MB) before sending it back to the client.
Now it's just served as precompressed gzip data directly from memory. From some basic testing on my workstation with a local TensorBoard, this changes the TTFB (Time To First Byte [1]) for index.html from about 100ms average to less than 5ms, and the average delay to DOMContentLoaded firing correspondingly drops from about 1000ms to 900ms.
Modern browsers pretty much all send `Accept-Encoding: gzip`, but in case the user's client doesn't send that header, this change also adds support for decompressing the precompressed data before responding. In theory we could avoid the decompression delay as well by also storing uncompressed assets in memory, but I opted not to, because it complicates the Respond() API and the argument in its favor is much weaker: decompression is typically faster than compression (about 20 ms for index.html in testing), and we expect gzip-supporting requests to be the vast majority, so it would be spending more memory to save less time, less often.

[1] https://developers.google.com/web/tools/chrome-devtools/network-performance/reference#timing-explanation
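Putting the pieces together, a sketch of that fallback path, reusing the hypothetical `streaming_gunzip` and `gzip_uncompressed_size` helpers from above (the real Respond() signature differs):

```python
def respond_precompressed(request, gz_bytes, content_type):
    """Serve precompressed bytes, gunzipping on the fly if needed."""
    accept = request.headers.get('Accept-Encoding', '')
    if 'gzip' in accept:  # simplified; a real check parses q-values
        headers = [('Content-Encoding', 'gzip'),
                   ('Content-Length', str(len(gz_bytes)))]
        body = gz_bytes
    else:
        headers = [('Content-Length',
                    str(gzip_uncompressed_size(gz_bytes)))]
        body = streaming_gunzip(gz_bytes)
    return wrappers.Response(response=body, headers=headers,
                             content_type=content_type,
                             direct_passthrough=True)
```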