-
Notifications
You must be signed in to change notification settings - Fork 732
patch_stdout is surprisingly slow #682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This largely depends on the amount of print statements. Not so much on the text length of the output. The rendering is highly optimized, so I don't think that part can be improved. But what we could do is use timers, and only render to stdout when no new output is written in like 200ms, and then write everything at once. This will introduce a little more latency for the simple cases, and make the capturing more complex, but when we have that, using patch_stdout shouldn't effect the performance anymore. |
I think even just batching it would help drastically cause right now it happens 1 line at a time (finds a line in stdout, schedules append + repaint). |
NVM. You're right. Need a timeout timer here. However, I tried adding that & it didn't seem to really help so I think there's something else going on. |
Found a workaround, at least when printing through logging:
This fixes the performance issue we see when using logging. |
Tracked this down a bit more. Here are the hotspots I've found: HSplit._divide_heights. This is called very frequently when there are lots of things printing to the screen. This information doesn't change frequently (ever?) so caching it would save quite a bit. All the CPU time appears to be sent calling preferred_height for the children. I'm guessing that preferred_height can be memoized within each object. As a coarse optimization I changed _divide_heights to only recompute once a second and that provided 2x gain (20% down to ~13%). In HSplit.write_to_screen, if I completely skip the "# Draw child panes. " loop, everything renders correctly AFAICT but there's a reduction in the CPU time from 20% to 5% for it. I suspect that the prompt session is creating a complex layout that's expensive to render & it's not memoizing as much as it can in various places. |
One easy top-level fix is to buffer in-between frames in patch_stdout. Here's a stderr handler I direct my stdout logging StreamHandler to: https://gist.github.com/vlovich/9c5aecda35a287276455e0a6323e9ba8 It requires running the prompt toolkit in asyncio but I'm sure there's a way to adopt something like this natively within patch_stdout (or lower). |
patch_stdout slows down the program (by ~5x) if there's a lot of output which seems really large considering that what it's trying to do conceptually doesn't seem too complicated.
time python3 test.py
Without patch_stdout: ~0.2s
With patch_stdout: ~1s (including getting hit by #681).
Attached captured profile but viewing it in snakeviz doesn't seem to immediately point me to the source of performance issues (aside from that it does look like there's a hotspot). Maybe run_until_complete in patch_stdout prints 1 line at a time instead of all lines that have been buffered?
profile.bin.zip
profile_without_patch_stdout.bin.zip
The text was updated successfully, but these errors were encountered: