patch_stdout is surprisingly slow #682

vlovich · 2018-07-12T18:11:43Z

patch_stdout slows the program down by about 5x when there's a lot of output. That seems excessive, considering that what it's trying to do is conceptually not that complicated.

from prompt_toolkit.completion import WordCompleter
from prompt_toolkit.eventloop.defaults import use_asyncio_event_loop
from prompt_toolkit.patch_stdout import patch_stdout
from prompt_toolkit.shortcuts import CompleteStyle
from prompt_toolkit.shortcuts.prompt import prompt
from threading import Thread
from prompt_toolkit.application import get_app

import sys

def spam_print():
  i = 1
  while True:
    print("This is spam", i)
    i += 1
    if i == 1000:
      try:
        get_app(raise_exception=True).exit(exception=KeyboardInterrupt())
      except:
        sys.exit(0)
      return

thread = None

while True:
  with patch_stdout():
  # with open("/dev/null"):
    if not thread:
      thread = Thread(target=spam_print)
      thread.start()
    prompt("> ", rprompt="test", complete_style=CompleteStyle.MULTI_COLUMN,
            completer=WordCompleter(list(chr(c) * 20 for c in range(ord("a"), ord("z")))))

time python3 test.py

Without patch_stdout: ~0.2s
With patch_stdout: ~1s (including getting hit by #681).

I've attached a captured profile, but viewing it in snakeviz doesn't immediately point me to the source of the performance issue (although it does look like there's a hotspot). Maybe run_until_complete in patch_stdout prints one line at a time instead of all the lines that have been buffered?

profile.bin.zip

profile_without_patch_stdout.bin.zip


jonathanslenders · 2018-07-12T19:09:20Z

This largely depends on the number of print statements, not so much on the text length of the output.
For every individual write to stdout, the prompt will be erased, the output is written to the real stdout, and the prompt is rendered again. Prompt rendering will probably take most of the time here.

The rendering is highly optimized, so I don't think that part can be improved. But what we could do is use timers, and only render to stdout when no new output has been written for, say, 200ms, and then write everything at once. This will introduce a little more latency for the simple cases, and make the capturing more complex, but once we have that, using patch_stdout shouldn't affect the performance anymore.
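To make the timer idea concrete, here is a minimal sketch of a stream wrapper that collects writes and forwards them in one go after a quiet period. `DebouncedWriter` and `quiet_seconds` are hypothetical names for illustration, not part of prompt_toolkit, and this is not how patch_stdout is implemented.

```python
import threading


class DebouncedWriter:
    """Collect writes and forward them together after a quiet period.

    Hypothetical helper for illustration; not a prompt_toolkit API.
    """

    def __init__(self, target, quiet_seconds=0.2):
        self._target = target          # the real stream to write to
        self._quiet = quiet_seconds    # assumed quiet period (200ms)
        self._chunks = []
        self._timer = None
        self._lock = threading.Lock()

    def write(self, data):
        with self._lock:
            self._chunks.append(data)
            # Restart the countdown so we only flush once output goes quiet.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._quiet, self.flush)
            self._timer.daemon = True
            self._timer.start()
        return len(data)

    def flush(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            text = "".join(self._chunks)
            self._chunks.clear()
        # One write to the target means one erase/redraw of the prompt.
        if text:
            self._target.write(text)
```

Note that with a pure quiet-period timer, output that never pauses is never flushed, so a real implementation would probably also want a cap on the maximum delay.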

vlovich · 2018-07-12T22:57:25Z

I think even just batching it would help drastically, because right now it happens one line at a time (finds a line in stdout, schedules append + repaint).

vlovich · 2018-07-20T08:56:47Z

NVM. You're right. Need a timeout timer here. However, I tried adding that & it didn't seem to really help so I think there's something else going on.

vlovich · 2018-08-29T00:36:07Z

Found a workaround, at least when printing through logging:

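import logging
import sys

class PatchableStderrStream(object):
  def write(self, data):
    # Look up sys.stderr at call time rather than when the handler is created.
    sys.stderr.write(data)

  def flush(self):
    # logging.StreamHandler calls flush() after every record.
    # Uncomment for performance problem:
    # sys.stderr.flush()
    pass

stdout_ch = logging.StreamHandler(PatchableStderrStream())
logging.getLogger().addHandler(stdout_ch)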

This fixes the performance issue we see when using logging.

vlovich · 2019-02-24T07:39:44Z

Tracked this down a bit more. Here are the hotspots I've found:

HSplit._divide_heights. This is called very frequently when there are lots of things printing to the screen. This information doesn't change frequently (ever?) so caching it would save quite a bit. All the CPU time appears to be spent calling preferred_height for the children. I'm guessing that preferred_height can be memoized within each object. As a coarse optimization I changed _divide_heights to only recompute once a second and that provided 2x gain (20% down to ~13%).
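The "only recompute once a second" experiment can be sketched generically as a time-limited cache. `ThrottledCache`, `compute`, and `max_age` are hypothetical names; this does not touch prompt_toolkit internals and only illustrates the shape of the optimization.

```python
import time


class ThrottledCache:
    """Recompute a value at most once per `max_age` seconds.

    Hypothetical helper for illustration; not a prompt_toolkit API.
    """

    def __init__(self, compute, max_age=1.0, clock=time.monotonic):
        self._compute = compute    # the expensive function to wrap
        self._max_age = max_age
        self._clock = clock        # injectable so it can be tested
        self._value = None
        self._stamp = None         # time of the last recomputation

    def get(self):
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._max_age:
            self._value = self._compute()
            self._stamp = now
        return self._value
```

The trade-off is staleness: anything that changes the result (a terminal resize, for instance) would not be picked up until the cached value expires, which is why this is a coarse optimization rather than a proper fix.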

In HSplit.write_to_screen, if I completely skip the "# Draw child panes. " loop, everything renders correctly AFAICT but there's a reduction in the CPU time from 20% to 5% for it.

I suspect that the prompt session is creating a complex layout that's expensive to render & it's not memoizing as much as it can in various places.

vlovich · 2019-02-24T08:33:48Z

One easy top-level fix is to buffer in-between frames in patch_stdout. Here's a stderr handler I direct my stdout logging StreamHandler to: https://gist.github.com/vlovich/9c5aecda35a287276455e0a6323e9ba8

It requires running the prompt toolkit in asyncio but I'm sure there's a way to adopt something like this natively within patch_stdout (or lower).
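For readers who don't want to open the gist, the general idea of buffering between frames on an asyncio loop might look like the sketch below. This is not the gist's code: `FrameBufferedStream` and `interval` are hypothetical names, and the 50ms default is an arbitrary assumption.

```python
import asyncio
import threading


class FrameBufferedStream:
    """Coalesce writes and forward them once per interval on an asyncio loop.

    Hypothetical helper for illustration; not the code from the gist.
    """

    def __init__(self, target, loop, interval=0.05):
        self._target = target        # the real (or patched) stream
        self._loop = loop
        self._interval = interval    # assumed "frame" length in seconds
        self._chunks = []
        self._scheduled = False
        self._lock = threading.Lock()

    def write(self, data):
        with self._lock:
            self._chunks.append(data)
            if self._scheduled:
                return len(data)
            self._scheduled = True
        # call_soon_threadsafe lets logging threads schedule the flush safely.
        self._loop.call_soon_threadsafe(
            self._loop.call_later, self._interval, self.flush
        )
        return len(data)

    def flush(self):
        with self._lock:
            text = "".join(self._chunks)
            self._chunks.clear()
            self._scheduled = False
        if text:
            self._target.write(text)
```

An instance could be passed to logging.StreamHandler in place of the stream, so that a burst of log records results in a single write (and a single prompt redraw) per interval.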

jonathanslenders added the enhancement label Jul 12, 2018

jonathanslenders mentioned this issue Jul 12, 2018

really bad flickering when there's lots of output #680

Open

ulno mentioned this issue Jan 31, 2019

prompt-toolkit unusable on raspberry pi due to speed issues #830

Open
