-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
gh-107369: optimize textwrap.indent() #107374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Using str.split
for the predicate instead of line.strip
might change something for input that is not str
, but I think this is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lstrip
is faster for non-indented lines.
I wonder whether the following variants can be faster for some input and for how wide category of input.
def predicate(line):
return line and (not line[0].isspace() or line.lstrip())
or
predicate = re.compile(r'\S').search
Since we use In case of unicodeobject.c, rstrip is bit faster. But it may be because most lines are indented already. So I chose str.lstrip here, as Serhiy suggested. |
Now that you mention it, I can see that using We want to test whether the line has any non-space character. Algorithmically, |
Because
lstrip() is slow when every line has long indent. But With 4c6a46a and https://gist.github.com/methane/5c6153c564d9508199a81c48d33161eb
If I add
|
To maximize performance, we can stop using lambda by...:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your research Inada-san. Which to use here, lstrip or isspace, I leave up to you. It does not really matter in most cases.
For very long texts, I think changing prefixed_lines = []
for line in text.splitlines(True):
if not line.isspace():
prefixed_lines.append(prefix)
prefixed_lines.append(line) into the following may improve the overall performances prefixed_lines = []
append_line = prefixed_lines.append
for line in text.splitlines(True):
if not line.isspace():
append_line(prefix)
append_line(line) EDIT: After a more careful benchmarking, this does not seem to bring more improvements. However, not using a lambda function seems to be better. |
indent()-ing
Object/unicodeobject.c
(15332 lines) about 25% faster.