
gh-131791: Improve speed of textwrap.dedent by replacing re #131792


Closed
wants to merge 18 commits into from
45 changes: 8 additions & 37 deletions Lib/textwrap.py
@@ -6,6 +6,7 @@
 # Written by Greg Ward <[email protected]>
 
 import re
+import os
 
 __all__ = ['TextWrapper', 'wrap', 'fill', 'dedent', 'indent', 'shorten']
 
@@ -413,9 +414,6 @@ def shorten(text, width, **kwargs):
 
 # -- Loosely related functionality -------------------------------------
 
-_whitespace_only_re = re.compile('^[ \t]+$', re.MULTILINE)
-_leading_whitespace_re = re.compile('(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)
-
 def dedent(text):
     """Remove any common leading whitespace from every line in `text`.
 
@@ -429,42 +427,15 @@ def dedent(text):
 
     Entirely blank lines are normalized to a newline character.
     """
-    # Look for the longest leading string of spaces and tabs common to
-    # all lines.
-    margin = None
-    text = _whitespace_only_re.sub('', text)
-    indents = _leading_whitespace_re.findall(text)
-    for indent in indents:
-        if margin is None:
-            margin = indent
-
-        # Current line more deeply indented than previous winner:
-        # no change (previous winner is still on top).
-        elif indent.startswith(margin):
-            pass
-
-        # Current line consistent with and no deeper than previous winner:
-        # it's the new winner.
-        elif margin.startswith(indent):
-            margin = indent
-
-        # Find the largest common whitespace between current line and previous
-        # winner.
-        else:
-            for i, (x, y) in enumerate(zip(margin, indent)):
-                if x != y:
-                    margin = margin[:i]
-                    break
+    if not text:
+        return text
+
+    lines = text.split("\n")
 
-    # sanity check (testing/debugging only)
-    if 0 and margin:
-        for line in text.split("\n"):
-            assert not line or line.startswith(margin), \
-                   "line = %r, margin = %r" % (line, margin)
+    margin = os.path.commonprefix([line for line in lines if line.strip()])
+    margin_len = len(margin) - len(margin.lstrip())
 
-    if margin:
-        text = re.sub(r'(?m)^' + margin, '', text)
-    return text
+    return "\n".join([line[margin_len:] if line.strip() else "" for line in lines])
 
 
 def indent(text, prefix, predicate=None):
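The added margin computation leans on a detail of `os.path.commonprefix()`: it compares its arguments character by character rather than by path component, so for indented code the shared prefix can run past the indentation into text the lines happen to share. The sketch below isolates those two lines in a helper (`common_margin_len` is a hypothetical name, not part of the PR) to show why `len(margin) - len(margin.lstrip())` is needed to trim the prefix back to its leading whitespace.

```python
import os


def common_margin_len(lines):
    """Return the length of the leading whitespace shared by all non-blank lines.

    Mirrors the two margin lines added to dedent() in this PR;
    the helper name itself is hypothetical.
    """
    # Whitespace-only lines are ignored, as in the new dedent().
    non_blank = [line for line in lines if line.strip()]
    # commonprefix() compares strings character by character, so the shared
    # prefix can extend past the indentation into common text.
    prefix = os.path.commonprefix(non_blank)
    # Keep only the whitespace at the start of that prefix.
    return len(prefix) - len(prefix.lstrip())


print(repr(os.path.commonprefix(["    def f():", "    def g():"])))  # '    def '
print(common_margin_len(["    def f():", "    def g():"]))  # 4
print(common_margin_len(["  a", "", "    b"]))  # 2
print(common_margin_len(["\tx", "  y"]))  # 0 (tab vs. space: no shared margin)
```

One behavioral difference looks worth a regression test: `str.strip()` and `str.lstrip()` without arguments treat every Unicode whitespace character as whitespace, whereas the removed regexes only considered spaces and tabs. Lines that all begin with a form feed, for example, get a margin of 1 here, while the old code would have left them unchanged.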
@@ -0,0 +1 @@
+Optimized :func:`textwrap.dedent`. It is now 2x faster than before for large inputs.
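The "2x faster ... for large inputs" claim in the NEWS entry can be checked locally with a rough `timeit` harness along these lines. It is a sketch under stated assumptions: `dedent_old` is reconstructed from the lines this PR removes (comments and the disabled sanity check dropped), `dedent_new` copies the lines it adds, and both names, like the repeated sample input, are hypothetical. Timings depend on the machine, the Python version, and the shape of the input, so no expected figures are given.

```python
import os
import re
import timeit

# Regexes removed by the PR, reproduced for the "before" version.
_whitespace_only_re = re.compile('^[ \t]+$', re.MULTILINE)
_leading_whitespace_re = re.compile('(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)


def dedent_old(text):
    # Hypothetical name: the pre-PR algorithm, reconstructed from the diff.
    margin = None
    text = _whitespace_only_re.sub('', text)
    indents = _leading_whitespace_re.findall(text)
    for indent in indents:
        if margin is None:
            margin = indent
        elif indent.startswith(margin):
            pass  # deeper indent: current margin still wins
        elif margin.startswith(indent):
            margin = indent  # shallower, consistent indent: new margin
        else:
            # Inconsistent indents: keep their longest common part.
            for i, (x, y) in enumerate(zip(margin, indent)):
                if x != y:
                    margin = margin[:i]
                    break
    if margin:
        text = re.sub(r'(?m)^' + margin, '', text)
    return text


def dedent_new(text):
    # Hypothetical name: the algorithm added by the PR.
    if not text:
        return text
    lines = text.split("\n")
    margin = os.path.commonprefix([line for line in lines if line.strip()])
    margin_len = len(margin) - len(margin.lstrip())
    return "\n".join([line[margin_len:] if line.strip() else "" for line in lines])


# Hypothetical "large input": an indented block repeated many times.
sample = "    def f(x):\n        return x\n\n" * 10_000
assert dedent_old(sample) == dedent_new(sample)

for fn in (dedent_old, dedent_new):
    seconds = timeit.timeit(lambda: fn(sample), number=20)
    print(f"{fn.__name__}: {seconds:.3f} s for 20 calls")
```

Checking that both versions agree on the benchmark input before timing them guards against comparing the speed of two functions that do different things.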