gh-131791: Improve speed of textwrap.dedent by replacing re — #131792
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the `skip news` label instead.
This PR is a mess of optimization and new features. This PR also does not pass existing tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> ( This was optimized iteratively using Claude + ChatGPT models )

Yes, we noticed 😉. This isn't an acceptable use of AI for us.

If we actually are getting a 4x speedup (probably due to the lack of regex) and can pass all tests, then this does seem like something we could maybe investigate adding. But please remove the added `only_whitespace` parameter.
Misc/NEWS.d/next/Library/2025-03-27-04-35-17.gh-issue-131792.UtGg3O.rst
@Marius-Juston I agree with the other reviewers that there are too many changes in this PR. But it is interesting that an AI can generate good ideas (and I appreciate you mentioning this in the corresponding issue). Based on the ideas I tried some variations, and I do think it is worthwhile to pursue this. Sometimes it is better to split a PR with many changes (like this one) into smaller steps (not sure we can do that here though). For example, with only commit 2a93206 we can eliminate the …

(benchmark script: https://github.com/eendebakpt/python_testing/blob/fac5d2457fe894e939d76d1cd5a8991a6b5b4624/benchmarks/bm_textwrap.py)
Yeah, most of this is still AI-generated (except for the final lines, where a list comprehension was used instead of for loops). After reviewing all the comments, I was clearly overeager with the performance increase that I had initially noticed and did not perform the due diligence that I really should have done in the first place. I was really using AI as a test, to see if it would be able to find interesting or new ways to optimize the function. It might be better for me to just close this PR, do some more edits offline, verify all the tests, etc., and then reopen the PR?
Lib/textwrap.py

```python
# If all lines are blank, normalize them
if not non_blank_whites:
    return "".join(["\n" if line.endswith("\n") else "" for line in lines])
```
How often does this occur? You could leave it out, as `os.path.commonprefix([])` is `''`, which gives `margin_len=0`, so the case is correctly handled. It makes the code smaller.
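A quick check of the reviewer's point, assuming the margin is computed with `os.path.commonprefix` as in the PR:

```python
import os.path

# With no non-blank lines, the candidate list is empty; commonprefix
# returns '' for an empty sequence, so the margin length is 0 and
# nothing is stripped -- making the explicit fast path redundant.
margin = os.path.commonprefix([])
print(repr(margin), len(margin))  # '' 0
```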
With the updated code we get an approximately 3x speedup for large files:
> Which thus returns a 4x performance boost for large files

For benchmarks, please use pyperf instead of showing the minimum. The minimum doesn't help here, as we're interested in average time complexity, not best-case time complexity. What's also interesting when doing benchmarks is the geometric mean, and comparisons done on non-DEBUG builds (possibly with LTO+PGO enabled) [I don't know which kind of builds the tests were done on].
I would like to add test cases with much longer test strings than those that are given, and also cases that are multi-line and contain multiple blocks of indented text. I also want to know when a file is considered to be "large". Also, what happens when nothing should be dedented? Are we losing something when we do `dedent(dedent(text))`?

I haven't seen the benchmarks for really big files. So I suggest generating some fake tests instead, with lots of blocks that are indented and need to be dedented, or where the common margin is far away. Let's say we're twice as fast, but are we talking in terms of seconds or microseconds? For very large files, if reading + loading into memory is already the main bottleneck, then optimizing dedent() wouldn't be noticed.
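On the `dedent(dedent(text))` question, the operation is idempotent with the current implementation — a quick check (not code from the PR):

```python
from textwrap import dedent

# Once the common margin is removed, the first line has no leading
# whitespace, so a second pass finds an empty margin and is a no-op.
text = "    def f():\n        return 1\n"
once = dedent(text)
assert dedent(once) == once
print(repr(once))  # 'def f():\n    return 1\n'
```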
On the "how" this was solved: I really think it's not a good idea to use `os.path.commonpath`, as its purpose is entirely different and we shouldn't rely on this kind of magic. In addition, the docs say:

> Raise ValueError if paths contain both absolute and relative pathnames, if paths are on different drives, or if paths is empty. Unlike commonprefix(), this returns a valid path.

What if there is a text that contains absolute and relative-looking path names? This would fail. I think we need to change the approach.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`.
Based on what a previous PR did in issue #107369, I followed the same method and validated the "large" file with:

```python
import timeit
from textwrap import dedent, indent

# raw_text, dedent_faster and dedent_faster_v5 are defined elsewhere:
# raw_text is the large sample text, and the dedent_faster* functions
# are the candidate implementations compared against textwrap.dedent.

tests = []
test_names = ["abc \t", '', " ", "\t", ' \t abc', ' \t abc \t ', "' ' * 1000"]
test_indents = ["abc \t", '', " ", "\t", ' \t abc', ' \t abc \t ', ' ' * 1000]

for indent_v in test_indents:
    text = indent(raw_text, indent_v)
    tests.append(text)

# Basic indented text with empty lines
text = """
    def hello():
        print("Hello, world!")
    """
tests.append(text)
test_names.append("Basic indented text with empty lines")

# Text with mixed indentation and blank lines
text = """
    This is a test.
        Another line.
    """
tests.append(text)
test_names.append("Text with mixed indentation and blank lines")

# No indentation (edge case)
text = """No indents here.
Just normal text.
With empty lines."""
tests.append(text)
test_names.append("No indentation (edge case)")

# Only blank lines (should preserve them)
text = """

"""
tests.append(text)
test_names.append("Only blank lines")

# Edge case: No common prefix to remove
text = """hello
world
"""
tests.append(text)
test_names.append("Edge case: No common prefix to remove")

# Edge case: Single indented line
text = """    Only one indented line"""
tests.append(text)
test_names.append("Edge case: Single indented line")

# Edge case: Single indented (whitespace-only) line
text = """    """
tests.append(text)
test_names.append("Edge case: Single indented line only")

# Edge case: Empty text
text = """"""
tests.append(text)
test_names.append("Edge case: Empty text")

for text, name in zip(tests, test_names):
    print(f"========= Case: {name.encode('ascii')} =========")
    a = dedent(text)
    for func in [dedent_faster_v5, dedent_faster]:
        single_test = func(text)
        print(func.__name__, a == single_test)
        it = timeit.Timer(lambda: func(text))
        result = it.repeat(number=1_000)
        result.sort()
        # timeit returns seconds; report the best repeat in milliseconds
        print(f"{func.__name__} Min: {result[0] * 1000:.4f} ms")
```
When nothing should be dedented, only the empty lines are normalized, as per the documentation of `dedent`.

I am not using `os.path.commonpath`; I am using `os.path.commonprefix`, which is a plain string operation. So I believe that using this function is okay.
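The difference between the two functions can be demonstrated directly (a quick check, not code from the PR):

```python
import os.path

# commonprefix compares strings character by character; it never touches
# the filesystem and accepts any mix of "path-looking" lines.
lines = ["/usr/local/lib", "relative/path", "/usr/bin"]
print(os.path.commonprefix(lines))  # '' -- no shared leading characters

# commonpath, by contrast, validates its input and raises ValueError for
# a mix of absolute and relative paths -- the failure mode raised above.
try:
    os.path.commonpath(["/absolute", "relative"])
except ValueError as exc:
    print("commonpath rejected the input:", exc)
```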
I was using Issue #107369 as a reference, so I was doing something similar to them. As for the build for the speed comparison, my Python environment is an Anaconda 3.13.2; however, I will make sure to set up the comparisons in a more proper environment for validation. Is there a quick link for how to set that up, or how would you like it to be set up? I assume I should just be doing:

```shell
./configure --with-pydebug && make -j
./python speedtest.py
```
Oh right, my bad. I incorrectly read the function being used. Well, I still think it's kind of weird to use that function, but if it works well for any strings, maybe we can keep it?
The method used in the linked PR is fine, but I don't see those numbers for this specific file (maybe I missed it, but I can't find it in your list). That's why I suggest using pyperf.

You should run your benchmarks through pyperf, where bench.py is a benchmark script (see https://github.com/psf/pyperf). An example of benchmarking:

```python
import os
import random
import uuid

import pyperf

if __name__ == '__main__':
    runner = pyperf.Runner()
    runner.bench_func('uuid1()', uuid.uuid1)
    node = random.getrandbits(48)
    runner.bench_func('uuid1(node, None)', uuid.uuid1, node)
    clock_seq = random.getrandbits(14)
    runner.bench_func('uuid1(None, clock_seq)', uuid.uuid1, None, clock_seq)
    ns = uuid.NAMESPACE_DNS
    runner.bench_func('uuid3(NAMESPACE_DNS, os.urandom(16))',
                      uuid.uuid3, ns, os.urandom(16))
    runner.bench_func('uuid3(NAMESPACE_DNS, os.urandom(1024))',
                      uuid.uuid3, ns, os.urandom(1024))
    runner.bench_func('uuid4()', uuid.uuid4)
    runner.bench_func('uuid5(NAMESPACE_DNS, os.urandom(16))',
                      uuid.uuid5, ns, os.urandom(16))
    runner.bench_func('uuid5(NAMESPACE_DNS, os.urandom(1024))',
                      uuid.uuid5, ns, os.urandom(1024))
    runner.bench_func('uuid8()', uuid.uuid8)
```

In your case, you should use different inputs for `dedent`.
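Adapting that pattern to this PR could look like the sketch below (a hypothetical benchmark script — the labels and inputs are illustrative, and pyperf is a third-party dependency):

```python
from textwrap import dedent, indent

if __name__ == '__main__':
    import pyperf  # third-party: python -m pip install pyperf

    runner = pyperf.Runner()
    # A repetitive body indented a few representative ways.
    base = "line one\nline two\n\n" * 50
    for label, prefix in [("spaces", "    "), ("tab", "\t"), ("deep", " " * 64)]:
        runner.bench_func(f'dedent/{label}', dedent, indent(base, prefix))
```

Run it once per implementation with `-o result.json` and compare the JSON files with `pyperf compare_to`.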
@Marius-Juston are you confident that this PR now represents entirely your own work, with no contribution from a large language model? For copyright and licensing reasons, we require you to sign the Contributor License Agreement, and potentially copyrightable code generated by an LLM puts us in difficult territory. Please see also our AI policy.
Yes, this is now entirely my own code.
@Marius-Juston Some more high-level feedback on your PR.

Performance: Based on the results so far and my own tests, I suspect performance will be good. But please benchmark as suggested by @picnixz using pyperf (here is a possible starting point: bm_textwrap). Also for smaller strings there is value in improving performance. I recall …

Readability: The implementation in this PR is really different from the original one, but I think it is quite readable. The final expression is a bit hard to read; maybe that can be split up? And I would omit the fast path for empty text (who dedents an empty string?).

Memory requirements: We will always require memory for the input text and the output text. In addition, for this PR we have …
Am I being stupid or doing something wrong with this benchmark? https://gist.github.com/Marius-Juston/6947a794937a6e46c0164dbb9058c8af

This is what I am getting back; most importantly:

```
./python -m pyperf compare_to dedent.json dedent2.json
b'': Mean +- std dev: [dedent] 8.46 ms +- 0.40 ms -> [dedent2] 8.34 ms +- 0.38 ms: 1.01x faster
b' ': Mean +- std dev: [dedent] 13.8 ms +- 0.6 ms -> [dedent2] 13.6 ms +- 0.6 ms: 1.02x faster
b'\t': Mean +- std dev: [dedent] 13.0 ms +- 0.6 ms -> [dedent2] 13.0 ms +- 0.6 ms: 1.00x slower
b' \t abc': Mean +- std dev: [dedent] 14.7 ms +- 0.6 ms -> [dedent2] 15.0 ms +- 0.8 ms: 1.02x slower
b' \t abc \t ': Mean +- std dev: [dedent] 16.1 ms +- 0.7 ms -> [dedent2] 16.2 ms +- 0.6 ms: 1.00x slower
Benchmark hidden because not significant (1): b'abc \t'
Geometric mean: 1.00x slower
```

Not sure if I am doing something stupid. These are the commands I ran, switching the implementation between runs:

```shell
./configure -q --enable-optimizations && make -s -j
./python -m ensurepip --upgrade               # install pip since it does not exist
./python -m pip install pyperf                # install pyperf
./python tests/bm_dedent.py -o dedent.json
./python tests/bm_dedent.py -o dedent2.json   # rerun afterwards with the file changed for dedent2
./python -m pyperf compare_to dedent.json dedent2.json
```
This is the resulting output:
Can you give me the full benchmark script you used, please? Also, there's something odd happening, as it says "ERROR when testing if values are significant". Now, I've run some benchmarks on my side and they are indeed promising (after in-lining):

```
+------------------------+---------+-----------------------+
| Benchmark              | ref     | new                   |
+========================+=========+=======================+
| empty                  | 101 ns  | 28.2 ns: 3.58x faster |
+------------------------+---------+-----------------------+
| whitespaces            | 272 ns  | 578 ns: 2.13x slower  |
+------------------------+---------+-----------------------+
| 1000 empty lines       | 2.63 us | 4.62 us: 1.76x slower |
+------------------------+---------+-----------------------+
| no indented paragraph  | 3.73 us | 1.43 us: 2.60x faster |
+------------------------+---------+-----------------------+
| indented paragraph     | 3.70 us | 1.50 us: 2.47x faster |
+------------------------+---------+-----------------------+
| no indented paragraphs | 11.5 us | 3.15 us: 3.64x faster |
+------------------------+---------+-----------------------+
| indented paragraphs    | 24.5 us | 6.97 us: 3.52x faster |
+------------------------+---------+-----------------------+
| large objects          | 49.7 us | 23.3 us: 2.14x faster |
+------------------------+---------+-----------------------+
| Geometric mean         | (ref)   | 1.90x faster          |
+------------------------+---------+-----------------------+
```

What is, however, a bit unfortunate is that we are relying on a function whose name doesn't help here, and I hope that its behaviour won't change:

```python
# Return the longest prefix of all list elements.
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    # Some people pass in a list of pathname parts to operate in an OS-agnostic
    # fashion; don't try to translate in that case as that's an abuse of the
    # API and they are already doing what they need to be OS-agnostic and so
    # they most likely won't be using an os.PathLike object in the sublists.
    if not isinstance(m[0], (list, tuple)):
        m = tuple(map(os.fspath, m))
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1
```

We can definitely skip the …
It's specified as behaving as it does, and if the behaviour does change, we should notice it with the …
I think it's better to inline the code or create a small helper function. This reduces the need for an import and for dead code (the …).
Here is the code: https://gist.github.com/Marius-Juston/6947a794937a6e46c0164dbb9058c8af. Yeah, I know the error is about testing whether the values are significant; however, the parameters I set did not seem to have much impact on fixing that (the inner-loops parameter just made the tests run for 0 µs even though I had set it to 10_000_000, so I am thinking something wrong is on my side).
@Marius-Juston Your script seems to work on my system (and I get a speedup for the PR). The warning …
You are right; I was under the impression that this was a C-optimized function. Since it is not, we can further optimize this by removing the …
I've opened #131919 as a sketch of an alternative, which avoids some allocations by using …
Closing in favour of #131919.
Current code:

The new implementation can give up to a 4x speed improvement on larger files.

Returning the following:

This thus yields a 4x performance boost for large files. Another advantage is that this method allows people to remove all common prefixes from the file as an option, instead of just whitespace. Another advantage is that it does not use regular expressions.
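A minimal regex-free sketch of the idea described above (illustrative only, not the code in this PR — assuming the margin is found with `os.path.commonprefix` over the non-blank lines, as discussed in the review):

```python
import os.path

def dedent_sketch(text):
    """Regex-free dedent: strip the longest common leading whitespace."""
    lines = text.split('\n')
    # Whitespace-only lines are normalized to empty lines and excluded
    # from the margin computation, mirroring textwrap.dedent's docs.
    non_blank = [line for line in lines if line.strip()]
    margin = os.path.commonprefix(non_blank)
    # commonprefix may include shared non-whitespace characters;
    # only the leading whitespace part counts as the margin.
    n = len(margin) - len(margin.lstrip())
    return '\n'.join(line[n:] if line.strip() else '' for line in lines)
```

For common inputs this matches `textwrap.dedent`, though exotic corner cases (e.g. lines containing whitespace outside space/tab) can differ from the regex-based original.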
( This was optimized iteratively using Claude + ChatGPT models )