Improve performance of find_max_char #122901

Open
ruema opened this issue Aug 11, 2024 · 5 comments
Labels
performance Performance or resource usage type-feature A feature request or enhancement

Comments

@ruema

ruema commented Aug 11, 2024

Feature or enhancement

Proposal:

find_max_char is called every time a string is created.
By reducing the number of checks it performs, its performance can be improved considerably.

python -m pyperf timeit -s "b = 'a'*1000 + '\u019f'*2" "b[:-1]"
+-----------+--------+----------------------+
| Benchmark | ref    | patch                |
+===========+========+======================+
| timeit    | 442 ns | 388 ns: 1.14x faster |
+-----------+--------+----------------------+
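For context, find_max_char scans a string's code points to find the maximum, which determines the storage width of CPython's compact unicode objects (PEP 393). A pure-Python sketch of what it computes (an illustration only, not the C implementation, which works over raw buffers):

```python
def find_max_char(s: str) -> int:
    """Return the highest code point in s. Under PEP 393 this decides
    the per-character storage width of a compact unicode object:
    <= 0xFF -> 1 byte, <= 0xFFFF -> 2 bytes, otherwise 4 bytes."""
    max_char = 0
    for ch in s:
        cp = ord(ch)
        if cp > max_char:
            max_char = cp
            if max_char > 0xFFFF:  # already the widest kind; stop early
                break
    return max_char

def storage_width(s: str) -> int:
    """Bytes per character for the compact representation of s."""
    m = find_max_char(s)
    return 1 if m <= 0xFF else 2 if m <= 0xFFFF else 4
```

This is why the slicing benchmark above exercises the function: `b[:-1]` creates a new string object, whose representation must be recomputed from its contents.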

Has this already been discussed elsewhere?

This is a minor feature that does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@ruema ruema added the type-feature A feature request or enhancement label Aug 11, 2024
@aisk aisk added the performance Performance or resource usage label Aug 11, 2024
@terryjreedy
Member

@vstinner Should you or someone else look at this? Should the OP say more about the idea behind the patch? Is the above the proper speed test?

@picnixz
Member

picnixz commented Aug 13, 2024

I would also look at the arguments in #120212.

Personally, I would include in the benchmarks:

  • ASCII data only
  • lots of non-ASCII data
  • a bit of non-ASCII data
  • short, medium and very long strings

By the way, is your build a release or a PGO (maybe with LTO) build?

@vstinner
Member

@ruema: Would you mind running more benchmarks? See @picnixz's comment above about that.

@rhpvorderman
Contributor

This is basically the same as #120212 but with larger chunks. Given that this approach unifies the find_max_char functions for all
char sizes, from a maintainability standpoint it is better than #120212.

I would add benchmarks for line-by-line reading; for line in file_descriptor is a very common pattern.
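A sketch of such a line-by-line reading benchmark with stdlib tools (file size and contents are illustrative; decoding each line builds a new string, so the max-char scan runs per line):

```python
import os
import tempfile
import time

def bench_line_reading(num_lines: int = 100_000) -> float:
    """Time iterating over a temporary UTF-8 text file line by line."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as f:
        # Mostly-ASCII lines with one non-ASCII character each.
        f.write("some ascii text with a snowman \u2603\n" * num_lines)
        path = f.name
    try:
        start = time.perf_counter()
        with open(path, encoding="utf-8") as fh:
            count = sum(1 for _ in fh)
        elapsed = time.perf_counter() - start
        assert count == num_lines
        return elapsed
    finally:
        os.remove(path)
```

Varying the line contents (pure ASCII vs. mixed) would show how the patch behaves on this pattern.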

@vstinner
Member

Would it be possible to have a few more benchmarks on this change, to get an idea of whether it's worth it?

Projects
None yet
Development

No branches or pull requests

6 participants