Improve performance of find_max_char #122901

Open
ruema opened this issue Aug 11, 2024 · 5 comments
Labels
performance Performance or resource usage type-feature A feature request or enhancement

Comments

@ruema

ruema commented Aug 11, 2024

Feature or enhancement

Proposal:

find_max_char is called every time a string is created.
By reducing the number of checks it performs, its performance can be improved considerably.

python -m pyperf timeit -s "b = 'a'*1000 + '\u019f'*2" "b[:-1]"
+-----------+--------+----------------------+
| Benchmark | ref    | patch                |
+===========+========+======================+
| timeit    | 442 ns | 388 ns: 1.14x faster |
+-----------+--------+----------------------+
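For context, find_max_char scans a string's code points to find the maximum, which determines the storage width of CPython's compact unicode objects (PEP 393). A pure-Python sketch of what it computes (an illustration only, not the C implementation, which works over raw buffers):

```python
def find_max_char(s: str) -> int:
    """Return the highest code point in s. Under PEP 393 this decides
    the per-character storage width of a compact unicode object:
    <= 0xFF -> 1 byte, <= 0xFFFF -> 2 bytes, otherwise 4 bytes."""
    max_char = 0
    for ch in s:
        cp = ord(ch)
        if cp > max_char:
            max_char = cp
            if max_char > 0xFFFF:  # already the widest kind; stop early
                break
    return max_char

def storage_width(s: str) -> int:
    """Bytes per character for the compact representation of s."""
    m = find_max_char(s)
    return 1 if m <= 0xFF else 2 if m <= 0xFFFF else 4
```

This is why the slicing benchmark above exercises the function: `b[:-1]` creates a new string object, whose representation must be recomputed from its contents.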

Has this already been discussed elsewhere?

This is a minor feature that does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@ruema ruema added the type-feature A feature request or enhancement label Aug 11, 2024
@aisk aisk added the performance Performance or resource usage label Aug 11, 2024
@terryjreedy
Member

@vstinner Should you or someone else look at this? Should the OP say more about the idea behind the patch? Is the above the proper speed test?

@picnixz
Member

picnixz commented Aug 13, 2024

I would also look at the arguments in #120212.

Personally, I would include in the benchmarks:

  • ASCII data only
  • lots of non-ASCII data
  • a bit of non-ASCII data
  • short, medium and very long strings

By the way, is your build a release or a PGO (maybe with LTO) build?

@vstinner
Member

@ruema: Would you mind running more benchmarks? See @picnixz's comment above about that.

@rhpvorderman
Contributor

This is basically the same as #120212 but with larger chunks. Given that this approach unifies the find_max_char functions for all
char sizes, from a maintainability standpoint it is better than #120212.

I would add benchmarks for line-by-line reading; for line in file_descriptor is a very common pattern.
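A sketch of such a line-by-line reading benchmark with stdlib tools (file size and contents are illustrative; decoding each line builds a new string, so the max-char scan runs per line):

```python
import os
import tempfile
import time

def bench_line_reading(num_lines: int = 100_000) -> float:
    """Time iterating over a temporary UTF-8 text file line by line."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as f:
        # Mostly-ASCII lines with one non-ASCII character each.
        f.write("some ascii text with a snowman \u2603\n" * num_lines)
        path = f.name
    try:
        start = time.perf_counter()
        with open(path, encoding="utf-8") as fh:
            count = sum(1 for _ in fh)
        elapsed = time.perf_counter() - start
        assert count == num_lines
        return elapsed
    finally:
        os.remove(path)
```

Varying the line contents (pure ASCII vs. mixed) would show how the patch behaves on this pattern.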

@vstinner
Member

Would it be possible to have a few more benchmarks on this change, to get an idea of whether it's worth it?

Projects
None yet
Development

No branches or pull requests

6 participants