gh-119182: Optimize PyUnicode_FromFormat() #120796

vstinner · 2024-06-20T12:50:41Z

Use strchr() and ucs1lib_find_max_char() to optimize the code path formatting sub-strings between '%' formats.

Issue: [C API] Add an efficient public PyUnicodeWriter API #119182
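For readers unfamiliar with the pattern, the idea in the description can be sketched in standalone C. This is only an illustration, not the code in Objects/unicodeobject.c: `literal_run_length()`, `find_max_byte()` and `run_is_ascii()` are hypothetical helpers, and `find_max_byte()` is a naive stand-in for the internal `ucs1lib_find_max_char()`, whose real implementation is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a CPython API): length of the literal run at the
 * start of `f`, i.e. the bytes before the next '%', or before the
 * terminating NUL when no '%' remains. */
static size_t
literal_run_length(const char *f)
{
    assert(f != NULL);
    /* A single strchr() call replaces a hand-written byte-by-byte loop. */
    const char *p = strchr(f, '%');
    return (p != NULL) ? (size_t)(p - f) : strlen(f);
}

/* Naive stand-in for ucs1lib_find_max_char(): largest byte in [begin, end).
 * Assumption: only the "maximum value" result matters for this sketch. */
static unsigned char
find_max_byte(const unsigned char *begin, const unsigned char *end)
{
    unsigned char max = 0;
    for (const unsigned char *p = begin; p < end; p++) {
        if (*p > max) {
            max = *p;
        }
    }
    return max;
}

/* Return 1 if the first `len` bytes of `f` are all below 128, else 0. */
static int
run_is_ascii(const char *f, size_t len)
{
    const unsigned char *b = (const unsigned char *)f;
    return find_max_byte(b, b + len) < 128;
}
```

With these helpers, `literal_run_length("abc%d")` is 3, and `run_is_ascii()` on that 3-byte run returns 1, which is the condition under which a writer could take an ASCII-only path for the run.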

vstinner · 2024-06-20T12:52:00Z

Results of the benchmark from #120248 (comment):

$ python3 -m pyperf compare_to ref.json change.json --table
+----------------+--------+----------------------+
| Benchmark      | ref    | change               |
+================+========+======================+
| bench 30       | 219 ns | 202 ns: 1.08x faster |
+----------------+--------+----------------------+
| bench 100      | 262 ns | 216 ns: 1.21x faster |
+----------------+--------+----------------------+
| Geometric mean | (ref)  | 1.09x faster         |
+----------------+--------+----------------------+

Benchmark hidden because not significant (1): bench 3

serhiy-storchaka · 2024-06-20T13:51:31Z

Objects/unicodeobject.c


-            if (_PyUnicodeWriter_WriteASCIIString(writer, f, len) < 0)
+            int is_ascii = (ucs1lib_find_max_char((Py_UCS1*)f, (Py_UCS1*)f + len) < 128);


You can run it once, for the whole format string.

vstinner (reply): Right. I modified my PR to run it only once, good idea!
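The review suggestion above can be illustrated with a small standalone sketch that hoists the scan out of the per-run loop. It is not the merged code: `bytes_are_ascii()` and `count_ascii_fast_path_runs()` are hypothetical names, directive parsing is reduced to "'%' plus one byte", and the counter merely stands in for choosing an ASCII write path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a max-char scan: 1 if all bytes are < 128. */
static int
bytes_are_ascii(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)s[i] >= 128) {
            return 0;
        }
    }
    return 1;
}

/* Count the literal runs that would take the ASCII fast path when the
 * ASCII check is done once for the whole format string. */
static size_t
count_ascii_fast_path_runs(const char *fmt)
{
    assert(fmt != NULL);
    /* One scan up front instead of one scan per literal run. */
    int is_ascii = bytes_are_ascii(fmt, strlen(fmt));

    size_t fast = 0;
    const char *f = fmt;
    while (*f != '\0') {
        if (*f == '%') {
            /* Simplification: a directive is '%' plus one following byte. */
            f += (f[1] != '\0') ? 2 : 1;
            continue;
        }
        /* Literal run: everything up to the next '%' or the end. */
        const char *p = strchr(f, '%');
        size_t len = (p != NULL) ? (size_t)(p - f) : strlen(f);
        if (is_ascii) {
            fast++;  /* this run would be written as ASCII */
        }
        /* Otherwise the run would go through whatever non-ASCII handling applies. */
        f += len;
    }
    return fast;
}
```

In this sketch the single check is conservative: one non-ASCII byte anywhere in the format string disables the fast path for every run, including runs that are themselves pure ASCII. The trade-off is one pass over the string instead of a separate scan per run.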

vstinner · 2024-06-20T20:01:50Z

Merged, thanks for the reviews @serhiy-storchaka.

pythongh-119182: Optimize PyUnicode_FromFormat()

818ea52

vstinner added the skip news label Jun 20, 2024

bedevere-app bot mentioned this pull request Jun 20, 2024

[C API] Add an efficient public PyUnicodeWriter API #119182

Closed

bedevere-app bot added the awaiting core review label Jun 20, 2024

vstinner mentioned this pull request Jun 20, 2024

gh-119182: Decode PyUnicode_FromFormat() format string from UTF-8 #120248

Closed

serhiy-storchaka reviewed Jun 20, 2024

Run ucs1lib_find_max_char() only once

57d7152

serhiy-storchaka reviewed Jun 20, 2024

Add comment

614e132

vstinner enabled auto-merge (squash) June 20, 2024 18:58

vstinner merged commit 5150795 into python:main Jun 20, 2024
34 checks passed

vstinner deleted the optim_from_fmt branch June 20, 2024 19:06

bedevere-app bot removed the awaiting core review label Jun 20, 2024

mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024

pythongh-119182: Optimize PyUnicode_FromFormat() (python#120796)

c818e2c

noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024

pythongh-119182: Optimize PyUnicode_FromFormat() (python#120796)

fb42a9e

estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024

pythongh-119182: Optimize PyUnicode_FromFormat() (python#120796)

ff60c66
