
gh-70278: Fix PyUnicode_FromFormat() with precision for %s and %V #120365


Merged

Member

@serhiy-storchaka serhiy-storchaka commented Jun 11, 2024

PyUnicode_FromFormat() no longer produces a trailing \ufffd character when a precision used with %s or %V cuts a C string in the middle of a multibyte sequence. It now truncates the string before the start of the incomplete multibyte sequence.
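
To illustrate the truncation rule described above, here is a minimal standalone sketch in C. It is not the CPython implementation: the helper names utf8_seq_len and truncated_utf8_len are hypothetical, and the check is simplified (it looks only at the lead byte and does not verify that the partial bytes form a valid UTF-8 prefix).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: total length of a UTF-8 sequence implied by its
 * lead byte, or 0 if the byte cannot start a sequence. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Hypothetical helper: number of bytes of `str` to keep when at most
 * `precision` bytes may be read. If the precision cuts a multibyte
 * sequence, the result stops just before that sequence. */
static size_t truncated_utf8_len(const char *str, size_t precision)
{
    assert(str != NULL);

    /* Bounded scan: stop at the precision or at the NUL terminator. */
    size_t length = 0;
    while (length < precision && str[length]) {
        length++;
    }
    if (length < precision) {
        return length;  /* NUL reached first: the precision cut nothing */
    }

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    size_t i = length;
    size_t back = 0;
    while (i > 0 && back < 3 && ((unsigned char)str[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) {
        return length;  /* no lead byte found: leave the bytes to the decoder */
    }

    size_t need = utf8_seq_len((unsigned char)str[i - 1]);
    size_t have = length - (i - 1);
    if (need > have) {
        return i - 1;   /* incomplete trailing sequence: drop it */
    }
    return length;
}
```

For the bytes of "a€" in UTF-8 ("a\xE2\x82\xAC"), this sketch keeps 1 byte for a precision of 2 or 3, and all 4 bytes for a precision of 4 or more.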

    }
    else {
        length = 0;
        while (length < precision && str[length]) {
            length++;
        }
        pconsumed = (length < precision) ? NULL : &consumed;
Member

@vstinner vstinner Jun 12, 2024


Can you add a comment explaining why you set pconsumed? Explain the expected behavior: an incomplete sequence at the end is truncated, but an invalid sequence in the middle is still replaced. Something like that :-) You can add a reference to the issue gh-70278.
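
The distinction drawn in this comment (drop an incomplete sequence at the end, replace an invalid one in the middle) can be sketched with a small standalone decoder. This is only an illustration under stated assumptions, not CPython's decoder: decode_utf8_sketch and its cut_by_precision flag are hypothetical names, and the sketch skips the overlong and surrogate checks a real UTF-8 decoder performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: decode `len` bytes of UTF-8 from `s` into `out`,
 * which must have room for `len` code points. Returns the number of code
 * points written. When `cut_by_precision` is nonzero, an incomplete
 * sequence at the very end is dropped; any other malformed input is
 * replaced with U+FFFD. */
static size_t decode_utf8_sketch(const unsigned char *s, size_t len,
                                 int cut_by_precision, uint32_t *out)
{
    assert(s != NULL && out != NULL);

    size_t n = 0;
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t need;   /* continuation bytes required after the lead byte */
        uint32_t cp;

        if (b < 0x80) { out[n++] = b; i++; continue; }
        else if (b >= 0xC2 && b <= 0xDF) { need = 1; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { need = 2; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { need = 3; cp = b & 0x07; }
        else { out[n++] = 0xFFFD; i++; continue; }  /* invalid lead byte */

        /* Consume as many continuation bytes as are present and needed. */
        size_t got = 0;
        while (got < need && i + 1 + got < len
               && (s[i + 1 + got] & 0xC0) == 0x80) {
            cp = (cp << 6) | (s[i + 1 + got] & 0x3F);
            got++;
        }

        if (got == need) {
            out[n++] = cp;           /* complete sequence */
            i += 1 + need;
        }
        else if (i + 1 + got == len && cut_by_precision) {
            break;                   /* incomplete at the end: truncate */
        }
        else {
            out[n++] = 0xFFFD;       /* malformed in the middle: replace */
            i += 1 + got;
        }
    }
    return n;
}
```

Given the bytes 61 E2 82 62 E2 82 AC and a length of 6 with cut_by_precision set, the sketch yields U+0061, U+FFFD, U+0062: the broken sequence in the middle is replaced, while the cut-off one at the end is dropped. With the flag cleared, the trailing partial sequence also becomes U+FFFD.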

@vstinner
Member

Can you try to add a unit test?

@serhiy-storchaka
Member Author

There are new tests in this PR. What other cases do you want to test?

Member

@vstinner vstinner left a comment


LGTM.

For tests, GitHub didn't show me the whole diff when I clicked on a notification. I'm always confused by the UI, sorry.

@serhiy-storchaka serhiy-storchaka merged commit 6eb23b1 into python:main Jun 24, 2024
38 checks passed
@serhiy-storchaka serhiy-storchaka deleted the PyUnicode_FromFormat-truncate branch June 24, 2024 15:07
mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024
…%V (pythonGH-120365)

noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
…%V (pythonGH-120365)

estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
…%V (pythonGH-120365)
