gh-110913: Fix WindowsConsoleIO chunking of UTF-8 text #111007
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the
I think we should have a NEWS entry for this, just to acknowledge that we changed the fix. Something like
The change looks fine to me, but I'd like one more ACK before merging in case my head still isn't clear.
Thanks! Added the NEWS entry.
Misc/NEWS.d/next/Windows/2023-10-19-21-46-18.gh-issue-110913.CWlPfg.rst
Is it even possible to write tests for this?
Not without refactoring the entire thing (e.g. to make the actual
It's a harmless enough issue that I think we're okay. Misunderstanding the UTF-8 format is what the problem was this time, and that's been triple checked now.
I'm just wondering whether it's possible to get invalid UTF-8 here (using "surrogateescape"?). If so, we should use a more sophisticated way to find the longest valid prefix of an invalid sequence, to meet the recommendations for handling invalid UTF-8. Otherwise LGTM.
If it's undecodable by
Thanks for the review!
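To make the question above about invalid UTF-8 concrete, here is a minimal Python sketch of one way to locate where the valid part of a buffer ends. It is not code from this PR: `valid_utf8_prefix_len` is a hypothetical helper, and it relies only on the `start` attribute of `UnicodeDecodeError`. It reports the length of the valid prefix of the whole buffer, which is a simpler notion than the "maximal subpart" practice that the Unicode recommendations describe.

```python
def valid_utf8_prefix_len(data: bytes) -> int:
    """Return the length of the longest prefix of data that decodes as UTF-8.

    Hypothetical helper for illustration only; not part of CPython.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return len(data)


# "surrogateescape" can smuggle a lone invalid byte into encoded output.
raw = "ab\udcff".encode("utf-8", "surrogateescape")  # b"ab\xff"
print(valid_utf8_prefix_len(raw))
```

A truncated multi-byte sequence at the end of the buffer is reported the same way: the returned length stops just before its lead byte.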
Thanks @sorgloomer for the PR, and @zooba for merging it 🌮🎉. I'm working now to backport this PR to: 3.11, 3.12.
…H-111007) (cherry picked from commit 11312ea) Co-authored-by: Tamás Hegedűs <[email protected]>
GH-111108 is a backport of this pull request to the 3.12 branch.
GH-111109 is a backport of this pull request to the 3.11 branch.
Fix the loop that searches for a UTF-8 sequence boundary
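As a rough illustration of what searching for a UTF-8 sequence boundary means, the sketch below is written in Python rather than the C of the actual change, and `utf8_chunk_len` is a hypothetical name. It assumes the input is valid UTF-8 and relies on the fact that continuation bytes always have the bit pattern `10xxxxxx`: if the byte just past the cut is a continuation byte, the cut falls inside a multi-byte sequence and has to move back.

```python
def utf8_chunk_len(data: bytes, limit: int) -> int:
    """Return the largest n <= limit such that data[:n] ends on a UTF-8
    sequence boundary. Assumes data is valid UTF-8.

    Hypothetical helper for illustration only; not the CPython implementation.
    """
    if limit >= len(data):
        return len(data)
    n = limit
    # data[n] is the first byte left out of the chunk. While it is a
    # continuation byte (0b10xxxxxx), the chunk would split a sequence.
    while n > 0 and (data[n] & 0xC0) == 0x80:
        n -= 1
    return n


text = "aé€😀".encode("utf-8")  # sequences of 1, 2, 3 and 4 bytes
for limit in (2, 5, 9):
    n = utf8_chunk_len(text, limit)
    print(limit, n, text[:n].decode("utf-8"))
```

Checking the byte after the cut, rather than the last byte inside the chunk, is what keeps a complete trailing sequence from being dropped unnecessarily.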