Docs: C API: Clarify what happens when null bytes are passed to PyUnicode_AsUTF8
#127458
Small change to wording
Co-authored-by: Stan U. <[email protected]>
Co-authored-by: Tomas R. <[email protected]>
I'm not sure it's good to say that the function "doesn't handle" null bytes: the bytes are there and accessible, the problem is that
Yeah, "doesn't handle" in the sense that it doesn't raise an error or strip them. I don't think it's worth changing the wording, but I'll let others weigh in.
Doc/c-api/unicode.rst
`null bytes <https://en.wikipedia.org/wiki/Null_character>`_ embedded within
*unicode*. As a result, strings containing null bytes will remain in the returned
string, which some C functions might interpret as the end of the string, leading to
truncation. When handling user input, it is recommended to use :c:func:`PyUnicode_AsUTF8AndSize`
I don't think that "user input" is useful here. I would prefer to say something like: "If truncation is an issue, ..."
Co-authored-by: Victor Stinner <[email protected]>
@@ -1035,6 +1035,15 @@ These are the UTF-8 codec APIs:

As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.

.. warning::
A warning is a strong signal. Maybe a note is enough?
I originally had it as a note, but I think a warning is what we want here. I only discovered this quirk of `PyUnicode_AsUTF8` because it caused a crash in the `_interpreters` module. Things that can potentially cause security issues should probably get a warning, right?
You're right that notes about security are usually documented in red as warnings.
LGTM
Friendly ping @vstinner :)
Merged, thank you.
How do we feel about backporting?
Thanks @ZeroIntensity for the PR, and @vstinner for merging it 🌮🎉. I'm working now to backport this PR to: 3.13.
Thanks @ZeroIntensity for the PR, and @vstinner for merging it 🌮🎉. I'm working now to backport this PR to: 3.12.
…code_AsUTF8` (pythonGH-127458) (cherry picked from commit e792f4b) Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Stan U. <[email protected]> Co-authored-by: Tomas R. <[email protected]> Co-authored-by: Victor Stinner <[email protected]>
GH-129080 is a backport of this pull request to the 3.13 branch.
…code_AsUTF8` (pythonGH-127458) (cherry picked from commit e792f4b) Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Stan U. <[email protected]> Co-authored-by: Tomas R. <[email protected]> Co-authored-by: Victor Stinner <[email protected]>
GH-129081 is a backport of this pull request to the 3.12 branch.
I backported the change to the 3.12 and 3.13 branches.
… `PyUnicode_AsUTF8` (GH-127458) (#129080) Docs C API: Clarify what happens when null bytes are passed to `PyUnicode_AsUTF8` (GH-127458) (cherry picked from commit e792f4b) Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Stan U. <[email protected]> Co-authored-by: Tomas R. <[email protected]> Co-authored-by: Victor Stinner <[email protected]>
… `PyUnicode_AsUTF8` (GH-127458) (#129081) Docs C API: Clarify what happens when null bytes are passed to `PyUnicode_AsUTF8` (GH-127458) (cherry picked from commit e792f4b) Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Stan U. <[email protected]> Co-authored-by: Tomas R. <[email protected]> Co-authored-by: Victor Stinner <[email protected]>
…code_AsUTF8` (python#127458) Co-authored-by: Stan U. <[email protected]> Co-authored-by: Tomas R. <[email protected]> Co-authored-by: Victor Stinner <[email protected]>
Per discussion from a few days ago, `PyUnicode_AsUTF8` is missing documentation for the embedded null character gotcha. There was some pushback about marking it as a warning, so I've left it as a note for now.

📚 Documentation preview 📚: https://cpython-previews--127458.org.readthedocs.build/
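
For readers who want to see the gotcha without building an extension, here is a minimal sketch that calls both functions through `ctypes.pythonapi`. It is an illustration rather than part of the PR: it assumes a CPython interpreter where both symbols are exported, and the variable names are made up for the example. `ctypes.string_at` without a size stands in for any C function that stops at the first null byte.

```python
import ctypes

# Bind the two C API functions via ctypes.pythonapi (CPython only).
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
as_utf8.argtypes = [ctypes.py_object]
as_utf8.restype = ctypes.c_void_p  # raw address, so ctypes does not convert it for us

as_utf8_and_size = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
as_utf8_and_size.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
as_utf8_and_size.restype = ctypes.c_void_p

# Keep a reference to the str: the UTF-8 buffer is owned by this object.
text = "abc\0def"

# Without a size, string_at() reads up to the first null byte, like strlen() would.
seen_by_c_string_functions = ctypes.string_at(as_utf8(text))

# With the size reported by PyUnicode_AsUTF8AndSize, the whole buffer is visible.
size = ctypes.c_ssize_t()
ptr = as_utf8_and_size(text, ctypes.byref(size))
full_buffer = ctypes.string_at(ptr, size.value)

# If the two lengths differ, the string contains an embedded null byte.
has_embedded_null = len(seen_by_c_string_functions) != size.value
print(seen_by_c_string_functions, full_buffer, has_embedded_null)
```

In C code the equivalent check is to compare `strlen()` of the returned pointer against the size stored by `PyUnicode_AsUTF8AndSize`, and to treat a mismatch as an error when truncation is an issue.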