Docs: C API: Clarify what happens when null bytes are passed to `PyUnicode_AsUTF8` #127458

ZeroIntensity · 2024-11-30T22:22:54Z

Per discussion from a few days ago, PyUnicode_AsUTF8 is missing documentation for the embedded null character gotcha. There was some pushback about marking it as a warning, so I've left it as a note for now.

📚 Documentation preview 📚: https://cpython-previews--127458.org.readthedocs.build/

StanFromIreland

Small change to wording

Doc/c-api/unicode.rst

Co-authored-by: Stan U. <[email protected]>

Doc/c-api/unicode.rst

Co-authored-by: Tomas R. <[email protected]>

TeamSpen210 · 2024-12-01T20:26:40Z

I'm not sure it's good to say that the function "doesn't handle" null bytes: the bytes are there and accessible; the problem is that strlen() isn't able to handle them. Maybe it would be better to say that, because strings can contain null bytes, strlen() should not be used, and the length should be fetched from the object instead?

ZeroIntensity · 2024-12-01T20:32:14Z

Yeah, "doesn't handle" as in doesn't raise an error or strip them. I don't think it's worth changing the wording, but I'll let others weigh in.
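
The strlen() concern in this exchange can be illustrated without the interpreter. The following is a minimal sketch, not code from the PR: `UTF8_BYTES`, `UTF8_SIZE`, `length_seen_by_strlen`, and `copy_with_size` are hypothetical names, and the static buffer only stands in for what `PyUnicode_AsUTF8AndSize` would hand back for the Python string `"abc\0def"`, so that it compiles without `Python.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the UTF-8 buffer of the Python string
   "abc\0def": the encoded bytes plus a terminating null byte. */
static const char UTF8_BYTES[] = "abc\0def";

/* The real length (7), which PyUnicode_AsUTF8AndSize would report
   through its size argument; the terminator is not counted. */
static const size_t UTF8_SIZE = sizeof(UTF8_BYTES) - 1;

/* Length as C string functions see it: stops at the first null byte. */
size_t length_seen_by_strlen(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);
}

/* Copy using the explicit size so bytes after an embedded null survive.
   Returns 1 on success, 0 if the destination is too small. */
int copy_with_size(char *dst, size_t dst_cap, const char *src, size_t size)
{
    assert(dst != NULL && src != NULL);
    if (size > dst_cap) {
        return 0;
    }
    memcpy(dst, src, size);
    return 1;
}
```

Calling `length_seen_by_strlen(UTF8_BYTES)` reports fewer bytes than `UTF8_SIZE`, which is the silent truncation being discussed; `copy_with_size` keeps the whole buffer because it never relies on the terminator.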

Doc/c-api/unicode.rst

vstinner · 2025-01-13T20:09:28Z

Doc/c-api/unicode.rst

+      `null bytes <https://en.wikipedia.org/wiki/Null_character>`_ embedded within
+      *unicode*. As a result, strings containing null bytes will remain in the returned
+      string, which some C functions might interpret as the end of the string, leading to
+      truncation. When handling user input, it is recommended to use :c:func:`PyUnicode_AsUTF8AndSize`


I don't think that "user input" is useful here. I would prefer to say something like: "If truncation is an issue, ..."
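
When truncation is an issue, one possible approach is to reject strings with embedded null bytes before passing them to C string functions. This is only a sketch under assumptions: `has_embedded_null` is a hypothetical helper, and `buf` and `size` are assumed to come from a prior `PyUnicode_AsUTF8AndSize` call, which is not made here so the snippet compiles without `Python.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if any of the first `size` bytes of `buf` is a null byte,
   0 otherwise. The terminator that follows the data is not inspected. */
int has_embedded_null(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

A caller could check `has_embedded_null(buf, size)` and raise an error instead of continuing; comparing `strlen(buf)` against `size` is an equivalent test.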

Co-authored-by: Victor Stinner <[email protected]>

vstinner · 2025-01-13T20:54:12Z

Doc/c-api/unicode.rst

@@ -1035,6 +1035,15 @@ These are the UTF-8 codec APIs:

   As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.

+   .. warning::


A warning is a strong signal. Maybe a note is enough?

I originally had it as a note, but I think a warning is what we want here. I only discovered this quirk of PyUnicode_AsUTF8 because it caused a crash in the _interpreters module. Things that can potentially cause security issues should probably get a warning, right?

You're right that notes about security are usually documented in red as warnings.

vstinner

LGTM

ZeroIntensity · 2025-01-20T15:49:01Z

Friendly ping @vstinner :)

vstinner · 2025-01-20T15:54:39Z

Merged, thank you.

ZeroIntensity · 2025-01-20T15:55:39Z

How do we feel about backporting?

miss-islington-app · 2025-01-20T15:59:02Z

Thanks @ZeroIntensity for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington-app · 2025-01-20T15:59:04Z

Thanks @ZeroIntensity for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

…code_AsUTF8` (pythonGH-127458) (cherry picked from commit e792f4b) Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Stan U. <[email protected]> Co-authored-by: Tomas R. <[email protected]> Co-authored-by: Victor Stinner <[email protected]>

bedevere-app · 2025-01-20T15:59:15Z

GH-129080 is a backport of this pull request to the 3.13 branch.

…code_AsUTF8` (pythonGH-127458) (cherry picked from commit e792f4b) Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Stan U. <[email protected]> Co-authored-by: Tomas R. <[email protected]> Co-authored-by: Victor Stinner <[email protected]>

bedevere-app · 2025-01-20T15:59:21Z

GH-129081 is a backport of this pull request to the 3.12 branch.

vstinner · 2025-01-20T16:00:01Z

> How do we feel about backporting?

I backported the change to 3.12 and 3.13 branches.

… `PyUnicode_AsUTF8` (GH-127458) (#129080) Docs C API: Clarify what happens when null bytes are passed to `PyUnicode_AsUTF8` (GH-127458) (cherry picked from commit e792f4b) Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Stan U. <[email protected]> Co-authored-by: Tomas R. <[email protected]> Co-authored-by: Victor Stinner <[email protected]>

… `PyUnicode_AsUTF8` (GH-127458) (#129081) Docs C API: Clarify what happens when null bytes are passed to `PyUnicode_AsUTF8` (GH-127458) (cherry picked from commit e792f4b) Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Stan U. <[email protected]> Co-authored-by: Tomas R. <[email protected]> Co-authored-by: Victor Stinner <[email protected]>

…code_AsUTF8` (python#127458) Co-authored-by: Stan U. <[email protected]> Co-authored-by: Tomas R. <[email protected]> Co-authored-by: Victor Stinner <[email protected]>

ZeroIntensity added 3 commits November 30, 2024 17:10

Document what happens when PyUnicode_AsUTF8() is given embedded null …

680651c

…characters.

Suggest PyUnicode_AsUTF8AndSize for user input.

8bfd541

Switch to a note instead of a warning.

52e9117

ZeroIntensity added the docs, skip issue, skip news, and topic-C-API labels Nov 30, 2024

bedevere-app bot added the awaiting review label Nov 30, 2024

StanFromIreland reviewed Dec 1, 2024

Update Doc/c-api/unicode.rst

1b393d4

Co-authored-by: Stan U. <[email protected]>

tomasr8 reviewed Dec 1, 2024

Update Doc/c-api/unicode.rst

040608b

Co-authored-by: Tomas R. <[email protected]>

Play with the wording a little bit.

6fb8cbe

ZeroIntensity changed the title ~~Docs: Clarify what happens when null bytes are passed to PyUnicode_AsUTF8~~ Docs: C API: Clarify what happens when null bytes are passed to PyUnicode_AsUTF8 Dec 16, 2024

Add a reference.

3c7b6be

ZeroIntensity requested a review from vstinner January 13, 2025 18:09

vstinner reviewed Jan 13, 2025

ZeroIntensity and others added 2 commits January 13, 2025 15:45

Update Doc/c-api/unicode.rst

0eac45f

Co-authored-by: Victor Stinner <[email protected]>

Switch the wording away from 'user input'

35e0783

vstinner reviewed Jan 13, 2025

vstinner approved these changes Jan 13, 2025

bedevere-app bot added awaiting merge and removed awaiting review labels Jan 13, 2025

vstinner merged commit e792f4b into python:main Jan 20, 2025
29 checks passed

bedevere-app bot removed the awaiting merge label Jan 20, 2025

ZeroIntensity deleted the asutf8-null-chars branch January 20, 2025 15:55

vstinner added the needs backport to 3.12 and needs backport to 3.13 labels Jan 20, 2025

bedevere-app bot removed the needs backport to 3.13 label Jan 20, 2025

bedevere-app bot removed the needs backport to 3.12 label Jan 20, 2025
