[WIP] gh-129813: Add PyBytesWriter C API #129814

vstinner · 2025-02-07T15:48:38Z

Replace usage of the old private _PyBytesWriter with the new public PyBytesWriter C API.
Remove the old private _PyBytesWriter C API.
Add a freelist for PyBytesWriter_Create().
TODO: write doc
TODO: document new functions in What's New and Changelog

Issue: [C API] PEP 782: Add PyBytesWriter API #129813

vstinner · 2025-02-07T15:49:33Z

The PR is big because it also replaces usage of the old private API with the new public API. If the API is approved, I will split the PR into smaller pieces and measure the performance impact of each change.

vstinner · 2025-02-07T15:56:17Z

Some functions should be optimized after the removal of the private min_size member:

unicode_encode_ucs1()
utf8_encoder()
PyBytes_FromFormatV()

These functions allocate a little too much memory; the extend argument passed to PyBytesWriter_Extend() should be adjusted.

I just tried to make the code work, not to optimize it.

vstinner · 2025-02-07T16:16:52Z

This change has no significant impact on performance, even though the new public API allocates memory on the heap instead of on the stack. It uses a freelist to optimize PyBytesWriter_Create().

Example microbenchmark on 3 functions:

import pyperf
import binascii

runner = pyperf.Runner()
runner.bench_func('from list 100', bytes, list(b'x' * 100))
runner.bench_func('from list 1,000', bytes, list(b'x' * 1_000))

runner.bench_func('from hex 100', bytes.fromhex, bytes(range(100)).hex())
runner.bench_func('from hex 1,000', bytes.fromhex, (b'x' * 1_000).hex())

runner.bench_func('b2a_uu', binascii.b2a_uu, b'x' * 45)

Result:

Benchmark	ref	public
from list 100	672 ns	647 ns: 1.04x faster
from list 1,000	6.22 us	6.12 us: 1.02x faster
from hex 100	143 ns	145 ns: 1.02x slower
from hex 1,000	1.02 us	1.03 us: 1.00x slower
Geometric mean	(ref)	1.01x faster

Benchmark hidden because not significant (1): b2a_uu
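
As a rough cross-check of the "Geometric mean" row, the speedup can be recomputed from the four rows shown above. This is only a sketch: it uses the rounded timings from the table rather than the raw pyperf data, it leaves out the hidden b2a_uu benchmark, and the names timings_ns, speedups and mean_speedup are illustrative. Because the inputs are rounded, individual ratios may differ from the table in the last digit.

```python
from statistics import geometric_mean

# (ref, public) timings in nanoseconds, copied from the rounded table values.
timings_ns = {
    "from list 100": (672, 647),
    "from list 1,000": (6220, 6120),
    "from hex 100": (143, 145),
    "from hex 1,000": (1020, 1030),
}

# A ratio above 1.0 means the public API run was faster than the reference.
speedups = {name: ref / public for name, (ref, public) in timings_ns.items()}

# The geometric mean is the usual way to average ratios such as speedups.
mean_speedup = geometric_mean(list(speedups.values()))

print(f"geometric mean: {mean_speedup:.2f}x")
```

With these rounded inputs the mean stays within about 1% of the reference, which is consistent with the reported result.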

cmaloney · 2025-02-11T01:18:09Z

Pseudo-tangential idea: could this instead just be a C wrapper for io.BytesIO? I'm working toward a single fast implementation for reading and writing a bytes object.

vstinner · 2025-02-13T18:57:29Z

cmaloney wrote: "pseudo-tangential idea: Could this instead just be a C wrapper for io.BytesIO?"

The io.BytesIO API is basically the write(bytes) method, whereas the proposed PyBytesWriter API directly gives a pointer into a buffer. It's a different API. I don't think that the io.BytesIO API can or should be modified to expose a pointer directly. io.BytesIO is also more complex, since it supports changing the position with the seek() method and reading with the read() method.
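
To make the contrast concrete, here is a small Python-level sketch. The first half uses the real io.BytesIO methods mentioned above. The second half is only an analogy for "writing through a pointer into a buffer": it uses bytearray and memoryview, not PyBytesWriter, whose C functions are not shown here.

```python
import io

# Stream-style API: data is copied in through write(), and the object
# tracks a current position that seek() can move and read() consumes from.
stream = io.BytesIO()
stream.write(b"hello ")
stream.write(b"world")
stream.seek(0)
first_word = stream.read(5)
stream_result = stream.getvalue()

# Buffer-style analogy (not the PyBytesWriter API): allocate the storage
# up front, then fill it in place through a writable view.
buf = bytearray(11)
view = memoryview(buf)
view[0:6] = b"hello "
view[6:11] = b"world"
buffer_result = bytes(buf)

print(first_word, stream_result, buffer_result)
```

Both halves produce the same bytes, but only the stream carries position state, which is part of why the two APIs are not interchangeable.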

vstinner · 2025-03-12T11:24:26Z

I created a discussion: https://discuss.python.org/t/add-pybyteswriter-public-c-api/81182

It seems like most developers are confused by the API, which requires passing both writer and buf to most functions. I am abandoning this API.

vstinner added the DO-NOT-MERGE label Feb 7, 2025

bedevere-app bot mentioned this pull request Feb 7, 2025

[C API] PEP 782: Add PyBytesWriter API #129813

Closed

vstinner added 7 commits February 13, 2025 20:03

Add documentation

8f8aeb8

Add PyBytesWriter_GetRemaining()

68d2fe7

Optimize bytes_fromformat()

e582385

Remove PyBytesWriter_GetAllocated()

7cb444b

Add 'center' example

486d7cd

Change PyBytesWriter_WriteBytes() argument type to void*

c4f4c07

Documentation

18ec24d

vstinner closed this Mar 12, 2025

vstinner deleted the bytes_writer branch March 12, 2025 11:24
