gh-129205: Experiment BytesIO._readfrom() #130098

Closed
wants to merge 8 commits into from

Conversation

@cmaloney (Contributor) commented Feb 13, 2025

Draft PR / Experiment

Rather than directly moving loops, I have been experimenting with a BytesIO._readfrom(file, /, *, estimate=None, limit=None) that efficiently encapsulates both the buffer resizing and the read loop, with common code for the "estimated size" and "size limit" features. It can be used to implement FileIO.readall with minimal performance change (included in this PR). In general, I think: if there is a read loop, it should be faster and simpler.
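
For illustration, a minimal pure-Python sketch of these semantics (the standalone function name and the growth policy below are illustrative assumptions, not the PR's actual implementation):

    import io

    def readfrom_sketch(file, *, estimate=None, limit=None):
        # Rough sketch: read everything (up to `limit`) from `file` into one
        # bytes object, sizing the first read from `estimate` when given.
        buf = bytearray()
        want = int(estimate) + 1 if estimate is not None else io.DEFAULT_BUFFER_SIZE
        while limit is None or len(buf) < limit:
            if limit is not None:
                want = min(want, limit - len(buf))
            data = file.read(want)
            if not data:
                break
            buf += data
            # Grow the next request non-linearly once the estimate is used up.
            want = max(want, len(buf) // 2)
        return bytes(buf)

With an accurate estimate (e.g. from os.fstat), this ends up being roughly one read that returns all the data plus one short read that confirms EOF.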

The _pyio implementation supports three kinds of IO objects: raw FD ints, objects with a .readinto member, and objects with a .read member. If that looks like a reasonable approach, I'd likely introduce it as an internal method BytesIO._readfrom() and migrate the existing cases over (with perf tests to make sure things don't regress).
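
As a sketch of what that dispatch could look like in pure Python (illustrative names and structure, not the PR's actual code):

    import os

    def read_step(file, buf, pos, want):
        # Illustrative dispatch over the three kinds of objects: raw FD ints,
        # objects with .readinto, and objects with .read. Fills the bytearray
        # slice buf[pos:pos + want] and returns the number of bytes read.
        if isinstance(file, int):
            data = os.read(file, want)
            buf[pos:pos + len(data)] = data
            return len(data)
        if hasattr(file, "readinto"):
            return file.readinto(memoryview(buf)[pos:pos + want]) or 0
        data = file.read(want)
        buf[pos:pos + len(data)] = data
        return len(data)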

In the C implementation I included an optimization that avoids a heap allocation by using 1 KB of stack space in the estimate=0 case. I'm not sure that's worth the complexity and cost: if the stack buffer gets used, an extra copy is needed compared to just using a bytes object, and in a warmed-up interpreter a 1 KB PyBytes is likely to be allocated quickly anyway.

The CPython codebase has a common pattern of building a list of I/O chunks and then "joining" them together at the end of the loop. I think _readfrom makes a tradeoff there: as long as resizing rarely has to copy (likely when not many other memory buffers are being allocated), it should be faster than that single extra large join-and-copy at the end. I haven't run full performance numbers yet, though. In my mental model, for large readall calls, non-linear buffer resizing and the reduced number of allocations and deallocations are a much bigger performance gain than the cost of potential realloc copies, and it definitely uses less memory overall.
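
To make the comparison concrete, a simplified sketch of the two patterns (both assume a binary file object; not the PR's code):

    import io

    def read_all_join(f):
        # Existing common pattern: collect chunks, one big join/copy at the end.
        chunks = []
        while True:
            data = f.read(io.DEFAULT_BUFFER_SIZE)
            if not data:
                return b"".join(chunks)
            chunks.append(data)

    def read_all_resize(f):
        # Pattern argued for here: readinto a single growing buffer; the only
        # large copies are the occasional grows.
        buf = bytearray(io.DEFAULT_BUFFER_SIZE)
        pos = 0
        while True:
            if pos == len(buf):
                buf.extend(bytes(len(buf)))  # non-linear (doubling) growth
            view = memoryview(buf)[pos:]
            n = f.readinto(view)
            view.release()
            if not n:
                del buf[pos:]
                # The final bytes() here still copies in pure Python; the C
                # implementation can shrink the bytes object in place instead.
                return bytes(buf)
            pos += n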

    if estimate is not None:
        target_read = int(estimate) + 1
    else:
        target_read = DEFAULT_BUFFER_SIZE
@cmaloney (Contributor Author) commented on the snippet above:

nit: max(DEFAULT_BUFFER_SIZE, len(self._buffer) - self._pos); if there's already lots of free space in the buffer, use it.
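
A self-contained sketch of the sizing with that suggestion applied (the function and parameter names are illustrative, not from the PR):

    from io import DEFAULT_BUFFER_SIZE

    def initial_target_read(buffer, pos, estimate=None):
        # Trust the estimate when given; otherwise use whatever free space is
        # already in the buffer, but at least the default chunk size.
        if estimate is not None:
            return int(estimate) + 1
        return max(DEFAULT_BUFFER_SIZE, len(buffer) - pos)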

@vstinner (Member) commented:

I don't quite understand the purpose of this change. If it's an optimization, do you have benchmark results showing that it makes a difference?

@cmaloney (Contributor Author) commented Feb 20, 2025

A couple of motivators for me:

  1. It refactors to de-duplicate code in CPython; "read everything and return bytes" is currently re-implemented regularly with varying features (read to end, loop with a size cap, estimated size).
  2. It provides a standard way to read data from a file-like object that is cheaper than the full io.open() stack. Today, code that can't get the features it wants tends to hand-roll a loop around os.read() with a raw fd.
  3. It should make the individual cases faster and provide a centralized place to tune.
    a. It should generally make fewer copies (readinto + resize vs. read() + append) and use less memory than loops that call read() and append the bytes. Hand-writing readinto loops definitely complicates code compared to one efficient core implementation.
    b. Non-linear buffer resizing should be significantly faster than a constant buffer size as the total size grows (see the sketch after this list).
    c. It should be faster than building a list of bytes objects and doing one large allocation and copy at the end (the b"".join(list_of_bytes) pattern).
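
As a rough illustration of 3b, a toy count of how many buffer grows each strategy needs (illustrative arithmetic, not a benchmark):

    def grows_needed(total, start, *, doubling):
        # Count grows to reach `total` bytes, either doubling the buffer each
        # time (non-linear) or adding a fixed `start`-sized chunk each time.
        size, grows = start, 0
        while size < total:
            size = size * 2 if doubling else size + start
            grows += 1
        return grows

    # Reading 1 GiB with a 64 KiB starting buffer:
    print(grows_needed(1 << 30, 1 << 16, doubling=False))  # 16383 grows
    print(grows_needed(1 << 30, 1 << 16, doubling=True))   # 14 grows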

I'll work on measuring, and will open a more focused issue if the performance delta between today's common code patterns is significant.

@cmaloney cmaloney closed this Feb 21, 2025