gh-129205: Experiment BytesIO._readfrom() #130098
Conversation
if estimate is not None:
    target_read = int(estimate) + 1
else:
    target_read = DEFAULT_BUFFER_SIZE
nit: `max(DEFAULT_BUFFER_SIZE, len(self._buffer) - self._pos)`, so that if there's already lots of space, it gets used.
I don't understand the purpose of this change well. If it's an optimization, do you have benchmark results showing that it makes a difference?
A couple of motivators for me:
I'll work on measuring and will open a more directed issue if the performance delta between the common code patterns today is significant.
Draft PR / Experiment
Rather than directly moving loops, I have been experimenting with a `BytesIO._readfrom(file, /, *, estimate=None, limit=None)` that efficiently encapsulates both the buffer resizing and the read loop, adding common code for the "estimated size" and "limit size" features. It can be used to implement `FileIO.readall` with minimal perf change (that's in the PR). In general I think "if there is a read loop, it should be faster and simpler". The `_pyio` implementation supports three kinds of IO objects: raw FD ints, those with a `.readinto` member, and those with a `.read` member. If that looks like a reasonable approach, I'd likely introduce it as an internal method `BytesIO._readfrom()` and move cases over (with perf tests to make sure things don't regress).

In the C implementation I included an optimization that avoids a heap allocation in the `estimate=0` case by using 1KB of stack space instead. I'm not sure that's worth the complexity and cost (if the stack buffer gets used, an extra copy is needed compared to just using a bytes; and in a warmed-up interpreter a 1KB PyBytes is likely to be quickly allocated anyway).

The CPython codebase has a common pattern of building a list of I/O chunks, then "join"ing them together at the end of the loop. I think readfrom makes a tradeoff in that case: as long as resizing infrequently copies (which I expect when not many other memory buffers are being allocated), it should be faster than that single extra-large join and copy at the end. I haven't run full performance numbers, though. In my mental model, using non-linear buffer resizing for a large readall is likely a much bigger performance gain, by reducing the number of allocs + deallocs, than the cost of any potential `realloc` copies; it definitely uses less memory overall.
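To make the idea concrete, here is a rough sketch of what a `_readfrom`-style helper could look like in pure Python. This is my own illustration, not the PR's actual code: the standalone name `readfrom`, the geometric growth policy, and the EOF handling are all assumptions; the real implementation would live as a method on `BytesIO`.

```python
import os
from io import DEFAULT_BUFFER_SIZE


def readfrom(file, /, *, estimate=None, limit=None):
    """Read until EOF (or up to limit) into one bytes object.

    Illustration only: handles raw FD ints, objects with .readinto,
    and objects with .read, growing a single bytearray rather than
    collecting chunks in a list.
    """
    # Size the initial buffer from the caller's estimate, if any.
    if estimate is not None:
        target = int(estimate) + 1
    else:
        target = DEFAULT_BUFFER_SIZE
    if limit is not None:
        target = min(target, limit)
    buf = bytearray(max(target, 1))
    pos = 0
    while True:
        if pos >= len(buf):
            if limit is not None and pos >= limit:
                break
            # Grow non-linearly to amortize resize copies.
            grow = max(len(buf) // 2, DEFAULT_BUFFER_SIZE)
            if limit is not None:
                grow = min(grow, limit - len(buf))
            buf.extend(bytes(grow))
        if isinstance(file, int):
            chunk = os.read(file, len(buf) - pos)
            n = len(chunk)
            buf[pos:pos + n] = chunk
        elif hasattr(file, "readinto"):
            n = file.readinto(memoryview(buf)[pos:])
        else:
            chunk = file.read(len(buf) - pos)
            n = len(chunk)
            buf[pos:pos + n] = chunk
        if not n:
            break  # EOF (would-block treated as EOF, for brevity)
        pos += n
    del buf[pos:]
    return bytes(buf)
```

A correct `estimate` means exactly one allocation and one short follow-up read to confirm EOF; `limit` caps both the buffer size and the number of bytes consumed.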
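The tradeoff against the chunk-list pattern can be sketched side by side (again my own illustration; the function names and the growth factor are assumptions, not code from the PR):

```python
import io


def read_all_chunks(file, chunk_size=8192):
    # Common CPython pattern: collect chunks, join once at the end.
    # The join allocates one large result while all chunks are still
    # alive, so peak memory is roughly 2x the data size.
    chunks = []
    while True:
        chunk = file.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def read_all_resize(file, chunk_size=8192):
    # readfrom-style pattern: grow one bytearray non-linearly and
    # read directly into it; no final join, fewer allocations.
    buf = bytearray()
    pos = 0
    while True:
        if pos >= len(buf):
            buf.extend(bytes(max(len(buf) // 2, chunk_size)))
        n = file.readinto(memoryview(buf)[pos:])
        if not n:
            break
        pos += n
    del buf[pos:]
    return bytes(buf)
```

The resize version wins when the underlying allocator can extend the buffer in place; when every grow copies, it degrades toward the join pattern's cost but still avoids holding two full copies at once.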