Skip to content

Restrict vector arg eltype of read/copyuntil to UInt8 #58019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jakobnissen
Copy link
Member

@jakobnissen jakobnissen commented Apr 6, 2025

The previous implementation of copyuntil did not document what the vector delimiter could be, but the implementation allowed it to be a vector of anything. This laxness has the following problems:

  1. It's semantically dubious to read until the first T in an IO, when IOs are generally conceptualized to contain bytes, and not Ts. This seems to suggest a notion of an IO "containing" T's instead of bytes (which is a different thing from merely being able to read T's from the IO).

  2. The implementation is buggy, or at least has highly surprising consequences. For example:

julia> println(readuntil(IOBuffer("\1\2\3\4"), 0x0302)) # does not recognize the `0x0302` at bytes 2:3
UInt16[0x0201, 0x0403]

julia> println(readuntil(IOBuffer("\1\2\3\4\5"), 0x0302)) # same as above, but also wrongly errors
ERROR: EOFError: read end of file

julia> println(readuntil(IOBuffer("alpha beta gamma"), ["beta"])) # what does this even mean? And why isn't the "beta" found?
["alpha beta gamma"]

These bugs are fundamental to the current implementation, and it's difficult to fix them.

  1. The generic implementation is highly inefficient because it depends on readeach. It's difficult to make this generically efficient for arbitrary T.

We could also support AbstractVector{<:AbstractChar} by converting it to String, internally. But I wouldn't support this. It's too messy to have one function do everything, and the user is best served having to construct the string / vector themselves.

Closes #51724

The previous implementation of `copyuntil` did not document what the vector
delimiter could be, but the implementation allowed it to be a vector of anything.
This laxness has the following problems:

1. It's semantically dubious to read until the first T in an IO, when IOs are
   generally conceptualized to contain bytes, and not Ts. This seems to suggest
   a notion of an IO "containing" T's, which is different from being able to
   read T's from the IO, and probably implies `x::T in io::IO` should also work.

3. The implementation is buggy, or at least has highly surprising consequences.
   For example:

```julia
julia> println(readuntil(IOBuffer("\1\2\3\4"), 0x0302))
UInt16[0x0201, 0x0403]

julia> println(readuntil(IOBuffer("\1\2\3\4\5"), 0x0302))
ERROR: EOFError: read end of file
```

2. The generic implementation is highly inefficient because it depends on
   `readeach`. It's difficult to make this generically efficient for arbitrary T.
@jakobnissen jakobnissen added the io Involving the I/O subsystem: libuv, read, write, etc. label Apr 6, 2025
@stevengj
Copy link
Member

stevengj commented Apr 8, 2025

It's semantically dubious to read until the first T in an IO,

Why? We can do read(io, T), after all? Note that this feature of readuntil has been present since 3434a2e in 2012 (Julia 0.2), following a discussion in #1792, so I don't see how we can justify removing it now.

@stevengj stevengj added the breaking This change will break code label Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code io Involving the I/O subsystem: libuv, read, write, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vector argument of readuntil is poorly documented
2 participants