File.readAsBytesSync() doesn't fully read all data #51071
@aam Interested in fixing this?
I'm interested in fixing this and @aam is away.
Title changed from "File.readBytesSync() doesn't fully read all data" to "File.readAsBytesSync() doesn't fully read all data".
Looping until everything is read, e.g.:

int64_t File::Read(void* buffer, int64_t num_bytes) {
  ASSERT(handle_->fd() >= 0);
  int64_t num_bytes_read = 0;
  while (num_bytes_read < num_bytes) {
    // The behavior of `read(fildes, buf, nbyte)` where nbyte > SSIZE_MAX is
    // implementation-defined by the POSIX standard. On Linux, at most
    // 0x7ffff000 bytes are transferred per call.
    ssize_t result = TEMP_FAILURE_RETRY(
        read(handle_->fd(), reinterpret_cast<char*>(buffer) + num_bytes_read,
             num_bytes - num_bytes_read));
    if (result < 0) {
      return result;
    } else if (result == 0) {
      return num_bytes_read;  // EOF.
    } else {
      num_bytes_read += result;
    }
  }
  return num_bytes_read;
}

has the unintended effect of breaking our support for pipes, i.e.
There are other ways to fix this problem...
For (1), we already do that for the async case:
👍 Excellent!
What's important for performance is that we avoid the extra step of combining several
Could the caller of
Dart first gets the length of the file and then instructs C++ to read that number of bytes. For pipes we don't know how many bytes are in the pipe, so distinguishing the two cases sounds reasonable.
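To make the "distinguish the two cases" idea concrete, here is a minimal POSIX C++ sketch; it is not the SDK's implementation. The helper name `ReadWholeFile` and the 64 KiB chunk size are assumptions. When `fstat` reports a regular file with a nonzero size, it sizes the buffer once and loops `read()` until the buffer is full or EOF; otherwise (pipes, character devices, files that report a length of 0) it reads fixed-size chunks until EOF.

```cpp
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>  // assert: for calling/test code
#include <cerrno>
#include <cstdint>
#include <cstdlib>  // mkstemp: for calling/test code
#include <cstring>  // memcmp: for calling/test code
#include <vector>

// Hypothetical helper: reads the entire contents of `fd` into `out`.
// Returns false if read() fails with an error other than EINTR.
bool ReadWholeFile(int fd, std::vector<uint8_t>* out) {
  out->clear();
  struct stat st;
  if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
    // Known length: allocate once, then loop because a single read() may
    // return fewer bytes than requested (e.g. the Linux per-call cap).
    out->resize(static_cast<size_t>(st.st_size));
    size_t filled = 0;
    while (filled < out->size()) {
      ssize_t n = read(fd, out->data() + filled, out->size() - filled);
      if (n < 0 && errno == EINTR) continue;  // Interrupted: retry.
      if (n < 0) return false;
      if (n == 0) break;  // File shrank after fstat: keep what we got.
      filled += static_cast<size_t>(n);
    }
    out->resize(filled);
    return true;
  }
  // Unknown length (pipe, character device, zero-length report):
  // read chunks until read() returns 0 (EOF).
  uint8_t chunk[64 * 1024];
  for (;;) {
    ssize_t n = read(fd, chunk, sizeof(chunk));
    if (n < 0 && errno == EINTR) continue;
    if (n < 0) return false;
    if (n == 0) return true;
    out->insert(out->end(), chunk, chunk + n);
  }
}
```

Sizing the buffer once in the known-length case keeps to a single allocation with no step that combines several buffers.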
Hmmm, I fixed this by modifying

// The maximum number of bytes to read in a single call to `readSync`.
//
// On Windows and macOS, it is an error to call
// `read/_read(fildes, buf, nbyte)` with `nbyte >= INT_MAX`.
//
// The POSIX specification states that the behavior of `read` is
// implementation-defined if `nbyte > SSIZE_MAX`. On Linux, the `read` will
// transfer at most 0x7ffff000 bytes and return the number of bytes actually
// transferred.
const int _maxReadSize = 2147483647;

Uint8List readAsBytesSync() {
  var opened = openSync();
  try {
    var length = opened.lengthSync();
    var builder = BytesBuilder(copy: false);
    if (length == 0) {
      // May be a character device; try to read it in chunks.
      Uint8List data;
      do {
        data = opened.readSync(_blockSize);
        if (data.length > 0) {
          builder.add(data);
        }
      } while (data.length > 0);
    } else {
      // `readSync(bytes)` will over-allocate memory if `bytes` is greater
      // than the length of the file.
      //
      // `BytesBuilder` has an optimization where, if it contains a single
      // Uint8List, it does not copy memory in `takeBytes`.
      //
      // So the most efficient approach is to try to read the entire file
      // at once with a `bytes` not larger than the size of the file.
      Uint8List data;
      var bytesRemaining = length;
      do {
        data = opened.readSync(min(bytesRemaining, _maxReadSize));
        if (data.length > 0) {
          bytesRemaining -= data.length;
          builder.add(data);
        }
      } while (data.length > 0 && bytesRemaining > 0);
    }
    return builder.takeBytes();
  } finally {
    opened.closeSync();
  }
}

But I didn't notice
But OTOH,
Unless you disagree, I'll try to fix/use
Isn't the issue actually with https://github.com/dart-lang/sdk/blob/main/runtime/platform/utils_linux.cc#L46 and similar POSIX implementations, where it won't read more than 2,147,479,552 bytes at a time (https://man7.org/linux/man-pages/man2/read.2.html)? I don't see a similar limitation to the
If it's helpful, you could also have two natives: one that reads a given number of bytes (or fewer if EOF comes first) and another that reads whatever is available in a single read operation (the pipe case).
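A minimal sketch of those two natives in POSIX C++, assuming plain file descriptors rather than the SDK's `File` class; `ReadFully` and `ReadAvailable` are hypothetical names. `ReadFully` suits regular files of known length, while `ReadAvailable` returns after one successful `read()`, so a pipe reader is not left waiting for bytes the writer has not sent yet.

```cpp
#include <unistd.h>

#include <cassert>  // assert: for calling/test code
#include <cerrno>
#include <cstdint>

// Hypothetical native #1: read up to `num_bytes`, stopping early only at EOF.
// Returns the number of bytes read, or -1 on error.
int64_t ReadFully(int fd, void* buffer, int64_t num_bytes) {
  char* dst = static_cast<char*>(buffer);
  int64_t total = 0;
  while (total < num_bytes) {
    ssize_t n = read(fd, dst + total, static_cast<size_t>(num_bytes - total));
    if (n < 0) {
      if (errno == EINTR) continue;  // Interrupted by a signal: retry.
      return -1;
    }
    if (n == 0) break;  // EOF.
    total += n;
  }
  return total;
}

// Hypothetical native #2: a single read(). Blocks until at least one byte is
// available (or EOF), then returns what that one call produced, which may be
// fewer bytes than requested. Returns -1 on error.
int64_t ReadAvailable(int fd, void* buffer, int64_t num_bytes) {
  for (;;) {
    ssize_t n = read(fd, buffer, static_cast<size_t>(num_bytes));
    if (n < 0 && errno == EINTR) continue;
    return n;
  }
}
```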
This may cause copies. A Dart-only solution could allocate one
@aam, yes, the issue is that there is a maximum read size. @mkustermann, I was aware of the copy, but it would only occur if the file is larger than 2 GB, so I wasn't really worried about it. But
Could we perhaps then adjust
No, as I said in #51071 (comment), that will break files with no length, such as pipes. I think that fixing and then using
I was thinking specifically about treating a result of 2,147,479,552 bytes read as a special case, since it indicates that the internal POSIX read limit was hit.
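A sketch of that heuristic, assuming Linux semantics: retry only when the previous `read()` returned exactly 0x7ffff000 (2,147,479,552 = 2^31 - 4096) bytes, the per-call cap documented in read(2). Any other short count, such as a partially filled pipe, returns immediately instead of blocking. `ReadHandlingLinuxCap` is a hypothetical name, not what the SDK adopted.

```cpp
#include <unistd.h>

#include <cassert>  // assert: for calling/test code
#include <cerrno>
#include <cstdint>

// Linux transfers at most 0x7ffff000 bytes per read() call.
constexpr int64_t kLinuxMaxReadBytes = 0x7ffff000;
static_assert(kLinuxMaxReadBytes == 2147479552, "2^31 - 4096");

// Hypothetical variant of File::Read: issue another read() only when the
// previous one returned exactly the kernel cap.
int64_t ReadHandlingLinuxCap(int fd, void* buffer, int64_t num_bytes) {
  char* dst = static_cast<char*>(buffer);
  int64_t total = 0;
  while (total < num_bytes) {
    ssize_t n = read(fd, dst + total, static_cast<size_t>(num_bytes - total));
    if (n < 0) {
      if (errno == EINTR) continue;  // Interrupted by a signal: retry.
      return -1;  // Read error; bytes read so far are not reported.
    }
    total += n;
    // EOF (n == 0) or a short read not caused by the cap: stop here.
    if (n != kLinuxMaxReadBytes) break;
  }
  return total;
}
```

With the writer end of a pipe still open, a request for more bytes than are buffered returns what is there rather than waiting, which is the pipe behavior the looping proposal broke.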
Someone went to a bunch of effort to make
I have a change out for review here: https://dart-review.googlesource.com/c/sdk/+/279204
Bug: #51071
Change-Id: Ia64d803c9709b106e52a1c671c1c3288c051bd85
Tested: ci + new test
CoreLibraryReviewExempt: bug fix only for vm
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/279204
Reviewed-by: Alexander Aprelev
Reviewed-by: Martin Kustermann
Commit-Queue: Brian Quinlan

Fixed in 252015b
Can you give me an example of ReadFully?
Running:
should print
but instead prints
The issue is pretty obvious in runtime/bin/file.cc.
The function assumes that file->Read(...) reads everything that is possible, and that if it reads less, the file may have been truncated (i.e. the file size changed from length to something smaller). However, file->Read(...) only issues a single read() call instead of looping until everything is read: