Skip to content

GH-128131: Completely support random read access of uncompressed unencrypted files in ZipFile #128143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jan 20, 2025
44 changes: 17 additions & 27 deletions Lib/test/test_zipfile/test_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -3447,8 +3447,8 @@ def test_too_short(self):
self.assertEqual(
b"zzz", zipfile._Extra.strip(b"zzz", (self.ZIP64_EXTRA,)))

class StoredZipExtFileRandomAccessTest(unittest.TestCase):
def test_random_access(self):
class StoredZipExtFileRandomReadTest(unittest.TestCase):
def test_random_read(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a docstring describing what this function is testing. Give an overview of what the problem is and what is being guaranteed by the test. Also, link to the reported issue, where the problem should be described in exquisite detail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this function seems too broad. It's not simply testing a random read, but it's also testing the case where it's following an optimized path for an uncompressed file. Consider test_stored_seek.

from _pyio import BytesIO
class StatIO(BytesIO):
def __init__(self):
Expand All @@ -3460,40 +3460,30 @@ def read(self, size=-1):
self.bytes_read += len(bs)
return bs

def get_bytes_read(self):
return self.bytes_read

sio = StatIO()
# 100000 bytes
txt = b'0123456789'*10000
# 20000 bytes
txt = b'0123456789' * 2000

# Check seek on a file
with zipfile.ZipFile(sio, "w", compression=zipfile.ZIP_STORED) as zipf:
zipf.writestr("foo.txt", txt)

# check random seek and read on a file
with zipfile.ZipFile(sio, "r") as zipf:
with zipf.open("foo.txt", "r") as fp:
br = sio.get_bytes_read()
fp.seek(50000, os.SEEK_CUR)
self.assertEqual(sio.get_bytes_read() - br, 0, 'seek produces redundant read!')

b = fp.read(100)
self.assertEqual(b, txt[:100])

# seek length must be greater than ZipExtFile.MIN_READ_SIZE (4096)
# backward seek
br = sio.get_bytes_read()
fp.seek(5000, os.SEEK_CUR)
b = fp.read(100)
self.assertEqual(b, txt[50000:50100])
self.assertLessEqual(sio.get_bytes_read() - br, 4096, 'read redundant bytes during backward seek!')

# forward seek
br = sio.get_bytes_read()
fp.seek(-40000, os.SEEK_CUR)
b = fp.read(100)
self.assertEqual(b, txt[10100:10200])
self.assertLessEqual(sio.get_bytes_read() - br, 4096, 'read redundant bytes during forward seek!')
old_count = sio.bytes_read
fp.seek(10002, os.SEEK_CUR)
arr = fp.read(100)
self.assertEqual(arr, txt[10002:10102])
self.assertLessEqual(sio.bytes_read - old_count, 4096, 'Redundant bytes were read during forward seek and read!')

# backward seek
old_count = sio.bytes_read
fp.seek(-5003, os.SEEK_CUR)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fp.seek(-5003, os.SEEK_CUR)
fp.seek(-backward_seek, os.SEEK_CUR)

arr = fp.read(100)
self.assertEqual(arr, txt[5099:5199])
self.assertLessEqual(sio.bytes_read - old_count, 4096, 'Redundant bytes were read during backward seek and read!')

if __name__ == "__main__":
unittest.main()
Original file line number Diff line number Diff line change
@@ -1 +1 @@
Support fast forward seek in uncompressed unencrypted :class:`!zipfile.ZipExtFile`.
Completely support random access of uncompressed unencrypted :class:`!zipfile.ZipExtFile`.
Loading