-
-
Notifications
You must be signed in to change notification settings - Fork 32k
gh-88339: enable fast seeking of uncompressed unencrypted zipfile.ZipExtFile #27737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the test_zipfile.py
unittest explicitly cover this case of seeking within a ZIP_STORED entry? If so, mention that and drop a link to where in a comment here.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding an explicit seek
+ read
test case to Lib/test/test_zipfile.py
's StoredTestsWithSourceFile
class that checks that the data read matches what is expected at the seek'ed to offset?
I don't get why this test fails. Do you maybe have any idea? |
Will try to do, thanks! |
There is a unittest explicitly covering this case. The one which fails. |
You are right! It's that unit test. I added one assert to it. There were problems with read buffer which I fixed now. I rebased and added Please run CI and tell me if I'm still missing something. ;) BTW. Thanks for your work guys! |
Previous tests passed (some asyncio on windows x64 failed, but it's probably not my fault). |
Can I ask for review please? EDIT: Just rebased my branch. |
I have made the requested changes; please review again (test is already here, I added some tell asserts) |
Thanks for making the requested changes! @gpshead: please review the changes made to this pull request. |
It looks like I fixed all issues you guys found! :D |
Oh and I again forgot to tag you guys: @gpshead, @serhiy-storchaka |
I have made the requested changes; please review again |
Thanks for making the requested changes! @gpshead: please review the changes made to this pull request. |
Guys, what's wrong. It's over a year here now, I fixed everything you mentioned. Can I do something more to make this faster? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this could use more explicit testing around all possible seek edge cases, but that is true in general of the zipfile module and not specific to this PR. It appears to do the right thing and the existing tests that hopefully represent the more likely usages continue to pass.
If I haven't merged this within a few days, please ping me again. I'm giving Serhiy some time to respond if desired. |
@gpshead ;) Also I've a question - merging it now to main branch means this changes will be available from python 3.12, not 3.11? |
@gpshead ;) |
This reverts commit 13438487cb14a9947f3fb79faefd61c6ea0123ca.
ping @gpshead |
ping @gpshead |
bump @gpshead |
yep, this is in 3.12. the feature freeze window for a release closes upon the first beta. |
thanks for the improvement! |
…le.ZipExtFile (pythonGH-27737) Avoid reading all of the intermediate data in uncompressed items in a zip file when the user seeks forward. Contributed by: @JuniorJPDJ
At the moment stored ZipExtFile is being read to the place of seek like all other compressed variants.
It's not needed as it's possible to freely seek uncompressed file inside zip without this penalty.
Lots of apps depend on ZipExtFile seeking ability and this patch would increase performance and lower IO penalty significantly.
It disables CRC checking after first seek as it's impossible to check CRC if we are not reading whole file.
I've been using patched zipfile for almost a year now and I can say it works good ;)
https://bugs.python.org/issue44173 / #88339
redo of #26227 after a little fail