Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve pnp loader speed and memory: jszip implementation #6688

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

goloveychuk
Copy link
Contributor

@goloveychuk goloveychuk commented Feb 14, 2025

What's the problem this PR addresses?

rework of #6671
TODO:

  • check security?
  • tests
  • elaborate mixed compression logic
  • benchmarks?
  • hide exports and interfaces for external consumers?

...

How did you fix it?

...

Checklist

  • I have set the packages that need to be released for my changes to be effective.
  • I will check that all automated PR checks pass before the PR gets reviewed.

@goloveychuk goloveychuk force-pushed the minizip2 branch 2 times, most recently from 8c5a405 to 5a847fd Compare February 23, 2025 14:03
@goloveychuk goloveychuk changed the title jsZip implementation Improve pnp loader speed and memory: jszip implementation Feb 23, 2025
@goloveychuk
Copy link
Contributor Author

goloveychuk commented Feb 23, 2025

@arcanis review pls.
Todo:

  1. have question about one test libzip/incons-file-count-overflow.zip
  2. need benchmark?
  3. mixed compression logic should be checked

@goloveychuk
Copy link
Contributor Author

@merceyz @RDIL maybe you want to review?

@goloveychuk
Copy link
Contributor Author

@arcanis if you want I can move jszip implementation into my repo, and add as external dependency from berry.

Comment on lines +96 to +100
minizip: {
description: `Whether Yarn should use minizip to extract archives`,
type: SettingsType.BOOLEAN,
default: false,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the main benefit of this PR is to decrease the runtime footprint (by comparison the install doesn't have a lot to gain, since only unplugged packages are extracted on the disk and there's very little of them), I'd rather keep the existing codepath for unplug (to simplify the changes as much as possible).

@@ -115,6 +123,15 @@ const plugin: Plugin<CoreHooks & StageHooks> = {
default: [],
isArray: true,
},
experimentalZipImplementation: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
experimentalZipImplementation: {
pnpZipImplementation: {

entries.push({
name,
os,
mtime: 0, //we dont care,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use 456789000 (see SAFE_TIME)

}

export class ZipFS extends BasePortableFakeFS {
private readonly libzip: Libzip;
export interface ZipImpl {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you move it to a ZipImplementation.ts file?

@arcanis
Copy link
Member

arcanis commented Mar 1, 2025

Hey @goloveychuk! No it's fine, I prefer to have it inside this repository at least as long as it's experimental; I still have to review this PR, it's pretty large (even discounting the test artifacts) and I've been extremely busy at work since the beginning of the year. I'll try to finish the review by next week.

@goloveychuk
Copy link
Contributor Author

Thank you.
Tests could be rewritten from snapshot file to inline snapshots or code which defines expected files list.

@goloveychuk
Copy link
Contributor Author

hey, @arcanis gentle reminder. Thanks!

@arcanis
Copy link
Member

arcanis commented Mar 28, 2025

I think the implementation looks good. I didn't see any perf difference when running the basic tests, but they are very very simple so it's not surprising. A better way to benchmark would be to use our yarn bench command (see scripts/bench-folder.ts), but it doesn't support using a different benchmark command between tests at the moment.

@goloveychuk
Copy link
Contributor Author

@arcanis I also suggest to rethink caching
https://github.com/goloveychuk/berry/blob/fb552204a46f364d75d08117e88f044f063c3703/packages/yarnpkg-libzip/sources/ZipFS.ts#L860
In many cases it leads to high memory consumption without benefits.
Usually js files are cached by infra:

  • commonjs is caching scripts in require.cache
  • esm loaders are caching compiled and linked module
  • webpack jest are caching transformed modules

It's a rear case when some infra will read same file in one pnp runtime.
And benefits are not clear.
Especially with js zip impl, which reads file from filesystem without additional operations like loading full archive into memory. Which should be same speed as node_modules read from fs.
Exception for above is compressed archives. But again, I'm not sure any infra will read same file twice.

Safest solution is to hide caching under specific Impl class, to let them decide what is best strategy for caching.
Wdyt?

@goloveychuk
Copy link
Contributor Author

I can do it in a new PR

@arcanis
Copy link
Member

arcanis commented Mar 28, 2025

It's a rear case when some infra will read same file in one pnp runtime.
And benefits are not clear.

The libzip doesn't allow (or didn't allow, perhaps they fixed it?) to read files that just got written (ie writeFileSync(p); readFileSync(p); was crashing). The file caching was to workaround this issue. It's fine to get rid of it in the js backend 👍

@goloveychuk
Copy link
Contributor Author

@arcanis made changes in last commit

@goloveychuk
Copy link
Contributor Author

Anything to fix to merge pr? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants