You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Fix deduplication of virtual packages installed under aliases (#6735)
## What's the problem this PR addresses?
Fixes#6573
In order to explain *why* the bug happens, let's do a refresher on
virtual packages and deduplication, and *how* we end up here
### Virtuals
Let's say we have the following packages:
- `pkg-z` has a peer dependency on `pkg-p@*`
- `pkg-a` has a dependency on `pkg-z@^1.0.0` and `pkg-p@^1.0.0`
- `pkg-b` has a dependency on `pkg-z@^1.0.0` and `pkg-p@^2.0.0`
```
pkg-a --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@*
--[ pkg-p@^1.0.0 ]--> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@*
--[ pkg-p@^2.0.0 ]--> pkg-p@npm:2.0.0
```
When we resolve the peer dependencies, we resolve which instance of
`pkg-p` we each `pkg-z` instance should get, then we add that instance
of `pkg-p` as a regular dependency to the `pkg-z` instance in our
internal package data
```
pkg-a --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 --[...]--> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 --[...]--> pkg-p@npm:2.0.0
```
This is a problem because we end up with the same package having two
different sets of dependencies. To avoid that, we create **virtual
packages** for each instance of peer requesters (i.e. packages with peer
dependencies) in the tree. We also replace descriptors that resolve to
those packages with **virtual descriptors**.
```
pkg-a --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]--> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@virtual:<id-b> ]--> pkg-z@virtual:<id-b> --[...]--> pkg-p@npm:2.0.0
```
The two instance of `pkg-z` that have different dependency sets are now
literally two different packages in our internals. Note that the virtual
id is based on the parent package and the pre-virtualization package.
### Deduplication
However, just virtualizing every peer requester naively can lead to some
problems. Take this example:
```
pkg-a --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@*
--[ pkg-p@^1.0.0 ]--> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@*
--[ pkg-p@^1.0.0 ]--> pkg-p@npm:1.0.0
```
```
pkg-a --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]-> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@virtual:<id-b> ]--> pkg-z@virtual:<id-b> --[...]-> pkg-p@npm:1.0.0
```
We have the reverse "problem" -- we end up with two different virtual
`pkg-z` with the same dependency sets. That's not incorrect *per se*,
but it'd be nice if we can use the same package instance for both.
That's why we also have a deduplication process where we find virtual
packages that are the virtualizations of the same package and have the
same dependency sets, and replace all of their virtual descriptors and
packages with a single "master" descriptor/package among them
```
pkg-a --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]-> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]-> pkg-p@npm:1.0.0
```
Now we go back to having one `pkg-z` instance, and we can remove the
discarded virtual descriptor/package (`pkg-z@virtual:<id-b>`) from our
internals.
### Bug with aliases
This deduplication, however, in turn leads to a bug (#1352) in some edge
cases. (In this and subsequent examples, I'll omit `pkg-p`. Assume that
all instances of `pkg-z` has their peer requests satisfied the same way)
```json
{
"name": "pkg-a",
"dependencies": {
"pkg-x": "npm:pkg-z@^1.0.0",
"pkg-y": "npm:pkg-z@^1.0.0"
}
}
```
```
pkg-a --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
--[ pkg-y@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
```
```
pkg-a --[ pkg-x@virtual:<id-a> ]--> pkg-z@virtual:<id-a>
--[ pkg-y@virtual:<id-a> ]--> pkg-z@virtual:<id-a>
```
We have used aliases to make two different virtual descriptors resolve
to the same virtual package. When deduping, we will dedupe one of the
virtual descriptors to the other descriptor, and **discard the package
it resolves to**. Now we end up with zero instances of `pkg-z`...
#1726 fixes that, by having the deduplication key include the descriptor
ident (`pkg-x` vs `pkg-y`), so those two are not deduped against each
other.
This is where we are currently at.
### More edge cases !!!
This fix, however, once again causes more edge cases.
In one edge case, it will cause virtual packages to not dedupe when they
should
```
pkg-a --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
pkg-b --[ pkg-y@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
```
```
pkg-a --[ pkg-x@virtual:<id-a> ]--> pkg-z@virtual:<id-a>
pkg-b --[ pkg-y@virtual:<id-b> ]--> pkg-z@virtual:<id-b>
```
#1726 fixes specifically for two aliased dependencies of the same
package. What if we have two packages each using an alias? We don't
dedupe those because the virtual descriptors have different idents, and
end up with two virtual instances because the parent package is part of
the virtual id calculation
----
That case also not a problem *per se*, just inefficient. However, using
that we can construct a situation where the same package can either
dedupe or not dedupe depending on which descriptor is being checked.
```
pkg-a --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
pkg-b --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
--[ pkg-y@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
```
```
pkg-a --[ pkg-x@virtual:<id-a> ]--> pkg-z@virtual:<id-a>
pkg-b --[ pkg-x@virtual:<id-b> ]--> pkg-z@virtual:<id-b>
--[ pkg-y@virtual:<id-b> ]--> pkg-z@virtual:<id-b>
```
Here, the second descriptor `pkg-x@virtual:<id-b>` will be deduped into
the first: they have the same ident and resolve to the same "physical"
package. Once again we will **discard `pkg-z@virtual:<id-b>` in the
process**.
However, the third descriptor `pkg-x@virtual:<id-b>` is not deduped,
leaving a dangling reference to the discarded package.
This is the case reported in #6573
## How did you fix it?
At its root, #1352 is caused by the same virtual package being checked
for dedupe twice against two diffent virtual descriptors that resolve to
it. What if we just... not do that? **Instead of deduping virtual
descriptors, just dedupe the virtual packages.**
We would have virtual descriptors resolving to virtual packages with a
different virtual id, but that would not cause problems. In fact, the
entire point of resolution is to link descriptors and locators.
This ends up simplifying the implementation a lot because we don't need
to keep track of which packages have which virtual descriptors as
dependencies. We just need a reverse resolution map from virtual
locators to virtual descriptors. Deduping is as simple as updating the
project resolution map.
The only downside is we have a few more descriptors in the internals,
but that's just a very tiny cost compared to the benefits.
## Checklist
<!--- Don't worry if you miss something, chores are automatically
tested. -->
<!--- This checklist exists to help you remember doing the chores when
you submit a PR. -->
<!--- Put an `x` in all the boxes that apply. -->
- [x] I have read the [Contributing
Guide](https://yarnpkg.com/advanced/contributing).
<!-- See
https://yarnpkg.com/advanced/contributing#preparing-your-pr-to-be-released
for more details. -->
<!-- Check with `yarn version check` and fix with `yarn version check
-i` -->
- [x] I have set the packages that need to be released for my changes to
be effective.
<!-- The "Testing chores" workflow validates that your PR follows our
guidelines. -->
<!-- If it doesn't pass, click on it to see details as to what your PR
might be missing. -->
- [x] I will check that all automated PR checks pass before the PR gets
reviewed.
thrownewError(`Assertion failed: Expected the resolution for ${structUtils.prettyDescriptor(project.configuration,descriptor)} to have been registered`);
thrownewError(`Assertion failed: Expected the resolution for ${structUtils.prettyDescriptor(project.configuration,descriptor)} to have been registered`);
0 commit comments