Skip to content

Prepare for change in package identity semantics #3032

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 37 commits into from

Conversation

mattt
Copy link
Contributor

@mattt mattt commented Nov 7, 2020

To support package registries (#2967), we need to change the semantics around how packages are identified. Currently, they're based on a name computed from the package's effective URL, because the name declared in the package isn't known until the manifest has been parsed. We should instead use the declared URL of a package dependency as its identity.

After submitting #3023 and discussing with @neonichu & @tomerd, I think that we should tackle this first, independently of the package registry work. This PR lays the groundwork for making this change by introducing a new PackageIdentity type and updating existing calls to helper methods like PackageReference.computeIdentity to go through this new API. If everything works as intended, this shouldn't change any existing behavior.

From here, changing the semantics to use URL as a package identifier will entail the following:

  • Auditing calls to DependecyMirrors.effectiveURL to ensure that the original URL is used to establish identity, rather than any set mirrored URL
  • Auditing calls to the PackageIdentity.computedName property, which this PR uses to preserve the current behavior
  • Conditionalizing the new behavior on Swift tools version > 5.3
  • Looking at any newly-failing tests and determining whether the described behavior is correct, in which case we fix the implementation; otherwise, we change or remove the test. (I have specific concerns about testRootPackageOverrides and testPackageMirror).

@mattt

This comment has been minimized.

@SDGGiesbrecht
Copy link
Contributor

I’m curious what happens—and what is supposed to happen—when there is a diamond and the two pointers at the bottom node differ only in the presence or absence of .git at the end of the URL. I don’t know if there are any tests that cover it, but it is something that has worked in the past (albeit with pins permanently stale in some toolchains).

@mattt
Copy link
Contributor Author

mattt commented Nov 9, 2020

@SDGGiesbrecht As far as I can tell, that should behave exactly the same. In the current implementation of computeDefaultName(fromURL:), a package's computed name (which, when lowercased constitutes its identity) is the last path component of the package dependency URL, removing any .git suffix. With the new PackageIdentity type, a package's identity would be insensitive to the following differences:

  • case or normalization
  • .git suffix or none
  • https://, git://, or ssh:// (either implicit or explicit) scheme
  • user, password, port, fragment, or query URL components

@mattt

This comment has been minimized.

@mattt

This comment has been minimized.

1 similar comment
@tomerd
Copy link
Contributor

tomerd commented Nov 9, 2020

@swift-ci please smoke test linux


return String(lastComponent)
/// The identity of the package.
public var identity: String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to have this expose the "real" identity instead of hiding it. we can then add a function/var to return the computed name projection and for the call sites that absolutely must use that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan was to do that in the next step, when we do make the semantic change. That way, the API identity remains consistent with the actual behavior of the system throughout the process, and we avoid changing the interface and the implementation both at the same time.

But if you'd prefer to do this now, we could certainly rename this symbol and add a new identity property that returns a PackageIdentity value. All I'd need to know is what that would be called. Best I can come up with are oldIdentity and legacyIdentity, which both have a code smell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to carry PackageIdentity anywhere we have identity over String. @neonichu @abertelrud ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree — it would be good (especially to make sure we don't have to re-audit) to switch to a specific value type here rather than String, and it would be also be a good place to attach type documentation that summarizes the semantic intent of identity (not necessarily details about how the underlying string is formed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With c3fbf4e, we have an explicit typealias (PackageReference.Identity) and property (PackageIdentity.legacyIdentity). When it comes time change the semantics, we can audit for those APIs explicitly.

@neonichu
Copy link
Contributor

I think testRootPackageOverrides is absolute behaviour we have to retain. That'll be part of the stronger sense of identity we discussed offline. I'll be digging into how to implement it this week.

@neonichu
Copy link
Contributor

Conditionalizing the new behavior on Swift tools version > 5.3

I'm a little concerned about this idea, because I don't think two different senses of identity can compose well. Would this mean if my root package has a newer tools version, we generally use the new behaviour?

@neonichu neonichu self-assigned this Nov 10, 2020
@mattt
Copy link
Contributor Author

mattt commented Nov 10, 2020

Conditionalizing the new behavior on Swift tools version > 5.3

I'm a little concerned about this idea, because I don't think two different senses of identity can compose well. Would this mean if my root package has a newer tools version, we generally use the new behaviour?

Sorry, I misspoke. What I meant was that we conditionalize the behavior based on the version of Swift / Swift Package Manager. That is, Swift 5.3 and below use the current behavior, and the next versions adopt the new behavior.

@abertelrud
Copy link
Contributor

abertelrud commented Nov 12, 2020

@mattt this looking great, thanks for putting together. at a high level I would prefer to carry PackageIdentity anywhere we deal with package's identity over leaving String based identity around, but lets see what @neonichu @abertelrud think

Yes, agreed. I think that's necessary for the work on making package identity more sophisticated. That said, I don't have a strong preference on whether we address that as part of that work or as part of this PR.

I agree as well (noted earlier but repeated here). It doesn't have to be part of this PR but it would be ideal if it were.

@tomerd
Copy link
Contributor

tomerd commented Nov 12, 2020

@mattt this looking great, thanks for putting together. at a high level I would prefer to carry PackageIdentity anywhere we deal with package's identity over leaving String based identity around, but lets see what @neonichu @abertelrud think

Yes, agreed. I think that's necessary for the work on making package identity more sophisticated. That said, I don't have a strong preference on whether we address that as part of that work or as part of this PR.

I agree as well (noted earlier but repeated here). It doesn't have to be part of this PR but it would be ideal if it were.

@mattt wdyt? how much more complicated would it be to push it further?

return (identity, $0)
}
let packageMapByName: [String: ResolvedPackageBuilder] = packageBuilders.spm_createDictionary{ ($0.package.name, $0) }
let packageMapByIdentity: [String: ResolvedPackageBuilder] = Dictionary(uniqueKeysWithValues:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a mapping with PackageIdentity as key?

Copy link
Contributor Author

@mattt mattt Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, yes. But right now, we use the old / current identity semantics, which come from the computedNamelegacyIdentity property. Added a FIXME comment in 57babb4.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I am 110% for keeping the old semantics for now and making incremental changes in future PRs. We will almost certainly want to make the actual semantic changes be behind a feature flag initially. But my understanding was that PackageIdentity would still keep the same package identity as right now, even if it were spelled as PackageIdentity rather than String, and that the semantic changes would come later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes in c3fbf4e should help clarify the direction of this PR. PackageIdentity will be the new semantics for package identity, but it's also responsible for / capable of providing the old / current semantics (PackageReference.Identity, which is a typealias of String).

public let description: String
public let computedName: String

public init(_ string: String) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path part of the URL is often different between HTTP and SSH URLs for the same repository. Where does that normalization take place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you mean? Do you have any specific examples in mind?

(Also, I pushed some changes since you started reviewing; there's a lot more explanation about what's going on as of 844286b)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there's a public example to point to, but one instance of BitBucket Server that I know of has something that follows this pattern:

ssh://[email protected]/teamname/projectname.git
https://server.company.com/scm/teamname/projectname.git

I'm not sure if the extra "scm" part in the HTTP form is something that's configured on the server. My impression is that there isn't generally a guarantee that the path part of the URLs are the same between SSH and HTTP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abertelrud do you think there is a way to canonicalize the BitBucket example above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. @neonichu do you know what the deal is with this server, is this path component something that a repository server can just randomly configure to be different for SSH vs HTTPS URLs, or is there some kind of pattern? I looked around a bit for configuration information about that server, but haven't found anything yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abertelrud Thanks for sharing that example.

The normalization proposed here attempts to mitigate what are, effectively, cosmetic differences in URLs. There are going to be limitations to this approach, like the one you describe, where a host uses different conventions for HTTPS and SSH URLs (my understanding is that most hosts, including GitHub and GitLab, do a direct translation). We could tailor our algorithm to accommodate this case specifically (e.g. by only looking at authority + last 2 path components), but I'm hesitant to take such an approach for loss of generality.

If you wanted to reconcile this on the client-side, the most direct way would be to set a dependency mirror URL. Another approach is something @tomerd had mentioned before, where SPM could use the first commit of each repository as a "fingerprint" to identify the same repositories at different URLs (e.g. /MorningStar.git and /EveningStar.git)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we shouldn't add special cases here, and I've confirmed that this is configurable on the server, so there isn't really anything that could be added that would cover all cases. So I guess my main concern is that we will have cases that stop working because of this, and that it might not be apparent how to address it.

If we can fetch both repositories and compare the first-commit hashes then that might mitigate this, although it would still mean needing to have credentials for both repositories so they can be fetched (whereas today only the overriding repository needs credentials).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also won't deal with any Git remappings that might have been set up in git config, though I don't know how common that is in practice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this was discussed quite a bit a few years ago with the conclusion that it would break too many cases. But I don't have the context to know whether that was the right conclusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abertelrud Yeah, too many unknowns to know for sure what the actual impact will be. We can / should gate the new semantics behind a feature flag / command line option. This will help us identify edge cases without breaking anything in the short term.

@mattt
Copy link
Contributor Author

mattt commented Nov 12, 2020

@mattt this looking great, thanks for putting together. at a high level I would prefer to carry PackageIdentity anywhere we deal with package's identity over leaving String based identity around, but lets see what @neonichu @abertelrud think

Yes, agreed. I think that's necessary for the work on making package identity more sophisticated. That said, I don't have a strong preference on whether we address that as part of that work or as part of this PR.

I agree as well (noted earlier but repeated here). It doesn't have to be part of this PR but it would be ideal if it were.

@mattt wdyt? how much more complicated would it be to push it further?

I'm hesitant to do that for fear of introducing unintended functional differences. If the primary concern is losing fidelity / type hinting by downconversion to String, I think a better stop-gap measure would be to rewrite 75525f8 to, instead of removing the PackageReference.PackageIdentity typealias, renaming to something that doesn't shadow the new PackageIdentity type. With that, we'll have everything lined up to migrate to the new semantics in the next phase.

Edit: See c3fbf4e

Add legacyIdentity property

This commit partially reverts 75525f8
@mattt

This comment has been minimized.

public typealias PackageIdentity = String
// FIXME: Replace with `PackageIdentity` once we change identity semantics.
/// The identity of the package.
public typealias Identity = String
Copy link
Contributor

@tomerd tomerd Nov 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattt @neonichu @abertelrud (probably just me) but still confused why going from String to PackageIdentity should be gated on changing the identity semantics. My assumption about the goal of this PR was that we would change everywhere to use the new PackageIdentity type and never need to do any translations or drop down to String / legacy identity etc. Of course the actual underlying identity inside PackageIdentity would be a String based on the same logic of extracting it from a canonicalized URL, but no code other than PackageIdentity should "be aware" of the underlying IMO (or at least I hope). Then the next step would be to change how we compute the underlying (using something more unique than canonical URL) but structurally it would all works the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we all agree that we should change the interface and the implementation separately, rather than both at the same time. So it's a question of tactics as to whether to change the interface first or the implementation first.

My PR changes the implementation while keeping the the public API mostly intact. What I believe @tomerd is suggesting is changing the interface first. Both are valid approaches that make different trade-offs.

The motivation for taking the implementation-first approach in this PR has to do with bigger interface-level changes planned for when we switch over the semantics. For example, 960642b changes PackageReference from being a structure to an enumeration to better model the distinction between local and remote package dependencies. By starting with the implementation, I believe we'll minimize risk and the be able to get to where we want to go in two PRs. If we started with the interface, I think we'd have to split out the other changes into a third, separate PR, or risk biting off more than we can chew at once in just one.

@mattt

This comment has been minimized.

@mattt mattt force-pushed the package-identity branch 2 times, most recently from ef28905 to 8eb7467 Compare November 16, 2020 16:58
@mattt

This comment has been minimized.

@mattt
Copy link
Contributor Author

mattt commented Nov 16, 2020

@swift-ci please smoke test

@mattt
Copy link
Contributor Author

mattt commented Nov 16, 2020

Closing as per offline discussion among the reviewers. Will open a new PR taking a new approach using dependency injection to consolidate call sites usage to a single PackageIdentity type.

@mattt mattt closed this Nov 16, 2020
@mattt mattt deleted the package-identity branch November 16, 2020 22:49
@mattt mattt mentioned this pull request Nov 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants