-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Prepare for change in package identity semantics #3032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
I’m curious what happens—and what is supposed to happen—when there is a diamond and the two pointers at the bottom node differ only in the presence or absence of |
@SDGGiesbrecht As far as I can tell, that should behave exactly the same. In the current implementation of
|
3af259b
to
96598b5
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1 similar comment
@swift-ci please smoke test linux |
|
||
return String(lastComponent) | ||
/// The identity of the package. | ||
public var identity: String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love to have this expose the "real" identity instead of hiding it. we can then add a function/var to return the computed name projection and for the call sites that absolutely must use that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My plan was to do that in the next step, when we do make the semantic change. That way, the API identity
remains consistent with the actual behavior of the system throughout the process, and we avoid changing the interface and the implementation both at the same time.
But if you'd prefer to do this now, we could certainly rename this symbol and add a new identity
property that returns a PackageIdentity
value. All I'd need to know is what that would be called. Best I can come up with are oldIdentity
and legacyIdentity
, which both have a code smell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to carry PackageIdentity anywhere we have identity over String. @neonichu @abertelrud ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree — it would be good (especially to make sure we don't have to re-audit) to switch to a specific value type here rather than String, and it would be also be a good place to attach type documentation that summarizes the semantic intent of identity (not necessarily details about how the underlying string is formed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With c3fbf4e, we have an explicit typealias (PackageReference.Identity
) and property (PackageIdentity.legacyIdentity
). When it comes time change the semantics, we can audit for those APIs explicitly.
I think |
I'm a little concerned about this idea, because I don't think two different senses of identity can compose well. Would this mean if my root package has a newer tools version, we generally use the new behaviour? |
Sorry, I misspoke. What I meant was that we conditionalize the behavior based on the version of Swift / Swift Package Manager. That is, Swift 5.3 and below use the current behavior, and the next versions adopt the new behavior. |
f586c7c
to
5981043
Compare
I agree as well (noted earlier but repeated here). It doesn't have to be part of this PR but it would be ideal if it were. |
@mattt wdyt? how much more complicated would it be to push it further? |
return (identity, $0) | ||
} | ||
let packageMapByName: [String: ResolvedPackageBuilder] = packageBuilders.spm_createDictionary{ ($0.package.name, $0) } | ||
let packageMapByIdentity: [String: ResolvedPackageBuilder] = Dictionary(uniqueKeysWithValues: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a mapping with PackageIdentity
as key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, yes. But right now, we use the old / current identity semantics, which come from the computedName
legacyIdentity
property. Added a FIXME
comment in 57babb4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I am 110% for keeping the old semantics for now and making incremental changes in future PRs. We will almost certainly want to make the actual semantic changes be behind a feature flag initially. But my understanding was that PackageIdentity
would still keep the same package identity as right now, even if it were spelled as PackageIdentity
rather than String
, and that the semantic changes would come later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the changes in c3fbf4e should help clarify the direction of this PR. PackageIdentity
will be the new semantics for package identity, but it's also responsible for / capable of providing the old / current semantics (PackageReference.Identity
, which is a typealias of String
).
public let description: String | ||
public let computedName: String | ||
|
||
public init(_ string: String) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path part of the URL is often different between HTTP and SSH URLs for the same repository. Where does that normalization take place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you mean? Do you have any specific examples in mind?
(Also, I pushed some changes since you started reviewing; there's a lot more explanation about what's going on as of 844286b)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if there's a public example to point to, but one instance of BitBucket Server that I know of has something that follows this pattern:
ssh://[email protected]/teamname/projectname.git
https://server.company.com/scm/teamname/projectname.git
I'm not sure if the extra "scm" part in the HTTP form is something that's configured on the server. My impression is that there isn't generally a guarantee that the path part of the URLs are the same between SSH and HTTP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abertelrud do you think there is a way to canonicalize the BitBucket example above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. @neonichu do you know what the deal is with this server, is this path component something that a repository server can just randomly configure to be different for SSH vs HTTPS URLs, or is there some kind of pattern? I looked around a bit for configuration information about that server, but haven't found anything yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abertelrud Thanks for sharing that example.
The normalization proposed here attempts to mitigate what are, effectively, cosmetic differences in URLs. There are going to be limitations to this approach, like the one you describe, where a host uses different conventions for HTTPS and SSH URLs (my understanding is that most hosts, including GitHub and GitLab, do a direct translation). We could tailor our algorithm to accommodate this case specifically (e.g. by only looking at authority + last 2 path components), but I'm hesitant to take such an approach for loss of generality.
If you wanted to reconcile this on the client-side, the most direct way would be to set a dependency mirror URL. Another approach is something @tomerd had mentioned before, where SPM could use the first commit of each repository as a "fingerprint" to identify the same repositories at different URLs (e.g. /MorningStar.git
and /EveningStar.git
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we shouldn't add special cases here, and I've confirmed that this is configurable on the server, so there isn't really anything that could be added that would cover all cases. So I guess my main concern is that we will have cases that stop working because of this, and that it might not be apparent how to address it.
If we can fetch both repositories and compare the first-commit hashes then that might mitigate this, although it would still mean needing to have credentials for both repositories so they can be fetched (whereas today only the overriding repository needs credentials).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also won't deal with any Git remappings that might have been set up in git config
, though I don't know how common that is in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC this was discussed quite a bit a few years ago with the conclusion that it would break too many cases. But I don't have the context to know whether that was the right conclusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abertelrud Yeah, too many unknowns to know for sure what the actual impact will be. We can / should gate the new semantics behind a feature flag / command line option. This will help us identify edge cases without breaking anything in the short term.
I'm hesitant to do that for fear of introducing unintended functional differences. If the primary concern is losing fidelity / type hinting by downconversion to Edit: See c3fbf4e |
Add legacyIdentity property This commit partially reverts 75525f8
d294d49
to
c3fbf4e
Compare
This comment has been minimized.
This comment has been minimized.
public typealias PackageIdentity = String | ||
// FIXME: Replace with `PackageIdentity` once we change identity semantics. | ||
/// The identity of the package. | ||
public typealias Identity = String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattt @neonichu @abertelrud (probably just me) but still confused why going from String
to PackageIdentity
should be gated on changing the identity semantics. My assumption about the goal of this PR was that we would change everywhere to use the new PackageIdentity
type and never need to do any translations or drop down to String / legacy identity etc. Of course the actual underlying identity inside PackageIdentity
would be a String based on the same logic of extracting it from a canonicalized URL, but no code other than PackageIdentity
should "be aware" of the underlying IMO (or at least I hope). Then the next step would be to change how we compute the underlying (using something more unique than canonical URL) but structurally it would all works the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we all agree that we should change the interface and the implementation separately, rather than both at the same time. So it's a question of tactics as to whether to change the interface first or the implementation first.
My PR changes the implementation while keeping the the public API mostly intact. What I believe @tomerd is suggesting is changing the interface first. Both are valid approaches that make different trade-offs.
The motivation for taking the implementation-first approach in this PR has to do with bigger interface-level changes planned for when we switch over the semantics. For example, 960642b changes PackageReference
from being a structure to an enumeration to better model the distinction between local and remote package dependencies. By starting with the implementation, I believe we'll minimize risk and the be able to get to where we want to go in two PRs. If we started with the interface, I think we'd have to split out the other changes into a third, separate PR, or risk biting off more than we can chew at once in just one.
This prevents file paths from shadowing domain names (e.g. file:///github.com and https://github.com)
This comment has been minimized.
This comment has been minimized.
ef28905
to
8eb7467
Compare
This comment has been minimized.
This comment has been minimized.
@swift-ci please smoke test |
Closing as per offline discussion among the reviewers. Will open a new PR taking a new approach using dependency injection to consolidate call sites usage to a single |
To support package registries (#2967), we need to change the semantics around how packages are identified. Currently, they're based on a name computed from the package's effective URL, because the name declared in the package isn't known until the manifest has been parsed. We should instead use the declared URL of a package dependency as its identity.
After submitting #3023 and discussing with @neonichu & @tomerd, I think that we should tackle this first, independently of the package registry work. This PR lays the groundwork for making this change by introducing a new
PackageIdentity
type and updating existing calls to helper methods likePackageReference.computeIdentity
to go through this new API. If everything works as intended, this shouldn't change any existing behavior.From here, changing the semantics to use URL as a package identifier will entail the following:
DependecyMirrors.effectiveURL
to ensure that the original URL is used to establish identity, rather than any set mirrored URLPackageIdentity.computedName
property, which this PR uses to preserve the current behaviortestRootPackageOverrides
andtestPackageMirror
).