Skip to content

Move the 'GitRepositoryProvider' to compare urls using their canonical representation, correctly accepting results where repositories only differ by '.git' #7741

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

francescomikulis
Copy link
Contributor

@francescomikulis francescomikulis commented Jul 1, 2024

Motivation:

Resolving local package dependencies often outputs a warning:
is not valid git repository for '', will fetch again.

Modifications:

The underlying issue is that updating packages with a local SCM path dependency always fail, as the shell-based repository URL has a file:// scheme, while the absolute path string does not.

Result:

After this change we only compare the canonical URLs, which will resolve the inconsistency of comparing strings without considering the scheme and / or path extension.

Validating the origin of checked-out repositories no longer ensures that the path extension (often .git) matches the repository's specifier, as different git clients inconsistently preserve the path extension of the remote. Moving to the CanonicalPackageURL also ensures that absolute paths are treated as urls with a file:// scheme, matching git's behavior.

…l representation, correctly accepting results where repositories only differ by '.git'.
@rauhul
Copy link
Member

rauhul commented Jul 1, 2024

Im gonna pre-flight what I think @MaxDesiatov will say: "Could you add some tests for this?"

@xedin
Copy link
Contributor

xedin commented Jul 1, 2024

Yeah, we definitely need tests for this before merging.

@neonichu
Copy link
Contributor

neonichu commented Jul 1, 2024

The motivation for local dependencies makes sense to me, but I don't think it's correct to ignore schemes and path extensions here, the whole purpose of the validation was to make sure that we have the "correct" repo. If we're now not doing that anymore, it might make more sense to remove the entire validation logic.

@bnbarham
Copy link
Contributor

bnbarham commented Jul 1, 2024

If we're now not doing that anymore, it might make more sense to remove the entire validation logic.

TBH I'd be fine with this - the repository cache seems like the correct layer to validate this. Otherwise we're just going to run into issues where the cache says the repositories match, but this validation disagrees).

EDIT: Though thinking about it more, this does check that the directory is at least a valid git repository + that the underlying origin maps (in our canonicalization sense). Which is better than not doing that at all.

return CanonicalPackageURL(remoteURL) == CanonicalPackageURL(url.absoluteString)
case .path(let absolutePath):
// Compare the canonical representation, which will drop any suffix
return CanonicalPackageLocation(remoteURL) == CanonicalPackageLocation(absolutePath.pathString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this matters in practice, but the caching logic always uses URL and it's probably better to do that here as well. That wouldn't fix the case mentioned in the description, but to fix that you could update computeCanonicalLocation to return a file scheme if one wasn't given (which seems somewhat reasonable given we're already inferring ssh if the user is git). Does that seem fine to you @neonichu?

@MaxDesiatov MaxDesiatov added needs tests This change needs test coverage source control Changes to SCM-related code labels Jul 2, 2024
@MaxDesiatov
Copy link
Contributor

@swift-ci test

…ider. Clarified the API surface and reduced the change to only ignore path extensions for local (path-based) dependencies.
…eintroduced the 'RepositorySpecifier' and moved to using 'CanonicalPackageURL', where paths now have a scheme of 'file'.
Comment on lines 35 to 39
public var url: String {
public var url: CanonicalPackageURL {
switch self.location {
case .path(let path): return path.pathString
case .url(let url): return url.absoluteString
case .path(let path): return CanonicalPackageURL(path.pathString)
case .url(let url): return CanonicalPackageURL(url.absoluteString)
Copy link
Contributor

@bnbarham bnbarham Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change 🤔?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this, it wasn't necessary.

@@ -89,74 +89,6 @@ private class MockRepository: Repository {
}
}

private class MockRepositories: RepositoryProvider {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was unused.

@bnbarham
Copy link
Contributor

@swift-ci please test

Copy link
Contributor

@MaxDesiatov MaxDesiatov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the change seems noteworthy enough to be included in CHANGELOG.md?

@robinkunde
Copy link

robinkunde commented Jul 16, 2024

I was preparing my own PRs for this issue, but this one is much further along. Great work @francescomikulis!
BTW: The reason I was looking at this issue is that we have a complex dependency graph with several large repos, and discarding the workspace repo and copying it from the cache was happening several times during dependency resolution. This should be a noticeable performance improvement for large projects.

One question I had was: Would it make sense to canonicalize the origin URL before populating the cache? As it stands now, whether the cached repo uses https://github.com/swiftlang/swift-package-manager.git or https://github.com/swiftlang/swift-package-manager for example depends on which version of the URL was encountered first in a dependency graph. This PR fixes the lookup logic, but I think the logic for populating the cache should be as predictable as possible too.

@MaxDesiatov
Copy link
Contributor

@swift-ci test

@MaxDesiatov
Copy link
Contributor

@swift-ci test windows

@MaxDesiatov MaxDesiatov removed the needs tests This change needs test coverage label Jul 19, 2024
@francescomikulis
Copy link
Contributor Author

I was preparing my own PRs for this issue, but this one is much further along. Great work @francescomikulis! BTW: The reason I was looking at this issue is that we have a complex dependency graph with several large repos, and discarding the workspace repo and copying it from the cache was happening several times during dependency resolution. This should be a noticeable performance improvement for large projects.

One question I had was: Would it make sense to canonicalize the origin URL before populating the cache? As it stands now, whether the cached repo uses https://github.com/swiftlang/swift-package-manager.git or https://github.com/swiftlang/swift-package-manager for example depends on which version of the URL was encountered first in a dependency graph. This PR fixes the lookup logic, but I think the logic for populating the cache should be as predictable as possible too.

I agree that we should try and ensure that the underlying cache is populated with expected values, matching the new validation logic in the GitRepositoryProvider.
From some research, an example of this is in the RepositoryManager, where it keeps around maps such as var pendingLookups = [RepositorySpecifier: DispatchGroup](). To fully move towards a more canonical model we'd probably have to change RepositorySpecifier to wrap a Location that was canonicalized.
I believe such a change is out-of-scope for fixing these cache / performance issues, but I agree that this could be an advantageous improvement.

…sitories that aren't yet checked out, even though the InMemoryFileSystem claims the directories exist.
@bnbarham
Copy link
Contributor

@swift-ci please test

@bnbarham
Copy link
Contributor

@swift-ci please test Windows platform

@francescomikulis francescomikulis merged commit bc9a9e3 into main Jul 22, 2024
5 checks passed
@francescomikulis francescomikulis deleted the repository-provider-compare-urls-with-canonical-representation branch July 22, 2024 22:21
francescomikulis added a commit that referenced this pull request Jul 22, 2024
…l representation, correctly accepting results where repositories only differ by '.git' (#7741)

Resolving local package dependencies often outputs a warning:
<path> is not valid git repository for '<repo>', will fetch again.

The underlying issue is that updating packages with a local SCM path
dependency always fail, as the shell-based repository URL has a file://
scheme, while the absolute path string does not.

After this change we only compare the canonical URLs, which will resolve
the inconsistency of comparing strings without considering the scheme
and / or path extension.

Validating the origin of checked-out repositories no longer ensures that
the path extension (often .git) matches the repository's specifier, as
different git clients inconsistently preserve the path extension of the
remote. Moving to the CanonicalPackageURL also ensures that absolute
paths are treated as urls with a file:// scheme, matching git's
behavior.
francescomikulis added a commit that referenced this pull request Jul 23, 2024
…l representation, correctly accepting results where repositories only differ by '.git' (#7741)

Resolving local package dependencies often outputs a warning:
<path> is not valid git repository for '<repo>', will fetch again.

The underlying issue is that updating packages with a local SCM path
dependency always fail, as the shell-based repository URL has a file://
scheme, while the absolute path string does not.

After this change we only compare the canonical URLs, which will resolve
the inconsistency of comparing strings without considering the scheme
and / or path extension.

Validating the origin of checked-out repositories no longer ensures that
the path extension (often .git) matches the repository's specifier, as
different git clients inconsistently preserve the path extension of the
remote. Moving to the CanonicalPackageURL also ensures that absolute
paths are treated as urls with a file:// scheme, matching git's
behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
add to change log source control Changes to SCM-related code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants