Skip to content

For git dependencies attempt to run git lfs pull to fetch binaries, if any. #2782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
48 changes: 39 additions & 9 deletions lib/src/source/git.dart
Original file line number Diff line number Diff line change
Expand Up @@ -350,10 +350,13 @@ class BoundGitSource extends CachedSource {
var revisionCachePath = _revisionCachePath(id);
await _revisionCacheClones.putIfAbsent(revisionCachePath, () async {
if (!entryExists(revisionCachePath)) {
await _clone(_repoCachePath(ref), revisionCachePath);
await _clone(ref.description['url'], revisionCachePath,
reference: _repoCachePath(ref));
await _checkOut(revisionCachePath, id.description['resolved-ref']);
await _pullLFS(revisionCachePath);
_writePackageList(revisionCachePath, [id.description['path']]);
Comment on lines -353 to 357
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider deleting the folder if one of these steps fails?

Before it wasn't really a problem, because _clone and _checkOut didn't do any network I/O, so odds that one of these commands was going to fail was probably very small.

But if _pullLFS fails, then what? How does the user recover? won't it leave the directory in place with a partially cloned LFS objects..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a folder deletion would be desirable because of the potential size of LFS objects. A likely scenario is:

  1. git clone succeeds and revisionCachePath is created with the lightweight contents of the repo
  2. git lfs pull starts on the heavy downloads and pulls down 1Gb of a 1.1Gb binary object
  3. The network breaks and an error is output to console
  4. The developer re-runs pub get
  5. The if checks the existence of the cloned repo and drops to the else clause
  6. Downloading resumes

} else {
await _pullLFS(revisionCachePath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? If entryExists(revisionCachePath) then can't we assume _pullLFS has already completed?

I agree that for existing clones this won't be the case. So if someone cloned a repository using LFS before LFS support was added to dart pub get then it won't work.

Solution for this problem could be to simply run dart pub cache repair.

_updatePackageList(revisionCachePath, id.description['path']);
}
});
Expand Down Expand Up @@ -584,24 +587,26 @@ class BoundGitSource extends CachedSource {
///
/// If [shallow] is true, creates a shallow clone that contains no history
/// for the repository.
Future _clone(
String from,
String to, {
bool mirror = false,
bool shallow = false,
}) {
///
/// If [reference] is not null, we expect [from] to be a remote repository
/// and [reference] to be the URI of a local filesystem repository.
/// This allows our clone to save network and filesystem resources while
/// still allowing Git LFS to fetch any binary files from the remote.
Future _clone(String from, String to,
{bool mirror = false, bool shallow = false, String reference}) {
return Future.sync(() {
// Git on Windows does not seem to automatically create the destination
// directory.
ensureDir(to);

var args = [
'clone',
if (reference != null) ['--reference', reference];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the only thing that is necessary?


If git-lfs is globally installed with git lfs install, then git clone just works.. but git clone path/to/canonical/cache path/to/revision/cache doesn't work... but if instead we do:

git clone --reference path/to/canonical/cache https://original-repository-url.com/repo path/to/revision/cache then git-lfs just works.

It's perhaps not the fastest thing, but if we do it this way we don't need any special handling for git lfs in pub (it just works). And if we don't need special handling for git lfs, then it's more reasonable to argue why we don't need test cases for git lfs.


Maybe, we can make a warning if we see an error message indicating a git lfs repository, but git lfs isn't installed globally.


It's also possible that it's much better to build special logic to install git lfs in the specific repository and do git lfs pull. I'm just saying I think the minimal changes might be easier to land.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jerel thoughts? ^

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonasfj if I remember right the current approach of running a git lfs pull was the most dependable at fetching updates and/or version changes after the initial fetch attempt. But it would be worth doing some manual testing to see if modifying the regular git clone command to one that also supports LFS would work in all cases as that could be an easier implementation and lower maintenance going forward.

Test case 1: what happens if the pubspec is set to:

git:
  url: ../some-lib
  ref: feature-branch

where feature-branch contains LFS tracked blob files. The some-lib developer pushes an update to feature-branch and the consuming application wants to update to the latest commit. Does pub get successfully pull the latest commit and binary files?

Test case 2: using the pubspec above and a large binary file, break the network after the repo has been fetched but while the file is being downloaded. Does the binary get [re]fetched next time pub get is ran or does git clone exit 0 without retrying the binary download?


If the implementation needs a local LFS server there is one that's part of the LFS project (I haven't used it) https://github.com/git-lfs/lfs-test-server but it doesn't look like there's recent binaries built so it would probably involve building and uploading artifacts managed by this project.

if (mirror) '--mirror',
if (shallow) ...['--depth', '1'],
from,
to
to,
];

return git.run(args);
}).then((result) => null);
}
Expand All @@ -612,6 +617,31 @@ class BoundGitSource extends CachedSource {
.run(['checkout', ref], workingDir: repoPath).then((result) => null);
}

/// Checks out any LFS binaries at the current ref to [repoPath].
/// Ignore errors if `git lfs` is not installed.
Future<void> _pullLFS(String repoPath) async {
var attributesPath = '$repoPath/.gitattributes';
// if there is no attributes file there won't be any LFS binaries
if (!fileExists(attributesPath)) return;
if (!readTextFile(attributesPath).toLowerCase().contains('=lfs')) return;

try {
// to avoid silent failure let's initialize git lfs hooks in the
// cloned repository in case the user has not globally ran `git lfs install`
await git.run(['lfs', 'install', '--local'], workingDir: repoPath);
} on git.GitException catch (e) {
if (e.message.contains('not a git command')) {
log.error('git lfs not found');
return;
} else {
rethrow;
}
}

return git
.run(['lfs', 'pull'], workingDir: repoPath).then((result) => null);
}

String _revisionCachePath(PackageId id) => p.join(
systemCacheRoot, "${_repoName(id)}-${id.description['resolved-ref']}");

Expand Down
153 changes: 153 additions & 0 deletions test/get/git/git_lfs_test.dart
Original file line number Diff line number Diff line change
@@ -0,0 +1,153 @@
// Copyright (c) 2012, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

import 'dart:io';

@TestOn('linux')
import 'package:test/test.dart';
import 'package:test_descriptor/test_descriptor.dart' show sandbox;
import 'package:path/path.dart' as p;
import 'package:pub/src/exit_codes.dart' as exit_codes;
import 'package:pub/src/io.dart' show runProcess;

import '../../descriptor.dart' as d;
import '../../test_pub.dart';

const pubspec = 'name: foo\r\ndescription: test\r\nversion: 3.0.0';
const gitattributes = 'foo.dylib filter=lfs diff=lfs merge=lfs -text';

String fakeGit(hash, lfs) {
return '''
#!/bin/bash -e
case \$@ in
--version)
echo 'git version 2.16'
;;
rev-list*)
echo '$hash'
;;
show*)
echo '$pubspec'
;;
checkout*)
mkdir -p $sandbox/cache/git/foo-$hash/.git
echo '$pubspec' > $sandbox/cache/git/foo-$hash/pubspec.yaml
echo '$gitattributes' > $sandbox/cache/git/foo-$hash/.gitattributes
;;
$lfs
esac
''';
}

void main() {
test('reports failure if Git LFS is needed but is not installed', () async {
ensureGit();

final repo = d.git('foo.git', [
d.libDir('foo'),
d.file('pubspec.yaml', pubspec),
d.file('.gitattributes', gitattributes)
]);

await repo.create();
final ref = await repo.runGit(['rev-list', '--max-count=1', 'HEAD']);
final hash = ref.first;

await d.appDir({
'foo': {
'git': {'url': '../foo.git'}
}
}).create();

const lfs = '''
lfs*)
echo "git: 'lfs' is not a git command. See 'git --help'."
exit 1
;;
''';

await d.dir('bin', [d.file('git', fakeGit(hash, lfs))]).create();
final binFolder = p.join(sandbox, 'bin');
// chmod the git script
if (!Platform.isWindows) {
await runProcess('chmod', ['+x', p.join(sandbox, 'bin', 'git')]);
}
Comment on lines +63 to +75
Copy link
Member

@jonasfj jonasfj Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it not be wiser to make a set of tests that are skipped unless:

  • (A) git LFS is not installed, and
  • (B) we are not running in CI (env var CI=true.

Then change the github actions environment to install git lfs, so that the tests will pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me clarify, I think these test cases where we mock LFS is a good idea, but I don't think they are sufficient.

I think we need tests where we have LFS installed and use actual LFS. That ideally also means running an LFS server on localhost.


final separator = Platform.isWindows ? ';' : ':';

await runPub(
args: ['get'],
error: contains('git lfs not found'),
environment: {
// Override 'PATH' to ensure that we are using our fake git executable
'PATH': '$binFolder$separator${Platform.environment['PATH']}'
});
});

test('reports success if Git proceeds as usual due to no .gitattributes',
() async {
ensureGit();

await d.git('foo.git', [
d.libDir('foo'),
d.libPubspec('foo', '3.0.0'),
]).create();

await d.appDir({
'foo': {
'git': {'url': '../foo.git'}
}
}).create();

await runPub(args: ['get'], output: contains('Changed 1 dependency!'));
});

test('if Git LFS is installed then `git lfs pull` is executed', () async {
ensureGit();

final repo = d.git('foo.git', [
d.libDir('foo'),
d.file('pubspec.yaml', pubspec),
d.file('.gitattributes', gitattributes)
]);

await repo.create();
final ref = await repo.runGit(['rev-list', '--max-count=1', 'HEAD']);
final hash = ref.first;

await d.appDir({
'foo': {
'git': {'url': '../foo.git'}
}
}).create();

const lfs = '''
'lfs install'*)
echo 'git lfs install'
;;
'lfs pull')
echo 'git lfs pull was reached'
exit 1
;;
''';

await d.dir('bin', [d.file('git', fakeGit(hash, lfs))]).create();
final binFolder = p.join(sandbox, 'bin');
// chmod the git script
if (!Platform.isWindows) {
await runProcess('chmod', ['+x', p.join(sandbox, 'bin', 'git')]);
}

final separator = Platform.isWindows ? ';' : ':';

await runPub(
args: ['get'],
error: contains('git lfs pull was reached'),
environment: {
// Override 'PATH' to ensure that we are using our fake git executable
'PATH': '$binFolder$separator${Platform.environment['PATH']}'
},
exitCode: exit_codes.UNAVAILABLE);
});
}