For git dependencies attempt to run `git lfs pull` to fetch binaries, if any. #2782

jerel · 2020-12-01T22:54:30Z

As described in the linked issue, the way pub caches git dependencies breaks the automatic pulling of LFS-managed binaries when git pull is run. The bare repository step in ~/.pub-cache/git/cache causes the remote LFS origin to be missing when the working directory is checked out to ~/.pub-cache/git/<package-sha>, so any binaries in the dependency are not fetched.

I'd welcome a conversation about whether this patch or one like it could be accepted or if there's another plan to successfully manage large pre-compiled binaries in a private dependency.

google-cla · 2020-12-01T22:54:34Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

ℹ️ Googlers: Go here for more info.

jerel · 2020-12-02T17:36:44Z

@googlebot I signed it!

jonasfj · 2020-12-03T14:11:59Z

Given how repositories are cached, I can see an argument against encouraging large git dependencies.

lib/src/source/git.dart

jerel · 2020-12-03T18:24:08Z

> Given how repositories are cached, I can see an argument against encouraging large git dependencies.

To make sure I understand: do you mean that (1) there's an argument to be made against git dependencies that pull in large binaries via LFS, or (2) against git dependencies that have large binaries checked directly into their git history?

sigurdm · 2020-12-07T13:36:47Z

I think he meant (2)!

pub has a bare checkout of the package in the cache, and for each ref it has another full clone.
With this we could save the big files in the bare checkout.

jerel · 2020-12-09T00:01:23Z

After reworking this a little the following scenarios are covered:

- The _clone() function now uses git clone's --reference argument to specify the local filesystem cache. This prevents git-lfs from losing the remote and lets it find remote files during a clone. This should also allow configs like .lfsconfig to specify overrides for LFS without extra work on our part.
- Running pub get without git-lfs installed does not error, so git keeps working as it does today.
- pub get without git-lfs, then brew install git-lfs, then pub get fetches the LFS files missed in the first dependency fetch.
- pub get with a ref specified on the git dependency, followed by a switch to a different ref, will still result in LFS files being fetched from the updated ref.
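
As a rough illustration of the first scenario, the clone arguments could be assembled as below. This is only a sketch, not pub's actual code: `cloneArgs` and its parameter names are hypothetical, and only the `git clone --reference` option itself comes from the description above.

```dart
/// Builds the argument list for `git clone`.
///
/// Hypothetical helper for illustration; not part of pub.
List<String> cloneArgs(String from, String to, {String? reference}) {
  return [
    'clone',
    // Borrow objects from the local cache while keeping `from` as the
    // remote, so git-lfs still knows where to download from.
    if (reference != null) ...['--reference', reference],
    from,
    to,
  ];
}
```

Calling it without `reference` yields a plain `clone <from> <to>`, so the existing behaviour stays available.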

jerel · 2020-12-17T22:04:26Z

This is now up to date with master and is ready for further review.

sigurdm · 2020-12-21T08:30:27Z

I wonder if there is a way to detect if a repo uses lfs.
Then we could write a proper warning if lfs is not installed but is needed, and we could avoid running it if not needed.

I guess it has to be detected from the .gitattributes file but exactly how to detect it I don't know (maybe we can go with a simple regex)...

@jerel wdyt?

sigurdm · 2020-12-21T09:42:07Z

This also needs tests.

jerel · 2021-01-05T23:43:17Z

@sigurdm that seems like a nice addition. How about this where we return early if there isn't a gitattributes file or if it exists but doesn't contain filter=lfs, merge=lfs, etc? Then we can error out if git lfs isn't an installed git command:

  Future _pullLFS(String repoPath) async {
+   var attributesPath = '$repoPath/.gitattributes';
+   // if there is no attributes file there won't be any LFS binaries
+   if (!fileExists(attributesPath)) return;
+   if (!readTextFile(attributesPath).toLowerCase().contains('=lfs')) return;

    try {
      // to avoid silent failure let's initialize git lfs hooks in the
      // cloned repository in case the user has not globally ran `git lfs install`
      await git.run(['lfs', 'install', '--local'], workingDir: repoPath);
    } on git.GitException catch (e) {
      if (e.message.contains('not a git command')) {
-       log.warn('git lfs not found, continuing');
+       log.error('git lfs not found');
        return;
      } else {
        rethrow;
      }
    }

    return git
        .run(['lfs', 'pull'], workingDir: repoPath).then((result) => null);
  }
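
If the plain substring check ever proves too loose, a slightly stricter variant could look at the attributes of each rule instead. The sketch below is an assumption-laden illustration rather than part of the patch: `usesLfs` is a hypothetical name, it takes the file contents as a string, and it does not handle quoted patterns that contain spaces.

```dart
/// Returns true if any rule in a `.gitattributes` file routes paths
/// through the LFS filter. Hypothetical helper; not pub's code.
bool usesLfs(String gitattributes) {
  for (final rawLine in gitattributes.split('\n')) {
    final line = rawLine.trim();
    // Skip blank lines and comments.
    if (line.isEmpty || line.startsWith('#')) continue;
    // A rule is a pattern followed by whitespace-separated attributes.
    final attributes = line.split(RegExp(r'\s+')).skip(1);
    if (attributes.contains('filter=lfs')) return true;
  }
  return false;
}
```

Unlike `contains('=lfs')`, this ignores a commented-out LFS rule.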

lib/src/source/git.dart

jerel · 2021-02-03T22:21:08Z

@jonasfj @sigurdm how do you feel about the current state of this PR and its tests? If it looks good to you I'll tackle the Windows shell script for the failing test to finish it up.

jonasfj · 2021-02-05T14:52:04Z

I have to admit that I might have to read a beginners guide on git lfs, to understand if there is any negative implication.

But I can see how this can be useful, it would certainly make git dependencies more useful.

sigurdm · 2021-02-08T08:34:17Z

I believe with this line: https://github.com/dart-lang/pub/pull/2782/files#diff-1639c4669c428c26e68cfebd5039a33f87ba568795f2c058c303ca8528f62b77R580 the negative consequences will be minimal. This will only be invoked when the repo uses lfs.

jonasfj

Okay, I agree with @sigurdm this is unlikely to affect anyone not using git lfs, so what is the harm in shipping it :D

jonasfj · 2021-02-12T13:52:00Z

lib/src/source/git.dart

          _writePackageList(revisionCachePath, [id.description['path']]);
        } else {
+          await _pullLFS(revisionCachePath);


Why do we need this? If entryExists(revisionCachePath) then can't we assume _pullLFS has already completed?

I agree that for existing clones this won't be the case. So if someone cloned a repository using LFS before LFS support was added to dart pub get then it won't work.

Solution for this problem could be to simply run dart pub cache repair.

jonasfj · 2021-02-12T13:56:20Z

lib/src/source/git.dart

-          await _clone(_repoCachePath(ref), revisionCachePath);
+          await _clone(ref.description['url'], revisionCachePath,
+              reference: _repoCachePath(ref));
          await _checkOut(revisionCachePath, id.description['resolved-ref']);
+          await _pullLFS(revisionCachePath);
          _writePackageList(revisionCachePath, [id.description['path']]);


Should we consider deleting the folder if one of these steps fails?

Before it wasn't really a problem, because _clone and _checkOut didn't do any network I/O, so odds that one of these commands was going to fail was probably very small.

But if _pullLFS fails, then what? How does the user recover? Won't it leave the directory in place with partially cloned LFS objects?

jerel · 2022-07-22

I don't think a folder deletion would be desirable because of the potential size of LFS objects. A likely scenario is:

- git clone succeeds and revisionCachePath is created with the lightweight contents of the repo
- git lfs pull starts on the heavy downloads and pulls down 1 GB of a 1.1 GB binary object
- The network breaks and an error is output to the console
- The developer re-runs pub get
- The if checks the existence of the cloned repo and drops to the else clause
- Downloading resumes

jonasfj · 2021-02-12T14:02:32Z

> how do you feel about the current state of this PR and its tests? If it looks good to you I'll tackle the Windows shell script for the failing test to finish it up.

@jerel, I think it's looking really good. Maybe we should test what happens if git lfs fails intermittently (like a network failure): can the user recover by running dart pub get again? Or will that simply leave the user with a semi-cloned repository?
I imagine we can just extend the FakeGit script to simulate a failure.

nightscape · 2021-08-08T09:14:50Z

I'm just running into this exact problem.
It seems that everybody agrees that this would be good to have and that the code is generally in good shape.
@jerel do you have capacity to carry this over the finish line?

sizeak · 2021-11-19T17:16:55Z

This would be very useful to me if it gets merged; otherwise I'm soon going to need to use an LFS-enabled submodule that pub will ignore instead, which doesn't sound very fun 😢

jerel · 2021-11-19T17:26:38Z

I've kept thinking I would get back to this to write the Windows test (I believe that's the only thing outstanding) but as we're no longer using LFS for this at work and I don't have a Windows environment ready to go I haven't had the time to prioritize it. If somebody that regularly uses Windows would like to contribute the test that would be wonderful.

jonasfj · 2021-11-22T11:52:39Z

test/get/git/git_lfs_test.dart

+    const lfs = '''
+  lfs*)
+    echo "git: 'lfs' is not a git command. See 'git --help'."
+    exit 1
+    ;;
+  ''';
+
+    await d.dir('bin', [d.file('git', fakeGit(hash, lfs))]).create();
+    final binFolder = p.join(sandbox, 'bin');
+    // chmod the git script
+    if (!Platform.isWindows) {
+      await runProcess('chmod', ['+x', p.join(sandbox, 'bin', 'git')]);
+    }


Might it not be wiser to make a set of tests that are skipped only when:

(A) git LFS is not installed, and

(B) we are not running in CI (env var CI=true).

Then change the github actions environment to install git lfs, so that the tests will pass.

Let me clarify, I think these test cases where we mock LFS are a good idea, but I don't think they are sufficient.

I think we need tests where we have LFS installed and use actual LFS. That ideally also means running an LFS server on localhost.
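
The skip rule from (A) and (B) could be expressed roughly as follows. This is a sketch under assumptions: `shouldSkipLfsTests` is a hypothetical name, and `lfsInstalled` is assumed to come from probing something like `git lfs version` beforehand.

```dart
/// Decides whether tests that need a real git LFS should be skipped.
/// Hypothetical helper; not part of pub's test suite.
bool shouldSkipLfsTests({
  required bool lfsInstalled,
  required Map<String, String> environment,
}) {
  final inCi = environment['CI'] == 'true';
  // Skip only on machines without LFS that are not CI; on CI a missing
  // LFS install should make the tests fail rather than silently skip.
  return !lfsInstalled && !inCi;
}
```

In a test file the result could feed the `skip:` argument of `test()`, with `Platform.environment` passed as `environment`.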

feinstein · 2021-12-06T04:27:54Z

> I've kept thinking I would get back to this to write the Windows test (I believe that's the only thing outstanding) but as we're no longer using LFS for this at work and I don't have a Windows environment ready to go I haven't had the time to prioritize it. If somebody that regularly uses Windows would like to contribute the test that would be wonderful.

What tests are missing? I might take a look at this in my spare time (weekends)

jonasfj · 2022-01-18T13:26:28Z

> What tests are missing? I might take a look at this in my spare time (weekends)

Ideally tests that actually use git LFS and not just mock it out, ideally with a minimal LFS server running on localhost.

Or an explanation why this is too complicated to do. If tests become extremely slow, or we need lots of dependencies, then maybe we don't want to do this. Or maybe we just want a script we can run locally for testing on occasion when a bug is reported.

jonasfj · 2022-05-18T13:07:58Z

lib/src/source/git.dart

      var args = [
        'clone',
+        if (reference != null) ...['--reference', reference],


Isn't this the only thing that is necessary?

If git-lfs is globally installed with git lfs install, then git clone just works.. but git clone path/to/canonical/cache path/to/revision/cache doesn't work... but if instead we do:

git clone --reference path/to/canonical/cache https://original-repository-url.com/repo path/to/revision/cache then git-lfs just works.

It's perhaps not the fastest thing, but if we do it this way we don't need any special handling for git lfs in pub (it just works). And if we don't need special handling for git lfs, then it's more reasonable to argue why we don't need test cases for git lfs.

Maybe, we can make a warning if we see an error message indicating a git lfs repository, but git lfs isn't installed globally.

It's also possible that it's much better to build special logic to install git lfs in the specific repository and do git lfs pull. I'm just saying I think the minimal changes might be easier to land.

jonasfj · 2022-06-16

@jerel thoughts? ^

jerel · 2022-07-22

@jonasfj if I remember right, the current approach of running git lfs pull was the most dependable at fetching updates and/or version changes after the initial fetch attempt. But it would be worth doing some manual testing to see if modifying the regular git clone command to one that also supports LFS would work in all cases, as that could be an easier implementation and lower maintenance going forward.

Test case 1: what happens if the pubspec is set to:

  git:
    url: ../some-lib
    ref: feature-branch

where feature-branch contains LFS tracked blob files. The some-lib developer pushes an update to feature-branch and the consuming application wants to update to the latest commit. Does pub get successfully pull the latest commit and binary files?

Test case 2: using the pubspec above and a large binary file, break the network after the repo has been fetched but while the file is being downloaded. Does the binary get [re]fetched the next time pub get is run, or does git clone exit 0 without retrying the binary download?

If the implementation needs a local LFS server, there is one that's part of the LFS project (I haven't used it): https://github.com/git-lfs/lfs-test-server. It doesn't look like there are recent binaries built, though, so it would probably involve building and uploading artifacts managed by this project.

mit-mit · 2022-07-18T10:01:54Z

Hi @jerel, is this still being worked on?

CarGuo · 2022-09-09T01:25:33Z

Hi! Is there any new progress on this issue?

jonasfj · 2022-09-12T15:36:50Z

@CarGuo there is a discussion here #2782 (comment) about what is actually necessary.

This probably needs further investigation. If you or someone else is interested in working out the minimal changes to our git commands in order for LFS to work, that would be great.
We'd also need to find out:

- how network failures, retries, etc. work out
- what happens if cloning fails half-way through, or LFS cloning fails half-way through: can the user recover?
- how changing the git commands we use will affect users who have existing PUB_CACHE folders and switch between old and new Dart SDKs: will they have issues?
  - we need to find out how to gracefully avoid that, or
  - ensure that they can gracefully recover.

mosuem · 2024-03-12T09:15:27Z

Closing this as it is stale and needs more background work. Feel free to reopen in case you still want to land it!

For git dependencies attempt to run git lfs pull to fetch binaries,…

17bafd8

… if any.

google-cla bot added the cla: no label Dec 1, 2020

google-cla bot added cla: yes and removed cla: no labels Dec 2, 2020

jonasfj reviewed Dec 3, 2020

lib/src/source/git.dart (outdated)

Make error handling precise and handle missed scenarios where `lfs pu…

4ed945f

…ll` is needed

Merge branch 'master' of https://github.com/dart-lang/pub into git-lfs

9f4719d

sigurdm reviewed Jan 7, 2021

lib/src/source/git.dart (outdated)

jerel added 3 commits January 20, 2021 19:00

Check for the existence of a .gitattributes file as an indicator of l…

4a0b970

…fs support

Add tests for the Git LFS functionality

5ad164b

Merge branch 'master' of https://github.com/dart-lang/pub into git-lfs

003907a

jonasfj reviewed Feb 12, 2021

jonasfj reviewed Nov 22, 2021

sigurdm mentioned this pull request Jan 18, 2022

Add LFS support #3289

Closed

Merge branch 'master' into git-lfs

8b03709

jonasfj reviewed May 18, 2022

makumaaku mentioned this pull request Jan 26, 2024

Support git-lfs files in the lib directory #1433

Open

mosuem closed this Mar 12, 2024
