Skip to content

For git dependencies attempt to run git lfs pull to fetch binaries, if any. #2782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

jerel
Copy link

@jerel jerel commented Dec 1, 2020

Fixes #1433

As described in the linked issue the way that pub does caching of git dependencies breaks the automatic pulling of LFS managed binaries when git pull is ran. The bare repository step in ~/.pub-cache/git/cache causes the remote LFS origin to be missing when the working directory is checked out to ~/.pub-cache/git/<package-sha> so any binaries in the dependency are not fetched.

I'd welcome a conversation about whether this patch or one like it could be accepted or if there's another plan to successfully manage large pre-compiled binaries in a private dependency.

@google-cla
Copy link

google-cla bot commented Dec 1, 2020

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added the cla: no label Dec 1, 2020
@jerel
Copy link
Author

jerel commented Dec 2, 2020

@googlebot I signed it!

@google-cla google-cla bot added cla: yes and removed cla: no labels Dec 2, 2020
@jonasfj
Copy link
Member

jonasfj commented Dec 3, 2020

Given how repositories are cached, I can see an argument against encouraging large git dependencies.

@jerel
Copy link
Author

jerel commented Dec 3, 2020

Given how repositories are cached, I can see an argument against encouraging large git dependencies.

To make sure I understand... do you mean that (1) there's an argument to be made against git dependencies that pull in large binaries via LFS or (2) against git dependencies that have large binaries checked directly in to their git history?

@sigurdm
Copy link
Contributor

sigurdm commented Dec 7, 2020

I think he meant (2)!

pub has a bare checkout of the package in the cache, and for each ref it has another full clone.
With this we could save the big files in the bare checkout.

@jerel
Copy link
Author

jerel commented Dec 9, 2020

After reworking this a little the following scenarios are covered:

  • the _clone() function now uses git clone's --reference argument to specify the local filesystem cache. This prevents git-lfs from losing the remote and lets it find remote files during a clone. This should also allow configs like .lfsconfig to specify overrides for LFS without extra work on our part.
  • Running pub get without git-lfs installed does not error - to support the continued usage of git as it works today.
  • pub get without git-lfs, then brew install git-lfs, then pub get fetches the LFS files missed in the first dependency fetch.
  • pub get with a ref specified on the git dependency followed by a switch to a different ref will still result in LFS files being fetched from the updated ref.

@jerel
Copy link
Author

jerel commented Dec 17, 2020

This is now up to date with master and is ready for further review.

@sigurdm
Copy link
Contributor

sigurdm commented Dec 21, 2020

I wonder if there is a way to detect if a repo uses lfs.
Then we could write a proper warning if lfs is not installed but is needed, and we could avoid running it if not needed.

I guess it has to be detected from the .gitattributes file but exactly how to detect it I don't know (maybe we can go with a simple regex)...

@jerel wdyt?

@sigurdm
Copy link
Contributor

sigurdm commented Dec 21, 2020

This also needs tests.

@jerel
Copy link
Author

jerel commented Jan 5, 2021

@sigurdm that seems like a nice addition. How about this where we return early if there isn't a gitattributes file or if it exists but doesn't contain filter=lfs, merge=lfs, etc? Then we can error out if git lfs isn't an installed git command:

  Future _pullLFS(String repoPath) async {
+   var attributesPath = '$repoPath/.gitattributes';
+   // if there is no attributes file there won't be any LFS binaries
+   if (!fileExists(attributesPath)) return;
+   if (!readTextFile(attributesPath).toLowerCase().contains('=lfs')) return;

    try {
      // to avoid silent failure let's initialize git lfs hooks in the
      // cloned repository in case the user has not globally ran `git lfs install`
      await git.run(['lfs', 'install', '--local'], workingDir: repoPath);
    } on git.GitException catch (e) {
      if (e.message.contains('not a git command')) {
-       log.warn('git lfs not found, continuing');
+       log.error('git lfs not found');
        return;
      } else {
        rethrow;
      }
    }

    return git
        .run(['lfs', 'pull'], workingDir: repoPath).then((result) => null);
  }

@jerel
Copy link
Author

jerel commented Feb 3, 2021

@jonasfj @sigurdm how do you feel about the current state of this PR and its tests? If it looks good to you I'll tackle the Windows shell script for the failing test to finish it up.

@jonasfj
Copy link
Member

jonasfj commented Feb 5, 2021

I have to admit that I might have to read a beginners guide on git lfs, to understand if there is any negative implication.

But I can see how this can be useful, it would certainly make git dependencies more useful.

@sigurdm
Copy link
Contributor

sigurdm commented Feb 8, 2021

I belive with this line: https://github.com/dart-lang/pub/pull/2782/files#diff-1639c4669c428c26e68cfebd5039a33f87ba568795f2c058c303ca8528f62b77R580 the negative consequences will be minimal. This will only be invoked when the repo uses lfs.

Copy link
Member

@jonasfj jonasfj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I agree with @sigurdm this is unlikely to affect anyone not using git lfs, so what is the harm in shipping it :D

_writePackageList(revisionCachePath, [id.description['path']]);
} else {
await _pullLFS(revisionCachePath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? If entryExists(revisionCachePath) then can't we assume _pullLFS has already completed?

I agree that for existing clones this won't be the case. So if someone cloned a repository using LFS before LFS support was added to dart pub get then it won't work.

Solution for this problem could be to simply run dart pub cache repair.

Comment on lines -309 to 313
await _clone(_repoCachePath(ref), revisionCachePath);
await _clone(ref.description['url'], revisionCachePath,
reference: _repoCachePath(ref));
await _checkOut(revisionCachePath, id.description['resolved-ref']);
await _pullLFS(revisionCachePath);
_writePackageList(revisionCachePath, [id.description['path']]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider deleting the folder if one of these steps fails?

Before it wasn't really a problem, because _clone and _checkOut didn't do any network I/O, so odds that one of these commands was going to fail was probably very small.

But if _pullLFS fails, then what? How does the user recover? won't it leave the directory in place with a partially cloned LFS objects..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a folder deletion would be desirable because of the potential size of LFS objects. A likely scenario is:

  1. git clone succeeds and revisionCachePath is created with the lightweight contents of the repo
  2. git lfs pull starts on the heavy downloads and pulls down 1Gb of a 1.1Gb binary object
  3. The network breaks and an error is output to console
  4. The developer re-runs pub get
  5. The if checks the existence of the cloned repo and drops to the else clause
  6. Downloading resumes

@jonasfj
Copy link
Member

jonasfj commented Feb 12, 2021

how do you feel about the current state of this PR and its tests? If it looks good to you I'll tackle the Windows shell script for the failing test to finish it up.

@jerel, I think it's looking really good. Maybe we should test what happens if git lfs fails intermittently (like network failure), can the user recover by running dart pub get again? Or will that simply get the user semi-cloned repository.
I imagine we can just extend the FakeGit script to simulate a failure..

@nightscape
Copy link

I'm just running into this exact problem.
It seems that everybody agrees that this would be good to have and that the code is generally in good shape.
@jerel do you have capacity to carry this over the finish line?

@sizeak
Copy link

sizeak commented Nov 19, 2021

This would be very useful to me if it gets merged, otherwise I'm soon going to need to use a LFS enabled submodule that pub will ignore instead, which doesn't sound very fun 😢

@jerel
Copy link
Author

jerel commented Nov 19, 2021

I've kept thinking I would get back to this to write the Windows test (I believe that's the only thing outstanding) but as we're no longer using LFS for this at work and I don't have a Windows environment ready to go I haven't had the time to prioritize it. If somebody that regularly uses Windows would like to contribute the test that would be wonderful.

Comment on lines +63 to +75
const lfs = '''
lfs*)
echo "git: 'lfs' is not a git command. See 'git --help'."
exit 1
;;
''';

await d.dir('bin', [d.file('git', fakeGit(hash, lfs))]).create();
final binFolder = p.join(sandbox, 'bin');
// chmod the git script
if (!Platform.isWindows) {
await runProcess('chmod', ['+x', p.join(sandbox, 'bin', 'git')]);
}
Copy link
Member

@jonasfj jonasfj Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it not be wiser to make a set of tests that are skipped unless:

  • (A) git LFS is not installed, and
  • (B) we are not running in CI (env var CI=true.

Then change the github actions environment to install git lfs, so that the tests will pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me clarify, I think these test cases where we mock LFS is a good idea, but I don't think they are sufficient.

I think we need tests where we have LFS installed and use actual LFS. That ideally also means running an LFS server on localhost.

@feinstein
Copy link

I've kept thinking I would get back to this to write the Windows test (I believe that's the only thing outstanding) but as we're no longer using LFS for this at work and I don't have a Windows environment ready to go I haven't had the time to prioritize it. If somebody that regularly uses Windows would like to contribute the test that would be wonderful.

What tests are missing? I might take a look at this in my spare time (weekends)

@sigurdm sigurdm mentioned this pull request Jan 18, 2022
@jonasfj
Copy link
Member

jonasfj commented Jan 18, 2022

What tests are missing? I might take a look at this in my spare time (weekends)

Ideally tests that actually use git LFS and not just mock it out, ideally with an minimal LFS server running on localhost.

Or an explanation why this is too complicated to do. If tests become extremely slow, or we need lots of dependencies, then maybe we don't want to do this. Or maybe we just want a script we can run locally for testing on occasion when a bug is reported.

var args = [
'clone',
if (reference != null) ['--reference', reference];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the only thing that is necessary?


If git-lfs is globally installed with git lfs install, then git clone just works.. but git clone path/to/canonical/cache path/to/revision/cache doesn't work... but if instead we do:

git clone --reference path/to/canonical/cache https://original-repository-url.com/repo path/to/revision/cache then git-lfs just works.

It's perhaps not the fastest thing, but if we do it this way we don't need any special handling for git lfs in pub (it just works). And if we don't need special handling for git lfs, then it's more reasonable to argue why we don't need test cases for git lfs.


Maybe, we can make a warning if we see an error message indicating a git lfs repository, but git lfs isn't installed globally.


It's also possible that it's much better to build special logic to install git lfs in the specific repository and do git lfs pull. I'm just saying I think the minimal changes might be easier to land.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jerel thoughts? ^

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonasfj if I remember right the current approach of running a git lfs pull was the most dependable at fetching updates and/or version changes after the initial fetch attempt. But it would be worth doing some manual testing to see if modifying the regular git clone command to one that also supports LFS would work in all cases as that could be an easier implementation and lower maintenance going forward.

Test case 1: what happens if the pubspec is set to:

git:
  url: ../some-lib
  ref: feature-branch

where feature-branch contains LFS tracked blob files. The some-lib developer pushes an update to feature-branch and the consuming application wants to update to the latest commit. Does pub get successfully pull the latest commit and binary files?

Test case 2: using the pubspec above and a large binary file, break the network after the repo has been fetched but while the file is being downloaded. Does the binary get [re]fetched next time pub get is ran or does git clone exit 0 without retrying the binary download?


If the implementation needs a local LFS server there is one that's part of the LFS project (I haven't used it) https://github.com/git-lfs/lfs-test-server but it doesn't look like there's recent binaries built so it would probably involve building and uploading artifacts managed by this project.

@mit-mit
Copy link
Member

mit-mit commented Jul 18, 2022

Hi @jerel, is this still being worked on?

@CarGuo
Copy link

CarGuo commented Sep 9, 2022

Hi! Is there any new progress in this issue?

@jonasfj
Copy link
Member

jonasfj commented Sep 12, 2022

@CarGuo there is a discussion here #2782 (comment) about what is actually necessary.

This probably needs further investigation. If you or someone else is interested in working out the minimal changes to our git commands in order for LFS to work, that would be great.
We'd also need to find out how

  • network failures, retries, etc, work out...
  • what if cloning fails half-way through, or LFS cloning fails half-way through, then can the user recover?
  • how will changing the git commands we use affect users who have existing PUB_CACHE folders, and switch between old and new Dart SDKs work, will they have issues?
    • we need to find out how to gracefully avoid that or
    • ensure that they can gracefully recover.

@mosuem
Copy link
Member

mosuem commented Mar 12, 2024

Closing this as it is stale and needs more background work. Feel free to reopen in case you still want to land it!

@mosuem mosuem closed this Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support git-lfs files in the lib directory
9 participants