For git dependencies attempt to run `git lfs pull` to fetch binaries, if any. #2782
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with `@googlebot I signed it!`.

What to do if you already signed the CLA: Individual signers · Corporate signers.

ℹ️ Googlers: Go here for more info.
@googlebot I signed it!
Given how repositories are cached, I can see an argument against encouraging large git dependencies.
To make sure I understand... do you mean that there's an argument to be made against (1) git dependencies that pull in large binaries via LFS, or (2) git dependencies that have large binaries checked directly into their git history?
I think he meant (2)! pub has a bare checkout of the package in the cache, and for each ref it has another full clone.
After reworking this a little, the following scenarios are covered:

This is now up to date.
I wonder if there is a way to detect if a repo uses LFS. I guess it has to be detected from the `.gitattributes` file. @jerel wdyt?

This also needs tests.
@sigurdm that seems like a nice addition. How about this, where we return early if there isn't a `.gitattributes` file, or if it exists but doesn't contain `=lfs`:

```dart
Future _pullLFS(String repoPath) async {
  var attributesPath = '$repoPath/.gitattributes';
  // If there is no attributes file there won't be any LFS binaries.
  if (!fileExists(attributesPath)) return;
  if (!readTextFile(attributesPath).toLowerCase().contains('=lfs')) return;
  try {
    // To avoid silent failure, initialize the git lfs hooks in the cloned
    // repository in case the user has not globally run `git lfs install`.
    await git.run(['lfs', 'install', '--local'], workingDir: repoPath);
  } on git.GitException catch (e) {
    if (e.message.contains('not a git command')) {
      log.error('git lfs not found');
      return;
    } else {
      rethrow;
    }
  }
  return git
      .run(['lfs', 'pull'], workingDir: repoPath).then((result) => null);
}
```
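For readers less familiar with LFS, the early-return check can be sketched in shell (the temp directory and sample pattern here are made up): a repo that tracks files with LFS declares `filter=lfs` attributes in `.gitattributes`, so if that file is absent there is nothing for `git lfs pull` to do.

```shell
# Create a throwaway directory standing in for a cloned repo (hypothetical).
repo=$(mktemp -d)
# A typical LFS tracking rule, as written by `git lfs track "*.bin"`.
printf '*.bin filter=lfs diff=lfs merge=lfs -text\n' > "$repo/.gitattributes"

# Same logic as the early returns above: no attributes file, or no "=lfs"
# marker in it, means there are no LFS-managed binaries to pull.
if [ -f "$repo/.gitattributes" ] && grep -qi '=lfs' "$repo/.gitattributes"; then
  echo "repo uses LFS"
else
  echo "no LFS attributes found"
fi
```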
I have to admit that I might have to read a beginner's guide on git-lfs. But I can see how this can be useful; it would certainly make git dependencies more useful.
I believe with this line: https://github.com/dart-lang/pub/pull/2782/files#diff-1639c4669c428c26e68cfebd5039a33f87ba568795f2c058c303ca8528f62b77R580 the negative consequences will be minimal. This will only be invoked when the repo uses LFS.
Okay, I agree with @sigurdm, this is unlikely to affect anyone not using git lfs, so what is the harm in shipping it :D
```dart
  _writePackageList(revisionCachePath, [id.description['path']]);
} else {
  await _pullLFS(revisionCachePath);
```
Why do we need this? If `entryExists(revisionCachePath)`, then can't we assume `_pullLFS` has already completed?

I agree that for existing clones this won't be the case. So if someone cloned a repository using LFS before LFS support was added to `dart pub get`, then it won't work.

A solution for this problem could be to simply run `dart pub cache repair`.
```dart
await _clone(_repoCachePath(ref), revisionCachePath);
await _clone(ref.description['url'], revisionCachePath,
    reference: _repoCachePath(ref));
await _checkOut(revisionCachePath, id.description['resolved-ref']);
await _pullLFS(revisionCachePath);
_writePackageList(revisionCachePath, [id.description['path']]);
```
Should we consider deleting the folder if one of these steps fails?

Before, it wasn't really a problem, because `_clone` and `_checkOut` didn't do any network I/O, so the odds that one of these commands would fail were probably very small.

But if `_pullLFS` fails, then what? How does the user recover? Won't it leave the directory in place with partially pulled LFS objects?
I don't think a folder deletion would be desirable because of the potential size of LFS objects. A likely scenario is:

1. `git clone` succeeds and `revisionCachePath` is created with the lightweight contents of the repo
2. `git lfs pull` starts on the heavy downloads and pulls down 1 GB of a 1.1 GB binary object
3. The network breaks and an error is output to the console
4. The developer re-runs `pub get`
5. The `if` checks the existence of the cloned repo and drops to the `else` clause
6. Downloading resumes
@jerel, I think it's looking really good. Maybe we should test what happens if
I'm just running into this exact problem.
This would be very useful to me if it gets merged; otherwise I'm soon going to need to use an LFS-enabled submodule.
I've kept thinking I would get back to this to write the Windows test (I believe that's the only thing outstanding) but as we're no longer using LFS for this at work and I don't have a Windows environment ready to go, I haven't had the time to prioritize it. If somebody that regularly uses Windows would like to contribute the test that would be wonderful.
```dart
const lfs = '''
lfs*)
  echo "git: 'lfs' is not a git command. See 'git --help'."
  exit 1
  ;;
''';
```

```dart
await d.dir('bin', [d.file('git', fakeGit(hash, lfs))]).create();
final binFolder = p.join(sandbox, 'bin');
// Make the fake git script executable.
if (!Platform.isWindows) {
  await runProcess('chmod', ['+x', p.join(sandbox, 'bin', 'git')]);
}
```
Might it not be wiser to make a set of tests that are skipped when:

- (A) git LFS is not installed, and
- (B) we are not running in CI (env var `CI=true`).

Then change the GitHub Actions environment to install git lfs, so that the tests will pass.
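A minimal sketch of that skip condition in shell, assuming the `CI=true` convention from GitHub Actions (the variable name and message are made up for illustration):

```shell
# Run the LFS integration tests only when either git-lfs is installed
# locally, or we are in CI (where the workflow is expected to install it).
if command -v git-lfs >/dev/null 2>&1 || [ "${CI:-}" = "true" ]; then
  run_lfs_tests=true
else
  run_lfs_tests=false
  echo "skipping LFS tests: git-lfs not installed and not running in CI"
fi
echo "run_lfs_tests=$run_lfs_tests"
```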
Let me clarify: I think these test cases where we mock LFS are a good idea, but I don't think they are sufficient.

I think we need tests where we have LFS installed and use actual LFS. That ideally also means running an LFS server on localhost.

What tests are missing? I might take a look at this in my spare time (weekends)

Ideally, tests that actually use git LFS and not just mock it out, ideally with a minimal LFS server running on localhost. Or an explanation of why this is too complicated to do. If tests become extremely slow, or we need lots of dependencies, then maybe we don't want to do this. Or maybe we just want a script we can run locally for testing on occasion when a bug is reported.
```dart
var args = [
  'clone',
  if (reference != null) ...['--reference', reference],
```
Isn't this the only thing that is necessary?

If `git-lfs` is globally installed with `git lfs install`, then `git clone` just works... but `git clone path/to/canonical/cache path/to/revision/cache` doesn't work... but if instead we do:

```
git clone --reference path/to/canonical/cache https://original-repository-url.com/repo path/to/revision/cache
```

then git-lfs just works.

It's perhaps not the fastest thing, but if we do it this way we don't need any special handling for `git lfs` in `pub` (it just works). And if we don't need special handling for `git lfs`, then it's more reasonable to argue why we don't need test cases for `git lfs`.

Maybe we can emit a warning if we see an error message indicating a `git lfs` repository when `git lfs` isn't installed globally.

It's also possible that it's much better to build special logic to install git lfs in the specific repository and do `git lfs pull`. I'm just saying I think the minimal changes might be easier to land.
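A hypothetical local sketch of the `--reference` mechanism (all paths made up, and plain git standing in for the LFS case): the revision clone borrows objects from the cache via `.git/objects/info/alternates`, while `origin` keeps pointing at the original repository, which is what lets git-lfs find its endpoint.

```shell
set -e
work=$(mktemp -d)

# Stand-in for the original repository URL.
git init -q "$work/origin"
git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m 'initial commit'

# Stand-in for pub's bare canonical cache.
git clone -q --bare "$work/origin" "$work/cache"

# Clone from the original "URL", borrowing objects from the cache.
git clone -q --reference "$work/cache" "$work/origin" "$work/revision"

# --reference records the cache as an alternate object store...
test -f "$work/revision/.git/objects/info/alternates" && echo "alternates set"
# ...while the remote still points at the original repository.
git -C "$work/revision" remote get-url origin
```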
@jerel thoughts? ^
@jonasfj if I remember right, the current approach of running `git lfs pull` was the most dependable at fetching updates and/or version changes after the initial fetch attempt. But it would be worth doing some manual testing to see if modifying the regular `git clone` command to one that also supports LFS would work in all cases, as that could be an easier implementation and lower maintenance going forward.
Test case 1: what happens if the pubspec is set to:

```yaml
git:
  url: ../some-lib
  ref: feature-branch
```

where `feature-branch` contains LFS-tracked blob files. The `some-lib` developer pushes an update to `feature-branch` and the consuming application wants to update to the latest commit. Does `pub get` successfully pull the latest commit and binary files?

Test case 2: using the pubspec above and a large binary file, break the network after the repo has been fetched but while the file is being downloaded. Does the binary get [re]fetched the next time `pub get` is run, or does `git clone` `exit 0` without retrying the binary download?
If the implementation needs a local LFS server, there is one that's part of the LFS project (I haven't used it): https://github.com/git-lfs/lfs-test-server. But it doesn't look like there are recent binaries built, so it would probably involve building and uploading artifacts managed by this project.
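Setting LFS itself aside, the update flow in test case 1 can be exercised with plain git (the repo names, branch, and commit messages here are invented for illustration; with real LFS, the pull step is where the new binary content would also have to be fetched):

```shell
set -e
work=$(mktemp -d)

# The dependency: some-lib with a feature-branch.
git init -q "$work/some-lib"
git -C "$work/some-lib" -c user.name=d -c user.email=d@example.com \
  commit -q --allow-empty -m 'initial commit'
git -C "$work/some-lib" checkout -q -b feature-branch

# The consuming application clones the branch, as pub does for a git dep.
git clone -q -b feature-branch "$work/some-lib" "$work/app-dep"

# Upstream pushes an update to feature-branch...
git -C "$work/some-lib" -c user.name=d -c user.email=d@example.com \
  commit -q --allow-empty -m 'update tracked blob'

# ...and the consumer fetches it; with LFS, `git lfs pull` would also
# have to run here to fetch the updated binary.
git -C "$work/app-dep" pull -q
git -C "$work/app-dep" rev-list --count HEAD   # both commits now present
```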
Hi @jerel, is this still being worked on?

Hi! Is there any new progress on this issue?
@CarGuo there is a discussion here #2782 (comment) about what is actually necessary. This probably needs further investigation. If you or someone else is interested in working out the minimal changes to our git commands in order for LFS to work, that would be great.
Closing this as it is stale and needs more background work. Feel free to reopen in case you still want to land it!
Fixes #1433
As described in the linked issue, the way that `pub` does caching of git dependencies breaks the automatic pulling of LFS-managed binaries when `git pull` is run. The bare repository step in `~/.pub-cache/git/cache` causes the remote LFS origin to be missing when the working directory is checked out to `~/.pub-cache/git/<package-sha>`, so any binaries in the dependency are not fetched.

I'd welcome a conversation about whether this patch, or one like it, could be accepted, or if there's another plan to successfully manage large pre-compiled binaries in a private dependency.