Skip to content

git: use dulwich for .gitignore #4809

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 29, 2020
Merged

git: use dulwich for .gitignore #4809

merged 1 commit into from
Oct 29, 2020

Conversation

efiop
Copy link
Contributor

@efiop efiop commented Oct 29, 2020

In my tests Git._ignore is taking around ~65seconds for the whole test
suit(~50K invocations). The reason is that gitpython is launching a real
git check-ignore process, that takes time and resources.

Switching to pygit2 or dulwich makes it drop to ~5seconds, saving us
almost a minute of the test run time and a lot of resources by not
launching any processes.

I've chosen dulwich over pygit2 here because it is much more
pythonic and the performance is on-par with pygit2.

Related to #2215

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

In my tests `Git._ignore` is taking around ~65seconds for the whole test
suit(~50K invocations). The reason is that `gitpython` is launching a real
`git check-ignore` process, that takes time and resources.

Switching to `pygit2` or `dulwich` makes it drop to ~5seconds, saving us
almost a minute of the test run time and a lot of resources by not
launching any processes. I've chosen `dulwich` over `pygit2` here
because it is much more pythonic and as long as the performance is
on-par with `pygit2` it is better for us to use `dulwich`.
@efiop efiop added testing Related to the tests and the testing infrastructure performance improvement over resource / time consuming tasks labels Oct 29, 2020
@@ -50,6 +50,7 @@ def run(self):
"colorama>=0.3.9",
"configobj>=5.0.6",
"gitpython>3",
"dulwich>=0.20.6",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only problem is that both dulwich and pygit don't have wheels for 3.9, but that is solvable.

return True
except GitCommandError:
return False
repo = Repo(self.root_dir)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasteful to have a repo instance created each time here, but doing so for drop-in simplicity. Likely dulwich will replace everything except Git.clone() in the near future and so Repo will move to __init__.

@efiop efiop changed the title [WIP] git: use dulwich for .gitignore git: use dulwich for .gitignore Oct 29, 2020
@efiop
Copy link
Contributor Author

efiop commented Oct 29, 2020

Indeed, looks like the tests on windows are more stable now. Will be continuing the migration for other methods in the further PRs.

EDIT: still flaky https://github.com/iterative/dvc/runs/1329409665?check_suite_focus=true

@efiop efiop merged commit 32fa8e3 into iterative:master Oct 29, 2020
@efiop efiop deleted the libgit branch October 29, 2020 23:47
victorskl added a commit to umccr/umccrise that referenced this pull request Nov 26, 2020
* Latest dvc 1.10 re-use dulwich for .gitignore
  iterative/dvc#4809
* Regression: reintroduce issue reported in
  iterative/dvc#2010
* Resolution: pin dvc version to 1.9.1 for now
* Build image is docker:dind Alpine Linux
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance improvement over resource / time consuming tasks testing Related to the tests and the testing infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant