
PoC: Use gitoxide for rev walk #2269


Closed


cruessler
Collaborator

@cruessler cruessler commented Jun 16, 2024

This is a PoC, mainly intended to explore two things:

  1. how easy/difficult it is to switch git2 for gix for a single use case,
  2. how both approaches compare with respect to performance.

Findings

This version uses repo.rev_walk(). We could also try implementing the existing algorithm using gix primitives, but I think it makes sense to optimize repo.rev_walk() so that every consumer benefits from improvements.

Using gix instead of git2 seems fairly straightforward. A few changes are necessary, but none of them is major. The changes can also mostly be contained in LogWalker, in particular without the need to change LogWalker::read’s API. (LogWalker::new needs to be changed slightly, but this does not look like an issue.)

Performance-wise, it seems as if the gix implementation is slower than the git2 implementation by about 30 %. I tested how long it took to open the app in my copy of the Linux kernel until the loading indicator stopped spinning, indicating that the full list of commit ids had been loaded. My copy of the Linux kernel contains 1_014_089 commits.

gix with use_commit_graph(true): about 22 s
gix with use_commit_graph(false): about 22 s
git2 (at commit 038c4a5): about 17 s

Keep in mind that these are very rough numbers. It’s also possible that the gix API can be used in a way that is faster.

gix with features = ["max-performance"]

When using features = ["max-performance"], gix is about 25 % faster on my machine than the implementation based on git2.

gix with use_commit_graph(false): about 13 s
gix with use_commit_graph(true): about 13 s
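
The percentages quoted above follow directly from the measured wall-clock times. As a rough sketch (the function name is hypothetical, and the inputs are the very rough timings reported in this post), the relative difference against the git2 baseline of 17 s can be computed like this:

```rust
/// Relative difference of `candidate_secs` versus `baseline_secs`, in percent.
/// Positive means the candidate took longer (slower), negative means faster.
fn relative_difference_percent(candidate_secs: f64, baseline_secs: f64) -> f64 {
    (candidate_secs - baseline_secs) / baseline_secs * 100.0
}

fn main() {
    // Timings taken from this post, in seconds.
    let git2_default = 17.0;
    let gix_default = 22.0;
    let gix_max_performance = 13.0;

    // (22 - 17) / 17 is roughly +29.4 %, i.e. "about 30 %" slower.
    println!(
        "gix (default features) vs git2: {:+.1} %",
        relative_difference_percent(gix_default, git2_default)
    );
    // (13 - 17) / 17 is roughly -23.5 %, i.e. "about 25 %" faster.
    println!(
        "gix (max-performance) vs git2: {:+.1} %",
        relative_difference_percent(gix_max_performance, git2_default)
    );
}
```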

git2 without hash verification or caching

I added the following lines to repo in asyncgit::sync::repository, right before Repository::open_ext, and git2 became even faster than gix with max-performance.

enable_caching(false);
strict_hash_verification(false);

git2 without hash verification or caching: about 10 s

Edit 2024-06-16: I also got flamegraphs for both implementations that I could share.
Edit 2024-06-17: I added numbers for features = ["max-performance"]. I’ll later also add a flamegraph and will test use_commit_graph(true).
Edit 2024-06-18: I added numbers for git2 without hash verification or caching.

@cruessler cruessler changed the title from “Use gitoxide for rev walk” to “PoC: Use gitoxide for rev walk” on Jun 16, 2024
@cruessler
Collaborator Author

cruessler commented Jun 17, 2024

Flamegraph using gix

flamegraph

Flamegraph using gix with features = ["max-performance"]

flamegraph

Flamegraph using git2

flamegraph

Flamegraph using git2 without hash verification or caching

flamegraph

@Byron

Byron commented Jun 17, 2024

That's all very exciting, thanks for getting started with gix in gitui :)!

I think it's worth noting that gix does not validate the objects it reads right now, which makes it less safe than git2, but as safe as Git (as far as I could tell).

This git2 behaviour can be deactivated with strict_hash_verification, and maybe more performance can be obtained by disabling object caching.

@extrawurst
Collaborator

> This git2 behaviour can be deactivated with strict_hash_verification, and maybe more performance can be obtained by disabling object caching.

@cruessler did you ever benchmark this?

@cruessler
Collaborator Author

@extrawurst Yes, I added the numbers to the first post at the time, under “git2 without hash verification or caching”. It was even faster than gix with max-performance.

@cruessler cruessler closed this Mar 7, 2025
@Byron

Byron commented Mar 16, 2025

That is interesting!

A note on “gix with use_commit_graph(true): about 13 s”: this would be about 10 times faster if a commit-graph had actually been used. If one is present, it is also used by default (unless disabled in the git config). Was it created with git commit-graph write --reachable?

Something that seems strange here is that the numbers don't seem to match my own.

For instance, a simple commit traversal (that cannot use the commit-graph cache) can be done (with a hot FS-cache) at 138k commits/s on an M1 Pro.

❯ ein t hours
 09:50:50 traverse commit graph done 1.3M commits in 9.46s (138.4K commits/s)
 09:50:50        estimate-hours Extracted and organized data from 1309152 commits in 11.885291ms (110148920 commits/s)
total hours: 1243723.88
total 8h days: 155465.48
total commits = 1309152
total authors: 34641
total unique authors: 26510 (23.47% duplication)

linux ( master) +798 -408 [!] took 9s
❯ git rev-parse @
87d6aab2389e5ce0197d8257d5f8ee965a67c4cd

That code uses the Simple iteration directly, whereas here it is used through abstractions. I'd hope that these don't cause such a slowdown.
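
To put the 138.4K commits/s figure next to the timings from the first post, those timings can be converted to throughput. This is only a rough sketch: the helper name is hypothetical, and the input times were measured in the GUI until the loading indicator stopped, so they include more than the bare traversal and were taken on a different machine.

```rust
/// Average throughput for `commits` processed in `seconds`.
fn commits_per_second(commits: u64, seconds: f64) -> f64 {
    commits as f64 / seconds
}

fn main() {
    // Commit count and timings as reported in the first post.
    const COMMITS: u64 = 1_014_089;
    let runs = [
        ("gix (default features)", 22.0),
        ("git2", 17.0),
        ("gix (max-performance)", 13.0),
        ("git2 (no hash verification or caching)", 10.0),
    ];

    for (label, seconds) in runs {
        let throughput = commits_per_second(COMMITS, seconds);
        println!("{}: {:.1}K commits/s", label, throughput / 1000.0);
    }
}
```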

It would certainly be interesting, @cruessler, to see what ein t hours says on your machine. For a perfect comparison, one would certainly want to write a non-GUI program that does the traversal to be sure the right thing is measured.
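
A non-GUI measurement could be structured roughly as follows. This is a minimal sketch of the timing part only, using nothing but the standard library: `measure_traversal` and `Measurement` are hypothetical names, and the range passed in `main` is a stand-in for a real gix or git2 commit walk, which is not shown here.

```rust
use std::time::{Duration, Instant};

/// Result of timing one traversal.
struct Measurement {
    commits: usize,
    elapsed: Duration,
}

impl Measurement {
    /// Average throughput over the whole traversal.
    fn commits_per_second(&self) -> f64 {
        self.commits as f64 / self.elapsed.as_secs_f64()
    }
}

/// Drains `walk`, counting the items it yields, and records the wall-clock time.
/// Any iterator works, so the same harness can time different traversal backends.
fn measure_traversal<I: Iterator>(walk: I) -> Measurement {
    let start = Instant::now();
    let commits = walk.count();
    Measurement {
        commits,
        elapsed: start.elapsed(),
    }
}

fn main() {
    // Stand-in for a real commit walk (assumption: replace with the iterator
    // returned by the library under test).
    let fake_walk = 0..1_000_000u32;

    let m = measure_traversal(fake_walk);
    println!(
        "{} commits in {:?} ({:.1}K commits/s)",
        m.commits,
        m.elapsed,
        m.commits_per_second() / 1000.0
    );
}
```

Keeping the harness generic over the iterator means both libraries go through the same measurement code, so differences in the result come from the traversal itself rather than from the surrounding application.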

Damaged pride aside 😅, I am glad that git2 is this awesome, and that I could help.

@cruessler
Collaborator Author

To add a bit more context: I’ve created https://github.com/cruessler/gix-benchmarks to compare the history traversal speed of gix and git2 more thoroughly. It seems that gix is significantly faster, in particular in the Linux kernel. (I hope that I didn’t make a mistake in the benchmark code. 😄)

Also: I did not know about git commit-graph write --reachable at the time, and the numbers most certainly reflect that. 😄

The benchmark sets strict_object_creation(false) and strict_hash_verification(false): https://github.com/cruessler/gix-benchmarks/blob/73711297dbad890da146f113c3d9f7e92f0afac7/src/main.rs#L63-L65.

@extrawurst
Collaborator

This is very promising!
