-
-
Notifications
You must be signed in to change notification settings - Fork 354
Add rename tracking to blame #2022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add rename tracking to blame #2022
Conversation
838da39
to
a55f9bd
Compare
@Byron I’ve found and fixed a bug in the initial implementation and marked the commit as ready for review again. I had initially found the bug running |
First of all, apologies for the late response. Maybe there should be a "later" tag to indicate I am delaying the review to when I think I have enough time.
These are very existing indeed! Do you think the code for producing them should live somewhere near
I can only imagine how hard it must be to deduce the source of a bug or of inconsistencies with Git by merely comparing the outcome of Similar extensions I could imagine to add to In any case, I hope to get to give the PR a first look soon. |
No worries, that’s absolutely fine (though a tag would indeed be very helpful for planning)!
Currently, the code can be found in a separate repository: https://github.com/cruessler/gix-scripts (plus a couple of local changes that I haven’t pushed yet). I’m very open to moving it somewhere in
That’s an interesting idea! Do you mean not using
That also seems like a good idea! Is there anything similar in |
I’ve pushed a couple of commits to https://github.com/cruessler/gix-scripts so that it now also counts matching and non-matching lines and the percentage of matching lines. (This is now the version I used to calculate the numbers in the description.) |
Since this is still quite rare, I think I will go with posting a message to indicate such plans. Let's also say that I want to finish the review this week(end) :)!
Indeed, that's what I thought. Now, it's very dangerous to break isolation like that, but I'd think it can also be done well enough to avoid running in an unsuitable parent repository. Alternatively, we could find good ways to clone the whole repository on demand - in theory, a simple read-only sell script would do as part of the existing fixtures. That would be efficient as well except for on CI, and maybe there could even be a special case for that to reuse full clones of the parent repository if available. One can certainly get creative here.
This sounds more like stress-testing or continuous exploration of possible bugs, a bit like fuzzing. The system would have to be able to spend compute and collect data for later analysis. That also does sound like
Yes, exactly that. |
Works for me! :-)
I’m going to try and can come up with something. :-)
Then I’m going to prepare for a chase and start experimenting in |
For the record: unfortunately, it seems quite hard to come up with a good test case for the issue fixed in this commit: Get current file_path from unblamed hunk. (I found the issue running |
In order to reproduce something like that one would have to have a way of With that one could probably build yet another tool to turn this information into a shell script which, when executed, reproduces the exact, non-linear history that is needed to perform a blame on a certain file. This is a project on its own but it would be super-valuable for Something like |
I am doing an incremental review now, and you are welcome to chime in and look at intermediate results. |
d2942cb
to
c4abf16
Compare
c4abf16
to
3e5365c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, that should be it!
Please take a look at the refactor to see if that works for you, or maybe I missed something.
As a major change, I wired rewrites
up to options so it can be configured, which seems more adequate for a plumbing crate. It's less about disabling rewrite tracking and more about adjusting its values and possibly enable copy tracking as well.
Edit: tomorrow night I'd merge this if I don't hear from you.
pretty_assertion::assert_equal
gix-blame
This PR adds rename tracking to
gix-blame
. Since the numbers below seem to be very promising, I’m going to mark this PR as ready!There’s probably ways to make some of the code more efficient, but I think those would be micro-optimizations at this point. I compared this branch and
main
runninggix blame CREDITS
in the Linux kernel and if there is a difference, it is hard to distinguish from noise.2025-05-14 Some numbers
I just ran some benchmarks on how the resulting blames compare to
git
. I ran the comparison on my copy of thegitoxide
repository which, at the time of the benchmark, contained 2040 files in total tracked bygit
.HEAD
was at005e8fec0f1665afc42bda632f140d381eca9fe8
when I first ran the comparison. When I re-ran it on 2025-05-28,HEAD
was at5950c5886941ff3adf326985e2f0e6731e07f2a5
.main
Of 2040 files, 493 had identical blames while 1547 had blames with at least one line differing from
git blame
.In the files that did not fully match
git blame
, 178,068/320,282 lines (55.60 %) matchedgit blame
.In all files, 229,618/371,832 lines (61.75 %) matched
git blame
.Update 2025-05-28: In all files, 229,648/371,861 (61.76 %) matched
git blame
.This branch
Of 2040 files, 1023 had identical blames while 1017 had blames with at least one line differing from
git blame
.In the files that did not fully match
git blame
, 266,382/293,968 lines (90.62 %) matchedgit blame
.In all files, 343,595/371,181 lines (92.57 %) matched
git blame
.Update 2025-05-28: Investigating the difference mentioned below, I found a bug in the initial implementation. After fixing it, about 600 more lines got attributed to the same commit
git
attributed them to. In all files, 344,215/371,861 (92.57 %) matchedgit blame
.This means that this PR attributes about 110,000 lines more to the same commits as
git blame
, compared to currentgitoxide
main
.I think I’m quite pleased with these numbers. 😄
There is a difference in total line numbers between both measurements that I’ll investigate, though I don’t think it changes the general picture. Update 2025-05-28: the difference pointed to a bug that I’ve fixed in a separate commit.