merge-ort: add code to check for whether cached renames can be reused
We need to know when renames detected in a previous merge operation can
be reused in a later merge operation. Consider the following setup
(from the git-rebase manpage):
          A---B---C topic
         /
    D---E---F---G master
After rebasing, this will appear as:
                  A'--B'--C' topic
                 /
    D---E---F---G master
Further, let's say that 'oldfile' was renamed to 'newfile' between E
and G. The rebase or cherry-pick of A onto G will involve a three-way
merge between E (as the merge base) and G and A. After detecting the
rename between E:oldfile and G:newfile, there will be a three-way
content merge of the following:
    E:oldfile
    G:newfile
    A:oldfile
which will produce a new result:
    A':newfile
Now, when we want to pick B onto A', we will need to do a three-way
merge between A (as the merge-base) and A' and B. This will involve
a three-way content merge of
    A:oldfile
    A':newfile
    B:oldfile
but only if we can detect that A:oldfile is similar enough to A':newfile
to be used together in a three-way content merge, i.e. only if we can
detect that A:oldfile and A':newfile are a rename. But we already know
that A:oldfile and A':newfile are similar enough to be used in a
three-way content merge, because that is precisely where A':newfile came
from in the previous merge.
Note that A & A' both appear in both merges. That gives us the
condition under which we can reuse renames.
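
As a rough illustration of that condition (a sketch only, not the code this
commit adds to merge-ort), the check can be modeled in Python. The names
`Merge`, `can_reuse_renames`, and `rebase` are hypothetical, and commits are
represented by plain strings rather than tree object IDs:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Merge:
    """One three-way merge: base, two sides, and the resulting commit."""
    base: str
    side1: str
    side2: str
    result: str


def can_reuse_renames(prev: Optional[Merge], base: str, side1: str) -> bool:
    """Renames cached from `prev` are reusable when the new merge base is
    the side that was picked last time (A) and the new first side is the
    result of picking it (A')."""
    if prev is None:
        return False  # nothing cached, e.g. first pick or a resumed rebase
    return base == prev.side2 and side1 == prev.result


def rebase(onto: str, fork_point: str, commits: List[str]) -> List[Tuple[Merge, bool]]:
    """Replay `commits` onto `onto`, recording whether each step could
    reuse the renames cached by the previous step."""
    steps = []
    prev = None
    base, tip = fork_point, onto
    for commit in commits:
        reused = can_reuse_renames(prev, base, tip)
        prev = Merge(base=base, side1=tip, side2=commit, result=commit + "'")
        steps.append((prev, reused))
        base, tip = commit, prev.result
    return steps


steps = rebase(onto="G", fork_point="E", commits=["A", "B", "C"])
for merge, reused in steps:
    print(merge, "reuse cached renames:", reused)
```

In this model the pick of A cannot reuse anything, while the picks of B and
C can, because each one uses the previously picked commit as its merge base
and the previous result as its first side. Passing `prev=None` corresponds
to the case where no cache exists, such as when a halted rebase is resumed.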
There are a couple of important points about this optimization:
- If the rebase or cherry-pick halts for user conflicts, these caches
are NOT saved anywhere. Thus, resuming a halted rebase or
cherry-pick will result in no reused renames for the next commit.
This is intentional, as user resolution can change files
significantly and in ways that violate the similarity assumptions
here.
- Technically, this might give different results for rename detection.
In particular, looking at the first merge above for oldfile and
newfile, if both A:oldfile and G:newfile were each 40% different
from E:oldfile, and different from each other, and they still merged
cleanly, then A':newfile could be more than 50% different from
A:oldfile. This would mean that traditionally the next step of the
rebase operation, moving B to B', would not detect the rename
between A:oldfile and A':newfile. This most likely would have
resulted in a modify/delete conflict and the rebase operation making
the user resolve the problem. With this optimization, the rename
would be detected and the operation would continue, but the odds of
successfully merging such different files seem somewhat low, so it
seems likely that it would also halt and tell the user to resolve
the conflict. The odds of this happening in practice are small, and
even if it did occur, there's a pedantic question about whether
this would be considered a regression or a bugfix. I am not sure
which it would be considered, but given that rename detection has
always been a matter of heuristics, and that the original rules
started with a weird dipole with funny corner cases:
    - files with the exact same name are considered the same file,
      no matter how dissimilar the content
    - any other two files are considered the same file only if they
      are the best possible content match (highest similarity)
      regardless of filename similarity
I don't see this optimization and minor change in behavior as out of
place. Much like the above two cases, even if a corner case is hit,
it is at least fairly easy for people to understand why the
algorithm did what it did -- somewhere in the sequence those files
were related. Much like the technical difference in rename detection
provided by basename-guided matching, this theoretical difference in
results is totally worth the potential time savings.
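
The arithmetic behind the second point can be checked with a toy example.
The sketch below is an assumption-laden illustration: `similarity` is a
hypothetical line-based score (shared lines divided by the size of the
larger file), not git's actual rename-scoring code, and the merged file
`A_prime` is constructed by hand rather than by a real merge:

```python
def similarity(src, dst):
    """Toy score: lines present in both files, relative to the larger file."""
    shared = len(set(src) & set(dst))
    return shared / max(len(src), len(dst))


# E:oldfile has ten distinct lines.
E = [f"line {i}" for i in range(10)]

# A:oldfile deletes the first four lines (40% different from E).
A = E[4:]

# G:newfile rewrites the last four lines (40% different from E).
G = E[:6] + [f"rewritten {i}" for i in range(6, 10)]

# A':newfile, built by hand to contain both sets of changes.
A_prime = E[4:6] + G[6:]

print(f"E vs A : {similarity(E, A):.0%} similar")
print(f"E vs G : {similarity(E, G):.0%} similar")
print(f"A vs A': {similarity(A, A_prime):.0%} similar")
```

Under this toy score each side stays 60% similar to E, yet A and A' share
only two of six lines, which falls below a 50% threshold. That is the
situation in which a fresh rename detection for the next pick would miss
the pairing that the cached rename still provides.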
Signed-off-by: Elijah Newren <[email protected]>