Skip to content

Commit e123d13

Browse files
committed
merge-ort: add code to check for whether cached renames can be reused
We need to know when renames detected in a previous merge operation can be reused in a later merge operation. Consider the following setup (from the git-rebase manpage): A---B---C topic / D---E---F---G master After rebasing, this will appear as: A'--B'--C' topic / D---E---F---G master Further, let's say that 'oldfile' was renamed to 'newfile' between E and G. The rebase or cherry-pick of A onto G will involve a three-way merge between E (as the merge base) and G and A. After detecting the rename between E:oldfile and G:newfile, there will be a three-way content merge of the following: E:oldfile G:newfile A:oldfile and produce a new result: A':newfile Now, when we want to pick B onto A', we will need to do a three-way merge between A (as the merge-base) and A' and B. This will involve a three-way content merge of A:oldfile A':newfile B:oldfile but only if we can detect that A:oldfile is similar enough to A':newfile to be used together in a three-way content merge, i.e. only if we can detect that A:oldfile and A':newfile are a rename. But we already know that A:oldfile and A':newfile are similar enough to be used in a three-way content merge, because that is precisely where A':newfile came from in the previous merge. Note that A & A' both appear in both merges. That gives us the condition under which we can reuse renames. There are a couple important points about this optimization: - If the rebase or cherry-pick halts for user conflicts, these caches are NOT saved anywhere. Thus, resuming a halted rebase or cherry-pick will result in no reused renames for the next commit. This is intentional, as user resolution can change files significantly and in ways that violate the similarity assumptions here. - Technically, this might give different results for rename detection. In particular, looking at the first merge above for oldfile and newfile, if both A:oldfile and G:newfile were each 40% different from E:oldfile, and different from each other, and they still merged cleanly, then A':newfile could be more than 50% different from A:oldfile. This would mean that traditionally the next step of the rebase operation, moving B to B', would not detect the rename between A:oldfile and A':newfile. This most likely would have resulted in a modify/delete conflict and the rebase operation making the user resolve the problem. With this optimization, the rename would be detected and the operation would continue...but the odds of the successful merging with such different files seems somewhat low and thus seems likely that it would also halt and tell the user to resolve the conflict. The odds of this happening in practice are small...and even if it did occur, there's a pedantic question about whether this would be considered a regression or a bugfix. I am not sure which it would be considered, but given that rename detection has always been heuristics and the original rules started with a weird dipole with funny corner cases: - files with the exact same name are considered the same file, no matter how dissimilar the content - any other two files are considered the same file only if they are the best possible content match (highest similarity) regardless of filename similarity I don't see this optimization and minor change in behavior as out of place. Much like the above two cases, even if a corner case is hit, it is at least fairly easy for people to understand why the algorithm did what it did -- somewhere in the sequence those files were related. Much like the technical difference in rename detection provided by basename-guided matching, this theoretical difference in results is totally worth the potential time savings. Signed-off-by: Elijah Newren <[email protected]>
1 parent 21583d7 commit e123d13

File tree

1 file changed

+64
-2
lines changed

1 file changed

+64
-2
lines changed

merge-ort.c

Lines changed: 64 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -138,6 +138,30 @@ struct rename_info {
138138
int callback_data_nr, callback_data_alloc;
139139
char *callback_data_traverse_path;
140140

141+
/*
142+
* merge_trees: trees passed to the merge algorithm for the merge
143+
*
144+
* merge_trees records the trees passed to the merge algorithm. But,
145+
* this data also is stored in merge_result->priv. If a sequence of
146+
* merges are being done (such as when cherry-picking or rebasing),
147+
* the next merge can look at this and re-use information from
148+
* previous merges under certain cirumstances.
149+
*
150+
* See also all the cached_* variables.
151+
*/
152+
struct tree *merge_trees[3];
153+
154+
/*
155+
* cached_pairs_valid_side: which side's cached info can be reused
156+
*
157+
* See the description for merge_trees. For repeated merges, at most
158+
* only one side's cached information can be used. Valid values:
159+
* MERGE_SIDE2: cached data from side2 can be reused
160+
* MERGE_SIDE1: cached data from side1 can be reused
161+
* 0: no cached data can be reused
162+
*/
163+
int cached_pairs_valid_side;
164+
141165
/*
142166
* cached_pairs: Caching of renames and deletions.
143167
*
@@ -447,6 +471,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
447471
strmap_func(&renames->cached_pairs[i], 1);
448472
strset_func(&renames->cached_irrelevant[i]);
449473
}
474+
renames->cached_pairs_valid_side = 0;
475+
renames->dir_rename_mask = 0;
450476

451477
if (!reinitialize) {
452478
struct hashmap_iter iter;
@@ -469,8 +495,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
469495
strmap_clear(&opti->output, 0);
470496
}
471497

472-
renames->dir_rename_mask = 0;
473-
474498
/* Clean out callback_data as well. */
475499
FREE_AND_NULL(renames->callback_data);
476500
renames->callback_data_nr = renames->callback_data_alloc = 0;
@@ -3615,6 +3639,35 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
36153639
trace2_region_leave("merge", "allocate/init", opt->repo);
36163640
}
36173641

3642+
static void merge_check_renames_reusable(struct merge_options *opt,
3643+
struct merge_result *result,
3644+
struct tree *merge_base,
3645+
struct tree *side1,
3646+
struct tree *side2)
3647+
{
3648+
struct rename_info *renames;
3649+
struct tree **merge_trees;
3650+
struct merge_options_internal *opti = result->priv;
3651+
3652+
if (!opti)
3653+
return;
3654+
3655+
renames = &opti->renames;
3656+
merge_trees = renames->merge_trees;
3657+
/* merge_trees[0..2] will only be NULL if opti is */
3658+
assert(merge_trees[0] && merge_trees[1] && merge_trees[2]);
3659+
3660+
/* Check if we meet a condition for re-using cached_pairs */
3661+
if ( oideq(&merge_base->object.oid, &merge_trees[2]->object.oid) &&
3662+
oideq( &side1->object.oid, &result->tree->object.oid))
3663+
renames->cached_pairs_valid_side = MERGE_SIDE1;
3664+
else if (oideq(&merge_base->object.oid, &merge_trees[1]->object.oid) &&
3665+
oideq( &side2->object.oid, &result->tree->object.oid))
3666+
renames->cached_pairs_valid_side = MERGE_SIDE2;
3667+
else
3668+
renames->cached_pairs_valid_side = 0; /* neither side valid */
3669+
}
3670+
36183671
/*** Function Grouping: merge_incore_*() and their internal variants ***/
36193672

36203673
/*
@@ -3755,7 +3808,16 @@ void merge_incore_nonrecursive(struct merge_options *opt,
37553808

37563809
trace2_region_enter("merge", "merge_start", opt->repo);
37573810
assert(opt->ancestor != NULL);
3811+
merge_check_renames_reusable(opt, result, merge_base, side1, side2);
37583812
merge_start(opt, result);
3813+
/*
3814+
* Record the trees used in this merge, so if there's a next merge in
3815+
* a cherry-pick or rebase sequence it might be able to take advantage
3816+
* of the cached_pairs in that next merge.
3817+
*/
3818+
opt->priv->renames.merge_trees[0] = merge_base;
3819+
opt->priv->renames.merge_trees[1] = side1;
3820+
opt->priv->renames.merge_trees[2] = side2;
37593821
trace2_region_leave("merge", "merge_start", opt->repo);
37603822

37613823
merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);

0 commit comments

Comments
 (0)