Skip to content

Commit 78f24d1

Browse files
committed
diffcore-rename, merge-ort: use handle_early_known_dir_renames()
Put the work of the last several patches to work by calling the new handle_early_known_dir_renames() function just after doing basename-guided rename detection. For the testcases mentioned in commit 557ac03 ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.671 s ± 0.052 s 5.714 s ± 0.087 s mega-renames: 13.653 s ± 0.104 s 11.439 s ± 0.192 s just-one-mega: 499.4 ms ± 3.6 ms 501.1 ms ± 3.8 ms While this improvement looks rather modest for these testcases (because all the previous optimizations were sufficient to nearly remove all time spent in rename detection already), there was a cherry-pick in a real world (but private) repository at $DAYJOB that showed a speedup factor of ~7x from this optimization. An important side note for future further optimization: There is a possible improvement to this optimization that I have not yet attempted: we could first check whether exact renames provide enough information for us to determine directory renames, and avoid doing basename-guided rename detection on some or all of the RELEVANT_LOCATION files within those directories. In effect, this variant would mean doing the handle_early_known_dir_renames() both after exact rename detection and again after basename-guided rename detection, though I remember thinking at one point that there was some extra cleanup needed between the steps if we were to try that. Checking for skippable renames an extra time might be a valuable optimization based on an understanding of the difference in performance of exact rename detection and basename-guided rename detection: While basename-guided rename detection is faster than full inexact rename detection with the latter's NxM file content comparisons, basename-guided rename detection is vastly slower than exact rename detection. The reason for this is that the first time you need to compare a file's contents to another is the most expensive; after diffcore-rename has generated a sequence of hashes for a file's contents, any subsequent comparisons of that file (via the sequence of hashes) to another file is much cheaper. So, removing files from rename detection before doing some basename-guided rename detection could potentially improve this optimization further. However, this particular optimization was actually the last one I did in original implementation order, and by the time I implemented this idea, every testcase I had was sufficiently fast that further optimization was unwarranted. If future testcases arise that tax rename detection more heavily, it may be worth implementing this more involved variant. Signed-off-by: Elijah Newren <[email protected]>
1 parent eb55bd9 commit 78f24d1

File tree

1 file changed

+6
-1
lines changed

1 file changed

+6
-1
lines changed

diffcore-rename.c

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1077,7 +1077,6 @@ static void remove_unneeded_paths_from_src(int detecting_copies,
10771077
rename_src_nr = new_num_src;
10781078
}
10791079

1080-
MAYBE_UNUSED
10811080
static void handle_early_known_dir_renames(struct dir_rename_info *info,
10821081
struct strintmap *relevant_sources,
10831082
struct strintmap *dirs_removed)
@@ -1310,9 +1309,15 @@ void diffcore_rename_extended(struct diff_options *options,
13101309
* Cull sources, again:
13111310
* - remove ones involved in renames (found via basenames)
13121311
* - remove ones not found in relevant_sources
1312+
* and
1313+
* - remove ones in relevant_sources which are needed only
1314+
* for directory renames, if no ancestory directory actually
1315+
* needs to know any more individual path renames under them
13131316
*/
13141317
trace2_region_enter("diff", "cull basename", options->repo);
13151318
remove_unneeded_paths_from_src(want_copies, relevant_sources);
1319+
handle_early_known_dir_renames(&info, relevant_sources,
1320+
dirs_removed);
13161321
trace2_region_leave("diff", "cull basename", options->repo);
13171322
}
13181323

0 commit comments

Comments
 (0)