Skip to content

Commit 4e5a08e

Browse files
committed
diffcore-rename, merge-ort: use handle_early_known_dir_renames()
Put the work of the last several patches to work by calling the new handle_early_known_dir_renames() function just after doing basename-guided rename detection. For the testcases mentioned in commit 557ac03 ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.680 s ± 0.096 s 5.665 s ± 0.129 s mega-renames: 13.812 s ± 0.162 s 11.435 s ± 0.158 s just-one-mega: 506.0 ms ± 3.9 ms 494.2 ms ± 6.1 ms While this improvement looks rather modest for these testcases (because all the previous optimizations were sufficient to nearly remove all time spent in rename detection already), there was a cherry-pick in a real world (but private) repository at $DAYJOB that showed a speedup factor of ~7x from this optimization. An important side note for future further optimization: There is a possible improvement to this optimization that I have not yet attempted: we could first check whether exact renames provide enough information for us to determine directory renames, and avoid doing basename-guided rename detection on some or all of the RELEVANT_LOCATION files within those directories. In effect, this variant would mean doing the handle_early_known_dir_renames() both after exact rename detection and again after basename-guided rename detection, though I remember thinking at one point that there was some extra cleanup needed between the steps if we were to try that. Checking for skippable renames an extra time might be a valuable optimization based on an understanding of the difference in performance of exact rename detection and basename-guided rename detection: While basename-guided rename detection is faster than full inexact rename detection with the latter's NxM file content comparisons, basename-guided rename detection is vastly slower than exact rename detection. The reason for this is that the first time you need to compare a file's contents to another is the most expensive; after diffcore-rename has generated a sequence of hashes for a file's contents, any subsequent comparisons of that file (via the sequence of hashes) to another file is much cheaper. So, removing files from rename detection before doing some basename-guided rename detection could potentially improve this optimization further. However, this particular optimization was actually the last one I did in original implementation order, and by the time I implemented this idea, every testcase I had was sufficiently fast that further optimization was unwarranted. If future testcases arise that tax rename detection more heavily, it may be worth implementing this more involved variant. Signed-off-by: Elijah Newren <[email protected]>
1 parent 0e1c9e5 commit 4e5a08e

File tree

1 file changed

+6
-1
lines changed

1 file changed

+6
-1
lines changed

diffcore-rename.c

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1117,7 +1117,6 @@ static void remove_unneeded_paths_from_src(int detecting_copies,
11171117
rename_src_nr = new_num_src;
11181118
}
11191119

1120-
MAYBE_UNUSED
11211120
static void handle_early_known_dir_renames(struct dir_rename_info *info,
11221121
struct strintmap *relevant_sources,
11231122
struct strintmap *dirs_removed)
@@ -1358,9 +1357,15 @@ void diffcore_rename_extended(struct diff_options *options,
13581357
* Cull sources, again:
13591358
* - remove ones involved in renames (found via basenames)
13601359
* - remove ones not found in relevant_sources
1360+
* and
1361+
* - remove ones in relevant_sources which are needed only
1362+
* for directory renames, if no ancestory directory actually
1363+
* needs to know any more individual path renames under them
13611364
*/
13621365
trace2_region_enter("diff", "cull basename", options->repo);
13631366
remove_unneeded_paths_from_src(want_copies, relevant_sources);
1367+
handle_early_known_dir_renames(&info, relevant_sources,
1368+
dirs_removed);
13641369
trace2_region_leave("diff", "cull basename", options->repo);
13651370
}
13661371

0 commit comments

Comments
 (0)