Skip to content

subtree: Fix handling of complex history #493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: maint
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 33 additions & 4 deletions contrib/subtree/git-subtree.sh
Original file line number Diff line number Diff line change
Expand Up @@ -238,13 +238,13 @@ cache_miss () {
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ed Maste wrote (reply to this):

On Tue, 6 Oct 2020 at 18:05, Tom Clarkson via GitGitGadget
<[email protected]> wrote:
>
> From: Tom Clarkson <[email protected]>
>
> Signed-off-by: Tom Clarkson <[email protected]>
Reviewed-by: Ed Maste <[email protected]>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Tom,

On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:

> From: Tom Clarkson <[email protected]>
>
> Include recursion depth in debug logs so we can see when the recursion is
> getting out of hand.
>
> Making the cache handle null mappings correctly and adding older commits
> to the cache allows the recursive algorithm to terminate at any point on
> mainline rather than needing to reach either the add point or the initial
> commit.

Makes sense.

> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index 9867718503..160bad95c1 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -244,7 +244,7 @@ check_parents () {
>  	do
>  		if ! test -r "$cachedir/notree/$miss"
>  		then
> -			debug "  incorrect order: $miss"
> +			debug "  unprocessed parent commit: $miss ($indent)"

Without any context, it is hard to understand what the `$indent` variable
is supposed to mean, so it is unclear why we need to print it here.

I _guess_ it is the degree removed from the first-parent lineage?

In any case, it does not hurt here, so I trust that it is good to include
it in the debug output.

>  			process_split_commit "$miss" "" "$indent"
>  		fi
>  	done
> @@ -392,6 +392,24 @@ find_existing_splits () {
>  	done
>  }
>
> +find_mainline_ref () {
> +	debug "Looking for first split..."
> +	dir="$1"
> +	revs="$2"

The `git-subtree` script seems to rely on the `local` construct, using it
in plenty of other circumstances. How about using it here, too?

> +
> +	git log --reverse --grep="^git-subtree-dir: $dir/*\$" \
> +		--no-show-signature --pretty=format:'START %H%n%s%n%n%b%nEND%n' $revs |

Since all you are interested in is the `git-subtree-mainline:` trailer,
wouldn't a format like `%(trailers:key=git-subtree-mainline)` instead of
`START %H%n%s%n%n%b%nEND%n`?

See
https://git-scm.com/docs/git-log#Documentation/git-log.txt-emtrailersoptionsem
for more information about pretty formats.

BTW I am super unfamiliar with `git subtree`'s inner workings, and
therefore it would help me incredibly if the commit message talked a bit
about the commit message layout (with a particular eye on
`git-subtree-dir` and `git-subtree-mainline` which I guess are trailers
added by `git subtree`?)...

> +	while read a b junk
> +	do
> +		case "$a" in
> +		git-subtree-mainline:)
> +			echo "$b"
> +			return
> +			;;
> +		esac
> +	done
> +}
> +
>  copy_commit () {
>  	# We're going to set some environment vars here, so
>  	# do it in a subshell to get rid of them safely later
> @@ -646,9 +664,9 @@ process_split_commit () {
>
>  	progress "$revcount/$revmax ($createcount) [$extracount]"
>
> -	debug "Processing commit: $rev"
> +	debug "Processing commit: $rev ($indent)"
>  	exists=$(cache_get "$rev")
> -	if test -n "$exists"
> +	if test -z "$(cache_miss "$rev")"
>  	then
>  		debug "  prior: $exists"

I do not see the `exists` variable being used other than for the debug
statement. Maybe better something like this?

	debug "  prior found for $rev"

>  		return
> @@ -773,6 +791,17 @@ cmd_split () {
>
>  	unrevs="$(find_existing_splits "$dir" "$revs")"
>
> +	mainline="$(find_mainline_ref "$dir" "$revs")"
> +	if test -n "$mainline"
> +	then
> +		debug "Mainline $mainline predates subtree add"
> +		git rev-list --topo-order --skip=1 $mainline |
> +		while read rev
> +		do
> +			cache_set "$rev" ""

Ah, so they are not really "null mappings", but mapped to an empty string.
Makes sense. Maybe adjust the commit message?

> +		done || exit $?
> +	fi
> +
>  	# We can't restrict rev-list to only $dir here, because some of our
>  	# parents have the $dir contents the root, and those won't match.
>  	# (and rev-list --follow doesn't seem to solve this)
> --
> gitgitgadget
>
>


check_parents () {
missed=$(cache_miss "$1")
missed=$(cache_miss $1)
local indent=$(($2 + 1))
for miss in $missed
do
if ! test -r "$cachedir/notree/$miss"
then
debug " incorrect order: $miss"
debug " unprocessed parent commit: $miss ($indent)"
process_split_commit "$miss" "" "$indent"
fi
done
Expand Down Expand Up @@ -392,6 +392,24 @@ find_existing_splits () {
done
}

find_mainline_ref () {
debug "Looking for first split..."
dir="$1"
revs="$2"

git log --reverse --grep="^git-subtree-dir: $dir/*\$" \
--no-show-signature --pretty=format:'START %H%n%s%n%n%b%nEND%n' $revs |
while read a b junk
do
case "$a" in
git-subtree-mainline:)
echo "$b"
return
;;
esac
done
}

copy_commit () {
# We're going to set some environment vars here, so
# do it in a subshell to get rid of them safely later
Expand Down Expand Up @@ -646,9 +664,9 @@ process_split_commit () {

progress "$revcount/$revmax ($createcount) [$extracount]"

debug "Processing commit: $rev"
debug "Processing commit: $rev ($indent)"
exists=$(cache_get "$rev")
if test -n "$exists"
if test -z "$(cache_miss "$rev")"
then
debug " prior: $exists"
return
Expand Down Expand Up @@ -773,6 +791,17 @@ cmd_split () {

unrevs="$(find_existing_splits "$dir" "$revs")"

mainline="$(find_mainline_ref "$dir" "$revs")"
if test -n "$mainline"
then
debug "Mainline $mainline predates subtree add"
git rev-list --topo-order --skip=1 $mainline |
while read rev
do
cache_set "$rev" ""
done || exit $?
fi

# We can't restrict rev-list to only $dir here, because some of our
# parents have the $dir contents the root, and those won't match.
# (and rev-list --follow doesn't seem to solve this)
Expand Down