Commit 1488b1c

newren authored and gitster committed
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
The fix is short (~30 lines), but the description is not. Sorry.

There is a set of problems caused by files in what I'll refer to as the
"present-despite-SKIP_WORKTREE" state. This commit aims to not just fix
these problems, but remove the entire class as a possibility -- for
those using sparse checkouts. But first, we need to understand the
problems this class presents. A quick outline:

  * Problems
    * User facing issues
    * Problem space complexity
    * Maintenance and code correctness challenges
  * SKIP_WORKTREE expectations in Git
  * Suggested solution
  * Pros/Cons of suggested solution
  * Notes on testcase modifications

=== User facing issues ===

There are various ways for users to get files to be present in the
working copy despite having the SKIP_WORKTREE bit set for that file in
the index. This may come from:

  * various git commands not really supporting the SKIP_WORKTREE
    bit[1,2]
  * users grabbing files from elsewhere and writing them to the
    worktree (perhaps even cached in their editor)
  * users attempting to "abort" a sparse-checkout operation with a
    not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
    working tree is not atomic)[3].

Once users have present-despite-SKIP_WORKTREE files, any modifications
users make to these files will be ignored, possibly to users'
confusion.

Further:

  * these files will degrade performance for the sparse-index case due
    to requiring the index to be expanded (see commit 55dfcf9
    ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why
    we try to delete entire directories outside the sparse cone).
  * these files will not be updated by standard commands
    (switch/checkout/pull/merge/rebase will leave them alone unless
    conflicts happen -- and even then, the conflicted file may be
    written somewhere else to avoid overwriting the SKIP_WORKTREE file
    that is present and in the way)
  * there is nothing in Git that users can use to discover such files
    (status, diff, grep, etc.
    all ignore it)
  * there is no reasonable mechanism to "recover" from such a condition
    (neither `git sparse-checkout reapply` nor `git reset --hard` will
    correct it).

So, not only are users' modifications ignored, but the files get
progressively more stale over time. At some point in the future, they
may change their sparseness specification or disable sparse-checkouts.
At that time, all present-despite-SKIP_WORKTREE files will show up as
having lots of modifications because they represent a version from a
different branch or commit. These might include user-made local changes
from days before, but the only way to tell is to have users look
through them all closely.

If these users come to others for help, there will be no logs that
explain the issue; it's just a mysterious list of changes. Users might
adamantly claim (correctly, as it turns out) that they didn't modify
these files, while others presume they did.

[1] https://lore.kernel.org/git/[email protected]/
[2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/

=== Problem space complexity ===

SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of
work on it initially, and several others have since come along and put
lots of work into it. Stolee spent most of 2021 on the sparse-index,
with lots of bugfixes along the way including to non-sparse-index cases
as we are still trying to get sparse checkouts to behave reasonably.
Basically every codepath throughout the tree needs to be aware of an
additional type of file: tracked-but-not-present. The extra type
results in lots of extra testcases and lots of extra code everywhere.

But, the sad thing is that we actually have more than one extra type.
We have tracked, tracked-but-not-present (SKIP_WORKTREE), and
tracked-but-promised-to-not-be-present-but-is-present-anyway
(present-despite-SKIP_WORKTREE).
Two types is a monumental amount of effort to support, and adding a
third feels a bit like insanity[4].

[4] Some examples of which can be seen at
    https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

=== Maintenance and code correctness challenges ===

Matheus' patches to grep stalled for nearly a year, in part because of
complications of how to handle sparse-checkouts appropriately in all
cases[5][6] (with trying to figure out how to sanely handle
present-despite-SKIP_WORKTREE files being one of the complications).
His rm/add follow-ups also took months because of those kinds of
issues[7]. The corner cases with things like submodules and
SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start
becoming really complex[8].

We've had to add ugly logic to merge-ort to attempt to handle
present-despite-SKIP_WORKTREE files[9], and have basically been forced
to give up in merge-recursive knowing full well that we'll sometimes
silently discard user modifications.

Despite stash essentially being a merge, it needed extra code (beyond
what was in merge-ort and merge-recursive) to manually tweak
SKIP_WORKTREE bits in order to avoid a few different bugs that'd result
in an early abort with a partial stash application[10].

[5] See
    https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t
    and the dates on the thread; also Matheus and I had several
    conversations off-list trying to resolve the issues over that time
[6] ...it finally kind of got unstuck after
    https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
[7] See for example
    https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t
    and quotes like "The core functionality of sparse-checkout has
    always been only partially implemented", a statement I still
    believe is true today.
[8] https://lore.kernel.org/git/[email protected]/
[9] See commit 66b209b ("merge-ort: implement CE_SKIP_WORKTREE handling
    with conflicted entries", 2021-03-20)
[10] See commit ba359fd ("stash: fix stash application in
     sparse-checkouts", 2020-12-01)

=== SKIP_WORKTREE expectations in Git ===

A couple quotes:

  * From [11] (before the "sparse-checkout" command existed):

      If it needs too many special cases, hacks, and conditionals, then
      it is not worth the complexity---if it is easier to write a
      correct code by allowing Git to populate working tree files, it
      is perfectly fine to do so.

      In a sense, the sparse checkout "feature" itself is a hack by
      itself, and that is why I think this part should be "best effort"
      as well.

  * From the git-sparse-checkout manual (still present today):

      THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF
      OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY
      CHANGE IN THE FUTURE.

[11] https://lore.kernel.org/git/[email protected]/

=== Suggested solution ===

SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as
the name of the option implies, to allow the file to NOT be in the
worktree but consider it to be unchanged rather than deleted.

This suggests a simple solution: present-despite-SKIP_WORKTREE files
should not exist, for those using sparse-checkouts.

Enforce this at index loading time by checking if core.sparseCheckout
is true; if so, check files in the index with the SKIP_WORKTREE bit set
to verify that they are absent from the working tree. If they are
present, unset the bit (in memory, though any commands that write to
the index will record the update).

Users can, of course, get the SKIP_WORKTREE bit back, such as by
running `git sparse-checkout reapply` (if they have ensured the file is
unmodified and doesn't match the specified sparsity patterns).

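As a rough illustration of the check described in the "Suggested solution" section, here is a minimal, self-contained sketch in C. It is not Git's code: `toy_entry`, `TOY_SKIP_WORKTREE`, and `toy_clear_skip_worktree_if_present()` are hypothetical names invented for this example, the `sparse_enabled` parameter stands in for core.sparseCheckout, and paths are assumed to be relative to the current directory.

```c
#define _POSIX_C_SOURCE 200809L /* for lstat() under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical stand-in for the CE_SKIP_WORKTREE flag. */
#define TOY_SKIP_WORKTREE (1u << 0)

/* Hypothetical stand-in for an index entry; not Git's struct cache_entry. */
struct toy_entry {
	const char *path;   /* path relative to the current directory */
	unsigned int flags; /* TOY_SKIP_WORKTREE or 0 */
};

/*
 * Clear TOY_SKIP_WORKTREE on every flagged entry whose path exists on
 * disk. Does nothing when sparse checkouts are disabled. Returns the
 * number of entries whose bit was cleared.
 */
size_t toy_clear_skip_worktree_if_present(struct toy_entry *entries,
					   size_t nr, int sparse_enabled)
{
	size_t i, cleared = 0;

	assert(entries != NULL || nr == 0);
	if (!sparse_enabled)
		return 0;

	for (i = 0; i < nr; i++) {
		struct stat st;

		if (!(entries[i].flags & TOY_SKIP_WORKTREE))
			continue;
		/* lstat() returns 0 when the path exists; symlinks are not followed. */
		if (lstat(entries[i].path, &st) == 0) {
			entries[i].flags &= ~TOY_SKIP_WORKTREE;
			cleared++;
		}
	}
	return cleared;
}
```

The sketch only touches the in-memory flags, matching the "in memory" wording above; persisting the change is left to whatever later writes the index. It also pays one lstat() call per flagged entry, which is the cost listed under Cons.
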
=== Pros/Cons of suggested solution ===

Pros:

  * Solves the user visible problems reported above, which I've been
    complaining about for nearly a year but couldn't find a solution
    to.
  * Helps prevent slow performance degradation with a sparse-index.
  * Much easier behavior in sparse-checkouts for users to reason about
  * Very simple, ~30 lines of code.
  * Significantly simplifies some ugly testcases, and obviates the need
    to test an entire class of potential issues.
  * Reduces code complexity, reasoning, and maintenance. Avoids
    disagreements about weird corner cases[12].
  * It has been reported that some users might be (ab)using
    SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree
    mechanism[13, and a few other similar references]. These users know
    of multiple caveats and shortcomings in doing so; perhaps not
    surprising given the "SKIP_WORKTREE expectations" section above.
    However, these users use `git update-index --skip-worktree`, and
    not `git sparse-checkout` or core.sparseCheckout=true. As such,
    these users would be unaffected by this change and can continue
    abusing the system as before.

[12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree

Cons:

  * When core.sparseCheckout is enabled, this adds a performance cost
    to reading the index. I'll defer discussion of this cost to a
    subsequent patch, since I have some optimizations to add.

=== Notes on testcase modifications ===

The good:

  * t1011: Compare to two cases above it ('read-tree will not throw
    away dirty changes, non-sparse'); since the file is present, it
    should match the non-sparse case now
  * t1092: sparse-index & sparse-checkout now match full-worktree
    behavior in more cases! Yaay for consistency!
  * t6428, t7012: look at how much simpler the tests become!
    Merge and stash can just fail early telling the user there's a file
    in the way, instead of not noticing until it's about to write a
    file and then have to implement sudden crash avoidance. Hurray for
    sanity!
  * t7817: sparse behavior better matches full tree behavior. Hurray
    for sanity!

The confusing:

  * t3705: These changes were ONLY needed on Windows, but they don't
    hurt other platforms. Let's discuss each individually:

    * core.sparseCheckout should be false by default. Nothing in this
      testcase toggles that until many, many tests later. However,
      early tests (#5 in particular) were testing `update-index
      --skip-worktree` behavior in a non-sparse-checkout, but the
      Windows tests in CI were behaving as if core.sparseCheckout=true
      had been specified somewhere. I do not have access to a Windows
      machine. But I just manually did what should have been a no-op
      and turned the config off. And it fixed the test.
    * I have no idea why the leftover .gitattributes file from this
      test was causing failures for test #18 on Windows, but only with
      these changes of mine. Test #18 was checking for empty stderr,
      and specifically wanted to know that some error completely
      unrelated to file endings did not appear. The leftover
      .gitattributes file thus caused some spurious stderr unrelated to
      the thing being checked. Since other tests did not intend to test
      normalization, just proactively remove the .gitattributes file.
      I'm certain this is cleaner and better, I'm just unsure why/how
      this didn't trigger problems before.

Signed-off-by: Elijah Newren <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 73e08a2 commit 1488b1c

9 files changed: 61 additions and 66 deletions

repository.c

Lines changed: 7 additions & 0 deletions
@@ -301,6 +301,13 @@ int repo_read_index(struct repository *repo)
 	if (repo->settings.command_requires_full_index)
 		ensure_full_index(repo->index);
 
+	/*
+	 * If sparse checkouts are in use, check whether paths with the
+	 * SKIP_WORKTREE attribute are missing from the worktree; if not,
+	 * clear that attribute for that path.
+	 */
+	clear_skip_worktree_from_present_files(repo->index);
+
 	return res;
 }

sparse-index.c

Lines changed: 21 additions & 0 deletions
@@ -341,6 +341,27 @@ void ensure_correct_sparsity(struct index_state *istate)
 		ensure_full_index(istate);
 }
 
+void clear_skip_worktree_from_present_files(struct index_state *istate)
+{
+	int i;
+	if (!core_apply_sparse_checkout)
+		return;
+
+restart:
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		struct stat st;
+
+		if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
+			if (S_ISSPARSEDIR(ce->ce_mode)) {
+				ensure_full_index(istate);
+				goto restart;
+			}
+			ce->ce_flags &= ~CE_SKIP_WORKTREE;
+		}
+	}
+}
+
 /*
  * This static global helps avoid infinite recursion between
  * expand_to_path() and index_file_exists().
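
The `goto restart` in the hunk above is easy to overlook. The commit message does not explain it; my reading (an assumption) is that expanding a sparse-directory entry rebuilds the entry array, so the current position and entry pointer can no longer be trusted and the scan starts over. The sketch below models that pattern with hypothetical types: `toy_index`, `toy_expand()`, `toy_demo_present()`, and the rule that every directory expands to exactly two files are all invented for illustration, and a callback replaces the lstat() call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SKIP (1u << 0) /* stand-in for CE_SKIP_WORKTREE */
#define TOY_DIR  (1u << 1) /* stand-in for a collapsed sparse-directory entry */

struct toy_entry { char name[32]; unsigned int flags; };
struct toy_index { struct toy_entry *entries; size_t nr; };

/* Hypothetical presence check; the real code calls lstat() instead. */
typedef int (*toy_present_fn)(const char *name);

/* Demo predicate: pretend only "sub/" and "sub/a" exist in the worktree. */
int toy_demo_present(const char *name)
{
	return !strcmp(name, "sub/") || !strcmp(name, "sub/a");
}

/*
 * Replace each TOY_DIR entry "dir/" with file entries "dir/a" and "dir/b"
 * that inherit TOY_SKIP. The array is reallocated, so pointers and
 * positions obtained before the call are stale afterwards.
 */
static void toy_expand(struct toy_index *idx)
{
	size_t i, out = 0, dirs = 0;
	struct toy_entry *grown;

	for (i = 0; i < idx->nr; i++)
		if (idx->entries[i].flags & TOY_DIR)
			dirs++;
	grown = calloc(idx->nr + dirs + 1, sizeof(*grown));
	assert(grown != NULL);

	for (i = 0; i < idx->nr; i++) {
		const struct toy_entry *src = &idx->entries[i];

		if (!(src->flags & TOY_DIR)) {
			grown[out++] = *src;
			continue;
		}
		snprintf(grown[out].name, sizeof(grown[out].name), "%sa", src->name);
		grown[out++].flags = src->flags & TOY_SKIP;
		snprintf(grown[out].name, sizeof(grown[out].name), "%sb", src->name);
		grown[out++].flags = src->flags & TOY_SKIP;
	}
	free(idx->entries);
	idx->entries = grown;
	idx->nr = out;
}

/* Clear TOY_SKIP from present entries, expanding and rescanning as needed. */
void toy_clear_present(struct toy_index *idx, toy_present_fn present)
{
	size_t i;

restart:
	for (i = 0; i < idx->nr; i++) {
		struct toy_entry *ce = &idx->entries[i];

		if (!(ce->flags & TOY_SKIP) || !present(ce->name))
			continue;
		if (ce->flags & TOY_DIR) {
			/* ce and i are invalid after expansion: rescan. */
			toy_expand(idx);
			goto restart;
		}
		ce->flags &= ~TOY_SKIP;
	}
}
```

In this toy model the rescan happens at most once, because no directory entries remain after `toy_expand()`. Entries cleared before the restart are simply skipped on the second pass, since their bit is already unset.
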

sparse-index.h

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ struct index_state;
 #define SPARSE_INDEX_MEMORY_ONLY (1 << 0)
 int convert_to_sparse(struct index_state *istate, int flags);
 void ensure_correct_sparsity(struct index_state *istate);
+void clear_skip_worktree_from_present_files(struct index_state *istate);
 
 /*
  * Some places in the codebase expect to search for a specific path.

t/t1011-read-tree-sparse-checkout.sh

Lines changed: 1 addition & 1 deletion
@@ -212,7 +212,7 @@ test_expect_success 'read-tree updates worktree, dirty case' '
 	echo sub/added >.git/info/sparse-checkout &&
 	git checkout -f top &&
 	echo dirty >init.t &&
-	read_tree_u_must_succeed -m -u HEAD^ &&
+	read_tree_u_must_fail -m -u HEAD^ &&
 	grep -q dirty init.t &&
 	rm init.t
 '

t/t1092-sparse-checkout-compatibility.sh

Lines changed: 8 additions & 8 deletions
@@ -367,7 +367,7 @@ test_expect_success 'status/add: outside sparse cone' '
 	write_script edit-contents <<-\EOF &&
 	echo text >>$1
 	EOF
-	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/a &&
 	run_on_all ../edit-contents folder1/new &&
 
 	test_sparse_match git status --porcelain=v2 &&
@@ -376,8 +376,8 @@ test_expect_success 'status/add: outside sparse cone' '
 	test_sparse_match test_must_fail git add folder1/a &&
 	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
 	test_sparse_unstaged folder1/a &&
-	test_sparse_match test_must_fail git add --refresh folder1/a &&
-	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
+	test_all_match git add --refresh folder1/a &&
+	test_must_be_empty sparse-checkout-err &&
 	test_sparse_unstaged folder1/a &&
 	test_sparse_match test_must_fail git add folder1/new &&
 	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
@@ -643,11 +643,11 @@ test_expect_success 'update-index modify outside sparse definition' '
 	run_on_sparse cp ../initial-repo/folder1/a folder1/a &&
 	run_on_all ../edit-contents folder1/a &&
 
-	# If file has skip-worktree enabled, update-index does not modify the
-	# index entry
-	test_sparse_match git update-index folder1/a &&
-	test_sparse_match git status --porcelain=v2 &&
-	test_must_be_empty sparse-checkout-out &&
+	# If file has skip-worktree enabled, but the file is present, it is
+	# treated the same as if skip-worktree is disabled
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git update-index folder1/a &&
+	test_all_match git status --porcelain=v2 &&
 
 	# When skip-worktree is disabled (even on files outside sparse cone), file
 	# is updated in the index

t/t3705-add-sparse-checkout.sh

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,7 @@ setup_sparse_entry () {
 	fi &&
 	git add sparse_entry &&
 	git update-index --skip-worktree sparse_entry &&
+	git config core.sparseCheckout false &&
 	git commit --allow-empty -m "ensure sparse_entry exists at HEAD" &&
 	SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
 }
@@ -126,6 +127,7 @@ test_expect_success 'git add --chmod does not update sparse entries' '
 '
 
 test_expect_success 'git add --renormalize does not update sparse entries' '
+	test_when_finished rm .gitattributes &&
 	test_config core.autocrlf false &&
 	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
 	echo "sparse_entry text=auto" >.gitattributes &&

t/t6428-merge-conflicts-sparse.sh

Lines changed: 5 additions & 18 deletions
@@ -112,7 +112,7 @@ test_expect_success 'conflicting entries written to worktree even if sparse' '
 	)
 '
 
-test_expect_merge_algorithm failure success 'present-despite-SKIP_WORKTREE handled reasonably' '
+test_expect_success 'present-despite-SKIP_WORKTREE handled reasonably' '
 	test_setup_numerals in_the_way &&
 	(
 		cd numerals_in_the_way &&
@@ -132,26 +132,13 @@ test_expect_merge_algorithm failure success 'present-despite-SKIP_WORKTREE handl
 
 		test_must_fail git merge -s recursive B^0 &&
 
-		git ls-files -t >index_files &&
-		test_cmp expected-index index_files &&
+		test_path_is_missing .git/MERGE_HEAD &&
 
-		test_path_is_file README &&
 		test_path_is_file numerals &&
 
-		test_cmp expected-merge numerals &&
-
-		# There should still be a file with "foobar" in it
-		grep foobar * &&
-
-		# 5 other files:
-		# * expected-merge
-		# * expected-index
-		# * index_files
-		# * others
-		# * whatever name was given to the numerals file that had
-		# "foobar" in it
-		git ls-files -o >others &&
-		test_line_count = 5 others
+		# numerals should still have "foobar" in it
+		echo foobar >expect &&
+		test_cmp expect numerals
 	)
 '

t/t7012-skip-worktree-writing.sh

Lines changed: 7 additions & 37 deletions
@@ -171,50 +171,20 @@ test_expect_success 'stash restore in sparse checkout' '
 
 		# Put a file in the working directory in the way
 		echo in the way >modified &&
-		git stash apply &&
+		test_must_fail git stash apply 2>error&&
 
-		# Ensure stash vivifies modifies paths...
-		cat >expect <<-EOF &&
-		H addme
-		H modified
-		H removeme
-		H subdir/A
-		S untouched
-		EOF
-		git ls-files -t >actual &&
-		test_cmp expect actual &&
+		grep "changes.*would be overwritten by merge" error &&
 
-		# ...and that the paths show up in status as changed...
-		cat >expect <<-EOF &&
-		A addme
-		M modified
-		D removeme
-		M subdir/A
-		?? actual
-		?? expect
-		?? modified.stash.XXXXXX
-		EOF
-		git status --porcelain | \
-			sed -e s/stash......./stash.XXXXXX/ >actual &&
-		test_cmp expect actual &&
+		echo in the way >expect &&
+		test_cmp expect modified &&
+		git diff --quiet HEAD ":!modified" &&
 
 		# ...and that working directory reflects the files correctly
-		test_path_is_file addme &&
+		test_path_is_missing addme &&
 		test_path_is_file modified &&
 		test_path_is_missing removeme &&
 		test_path_is_file subdir/A &&
-		test_path_is_missing untouched &&
-
-		# ...including that we have the expected "modified" file...
-		cat >expect <<-EOF &&
-		modified
-		tweaked
-		EOF
-		test_cmp expect modified &&
-
-		# ...and that the other "modified" file is still present...
-		echo in the way >expect &&
-		test_cmp expect modified.stash.*
+		test_path_is_missing untouched
 	)
 '

t/t7817-grep-sparse-checkout.sh

Lines changed: 9 additions & 2 deletions
@@ -83,10 +83,13 @@ test_expect_success 'setup' '
 
 # The test below covers a special case: the sparsity patterns exclude '/b' and
 # sparse checkout is enabled, but the path exists in the working tree (e.g.
-# manually created after `git sparse-checkout init`). git grep should skip it.
+# manually created after `git sparse-checkout init`). Although b is marked
+# as SKIP_WORKTREE, git grep should notice it IS present in the worktree and
+# report it.
 test_expect_success 'working tree grep honors sparse checkout' '
 	cat >expect <<-EOF &&
 	a:text
+	b:new-text
 	EOF
 	test_when_finished "rm -f b" &&
 	echo "new-text" >b &&
@@ -126,12 +129,16 @@ test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit'
 '
 
 # Note that sub2/ is present in the worktree but it is excluded by the sparsity
-# patterns, so grep should not recurse into it.
+# patterns. We also explicitly mark it as SKIP_WORKTREE in case it got cleared
+# by previous git commands. Thus sub2 starts as SKIP_WORKTREE but since it is
+# present in the working tree, grep should recurse into it.
 test_expect_success 'grep --recurse-submodules honors sparse checkout in submodule' '
 	cat >expect <<-EOF &&
 	a:text
 	sub/B/b:text
+	sub2/a:text
 	EOF
+	git update-index --skip-worktree sub2 &&
 	git grep --recurse-submodules "text" >actual &&
 	test_cmp expect actual
 '
