Jepsen: lost write due to intersecting transactions both committing #2885
One more clue: the distributed tx likely tried to execute before …
Searched the history for other operations with versions …
Need to check whether repeatable read/write are promoted correctly in this case.
Looks like the bug is in …
On the other hand, repeatable snapshot reads are supposed to include all conflicting changes at or below the given version, so it was supposed to wait until 281482960303899 was executed (otherwise the supposedly repeatable snapshot would change after it commits; it was even reading the same key). It's unclear what the exact order was, though.
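To make that expectation concrete, here is a minimal sketch of the visibility rule being described. The names (`RowVersion`, `MustWaitFor`) and the snapshot version used are illustrative assumptions, not the YDB implementation:

```cpp
// A minimal sketch of the snapshot visibility rule described above.
// Names (RowVersion, MustWaitFor) and the snapshot version are illustrative.
#include <cstdint>
#include <iostream>
#include <tuple>

struct RowVersion {
    uint64_t Step;  // plan step (the 1710698143041 part in the versions above)
    uint64_t TxId;  // tx id within the step

    bool operator<=(const RowVersion& other) const {
        return std::tie(Step, TxId) <= std::tie(other.Step, other.TxId);
    }
};

// A repeatable snapshot read at `snapshot` must include, and therefore wait
// for, any conflicting pending write that commits at or below `snapshot`.
bool MustWaitFor(const RowVersion& pendingWrite, const RowVersion& snapshot) {
    return pendingWrite <= snapshot;
}

int main() {
    const RowVersion distributedWrite{1710698143041ULL, 281482960303899ULL};
    const RowVersion snapshot{1710698143041ULL, UINT64_MAX};  // hypothetical snapshot version

    // Prints 1: a read at or above v1710698143041/281482960303899 is expected
    // to wait until that distributed tx has executed.
    std::cout << MustWaitFor(distributedWrite, snapshot) << "\n";
}
```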
There could also be a race between plan step handling and the plan queue starting the operation. When we choose versions we look at the transaction queue; however, it is possible that the front of the queue is not in active ops yet. In that case we won't be able to add a dependency to something that doesn't exist in the dependency tracker yet. This would also explain why this is relatively rare and needed many days to trigger.
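A minimal sketch of that hypothesized race, with invented names (`DependencyTracker`, `AddDependency` are illustrative, not the actual DataShard classes): the dependency is silently lost when the tx at the front of the queue has not become an active op yet.

```cpp
// A minimal sketch of the hypothesized race, with invented names: the
// dependency tracker only knows operations that are already active, so a
// dependency on a planned-but-not-yet-active tx is silently dropped.
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <unordered_set>

class DependencyTracker {
public:
    void RegisterActiveOp(uint64_t txId) { Active.insert(txId); }

    // Returns false when the target tx is not an active op yet, i.e. the
    // dependency is not recorded anywhere (the suspected gap).
    bool AddDependency(uint64_t dependent, uint64_t dependsOn) {
        if (!Active.count(dependsOn)) {
            return false;
        }
        Deps[dependent].insert(dependsOn);
        return true;
    }

private:
    std::unordered_set<uint64_t> Active;
    std::unordered_map<uint64_t, std::unordered_set<uint64_t>> Deps;
};

int main() {
    DependencyTracker tracker;
    const uint64_t immediateTx = 281482960303890ULL;    // likely the immediate append 667
    const uint64_t distributedTx = 281482960303899ULL;  // the distributed append 666

    // The plan step has put the distributed tx into the queue, but it has not
    // become an active op yet when the immediate tx picks its write version
    // and tries to record the conflict.
    bool recorded = tracker.AddDependency(immediateTx, distributedTx);
    std::cout << "dependency recorded: " << recorded << "\n";  // prints 0
}
```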
A second case, where …
Notice how the read is for an unrelated key; technically it doesn't have to wait for a distributed tx with …
I've been running Jepsen over the weekend and got a couple of weird failures. I've tried to investigate this case so far (output formatted for readability):
Here's the max suffix read of key 198:
It is quite visible that the append of 667 in T2 was lost, even though ydb replied with success.
I searched the history and found these appends surrounding 667, sorted by commit timestamp:
Note there have been no shard restarts or anything like that. The appends of 666 and 667 definitely intersect (read snapshot vs commit version), so only one of them was supposed to succeed, yet both have `:type :ok`. Even more interesting are the shard logs for these operations:
Judging by the "restored its data" timestamps, we have append 667 finishing first (with write version `v1710698143041/18446744073709551615`), then append 666 finishing second (with write version `v1710698143041/281482960303899`), and then append 670 acquiring locks after both transactions have committed.

What we have here is append 667 (immediate) first running before append 666 (distributed) was prepared, at `992014Z` (it probably hit a page fault or something else). Then, from the `LoadTxDetails` message at `994305Z`, we can infer the distributed tx got to `BuildAndWaitDependencies`, and it released its data at `994421Z` because it either conflicted with something or tried executing and page faulted. Next, append 667 restores its data at `994933Z` and likely finishes executing. Finally, append 666 restores its data at `995264Z` and succeeds as well.

Now, the append of 667 is supposed to break conflicting locks, and it likely does so at version `v1710698143041/18446744073709551615`. Then the append of 666 runs and sees that its lock is not broken at `v1710698143041/281482960303899`, so it executes as well.

What's unclear is how this reversal happened. We are supposed to choose an immediate write version that sticks to the front of the next version in the transaction queue. It's possible we executed something out of order, but in that case we should have marked the distributed transaction as "logically complete", which in turn should have promoted immediate conflicts, and we should have had a hard dependency of 281482960303890 on 281482960303899, but that clearly didn't happen for some reason.
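For illustration, here is a minimal sketch (invented names, not DataShard code) of the reversal this reasoning implies: because the lock ends up broken at `v1710698143041/max` rather than at a version ordered below the queued distributed tx, the lock validation that append 666 performs at `v1710698143041/281482960303899` does not see the break, and both appends commit.

```cpp
// A minimal sketch of the suspected reversal, with invented names: a lock
// broken at v1710698143041/max is not visible to a lock check performed at
// the lower version v1710698143041/281482960303899, so both appends commit.
#include <cstdint>
#include <iostream>
#include <optional>
#include <tuple>

struct RowVersion {
    uint64_t Step;
    uint64_t TxId;
    bool operator<=(const RowVersion& o) const {
        return std::tie(Step, TxId) <= std::tie(o.Step, o.TxId);
    }
};

struct Lock {
    std::optional<RowVersion> BreakVersion;

    void Break(const RowVersion& at) {
        if (!BreakVersion) {
            BreakVersion = at;
        }
    }

    // The lock counts as broken for a tx validating at `checkAt` only if the
    // break happened at or below that version.
    bool IsBrokenAt(const RowVersion& checkAt) const {
        return BreakVersion && *BreakVersion <= checkAt;
    }
};

int main() {
    Lock lock;  // the lock taken by the distributed append 666

    // Append 667 (immediate) breaks the lock, but at the "max" write version
    // instead of a version ordered below the queued distributed tx.
    lock.Break({1710698143041ULL, UINT64_MAX});

    // Append 666 (distributed) validates its lock at its own commit version,
    // does not see the break, and commits as well -> lost write.
    std::cout << lock.IsBrokenAt({1710698143041ULL, 281482960303899ULL}) << "\n";  // prints 0
}
```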