Fix two separate but related problems with watch retry handling #338
This PR contains two commits which each fix an aspect of retry handling with WATCH/MULTI.
First patch: 9546682
Second patch: 99de3d3
If we have a WATCH, then the MULTI must exec on exactly that node; it must not be redirected to a different node, because then the MULTI isn't on the same connection as the WATCH anymore!
The first part of this patch fixes this by disabling redirection handling in transaction.rb if we are in a watch. I also added a test test_the_state_of_cluster_resharding_with_reexecuted_watch for this.
That means that the user block is also re-executed in this case; that's actually what we want. If WATCH is re-executed, then it's also vital that the user code which does Redis reads is re-executed, so that it can decide again what to put in the transaction (based on potentially updated information).
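To illustrate the intended behaviour, here is a minimal sketch of "re-run the whole block on redirection". It is not the code in transaction.rb: `RedirectionError` and `with_watch_retry` are hypothetical names made up for this example, and no real server is involved.

```ruby
# Hypothetical error standing in for a MOVED/ASK reply seen while in a WATCH.
class RedirectionError < StandardError; end

# Hypothetical helper: treats WATCH + user reads + MULTI/EXEC as one unit.
# On a redirection the whole block is run again from the top, instead of
# following the redirect for the MULTI alone.
def with_watch_retry(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue RedirectionError
    retry if attempts < max_attempts
    raise
  end
end

reads = []
result = with_watch_retry do |attempt|
  # The user code re-reads on every attempt, so it sees fresh data.
  reads << "read on attempt #{attempt}"
  # Simulate the slot having moved during the first attempt only.
  raise RedirectionError, 'MOVED' if attempt == 1
  :exec_ok
end
```

In this sketch the block runs twice: the first attempt is abandoned at the simulated redirection, and the second attempt repeats the reads before reaching the transaction.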
However, the second part of this patch is a lot trickier...
This change causes a different test, test_the_state_of_cluster_resharding_with_transaction_and_watch, to break. That test is asserting that
Disabling redirection handling obviously makes this stop working, but it is possible to handle this case correctly. We need to record whether or not we had to issue an ASKING on the WATCH for the transaction, and if so, pre-emptively issue an ASKING on the MULTI too. That's because this slot is not yet actually assigned to the node we're connected to (it's IMPORTING).
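Roughly, the bookkeeping looks like the sketch below. This is only an illustration of the idea, not the actual patch: `FakeConnection` and `WatchedTransaction` are hypothetical classes invented here, and the connection just records the commands it is given rather than talking to a server.

```ruby
# Hypothetical in-memory stand-in for a node connection; it only records
# the commands it is asked to send.
class FakeConnection
  attr_reader :sent

  def initialize
    @sent = []
  end

  def call(*command)
    @sent << command
    'OK'
  end
end

# Hypothetical transaction state: remembers whether the WATCH needed ASKING.
class WatchedTransaction
  def initialize(connection)
    @connection = connection
    @asking = false
  end

  # asking: true means the WATCH was redirected with ASK to an importing node.
  def watch(keys, asking: false)
    @asking = asking
    @connection.call('ASKING') if asking
    @connection.call('WATCH', *keys)
  end

  # If the WATCH needed ASKING, send ASKING again ahead of the MULTI.
  def exec(commands)
    @connection.call('ASKING') if @asking
    @connection.call('MULTI')
    commands.each { |command| @connection.call(*command) }
    @connection.call('EXEC')
  end
end

conn = FakeConnection.new
txn = WatchedTransaction.new(conn)
txn.watch(['{key}1'], asking: true)
txn.exec([['SET', '{key}1', 'v']])
```

After this runs, `conn.sent` holds ASKING, WATCH, ASKING, MULTI, SET, EXEC in that order, all on the same connection.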
It may well not be worth it, and I'm also OK with just failing WATCH/MULTI on slots which are currently migrating. That would imply: