Skip to content

Prevent TransportReplicationAction to route request based on stale local routing table #16274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 2, 2016

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jan 27, 2016

Relates to #12573

When relocating a primary shard, there is a cluster state update at the end of relocation where the active primary is switched from the relocation source to the relocation target. If relocation source receives and processes this cluster state before the relocation target, there is a time span where relocation source believes active primary to be on relocation target and relocation target believes active primary to be on relocation source. This results in index/delete/flush requests being sent back and forth and can end in an OOM on both nodes.

This PR adds a field to the index/delete/flush request that helps detect the case where we locally have stale routing information. In case this staleness is detected, we wait until we have received an up-to-date cluster state before rerouting the request.

I have included the test from #12574 in this PR to demonstrate the fix in an integration test. That integration test will not be part of the final commit, however.

@ywelsch ywelsch added review :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. labels Jan 27, 2016
@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 27, 2016

@bleskes instead of using the cluster state version, we could as well use the index metadata version. The index metadata version is updated whenever a new shard is started (thanks to active allocation ids). wdyt?

On a related note, we could use this field as well to wait for dynamic mapping updates to be applied. (for that the update mappings api would have to return the current index metadata version).

} else {
// chasing the node with the active primary for a second hop requires that we are at least up-to-date with the current cluster state version
// this prevents redirect loops between two nodes when a primary was relocated and the relocation target is not aware that it is the active primary shard already.
request.minimumClusterStateVersion(state.version());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we assert that the version is not set?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got confused. It actually likely the previous routing node has an older cluster state in which we should override the existing value.. nevermind

@ywelsch
Copy link
Contributor Author

ywelsch commented Feb 1, 2016

@bleskes renamed the field and removed integration test.

@@ -155,6 +157,16 @@ public final Request consistencyLevel(WriteConsistencyLevel consistencyLevel) {
return (Request) this;
}

@SuppressWarnings("unchecked")
Request routedBasedOnClusterVersion(long routedBasedOnClusterVersion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some java docs here?

@bleskes
Copy link
Contributor

bleskes commented Feb 1, 2016

LGTM . Thanks @ywelsch - Left some minor comments, no need for another cycle.

@ywelsch ywelsch force-pushed the fix/endless-index-loop branch from 397b6dc to 96be074 Compare February 1, 2016 18:31
@ywelsch ywelsch force-pushed the fix/endless-index-loop branch from 96be074 to af1f637 Compare February 2, 2016 12:57
ywelsch pushed a commit that referenced this pull request Feb 2, 2016
Prevent TransportReplicationAction to route request based on stale local routing table
@ywelsch ywelsch merged commit c5a6ddf into elastic:master Feb 2, 2016
ywelsch pushed a commit to ywelsch/elasticsearch that referenced this pull request Jul 7, 2016
ywelsch added a commit that referenced this pull request Jul 7, 2016
…cal routing table (#19296)

When relocating a primary shard, there is a cluster state update at the end of relocation where the active primary is switched from the relocation source to the relocation target. If relocation source receives and processes this cluster state before the relocation target, there is a time span where relocation source believes active primary to be on relocation target and relocation target believes active primary to be on relocation source. This results in index/delete/flush requests being sent back and forth and can end in an OOM on both nodes.

Backport of #16274 to 2.4.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v5.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants