Master should wait on cluster state publication when failing a shard #15468

Merged
merged 1 commit into from
Dec 16, 2015

jasontedor
Member

When a client sends a request to fail a shard to the master, the current
behavior is that the master will submit the cluster state update task
and then immediately send a successful response back to the client;
additionally, if there are any failures while processing the cluster
state update task to fail the shard, then the client will never be
notified of these failures.

This commit modifies the master behavior when handling requests to fail
a shard. In particular, the master will now wait until successful
publication of the cluster state update before notifying the request
client that the shard is marked as failed; additionally, the client is
now notified of any failures during the execution of the cluster state
update task.

Relates #14252
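
The change described above can be illustrated with a minimal sketch. It is a toy model and not the code from this pull request: `FailShardSketch`, `ResponseChannel`, `UpdateListener`, `submitStateUpdateTask`, and `RecordingChannel` are all hypothetical names, and the runner is synchronous, whereas the real master processes cluster state updates asynchronously. The point it shows is that the response to the client is sent from the listener callbacks rather than immediately after the task is submitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are Elasticsearch classes.
final class FailShardSketch {

    /** How the master answers the node that asked for the shard to be failed. */
    interface ResponseChannel {
        void sendSuccess();
        void sendFailure(Exception cause);
    }

    /** Callbacks fired by the toy update-task runner below. */
    interface UpdateListener {
        void onFailure(String source, Exception cause);
        void clusterStateProcessed(String source);
    }

    /** Toy runner: executes the task, then reports the outcome to the listener. */
    static void submitStateUpdateTask(String source, Runnable task, UpdateListener listener) {
        try {
            task.run(); // stands in for computing and publishing the new cluster state
        } catch (Exception e) {
            listener.onFailure(source, e);
            return;
        }
        listener.clusterStateProcessed(source); // reached only after the task succeeded
    }

    /** New behavior: the response is sent from the listener, not right after submission. */
    static void handleShardFailed(Runnable failShardTask, final ResponseChannel channel) {
        submitStateUpdateTask("shard-failed", failShardTask, new UpdateListener() {
            @Override
            public void onFailure(String source, Exception cause) {
                channel.sendFailure(cause);
            }

            @Override
            public void clusterStateProcessed(String source) {
                channel.sendSuccess();
            }
        });
        // Nothing is sent here; the old behavior acknowledged at this point.
    }

    /** Records what the client was told. */
    static final class RecordingChannel implements ResponseChannel {
        final List<String> events = new ArrayList<>();

        @Override
        public void sendSuccess() {
            events.add("success");
        }

        @Override
        public void sendFailure(Exception cause) {
            events.add("failure: " + cause.getMessage());
        }
    }
}
```

Calling `FailShardSketch.handleShardFailed` with a task that throws records a single failure event on the channel; with a task that completes normally, it records a single success event.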

}

@Override
public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
Contributor

This keeps bugging me :) We should have something on the executor as well...

Member Author

@bleskes What keeps bugging you?

Member Author

We discussed this through another channel; the issue is that a reroute may be issued for every task rather than once per batch. Changing this behavior will require a little extra machinery in the cluster state task execution framework. I opened #15482 to address this.
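
To make the per-task versus per-batch distinction concrete, here is a small sketch under stated assumptions. `BatchRerouteSketch`, `Router`, `applyPerTask`, and `applyPerBatch` are hypothetical names, and this is not necessarily the approach taken in #15482; it only counts how many reroutes each strategy would trigger for a batch of shard-failed tasks.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model; not the Elasticsearch task execution framework.
final class BatchRerouteSketch {

    /** Tracks failed shards and counts how often a reroute was requested. */
    static final class Router {
        final Set<String> failedShards = new HashSet<>();
        int reroutes = 0;

        void markFailed(String shard) {
            failedShards.add(shard);
        }

        void reroute() {
            reroutes++;
        }
    }

    /** One reroute per task: N shard failures trigger N reroutes. */
    static void applyPerTask(List<String> batch, Router router) {
        for (String shard : batch) {
            router.markFailed(shard);
            router.reroute();
        }
    }

    /** One reroute per batch: all failures are applied first, then a single reroute. */
    static void applyPerBatch(List<String> batch, Router router) {
        for (String shard : batch) {
            router.markFailed(shard);
        }
        if (!batch.isEmpty()) {
            router.reroute();
        }
    }
}
```

For a batch of three shard failures, `applyPerTask` requests three reroutes while `applyPerBatch` requests one; both leave the same set of shards marked as failed.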

Contributor

bleskes commented Dec 16, 2015

LGTM

jasontedor added a commit that referenced this pull request Dec 16, 2015
…d-failures

Master should wait on cluster state publication when failing a shard
jasontedor merged commit 3e8768f into elastic:master Dec 16, 2015
jasontedor deleted the master-side-of-wait-on-shard-failures branch December 16, 2015 15:39
clintongormley added the :Distributed Indexing/Distributed label and removed the :Cluster label Feb 13, 2018
Labels
:Distributed Indexing/Distributed, >enhancement, resiliency, v5.0.0-alpha1