Simplify ClusterStateUpdateTask Timeout Handling #64117

original-brownbear
Contributor

@original-brownbear original-brownbear commented Oct 25, 2020

It's confusing and slightly error-prone (see #64116) to handle the timeouts
via overrides but the priority via a field. This simplifies the code to avoid future
issues and saves over 100 LOC.

Also, this fixes a bug in TransportVotingConfigExclusionsAction where trying to instantiate a time value with a negative time could throw an unexpected exception and, as a result, leak a listener.
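
For readers unfamiliar with the pattern, the following is a minimal sketch of configuring both the priority and the timeout through constructor fields instead of mixing a field with an overridden method. `TaskConfigSketch`, `UpdateTask`, `Priority`, and `clearExclusions` are hypothetical stand-ins written for illustration; they are not the actual Elasticsearch classes, and `java.time.Duration` is used in place of `TimeValue`.

```java
import java.time.Duration;
import java.util.Objects;

// Hypothetical stand-ins for illustration only; not the real Elasticsearch types.
public class TaskConfigSketch {

    enum Priority { URGENT, NORMAL }

    /** Priority and timeout are both fixed at construction; a null timeout means "no timeout". */
    abstract static class UpdateTask {
        private final Priority priority;
        private final Duration timeout; // may be null

        UpdateTask(Priority priority) {
            this(priority, null);
        }

        UpdateTask(Priority priority, Duration timeout) {
            this.priority = Objects.requireNonNull(priority);
            this.timeout = timeout;
        }

        // Final accessors: subclasses cannot override one setting while forgetting the other.
        final Priority priority() { return priority; }
        final Duration timeout() { return timeout; }

        abstract String execute(String currentState);
    }

    /** Builds an anonymous task whose priority and timeout are visible in one place. */
    static UpdateTask clearExclusions(Duration timeout) {
        return new UpdateTask(Priority.URGENT, timeout) {
            @Override
            String execute(String currentState) {
                return currentState + "+cleared";
            }
        };
    }

    public static void main(String[] args) {
        UpdateTask task = clearExclusions(Duration.ofSeconds(30));
        System.out.println(task.priority() + " " + task.timeout().toMillis()); // URGENT 30000
    }
}
```

With both settings passed to the constructor, a call site states its priority and timeout together, which is the readability gain the description refers to.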

@original-brownbear original-brownbear added >non-issue :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.11.0 labels Oct 25, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 25, 2020
Contributor

@henningandersen henningandersen left a comment

LGTM, thanks @original-brownbear

@@ -105,7 +105,8 @@ public void onTimeout(TimeValue timeout) {

     private void submitClearVotingConfigExclusionsTask(ClearVotingConfigExclusionsRequest request, long startTimeMillis,
                                                        ActionListener<ClearVotingConfigExclusionsResponse> listener) {
-        clusterService.submitStateUpdateTask("clear-voting-config-exclusions", new ClusterStateUpdateTask(Priority.URGENT) {
+        clusterService.submitStateUpdateTask("clear-voting-config-exclusions", new ClusterStateUpdateTask(Priority.URGENT,
+            TimeValue.timeValueMillis(request.getTimeout().millis() + startTimeMillis - threadPool.relativeTimeInMillis())) {
Contributor

This seems unsafe in that it could be negative and cause an exception. Your change reduces the risk of that, but I wonder if we should fix it in this same PR?

Contributor Author

I pushed 5897ae9; I think that should give us a consistent exception here (the same one as when the update times out on the queue).

Contributor

@henningandersen henningandersen left a comment

Small comment only.

@@ -105,7 +106,13 @@ public void onTimeout(TimeValue timeout) {

     private void submitClearVotingConfigExclusionsTask(ClearVotingConfigExclusionsRequest request, long startTimeMillis,
                                                        ActionListener<ClearVotingConfigExclusionsResponse> listener) {
-        clusterService.submitStateUpdateTask("clear-voting-config-exclusions", new ClusterStateUpdateTask(Priority.URGENT) {
+        final long timeout = request.getTimeout().millis() + startTimeMillis - threadPool.relativeTimeInMillis();
Contributor

I wonder if we should just do Math.max(0, timeout) instead, just like in the cluster health action? It seems like the right place to handle timeout=0 is inside submitStateUpdateTask, rather than special-casing it here.

Contributor Author

++ I pushed 91e5fb3
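
The clamping discussed in this thread can be sketched in isolation as follows. This is an illustrative example under stated assumptions, not the code from the commit: `RemainingTimeout` and `remainingMillis` are hypothetical names, and plain `long` millisecond values stand in for `request.getTimeout().millis()`, `startTimeMillis`, and `threadPool.relativeTimeInMillis()`.

```java
// Hypothetical helper for illustration; the names are not taken from the Elasticsearch code base.
public class RemainingTimeout {

    /**
     * Returns how much of the request timeout is left, given relative clock readings
     * taken at the start of the request and now. The result is clamped at zero, so a
     * request that has already overrun its deadline never yields a negative duration.
     */
    static long remainingMillis(long requestTimeoutMillis, long startTimeMillis, long nowMillis) {
        long remaining = requestTimeoutMillis + startTimeMillis - nowMillis;
        return Math.max(0L, remaining);
    }

    public static void main(String[] args) {
        // 4 s elapsed out of a 30 s timeout: 26 s remain.
        System.out.println(remainingMillis(30_000, 1_000, 5_000));  // 26000
        // 39 s elapsed: the unclamped value would be -9000, so the result is 0.
        System.out.println(remainingMillis(30_000, 1_000, 40_000)); // 0
    }
}
```

Clamping at the call site keeps the value valid for constructing a time value, and leaves the decision about how a zero timeout behaves to the code that queues the task.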

Contributor

@henningandersen henningandersen left a comment

LGTM. One last ask: can you update the issue description to mention the bug that this fixes?

@original-brownbear
Contributor Author

Thanks Henning, I updated the description :)

@original-brownbear original-brownbear merged commit ef4ea4a into elastic:master Oct 28, 2020
@original-brownbear original-brownbear deleted the fix-cs-update-timeout-handling branch October 28, 2020 13:26
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 28, 2020
original-brownbear added a commit that referenced this pull request Oct 29, 2020
@original-brownbear original-brownbear restored the fix-cs-update-timeout-handling branch December 6, 2020 19:00
Labels
>bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.11.0 v8.0.0-alpha1
4 participants