Simplify ClusterStateUpdateTask Timeout Handling #64117

original-brownbear · 2020-10-25T19:59:05Z

It's confusing and slightly error-prone (see #64116) to handle the timeouts
via overrides but the priority via a field. This simplifies the code to avoid future
issues and saves over 100 LOC.

Also, this fixes a bug in `TransportVotingConfigExclusionsAction` where trying to instantiate a time value with a negative duration could throw an unexpected exception and, as a result, leak a listener.
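A minimal sketch of the two styles the description contrasts. It uses hypothetical stand-in types (`TaskPriority`, `OverrideStyleTask`, `FieldStyleTask`, and `java.time.Duration` in place of `TimeValue`), not the real Elasticsearch classes. In the override style, a subclass that forgets to override `timeout()` silently gets no timeout. In the field style, priority and timeout are both fixed at construction and read through final accessors.

```java
import java.time.Duration;

// Hypothetical stand-in for Elasticsearch's Priority enum.
enum TaskPriority { IMMEDIATE, URGENT, HIGH, NORMAL, LOW, LANGUID }

// "Before": priority is a constructor field, but the timeout comes from an
// overridable method. A subclass that forgets the override gets no timeout.
abstract class OverrideStyleTask {
    private final TaskPriority priority;

    OverrideStyleTask(TaskPriority priority) {
        this.priority = priority;
    }

    TaskPriority priority() {
        return priority;
    }

    Duration timeout() {
        return null; // null means "no timeout"
    }
}

// "After": both values are passed to the constructor and exposed through
// final accessors, so subclasses cannot override them inconsistently.
abstract class FieldStyleTask {
    private final TaskPriority priority;
    private final Duration timeout; // null means "no timeout"

    FieldStyleTask(TaskPriority priority) {
        this(priority, null);
    }

    FieldStyleTask(TaskPriority priority, Duration timeout) {
        this.priority = priority;
        this.timeout = timeout;
    }

    final TaskPriority priority() {
        return priority;
    }

    final Duration timeout() {
        return timeout;
    }

    // The actual state transformation stays the subclass's responsibility.
    abstract String execute(String currentState);
}
```

With the field style, a call site becomes a single constructor call such as `new FieldStyleTask(TaskPriority.URGENT, Duration.ofMillis(remaining)) { ... }`, and there is no method left for a subclass to forget to override.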

elasticmachine · 2020-10-25T19:59:07Z

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

henningandersen

LGTM, thanks @original-brownbear

henningandersen · 2020-10-28T11:10:18Z

...ticsearch/action/admin/cluster/configuration/TransportClearVotingConfigExclusionsAction.java

@@ -105,7 +105,8 @@ public void onTimeout(TimeValue timeout) {

    private void submitClearVotingConfigExclusionsTask(ClearVotingConfigExclusionsRequest request, long startTimeMillis,
                                                       ActionListener<ClearVotingConfigExclusionsResponse> listener) {
-        clusterService.submitStateUpdateTask("clear-voting-config-exclusions", new ClusterStateUpdateTask(Priority.URGENT) {
+        clusterService.submitStateUpdateTask("clear-voting-config-exclusions", new ClusterStateUpdateTask(Priority.URGENT,
+                TimeValue.timeValueMillis(request.getTimeout().millis() + startTimeMillis - threadPool.relativeTimeInMillis())) {


This seems unsafe in that it could be negative and cause an exception. Your change reduces the risk of that, but I wonder if we should fix it in this same PR?

I pushed 5897ae9, that should give us a consistent (as in same as when the update would timeout on the queue) exception here I think.
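To illustrate the concern, here is a toy reproduction of the leak, assuming (as the PR description states) that constructing the time value from a negative duration throws. `StrictMillis`, `ResultListener`, and `LeakyClearExclusions` are hypothetical names, not Elasticsearch APIs. Once more time has elapsed than the request timeout allows, the remaining budget goes negative, the constructor throws before the task is ever queued, and nothing completes the listener.

```java
// Hypothetical time value that, like the one described in the PR, refuses
// negative durations.
final class StrictMillis {
    final long millis;

    StrictMillis(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("duration cannot be negative, was [" + millis + "]");
        }
        this.millis = millis;
    }
}

// Hypothetical callback interface standing in for ActionListener.
interface ResultListener {
    void onResponse(String response);

    void onFailure(Exception e);
}

final class LeakyClearExclusions {
    // Remaining budget = request timeout minus the time already spent since the start.
    static long remainingMillis(long requestTimeoutMillis, long startTimeMillis, long nowMillis) {
        return requestTimeoutMillis + startTimeMillis - nowMillis;
    }

    // Pre-fix shape: the timeout is built inline while submitting, so an
    // IllegalArgumentException escapes to the caller and the listener is
    // never completed, neither with a response nor with a failure.
    static void submit(long requestTimeoutMillis, long startTimeMillis, long nowMillis,
                       ResultListener listener) {
        StrictMillis timeout = new StrictMillis(remainingMillis(requestTimeoutMillis, startTimeMillis, nowMillis));
        // A real implementation would queue the task here and complete the listener later.
        listener.onResponse("queued with timeout " + timeout.millis + "ms");
    }
}
```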

henningandersen

Small comment only.

henningandersen · 2020-10-28T12:09:19Z

...ticsearch/action/admin/cluster/configuration/TransportClearVotingConfigExclusionsAction.java

@@ -105,7 +106,13 @@ public void onTimeout(TimeValue timeout) {

    private void submitClearVotingConfigExclusionsTask(ClearVotingConfigExclusionsRequest request, long startTimeMillis,
                                                       ActionListener<ClearVotingConfigExclusionsResponse> listener) {
-        clusterService.submitStateUpdateTask("clear-voting-config-exclusions", new ClusterStateUpdateTask(Priority.URGENT) {
+        final long timeout = request.getTimeout().millis() + startTimeMillis - threadPool.relativeTimeInMillis();


I wonder if we should just do Math.max(0, timeout) instead, just like in cluster health action? Seems like the right place to handle timeout=0 is inside submitStateUpdateTask, rather than special handle it here?

++ I pushed 91e5fb3
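A minimal sketch of the clamping approach discussed here. It assumes a toy queue (`TinyMasterQueue` and `TaskCallbacks`, both hypothetical) that treats a zero timeout as already expired. The remaining budget never goes below zero, so in this toy an expired request takes the same timeout path as any other timed-out task, and the caller needs no special case.

```java
// Hypothetical task callbacks: the queue calls exactly one of these.
interface TaskCallbacks {
    void onTimeout(long timeoutMillis);

    void execute();
}

final class ClampedTimeout {
    // Remaining budget, clamped at zero as suggested in review, so an already
    // expired request produces a zero timeout instead of a negative one.
    static long remainingMillis(long requestTimeoutMillis, long startTimeMillis, long nowMillis) {
        return Math.max(0L, requestTimeoutMillis + startTimeMillis - nowMillis);
    }
}

// Toy queue: as an assumption for this sketch only, a zero timeout is reported
// as an immediate timeout. The edge case is handled in one place instead of at
// every call site.
final class TinyMasterQueue {
    static void submit(long timeoutMillis, TaskCallbacks task) {
        if (timeoutMillis == 0) {
            task.onTimeout(timeoutMillis);
            return;
        }
        task.execute();
    }
}
```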

henningandersen

LGTM. One last ask: can you update the issue description to mention the bug that this fixes?

original-brownbear · 2020-10-28T13:26:41Z

Thanks Henning, I updated the description :)

It's confusing and slightly error-prone (see elastic#64116) to handle the timeouts via overrides but the priority via a field. This simplifies the code to avoid future issues and saves over 100 LOC. Also, this fixes a bug in `TransportVotingConfigExclusionsAction` where trying to instantiate a time value with a negative duration could throw an unexpected exception and, as a result, leak a listener.

Simplify ClusterStateUpdateTask Timeout Handling

7a7596c

original-brownbear added >non-issue :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.11.0 labels Oct 25, 2020

elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 25, 2020

fix test

a536e26

original-brownbear mentioned this pull request Oct 26, 2020

Fix Broken Clone Snapshot CS Update #64116

Merged

original-brownbear added 2 commits October 27, 2020 10:24

Merge remote-tracking branch 'elastic/master' into fix-cs-update-timeout-handling

44edd03

Merge branch 'master' of github.com:elastic/elasticsearch into fix-cs-update-timeout-handling

80210bd

original-brownbear requested review from fcofdez and henningandersen October 27, 2020 10:18

henningandersen approved these changes Oct 28, 2020

original-brownbear added 2 commits October 28, 2020 12:41

Merge remote-tracking branch 'elastic/master' into fix-cs-update-timeout-handling

e38b51d

CR: fix negative timeval bug

5897ae9

original-brownbear requested a review from henningandersen October 28, 2020 11:55

henningandersen reviewed Oct 28, 2020

fix

91e5fb3

original-brownbear added >bug and removed >non-issue labels Oct 28, 2020

henningandersen approved these changes Oct 28, 2020

original-brownbear merged commit ef4ea4a into elastic:master Oct 28, 2020

original-brownbear deleted the fix-cs-update-timeout-handling branch October 28, 2020 13:26

original-brownbear added the backport pending label Oct 28, 2020

original-brownbear mentioned this pull request Oct 28, 2020

Simplify ClusterStateUpdateTask Timeout Handling (#64117) #64313

Merged

original-brownbear removed the backport pending label Oct 28, 2020

original-brownbear restored the fix-cs-update-timeout-handling branch December 6, 2020 19:00

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
