CoordinatorTests.testDiscoveryUsesNodesFromLastClusterState test failure #41967

Closed
benwtrent opened this issue May 8, 2019 · 1 comment · Fixed by #42504
Labels: :Distributed Indexing/Distributed, >test-failure, v7.2.0

Comments

@benwtrent
Member

Reproduces locally

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob-unix-compatibility/os=debian-8/155/consoleFull

Failure:

13:54:46 org.elasticsearch.cluster.coordination.CoordinatorTests > testDiscoveryUsesNodesFromLastClusterState FAILED
13:54:46     java.lang.AssertionError: node1 has applied its state 
13:54:46     Expected: <606L>
13:54:46          but: was <605L>
13:54:46         at __randomizedtesting.SeedInfo.seed([567D9E1ADC657714:3060AB7CA14A7F0B]:0)
13:54:46         at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
13:54:46         at org.junit.Assert.assertThat(Assert.java:956)
13:54:46         at org.elasticsearch.cluster.coordination.CoordinatorTests$Cluster.stabilise(CoordinatorTests.java:1524)
13:54:46         at org.elasticsearch.cluster.coordination.CoordinatorTests$Cluster.stabilise(CoordinatorTests.java:1505)
13:54:46         at org.elasticsearch.cluster.coordination.CoordinatorTests.testDiscoveryUsesNodesFromLastClusterState(CoordinatorTests.java:1086)

Reproduce:

 ./gradlew :server:test --tests "org.elasticsearch.cluster.coordination.CoordinatorTests.testDiscoveryUsesNodesFromLastClusterState" -Dtests.seed=567D9E1ADC657714 -Dtests.security.manager=true -Dtests.locale=es-US -Dtests.timezone=Chile/EasterIsland -Dcompiler.java=12 -Druntime.java=8

Seems to fail only on the 7.x branch. I verified that on master, without the `-Druntime.java=8` flag, it passes just fine.

@benwtrent benwtrent added >test-failure Triaged test failures from CI :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. v7.2.0 labels May 8, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

benwtrent added a commit that referenced this issue May 8, 2019
benwtrent added a commit that referenced this issue May 8, 2019
seut added a commit to crate/crate that referenced this issue May 24, 2019
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue May 24, 2019
Today the default stabilisation time is calculated on the assumption that the
elected master has no pending tasks to process when it is elected, but this is
not a safe assumption to make. This can result in a cluster reaching the end of
its stabilisation time without having stabilised. Furthermore in elastic#36943 we
increased the probability that each step in `runRandomly()` enqueues another
task, vastly increasing the chance that we hit such a situation.

This change extends the stabilisation process to allow time for all pending
tasks, plus a task that might currently be in flight.

Fixes elastic#41967, in which the master entered the stabilisation phase with over 800
tasks to process.
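
The calculation described in the commit message above can be illustrated with a minimal sketch. This is not the actual `CoordinatorTests` code: the class name, the method name, and the fixed per-task cost are all hypothetical and chosen only for illustration. It shows the idea of extending a default stabilisation time by enough time for every pending task, plus one task that might currently be in flight.

```java
/** Hypothetical sketch; not taken from the Elasticsearch test suite. */
public class StabilisationSketch {

    // Assumed cost, in simulated milliseconds, of processing one master task.
    // The real tests derive their delays differently; this constant is illustrative.
    static final long ASSUMED_TASK_MILLIS = 100;

    /**
     * Extends the default stabilisation time to allow for all pending tasks
     * plus one task that might already be in flight.
     */
    static long extendedStabilisationMillis(long defaultStabilisationMillis, int pendingTasks) {
        if (defaultStabilisationMillis < 0 || pendingTasks < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // The +1 accounts for a task that has been dequeued but not yet completed.
        return defaultStabilisationMillis + (pendingTasks + 1L) * ASSUMED_TASK_MILLIS;
    }

    public static void main(String[] args) {
        long base = 5_000;
        // An idle master still gets room for one in-flight task.
        System.out.println(extendedStabilisationMillis(base, 0));   // prints 5100
        // A master with 800 queued tasks gets a much longer allowance.
        System.out.println(extendedStabilisationMillis(base, 800)); // prints 85100
    }
}
```

Under these assumed numbers, a master that enters stabilisation with 800 queued tasks needs far more time than the zero-task default allows, which matches the failure mode reported in this issue.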
mergify bot pushed a commit to crate/crate that referenced this issue May 24, 2019
DaveCTurner added a commit that referenced this issue May 24, 2019
DaveCTurner added a commit that referenced this issue May 24, 2019
DaveCTurner added a commit that referenced this issue May 24, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
henningandersen pushed a commit that referenced this issue Jun 10, 2019

3 participants