
[CI] Various tests in ShrinkIndexIT fail with "expected at least one master-eligible node left" #44164

Closed
droberts195 opened this issue Jul 10, 2019 · 4 comments · Fixed by #44214
Labels: :Distributed Indexing/Distributed, >test-failure

Comments

@droberts195
Contributor

droberts195 commented Jul 10, 2019

This failure has occurred many times in the last 36 hours: https://build-stats.elastic.co/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:now-30d,mode:relative,to:now))&_a=(columns:!(test,build-id,build_url,branch),filters:!(),index:e58bf320-7efd-11e8-bf69-63c8ef516157,interval:auto,query:(language:lucene,query:'class:%20%22org.elasticsearch.action.admin.indices.create.ShrinkIndexIT%22'),sort:!(time,desc))

Often the failing test is ShrinkIndexIT.testCreateShrinkWithIndexSort, see for example https://gradle.com/s/fglguuelmqobe

More recently a failure of ShrinkIndexIT.testShrinkCommitsMergeOnIdle was seen but with the same error message: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/2405/testReport/junit/org.elasticsearch.action.admin.indices.create/ShrinkIndexIT/testShrinkCommitsMergeOnIdle/

In both cases the error message is something like:

java.lang.AssertionError: expected at least one master-eligible node left in {node_sc1=org.elasticsearch.test.InternalTestCluster$NodeAndClient@2de106c0}

A REPRO command for 7.x is:

./gradlew :server:integTest --tests "org.elasticsearch.action.admin.indices.create.ShrinkIndexIT.testCreateShrinkWithIndexSort" -Dtests.seed=50AF68DFDA0BACBE -Dtests.security.manager=true -Dtests.locale=zh-TW -Dtests.timezone=America/North_Dakota/Center -Dcompiler.java=12 -Druntime.java=8

A REPRO command for master is:

./gradlew :server:integTest --tests "org.elasticsearch.action.admin.indices.create.ShrinkIndexIT.testShrinkCommitsMergeOnIdle" -Dtests.seed=C57BD4CF13C2F399 -Dtests.security.manager=true -Dtests.locale=fo -Dtests.timezone=Europe/Zagreb -Dcompiler.java=12 -Druntime.java=11

Neither of these reproduces locally for me.

I will mute the suite.

@droberts195 droberts195 added the >test-failure and :Distributed Indexing/Distributed labels Jul 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

droberts195 added a commit that referenced this issue Jul 10, 2019
@droberts195
Contributor Author

Muted on master in 2f4905f and on 7.x in cad804d

droberts195 added a commit that referenced this issue Jul 10, 2019
@original-brownbear original-brownbear self-assigned this Jul 10, 2019
@original-brownbear
Member

I can easily reproduce this on master using the seed -Dtests.seed=C57BD4CF13C2F399 and running the test suite on repeat in IntelliJ IDEA (for by class). Trying to track this down now.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Jul 11, 2019
* Move this test suite to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node, which sometimes turns out to be the only shared master node, so the cluster reset fails because no shared master node survived.
* Closes elastic#44164
original-brownbear added a commit that referenced this issue Jul 11, 2019
* Fix ShrinkIndexIT

* Move this test suite to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node, which sometimes turns out to be the only shared master node, so the cluster reset fails because no shared master node survived.
* Closes #44164
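
The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch, not Elasticsearch code: `SharedClusterSketch`, `Node`, `newCluster`, `stopNode`, and `canReset` are hypothetical names, and the layout with a single master-eligible node is an assumption chosen to mirror the message. It shows that a cluster reused across tests cannot be reset once a test has stopped its only master-eligible node, while a cluster built fresh for each test is unaffected by what an earlier test did.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy model only: every name here is hypothetical and none of it is the
// Elasticsearch test framework.
class SharedClusterSketch {

    /** A node with a name and a flag saying whether it may be elected master. */
    public static final class Node {
        final String name;
        final boolean masterEligible;

        Node(String name, boolean masterEligible) {
            this.name = name;
            this.masterEligible = masterEligible;
        }
    }

    /** Assumed layout: exactly one master-eligible node plus data-only nodes. */
    public static List<Node> newCluster(int dataNodes) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("master-0", true));
        for (int i = 0; i < dataNodes; i++) {
            nodes.add(new Node("data-" + i, false));
        }
        return nodes;
    }

    /** Stops (removes) the node at the given index and returns it. */
    public static Node stopNode(List<Node> nodes, int index) {
        return nodes.remove(index);
    }

    /** A reused cluster can only be reset if a master-eligible node survived. */
    public static boolean canReset(List<Node> nodes) {
        return nodes.stream().anyMatch(n -> n.masterEligible);
    }

    public static void main(String[] args) {
        // Shared cluster: a test stops a random node and the damage is inherited.
        for (long seed = 0; seed < 5; seed++) {
            List<Node> shared = newCluster(2);
            Node stopped = stopNode(shared, new Random(seed).nextInt(shared.size()));
            System.out.println("seed " + seed + ": stopped " + stopped.name
                + ", reset possible = " + canReset(shared));
        }
        // Fresh cluster per test: earlier damage is irrelevant.
        System.out.println("fresh cluster, reset possible = " + canReset(newCluster(2)));
    }
}
```

Whether a given seed stops the master-eligible node depends on the random draw, which is consistent with the failure being intermittent and tied to particular seeds.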