Skip to content

[CI] org.elasticsearch.client.TasksIT testGetInvalidTask fails sporadically on master #36001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jkakavas opened this issue Nov 28, 2018 · 5 comments
Assignees
Labels
>test-failure Triaged test failures from CI

Comments

@jkakavas
Copy link
Member

Failed in a build today
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=opensuse/86/console

and in an (unrelated) PR build yesterday
#35648 (comment)

REPRODUCE WITH: ./gradlew :client:rest-high-level:integTestRunner \
  -Dtests.seed=3F5A7BB7634DF43F \
  -Dtests.class=org.elasticsearch.client.TasksIT \
  -Dtests.method="testGetInvalidTask" \
  -Dtests.security.manager=true \
  -Dtests.locale=sr-Latn-ME \
  -Dtests.timezone=Europe/Astrakhan \
  -Dcompiler.java=11 \
  -Druntime.java=8

It didn't reproduce locally in 4096 iterations.

Stacktrace ElasticsearchStatusException[Elasticsearch exception [type=no_shard_available_action_exception, reason=No shard available for [get [.tasks][task][doesNotExistNodeName:123]: routing [null]]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_index_shard_state_exception, reason=CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]]]; at __randomizedtesting.SeedInfo.seed([3F5A7BB7634DF43F:818AEE3E35CAF9FC]:0) at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177) at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1703) at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1679) at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseOptionalEntity(RestHighLevelClient.java:1476) at org.elasticsearch.client.TasksClient.get(TasksClient.java:84) at org.elasticsearch.client.ESRestHighLevelClientTestCase.execute(ESRestHighLevelClientTestCase.java:79) at org.elasticsearch.client.TasksIT.testGetInvalidTask(TasksIT.java:121) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) at java.lang.Thread.run(Thread.java:748) Suppressed: org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:38982], URI [/_tasks/doesNotExistNodeName:123?wait_for_completion=false], status line [HTTP/1.1 503 Service Unavailable] {"error":{"root_cause":[{"type":"illegal_index_shard_state_exception","reason":"CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]","index_uuid":"YZxZgHqqSu-94s8M8mh9qA","shard":"0","index":".tasks"}],"type":"no_shard_available_action_exception","reason":"No shard available for [get [.tasks][task][doesNotExistNodeName:123]: routing [null]]","caused_by":{"type":"illegal_index_shard_state_exception","reason":"CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]","index_uuid":"YZxZgHqqSu-94s8M8mh9qA","shard":"0","index":".tasks"}},"status":503} at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:687) at org.elasticsearch.client.RestClient.performRequest(RestClient.java:218) at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseOptionalEntity(RestHighLevelClient.java:1471) ... 40 more Caused by: org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:38982], URI [/_tasks/doesNotExistNodeName:123?wait_for_completion=false], status line [HTTP/1.1 503 Service Unavailable] {"error":{"root_cause":[{"type":"illegal_index_shard_state_exception","reason":"CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]","index_uuid":"YZxZgHqqSu-94s8M8mh9qA","shard":"0","index":".tasks"}],"type":"no_shard_available_action_exception","reason":"No shard available for [get [.tasks][task][doesNotExistNodeName:123]: routing [null]]","caused_by":{"type":"illegal_index_shard_state_exception","reason":"CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]","index_uuid":"YZxZgHqqSu-94s8M8mh9qA","shard":"0","index":".tasks"}},"status":503} at org.elasticsearch.client.RestClient$1.completed(RestClient.java:307) at org.elasticsearch.client.RestClient$1.completed(RestClient.java:292) at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119) at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177) at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436) at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326) at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) ... 1 more Caused by: [.tasks/YZxZgHqqSu-94s8M8mh9qA][[.tasks][0]] ElasticsearchException[Elasticsearch exception [type=illegal_index_shard_state_exception, reason=CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]]] at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:491) at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:402) at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:432) at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:598) at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:169) ... 43 more
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@danielmitterdorfer
Copy link
Member

This is failing again on 6.x in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=centos/96/consoleFull with (see stack trace for more details):

CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]

Reproduction line:

REPRODUCE WITH: ./gradlew :client:rest-high-level:integTestRunner \
  -Dtests.seed=EAA6597678B5DE87 \
  -Dtests.class=org.elasticsearch.client.TasksIT \
  -Dtests.method="testGetInvalidTask" \
  -Dtests.security.manager=true \
  -Dtests.locale=sr-Latn-ME \
  -Dtests.timezone=America/Porto_Velho \
  -Dcompiler.java=11 \
  -Druntime.java=8
Stack trace
07:44:59    > Throwable #1: ElasticsearchStatusException[Elasticsearch exception [type=no_shard_available_action_exception, reason=No shard available for [get [.tasks][task][doesNotExistNodeName:123]: routing [null]]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_index_shard_state_exception, reason=CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]]];
07:44:59    > 	at __randomizedtesting.SeedInfo.seed([EAA6597678B5DE87:5476CCFF2E32D344]:0)
07:44:59    > 	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
07:44:59    > 	at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2042)
07:44:59    > 	at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2018)
07:44:59    > 	at org.elasticsearch.client.RestHighLevelClient$2.onFailure(RestHighLevelClient.java:1993)
07:44:59    > 	at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:850)
07:44:59    > 	at org.elasticsearch.client.RestClient$1.retryIfPossible(RestClient.java:588)
07:44:59    > 	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:550)
07:44:59    > 	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:531)
07:44:59    > 	at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119)
07:44:59    > 	at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
07:44:59    > 	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
07:44:59    > 	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
07:44:59    > 	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
07:44:59    > 	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
07:44:59    > 	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
07:44:59    > 	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
07:44:59    > 	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
07:44:59    > 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
07:44:59    > 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
07:44:59    > 	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
07:44:59    > 	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
07:44:59    > 	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
07:44:59    > 	at java.lang.Thread.run(Thread.java:748)
07:44:59    > 	Suppressed: org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:39814], URI [/_tasks/doesNotExistNodeName:123?wait_for_completion=false], status line [HTTP/1.1 503 Service Unavailable]
07:44:59    > {"error":{"root_cause":[{"type":"illegal_index_shard_state_exception","reason":"CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]","index_uuid":"gWoZj4bWR12DZMhpSR2vWw","shard":"0","index":".tasks"}],"type":"no_shard_available_action_exception","reason":"No shard available for [get [.tasks][task][doesNotExistNodeName:123]: routing [null]]","caused_by":{"type":"illegal_index_shard_state_exception","reason":"CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]","index_uuid":"gWoZj4bWR12DZMhpSR2vWw","shard":"0","index":".tasks"}},"status":503}
07:44:59    > 		at org.elasticsearch.client.RestClient$1.completed(RestClient.java:546)
07:44:59    > 		... 16 more
07:44:59    > Caused by: [.tasks/gWoZj4bWR12DZMhpSR2vWw][[.tasks][0]] ElasticsearchException[Elasticsearch exception [type=illegal_index_shard_state_exception, reason=CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED]]]
07:44:59    > 	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:509)
07:44:59    > 	at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:420)
07:44:59    > 	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:450)
07:44:59    > 	at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:616)
07:44:59    > 	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:169)
07:44:59    > 	... 22 more

This does not reproduce locally. First of all, I think the reason is that we need to remove the restriction on Dtests.method and run all tests in this class. In the build output we can see:

07:44:59   1> [2018-12-04T02:44:58,342][INFO ][o.e.c.TasksIT            ] [testGetValidTask] before test
07:44:59   1> [2018-12-04T02:44:58,344][INFO ][o.e.c.TasksIT            ] [testGetValidTask] initializing REST clients against [http://[::1]:39814]
07:44:59   1> [2018-12-04T02:44:58,674][INFO ][o.e.c.TasksIT            ] [testGetValidTask] There are still tasks running after this test that might break subsequent tests [indices:admin/create, indices:data/write/reindex].
07:44:59   1> [2018-12-04T02:44:58,674][INFO ][o.e.c.TasksIT            ] [testGetValidTask] after test
07:44:59   1> [2018-12-04T02:44:58,677][INFO ][o.e.c.TasksIT            ] [testGetInvalidTask] before test
07:44:59   1> [2018-12-04T02:44:58,805][INFO ][o.e.c.TasksIT            ] [testGetInvalidTask] There are still tasks running after this test that might break subsequent tests [indices:admin/create, indices:data/write/index, indices:data/write/reindex].
07:44:59   1> [2018-12-04T02:44:58,805][INFO ][o.e.c.TasksIT            ] [testGetInvalidTask] after test

As there are warnings about running tasks in testGetValidTask, I wonder whether a waitForPendingTasks() invocation is missing at the end of #testGetValidTask()?

@markharwood as you have recently added these tests, can you please have a closer look?

@andyb-elastic
Copy link
Contributor

Ran this test class in a loop a couple hundred times and wasn't able to reproduce. It looks like there was a about a month-long period where a test in this class failed with that stacktrace since it was added in May 2018. There hasn't been a failure of this kind in this class since 17 December 2018

@markharwood
Copy link
Contributor

Ran this test in a loop on 6.7 and could not reproduce the error or the warnings about still-running tasks.

@hub-cap
Copy link
Contributor

hub-cap commented Aug 22, 2019

I went thru a bunch of old build emails and cannot find any failures in recent history so I am closing this. Feel free to open if these start occurring again.

@hub-cap hub-cap closed this as completed Aug 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>test-failure Triaged test failures from CI
Projects
None yet
Development

No branches or pull requests

6 participants