
ML REST test fails with NoShardAvailableActionException #66931


Closed
DaveCTurner opened this issue Jan 4, 2021 · 4 comments · Fixed by #67105
Labels
:ml Machine learning >test-failure Triaged test failures from CI

Comments

@DaveCTurner
Contributor

Build scan:

https://gradle-enterprise.elastic.co/s/5irmyjghkz5mc/console-log/raw?task=:x-pack:plugin:yamlRestTest

Repro line:

REPRODUCE WITH: ./gradlew ':x-pack:plugin:yamlRestTest' --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=ml/trained_model_cat_apis/Test cat trained models}" -Dtests.seed=E59A966331A469D9 -Dtests.security.manager=true -Dtests.locale=hr -Dtests.timezone=Greenwich -Druntime.java=8 -Dtests.rest.blacklist=getting_started/10_monitor_cluster_health/*

Reproduces locally?:

No; it fails for different reasons for me locally.

Applicable branches:

Only seen on 7.x

Failure history:

Apparently just this one failure.

Failure excerpt:

Not sure what's relevant here, sorry.

org.elasticsearch.xpack.test.rest.XPackRestIT > test {p0=ml/trained_model_cat_apis/Test cat trained models} FAILED java.lang.AssertionError: Failure at [ml/trained_model_cat_apis:93]: expected [2xx] status code but api [cat.ml_trained_models] returned [500 Internal Server Error] [{"error":{"root_cause":[{"type":"no_shard_available_action_exception","reason":"[yamlRestTest-0][127.0.0.1:36254][indices:data/read/search[phase/query]]","index_uuid":"BeB3z_omTnuUIjJekil3jQ","shard":"0","index":".ml-stats-000001","stack_trace":"[.ml-stats-000001/BeB3z_omTnuUIjJekil3jQ][[.ml-stats-000001][0]] NoShardAvailableActionException[[yamlRestTest-0][127.0.0.1:36254][indices:data/read/search[phase/query]]]\n\tat org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:459)\n\tat org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:408)\n\tat org.elasticsearch.action.search.AbstractSearchAsyncAction.access$000(AbstractSearchAsyncAction.java:70)\n\tat org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:275)\n\tat

@DaveCTurner DaveCTurner added >test-failure Triaged test failures from CI :ml Machine learning labels Jan 4, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@davidkyle
Member

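The NoShardAvailableActionException occurs when searching .ml-stats-000001, a search that uses lenient expand-open indices options. Those options tolerate a missing index, so the index must have existed at that point while its shard was not yet available.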

Nothing in the test should create the .ml-stats-000001 index, and the failing call is actually the third call to the cat trained models API in the test.

This makes me think something from an earlier test is leaking into this one and causing the index to be created. The test that ran immediately before it was ml/inference_stats_crud/Test get stats given expression without matches and allow_no_match is true.

It looks like we've found our culprit: inference_stats_crud may need to be rewritten as a REST test so that we can wait for all of its actions to finish.

@droberts195
Contributor

This sounds like another instance of the problem described in #65846, which could affect users in production as well as tests. I also just noticed that #66853 has been opened to fix it, so maybe the days of intermittent ML test failures caused by indices created as side effects of other actions are coming to an end 🤞

@davidkyle
Member

That is good news!

We should fix these noisy-neighbour tests anyway, as they may cause hard-to-diagnose failures in future.


4 participants