
[CI] XPackRestIT test {p0=ml/inference_crud/Test force delete given model referenced by pipeline} failing #80703


Closed
droberts195 opened this issue Nov 15, 2021 · 9 comments · Fixed by #108202
Labels:
medium-risk (An open issue or test failure that is a medium risk to future releases)
:ml (Machine learning)
Team:ML (Meta label for the ML team)
>test-failure (Triaged test failures from CI)

Comments

@droberts195
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/5jx7stxjd4d36/tests/:x-pack:plugin:yamlRestTest/org.elasticsearch.xpack.test.rest.XPackRestIT/test%20%7Bp0=ml%2Finference_crud%2FTest%20force%20delete%20given%20model%20referenced%20by%20pipeline%7D

Reproduction line:
./gradlew ':x-pack:plugin:yamlRestTest' --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=ml/inference_crud/Test force delete given model referenced by pipeline}" -Dtests.seed=74755A3DAB1FE611 -Dtests.locale=es-UY -Dtests.timezone=Asia/Taipei -Druntime.java=17

Applicable branches:
master

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.xpack.test.rest.XPackRestIT&tests.test=test%20%7Bp0%3Dml/inference_crud/Test%20force%20delete%20given%20model%20referenced%20by%20pipeline%7D

Failure excerpt:

org.elasticsearch.client.ResponseException: method [GET], host [http://127.0.0.1:39914], URI [/_ml/trained_models/_stats?size=10000], status line [HTTP/1.1 500 Internal Server Error]
{"error":{"root_cause":[{"type":"no_shard_available_action_exception","reason":"[yamlRestTest-0][127.0.0.1:36340][indices:data/read/search[phase/query]]","index_uuid":"SGzsn9YWTXmqAmiTbax72A","shard":"0","index":".ml-stats-000001"}],"type":"exception","reason":"Searching for stats for models [h-classification-model,d-classification-model,c-classification-model,g-classification-model,i-classification-model,b-classification-model,lang_ident_model_1,y-classification-model,f-classification-model,k-classification-model,j-classification-model,z-classification-model,a-regression-model-1,a-regression-model-0,e-classification-model] failed","caused_by":{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":".ml-stats-000001","node":"Hy8VS95RQHqgQxx__Xnk5A","reason":{"type":"no_shard_available_action_exception","reason":"[yamlRestTest-0][127.0.0.1:36340][indices:data/read/search[phase/query]]","index_uuid":"SGzsn9YWTXmqAmiTbax72A","shard":"0","index":".ml-stats-000001"}}]}},"status":500}

  at __randomizedtesting.SeedInfo.seed([74755A3DAB1FE611:FC2165E705E38BE9]:0)
  at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:335)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:301)
  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:276)
  at org.elasticsearch.xpack.core.ml.integration.MlRestTestStateCleaner.deleteAllTrainedModelIngestPipelines(MlRestTestStateCleaner.java:43)
  at org.elasticsearch.xpack.core.ml.integration.MlRestTestStateCleaner.resetFeatures(MlRestTestStateCleaner.java:34)
  at org.elasticsearch.xpack.test.rest.AbstractXPackRestTest.clearMlState(AbstractXPackRestTest.java:116)
  at org.elasticsearch.xpack.test.rest.AbstractXPackRestTest.cleanup(AbstractXPackRestTest.java:100)
  at jdk.internal.reflect.GeneratedMethodAccessor14.invoke(null:-1)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
  at java.lang.Thread.run(Thread.java:833)

@droberts195 droberts195 added :ml Machine learning >test-failure Triaged test failures from CI labels Nov 15, 2021
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Nov 15, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@droberts195
Contributor Author

Once again it's the dreaded "all shards failed" on an index that's just been created: #65846.

@henningandersen
Contributor

I think this master build failure is the same dreaded cause: https://gradle-enterprise.elastic.co/s/rtjxk5ybsifba

@droberts195
Contributor Author

Muted on master by #81093

@davidkyle
Member

inference_crud/Test delete given model with alias referenced by pipeline is failing with the same error:

https://gradle-enterprise.elastic.co/s/ijqm2nxnjrzpy
https://gradle-enterprise.elastic.co/s/7mixgsrvzecas

Muted in #81580

elasticsearchmachine pushed a commit that referenced this issue Dec 9, 2021
@droberts195
Contributor Author

inference_crud/Test delete given model referenced by pipeline is also affected: https://gradle-enterprise.elastic.co/s/nze7lhnhw765u

I'll mute that too.

droberts195 added a commit to droberts195/elasticsearch that referenced this issue Dec 20, 2021
benwtrent added a commit (#97179) that referenced this issue on Jun 28, 2023:

There is a common issue with the ML test cleanup code: it grabs model stats before taking any cleanup action.

However, the search for those stats can fail because the stats index was only just created.

This moves the YAML tests as they existed (pretty much line for line) into a single-node REST test.

I also mute the YAML tests rather than deleting them, as I would prefer to keep them.

related to: #80703
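One possible mitigation for the race described in this commit message, where cleanup searches `.ml-stats-000001` before its primary shard is allocated and gets `no_shard_available_action_exception`, is to retry the stats request while the failure looks transient. The Java sketch below is illustrative only: `TransientRetry`, `retry`, and `NO_SHARD_AVAILABLE` are hypothetical names, not part of the Elasticsearch test framework, and this is not what #97179 or the eventual fix did. Matching on the error type string in the exception message is an assumed heuristic.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper, not part of Elasticsearch: retries an action on transient shard errors. */
public final class TransientRetry {

    /** Assumed heuristic: treat the "no shard available" failure from the excerpt as transient. */
    public static final Predicate<Exception> NO_SHARD_AVAILABLE =
        e -> e.getMessage() != null && e.getMessage().contains("no_shard_available_action_exception");

    private TransientRetry() {}

    /**
     * Calls {@code action} up to {@code maxAttempts} times, sleeping with linear backoff
     * between attempts, but only while failures match {@code isTransient}.
     */
    public static <T> T retry(Callable<T> action, Predicate<Exception> isTransient,
                              int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (isTransient.test(e) == false) {
                    // Not a shard-allocation race: surface it immediately.
                    if (e instanceof RuntimeException) {
                        throw (RuntimeException) e;
                    }
                    throw new RuntimeException(e);
                }
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    // Linear backoff gives the new index time to allocate its primary shard.
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
        throw new IllegalStateException("still failing after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String stats = retry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                // Simulates the stats search running before the primary shard is allocated.
                throw new RuntimeException("all shards failed: no_shard_available_action_exception");
            }
            return "stats";
        }, NO_SHARD_AVAILABLE, 5, 10);
        System.out.println(stats + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `stats after 3 attempts`. The fixed attempt budget keeps a genuinely broken index from hanging cleanup, and errors that do not match the predicate are rethrown on the first attempt. An alternative, not shown here, would be to wait for the index to reach at least yellow health (the cluster health API's `wait_for_status=yellow`) before searching.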
@droberts195
Contributor Author

Assigning medium-risk due to loss of test coverage from muting.

@droberts195 droberts195 added the medium-risk An open issue or test failure that is a medium risk to future releases label Oct 10, 2023
afoucret added a commit to afoucret/elasticsearch that referenced this issue Nov 24, 2023
@maxhniebergall
Contributor

It seems this issue is still blocked by the underlying issue: #65846.

7 participants