[ML] Fix PyTorchModelIT::testDeploymentStats #81161

davidkyle · 2021-11-30T17:20:13Z

PyTorchModelIT::testDeploymentStats has been failing in #80819 due to missing fields in the GET stats response. The problem is in the test, which has the wrong expectations about what is returned when a deployment is starting.

The only way a stats response can be constructed like this is if there are no task responses from the individual nodes, and only the nodes for started models are included in the GET stats request:

https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportGetDeploymentStatsAction.java#L123

The task request is never sent to a node hosting a single model in the starting state. This is by design, as those responses are built on the coordinating node. The test passed most of the time because the response is valid if the model is started on at least one node.
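The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the code in `TransportGetDeploymentStatsAction`; the class `DeploymentStatsSketch`, the `RoutingState` enum, the `NodeStats` type and the `inferenceCount` field are all hypothetical names chosen for illustration. The sketch assumes that only nodes in the started state are queried, and that entries for starting nodes are filled in on the coordinating node without any per-node statistics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and fields are illustrative, not the real Elasticsearch classes.
final class DeploymentStatsSketch {

    enum RoutingState { STARTING, STARTED }

    /** One per-node entry in the merged stats response. */
    static final class NodeStats {
        final String nodeId;
        final RoutingState state;
        final Long inferenceCount; // null when no task response exists for the node

        NodeStats(String nodeId, RoutingState state, Long inferenceCount) {
            this.nodeId = nodeId;
            this.state = state;
            this.inferenceCount = inferenceCount;
        }
    }

    /** Only nodes where the model is STARTED are sent a task request. */
    static List<String> nodesToQuery(Map<String, RoutingState> routing) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, RoutingState> entry : routing.entrySet()) {
            if (entry.getValue() == RoutingState.STARTED) {
                targets.add(entry.getKey());
            }
        }
        return targets;
    }

    /**
     * Merges task responses with the routing table on the coordinating node.
     * Nodes that are still STARTING get an entry with no per-node statistics.
     */
    static List<NodeStats> merge(Map<String, RoutingState> routing, Map<String, Long> taskResponses) {
        List<NodeStats> merged = new ArrayList<>();
        for (Map.Entry<String, RoutingState> entry : routing.entrySet()) {
            Long count = entry.getValue() == RoutingState.STARTED
                ? taskResponses.get(entry.getKey())
                : null;
            merged.add(new NodeStats(entry.getKey(), entry.getValue(), count));
        }
        return merged;
    }
}
```

Under these assumptions, a test that expects `inferenceCount` on every node would fail whenever a node is still starting, which matches the kind of intermittent failure described here.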

The second GET stats call sometimes failed because the two ML nodes have different views of the trained model allocation: one node knows the model is started there, but the other does not. GET stats is not a master node action, but the responses will be eventually consistent.
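One general way to assert against an eventually consistent response is to poll until a condition holds or a retry budget runs out. The sketch below shows that idea only; it is not necessarily what this PR does (the change here adjusts the test's expectations), and `EventuallySketch` and `eventually` are hypothetical names.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; illustrates polling an eventually consistent source.
final class EventuallySketch {

    /**
     * Re-evaluates the condition up to maxAttempts times, sleeping between
     * attempts. Returns true as soon as the condition holds, false otherwise.
     */
    static boolean eventually(BooleanSupplier condition, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A test could wrap its check of the stats response in such a helper, so that a node with a stale view of the allocation gets time to catch up before the assertion fails.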

Closes #80819

elasticmachine · 2021-11-30T17:20:17Z

Pinging @elastic/ml-core (Team:ML)

benwtrent

good catch!

elasticsearchmachine · 2021-12-01T08:32:24Z

💔 Backport failed

Status	Branch	Result
❌	8.0	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 81161`

Adjusts the test's expectations about the information available when deployments are in the `starting` state

Can't test starting models

599791b

davidkyle added >test Issues or PRs that are addressing/adding tests :ml Machine learning v8.0.0 v8.1.0 labels Nov 30, 2021

elasticmachine added the Team:ML Meta label for the ML team label Nov 30, 2021

davidkyle changed the title ~~[ML] Can't test starting models~~ [ML] Fix PyTorchModelIT::testDeploymentStats Nov 30, 2021

davidkyle added the auto-backport-and-merge label Nov 30, 2021

spotless

a46bda3

benwtrent approved these changes Nov 30, 2021


davidkyle merged commit 29d17c0 into elastic:master Dec 1, 2021

davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Dec 1, 2021

[ML] Fix PyTorchModelIT::testDeploymentStats (elastic#81161)

3d5eabc

Adjusts the test's expectations about the information available when deployments are in the `starting` state

davidkyle mentioned this pull request Dec 1, 2021

[ML] Fix PyTorchModelIT::testDeploymentStats (#81161) #81202

Merged

elasticsearchmachine pushed a commit that referenced this pull request Dec 1, 2021

[ML] Fix PyTorchModelIT::testDeploymentStats (#81161) (#81202)

f36ff2e

Adjusts the test's expectations about the information available when deployments are in the `starting` state

mark-vieira added v8.0.0-rc1 and removed v8.0.0 labels Jan 12, 2022
