[ML] adding ml autoscaling integration test #65638

benwtrent · 2020-11-30T22:05:43Z

This adds ML autoscaling integration tests.

The test verifies that the scaling requirements adjust to the current real load
on the cluster, given machine learning jobs of various sizes.

This commit also fixes a bug in the ML scaling service settings.

elasticmachine · 2020-11-30T22:05:45Z

Pinging @elastic/ml-core (:ml)

elasticmachine · 2020-11-30T22:05:46Z

Pinging @elastic/es-distributed (Team:Distributed)

droberts195

Thanks for adding the test.

I have one question. There is also something to be careful about in the Cloud-side PR now that the maximum for the setting has been removed.

droberts195 · 2020-12-01T10:16:18Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MachineLearning.java

@@ -449,7 +449,7 @@
        false,
        Property.NodeScope);
    public static final Setting<Integer> MAX_LAZY_ML_NODES =
-            Setting.intSetting("xpack.ml.max_lazy_ml_nodes", 0, 0, 3, Property.Dynamic, Property.NodeScope);
+            Setting.intSetting("xpack.ml.max_lazy_ml_nodes", 0, 0, Property.Dynamic, Property.NodeScope);


We will have to be careful that the Cloud infrastructure never tries to set this setting higher than 3 in a pre-7.11/post-7.11 mixed version cluster. This should be possible for autoscaling, as we've previously said that autoscaling won't attempt to do anything in such a cluster.

This ties in with the Cloud-side PR to set the settings required for ML autoscaling. It is probably best if this one doesn't go in elasticsearch.yml, but instead gets set using a cluster settings API call once the entire cluster has been upgraded to 7.11 or higher.

Also, the docs state that the maximum value is 3. I think it's probably best to leave the docs like this for the time being. Maybe in a year or two we can adjust them, but while there is a risk of problems in mixed version clusters it's probably best that we don't.
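To illustrate the mixed-version concern, here is a minimal sketch. It assumes a pre-7.11 node still enforces the old upper bound of 3 while a 7.11+ node does not. BoundedIntSetting is a hypothetical stand-in written for this example, not the Elasticsearch Setting class, and the exact rejection behavior of a real node may differ.

```java
// Hypothetical stand-in for a bounded integer setting. This is NOT the
// Elasticsearch Setting class; it only mimics a min/max bounds check.
final class BoundedIntSetting {
    private final String key;
    private final int min;
    private final int max;

    public BoundedIntSetting(String key, int min, int max) {
        this.key = key;
        this.min = min;
        this.max = max;
    }

    // Returns the value if it is within bounds, otherwise throws.
    public int validate(int value) {
        if (value < min) {
            throw new IllegalArgumentException(key + " must be >= " + min + " but was " + value);
        }
        if (value > max) {
            throw new IllegalArgumentException(key + " must be <= " + max + " but was " + value);
        }
        return value;
    }

    // Convenience wrapper: true if validate() would accept the value.
    public boolean accepts(int value) {
        try {
            validate(value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String key = "xpack.ml.max_lazy_ml_nodes";
        // Assumption: old nodes keep the maximum of 3, new nodes have no maximum.
        BoundedIntSetting pre711 = new BoundedIntSetting(key, 0, 3);
        BoundedIntSetting post711 = new BoundedIntSetting(key, 0, Integer.MAX_VALUE);

        int requested = 4;
        System.out.println("pre-7.11 accepts " + requested + ": " + pre711.accepts(requested));
        System.out.println("7.11+ accepts " + requested + ": " + post711.accepts(requested));
    }
}
```

Under these assumptions, a value above 3 is only safe once every node in the cluster uses the unbounded definition, which is why applying the setting after the full upgrade is the safer route.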

droberts195 · 2020-12-01T11:09:32Z

...i-node-tests/src/javaRestTest/java/org/elasticsearch/xpack/ml/integration/AutoscalingIT.java

+                new GetAutoscalingCapacityAction.Request()
+            ).actionGet(),
+            "requesting scale up as number of jobs in queues exceeded configured limit",
+            380991001934L,


I can see where the 1328196267L comes from in the previous assertion: ceil((100 + 40 + 200 + 40) * 1024^2 * 100 / 30).

But why is this one 380991001934L? ceil((100 + 40 + 200 + 40 + 20000 + 40 + 10000 + 40 + 30000 + 40) * 1024^2 * 100 / 30) = 211462826667

I think it would be good to put the formula behind the expected value in a comment. Then, if the constants used in the code ever change, whoever edits the expected result here can use the formula to check that the new value makes sense, rather than mindlessly pasting in whatever makes the test pass.
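For reference, the formula in this comment can be checked with a small standalone sketch. ExpectedCapacity and requiredBytes are hypothetical names used only for this illustration; the calculation reproduces the arithmetic written above (sum of job sizes in MB, converted to bytes, scaled by 100 / 30, rounded up) and is not necessarily what the ML autoscaling code computes.

```java
// Hypothetical helper that evaluates ceil(sumMb * 1024^2 * 100 / percent)
// using integer arithmetic only, as written in the review comment.
final class ExpectedCapacity {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    public static long requiredBytes(long[] jobMemoryMb, int maxMachineMemoryPercent) {
        long totalMb = 0;
        for (long mb : jobMemoryMb) {
            totalMb += mb;
        }
        long numerator = totalMb * BYTES_PER_MB * 100L;
        // Integer ceiling division: (a + b - 1) / b for positive a and b.
        return (numerator + maxMachineMemoryPercent - 1) / maxMachineMemoryPercent;
    }

    public static void main(String[] args) {
        // Job sizes (MB) from the first assertion discussed above.
        long[] small = {100, 40, 200, 40};
        // Job sizes (MB) from the second assertion discussed above.
        long[] all = {100, 40, 200, 40, 20000, 40, 10000, 40, 30000, 40};

        System.out.println(requiredBytes(small, 30)); // 1328196267
        System.out.println(requiredBytes(all, 30));   // 211462826667
    }
}
```

The second result matches the 211462826667 computed by hand above, not the 380991001934L in the test, so the sketch does not explain where the test's value comes from.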

droberts195 · 2020-12-01T11:20:41Z

...i-node-tests/src/javaRestTest/java/org/elasticsearch/xpack/ml/integration/AutoscalingIT.java

+            .build();
+    }
+
+    public void testMLAutoscalingCapacity() {


This assumes that xpack.ml.max_machine_memory_percent is set to its default of 30 and xpack.ml.use_auto_machine_memory_percent is false, right? I think it's worth adding a comment stating that these are the expectations for this test and that it will need modifying if either default ever changes. That will make it clearer to future maintainers where the numbers come from.

droberts195

LGTM

benwtrent · 2020-12-02T13:41:59Z

run elasticsearch-ci/2

[ML] adding ml autoscaling integration test

e35fe6a

benwtrent added the >test, :ml, v8.0.0, :Distributed Coordination/Autoscaling, and v7.11.0 labels Nov 30, 2020

elasticmachine added the Team:Distributed (Obsolete) label Nov 30, 2020

droberts195 reviewed Dec 1, 2020

benwtrent added 2 commits December 1, 2020 08:06

Merge remote-tracking branch 'upstream/master' into feature/ml-autosc…

0c1099a

…aling-integration-test

addressing PR comments

2a1b44d

benwtrent requested a review from droberts195 December 1, 2020 15:16

droberts195 approved these changes Dec 1, 2020

benwtrent added 3 commits December 1, 2020 11:22

fixing format

b76f5cd

fixing tests format

72268d9

fixing test

90328f2

benwtrent added 2 commits December 2, 2020 13:28

Merge remote-tracking branch 'upstream/master' into feature/ml-autosc…

8b863ce

…aling-integration-test

fixing tests

727b28b

benwtrent merged commit 68358df into elastic:master Dec 2, 2020

benwtrent deleted the feature/ml-autoscaling-integration-test branch December 2, 2020 20:01

benwtrent mentioned this pull request Dec 2, 2020

[7.x] [ML] adding ml autoscaling integration test (#65638) #65775

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
