[ML] Additional outlier detection parameters #47600
Adds the following parameters to `outlier_detection`:
- `compute_feature_influence` (boolean): whether to compute feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization to the feature values
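For readers skimming the thread, here is a minimal sketch of how the three new parameters might surface in a params map like the one asserted on in the test snippet further down. `OutlierDetectionSketch` is a hypothetical, simplified stand-in and not the actual class touched by this PR; the default values and the `[0, 1]` range check on `outlier_fraction` are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the analysis config; NOT the real Elasticsearch class.
final class OutlierDetectionSketch {
    private final boolean computeFeatureInfluence;
    private final double outlierFraction;
    private final boolean standardizationEnabled;

    private OutlierDetectionSketch(Builder builder) {
        this.computeFeatureInfluence = builder.computeFeatureInfluence;
        this.outlierFraction = builder.outlierFraction;
        this.standardizationEnabled = builder.standardizationEnabled;
    }

    // Exposes the three parameters under the names used in the PR description.
    Map<String, Object> getParams() {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("compute_feature_influence", computeFeatureInfluence);
        params.put("outlier_fraction", outlierFraction);
        params.put("standardization_enabled", standardizationEnabled);
        return params;
    }

    static final class Builder {
        // All three defaults are assumptions for this sketch.
        private boolean computeFeatureInfluence = true;
        private double outlierFraction = 0.05;
        private boolean standardizationEnabled = true;

        Builder setComputeFeatureInfluence(boolean value) {
            this.computeFeatureInfluence = value;
            return this;
        }

        Builder setOutlierFraction(double value) {
            // Assumed validation: a proportion should lie in [0, 1].
            if (value < 0.0 || value > 1.0) {
                throw new IllegalArgumentException("outlier_fraction must be in [0, 1]");
            }
            this.outlierFraction = value;
            return this;
        }

        Builder setStandardizationEnabled(boolean value) {
            this.standardizationEnabled = value;
            return this;
        }

        OutlierDetectionSketch build() {
            return new OutlierDetectionSketch(this);
        }
    }
}
```

With this sketch, `new OutlierDetectionSketch.Builder().build().getParams()` yields a three-entry map, which mirrors the shape of the assertion in the test diff, though the real class may differ in names and defaults.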
Pinging @elastic/ml-core (:ml)
@szabosteve Could you please take a look at the docs changes of this PR?
@dolaru Pinging as this might warrant new test scenarios for our QA
LGTM!
    }

    public OutlierDetection(StreamInput in) throws IOException {
        nNeighbors = in.readOptionalVInt();
        method = in.readBoolean() ? in.readEnum(Method.class) : null;
        featureInfluenceThreshold = in.readOptionalDouble();
        if (in.getVersion().onOrAfter(Version.V_7_5_0)) {
now that you have BWC tests, this may have to be changed to V_8_0_0
Yes, that's why I'm holding on before merging the BWC tests :-)
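The version check discussed in this thread follows a common backward-compatibility pattern: new fields are written and read only when the peer is known to understand them, and defaults are used otherwise. Below is a minimal, self-contained sketch of that idea using plain `java.io` streams instead of the Elasticsearch stream classes. The class name, the integer version constants `V_OLD` and `V_NEW`, and the fallback defaults are all hypothetical and say nothing about which real version constant the PR should use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of version-gated wire serialization.
final class VersionGatedWireSketch {
    static final int V_OLD = 1; // assumed: peers at this version do not know the new fields
    static final int V_NEW = 2; // assumed: first version that carries the new fields

    final double featureInfluenceThreshold;
    final boolean computeFeatureInfluence;
    final double outlierFraction;
    final boolean standardizationEnabled;

    VersionGatedWireSketch(double threshold, boolean computeInfluence,
                           double outlierFraction, boolean standardizationEnabled) {
        this.featureInfluenceThreshold = threshold;
        this.computeFeatureInfluence = computeInfluence;
        this.outlierFraction = outlierFraction;
        this.standardizationEnabled = standardizationEnabled;
    }

    // Writes the new fields only if the receiving peer can read them.
    void writeTo(DataOutputStream out, int peerVersion) throws IOException {
        out.writeDouble(featureInfluenceThreshold);
        if (peerVersion >= V_NEW) {
            out.writeBoolean(computeFeatureInfluence);
            out.writeDouble(outlierFraction);
            out.writeBoolean(standardizationEnabled);
        }
    }

    // Mirrors writeTo: reads the new fields only if the sender wrote them.
    static VersionGatedWireSketch readFrom(DataInputStream in, int peerVersion) throws IOException {
        double threshold = in.readDouble();
        if (peerVersion >= V_NEW) {
            boolean computeInfluence = in.readBoolean();
            double fraction = in.readDouble();
            boolean standardization = in.readBoolean();
            return new VersionGatedWireSketch(threshold, computeInfluence, fraction, standardization);
        }
        // Older peers never sent the new fields, so fall back to assumed defaults.
        return new VersionGatedWireSketch(threshold, true, 0.05, true);
    }

    // Serializes and deserializes a value as if talking to a peer at peerVersion.
    static VersionGatedWireSketch roundTrip(VersionGatedWireSketch value, int peerVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                value.writeTo(out, peerVersion);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readFrom(in, peerVersion);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point relevant to the review comment is that the read and write sides must gate on the same version. If the constant is wrong on either side, the byte stream is misread, which is the kind of mistake BWC tests are meant to catch.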
This depends on elastic/ml-cpp#716 to be merged first.
Thank you for taking care of this.
LGTM.
LGTM
        OutlierDetection outlierDetection = new OutlierDetection.Builder().build();
        Map<String, Object> params = outlierDetection.getParams();
        assertThat(params.size(), equalTo(3));
        assertThat(params.containsKey("compute_feature_influence"), is(true));
FYI: There are `hasKey` and `hasEntry` methods in `org.hamcrest.Matchers` which you could use here.
@@ -96,6 +96,10 @@ include-tagged::{doc-tests-file}[{api}-outlier-detection-customized]
<1> Constructing a new OutlierDetection object
<2> The method used to perform the analysis
<3> Number of neighbors taken into account during analysis
<4> The min `outlier_score` required to compute feature influence
Just curious: Could the functionality of the `compute_feature_influence` setting be achieved by setting `min_outlier_score` to a very high number?
Indeed, one can set `feature_influence_threshold` to `1` to achieve the same as setting `compute_feature_influence` to `false`.
run elasticsearch-ci/1
run elasticsearch-ci/2
run elasticsearch-ci/1
Adds the following parameters to `outlier_detection`:
- `compute_feature_influence` (boolean): whether to compute feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization to the feature values

Backport of elastic#47600

Adds the following parameters to `outlier_detection`:
- `compute_feature_influence` (boolean): whether to compute feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization to the feature values

Backport of #47600