Skip to content

[ML] Additional outlier detection parameters #47600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dimitris-athanasiou
Copy link
Contributor

Adds the following parameters to outlier_detection:

  • compute_feature_influence (boolean): whether to compute or not
    feature influence scores
  • outlier_fraction (double): the proportion of the data set assumed
    to be outlying prior to running outlier detection
  • standardization_enabled (boolean): whether to apply standardization
    to the feature values

Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute or not
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@dimitris-athanasiou
Copy link
Contributor Author

@szabosteve Could you please take a look at the docs changes of this PR?

@dimitris-athanasiou
Copy link
Contributor Author

@dolaru Pinging as this might warrant new test scenarios for our QA

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

}

public OutlierDetection(StreamInput in) throws IOException {
nNeighbors = in.readOptionalVInt();
method = in.readBoolean() ? in.readEnum(Method.class) : null;
featureInfluenceThreshold = in.readOptionalDouble();
if (in.getVersion().onOrAfter(Version.V_7_5_0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that you have BWC tests, this may have to be changed to V_8_0_0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's why I'm holding on before merging the BWC tests :-)

@dimitris-athanasiou
Copy link
Contributor Author

This depends on elastic/ml-cpp#716 to be merged first.

Copy link
Contributor

@szabosteve szabosteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking care of this.
LGTM.

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

OutlierDetection outlierDetection = new OutlierDetection.Builder().build();
Map<String, Object> params = outlierDetection.getParams();
assertThat(params.size(), equalTo(3));
assertThat(params.containsKey("compute_feature_influence"), is(true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: There are hasKey and hasEntry methods in org.hamcrest.Matchers which you could use here.

@@ -96,6 +96,10 @@ include-tagged::{doc-tests-file}[{api}-outlier-detection-customized]
<1> Constructing a new OutlierDetection object
<2> The method used to perform the analysis
<3> Number of neighbors taken into account during analysis
<4> The min `outlier_score` required to compute feature influence
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: Could the functionality of compute_feature_influence setting be achieved with setting min_outlier_score to a very high number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, one can set feature_influence_threshold to 1 to achieve the same as setting compute_feature_influence to false.

@dimitris-athanasiou
Copy link
Contributor Author

run elasticsearch-ci/1

@dimitris-athanasiou
Copy link
Contributor Author

run elasticsearch-ci/2

@dimitris-athanasiou
Copy link
Contributor Author

run elasticsearch-ci/1

@dimitris-athanasiou dimitris-athanasiou merged commit e99435a into elastic:master Oct 7, 2019
@dimitris-athanasiou dimitris-athanasiou deleted the additional-outlier-detection-params branch October 7, 2019 12:28
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Oct 7, 2019
Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute or not
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values

Backport of elastic#47600
dimitris-athanasiou added a commit that referenced this pull request Oct 7, 2019
Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute or not
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values

Backport of #47600
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants