
[DOCS] Add scatterplot matrix to outlier detection example #1507


Merged
merged 5 commits into from
Jan 21, 2021


lcawl
Contributor

@lcawl lcawl commented Dec 30, 2020

Related to elastic/kibana#84420

This PR updates the screenshots in the outlier detection example (https://www.elastic.co/guide/en/machine-learning/master/ecommerce-outliers.html) and adds a paragraph about the new scatterplot matrix.

Preview:

@lcawl lcawl marked this pull request as ready for review January 20, 2021 21:29
@lcawl lcawl requested review from walterra and szabosteve January 20, 2021 21:29

@walterra walterra left a comment


Great update, I just posted one suggestion about clarifying performance implications.


If you want these charts to represent data from a larger sample size or from a
randomized selection of documents, you can change the default behavior. However,
these options might slow down the performance of the matrix.


slow down the performance of the matrix

Maybe this can be clarified with a bit more detail:

  • A larger sample size might slow down the performance of the matrix on the client side.
  • A randomized selection will create a "heavier" query and potentially put more load on the ES cluster. It will not affect the client side performance of the matrix in comparison with a non-randomized matrix with the same sample size.

Contributor Author


Thanks! I've changed the impact statement to this: "However,
a larger sample size might slow down the performance of the matrix and a
randomized selection might put more load on the cluster due to the more
intensive query."

Contributor

@szabosteve szabosteve left a comment


I left two minor suggestions regarding using shared attributes, please take or leave them.

3 participants