Add MlClientDocumentationIT tests for classification. #47569

Merged
merged 3 commits into from
Oct 11, 2019

@przemekwitek przemekwitek commented Oct 4, 2019

This PR enhances client documentation tests with the new classification analysis type:

  • testPutDataFrameAnalytics
  • testEvaluateDataFrame, testEvaluateDataFrame_Classification, testEvaluateDataFrame_Regression

Additionally, it adds basic Java high-level REST client docs related to classification.

Relates #46735

@przemekwitek przemekwitek force-pushed the classification_docs branch 5 times, most recently from 65152a7 to c8df7f8 Compare October 7, 2019 10:05
@przemekwitek przemekwitek removed the WIP label Oct 7, 2019
@przemekwitek przemekwitek marked this pull request as ready for review October 7, 2019 10:06
@przemekwitek przemekwitek added :ml Machine learning >docs General docs changes v7.5.0 v8.0.0 labels Oct 7, 2019
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@elasticmachine
Collaborator

Pinging @elastic/es-docs (>docs)

Contributor

@szabosteve szabosteve left a comment

LGTM

Contributor

@droberts195 droberts195 left a comment

Looks good. I left a few nits and questions. Maybe nothing needs to change if it's obvious which of the duplicate class names to use, if it's not possible to link to bookmarks on external sites, and if we don't care about full stops at the end of numbered items. But they are things to at least consider.

{
    // tag::evaluate-data-frame-evaluation-regression
    Evaluation evaluation =
        new Regression( // <1>
Contributor

This Regression class is not fully qualified. But I don't think the doc examples include the imports. So this doesn't make it clear which package to choose when typing Regression into an IDE and it suggests two possible classes that could be imported.

It might be best to rename one of the classes, or else fully qualify the name here as well as where the other one is used in the docs.

(Same for Classification on line 3341.)

Contributor Author

Done.
I've fully qualified the names Regression and Classification in this file for now.
LMK if you like the idea of renaming Regression to RegressionEvaluation and Classification to ClassificationEvaluation (or maybe have a different idea for naming). Then I could move on with renaming.
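
To illustrate the ambiguity discussed in this thread, here is a minimal, self-contained sketch. The nested classes `AnalysisPkg` and `EvaluationPkg` are hypothetical stand-ins for two packages that each contain a class named `Regression`; they are not the client's real package names. Qualifying each use, as done in the docs snippets, makes it unambiguous which class is meant.

```java
public class NameClashSketch {

    /** Hypothetical stand-in for a package holding analysis configuration classes. */
    public static final class AnalysisPkg {
        public static final class Regression {
            public String kind() {
                return "analysis";
            }
        }
    }

    /** Hypothetical stand-in for a package holding evaluation classes. */
    public static final class EvaluationPkg {
        public static final class Regression {
            public String kind() {
                return "evaluation";
            }
        }
    }

    /** Uses both classes side by side; the qualifier tells the reader (and the compiler) which one is meant. */
    public static String describe() {
        AnalysisPkg.Regression analysis = new AnalysisPkg.Regression();
        EvaluationPkg.Regression evaluation = new EvaluationPkg.Regression();
        return analysis.kind() + "," + evaluation.kind();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Renaming one class (for example to `RegressionEvaluation`, as suggested above) would remove the need for the qualifier altogether.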

<2> Name of the field in the index. Its value denotes the actual (i.e. ground truth) label for an example. Must be either true or false.
<3> Name of the field in the index. Its value denotes the probability (as per some ML algorithm) of the example being classified as positive.
<4> The remaining parameters are the metrics to be calculated based on the two fields described above.
<5> https://en.wikipedia.org/wiki/Precision_and_recall[Precision] calculated at thresholds: 0.4, 0.5 and 0.6
Contributor

Is it possible to link to the #Precision bookmark on this page?

Contributor Author

You mean instead of the Wikipedia link, or in addition to it?
Such a section does not exist yet on our page. Should I add it?

Contributor

You can link to a specific bookmark on the Wikipedia page like this:

https://en.wikipedia.org/wiki/Precision_and_recall#Precision

I'm not sure it's possible in AsciiDoc though. Maybe the # causes a problem. If it isn't possible, don't worry.

Contributor Author

Ah, that's what you meant.

Sure, done.

<3> Name of the field in the index. Its value denotes the probability (as per some ML algorithm) of the example being classified as positive.
<4> The remaining parameters are the metrics to be calculated based on the two fields described above.
<5> https://en.wikipedia.org/wiki/Precision_and_recall[Precision] calculated at thresholds: 0.4, 0.5 and 0.6
<6> https://en.wikipedia.org/wiki/Precision_and_recall[Recall] calculated at thresholds: 0.5 and 0.7
Contributor

Is it possible to link to the #Recall bookmark on this page?

Contributor Author

See my questions above.

Contributor Author

Done.
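
For context on callouts <5> and <6> in the quoted docs, here is a rough sketch of what "precision and recall calculated at thresholds" means. It is plain Java and does not use the client API. The class name, method names, and sample data are invented for illustration, and treating a probability greater than or equal to the threshold as a positive prediction is an assumption, not something confirmed in this PR.

```java
public class ThresholdMetricsSketch {

    /** Precision at a threshold: TP / (TP + FP). Returns NaN if nothing is predicted positive. */
    public static double precisionAt(boolean[] actual, double[] probability, double threshold) {
        int truePositives = 0;
        int falsePositives = 0;
        for (int i = 0; i < actual.length; i++) {
            // Assumption: probability >= threshold counts as a positive prediction.
            boolean predictedPositive = probability[i] >= threshold;
            if (predictedPositive && actual[i]) {
                truePositives++;
            } else if (predictedPositive) {
                falsePositives++;
            }
        }
        int predicted = truePositives + falsePositives;
        return predicted == 0 ? Double.NaN : (double) truePositives / predicted;
    }

    /** Recall at a threshold: TP / (TP + FN). Returns NaN if there are no actual positives. */
    public static double recallAt(boolean[] actual, double[] probability, double threshold) {
        int truePositives = 0;
        int falseNegatives = 0;
        for (int i = 0; i < actual.length; i++) {
            boolean predictedPositive = probability[i] >= threshold;
            if (actual[i] && predictedPositive) {
                truePositives++;
            } else if (actual[i]) {
                falseNegatives++;
            }
        }
        int positives = truePositives + falseNegatives;
        return positives == 0 ? Double.NaN : (double) truePositives / positives;
    }

    public static void main(String[] args) {
        // Made-up sample: ground-truth labels and predicted probabilities.
        boolean[] actual = {true, true, false, true, false};
        double[] probability = {0.9, 0.55, 0.45, 0.3, 0.65};

        for (double threshold : new double[] {0.4, 0.5, 0.6}) {
            System.out.println("precision@" + threshold + " = " + precisionAt(actual, probability, threshold));
        }
        for (double threshold : new double[] {0.5, 0.7}) {
            System.out.println("recall@" + threshold + " = " + recallAt(actual, probability, threshold));
        }
    }
}
```

The thresholds in `main` mirror the ones named in the callouts (0.4, 0.5, 0.6 for precision; 0.5, 0.7 for recall); the data itself is arbitrary.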

Contributor

@droberts195 droberts195 left a comment

LGTM

I'm happy to merge if the docs team is happy with the numbered lists.

@przemekwitek
Contributor Author

run elasticsearch-ci/packaging-sample-matrix

@przemekwitek
Contributor Author

run elasticsearch-ci/packaging-sample

@przemekwitek przemekwitek merged commit 9b5770d into elastic:master Oct 11, 2019
@przemekwitek przemekwitek deleted the classification_docs branch October 11, 2019 06:21
Labels
>docs General docs changes :ml Machine learning v7.5.0 v8.0.0-alpha1
5 participants