[DOCS] Adds examples to the PUT dfa and the evaluate dfa APIs (#46966)

szabosteve · szabosteve · commit 033aa9cf9b29 · 2019-10-02T10:33:45.000+02:00
* [DOCS] Adds examples to the PUT dfa and the evaluate dfa APIs.

* [DOCS] Removes extra lines from examples.

* Update docs/reference/ml/df-analytics/apis/evaluate-dfanalytics.asciidoc

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

* Update docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

* [DOCS] Explains examples.
diff --git a/docs/reference/ml/df-analytics/apis/evaluate-dfanalytics.asciidoc b/docs/reference/ml/df-analytics/apis/evaluate-dfanalytics.asciidoc
@@ -172,3 +172,79 @@ only.
 <3> The ground truth value for the actual house price. This is required in order 
 to evaluate results.
 <4> The predicted value for house price calculated by the {reganalysis}.
+
+
+The following example calculates the training error:
+
+[source,console]
+--------------------------------------------------
+POST _ml/data_frame/_evaluate
+{
+  "index": "student_performance_mathematics_reg",
+  "query": {
+    "term": {
+      "ml.is_training": {
+        "value": true <1>
+      }
+    }
+  },
+  "evaluation": {
+    "regression": {
+      "actual_field": "G3", <2>
+      "predicted_field": "ml.G3_prediction", <3>
+      "metrics": {
+        "r_squared": {},
+        "mean_squared_error": {}
+      }
+    }
+  }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
+
+<1> In this example, a test/train split (`training_percent`) was defined for the
+{reganalysis}. This query limits the evaluation to the train split only, so the
+API calculates the training error.
+<2> The field that contains the ground truth value for the actual student 
+performance. This is required in order to evaluate results.
+<3> The field that contains the predicted value for student performance 
+calculated by the {reganalysis}.
+
+
+The next example calculates the testing error. The only difference compared with 
+the previous example is that `ml.is_training` is set to `false` this time, so 
+the query excludes the train split from the evaluation.
+
+[source,console]
+--------------------------------------------------
+POST _ml/data_frame/_evaluate
+{
+  "index": "student_performance_mathematics_reg",
+  "query": {
+    "term": {
+      "ml.is_training": {
+        "value": false <1>
+      }
+    }
+  },
+  "evaluation": {
+    "regression": {
+      "actual_field": "G3", <2>
+      "predicted_field": "ml.G3_prediction", <3>
+      "metrics": {
+        "r_squared": {},
+        "mean_squared_error": {}
+      }
+    }
+  }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
+
+<1> In this example, a test/train split (`training_percent`) was defined for the
+{reganalysis}. This query limits the evaluation to the test split only, so the
+API calculates the testing error.
+<2> The field that contains the ground truth value for the actual student 
+performance. This is required in order to evaluate results.
+<3> The field that contains the predicted value for student performance 
+calculated by the {reganalysis}.
diff --git a/docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc b/docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc
@@ -179,7 +179,7 @@ The API returns the following result:
 
 
 [[ml-put-dfanalytics-example-r]]
-===== {regression-cap} example
+===== {regression-cap} examples
 
 The following example creates the `house_price_regression_analysis` 
 {dfanalytics-job}, the analysis type is `regression`:
@@ -235,4 +235,31 @@ The API returns the following result:
 }
 ----
 // TESTRESPONSE[s/1567168659127/$body.$_path/]
-// TESTRESPONSE[s/"version": "8.0.0"/"version": $body.version/]
+// TESTRESPONSE[s/"version": "8.0.0"/"version": $body.version/]
+
+
+The following example creates a job and specifies a training percent:
+
+[source,console]
+--------------------------------------------------
+PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
+{
+ "source": {
+   "index": "student_performance_mathematics"
+ },
+ "dest": {
+   "index": "student_performance_mathematics_reg"
+ },
+ "analysis":
+   {
+     "regression": {
+       "dependent_variable": "G3",
+       "training_percent": 70  <1>
+     }
+   }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
+
+<1> The `training_percent` defines the percentage of the data set that will be used 
+for training the model.
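
Reviewer note: the two metrics requested in the evaluate examples, `r_squared` and `mean_squared_error`, can be reproduced by hand to sanity-check the training and testing errors. The following is a minimal Python sketch, assuming the standard textbook definitions of both metrics and assuming that `ml.is_training` and `ml.G3_prediction` are stored as nested fields under `ml`. The sample documents and the helper names (`mean_squared_error`, `r_squared`, `evaluate_split`) are hypothetical and are not part of this commit or of the Elasticsearch API.

```python
from statistics import fmean


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return fmean([(a - p) ** 2 for a, p in zip(actual, predicted)])


def r_squared(actual, predicted):
    """1 minus the ratio of residual sum of squares to total sum of squares."""
    mean_actual = fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def evaluate_split(docs, is_training):
    """Mimic the term query on ml.is_training, then compute both metrics."""
    rows = [d for d in docs if d["ml"]["is_training"] == is_training]
    actual = [d["G3"] for d in rows]
    predicted = [d["ml"]["G3_prediction"] for d in rows]
    return {
        "mean_squared_error": mean_squared_error(actual, predicted),
        "r_squared": r_squared(actual, predicted),
    }


# Hypothetical documents shaped like rows of the destination index.
docs = [
    {"G3": 10, "ml": {"is_training": True, "G3_prediction": 11}},
    {"G3": 14, "ml": {"is_training": True, "G3_prediction": 13}},
    {"G3": 8, "ml": {"is_training": False, "G3_prediction": 10}},
    {"G3": 12, "ml": {"is_training": False, "G3_prediction": 12}},
]

training_error = evaluate_split(docs, True)   # corresponds to "value": true
testing_error = evaluate_split(docs, False)   # corresponds to "value": false
print(training_error)
print(testing_error)
```

A testing error that is noticeably worse than the training error, as in this made-up data, is the usual sign of overfitting, which is why the docs show both queries side by side.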