Commit f25972c

[DOCS] Add scatterplot matrix to classification example (#1539) (#1542)
1 parent 1728809 commit f25972c

2 files changed, +20 -2 lines changed

docs/en/stack/ml/df-analytics/flightdata-classification.asciidoc

Lines changed: 20 additions & 2 deletions
@@ -102,6 +102,7 @@ in {kib} or the {ref}/put-dfanalytics.html[create {dfanalytics-jobs}] API.
 
 [role="screenshot"]
 image::images/flights-classification-job-1.jpg["Creating a {dfanalytics-job} in {kib}"]
+--
 
 .. Choose `kibana_sample_data_flights` as the source index.
 .. Choose `classification` as the job type.
@@ -111,6 +112,23 @@ want to predict with the {classanalysis}.
 excluded fields. These fields will be excluded from the analysis. It is
 recommended to exclude fields that either contain erroneous data or describe the
 `dependent_variable`.
++
+--
+The wizard includes a scatterplot matrix, which enables you to explore the
+relationships between the numeric fields. The color of each point is affected by
+the value of the dependent variable for that document, as shown in the legend.
+You can use this matrix to help you decide which fields to include or exclude
+from the analysis.
+
+[role="screenshot"]
+image::images/flights-classification-scatterplot.png["A scatterplot matrix for three fields in {kib}"]
+
+If you want these charts to represent data from a larger sample size or from a
+randomized selection of documents, you can change the default behavior. However,
+a larger sample size might slow down the performance of the matrix and a
+randomized selection might put more load on the cluster due to the more
+intensive query.
+--
 .. Choose a training percent of `10` which means it randomly selects 10% of the
 source data for training. While that value is low for this example, for many
 large data sets using a small training sample greatly reduces runtime without
@@ -129,8 +147,8 @@ analysis. In {kib}, the index name matches the job ID by default. It will
 contain a copy of the source index data where each document is annotated with
 the results. If the index does not exist, it will be created automatically.
 .. Use default values for all other options.
-
-
++
+--
 .API example
 [%collapsible]
 ====
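
Note for readers of this diff: the wizard steps touched above (source index, job type, excluded fields, training percent) correspond to fields in the request body of the create data frame analytics jobs API that the page links to. The following is only a minimal sketch of that mapping, not the collapsed API example from the file itself. The helper name `build_classification_job`, the destination index name, the dependent variable `FlightDelay`, and the excluded field `FlightDelayMin` are all assumptions made for illustration; none of them appear in this commit. The range check is a local sanity check, not the server's validation.

```python
import json


def build_classification_job(source_index, dest_index, dependent_variable,
                             training_percent=10, excludes=()):
    """Build a request body for PUT _ml/data_frame/analytics/<job_id>.

    Hypothetical helper: it only assembles a dict and sends nothing.
    """
    # Local sanity check only; the cluster performs its own validation.
    if not 0 < training_percent <= 100:
        raise ValueError("training_percent must be in (0, 100]")

    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": {
            "classification": {
                "dependent_variable": dependent_variable,
                "training_percent": training_percent,
            }
        },
    }
    # Excluded fields are left out of the analysis entirely.
    if excludes:
        body["analyzed_fields"] = {"excludes": list(excludes)}
    return body


job = build_classification_job(
    source_index="kibana_sample_data_flights",  # from the wizard step in the diff
    dest_index="flights-classification-dest",   # hypothetical name
    dependent_variable="FlightDelay",           # assumption, not shown in this diff
    training_percent=10,                        # matches the training percent step
    excludes=["FlightDelayMin"],                # assumption: a field describing the target
)
print(json.dumps(job, indent=2))
```

The printed JSON could be pasted into the {kib} Dev Tools console as the body of a `PUT _ml/data_frame/analytics/<job_id>` request, with the job ID and field names adjusted to the actual data set.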