[DOCS] Added Data Frames subsection to ML section. (#352)

szabosteve · lcawl · commit 91d5953ab83b · 2019-06-12T13:36:38.000-07:00
diff --git a/docs/en/stack/ml/dataframes.asciidoc b/docs/en/stack/ml/dataframes.asciidoc
@@ -0,0 +1,58 @@
+[[ml-dataframes]]
+=== {dataframes-cap}
+
+beta[]
+
+A _{dataframe}_ is the result of transforming a dataset according to rules that
+you define. You can think of it as a spreadsheet or data table that organizes
+your data and makes it ready for analysis.
+
+Many {es} datasets are organized as a stream of events: each event is an individual
+document, for example a single item purchase. {dataframe-transforms-cap} enable
+you to summarize this data, bringing it into an organized, more analysis-friendly
+format. For example, you can summarize all the purchases of a single customer (see
+the example below).
+
+The {dataframe} feature enables you to define a _pivot_, which is a set of features
+that transform the dataset into a different, more digestible format. Pivoting
+results in a summary of your dataset (which is the {dataframe} itself).
+
+Defining a pivot consists of two main parts. First, you select one or more fields
+to group your dataset by. You mainly select categorical fields (terms) for
+grouping. You can also select numerical fields; in this case, the field values
+are bucketed using an interval that you specify.
+
+The second step is deciding how you want to aggregate the grouped data. An
+aggregation is, in effect, a question that you ask about the dataset. There are
+different types of aggregations, each with its own purpose and output. To learn
+more about the supported aggregations and group-by fields, see 
+{ref}/data-frame-transform-pivot.html[Pivot resources].
+
+As an optional step, it's also possible to add a query to further limit the 
+scope of the aggregation.
+
+IMPORTANT: In 7.2, you can build {dataframes} on top of a static dataset.
+When new data comes into the index, you have to perform the transformation again 
+on the altered data.
+
+.Example
+
+Imagine that you run a webshop that sells clothes. Every order creates a 
+document that contains a unique order ID, the name and the category of the 
+ordered product, its price, the ordered quantity, the exact date of the order, 
+and some customer information (name, gender, location, etc.). Your dataset
+contains all the transactions from last year.
+
+To check the sales in each category for your last fiscal year, define a
+{dataframe} that is grouped by the product category (women's shoes, men's
+clothing, etc.) and by the order date, using the last year as the interval, then
+set a sum aggregation on the ordered quantity. The result is a {dataframe} pivot
+that shows the number of items sold in every product category in the last year.
+
+[role="screenshot"]
+image::ml/images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"]
+
+IMPORTANT: Creating a {dataframe} leaves your source index intact. A new index,
+dedicated to the {dataframe}, is created.
+
+TIP: Using {dataframes} does not require {dfeeds}.
diff --git a/docs/en/stack/ml/images/ml-dataframepivot.jpg b/docs/en/stack/ml/images/ml-dataframepivot.jpg
diff --git a/docs/en/stack/ml/overview.asciidoc b/docs/en/stack/ml/overview.asciidoc
@@ -11,3 +11,4 @@ include::buckets.asciidoc[]
 include::calendars.asciidoc[]
 include::rules.asciidoc[]
 include::architecture.asciidoc[]
+include::dataframes.asciidoc[]