
Commit 91d5953

szabosteve authored and lcawl committed
[DOCS] Added Data Frames subsection to ML section. (#352)
1 parent 3885f85 commit 91d5953


3 files changed

+59
-0
lines changed

docs/en/stack/ml/dataframes.asciidoc

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
[[ml-dataframes]]
2+
=== {dataframes-cap}
3+
4+
beta[]
5+
6+
A _{dataframe}_ is a transformation of a dataset by certain rules. You can think
7+
of it like a spreadsheet or a data table that makes your data ready to be analyzed
8+
and organized.
9+
10+
Many {es} datasets are organized as a stream of events: each event is an individual
11+
document, for example a single item purchase. {dataframe-transforms-cap} enable
12+
you to summarize this data, bringing it into an organized, more analysis-friendly
13+
format. For example, you can summarize all the purchases of a single customer (see
14+
the example below).
15+
16+
The {dataframe} feature enables you to define a _pivot_, which is a set of features
17+
that transform the dataset into a different, more digestible format. Pivoting
18+
results in a summary of your dataset (which is the {dataframe} itself).
19+
20+
Defining a pivot consists of two main parts. First, you select one or more fields
21+
that your dataset will be grouped by. Primarily, you can select categorical
22+
fields (terms) for grouping. You can also select numerical fields; in this case,
23+
the field values will be bucketed using an interval you specify.
24+
25+
The second step is deciding how you want to aggregate the grouped data. When
26+
using aggregations, you are effectively asking questions about the dataset. There are
27+
different types of aggregations, each with its own purpose and output. To learn
28+
more about the supported aggregations and group-by fields, see
29+
{ref}/data-frame-transform-pivot.html[Pivot resources].
30+
31+
As an optional step, it's also possible to add a query to further limit the
32+
scope of the aggregation.
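
The parts described above (group-by fields, an aggregation, and an optional query) can be sketched in plain Python. This is only a conceptual illustration, not how {es} implements pivots; the `pivot` helper, the sample documents, and the field names are all assumptions made for the example.

```python
from collections import defaultdict


def pivot(docs, group_field, agg_field, interval=None, query=None):
    """Group docs by group_field and sum agg_field within each group.

    interval: if given, numeric group values are bucketed to the lower
              bound of their interval (histogram-style grouping).
    query:    optional predicate that limits which docs are aggregated.
    """
    totals = defaultdict(float)
    for doc in docs:
        # Optional step: skip documents that the query excludes.
        if query is not None and not query(doc):
            continue
        key = doc[group_field]
        # Numeric group-by: map the value to the start of its bucket.
        if interval is not None:
            key = (key // interval) * interval
        totals[key] += doc[agg_field]
    return dict(totals)


# Hypothetical event-style documents: one document per purchase.
orders = [
    {"customer": "alice", "price": 12.0, "quantity": 1},
    {"customer": "bob", "price": 48.0, "quantity": 2},
    {"customer": "alice", "price": 30.0, "quantity": 3},
]

# Categorical (terms-like) grouping: total quantity per customer.
by_customer = pivot(orders, "customer", "quantity")

# Numeric grouping with an interval of 25: total quantity per price band.
by_price_band = pivot(orders, "price", "quantity", interval=25)

# The same price bands, limited by a query to a single customer.
alice_only = pivot(
    orders, "price", "quantity", interval=25,
    query=lambda doc: doc["customer"] == "alice",
)
```

Each result has one entry per group, which is the summarized, entity-centric view that the pivot is meant to produce.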
33+
34+
IMPORTANT: In 7.2, you can build {dataframes} on top of a static dataset.
35+
When new data comes into the index, you have to perform the transformation again
36+
on the altered data.
37+
38+
.Example
39+
40+
Imagine that you run a webshop that sells clothes. Every order creates a
41+
document that contains a unique order ID, the name and the category of the
42+
ordered product, its price, the ordered quantity, the exact date of the order,
43+
and some customer information (name, gender, location, etc.). Your dataset
44+
contains all the transactions from last year.
45+
46+
If you want to check the sales in the different categories in your last fiscal year,
47+
define a {dataframe} that is grouped by the product categories (women's shoes, men's
48+
clothing, etc.) and by the order date, with an interval that covers the last year, then set
49+
a sum aggregation on the ordered quantity. The result is a {dataframe} pivot that
50+
shows the number of items sold in every product category in the last year.
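
As a rough sketch, the webshop example might translate into a transform definition like the one built below. The index names (`webshop-orders`, `webshop-category-sales`) and the field names (`category`, `quantity`, `order_date`) are hypothetical, and the overall body shape (`source`, `dest`, `pivot` with `group_by` and `aggregations`) is assumed to follow the 7.2 {dataframe-transform} API; verify it against the Pivot resources reference before use. For simplicity, this sketch restricts the data to the last year with a range query instead of grouping by the order date.

```python
import json


def category_sales_transform(source_index, dest_index):
    """Build a request body that sums ordered quantity per product category."""
    return {
        "source": {
            "index": source_index,
            # Optional query: only consider orders from the last year.
            "query": {"range": {"order_date": {"gte": "now-1y"}}},
        },
        # The pivot result is written to a separate, dedicated index.
        "dest": {"index": dest_index},
        "pivot": {
            # Part one: the categorical field to group by.
            "group_by": {"category": {"terms": {"field": "category"}}},
            # Part two: how to aggregate each group.
            "aggregations": {"items_sold": {"sum": {"field": "quantity"}}},
        },
    }


# Hypothetical index names, used for illustration only.
body = category_sales_transform("webshop-orders", "webshop-category-sales")
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of a create transform request.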
51+
52+
[role="screenshot"]
53+
image::ml/images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"]
54+
55+
IMPORTANT: Creating a {dataframe} leaves your source index intact. A new index will
56+
be created that is dedicated to the {dataframe}.
57+
58+
TIP: Using {dataframes} does not require {dfeeds}.

docs/en/stack/ml/overview.asciidoc

Lines changed: 1 addition & 0 deletions
@@ -11,3 +11,4 @@ include::buckets.asciidoc[]
1111
include::calendars.asciidoc[]
1212
include::rules.asciidoc[]
1313
include::architecture.asciidoc[]
14+
include::dataframes.asciidoc[]
