beta[]

- A _{dataframe}_ is a two-dimensional tabular data structure. In the context of
- the {stack}, it is a transformation of data that is indexed in {es}. For
- example, you can use {dataframes} to _pivot_ your data into a new entity-centric
- index. By transforming and summarizing your data, it becomes possible to
- visualize and analyze it in alternative and interesting ways.
+ You can use {transforms} to _pivot_ your data into a new entity-centric index.
+ By transforming and summarizing your data, it becomes possible to visualize and
+ analyze it in alternative and interesting ways.

A lot of {es} indices are organized as a stream of events: each event is an
- individual document, for example a single item purchase. {dataframes-cap} enable
+ individual document, for example a single item purchase. {transforms-cap} enable
you to summarize this data, bringing it into an organized, more
analysis-friendly format. For example, you can summarize all the purchases of a
single customer.

- You can create {dataframes} by using {transforms}.
{transforms-cap} enable you to define a pivot, which is a set of
features that transform the index into a different, more digestible format.
- Pivoting results in a summary of your data, which is the {dataframe}.
+ Pivoting results in a summary of your data in a new index.

To define a pivot, first you select one or more fields that you will use to
group your data. You can select categorical fields (terms) and numerical fields
@@ -38,34 +35,32 @@ more about the supported aggregations and group-by fields, see
As an optional step, you can also add a query to further limit the scope of the
aggregation.

- The {transform} performs a composite aggregation that
- paginates through all the data defined by the source index query. The output of
- the aggregation is stored in a destination index. Each time the
- {transform} queries the source index, it creates a _checkpoint_. You
- can decide whether you want the {transform} to run once (batch
- {transform}) or continuously ({ctransform}). A batch
- {transform} is a single operation that has a single checkpoint.
- {ctransforms-cap} continually increment and process checkpoints as new
- source data is ingested.
+ The {transform} performs a composite aggregation that paginates through all the
+ data defined by the source index query. The output of the aggregation is stored
+ in a destination index. Each time the {transform} queries the source index, it
+ creates a _checkpoint_. You can decide whether you want the {transform} to run
+ once (batch {transform}) or continuously ({ctransform}). A batch {transform} is a
+ single operation that has a single checkpoint. {ctransforms-cap} continually
+ increment and process checkpoints as new source data is ingested.
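
The pagination described above can be pictured with a small, self-contained
sketch. It is plain Python rather than a call to {es}: the helper names
`composite_page` and `run_checkpoint`, the `customer` field, and the page size
are all hypothetical, and one full pass over the buckets merely stands in for
a single checkpoint of a batch {transform}.

```python
from collections import Counter


def composite_page(docs, group_field, size, after=None):
    """Return one page of (key, doc_count) buckets and the key to resume from."""
    counts = Counter(doc[group_field] for doc in docs)
    # Buckets are visited in sorted key order, starting just past `after`.
    keys = sorted(k for k in counts if after is None or k > after)
    page = [(k, counts[k]) for k in keys[:size]]
    # A resume key is only returned while more buckets remain.
    after_key = page[-1][0] if len(keys) > size else None
    return page, after_key


def run_checkpoint(docs, group_field, size=2):
    """Page through every bucket once and collect the summarized output."""
    destination, after = [], None
    while True:
        page, after = composite_page(docs, group_field, size, after)
        destination.extend(page)
        if after is None:
            return destination


# Hypothetical source documents, one per event.
source = [{"customer": c} for c in ["ann", "bob", "ann", "cy", "dee", "bob", "ann"]]
result = run_checkpoint(source, "customer")
```

In this sketch, `result` holds one entry per customer with its document count,
assembled two buckets at a time.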

.Example

- Imagine that you run a webshop that sells clothes. Every order creates a document
- that contains a unique order ID, the name and the category of the ordered product,
- its price, the ordered quantity, the exact date of the order, and some customer
- information (name, gender, location, etc.). Your dataset contains all the transactions
- from last year.
+ Imagine that you run a webshop that sells clothes. Every order creates a
+ document that contains a unique order ID, the name and the category of the
+ ordered product, its price, the ordered quantity, the exact date of the order,
+ and some customer information (name, gender, location, etc.). Your dataset
+ contains all the transactions from last year.

If you want to check the sales in the different categories in your last fiscal
- year, define a {transform} that groups the data by the product
- categories (women's shoes, men's clothing, etc.) and the order date. Use the
- last year as the interval for the order date. Then add a sum aggregation on the
- ordered quantity. The result is a {dataframe} that shows the number of sold
+ year, define a {transform} that groups the data by the product categories
+ (women's shoes, men's clothing, etc.) and the order date. Use the last year as
+ the interval for the order date. Then add a sum aggregation on the ordered
+ quantity. The result is an entity-centric index that shows the number of sold
items in every product category in the last year.
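
To make the shape of that result concrete, the following is a minimal sketch
of the same grouping and sum in plain Python. It does not use any {es} API.
The sample orders, the field names (`category`, `quantity`, `order_date`), and
the function `pivot_by_category_and_year` are assumptions made for
illustration only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order documents; the field names are illustrative only.
orders = [
    {"order_id": 1, "category": "women's shoes", "quantity": 2, "order_date": date(2019, 3, 14)},
    {"order_id": 2, "category": "men's clothing", "quantity": 1, "order_date": date(2019, 5, 2)},
    {"order_id": 3, "category": "women's shoes", "quantity": 3, "order_date": date(2019, 11, 30)},
]


def pivot_by_category_and_year(docs):
    """Group by (category, year of order_date) and sum the ordered quantity."""
    totals = defaultdict(int)
    for doc in docs:
        key = (doc["category"], doc["order_date"].year)
        totals[key] += doc["quantity"]
    # One summary row per group, analogous to one document in the new index.
    return [
        {"category": category, "year": year, "total_quantity": total}
        for (category, year), total in sorted(totals.items())
    ]


summary = pivot_by_category_and_year(orders)
```

Each entry in `summary` corresponds to one product category and year, which is
the entity-centric view that the example describes.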

[role="screenshot"]
image::images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"]

IMPORTANT: The {transform} leaves your source index intact. It
- creates a new index that is dedicated to the {dataframe}.
+ creates a new index that is dedicated to the transformed data.