|
| 1 | +[[ml-dataframes]] |
| 2 | +=== {dataframes-cap} |
| 3 | + |
| 4 | +beta[] |
| 5 | + |
| 6 | +A _{dataframe}_ is a transformation of a dataset by certain rules. You can think |
| 7 | +of it like a spreadsheet or a data table that makes your data ready to be analyzed |
| 8 | +and organized. |
| 9 | + |
| 10 | +Many {es} datasets are organized as a stream of events: each event is an individual |
| 11 | +document, for example a single item purchase. {dataframe-transforms-cap} enable |
| 12 | +you to summarize this data, bringing it into an organized, more analysis-friendly |
| 13 | +format. For example, you can summarize all the purchases of a single customer (see |
| 14 | +the example below). |
| 15 | + |
| 16 | +The {dataframe} feature enables you to define a _pivot_, which is a set of features |
| 17 | +that transform the dataset into a different, more digestible format. Pivoting |
| 18 | +results in a summary of your dataset (which is the {dataframe} itself). |
| 19 | + |
| 20 | +Defining a pivot consists of two main parts. First, you select one or more fields |
| 21 | +that your dataset will be grouped by. Primarily, you can select categorical |
| 22 | +fields (terms) for grouping. You can also select numerical fields; in this case, |
| 23 | +the field values are bucketed using an interval that you specify. |
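
As a rough illustration of the two kinds of group-by keys described above, the following Python sketch computes a bucket key for a categorical field and for a numerical field. It is not {es} code: the function names, the field names, and the sample documents are all hypothetical, and the sketch only mimics the idea of term grouping and interval bucketing.

```python
import math

def term_key(doc, field):
    """Categorical grouping: the field value itself is the bucket key."""
    return doc[field]

def interval_key(doc, field, interval):
    """Numerical grouping: round the value down to the start of its interval."""
    return math.floor(doc[field] / interval) * interval

# Hypothetical sample documents (not taken from a real index)
orders = [
    {"category": "shoes", "price": 59.0},
    {"category": "shoes", "price": 120.0},
    {"category": "clothing", "price": 35.5},
]

for order in orders:
    # Each document lands in one bucket per group-by field
    print(term_key(order, "category"), interval_key(order, "price", 50))
```

Documents that share the same combination of keys end up in the same group, which is what the aggregation step then summarizes.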
| 24 | + |
| 25 | +The second step is deciding how you want to aggregate the grouped data. When |
| 26 | +using aggregations, you are essentially asking questions about the dataset. There are |
| 27 | +different types of aggregations, each with its own purpose and output. To learn |
| 28 | +more about the supported aggregations and group-by fields, see |
| 29 | +{ref}/data-frame-transform-pivot.html[Pivot resources]. |
| 30 | + |
| 31 | +As an optional step, it's also possible to add a query to further limit the |
| 32 | +scope of the aggregation. |
| 33 | + |
| 34 | +IMPORTANT: In 7.2, you can build {dataframes} on top of a static dataset. |
| 35 | +When new data comes into the index, you have to perform the transformation again |
| 36 | +on the altered data. |
| 37 | + |
| 38 | +.Example |
| 39 | + |
| 40 | +Imagine that you run a webshop that sells clothes. Every order creates a |
| 41 | +document that contains a unique order ID, the name and the category of the |
| 42 | +ordered product, its price, the ordered quantity, the exact date of the order, |
| 43 | +and some customer information (name, gender, location, etc.). Your dataset |
| 44 | +contains all the transactions from last year. |
| 45 | + |
| 46 | +If you want to check the sales in the different categories in your last fiscal year, |
| 47 | +define a {dataframe} that is grouped by the product categories (women's shoes, men's |
| 48 | +clothing, etc.) and the order date with the interval of the last year, then set |
| 49 | +a sum aggregation on the ordered quantity. The result is a {dataframe} pivot that |
| 50 | +shows the number of items sold in every product category in the last year. |
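
To make the webshop example more concrete, here is a minimal Python sketch of the same pivot performed in memory: group by product category and by the year of the order date, then sum the ordered quantity. The field names (`category`, `order_date`, `quantity`), the sample orders, and the `pivot_quantity` helper are assumptions made for illustration; they do not come from a real index or from the {dataframe} API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order documents; the field names are assumptions
orders = [
    {"category": "women's shoes", "order_date": date(2018, 3, 14), "quantity": 2},
    {"category": "men's clothing", "order_date": date(2018, 7, 2), "quantity": 1},
    {"category": "women's shoes", "order_date": date(2018, 11, 30), "quantity": 3},
]

def pivot_quantity(docs):
    """Group by (category, year of order) and sum the ordered quantity."""
    totals = defaultdict(int)
    for doc in docs:
        key = (doc["category"], doc["order_date"].year)
        totals[key] += doc["quantity"]
    return dict(totals)

summary = pivot_quantity(orders)
for (category, year), quantity in sorted(summary.items()):
    print(category, year, quantity)
```

Each key in `summary` plays the role of one row of the pivot, and the summed quantity is the aggregated value for that row.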
| 51 | + |
| 52 | +[role="screenshot"] |
| 53 | +image::ml/images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"] |
| 54 | + |
| 55 | +IMPORTANT: Creating a {dataframe} leaves your source index intact. A new index |
| 56 | +dedicated to the {dataframe} is created. |
| 57 | + |
| 58 | +TIP: Using {dataframes} does not require {dfeeds}. |