[[rollup-put-job-api-desc]]
==== {api-description-title}

The {rollup-job} configuration contains all the details about how the job should
run, when it indexes documents, and what future queries will be able to execute
against the rollup index.

There are three main sections to the job configuration: the logistical details
about the job (cron schedule, etc), the fields that are used for grouping, and
what metrics to collect for each group.

Jobs are created in a `STOPPED` state. You can start them with the
<<rollup-start-job,start {rollup-jobs} API>>.

[[rollup-put-job-api-path-params]]
==== {api-path-parms-title}

`<job_id>`::
(Required, string) Identifier for the {rollup-job}. This can be any
alphanumeric string and uniquely identifies the data that is associated with
the {rollup-job}. The ID is persistent; it is stored with the rolled up data.
If you create a job, let it run for a while, then delete the job, the data
that the job rolled up is still associated with this job ID. You cannot
create a new job with the same ID since that could lead to problems with
mismatched job configurations.

[[rollup-put-job-api-request-body]]
==== {api-request-body-title}

`cron`::
(Required, string) A cron string which defines the intervals when the
{rollup-job} should be executed. When the interval triggers, the indexer
attempts to roll up the data in the index pattern. The cron pattern is
unrelated to the time interval of the data being rolled up. For example, you
may wish to create hourly rollups of your documents but only run the indexer
on a daily basis at midnight, as defined by the cron. The cron pattern is
defined just like a {watcher} cron schedule.

[[rollup-groups-config]]
`groups`::
(Required, object) Defines the grouping fields and aggregations for this
{rollup-job}. These fields will then be available later for aggregating into
buckets.
+
--
These aggs and fields can be used in any combination. Think of the `groups`
configuration as defining a set of tools that can later be used in aggregations
to partition the data. Unlike raw data, we have to think ahead about which
fields and aggregations might be used. Rollups provide enough flexibility that
you simply need to determine _which_ fields are needed, not _in what order_
they are needed.

There are three types of groupings currently available:
--

`date_histogram`:::
(Required, object) A date histogram group aggregates a `date` field into
time-based buckets. This group is *mandatory*; you currently cannot roll up
documents without a timestamp and a `date_histogram` group. The
`date_histogram` group has several parameters:

`field`::::
(Required, string) The date field that is to be rolled up.

`calendar_interval` or `fixed_interval`::::
(Required, <<time-units,time units>>) The interval of time buckets to be
generated when rolling up. For example, `60m` produces 60 minute (hourly)
rollups. This follows standard time formatting syntax as used elsewhere in
{es}. The interval defines only the _minimum_ interval that can be aggregated.
If hourly (`60m`) intervals are configured, <<rollup-search,rollup search>>
can execute aggregations with 60m or greater (weekly, monthly, etc) intervals.
So define the interval as the smallest unit that you wish to later query. For
more information about the difference between calendar and fixed time
intervals, see <<rollup-understanding-group-intervals>>.
+
--
NOTE: Smaller, more granular intervals take up proportionally more space.

--
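To make the "minimum interval" rule concrete, here is a small sketch of when a stored fixed interval can serve a query interval. This is an illustration only, not Elasticsearch's actual validation logic, and the helper name and values are hypothetical:

```python
# Sketch: a rollup stored at a fixed interval can only serve aggregations
# at that interval or a whole multiple of it (hypothetical helper, not
# Elasticsearch code).
CONFIGURED_INTERVAL_MINUTES = 60  # the job's `fixed_interval` of `60m`

def can_serve(query_interval_minutes: int) -> bool:
    """True if the query interval is a whole multiple of the stored one."""
    return (query_interval_minutes >= CONFIGURED_INTERVAL_MINUTES
            and query_interval_minutes % CONFIGURED_INTERVAL_MINUTES == 0)

print(can_serve(60))    # True: hourly buckets are stored directly
print(can_serve(1440))  # True: a day is 24 hourly buckets combined
print(can_serve(30))    # False: finer than the stored data
```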

`delay`::::
(Optional, <<time-units,time units>>) How long to wait before rolling up new
documents. By default, the indexer attempts to roll up all data that is
available. However, it is not uncommon for data to arrive out of order,
sometimes even a few days late. The indexer is unable to deal with data that
arrives after a time-span has been rolled up. That is to say, there is no
provision to update already-existing rollups.
+
--
Instead, you should specify a `delay` that matches the longest period of time
you expect out-of-order data to arrive. For example, a `delay` of `1d`
instructs the indexer to roll up documents up to `now - 1d`, which provides
a day of buffer time for out-of-order documents to arrive.
--
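The `now - 1d` arithmetic can be sketched as follows; this is a hypothetical illustration of the semantics described above, not indexer code, and the timestamps are made up:

```python
from datetime import datetime, timedelta, timezone

delay = timedelta(days=1)  # the `delay` of `1d` from the example above
now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)  # pretend "now"

# The indexer only rolls up documents with timestamps at or before now - delay.
upper_bound = now - delay

doc_on_time = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
doc_too_recent = datetime(2024, 5, 2, 8, 0, tzinfo=timezone.utc)

print(doc_on_time <= upper_bound)     # True: old enough to roll up
print(doc_too_recent <= upper_bound)  # False: still inside the buffer window
```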

`time_zone`::::
(Optional, string) Defines the time zone in which the rollup documents are
stored. Unlike raw data, which can shift time zones on the fly, rolled
documents have to be stored with a specific time zone. By default, rollup
documents are stored in `UTC`.

`terms`:::
(Optional, object) The terms group can be used on `keyword` or numeric fields
to allow bucketing via the `terms` aggregation at a later point. The indexer
enumerates and stores _all_ values of a field for each time-period. This can
be potentially costly for high-cardinality groups such as IP addresses,
especially if the time-bucket is particularly sparse.
+
--
TIP: While it is unlikely that a rollup will ever be larger in size than the
raw data, defining `terms` groups on multiple high-cardinality fields can
effectively reduce the compression of a rollup to a large extent. You should
be judicious about which high-cardinality fields are included for that reason.

The `terms` group has a single parameter:
--

`fields`::::
(Required, array) The set of fields that you wish to collect terms for. This
array can contain fields that are both `keyword` and numerics. Order does not
matter.

`histogram`:::
(Optional, object) The histogram group aggregates one or more numeric fields
into numeric histogram intervals.
+
--
The `histogram` group has two parameters:
--

`fields`::::
(Required, array) The set of fields that you wish to build histograms for. All
fields specified must be some kind of numeric. Order does not matter.

`interval`::::
(Required, integer) The interval of histogram buckets to be generated when
rolling up. For example, a value of `5` creates buckets that are five units
wide (`0-5`, `5-10`, etc). Note that only one interval can be specified in the
`histogram` group, meaning that all fields being grouped via the histogram
must share the same interval.
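The bucketing rule above amounts to flooring each value to a multiple of the interval, which can be sketched as follows (illustration only, not Elasticsearch's implementation):

```python
def histogram_bucket(value: float, interval: int) -> int:
    """Floor a numeric value to the lower bound of its histogram bucket."""
    return int(value // interval) * interval

# With an interval of 5, buckets are 0-5, 5-10, 10-15, ...
print(histogram_bucket(3, 5))    # 0
print(histogram_bucket(7.2, 5))  # 5
print(histogram_bucket(10, 5))   # 10
```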

`index_pattern`::
(Required, string) The index or index pattern to roll up. Supports
wildcard-style patterns (`logstash-*`). The job attempts to roll up the
entire index or index pattern.
+
--
NOTE: The `index_pattern` cannot be a pattern that would also match the
destination `rollup_index`. For example, the pattern `foo-*` would match the
rollup index `foo-rollup`. This situation would cause problems because the
{rollup-job} would attempt to roll up its own data at runtime. If you attempt
to configure a pattern that matches the `rollup_index`, an exception occurs to
prevent this behavior.

--
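The overlap rule in the NOTE can be sketched with a glob match. The `check_job` helper is hypothetical; Elasticsearch performs its own validation internally:

```python
from fnmatch import fnmatchcase

def check_job(index_pattern: str, rollup_index: str) -> None:
    """Reject a job whose source pattern would match its own rollup index."""
    if fnmatchcase(rollup_index, index_pattern):
        raise ValueError(
            f"index_pattern {index_pattern!r} matches rollup_index "
            f"{rollup_index!r}; the job would roll up its own output")

check_job("sensor-*", "sensor_rollup")  # fine: no overlap
try:
    check_job("foo-*", "foo-rollup")    # overlaps, so it is rejected
except ValueError as err:
    print("rejected:", err)
```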

[[rollup-metrics-config]]
`metrics`::
(Optional, object) Defines the metrics to collect for each grouping tuple. By
default, only the doc_counts are collected for each group. To make rollup
useful, you will often add metrics like averages, mins, maxes, etc. Metrics
are defined on a per-field basis and for each field you configure which
metrics should be collected.
+
--
The `metrics` configuration accepts an array of objects, where each object has
two parameters:
--

`field`:::
(Required, string) The field to collect metrics for. This must be a numeric
of some kind.

`metrics`:::
(Required, array) An array of metrics to collect for the field. At least one
metric must be configured. Acceptable metrics are `min`, `max`, `sum`, `avg`,
and `value_count`.
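For one field within one bucket, the five supported metrics reduce to familiar aggregates. A sketch with made-up `temperature` readings (illustration only, not Elasticsearch code):

```python
values = [20.1, 19.8, 21.4, 20.5]  # hypothetical `temperature` readings

# What each supported metric stores for this field in this bucket.
rolled_up = {
    "min": min(values),
    "max": max(values),
    "sum": sum(values),
    "avg": sum(values) / len(values),
    "value_count": len(values),
}
print(rolled_up["value_count"])  # 4
print(rolled_up["max"])          # 21.4
```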

`page_size`::
(Required, integer) The number of bucket results that are processed on each
iteration of the rollup indexer. A larger value tends to execute faster, but
requires more memory during processing. This value has no effect on how the
data is rolled up; it is merely used for tweaking the speed or memory cost of
the indexer.

`rollup_index`::
(Required, string) The index that contains the rollup results. The index can
be shared with other {rollup-jobs}. The data is stored so that it doesn't
interfere with unrelated jobs.

[[rollup-put-job-api-example]]
==== {api-example-title}

The following example creates a {rollup-job} named `sensor`, targeting the
`sensor-*` index pattern:

[source,console]
--------------------------------------------------
PUT _rollup/job/sensor
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": { <1>
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": ["node"]
    }
  },
  "metrics": [ <2>
    {
      "field": "temperature",
      "metrics": ["min", "max", "sum"]
    },
    {
      "field": "voltage",
      "metrics": ["avg"]
    }
  ]
}
--------------------------------------------------
// TEST[setup:sensor_index]
<1> This configuration enables date histograms to be used on the `timestamp`
field and `terms` aggregations to be used on the `node` field.
<2> This configuration defines metrics over two fields: `temperature` and
`voltage`. For the `temperature` field, we are collecting the min, max, and
sum of the temperature. For `voltage`, we are collecting the average.

When the job is created, you receive the following results:

[source,console-result]
----
{
  "acknowledged": true
}
----