[Rollup] Only allow aggregating on multiples of configured interval (#32052)
We need to limit the search request aggregations to whole multiples
of the configured interval for both histogram and date_histogram.
Otherwise, the aggregation buckets won't line up with the rolled-up buckets
and the results will be incorrect.
For histogram, the validation is very simple: the request interval must be
>= the configured interval and a whole multiple of it.
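
The histogram rule can be sketched in a few lines. This is a minimal illustration, not the code from this commit: the function name `validate_histogram_interval` and the use of `ValueError` are assumptions made for the example.

```python
def validate_histogram_interval(request_interval: int, config_interval: int) -> None:
    """Raise ValueError unless the request is a whole multiple of the config.

    Hypothetical helper: the name and error type are illustrative only.
    """
    if request_interval <= 0 or config_interval <= 0:
        raise ValueError("intervals must be positive")
    # The request must be at least as coarse as the rolled-up buckets.
    if request_interval < config_interval:
        raise ValueError("request interval is smaller than the configured interval")
    # The request must also divide evenly, so every request bucket
    # covers a whole number of rolled-up buckets.
    if request_interval % config_interval != 0:
        raise ValueError("request interval is not a multiple of the configured interval")


validate_histogram_interval(10, 5)  # accepted: 10 is 2 x 5
```

With a configured interval of 5, a request of 12 is rejected because a 12-wide bucket would cut through a rolled-up bucket.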
Dates are trickier.
- If both request and config are fixed intervals, we can convert to millis
and treat them just like the histogram.
- If both are calendar, we make sure the request is >= the config with
a static lookup map that ranks the calendar values relatively. All
calendar units are "singles", so they are evenly divisible already.
- We disallow any other combination (one fixed, one calendar, etc).
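
The three date rules above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this commit: `CALENDAR_RANK`, `FIXED_UNIT_MILLIS`, `fixed_millis`, and `validate_date_interval` are hypothetical names, and the calendar/fixed split follows the table added to the docs in this change (single `1m`, `1h`, `1d`, `1w`, `1M`, `1q`, `1y` are calendar; everything else is fixed).

```python
import re

# Assumed ranking of calendar units, smallest to largest (hypothetical lookup map).
CALENDAR_RANK = {"1m": 0, "1h": 1, "1d": 2, "1w": 3, "1M": 4, "1q": 5, "1y": 6}

# Units that have a fixed length in milliseconds.
FIXED_UNIT_MILLIS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def fixed_millis(interval: str) -> int:
    """Convert a fixed interval such as "60m" to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", interval)
    if match is None:
        raise ValueError(f"not a fixed interval: {interval!r}")
    return int(match.group(1)) * FIXED_UNIT_MILLIS[match.group(2)]


def validate_date_interval(request: str, config: str) -> None:
    """Raise ValueError unless the request interval is compatible with the config."""
    request_is_calendar = request in CALENDAR_RANK
    config_is_calendar = config in CALENDAR_RANK
    if request_is_calendar and config_is_calendar:
        # Both calendar: only the relative rank matters, since single units divide evenly.
        if CALENDAR_RANK[request] < CALENDAR_RANK[config]:
            raise ValueError("request interval is smaller than the configured interval")
        return
    if request_is_calendar or config_is_calendar:
        # One fixed, one calendar: disallowed.
        raise ValueError("cannot mix calendar and fixed intervals")
    # Both fixed: same check as the numeric histogram, in milliseconds.
    request_ms, config_ms = fixed_millis(request), fixed_millis(config)
    if request_ms < config_ms or request_ms % config_ms != 0:
        raise ValueError("request interval is not a multiple of the configured interval")


validate_date_interval("24h", "60m")  # accepted: both fixed, 24h is 24 x 60m
validate_date_interval("1d", "1h")    # accepted: both calendar, day ranks above hour
```

In this sketch, a job configured with `60m` rejects a `1h` request because `1h` is a calendar unit, which is why the docs below switch their examples from `1h` to `60m`.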
x-pack/docs/en/rest-api/rollup/rollup-job-config.asciidoc (+45 -5)
@@ -23,7 +23,7 @@ PUT _xpack/rollup/job/sensor
 "groups" : {
 "date_histogram": {
 "field": "timestamp",
-"interval": "1h",
+"interval": "60m",
 "delay": "7d"
 },
 "terms": {
@@ -99,7 +99,7 @@ fields will then be available later for aggregating into buckets. For example,
 "groups" : {
 "date_histogram": {
 "field": "timestamp",
-"interval": "1h",
+"interval": "60m",
 "delay": "7d"
 },
 "terms": {
@@ -133,9 +133,9 @@ The `date_histogram` group has several parameters:
 The date field that is to be rolled up.
 
 `interval` (required)::
-The interval of time buckets to be generated when rolling up. E.g. `"1h"` will produce hourly rollups. This follows standard time formatting
-syntax as used elsewhere in Elasticsearch. The `interval` defines the _minimum_ interval that can be aggregated only. If hourly (`"1h"`)
-intervals are configured, <<rollup-search,Rollup Search>> can execute aggregations with 1hr or greater (weekly, monthly, etc) intervals.
+The interval of time buckets to be generated when rolling up. E.g. `"60m"` will produce 60 minute (hourly) rollups. This follows standard time formatting
+syntax as used elsewhere in Elasticsearch. The `interval` defines the _minimum_ interval that can be aggregated only. If hourly (`"60m"`)
+intervals are configured, <<rollup-search,Rollup Search>> can execute aggregations with 60m or greater (weekly, monthly, etc) intervals.
 So define the interval as the smallest unit that you wish to later query.
 
 Note: smaller, more granular intervals take up proportionally more space.
@@ -154,6 +154,46 @@ The `date_histogram` group has several parameters:
 to be stored with a specific timezone. By default, rollup documents are stored in `UTC`, but this can be changed with the `time_zone`
 parameter.
 
+.Calendar vs Fixed time intervals
+**********************************
+Elasticsearch understands both "calendar" and "fixed" time intervals. Fixed time intervals are fairly easy to understand;
+`"60s"` means sixty seconds. But what does `"1M` mean? One month of time depends on which month we are talking about,
+some months are longer or shorter than others. This is an example of "calendar" time, and the duration of that unit
+depends on context. Calendar units are also affected by leap-seconds, leap-years, etc.
+
+This is important because the buckets generated by Rollup will be in either calendar or fixed intervals, and will limit
+how you can query them later (see <<rollup-search-limitations-intervals, Requests must be multiples of the config>>.
+
+We recommend sticking with "fixed" time intervals, since they are easier to understand and are more flexible at query
+time. It will introduce some drift in your data during leap-events, and you will have to think about months in a fixed
+quantity (30 days) instead of the actual calendar length... but it is often easier than dealing with calendar units
+at query time.
+
+Multiples of units are always "fixed" (e.g. `"2h"` is always the fixed quantity `7200` seconds. Single units can be
+fixed or calendar depending on the unit:
+
+[options="header"]
+|=======
+|Unit |Calendar |Fixed
+|millisecond |NA |`1ms`, `10ms`, etc
+|second |NA |`1s`, `10s`, etc
+|minute |`1m` |`2m`, `10m`, etc
+|hour |`1h` |`2h`, `10h`, etc
+|day |`1d` |`2d`, `10d`, etc
+|week |`1w` |NA
+|month |`1M` |NA
+|quarter |`1q` |NA
+|year |`1y` |NA
+|=======
+
+For some units where there are both fixed and calendar, you may need to express the quantity in terms of the next
+smaller unit. For example, if you want a fixed day (not a calendar day), you should specify `24h` instead of `1d`.
+Similarly, if you want fixed hours, specify `60m` instead of `1h`. This is because the single quantity entails
+calendar time, and limits you to querying by calendar time in the future.
+
+
+**********************************
+
 ===== Terms
 
 The `terms` group can be used on `keyword` or numeric fields, to allow bucketing via the `terms` aggregation at a later point. The `terms`