@@ -15,17 +15,28 @@ TIP: If you use a terms aggregation and the cardinality of a term is high, the
aggregation might not be effective and you might want to just use the default
search and scroll behavior.

+ [discrete]
+ [[aggs-limits-dfeeds]]
+ ==== Requirements and limitations
+
There are some limitations to using aggregations in {dfeeds}. Your aggregation
must include a `date_histogram` aggregation, which in turn must contain a `max`
aggregation on the time field. This requirement ensures that the aggregated data
is a time series and the timestamp of each bucket is the time of the last record
in the bucket.

+ IMPORTANT: The name of the aggregation and the name of the field that the
+ aggregation operates on must match; otherwise, the aggregation does not work.
+ For example, if you use a `max` aggregation on a time field called
+ `responsetime`, the name of the aggregation must also be `responsetime`.
+
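+ For instance, the following minimal sketch (illustrative, with a hypothetical
+ `buckets` aggregation name and interval) satisfies these requirements: a
+ `date_histogram` contains a `max` aggregation that is named `time` and
+ operates on the `time` field.
+
+ [source,js]
+ ----------------------------------
+ "aggregations": {
+   "buckets": {
+     "date_histogram": {"field": "time", "fixed_interval": "5m"},
+     "aggregations": {
+       "time": {"max": {"field": "time"}}
+     }
+   }
+ }
+ ----------------------------------
+ // NOTCONSOLE
+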
You must also consider the interval of the date histogram aggregation carefully.
The bucket span of your {anomaly-job} must be divisible by the value of the
`calendar_interval` or `fixed_interval` in your aggregation (with no remainder).
If you specify a `frequency` for your {dfeed}, it must also be divisible by this
- interval.
+ interval. {anomaly-jobs-cap} cannot use date histograms with an interval
+ measured in months because the length of the month is not fixed. {dfeeds-cap}
+ tolerate weeks or smaller units.
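+
+ For example (illustrative values): a job whose bucket span is `15m` can use an
+ aggregation with a `fixed_interval` of `5m` (15 / 5 = 3, no remainder) but not
+ one of `10m`, because 15 / 10 leaves a remainder.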

TIP: As a rule of thumb, if your detectors use <<ml-metric-functions,metric>> or
<<ml-sum-functions,sum>> analytical functions, set the date histogram
@@ -34,6 +45,11 @@ finer, more granular time buckets, which are ideal for this type of analysis. If
your detectors use <<ml-count-functions,count>> or <<ml-rare-functions,rare>>
functions, set the interval to the same value as the bucket span.
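+
+ For example (illustrative values): with a `10m` bucket span, a detector that
+ uses a `mean` function pairs well with a finer interval such as `1m`, whereas
+ a detector that uses `count` should use a `10m` interval.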

+
+ [discrete]
+ [[aggs-include-jobs]]
+ ==== Including aggregations in {anomaly-jobs}
+
When you create or update an {anomaly-job}, you can include the names of
aggregations, for example:

@@ -85,13 +101,13 @@ PUT _ml/datafeeds/datafeed-farequote
        "time": { <1>
          "max": {"field": "time"}
        },
-       "airline": { <1>
+       "airline": { <2>
          "terms": {
            "field": "airline",
            "size": 100
          },
          "aggregations": {
-           "responsetime": { <1>
+           "responsetime": { <3>
              "avg": {
                "field": "responsetime"
              }
@@ -107,15 +123,23 @@ PUT _ml/datafeeds/datafeed-farequote

<1> In this example, the aggregations have names that match the fields that they
operate on. That is to say, the `max` aggregation is named `time` and its
- field is also `time`. The same is true for the aggregations with the names
- `airline` and `responsetime`.
+ field also needs to be `time`.
+ <2> Likewise, the `terms` aggregation is named `airline` and its field is also
+ named `airline`.
+ <3> Likewise, the `avg` aggregation is named `responsetime` and its field is
+ also named `responsetime`.
+
+ Your {dfeed} can contain multiple aggregations, but only the ones with names
+ that match values in the job configuration are fed to the job.
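+
+ For instance (an illustrative sketch reusing the field names from the example
+ above, with hypothetical values), a job whose detector analyzes `responsetime`
+ split by `airline` uses only the aggregations with those names:
+
+ [source,js]
+ ----------------------------------
+ "analysis_config": {
+   "bucket_span": "10m",
+   "detectors": [{
+     "function": "mean",
+     "field_name": "responsetime",
+     "by_field_name": "airline"
+   }],
+   "summary_count_field_name": "doc_count"
+ }
+ ----------------------------------
+ // NOTCONSOLE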

- IMPORTANT: Your {dfeed} can contain multiple aggregations, but only the ones
- with names that match values in the job configuration are fed to the job.

- {dfeeds-cap} support complex nested aggregations, this example uses the `derivative`
- pipeline aggregation to find the first order derivative of the counter
- `system.network.out.bytes` for each value of the field `beat.name`.
+ [discrete]
+ [[aggs-dfeeds]]
+ ==== Nested aggregations in {dfeeds}
+
+ {dfeeds-cap} support complex nested aggregations. This example uses the
+ `derivative` pipeline aggregation to find the first order derivative of the
+ counter `system.network.out.bytes` for each value of the field `beat.name`.

[source,js]
----------------------------------
@@ -154,6 +178,11 @@ pipeline aggregation to find the first order derivative of the counter
----------------------------------
// NOTCONSOLE

+
+ [discrete]
+ [[aggs-single-dfeeds]]
+ ==== Single bucket aggregations in {dfeeds}
+
{dfeeds-cap} not only support multi-bucket aggregations, but also single bucket
aggregations. The following shows two `filter` aggregations, each gathering the
number of unique entries for the `error` field.
@@ -201,6 +230,11 @@ number of unique entries for the `error` field.
----------------------------------
// NOTCONSOLE

+
+ [discrete]
+ [[aggs-define-dfeeds]]
+ ==== Defining aggregations in {dfeeds}
+
When you define an aggregation in a {dfeed}, it must have the following form:

[source,js]
@@ -239,7 +273,7 @@ When you define an aggregation in a {dfeed}, it must have the following form:
The top level aggregation must be either a
{ref}/search-aggregations-bucket.html[bucket aggregation] containing a single
sub-aggregation that is a `date_histogram` or the top level aggregation is the
- required `date_histogram`.  There must be exactly one `date_histogram`
+ required `date_histogram`. There must be exactly one `date_histogram`
aggregation. For more information, see
{ref}/search-aggregations-bucket-datehistogram-aggregation.html[Date histogram aggregation].

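+ Schematically (an illustrative sketch with hypothetical aggregation names, not
+ part of the shipped examples), the two accepted shapes are a top-level
+ `date_histogram`:
+
+ [source,js]
+ ----------------------------------
+ "aggregations": {
+   "buckets": {"date_histogram": {...}}
+ }
+ ----------------------------------
+ // NOTCONSOLE
+
+ or a single bucket aggregation wrapping it:
+
+ [source,js]
+ ----------------------------------
+ "aggregations": {
+   "by_host": {
+     "terms": {...},
+     "aggregations": {
+       "buckets": {"date_histogram": {...}}
+     }
+   }
+ }
+ ----------------------------------
+ // NOTCONSOLE
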
@@ -248,9 +282,9 @@ NOTE: The `time_zone` parameter in the date histogram aggregation must be set to

Each histogram bucket has a key, which is the bucket start time. This key cannot
be used for aggregations in {dfeeds}, however, because they need to know the
- time of the latest record within a bucket. Otherwise, when you restart a {dfeed},
- it continues from the start time of the histogram bucket and possibly fetches
- the same data twice. The max aggregation for the time field is therefore
+ time of the latest record within a bucket. Otherwise, when you restart a
+ {dfeed}, it continues from the start time of the histogram bucket and possibly
+ fetches the same data twice. The max aggregation for the time field is therefore
necessary to provide the time of the latest record within a bucket.

You can optionally specify a terms aggregation, which creates buckets for
@@ -280,8 +314,7 @@ GET .../_search {
By default, {es} limits the maximum number of terms returned to 10000. For high
cardinality fields, the query might not run. It might return errors related to
circuit breaking exceptions that indicate that the data is too large. In such
- cases, do not use aggregations in your {dfeed}. For more
- information, see
+ cases, do not use aggregations in your {dfeed}. For more information, see
{ref}/search-aggregations-bucket-terms-aggregation.html[Terms aggregation].

You can also optionally specify multiple sub-aggregations. The sub-aggregations
0 commit comments