Commit a4efab6

[DOCS] Merge rollup config details into API (#49412)
1 parent ddf5c0a commit a4efab6

8 files changed

+267
-315
lines changed

docs/java-rest/high-level/rollup/put_job.asciidoc

+3-3
Original file line number | Diff line number | Diff line change
@@ -21,7 +21,7 @@ include-tagged::{doc-tests}/RollupDocumentationIT.java[x-pack-rollup-put-rollup-
2121
==== Rollup Job Configuration
2222

2323
The `RollupJobConfig` object contains all the details about the rollup job
24-
configuration. See {ref}/rollup-job-config.html[Rollup configuration] to learn more
24+
configuration. See {ref}/rollup-put-job.html[create rollup job API] to learn more
2525
about the various configuration settings.
2626

2727
A `RollupJobConfig` requires the following arguments:
@@ -45,7 +45,7 @@ include-tagged::{doc-tests}/RollupDocumentationIT.java[x-pack-rollup-put-rollup-
4545

4646
The grouping configuration of the Rollup job is defined in the `RollupJobConfig`
4747
using a `GroupConfig` instance. `GroupConfig` reflects all the configuration
48-
settings that can be defined using the REST API. See {ref}/rollup-job-config.html#rollup-groups-config[Grouping Config]
48+
settings that can be defined using the REST API. See {ref}/rollup-put-job.html#rollup-groups-config[Grouping config]
4949
to learn more about these settings.
5050

5151
Using the REST API, we could define this grouping configuration:
@@ -89,7 +89,7 @@ include-tagged::{doc-tests}/RollupDocumentationIT.java[x-pack-rollup-put-rollup-
8989
After defining which groups should be generated for the data, you next configure
9090
which metrics should be collected. The list of metrics is defined in the `RollupJobConfig`
9191
using a `List<MetricConfig>` instance. `MetricConfig` reflects all the configuration
92-
settings that can be defined using the REST API. See {ref}/rollup-job-config.html#rollup-metrics-config[Metrics Config]
92+
settings that can be defined using the REST API. See {ref}/rollup-put-job.html#rollup-metrics-config[Metrics config]
9393
to learn more about these settings.
9494

9595
Using the REST API, we could define this metrics configuration:

docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc

+11-6
@@ -1,5 +1,5 @@
11
[[search-aggregations-bucket-datehistogram-aggregation]]
2-
=== Date Histogram Aggregation
2+
=== Date histogram aggregation
33

44
This multi-bucket aggregation is similar to the normal
55
<<search-aggregations-bucket-histogram-aggregation,histogram>>, but it can
@@ -10,7 +10,8 @@ that here the interval can be specified using date/time expressions. Time-based
1010
data requires special support because time-based intervals are not always a
1111
fixed length.
1212

13-
==== Calendar and Fixed intervals
13+
[[calendar_and_fixed_intervals]]
14+
==== Calendar and fixed intervals
1415

1516
When configuring a date histogram aggregation, the interval can be specified
1617
in two manners: calendar-aware time intervals, and fixed time intervals.
@@ -42,7 +43,8 @@ are clear to the user immediately and there is no ambiguity. The old `interval`
4243
will be removed in the future.
4344
==================================
4445

45-
===== Calendar Intervals
46+
[[calendar_intervals]]
47+
===== Calendar intervals
4648

4749
Calendar-aware intervals are configured with the `calendar_interval` parameter.
4850
Calendar intervals can only be specified in "singular" quantities of the unit
@@ -100,7 +102,8 @@ One year (1y) is the interval between the start day of the month and time of
100102
day and the same day of the month and time of day the following year in the
101103
specified timezone, so that the date and time are the same at the start and end. +
102104

103-
===== Calendar Interval Examples
105+
[[calendar_interval_examples]]
106+
===== Calendar interval examples
104107
As an example, here is an aggregation requesting bucket intervals of a month in calendar time:
105108

106109
[source,console]
@@ -157,7 +160,8 @@ POST /sales/_search?size=0
157160
--------------------------------------------------
158161
// NOTCONSOLE
159162

160-
===== Fixed Intervals
163+
[[fixed_intervals]]
164+
===== Fixed intervals
161165

162166
Fixed intervals are configured with the `fixed_interval` parameter.
163167

@@ -192,7 +196,8 @@ All days begin at the earliest possible time, which is usually 00:00:00
192196

193197
Defined as 24 hours (86,400,000 milliseconds)
194198

195-
===== Fixed Interval Examples
199+
[[fixed_interval_examples]]
200+
===== Fixed interval examples
196201

197202
If we try to recreate the "month" `calendar_interval` from earlier, we can approximate that with
198203
30 fixed days:

docs/reference/redirects.asciidoc

+6-1
@@ -1040,4 +1040,9 @@ See <<ad-realm-configuration>>.
10401040
[role="exclude",id="how-security-works"]
10411041
=== How security works
10421042

1043-
See <<elasticsearch-security>>.
1043+
See <<elasticsearch-security>>.
1044+
1045+
[role="exclude",id="rollup-job-config"]
1046+
=== Rollup job configuration
1047+
1048+
See <<rollup-put-job-api-request-body>>.

docs/reference/rollup/apis/put-job.asciidoc

+170-16
@@ -26,49 +26,198 @@ experimental[]
2626
[[rollup-put-job-api-desc]]
2727
==== {api-description-title}
2828

29+
The {rollup-job} configuration contains all the details about how the job should
30+
run, when it indexes documents, and what future queries will be able to execute
31+
against the rollup index.
32+
33+
There are three main sections to the job configuration: the logistical details
34+
about the job (cron schedule, etc), the fields that are used for grouping, and
35+
what metrics to collect for each group.
36+
2937
Jobs are created in a `STOPPED` state. You can start them with the
3038
<<rollup-start-job,start {rollup-jobs} API>>.
3139

3240
[[rollup-put-job-api-path-params]]
3341
==== {api-path-parms-title}
3442

3543
`<job_id>`::
36-
(Required, string) Identifier for the {rollup-job}.
44+
(Required, string) Identifier for the {rollup-job}. This can be any
45+
alphanumeric string and uniquely identifies the data that is associated with
46+
the {rollup-job}. The ID is persistent; it is stored with the rolled up data.
47+
If you create a job, let it run for a while, then delete the job, the data
48+
that the job rolled up is still associated with this job ID. You cannot
49+
create a new job with the same ID since that could lead to problems with
50+
mismatched job configurations.
3751

3852
[[rollup-put-job-api-request-body]]
3953
==== {api-request-body-title}
4054

4155
`cron`::
42-
(Required, string) A cron string which defines when the {rollup-job} should be executed.
56+
(Required, string) A cron string which defines the intervals when the
57+
{rollup-job} should be executed. When the interval triggers, the indexer
58+
attempts to roll up the data in the index pattern. The cron pattern is
59+
unrelated to the time interval of the data being rolled up. For example, you
60+
may wish to create hourly rollups of your document but to only run the indexer
61+
on a daily basis at midnight, as defined by the cron. The cron pattern is
62+
defined just like a {watcher} cron schedule.
4363

64+
[[rollup-groups-config]]
4465
`groups`::
45-
(Required, object) Defines the grouping fields that are defined for this
46-
{rollup-job}. See <<rollup-job-config,{rollup-job} config>>.
66+
(Required, object) Defines the grouping fields and aggregations that are
67+
defined for this {rollup-job}. These fields will then be available later for
68+
aggregating into buckets.
69+
+
70+
--
71+
These aggs and fields can be used in any combination. Think of the `groups`
72+
configuration as defining a set of tools that can later be used in aggregations
73+
to partition the data. Unlike raw data, we have to think ahead to which fields
74+
and aggregations might be used. Rollups provide enough flexibility that you
75+
simply need to determine _which_ fields are needed, not _in what order_ they are
76+
needed.
77+
78+
There are three types of groupings currently available:
79+
--
80+
81+
`date_histogram`:::
82+
(Required, object) A date histogram group aggregates a `date` field into
83+
time-based buckets. This group is *mandatory*; you currently cannot roll up
84+
documents without a timestamp and a `date_histogram` group. The
85+
`date_histogram` group has several parameters:
86+
87+
`field`::::
88+
(Required, string) The date field that is to be rolled up.
89+
90+
`calendar_interval` or `fixed_interval`::::
91+
(Required, <<time-units,time units>>) The interval of time buckets to be
92+
generated when rolling up. For example, `60m` produces 60 minute (hourly)
93+
rollups. This follows standard time formatting syntax as used elsewhere in
94+
{es}. The interval defines only the _minimum_ interval that can be aggregated.
95+
If hourly (`60m`) intervals are configured, <<rollup-search,rollup search>>
96+
can execute aggregations with 60m or greater (weekly, monthly, etc) intervals.
97+
So define the interval as the smallest unit that you wish to later query. For
98+
more information about the difference between calendar and fixed time
99+
intervals, see <<rollup-understanding-group-intervals>>.
100+
+
101+
--
102+
NOTE: Smaller, more granular intervals take up proportionally more space.
103+
104+
--
105+
106+
`delay`::::
107+
(Optional, <<time-units,time units>>) How long to wait before rolling up new
108+
documents. By default, the indexer attempts to roll up all data that is
109+
available. However, it is not uncommon for data to arrive out of order,
110+
sometimes even a few days late. The indexer is unable to deal with data that
111+
arrives after a time-span has been rolled up. That is to say, there is no
112+
provision to update already-existing rollups.
113+
+
114+
--
115+
Instead, you should specify a `delay` that matches the longest period of time
116+
you expect out-of-order data to arrive. For example, a `delay` of `1d`
117+
instructs the indexer to roll up documents up to `now - 1d`, which provides
118+
a day of buffer time for out-of-order documents to arrive.
119+
--
120+
121+
`time_zone`::::
122+
(Optional, string) Defines the time zone in which the rollup documents are stored.
123+
Unlike raw data, which can shift timezones on the fly, rolled documents have
124+
to be stored with a specific timezone. By default, rollup documents are stored
125+
in `UTC`.
126+
127+
`terms`:::
128+
(Optional, object) The terms group can be used on `keyword` or numeric fields
129+
to allow bucketing via the `terms` aggregation at a later point. The indexer
130+
enumerates and stores _all_ values of a field for each time-period. This can
131+
be potentially costly for high-cardinality groups such as IP addresses,
132+
especially if the time-bucket is particularly sparse.
133+
+
134+
--
135+
TIP: While it is unlikely that a rollup will ever be larger in size than the raw
136+
data, defining `terms` groups on multiple high-cardinality fields can
137+
effectively reduce the compression of a rollup to a large extent. You should be
138+
judicious about which high-cardinality fields are included for that reason.
139+
140+
The `terms` group has a single parameter:
141+
--
142+
143+
`fields`::::
144+
(Required, string) The set of fields that you wish to collect terms for. This
145+
array can contain fields that are both `keyword` and numerics. Order does not
146+
matter.
147+
148+
`histogram`:::
149+
(Optional, object) The histogram group aggregates one or more numeric fields
150+
into numeric histogram intervals.
151+
+
152+
--
153+
The `histogram` group has two parameters:
154+
--
155+
156+
`fields`::::
157+
(Required, array) The set of fields that you wish to build histograms for. All fields
158+
specified must be some kind of numeric. Order does not matter.
159+
160+
`interval`::::
161+
(Required, integer) The interval of histogram buckets to be generated when
162+
rolling up. For example, a value of `5` creates buckets that are five units
163+
wide (`0-5`, `5-10`, etc). Note that only one interval can be specified in the
164+
`histogram` group, meaning that all fields being grouped via the histogram
165+
must share the same interval.
47166

48167
`index_pattern`::
49168
(Required, string) The index or index pattern to roll up. Supports
50-
wildcard-style patterns (`logstash-*`).
169+
wildcard-style patterns (`logstash-*`). The job will
170+
attempt to roll up the entire index or index pattern.
171+
+
172+
--
173+
NOTE: The `index_pattern` cannot be a pattern that would also match the
174+
destination `rollup_index`. For example, the pattern `foo-*` would match the
175+
rollup index `foo-rollup`. This situation would cause problems because the
176+
{rollup-job} would attempt to roll up its own data at runtime. If you attempt to
177+
configure a pattern that matches the `rollup_index`, an exception occurs to
178+
prevent this behavior.
179+
180+
--
51181

182+
[[rollup-metrics-config]]
52183
`metrics`::
53-
(Optional, object) Defines the metrics to collect for each grouping tuple. See
54-
<<rollup-job-config,{rollup-job} config>>.
184+
(Optional, object) Defines the metrics to collect for each grouping tuple.
185+
By default, only the doc_counts are collected for each group. To make rollup
186+
useful, you will often add metrics like averages, mins, maxes, etc. Metrics
187+
are defined on a per-field basis and for each field you configure which metric
188+
should be collected.
189+
+
190+
--
191+
The `metrics` configuration accepts an array of objects, where each object has
192+
two parameters:
193+
--
194+
195+
`field`:::
196+
(Required, string) The field to collect metrics for. This must be a numeric
197+
of some kind.
198+
199+
`metrics`:::
200+
(Required, array) An array of metrics to collect for the field. At least one
201+
metric must be configured. Acceptable metrics are `min`, `max`, `sum`, `avg`, and
202+
`value_count`.
55203

56204
`page_size`::
57205
(Required, integer) The number of bucket results that are processed on each
58206
iteration of the rollup indexer. A larger value tends to execute faster, but
59-
requires more memory during processing.
207+
requires more memory during processing. This value has no effect on how the
208+
data is rolled up; it is merely used for tweaking the speed or memory cost of
209+
the indexer.
60210

61211
`rollup_index`::
62212
(Required, string) The index that contains the rollup results. The index can
63-
be shared with other {rollup-jobs}.
64-
65-
For more details about the job configuration, see <<rollup-job-config>>.
213+
be shared with other {rollup-jobs}. The data is stored so that it doesn't
214+
interfere with unrelated jobs.
66215

67216
[[rollup-put-job-api-example]]
68217
==== {api-example-title}
69218

70-
The following example creates a {rollup-job} named "sensor", targeting the
71-
"sensor-*" index pattern:
219+
The following example creates a {rollup-job} named `sensor`, targeting the
220+
`sensor-*` index pattern:
72221

73222
[source,console]
74223
--------------------------------------------------
@@ -78,7 +227,7 @@ PUT _rollup/job/sensor
78227
"rollup_index": "sensor_rollup",
79228
"cron": "*/30 * * * * ?",
80229
"page_size" :1000,
81-
"groups" : {
230+
"groups" : { <1>
82231
"date_histogram": {
83232
"field": "timestamp",
84233
"fixed_interval": "1h",
@@ -88,7 +237,7 @@ PUT _rollup/job/sensor
88237
"fields": ["node"]
89238
}
90239
},
91-
"metrics": [
240+
"metrics": [ <2>
92241
{
93242
"field": "temperature",
94243
"metrics": ["min", "max", "sum"]
@@ -101,6 +250,11 @@ PUT _rollup/job/sensor
101250
}
102251
--------------------------------------------------
103252
// TEST[setup:sensor_index]
253+
<1> This configuration enables date histograms to be used on the `timestamp`
254+
field and `terms` aggregations to be used on the `node` field.
255+
<2> This configuration defines metrics over two fields: `temperature` and
256+
`voltage`. For the `temperature` field, we are collecting the min, max, and
257+
sum of the temperature. For `voltage`, we are collecting the average.
104258

105259
When the job is created, you receive the following results:
106260

@@ -109,4 +263,4 @@ When the job is created, you receive the following results:
109263
{
110264
"acknowledged": true
111265
}
112-
----
266+
----
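
For readers who want to issue the request above from a script rather than the console, the following Python sketch builds the equivalent HTTP request with the standard library. It is an illustration only and is not part of this commit. The helper name `build_put_rollup_job_request` and the `http://localhost:9200` address are assumptions, and the body contains only the settings visible in the diff and its callouts; settings elided from the diff are left out.

```python
import json
from urllib.request import Request


def build_put_rollup_job_request(base_url, job_id, body):
    """Build (but do not send) the PUT request that creates a rollup job."""
    return Request(
        url=f"{base_url}/_rollup/job/{job_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Settings taken from the `sensor` example; anything not shown in the diff is omitted.
sensor_job = {
    "index_pattern": "sensor-*",
    "rollup_index": "sensor_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups": {
        "date_histogram": {"field": "timestamp", "fixed_interval": "1h"},
        "terms": {"fields": ["node"]},
    },
    "metrics": [
        {"field": "temperature", "metrics": ["min", "max", "sum"]},
        {"field": "voltage", "metrics": ["avg"]},
    ],
}

# Assumed local cluster address; adjust for a real deployment.
request = build_put_rollup_job_request("http://localhost:9200", "sensor", sensor_job)
```

Passing `request` to `urllib.request.urlopen` would send it. Against a cluster that accepts the job, the response body should be the `acknowledged` result shown above.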
