@@ -26,49 +26,198 @@ experimental[]
2626[[rollup-put-job-api-desc]]
2727==== {api-description-title}
2828
29+ The {rollup-job} configuration contains all the details about how the job should
30+ run, when it indexes documents, and what future queries will be able to execute
31+ against the rollup index.
32+
33+ There are three main sections to the job configuration: the logistical details
34+ about the job (cron schedule, etc), the fields that are used for grouping, and
35+ what metrics to collect for each group.
36+
2937Jobs are created in a `STOPPED` state. You can start them with the
3038<<rollup-start-job,start {rollup-jobs} API>>.
3139
3240[[rollup-put-job-api-path-params]]
3341==== {api-path-parms-title}
3442
3543`<job_id>`::
36- (Required, string) Identifier for the {rollup-job}.
44+ (Required, string) Identifier for the {rollup-job}. This can be any
45+ alphanumeric string and uniquely identifies the data that is associated with
46+ the {rollup-job}. The ID is persistent; it is stored with the rolled up data.
47+ If you create a job, let it run for a while, then delete the job, the data
48+ that the job rolled up is still associated with this job ID. You cannot
49+ create a new job with the same ID since that could lead to problems with
50+ mismatched job configurations.
3751
3852[[rollup-put-job-api-request-body]]
3953==== {api-request-body-title}
4054
4155`cron`::
42- (Required, string) A cron string which defines when the {rollup-job} should be executed.
56+ (Required, string) A cron string which defines the intervals when the
57+ {rollup-job} should be executed. When the interval triggers, the indexer
58+ attempts to roll up the data in the index pattern. The cron pattern is
59+ unrelated to the time interval of the data being rolled up. For example, you
60+ may wish to create hourly rollups of your documents but to only run the indexer
61+ on a daily basis at midnight, as defined by the cron. The cron pattern is
62+ defined just like a {watcher} cron schedule.
4363
64+ [[rollup-groups-config]]
4465`groups`::
45- (Required, object) Defines the grouping fields that are defined for this
46- {rollup-job}. See <<rollup-job-config,{rollup-job} config>>.
66+ (Required, object) Defines the grouping fields and aggregations that are
67+ defined for this {rollup-job}. These fields will then be available later for
68+ aggregating into buckets.
69+ +
70+ --
71+ These aggs and fields can be used in any combination. Think of the `groups`
72+ configuration as defining a set of tools that can later be used in aggregations
73+ to partition the data. Unlike raw data, we have to think ahead to which fields
74+ and aggregations might be used. Rollups provide enough flexibility that you
75+ simply need to determine _which_ fields are needed, not _in what order_ they are
76+ needed.
77+
78+ There are three types of groupings currently available:
79+ --
80+
81+ `date_histogram`:::
82+ (Required, object) A date histogram group aggregates a `date` field into
83+ time-based buckets. This group is *mandatory*; you currently cannot roll up
84+ documents without a timestamp and a `date_histogram` group. The
85+ `date_histogram` group has several parameters:
86+
87+ `field`::::
88+ (Required, string) The date field that is to be rolled up.
89+
90+ `calendar_interval` or `fixed_interval`::::
91+ (Required, <<time-units,time units>>) The interval of time buckets to be
92+ generated when rolling up. For example, `60m` produces 60 minute (hourly)
93+ rollups. This follows standard time formatting syntax as used elsewhere in
94+ {es}. The interval defines only the _minimum_ interval that can be aggregated.
95+ If hourly (`60m`) intervals are configured, <<rollup-search,rollup search>>
96+ can execute aggregations with 60m or greater (weekly, monthly, etc) intervals.
97+ So define the interval as the smallest unit that you wish to later query. For
98+ more information about the difference between calendar and fixed time
99+ intervals, see <<rollup-understanding-group-intervals>>.
100+ +
101+ --
102+ NOTE: Smaller, more granular intervals take up proportionally more space.
103+
104+ --
105+
106+ `delay`::::
107+ (Optional, <<time-units,time units>>) How long to wait before rolling up new
108+ documents. By default, the indexer attempts to roll up all data that is
109+ available. However, it is not uncommon for data to arrive out of order,
110+ sometimes even a few days late. The indexer is unable to deal with data that
111+ arrives after a time-span has been rolled up. That is to say, there is no
112+ provision to update already-existing rollups.
113+ +
114+ --
115+ Instead, you should specify a `delay` that matches the longest period of time
116+ you expect out-of-order data to arrive. For example, a `delay` of `1d`
117+ instructs the indexer to roll up documents up to `now - 1d`, which provides
118+ a day of buffer time for out-of-order documents to arrive.
119+ --
120+
121+ `time_zone`::::
122+ (Optional, string) Defines the time zone in which the rollup documents are stored.
123+ Unlike raw data, which can shift timezones on the fly, rolled documents have
124+ to be stored with a specific timezone. By default, rollup documents are stored
125+ in `UTC`.
126+
127+ `terms`:::
128+ (Optional, object) The terms group can be used on `keyword` or numeric fields
129+ to allow bucketing via the `terms` aggregation at a later point. The indexer
130+ enumerates and stores _all_ values of a field for each time-period. This can
131+ be potentially costly for high-cardinality groups such as IP addresses,
132+ especially if the time-bucket is particularly sparse.
133+ +
134+ --
135+ TIP: While it is unlikely that a rollup will ever be larger in size than the raw
136+ data, defining `terms` groups on multiple high-cardinality fields can
137+ effectively reduce the compression of a rollup to a large extent. You should be
138+ judicious about which high-cardinality fields are included for that reason.
139+
140+ The `terms` group has a single parameter:
141+ --
142+
143+ `fields`::::
144+ (Required, array) The set of fields that you wish to collect terms for. This
145+ array can contain both `keyword` and numeric fields. Order does not
146+ matter.
147+
148+ `histogram`:::
149+ (Optional, object) The histogram group aggregates one or more numeric fields
150+ into numeric histogram intervals.
151+ +
152+ --
153+ The `histogram` group has two parameters:
154+ --
155+
156+ `fields`::::
157+ (Required, array) The set of fields that you wish to build histograms for. All fields
158+ specified must be some kind of numeric. Order does not matter.
159+
160+ `interval`::::
161+ (Required, integer) The interval of histogram buckets to be generated when
162+ rolling up. For example, a value of `5` creates buckets that are five units
163+ wide (`0-5`, `5-10`, etc). Note that only one interval can be specified in the
164+ `histogram` group, meaning that all fields being grouped via the histogram
165+ must share the same interval.
47166
48167`index_pattern`::
49168 (Required, string) The index or index pattern to roll up. Supports
50- wildcard-style patterns (`logstash-*`).
169+ wildcard-style patterns (`logstash-*`). The job
170+ attempts to roll up the entire index or index pattern.
171+ +
172+ --
173+ NOTE: The `index_pattern` cannot be a pattern that would also match the
174+ destination `rollup_index`. For example, the pattern `foo-*` would match the
175+ rollup index `foo-rollup`. This situation would cause problems because the
176+ {rollup-job} would attempt to roll up its own data at runtime. If you attempt to
177+ configure a pattern that matches the `rollup_index`, an exception occurs to
178+ prevent this behavior.
179+
180+ --
51181
182+ [[rollup-metrics-config]]
52183`metrics`::
53- (Optional, object) Defines the metrics to collect for each grouping tuple. See
54- <<rollup-job-config,{rollup-job} config>>.
184+ (Optional, object) Defines the metrics to collect for each grouping tuple.
185+ By default, only the doc_counts are collected for each group. To make rollup
186+ useful, you will often add metrics like averages, mins, maxes, etc. Metrics
187+ are defined on a per-field basis and for each field you configure which metric
188+ should be collected.
189+ +
190+ --
191+ The `metrics` configuration accepts an array of objects, where each object has
192+ two parameters:
193+ --
194+
195+ `field`:::
196+ (Required, string) The field to collect metrics for. This must be a numeric
197+ field.
198+
199+ `metrics`:::
200+ (Required, array) An array of metrics to collect for the field. At least one
201+ metric must be configured. Acceptable metrics are `min`, `max`, `sum`, `avg`, and
202+ `value_count`.
55203
56204`page_size`::
57205 (Required, integer) The number of bucket results that are processed on each
58206 iteration of the rollup indexer. A larger value tends to execute faster, but
59- requires more memory during processing.
207+ requires more memory during processing. This value has no effect on how the
208+ data is rolled up; it is merely used for tweaking the speed or memory cost of
209+ the indexer.
60210
61211`rollup_index`::
62212 (Required, string) The index that contains the rollup results. The index can
63- be shared with other {rollup-jobs}.
64-
65- For more details about the job configuration, see <<rollup-job-config>>.
213+ be shared with other {rollup-jobs}. The data is stored so that it doesn't
214+ interfere with unrelated jobs.
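
The `index_pattern` restriction described above can be checked on the client side before a job is submitted. The following is a minimal sketch in Python, not an {es} API: the helper name `pattern_matches_rollup_index` is hypothetical, and the code assumes a single pattern in which `*` is the only wildcard character.

```python
import re


def pattern_matches_rollup_index(index_pattern: str, rollup_index: str) -> bool:
    """Return True if index_pattern would also match rollup_index.

    Assumption: `*` is the only wildcard; all other characters are literal.
    """
    # Escape each literal piece and let every `*` match any run of characters.
    regex = ".*".join(re.escape(part) for part in index_pattern.split("*"))
    return re.fullmatch(regex, rollup_index) is not None


print(pattern_matches_rollup_index("foo-*", "foo-rollup"))        # True
print(pattern_matches_rollup_index("sensor-*", "sensor_rollup"))  # False
```

With the values from the NOTE, `foo-*` matches `foo-rollup`, so that pairing would be rejected, whereas `sensor-*` does not match `sensor_rollup`.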
66215
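The effect of the `interval` parameter in the `histogram` group can be illustrated with a short Python sketch. It assumes that buckets are aligned to multiples of the interval with no offset, which is consistent with the `0-5`, `5-10` example above; the function name `histogram_bucket` is hypothetical.

```python
import math
from typing import Tuple


def histogram_bucket(value: float, interval: int) -> Tuple[int, int]:
    """Return the (lower, upper) bounds of the bucket that holds value.

    Assumption: buckets start at multiples of interval (zero offset).
    """
    lower = math.floor(value / interval) * interval
    return lower, lower + interval


print(histogram_bucket(3, 5))    # (0, 5)
print(histogram_bucket(7.5, 5))  # (5, 10)
```

Because a single `interval` applies to the whole group, every field listed in `fields` would be bucketed with the same width.
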
67216[[rollup-put-job-api-example]]
68217==== {api-example-title}
69218
70- The following example creates a {rollup-job} named "sensor", targeting the
71- "sensor-*" index pattern:
219+ The following example creates a {rollup-job} named `sensor`, targeting the
220+ `sensor-*` index pattern:
72221
73222[source,console]
74223--------------------------------------------------
@@ -78,7 +227,7 @@ PUT _rollup/job/sensor
78227 "rollup_index": "sensor_rollup",
79228 "cron": "*/30 * * * * ?",
80229 "page_size" :1000,
81- "groups" : {
230+ "groups" : { <1>
82231 "date_histogram": {
83232 "field": "timestamp",
84233 "fixed_interval": "1h",
@@ -88,7 +237,7 @@ PUT _rollup/job/sensor
88237 "fields": ["node"]
89238 }
90239 },
91- "metrics": [
240+ "metrics": [ <2>
92241 {
93242 "field": "temperature",
94243 "metrics": ["min", "max", "sum"]
@@ -101,6 +250,11 @@ PUT _rollup/job/sensor
101250}
102251--------------------------------------------------
103252// TEST[setup:sensor_index]
253+ <1> This configuration enables date histograms to be used on the `timestamp`
254+ field and `terms` aggregations to be used on the `node` field.
255+ <2> This configuration defines metrics over two fields: `temperature` and
256+ `voltage`. For the `temperature` field, we are collecting the min, max, and
257+ sum of the temperature. For `voltage`, we are collecting the average.
104258
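The requirements listed in the request body section can be sketched as a client-side check before the request is sent. This is an illustration only, written in Python: `validate_job_config` is a hypothetical helper, the rule that exactly one of `calendar_interval` or `fixed_interval` must be present is an assumption, and the sample body contains only the values stated in the example and its callouts.

```python
REQUIRED_TOP_LEVEL = ("cron", "groups", "index_pattern", "page_size", "rollup_index")
ALLOWED_METRICS = {"min", "max", "sum", "avg", "value_count"}


def validate_job_config(config: dict) -> list:
    """Return a list of problems found in a rollup job body (empty if none)."""
    errors = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in config:
            errors.append(f"missing required parameter: {key}")

    # The date_histogram group is mandatory and needs a field and an interval.
    date_histogram = config.get("groups", {}).get("date_histogram")
    if date_histogram is None:
        errors.append("groups.date_histogram is mandatory")
    else:
        if "field" not in date_histogram:
            errors.append("groups.date_histogram.field is required")
        intervals = [k for k in ("calendar_interval", "fixed_interval")
                     if k in date_histogram]
        if len(intervals) != 1:  # assumption: exactly one interval kind
            errors.append("specify one of calendar_interval or fixed_interval")

    # Each metrics entry needs a field and at least one supported metric.
    for entry in config.get("metrics", []):
        if "field" not in entry:
            errors.append("metrics entry is missing field")
        metrics = entry.get("metrics", [])
        if not metrics:
            errors.append("metrics entry needs at least one metric")
        unknown = sorted(set(metrics) - ALLOWED_METRICS)
        if unknown:
            errors.append(f"unsupported metrics: {unknown}")
    return errors


job = {
    "index_pattern": "sensor-*",
    "rollup_index": "sensor_rollup",
    "cron": "*/30 * * * * ?",
    "page_size": 1000,
    "groups": {
        "date_histogram": {"field": "timestamp", "fixed_interval": "1h"},
        "terms": {"fields": ["node"]},
    },
    "metrics": [
        {"field": "temperature", "metrics": ["min", "max", "sum"]},
        {"field": "voltage", "metrics": ["avg"]},
    ],
}
print(validate_job_config(job))  # []
```

A check like this only catches structural mistakes early; {es} remains the authority on whether a job configuration is accepted.
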
105259When the job is created, you receive the following results:
106260
@@ -109,4 +263,4 @@ When the job is created, you receive the following results:
109263{
110264 "acknowledged": true
111265}
112- ----
266+ ----