
Commit ab90ed6

Sue-Gallagher authored and polyfractal committed
[DOCS] Add info on calendar vs fixed interval. (#31638)
Extensive edit to add additional information on the difference between calendar intervals and fixed-length intervals.
1 parent be3ddea commit ab90ed6


1 file changed: 185 additions, 62 deletions


docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc

@@ -1,12 +1,129 @@
11
[[search-aggregations-bucket-datehistogram-aggregation]]
22
=== Date Histogram Aggregation
33

4-
A multi-bucket aggregation similar to the <<search-aggregations-bucket-histogram-aggregation,histogram>> except it can
5-
only be applied on date values. Since dates are represented in Elasticsearch internally as long values, it is possible
6-
to use the normal `histogram` on dates as well, though accuracy will be compromised. The reason for this is in the fact
7-
that time based intervals are not fixed (think of leap years and on the number of days in a month). For this reason,
8-
we need special support for time based data. From a functionality perspective, this histogram supports the same features
9-
as the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>. The main difference is that the interval can be specified by date/time expressions.
4+
This multi-bucket aggregation is similar to the normal
5+
<<search-aggregations-bucket-histogram-aggregation,histogram>>, but it can
6+
only be used with date values. Because dates are represented internally in
7+
Elasticsearch as long values, it is possible, but not as accurate, to use the
8+
normal `histogram` on dates as well. The main difference between the two APIs is
9+
that here the interval can be specified using date/time expressions. Time-based
10+
data requires special support because time-based intervals are not always a
11+
fixed length.
12+
13+
==== Setting intervals
14+
15+
There seems to be no limit to the creativity we humans apply to setting our
16+
clocks and calendars. We've invented leap years and leap seconds, standard and
17+
daylight saving time, and timezone offsets of 30 or 45 minutes rather than a
18+
full hour. While these creations help keep us in sync with the cosmos and our
19+
environment, they can make specifying time intervals accurately a real challenge.
20+
The only universal truth our researchers have yet to disprove is that a
21+
millisecond is always the same duration, and a second is always 1000 milliseconds.
22+
Beyond that, things get complicated.
23+
24+
Generally speaking, when you specify a single time unit, such as 1 hour or 1 day, you
25+
are working with a _calendar interval_, but multiples, such as 6 hours or 3 days, are
26+
_fixed-length intervals_.
27+
28+
For example, a specification of 1 day (1d) from now is a calendar interval that
29+
means "at
30+
this exact time tomorrow" no matter the length of the day. A change to or from
31+
daylight saving time that results in a 23- or 25-hour day is compensated for, and the
32+
specification of "this exact time tomorrow" is maintained. But if you specify 2 or
33+
more days, each day must be of the same fixed duration (24 hours). In this case, if
34+
the specified interval includes the change to or from daylight saving time, the
35+
interval will end an hour sooner or later than you expect.
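To make the calendar versus fixed-length distinction concrete, here is a minimal Python sketch using the standard-library `zoneinfo` module (it assumes system timezone data is installed). The `America/New_York` zone and the helper names are illustrative assumptions; this is plain date arithmetic, not Elasticsearch's rounding code.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Illustrative zone (an assumption): US daylight saving time began 13 March 2016.
TZ = ZoneInfo("America/New_York")


def calendar_days_later(start: datetime, days: int) -> datetime:
    """Calendar semantics: the same wall-clock time `days` days later."""
    # Adding a timedelta to an aware datetime is wall-clock arithmetic;
    # the UTC offset is re-resolved for the resulting local time.
    return start + timedelta(days=days)


def fixed_days_later(start: datetime, days: int) -> datetime:
    """Fixed-length semantics: exactly days x 86,400,000 ms later."""
    utc = start.astimezone(timezone.utc) + timedelta(milliseconds=days * 86_400_000)
    return utc.astimezone(TZ)


def elapsed_hours(a: datetime, b: datetime) -> float:
    """Real elapsed time; convert to UTC so both offsets are taken into account."""
    return (b.astimezone(timezone.utc) - a.astimezone(timezone.utc)) / timedelta(hours=1)


start = datetime(2016, 3, 12, 12, 0, tzinfo=TZ)  # noon on the day before the change
one_day = calendar_days_later(start, 1)
two_days = fixed_days_later(start, 2)

print(one_day, elapsed_hours(start, one_day))    # 2016-03-13 12:00:00-04:00 23.0
print(two_days, elapsed_hours(start, two_days))  # 2016-03-14 13:00:00-04:00 48.0
```

Starting at noon the day before the spring-forward change, the calendar day lands on noon again after only 23 real hours, while two fixed 24-hour days end at 13:00 local time, one hour later than a reader might expect.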
36+
37+
There are similar differences to consider when you specify single versus multiple
38+
minutes or hours. Multiples of units longer than a day (weeks, months, quarters, and years) are not supported.
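The rules can be summarized in a small lookup. `classify_interval` is a hypothetical helper written for this page, not part of Elasticsearch; it encodes the unit list in this section and rejects fractional values such as `1.5h`.

```python
import re

# Hypothetical helper (not an Elasticsearch API) encoding the unit rules.
FIXED_ONLY = {"ms", "s"}             # fixed length; multiples allowed
CALENDAR_OR_FIXED = {"m", "h", "d"}  # 1 unit -> calendar, n units -> fixed
# Remaining units (w, M, q, y): only a single unit is supported.

INTERVAL_RE = re.compile(r"(\d+)(ms|s|m|h|d|w|M|q|y)")


def classify_interval(spec: str) -> str:
    """Return "calendar", "fixed", or "unsupported" for an interval like "3d"."""
    match = INTERVAL_RE.fullmatch(spec)
    if match is None:
        raise ValueError(f"not a whole-number interval: {spec!r}")
    count, unit = int(match.group(1)), match.group(2)
    if count < 1:
        raise ValueError(f"interval must be positive: {spec!r}")
    if unit in FIXED_ONLY:
        return "fixed"
    if unit in CALENDAR_OR_FIXED:
        return "calendar" if count == 1 else "fixed"
    return "calendar" if count == 1 else "unsupported"


for spec in ("1d", "3d", "1M", "2y", "500ms"):
    print(spec, classify_interval(spec))
```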
39+
40+
Here are the valid time specifications and their meanings:
41+
42+
milliseconds (ms) ::
43+
Fixed-length interval; supports multiples.
44+
45+
seconds (s) ::
46+
1000 milliseconds; fixed-length interval (except for the last second of a
47+
minute that contains a leap second, which is 2000ms long); supports multiples.
48+
49+
minutes (m) ::
50+
All minutes begin at 00 seconds.
51+
52+
* One minute (1m) is the interval between 00 seconds of the first minute and 00
53+
seconds of the following minute in the specified timezone, compensating for any
54+
intervening leap seconds, so that the number of minutes and seconds past the
55+
hour is the same at the start and end.
56+
* Multiple minutes (__n__m) are intervals of exactly 60x1000=60,000 milliseconds
57+
each.
58+
59+
hours (h) ::
60+
All hours begin at 00 minutes and 00 seconds.
61+
62+
* One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00
63+
minutes of the following hour in the specified timezone, compensating for any
64+
intervening leap seconds, so that the number of minutes and seconds past the hour
65+
is the same at the start and end.
66+
* Multiple hours (__n__h) are intervals of exactly 60x60x1000=3,600,000 milliseconds
67+
each.
68+
69+
days (d) ::
70+
All days begin at the earliest possible time, which is usually 00:00:00
71+
(midnight).
72+
73+
* One day (1d) is the interval between the start of the day and the start of
74+
the following day in the specified timezone, compensating for any intervening
75+
time changes.
76+
* Multiple days (__n__d) are intervals of exactly 24x60x60x1000=86,400,000
77+
milliseconds each.
78+
79+
weeks (w) ::
80+
81+
* One week (1w) is the interval between the start day_of_week:hour:minute:second
82+
and the same day of the week and time of the following week in the specified
83+
timezone.
84+
* Multiple weeks (__n__w) are not supported.
85+
86+
months (M) ::
87+
88+
* One month (1M) is the interval between the start day of the month and time of
89+
day and the same day of the month and time of the following month in the specified
90+
timezone, so that the day of the month and time of day are the same at the start
91+
and end.
92+
* Multiple months (__n__M) are not supported.
93+
94+
quarters (q) ::
95+
96+
* One quarter (1q) is the interval between the start day of the month and
97+
time of day and the same day of the month and time of day three months later,
98+
so that the day of the month and time of day are the same at the start and end.
99+
* Multiple quarters (__n__q) are not supported.
100+
101+
years (y) ::
102+
103+
* One year (1y) is the interval between the start day of the month and time of
104+
day and the same day of the month and time of day the following year in the
105+
specified timezone, so that the date and time are the same at the start and end.
106+
* Multiple years (__n__y) are not supported.
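Under the simplifying assumptions of UTC and no `offset`, a fixed-length bucket key can be sketched as the timestamp rounded down to a multiple of the interval length counted from the epoch. The Python helpers below are hypothetical and ignore `time_zone` handling.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Fixed interval lengths in milliseconds, matching the list above.
MINUTE_MS = 60 * 1000          # 60,000
HOUR_MS = 60 * 60 * 1000       # 3,600,000
DAY_MS = 24 * 60 * 60 * 1000   # 86,400,000


def to_ms(dt: datetime) -> int:
    """Milliseconds since the epoch for an aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_ms(ms: int) -> datetime:
    """Aware UTC datetime for a millisecond timestamp."""
    return EPOCH + timedelta(milliseconds=ms)


def fixed_bucket_key(ts_ms: int, interval_ms: int) -> int:
    """Round down to a multiple of interval_ms since the epoch (UTC, no offset)."""
    return ts_ms - ts_ms % interval_ms


ts = to_ms(datetime(2015, 10, 1, 13, 45, tzinfo=timezone.utc))
print(from_ms(fixed_bucket_key(ts, 6 * HOUR_MS)))  # 2015-10-01 12:00:00+00:00
print(from_ms(fixed_bucket_key(ts, 3 * DAY_MS)))   # 2015-09-29 00:00:00+00:00
```

One consequence of this sketch: multi-day fixed buckets line up with multiples of the interval since 1 January 1970, not with calendar boundaries, so a `3d` bucket containing 1 October 2015 starts on 29 September 2015.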
107+
108+
NOTE: In all cases, when the specified end time does not exist, the actual end time is
109+
the closest available time after the specified end.
111+
112+
Widely distributed applications must also consider vagaries such as countries that
113+
start and stop daylight saving time at 12:01 A.M., and so end up with one minute of
114+
Sunday followed by an additional 59 minutes of Saturday once a year, and countries
115+
that decide to move across the international date line. Situations like
116+
that can make irregular timezone offsets seem easy.
117+
118+
As always, rigorous testing, especially around time-change events, helps ensure
119+
that your time interval specification is
120+
what you intend it to be.
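One way to test around time-change events is to list the local days that are not exactly 24 hours long. This Python sketch assumes `zoneinfo` with system timezone data and that local midnight exists in the zone; `irregular_days` is a hypothetical helper, and zones that change clocks at midnight would need the earliest valid time of day instead.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def irregular_days(tz_name: str, year: int) -> list:
    """Return (date, hours) for each local calendar day that is not 24 hours long."""
    tz = ZoneInfo(tz_name)
    result = []
    day = date(year, 1, 1)
    while day.year == year:
        nxt = day + timedelta(days=1)
        start = datetime(day.year, day.month, day.day, tzinfo=tz)  # assumes midnight exists
        end = datetime(nxt.year, nxt.month, nxt.day, tzinfo=tz)
        # Compare in UTC: subtracting datetimes that share a tzinfo ignores offsets.
        hours = (end.astimezone(timezone.utc) - start.astimezone(timezone.utc)) / timedelta(hours=1)
        if hours != 24:
            result.append((day, hours))
        day = nxt
    return result


print(irregular_days("Europe/Berlin", 2016))
```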
121+
122+
WARNING: To avoid unexpected results, all connected servers and clients must sync to a
123+
reliable network time service.
125+
126+
==== Examples
10127

11128
Requesting bucket intervals of a month.
12129

@@ -27,13 +144,11 @@ POST /sales/_search?size=0
27144
// CONSOLE
28145
// TEST[setup:sales]
29146

30-
Available expressions for interval: `year` (`1y`), `quarter` (`1q`), `month` (`1M`), `week` (`1w`),
31-
`day` (`1d`), `hour` (`1h`), `minute` (`1m`), `second` (`1s`)
32-
33-
Time values can also be specified via abbreviations supported by <<time-units,time units>> parsing.
34-
Note that fractional time values are not supported, but you can address this by shifting to another
35-
time unit (e.g., `1.5h` could instead be specified as `90m`). Also note that time intervals larger than
36-
days do not support arbitrary values but can only be one unit large (e.g. `1y` is valid, `2y` is not).
147+
You can also specify time values using abbreviations supported by
148+
<<time-units,time units>> parsing.
149+
Note that fractional time values are not supported, but you can address this by
150+
shifting to another
151+
time unit (e.g., `1.5h` could instead be specified as `90m`).
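The rewrite from a fractional value to a smaller whole unit can be mechanized. `to_whole_units` is a hypothetical Python helper (not an Elasticsearch feature) that steps down through days, hours, minutes, seconds, and milliseconds using exact fractions.

```python
from fractions import Fraction

# Hypothetical table: each unit expressed in the next smaller fixed unit.
SMALLER = {"d": ("h", 24), "h": ("m", 60), "m": ("s", 60), "s": ("ms", 1000)}


def to_whole_units(value: str, unit: str) -> str:
    """Rewrite e.g. ("1.5", "h") as "90m" by stepping down to smaller units."""
    amount = Fraction(value)  # exact, so 1.5 stays 3/2 rather than a float
    while amount.denominator != 1:
        if unit not in SMALLER:
            raise ValueError(f"cannot express {value}{unit} in whole units")
        unit, factor = SMALLER[unit]
        amount *= factor
    return f"{amount}{unit}"


print(to_whole_units("1.5", "h"))   # 90m
print(to_whole_units("0.25", "d"))  # 6h
```

Keep in mind that a rewritten value such as `90m` is a multiple of a unit and therefore, by the interval rules described earlier, a fixed-length interval.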
37152

38153
[source,js]
39154
--------------------------------------------------
@@ -52,15 +167,16 @@ POST /sales/_search?size=0
52167
// CONSOLE
53168
// TEST[setup:sales]
54169

55-
==== Keys
170+
===== Keys
56171

57172
Internally, a date is represented as a 64 bit number representing a timestamp
58-
in milliseconds-since-the-epoch. These timestamps are returned as the bucket
59-
++key++s. The `key_as_string` is the same timestamp converted to a formatted
60-
date string using the format specified with the `format` parameter:
173+
in milliseconds since the epoch (midnight UTC on 1 January 1970). These timestamps are
174+
returned as the ++key++ name of the bucket. The `key_as_string` is the same
175+
timestamp converted to a formatted
176+
date string using the `format` parameter specification:
61177

62-
TIP: If no `format` is specified, then it will use the first date
63-
<<mapping-date-format,format>> specified in the field mapping.
178+
TIP: If you don't specify `format`, the first date
179+
<<mapping-date-format,format>> specified in the field mapping is used.
64180

65181
[source,js]
66182
--------------------------------------------------
@@ -113,15 +229,15 @@ Response:
113229
--------------------------------------------------
114230
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
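The relationship between `key` and `key_as_string` can be sketched in Python as follows. The helper is hypothetical, formats in UTC, and uses Python `strftime` patterns rather than the Elasticsearch date format syntax.

```python
from datetime import datetime, timezone


def key_to_string(key_ms: int, fmt: str = "%Y-%m-%d") -> str:
    """Format a bucket key (ms since the epoch) as a UTC date string."""
    return datetime.fromtimestamp(key_ms / 1000, tz=timezone.utc).strftime(fmt)


print(key_to_string(1420070400000))  # 2015-01-01
```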
115231

116-
==== Time Zone
232+
===== Timezone
117233

118234
Date-times are stored in Elasticsearch in UTC. By default, all bucketing and
119-
rounding is also done in UTC. The `time_zone` parameter can be used to indicate
120-
that bucketing should use a different time zone.
235+
rounding is also done in UTC. Use the `time_zone` parameter to indicate
236+
that bucketing should use a different timezone.
121237

122-
Time zones may either be specified as an ISO 8601 UTC offset (e.g. `+01:00` or
123-
`-08:00`) or as a timezone id, an identifier used in the TZ database like
124-
`America/Los_Angeles`.
238+
You can specify timezones as either an ISO 8601 UTC offset (e.g. `+01:00` or
239+
`-08:00`) or as a timezone ID as specified in the IANA timezone database,
240+
such as `America/Los_Angeles`.
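The two forms differ once daylight saving time is involved: a fixed offset never changes, while an IANA ID follows the zone's rules. A Python sketch with `zoneinfo` (assuming system timezone data) shows `-08:00` and `America/Los_Angeles` agreeing in December 2015 but not in October 2015.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

FIXED = timezone(timedelta(hours=-8))  # ISO 8601 offset "-08:00"
LA = ZoneInfo("America/Los_Angeles")   # IANA timezone ID


def offset_hours(instant: datetime, tz) -> float:
    """UTC offset, in hours, that `tz` applies at the given instant."""
    return instant.astimezone(tz).utcoffset() / timedelta(hours=1)


october = datetime(2015, 10, 1, 12, tzinfo=timezone.utc)
december = datetime(2015, 12, 1, 12, tzinfo=timezone.utc)

print(offset_hours(october, FIXED), offset_hours(october, LA))    # -8.0 -7.0
print(offset_hours(december, FIXED), offset_hours(december, LA))  # -8.0 -8.0
```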
125241

126242
Consider the following example:
127243

@@ -151,7 +267,7 @@ GET my_index/_search?size=0
151267
---------------------------------
152268
// CONSOLE
153269

154-
UTC is used if no time zone is specified, which would result in both of these
270+
If you don't specify a timezone, UTC is used. This would result in both of these
155271
documents being placed into the same day bucket, which starts at midnight UTC
156272
on 1 October 2015:
157273

@@ -174,8 +290,8 @@ on 1 October 2015:
174290
---------------------------------
175291
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
176292

177-
If a `time_zone` of `-01:00` is specified, then midnight starts at one hour before
178-
midnight UTC:
293+
If you specify a `time_zone` of `-01:00`, midnight in that timezone is one hour
294+
after midnight UTC (that is, 01:00 UTC):
179295

180296
[source,js]
181297
---------------------------------
@@ -223,28 +339,27 @@ second document falls into the bucket for 1 October 2015:
223339
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
224340

225341
<1> The `key_as_string` value represents midnight on each day
226-
in the specified time zone.
342+
in the specified timezone.
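Converting local midnight at `-01:00` into UTC shows where the bucket boundary falls. This Python sketch uses a fixed-offset `timezone`; the two document timestamps are hypothetical values chosen half an hour either side of the boundary.

```python
from datetime import datetime, timedelta, timezone

MINUS_ONE = timezone(timedelta(hours=-1))  # time_zone "-01:00"


def local_day(instant_utc: datetime, tz) -> str:
    """Local calendar day (the 1d bucket) that a UTC instant falls into."""
    return instant_utc.astimezone(tz).date().isoformat()


# Midnight at the start of 1 October 2015 in -01:00, expressed in UTC.
boundary = datetime(2015, 10, 1, tzinfo=MINUS_ONE).astimezone(timezone.utc)
print(boundary)  # 2015-10-01 01:00:00+00:00

# Hypothetical documents half an hour either side of that boundary.
print(local_day(datetime(2015, 10, 1, 0, 30, tzinfo=timezone.utc), MINUS_ONE))  # 2015-09-30
print(local_day(datetime(2015, 10, 1, 1, 30, tzinfo=timezone.utc), MINUS_ONE))  # 2015-10-01
```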
227343

228344
WARNING: When using time zones that follow DST (daylight saving time) changes,
229345
buckets close to the moment when those changes happen can have slightly different
230-
sizes than would be expected from the used `interval`.
346+
sizes than you would expect from the specified `interval`.
231347
For example, consider a DST start in the `CET` time zone: on 27 March 2016 at 2am,
232-
clocks were turned forward 1 hour to 3am local time. When using `day` as `interval`,
348+
clocks were turned forward 1 hour to 3am local time. If you use `day` as `interval`,
233349
the bucket covering that day will only hold data for 23 hours instead of the usual
234-
24 hours for other buckets. The same is true for shorter intervals like e.g. 12h.
235-
Here, we will have only a 11h bucket on the morning of 27 March when the DST shift
350+
24 hours for other buckets. The same is true for shorter intervals, like 12h,
351+
where you'll have only an 11h bucket on the morning of 27 March when the DST shift
236352
happens.
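The 23-hour day and 11-hour morning bucket can be checked with a short Python sketch. It assumes `zoneinfo` with system timezone data and uses `Europe/Berlin`, which observes CET/CEST, as a stand-in for the `CET` zone.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

CET = ZoneInfo("Europe/Berlin")  # observes CET/CEST


def elapsed_hours(start_local: datetime, end_local: datetime) -> float:
    # Convert to UTC first: subtracting datetimes that share a tzinfo
    # compares wall-clock times and would hide the DST shift.
    delta = end_local.astimezone(timezone.utc) - start_local.astimezone(timezone.utc)
    return delta / timedelta(hours=1)


midnight = datetime(2016, 3, 27, 0, tzinfo=CET)
noon = datetime(2016, 3, 27, 12, tzinfo=CET)
next_midnight = datetime(2016, 3, 28, 0, tzinfo=CET)

print(elapsed_hours(midnight, noon))           # 11.0
print(elapsed_hours(midnight, next_midnight))  # 23.0
```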
237353

354+
===== Offset
238355

239-
==== Offset
240-
241-
The `offset` parameter is used to change the start value of each bucket by the
356+
Use the `offset` parameter to change the start value of each bucket by the
242357
specified positive (`+`) or negative offset (`-`) duration, such as `1h` for
243358
an hour, or `1d` for a day. See <<time-units>> for more possible time
244359
duration options.
245360

246-
For instance, when using an interval of `day`, each bucket runs from midnight
247-
to midnight. Setting the `offset` parameter to `+6h` would change each bucket
361+
For example, when using an interval of `day`, each bucket runs from midnight
362+
to midnight. Setting the `offset` parameter to `+6h` changes each bucket
248363
to run from 6am to 6am:
249364

250365
[source,js]
@@ -301,12 +416,13 @@ documents into buckets starting at 6am:
301416
-----------------------------
302417
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
303418

304-
NOTE: The start `offset` of each bucket is calculated after the `time_zone`
419+
NOTE: The start `offset` of each bucket is calculated after `time_zone`
305420
adjustments have been made.
306421

307-
==== Keyed Response
422+
===== Keyed Response
308423

309-
Setting the `keyed` flag to `true` will associate a unique string key with each bucket and return the ranges as a hash rather than an array:
424+
Setting the `keyed` flag to `true` associates a unique string key with each
425+
bucket and returns the buckets as a hash rather than an array:
310426

311427
[source,js]
312428
--------------------------------------------------
@@ -358,20 +474,25 @@ Response:
358474
--------------------------------------------------
359475
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
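A client that receives the array form can build the same lookup itself. This Python sketch assumes each bucket's hash key is its `key_as_string`; the bucket list is hypothetical sample data with made-up counts.

```python
def to_keyed(buckets: list) -> dict:
    """Re-shape an array of buckets into a hash keyed by key_as_string."""
    return {bucket["key_as_string"]: bucket for bucket in buckets}


# Hypothetical buckets with made-up counts, for illustration only.
buckets = [
    {"key_as_string": "2015-01-01", "key": 1420070400000, "doc_count": 3},
    {"key_as_string": "2015-02-01", "key": 1422748800000, "doc_count": 2},
]
keyed = to_keyed(buckets)
print(keyed["2015-02-01"]["doc_count"])  # 2
```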
360476

361-
==== Scripts
477+
===== Scripts
362478

363-
Like with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>, both document level scripts and
364-
value level scripts are supported. It is also possible to control the order of the returned buckets using the `order`
365-
settings and filter the returned buckets based on a `min_doc_count` setting (by default all buckets between the first
366-
bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds`
367-
setting, which enables extending the bounds of the histogram beyond the data itself (to read more on why you'd want to
368-
do that please refer to the explanation <<search-aggregations-bucket-histogram-aggregation-extended-bounds,here>>).
479+
As with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>,
480+
both document-level scripts and
481+
value-level scripts are supported. You can control the order of the returned
482+
buckets using the `order`
483+
setting and filter the returned buckets based on a `min_doc_count` setting
484+
(by default all buckets between the first
485+
bucket that matches documents and the last one are returned). This histogram
486+
also supports the `extended_bounds`
487+
setting, which enables extending the bounds of the histogram beyond the data
488+
itself. For more information, see
489+
<<search-aggregations-bucket-histogram-aggregation-extended-bounds,`Extended Bounds`>>.
369490

370-
==== Missing value
491+
===== Missing value
371492

372-
The `missing` parameter defines how documents that are missing a value should be treated.
373-
By default they will be ignored but it is also possible to treat them as if they
374-
had a value.
493+
The `missing` parameter defines how to treat documents that are missing a value.
494+
By default, they are ignored, but it is also possible to treat them as if they
495+
have a value.
375496

376497
[source,js]
377498
--------------------------------------------------
@@ -391,20 +512,22 @@ POST /sales/_search?size=0
391512
// CONSOLE
392513
// TEST[setup:sales]
393514

394-
<1> Documents without a value in the `publish_date` field will fall into the same bucket as documents that have the value `2000-01-01`.
515+
<1> Documents without a value in the `publish_date` field will fall into the
516+
same bucket as documents that have the value `2000-01-01`.
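The effect of `missing` resembles substituting a default value before bucketing. This Python sketch is a hypothetical client-side analogy with made-up documents and yearly buckets; it is not how Elasticsearch implements the parameter.

```python
from collections import Counter

MISSING_DATE = "2000-01-01"  # the substitute value named in the callout above


def yearly_counts(docs: list) -> Counter:
    """Count documents per year, using MISSING_DATE when publish_date is absent."""
    counts = Counter()
    for doc in docs:
        publish_date = doc.get("publish_date", MISSING_DATE)
        counts[publish_date[:4]] += 1  # "YYYY-MM-DD" -> yearly bucket label
    return counts


docs = [{"publish_date": "2000-06-15"}, {"publish_date": "2015-03-02"}, {}]
print(yearly_counts(docs))  # Counter({'2000': 2, '2015': 1})
```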
395517

396-
==== Order
518+
===== Order
397519

398-
By default the returned buckets are sorted by their `key` ascending, though the order behaviour can be controlled using
399-
the `order` setting. Supports the same `order` functionality as the <<search-aggregations-bucket-terms-aggregation-order,`Terms Aggregation`>>.
520+
By default, the returned buckets are sorted by `key` in ascending order, but you can
521+
control the order using
522+
the `order` setting. This setting supports the same `order` functionality as the
523+
<<search-aggregations-bucket-terms-aggregation-order,`Terms Aggregation`>>.
400524

401525
deprecated[6.0.0, Use `_key` instead of `_time` to order buckets by their dates/keys]
402526

403-
==== Use of a script to aggregate by day of the week
527+
===== Using a script to aggregate by day of the week
404528

405-
There are some cases where date histogram can't help us, like for example, when we need
406-
to aggregate the results by day of the week.
407-
In this case to overcome the problem, we can use a script that returns the day of the week:
529+
When you need to aggregate the results by day of the week, use a script that
530+
returns the day of the week:
408531

409532

410533
[source,js]
@@ -452,5 +575,5 @@ Response:
452575
--------------------------------------------------
453576
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
454577

455-
The response will contain all the buckets having as key the relative day of
456-
the week: 1 for Monday, 2 for Tuesday... 7 for Sunday.
578+
The response contains buckets keyed by the relative day of
579+
the week: 1 for Monday, 2 for Tuesday, ..., 7 for Sunday.
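The numbering follows ISO 8601 weekdays, which Python exposes as `isoweekday()`. This sketch assumes UTC and a timestamp in milliseconds since the epoch; it does not reproduce the Elasticsearch script itself.

```python
from datetime import datetime, timezone


def day_of_week(ts_ms: int) -> int:
    """ISO day of week for a UTC timestamp: 1 = Monday ... 7 = Sunday."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).isoweekday()


print(day_of_week(1443657600000))  # 4 (1 October 2015 was a Thursday)
```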
