Commit 07c76f2

nik9000 and jrodewig authored
Update date_histogram docs (#56922) (#57387)
* Make it more clear that you can use `month` or `1M`.
* Explain rounding rules
* Consistently use "time zone" instead of "timezone". It looks like both are right but I see "time zone" much more. And the parameter in elasticsearch is `time_zone` so we may as well line up.

Closes #56760

Co-authored-by: James Rodewig <[email protected]>
1 parent d5e86d7 commit 07c76f2

1 file changed

+72
-46
lines changed

docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc

Lines changed: 72 additions & 46 deletions
Original file line number | Diff line number | Diff line change
@@ -10,6 +10,15 @@ that here the interval can be specified using date/time expressions. Time-based
1010
data requires special support because time-based intervals are not always a
1111
fixed length.
1212

13+
Like the histogram, values are rounded *down* into the closest bucket. For
14+
example, if the interval is a calendar day, `2020-01-03T07:00:01Z` is rounded to
15+
`2020-01-03T00:00:00Z`. Values are rounded as follows:
16+
17+
[source,java]
18+
----
19+
bucket_key = Math.floor(value / interval) * interval
20+
----
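
As a rough illustration of the rounding rule above (a sketch, not Elasticsearch's actual implementation), the following Java snippet rounds an epoch-millisecond timestamp down to the start of a fixed-length bucket. The class and method names are hypothetical, and the example assumes a UTC calendar day can be treated as a fixed 86,400,000 ms interval. It uses `Math.floorDiv` because plain integer division truncates toward zero, which would round pre-1970 (negative) timestamps up instead of down.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, for illustration only.
final class FloorRounding {
    // Rounds an epoch-millisecond value down to the start of its bucket.
    // floorDiv rounds toward negative infinity, so negative values also round down.
    static long bucketKey(long valueMillis, long intervalMillis) {
        return Math.floorDiv(valueMillis, intervalMillis) * intervalMillis;
    }

    public static void main(String[] args) {
        long day = Duration.ofDays(1).toMillis();
        long value = Instant.parse("2020-01-03T07:00:01Z").toEpochMilli();
        // Prints 2020-01-03T00:00:00Z
        System.out.println(Instant.ofEpochMilli(bucketKey(value, day)));
    }
}
```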
21+
1322
[[calendar_and_fixed_intervals]]
1423
==== Calendar and fixed intervals
1524

@@ -47,59 +56,60 @@ will be removed in the future.
4756
===== Calendar intervals
4857

4958
Calendar-aware intervals are configured with the `calendar_interval` parameter.
50-
Calendar intervals can only be specified in "singular" quantities of the unit
51-
(`1d`, `1M`, etc). Multiples, such as `2d`, are not supported and will throw an exception.
59+
You can specify calendar intervals using the unit name, such as `month`, or as a
60+
single unit quantity, such as `1M`. For example, `day` and `1d` are equivalent.
61+
Multiple quantities, such as `2d`, are not supported.
5262

53-
The accepted units for calendar intervals are:
63+
The accepted calendar intervals are:
5464

55-
minute (`1m`) ::
65+
`minute`, `1m` ::
5666

5767
All minutes begin at 00 seconds.
5868
One minute is the interval between 00 seconds of the first minute and 00
59-
seconds of the following minute in the specified timezone, compensating for any
69+
seconds of the following minute in the specified time zone, compensating for any
6070
intervening leap seconds, so that the number of minutes and seconds past the
6171
hour is the same at the start and end.
6272

63-
hour (`1h`) ::
73+
`hour`, `1h` ::
6474

6575
All hours begin at 00 minutes and 00 seconds.
6676
One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00
67-
minutes of the following hour in the specified timezone, compensating for any
77+
minutes of the following hour in the specified time zone, compensating for any
6878
intervening leap seconds, so that the number of minutes and seconds past the hour
6979
is the same at the start and end.
7080

71-
day (`1d`) ::
81+
`day`, `1d` ::
7282

7383
All days begin at the earliest possible time, which is usually 00:00:00
7484
(midnight).
7585
One day (1d) is the interval between the start of the day and the start of
76-
of the following day in the specified timezone, compensating for any intervening
86+
the following day in the specified time zone, compensating for any intervening
7787
time changes.
7888

79-
week (`1w`) ::
89+
`week`, `1w` ::
8090

8191
One week is the interval between the start day_of_week:hour:minute:second
8292
and the same day of the week and time of the following week in the specified
83-
timezone.
93+
time zone.
8494

85-
month (`1M`) ::
95+
`month`, `1M` ::
8696

8797
One month is the interval between the start day of the month and time of
8898
day and the same day of the month and time of the following month in the specified
89-
timezone, so that the day of the month and time of day are the same at the start
99+
time zone, so that the day of the month and time of day are the same at the start
90100
and end.
91101

92-
quarter (`1q`) ::
102+
`quarter`, `1q` ::
93103

94-
One quarter (1q) is the interval between the start day of the month and
104+
One quarter is the interval between the start day of the month and
95105
time of day and the same day of the month and time of day three months later,
96106
so that the day of the month and time of day are the same at the start and end. +
97107

98-
year (`1y`) ::
108+
`year`, `1y` ::
99109

100-
One year (1y) is the interval between the start day of the month and time of
110+
One year is the interval between the start day of the month and time of
101111
day and the same day of the month and time of day the following year in the
102-
specified timezone, so that the date and time are the same at the start and end. +
112+
specified time zone, so that the date and time are the same at the start and end. +
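
To make the "not always a fixed length" point concrete, here is a minimal Java sketch that measures one `month` calendar bucket as the time from the start of a month to the start of the next month. It relies only on `java.time`. The class and method names are hypothetical, and this is not how Elasticsearch computes buckets internally.

```java
import java.time.Duration;
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper, for illustration only.
final class CalendarMonthLength {
    // Length of a calendar month: start of the month to start of the next month,
    // both taken in the given time zone.
    static Duration monthLength(YearMonth month, ZoneId zone) {
        return Duration.between(
            month.atDay(1).atStartOfDay(zone),
            month.plusMonths(1).atDay(1).atStartOfDay(zone));
    }

    public static void main(String[] args) {
        // Prints 31, then 29 (2020 is a leap year).
        System.out.println(monthLength(YearMonth.of(2020, 1), ZoneOffset.UTC).toDays());
        System.out.println(monthLength(YearMonth.of(2020, 2), ZoneOffset.UTC).toDays());
    }
}
```

Because the two buckets differ in length, no single millisecond count can stand in for `month`, which is consistent with multiples such as `2d` being reserved for fixed intervals.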
103113

104114
[[calendar_interval_examples]]
105115
===== Calendar interval examples
@@ -166,7 +176,7 @@ Fixed intervals are configured with the `fixed_interval` parameter.
166176

167177
In contrast to calendar-aware intervals, fixed intervals are a fixed number of SI
168178
units and never deviate, regardless of where they fall on the calendar. One second
169-
is always composed of 1000ms. This allows fixed intervals to be specified in
179+
is always composed of `1000ms`. This allows fixed intervals to be specified in
170180
any multiple of the supported units.
171181

172182
However, it means fixed intervals cannot express other units such as months,
@@ -175,23 +185,24 @@ a calendar interval like month or quarter will throw an exception.
175185

176186
The accepted units for fixed intervals are:
177187

178-
milliseconds (ms) ::
188+
milliseconds (`ms`) ::
189+
A single millisecond. This is a very, very small interval.
179190

180-
seconds (s) ::
181-
Defined as 1000 milliseconds each
191+
seconds (`s`) ::
192+
Defined as 1000 milliseconds each.
182193

183-
minutes (m) ::
194+
minutes (`m`) ::
195+
Defined as 60 seconds each (60,000 milliseconds).
184196
All minutes begin at 00 seconds.
185-
Defined as 60 seconds each (60,000 milliseconds)
186197

187-
hours (h) ::
198+
hours (`h`) ::
199+
Defined as 60 minutes each (3,600,000 milliseconds).
188200
All hours begin at 00 minutes and 00 seconds.
189-
Defined as 60 minutes each (3,600,000 milliseconds)
190201

191-
days (d) ::
202+
days (`d`) ::
203+
Defined as 24 hours (86,400,000 milliseconds).
192204
All days begin at the earliest possible time, which is usually 00:00:00
193205
(midnight).
194-
Defined as 24 hours (86,400,000 milliseconds)
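
The unit definitions above can be sketched as a small lookup table. The parser below is a hypothetical illustration in Java, not Elasticsearch's own parsing code: it accepts a whole-number quantity followed by one of the listed units and rejects anything else, including calendar units such as `1M`.

```java
import java.util.Map;

// Hypothetical parser, for illustration only.
final class FixedIntervals {
    // Milliseconds per supported fixed-interval unit, as listed in the docs.
    private static final Map<String, Long> UNIT_MILLIS = Map.of(
        "ms", 1L,
        "s", 1_000L,
        "m", 60_000L,
        "h", 3_600_000L,
        "d", 86_400_000L);

    // Converts strings such as "12h" or "30d" to milliseconds.
    static long toMillis(String interval) {
        int i = 0;
        while (i < interval.length() && Character.isDigit(interval.charAt(i))) {
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("missing quantity: " + interval);
        }
        long quantity = Long.parseLong(interval.substring(0, i));
        Long unitMillis = UNIT_MILLIS.get(interval.substring(i));
        if (unitMillis == null) {
            throw new IllegalArgumentException("unsupported fixed interval unit: " + interval);
        }
        return Math.multiplyExact(quantity, unitMillis);
    }

    public static void main(String[] args) {
        System.out.println(toMillis("12h")); // 43200000
        System.out.println(toMillis("30d")); // 2592000000
    }
}
```

The unit lookup is case-sensitive on purpose, so `m` (minutes) and `M` (the calendar month) are never confused.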
195206

196207
[[fixed_interval_examples]]
197208
===== Fixed interval examples
@@ -261,7 +272,7 @@ Widely distributed applications must also consider vagaries such as countries th
261272
start and stop daylight savings time at 12:01 A.M., so end up with one minute of
262273
Sunday followed by an additional 59 minutes of Saturday once a year, and countries
263274
that decide to move across the international date line. Situations like
264-
that can make irregular timezone offsets seem easy.
275+
that can make irregular time zone offsets seem easy.
265276

266277
As always, rigorous testing, especially around time-change events, will ensure
267278
that your time interval specification is
@@ -338,15 +349,30 @@ Response:
338349
--------------------------------------------------
339350
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
340351

341-
===== Timezone
352+
===== Time zone
342353

343-
Date-times are stored in Elasticsearch in UTC. By default, all bucketing and
354+
{es} stores date-times in Coordinated Universal Time (UTC). By default, all bucketing and
344355
rounding is also done in UTC. Use the `time_zone` parameter to indicate
345-
that bucketing should use a different timezone.
356+
that bucketing should use a different time zone.
357+
358+
For example, if the interval is a calendar day and the time zone is
359+
`America/New_York`, then `2020-01-03T01:00:01Z` is:
360+
# Converted to `2020-01-02T20:00:01`
361+
# Rounded down to `2020-01-02T00:00:00`
362+
# Then converted back to UTC to produce `2020-01-02T05:00:00Z`
363+
# Finally, when the bucket is turned into a string key it is printed in
364+
`America/New_York` so it'll display as `"2020-01-02T00:00:00"`.
365+
366+
It looks like:
367+
368+
[source,java]
369+
----
370+
bucket_key = localToUtc(Math.floor(utcToLocal(value) / interval) * interval)
371+
----
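
The convert, round, convert-back sequence can be tried out with `java.time`. This is a minimal sketch for a calendar-day interval only. The class and method names are hypothetical, and Elasticsearch's real rounding code is more general.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper, for illustration only.
final class ZonedDayRounding {
    // utcToLocal, round down to local midnight, then localToUtc.
    static Instant dayBucketKey(Instant value, ZoneId zone) {
        ZonedDateTime local = value.atZone(zone);                             // utcToLocal
        ZonedDateTime startOfDay = local.toLocalDate().atStartOfDay(zone);    // round down
        return startOfDay.toInstant();                                        // localToUtc
    }

    public static void main(String[] args) {
        Instant value = Instant.parse("2020-01-03T01:00:01Z");
        // Prints 2020-01-02T05:00:00Z
        System.out.println(dayBucketKey(value, ZoneId.of("America/New_York")));
    }
}
```

`atStartOfDay(zone)` is used instead of subtracting a fixed offset so that days whose first valid local time is not 00:00:00 are still handled.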
346372

347-
You can specify timezones as either an ISO 8601 UTC offset (e.g. `+01:00` or
348-
`-08:00`) or as a timezone ID as specified in the IANA timezone database,
349-
such as`America/Los_Angeles`.
373+
You can specify time zones as an ISO 8601 UTC offset (e.g. `+01:00` or
374+
`-08:00`) or as an IANA time zone ID,
375+
such as `America/Los_Angeles`.
350376

351377
Consider the following example:
352378

@@ -375,7 +401,7 @@ GET my_index/_search?size=0
375401
}
376402
---------------------------------
377403

378-
If you don't specify a timezone, UTC is used. This would result in both of these
404+
If you don't specify a time zone, UTC is used. This would result in both of these
379405
documents being placed into the same day bucket, which starts at midnight UTC
380406
on 1 October 2015:
381407

@@ -398,7 +424,7 @@ on 1 October 2015:
398424
---------------------------------
399425
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
400426

401-
If you specify a `time_zone` of `-01:00`, midnight in that timezone is one hour
427+
If you specify a `time_zone` of `-01:00`, midnight in that time zone is one hour
402428
before midnight UTC:
403429

404430
[source,console]
@@ -446,17 +472,17 @@ second document falls into the bucket for 1 October 2015:
446472
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
447473

448474
<1> The `key_as_string` value represents midnight on each day
449-
in the specified timezone.
475+
in the specified time zone.
450476

451-
WARNING: When using time zones that follow DST (daylight savings time) changes,
452-
buckets close to the moment when those changes happen can have slightly different
453-
sizes than you would expect from the used `interval`.
477+
WARNING: Many time zones shift their clocks for daylight savings time. Buckets
478+
close to the moment when those changes happen can have slightly different sizes
479+
than you would expect from the `calendar_interval` or `fixed_interval`.
454480
For example, consider a DST start in the `CET` time zone: on 27 March 2016 at 2am,
455-
clocks were turned forward 1 hour to 3am local time. If you use `day` as `interval`,
456-
the bucket covering that day will only hold data for 23 hours instead of the usual
457-
24 hours for other buckets. The same is true for shorter intervals, like 12h,
458-
where you'll have only a 11h bucket on the morning of 27 March when the DST shift
459-
happens.
481+
clocks were turned forward 1 hour to 3am local time. If you use `day` as the
482+
`calendar_interval`, the bucket covering that day will only hold data for 23
483+
hours instead of the usual 24 hours for other buckets. The same is true for
484+
shorter intervals, like a `fixed_interval` of `12h`, where you'll have only an 11h
485+
bucket on the morning of 27 March when the DST shift happens.
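
The 23-hour day described in the warning can be checked with a short Java sketch. It assumes `Europe/Berlin` as a stand-in for a zone that followed the Central European DST change on 27 March 2016. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical helper, for illustration only.
final class DstBucketLength {
    // Elapsed time between the start of a local day and the start of the next local day.
    static Duration calendarDayLength(LocalDate day, ZoneId zone) {
        return Duration.between(day.atStartOfDay(zone), day.plusDays(1).atStartOfDay(zone));
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Europe/Berlin"); // assumption: stands in for CET
        // Prints 23 for the DST-start day, then 24 for the day after.
        System.out.println(calendarDayLength(LocalDate.of(2016, 3, 27), zone).toHours());
        System.out.println(calendarDayLength(LocalDate.of(2016, 3, 28), zone).toHours());
    }
}
```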
460486

461487
[[search-aggregations-bucket-datehistogram-offset]]
462488
===== Offset
