
Mapping: Improve date handling #10971

@spinscale

Description


The current date mapping code treats Unix timestamps differently from other date formats. We should unify this, even though it requires changing our defaults and requires users to configure the Unix timestamp use case explicitly.

Today we parse dates as follows:

Mapped date fields with a format (defaults to dateOptionalTime)

  • If number, treat as epoch ms
  • If string, try to parse with defined format(s)
  • If it fails and is purely numeric, treat as epoch ms
  • Else fail
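As a rough illustration, the current rules for mapped fields can be sketched in Python. This is not the Elasticsearch code: the strptime patterns in `LEGACY_FORMATS` are hypothetical stand-ins for the Joda `dateOptionalTime` format, and `parse_legacy` is a made-up name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical approximation of dateOptionalTime using strptime patterns.
LEGACY_FORMATS = ["%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"]


def epoch_ms_to_datetime(ms: int) -> datetime:
    """Interpret an integer as milliseconds since the Unix epoch (UTC)."""
    return EPOCH + timedelta(milliseconds=ms)


def parse_legacy(value, formats=LEGACY_FORMATS) -> datetime:
    # 1. Numbers are always treated as epoch milliseconds.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return epoch_ms_to_datetime(int(value))
    if isinstance(value, str):
        # 2. Strings are tried against each configured format in order.
        for fmt in formats:
            try:
                return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
            except ValueError:
                continue
        # 3. Fallback: a purely numeric string is treated as epoch ms.
        if value.isdigit():
            return epoch_ms_to_datetime(int(value))
    # 4. Anything else is rejected.
    raise ValueError(f"cannot parse date: {value!r}")
```

One consequence of the fallback in step 3: a compact date string such as "20150101" does not match the configured formats, so it is silently indexed as a timestamp a few hours after the epoch in January 1970.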

Dynamic date detection

  • If string,
  • and contains at least two :, -, or /
  • and matches one of the dynamic date formats (defaults to dateOptionalTime || yyyy/MM/dd HH:mm:ss || yyyy/MM/dd)
  • then date, else string
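A rough Python sketch of this heuristic follows. It reads "at least two :, -, or /" as at least two occurrences of the same separator, and uses strptime patterns as hypothetical stand-ins for the Joda formats, so it approximates the behaviour rather than reproducing the Elasticsearch implementation. `looks_like_date` is an illustrative name.

```python
from datetime import datetime

# Hypothetical approximations of the default dynamic date formats:
# dateOptionalTime || yyyy/MM/dd HH:mm:ss || yyyy/MM/dd
DYNAMIC_FORMATS = ["%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "%Y/%m/%d %H:%M:%S", "%Y/%m/%d"]


def looks_like_date(value) -> bool:
    # Only strings are candidates for dynamic date detection.
    if not isinstance(value, str):
        return False
    # Cheap pre-filter: at least two occurrences of ':', '-' or '/'.
    if not any(value.count(sep) >= 2 for sep in ":-/"):
        return False
    # The string must also parse with one of the dynamic date formats.
    for fmt in DYNAMIC_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False
```

With this pre-filter, "2015.01.01" and "20150101T000000" are rejected before any format is tried, which is the third issue listed below.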

There are a few issues which can surprise users:

  • Joda date parsing is not strict, so "1/1/1" is detected as a date, and "1" would be interpreted as 0001-01-01 00:00:00
  • The distinction between numeric and string values is not always possible: query string parameters (e.g. _timestamp) are always strings, a date in the query_string query is always a string, and even in the JSON body some languages render a number as a string and vice versa
  • Dates such as 2015.01.01 (German) or 20150101T000000 (ISO 8601 basic format) can never be detected dynamically

Proposals

Make date parsing as unambiguous as possible. Where there is ambiguity, it is because the user chose ambiguous options (which we can warn about in the docs).

For indices created in 2.0:

For mapped date field:

  • only check the specified formats, which default to strictDateOptionalTime || epoch_ms
  • No distinction between numeric and string values for date fields: values are always parsed as strings (i.e. numeric values are coerced to strings)
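The proposed rules for mapped date fields could look roughly like the Python sketch below, assuming that a `||`-separated format string is tried left to right. The regex-based `strictDateOptionalTime` and the `epoch_ms` parser are hypothetical approximations of the named formats, and all function names are illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Hypothetical strict approximation: zero-padded yyyy-MM-dd with optional 'T' time.
STRICT_DATE = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?")


def parse_strict_date_optional_time(text: str) -> datetime:
    if not STRICT_DATE.fullmatch(text):
        raise ValueError(text)
    fmt = "%Y-%m-%dT%H:%M:%S" if "T" in text else "%Y-%m-%d"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def parse_epoch_ms(text: str) -> datetime:
    if not re.fullmatch(r"-?\d+", text):
        raise ValueError(text)
    return EPOCH + timedelta(milliseconds=int(text))


FORMATS = {
    "strictDateOptionalTime": parse_strict_date_optional_time,
    "epoch_ms": parse_epoch_ms,
}
DEFAULT_FORMAT = "strictDateOptionalTime||epoch_ms"


def parse_date(value, format_spec: str = DEFAULT_FORMAT) -> datetime:
    # Numbers and strings are treated alike: everything is coerced to a string.
    text = str(value)
    # Only the formats configured on the field are tried, in order.
    for name in format_spec.split("||"):
        try:
            return FORMATS[name](text)
        except ValueError:
            continue
    raise ValueError(f"{text!r} does not match {format_spec}")
```

Because 1000 and "1000" take the same path, the numeric/string distinction disappears. A field mapped with only strictDateOptionalTime rejects epoch values outright, and any remaining ambiguity (e.g. "20150101" under the default) comes from the formats the user chose.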

For dynamic date detection:

  • only check string values (don't coerce numerics)
  • accept any formats except epoch_ms and epoch_seconds
  • mapping should add just the matching format (optionally append epoch_ms?)
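A minimal Python sketch of the proposed dynamic detection, assuming a hypothetical list of candidate formats (epoch formats deliberately excluded) and a dict standing in for the generated field mapping. The `append_epoch_ms` flag models the optional suffix from the last bullet; names and formats are illustrative only.

```python
import re
from datetime import datetime

# Hypothetical candidates: (format name recorded in the mapping, shape regex, strptime pattern).
# epoch_ms and epoch_seconds are intentionally absent.
CANDIDATES = [
    ("strictDateOptionalTime", r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", "%Y-%m-%dT%H:%M:%S"),
    ("strictDateOptionalTime", r"\d{4}-\d{2}-\d{2}", "%Y-%m-%d"),
    ("yyyy/MM/dd", r"\d{4}/\d{2}/\d{2}", "%Y/%m/%d"),
    ("yyyy.MM.dd", r"\d{4}\.\d{2}\.\d{2}", "%Y.%m.%d"),
]


def detect_date_mapping(value, append_epoch_ms: bool = False):
    # Numeric values are never coerced, so they never trigger date detection.
    if not isinstance(value, str):
        return None
    for name, shape, pattern in CANDIDATES:
        if not re.fullmatch(shape, value):
            continue
        try:
            datetime.strptime(value, pattern)
        except ValueError:
            continue  # right shape but not a real date, e.g. month 13
        # Record only the format that matched (optionally followed by epoch_ms).
        fmt = name + "||epoch_ms" if append_epoch_ms else name
        return {"type": "date", "format": fmt}
    return None
```

Recording only the matching format means later documents in the same field are parsed with that single format, which keeps the mapping unambiguous.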

For indices created before 2.0:

We need to keep backwards compatibility (bwc) on older indices, so for them we follow the same rules as specified at the beginning of this comment.

Query time

Typically users always use the same format at index time: they don't mix epoch timestamps with formatted dates, which is why we should only parse the specified formats.

However, at query time it is quite possible that a client (e.g. Kibana) queries with epoch timestamps, even though the date field only accepts a formatted date. Today, the range query accepts a format parameter which is used to parse dates at query time.

There are two options to deal with this situation:

  • Add a format parameter to the term, terms, query_string, and simple_query_string queries, and to the range aggregation
  • Add a special format for epoch timestamps which is always recognised, e.g. epoch_ms:123456789
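Both options can be sketched side by side in Python. This is a hypothetical query-time parser, not an existing Elasticsearch API: `query_format` models option 1 (a per-query format overriding the field's format), and the `epoch_ms:` prefix models option 2. Formats are expressed as strptime patterns purely for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_PREFIX = "epoch_ms:"  # hypothetical always-recognised prefix (option 2)


def parse_query_date(text: str, field_format: str = "%Y-%m-%d",
                     query_format: str = None) -> datetime:
    # Option 2: the prefix wins regardless of the field's mapped format.
    if text.startswith(EPOCH_PREFIX):
        return EPOCH + timedelta(milliseconds=int(text[len(EPOCH_PREFIX):]))
    # Option 1: a format given on the query overrides the mapped format.
    fmt = query_format or field_format
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
```

Option 2 needs no new parameters on each query type, but it reserves a string prefix; option 1 is explicit but has to be added to every query and aggregation that accepts dates.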

Metadata



    Labels

    :Search Foundations/Mapping (Index mappings, including merging and defining field types)
    Meta
    Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch)
