Skip to content

Allowing dots in field names #15951

@clintongormley

Description

@clintongormley

As part of the Great Mapping Refactoring (#8870), we had to reject field names containing dots (#12068), eg:

{ 
  "foo.bar": "val1",  
  "foo": {
    "bar": "val2"
  }
}

The behaviour was undefined and resulted in ambiguities when trying to reference fields with the dot notation used in queries and aggregations.

Removing support for dots has caused pain for a number of users and especially as Elasticsearch is being used more and more for the metrics use case (where dotted fields are common), we should consider what we can do to improve this situation. Now that mappings are much stricter (and immutable), it becomes feasible to revisit the question of whether to allow dots to occur in field names.

Replace dots with another character

The first and simplest solution is to simply replace dots in field names with another character (eg _) as is done by the Logstash de_dot filter and which will be supported natively in Elasticsearch by the node ingest de_dot processor.

Treat dots as paths

Another solution would be to treat fields with dots in them as "paths" rather than field names. In other words, these two documents would be equivalent:

{ "foo.bar": "value" }
{ "foo": { "bar": "value" }}

To use an edge case as an example, the following document:

{
  "foo.bar" : {
    "baz": "val1"
  },
  "foo": {
    "bar.baz": "val2"
  }

}

would result in the following mapping:

{
  "properties": {
    "foo": {
      "type": "object",
      "properties": {
        "bar": {
          "type": "object",
          "properties": {
            "baz": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}

The lucene field would be called foo.bar.baz and would contain the terms ["val1", "val2]. Stored fields or doc values (for supported datatypes), would both contain ["val1", "val2"].

Issues with this approach

This solution works well for search and aggregations, but leaves us with two incongruities:

_source=

The first occurs when using the _source= parameter to do source filtering on the response. The reason for this is that the _source field is stored as provided - it is not normalized before being stored For instance:

GET _search?_source=foo.*

would return:

{
  "foo.bar" : {
    "baz": "val1"
  },
  "foo": {
    "bar.baz": "val2"
  }

}

rather than:

{
  "foo": {
    "bar": {
      "baz": [
        "val1",
        "val2"
      ]
    }
  }
}

Update requests

The second occurs during update requests, which uses the _source as a map-of-maps. Running an update like:

POST index/type/id/_update
{
  "doc": {
    "foo": {
      "bar": {
        "baz": "val3"
      }
    }
  }
}

could result (depending on how it is implemented) in any of the following:

Version 1:

{
  "foo": {
    "bar": {
      "baz": "val3"
    }
  }
}

Version 2:

{
  "foo": {
    "bar": {
      "baz": [
        "val1",
        "val2",
        "val3"
      ]
    }
  }
}

Version 3:

{
  "foo.bar": {
    "baz": "val1"
  },
  "foo": {
    "bar.baz": "val2",
    "bar": {
      "baz": "val3"
    }
  }
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions