-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Description
As part of the Great Mapping Refactoring (#8870), we had to reject field names containing dots (#12068), eg:
{
"foo.bar": "val1",
"foo": {
"bar": "val2"
}
}
The behaviour was undefined and resulted in ambiguities when trying to reference fields with the dot notation used in queries and aggregations.
Removing support for dots has caused pain for a number of users and especially as Elasticsearch is being used more and more for the metrics use case (where dotted fields are common), we should consider what we can do to improve this situation. Now that mappings are much stricter (and immutable), it becomes feasible to revisit the question of whether to allow dots to occur in field names.
Replace dots with another character
The first and simplest solution is to simply replace dots in field names with another character (eg _
) as is done by the Logstash de_dot filter and which will be supported natively in Elasticsearch by the node ingest de_dot
processor.
Treat dots as paths
Another solution would be to treat fields with dots in them as "paths" rather than field names. In other words, these two documents would be equivalent:
{ "foo.bar": "value" }
{ "foo": { "bar": "value" }}
To use an edge case as an example, the following document:
{
"foo.bar" : {
"baz": "val1"
},
"foo": {
"bar.baz": "val2"
}
}
would result in the following mapping:
{
"properties": {
"foo": {
"type": "object",
"properties": {
"bar": {
"type": "object",
"properties": {
"baz": {
"type": "string"
}
}
}
}
}
}
}
The lucene field would be called foo.bar.baz
and would contain the terms ["val1", "val2]
. Stored fields or doc values (for supported datatypes), would both contain ["val1", "val2"]
.
Issues with this approach
This solution works well for search and aggregations, but leaves us with two incongruities:
_source=
The first occurs when using the _source=
parameter to do source filtering on the response. The reason for this is that the _source
field is stored as provided - it is not normalized before being stored For instance:
GET _search?_source=foo.*
would return:
{
"foo.bar" : {
"baz": "val1"
},
"foo": {
"bar.baz": "val2"
}
}
rather than:
{
"foo": {
"bar": {
"baz": [
"val1",
"val2"
]
}
}
}
Update requests
The second occurs during update requests, which uses the _source
as a map-of-maps. Running an update like:
POST index/type/id/_update
{
"doc": {
"foo": {
"bar": {
"baz": "val3"
}
}
}
}
could result (depending on how it is implemented) in any of the following:
Version 1:
{
"foo": {
"bar": {
"baz": "val3"
}
}
}
Version 2:
{
"foo": {
"bar": {
"baz": [
"val1",
"val2",
"val3"
]
}
}
}
Version 3:
{
"foo.bar": {
"baz": "val1"
},
"foo": {
"bar.baz": "val2",
"bar": {
"baz": "val3"
}
}
}