Description
Elasticsearch version: 7.0.0
Plugins installed: []
JVM version: OpenJDK 1.8.0_191
OS version: Ubuntu 16.04 (or Elastic Cloud)
Description of the problem including expected versus actual behavior:
When setting _source.enabled: false in the index mapping, the _source should not be stored.
In 7.0.0, two indices with identical data and mappings, differing only in that one sets _source.enabled: false, end up almost exactly the same size. This is not the expected behavior.
In 6.7.1, with the same setup, the index with _source.enabled: false is roughly half the size of the one with _source enabled. This is the expected behavior.
Steps to reproduce:
Overview:
- Create two Elasticsearch clusters: version 6.7.1 and version 7.0.0.
- Create two index templates with identical mappings, but let the second template use _source.enabled: false. Put these two index templates in both clusters.
- Load data into the two indices on both clusters.
- Force merge the indices to a single segment.
- Compare the "Storage Size" of the two indices in Kibana for each cluster: /app/kibana#/management/elasticsearch/index_management/indices
More detailed:
Create the following templates and pipelines in the 7.0.0 cluster:
PUT _template/logs
{
"index_patterns": ["logs"],
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"agent": {
"type": "text"
},
"auth": {
"type": "keyword"
},
"bytes": {
"type": "long"
},
"clientip": {
"type": "ip"
},
"httpversion": {
"type": "double"
},
"ident": {
"type": "keyword"
},
"message": {
"type": "text"
},
"referrer": {
"type": "keyword"
},
"request": {
"type": "keyword"
},
"response": {
"type": "long"
},
"verb": {
"type": "keyword"
}
}
}
}

PUT _template/logs-nosource
{
"index_patterns": ["logs-nosource"],
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"_source": {
"enabled": false
},
"properties": {
"@timestamp": {
"type": "date"
},
"agent": {
"type": "text"
},
"auth": {
"type": "keyword"
},
"bytes": {
"type": "long"
},
"clientip": {
"type": "ip"
},
"httpversion": {
"type": "double"
},
"ident": {
"type": "keyword"
},
"message": {
"type": "text"
},
"referrer": {
"type": "keyword"
},
"request": {
"type": "keyword"
},
"response": {
"type": "long"
},
"verb": {
"type": "keyword"
}
}
}
}

PUT _ingest/pipeline/logs
{
"description": "Ingest pipeline for logs",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{COMBINEDAPACHELOG}"
]
}
},
{
"date": {
"field": "timestamp",
"formats": [
"dd/MMM/yyyy:HH:mm:ss XX"
]
}
},
{
"remove": {
"field": "timestamp"
}
}
]
}

Create the following templates and pipelines in the 6.7.1 cluster:
PUT _template/logs
{
"index_patterns": ["logs"],
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"_doc": {
"properties": {
"@timestamp": {
"type": "date"
},
"agent": {
"type": "text"
},
"auth": {
"type": "keyword"
},
"bytes": {
"type": "long"
},
"clientip": {
"type": "ip"
},
"httpversion": {
"type": "double"
},
"ident": {
"type": "keyword"
},
"message": {
"type": "text"
},
"referrer": {
"type": "keyword"
},
"request": {
"type": "keyword"
},
"response": {
"type": "long"
},
"verb": {
"type": "keyword"
}
}
}
}
}

PUT _template/logs-nosource
{
"index_patterns": ["logs-nosource"],
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"_doc": {
"_source": {
"enabled": false
},
"properties": {
"@timestamp": {
"type": "date"
},
"agent": {
"type": "text"
},
"auth": {
"type": "keyword"
},
"bytes": {
"type": "long"
},
"clientip": {
"type": "ip"
},
"httpversion": {
"type": "double"
},
"ident": {
"type": "keyword"
},
"message": {
"type": "text"
},
"referrer": {
"type": "keyword"
},
"request": {
"type": "keyword"
},
"response": {
"type": "long"
},
"verb": {
"type": "keyword"
}
}
}
}
}

PUT _ingest/pipeline/logs
{
"description": "Ingest pipeline for logs",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{COMBINEDAPACHELOG}"
]
}
},
{
"date": {
"field": "timestamp",
"formats": [
"dd/MMM/yyyy:HH:mm:ss ZZ"
]
}
},
{
"remove": {
"field": "timestamp"
}
}
]
}

Download and unzip the data from https://storage.googleapis.com/elasticsearch-sizing-workshop/data/nginx.zip and then load the nginx.log file into the "logs" and "logs-nosource" indices on both clusters.
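For anyone reproducing this, one way the load step could be scripted is sketched below. This is an assumption about how the data gets indexed, since the report does not name a tool: each line of nginx.log becomes a document with a single message field, sent through the logs ingest pipeline via the _bulk API. The helper name build_bulk_body and the endpoints mentioned in the comments are illustrative only.

```python
import json
from typing import Iterable


def build_bulk_body(lines: Iterable[str]) -> str:
    """Build an NDJSON _bulk body with one index action per non-empty log line.

    Assumed usage (not stated in the report): send the result with
    Content-Type: application/x-ndjson to
        POST /<index>/_bulk?pipeline=logs
    On 6.x the document type may also be needed in the path, e.g.
        POST /<index>/_doc/_bulk?pipeline=logs
    """
    parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines so grok never sees an empty message
        parts.append(json.dumps({"index": {}}))      # action line
        parts.append(json.dumps({"message": line}))  # document line
    # The _bulk API requires the body to end with a newline.
    return ("\n".join(parts) + "\n") if parts else ""
```

The same body can be sent to both "logs" and "logs-nosource", so the two indices receive identical documents. For a large file, the lines would normally be sent in batches rather than as one request.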
Force merge the indices to a single segment.
Compare the size of the indices in Kibana. Elasticsearch 7.0.0 shows both indices as being roughly the same size, whereas Elasticsearch 6.7.1 shows the "logs-nosource" index being roughly half the size of the "logs" index.
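The comparison can also be made outside Kibana. A minimal sketch, assuming the rows come from GET _cat/indices/logs*?format=json&bytes=b&h=index,store.size (where store.size is returned as a string holding a byte count). The function name and the sample numbers are placeholders, not measurements from this report.

```python
from typing import Dict, List


def nosource_to_source_ratio(
    cat_rows: List[Dict[str, str]],
    with_source: str = "logs",
    without_source: str = "logs-nosource",
) -> float:
    """Return size(without_source) / size(with_source) from _cat/indices rows."""
    # Map index name -> store size in bytes (the cat API returns strings).
    sizes = {row["index"]: int(row["store.size"]) for row in cat_rows}
    return sizes[without_source] / sizes[with_source]


# Placeholder rows shaped like the assumed _cat/indices JSON output.
sample_rows = [
    {"index": "logs", "store.size": "1000"},
    {"index": "logs-nosource", "store.size": "500"},
]
ratio = nosource_to_source_ratio(sample_rows)
```

A ratio near 0.5 would match the 6.7.1 behavior described above, and a ratio near 1.0 would match what is reported for 7.0.0.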