Skip to content

Terms aggregation by a Painless script ignores empty terms #31744

@vmrudniev

Description

@vmrudniev

Elasticsearch version (bin/elasticsearch --version):
Version: 6.3.0, Build: default/tar/424e937/2018-06-11T23:38:03.357887Z, JVM: 1.8.0_102

Plugins installed: []

JVM version (java -version):
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)

OS version (uname -a:
Darwin Kernel Version 17.6.0: Tue May 8 15:22:16 PDT 2018; root:xnu-4570.61.1~1/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:
When terms aggregation is applied to a Painless script, then empty strings, returned by the script, are ignored (there is no bucket for them in the response).

Expected behavior: there should be a bucket for empty strings.

Steps to reproduce:

# delete the index
curl -XDELETE localhost:9200/test

# re-create the index
curl -XPUT localhost:9200/test -H "Content-Type: application/json" -d '{
  "mappings": {
    "test": {
      "properties" : {
        "string_with_empty_values": {
          "type" : "keyword"
        }
      }
    }
  }
}'

# insert documents
curl -XPOST localhost:9200/test/test -H "Content-Type: application/json" -d '{
  "string_with_empty_values": "Not empty"
}'

curl -XPOST localhost:9200/test/test -H "Content-Type: application/json" -d '{
  "string_with_empty_values": ""
}'

curl -XPOST localhost:9200/test/test -H "Content-Type: application/json" -d '{
  "string_with_empty_values": null
}'

# aggregate using a script
curl -XPOST localhost:9200/test/_search -H "Content-Type: application/json" -d '{
  "size": 0,
  "aggregations": {
    "string_with_empty_values_terms": {
      "terms": {
        "script": {
          "source": "doc['\''string_with_empty_values'\''].value",
          "lang": "painless"
        }
      }
    },
    "string_with_empty_values_missing": {
      "missing": {
        "script": {
          "source": "doc['\''string_with_empty_values'\''].value",
          "lang": "painless"
        }
      }
    }
  }
}'

The response will look like the following (note the absence of a bucket for an empty term):

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "string_with_empty_values_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Not empty",
          "doc_count": 1
        }
      ]
    },
    "string_with_empty_values_missing": {
      "doc_count": 1
    }
  }
}

The same response will be received for the following query:

curl -XPOST localhost:9200/test/_search -H "Content-Type: application/json" -d '{
  "size": 0,
  "aggregations": {
    "string_with_empty_values_terms": {
      "terms": {
        "script": {
          "source": "params._source.string_with_empty_values",
          "lang": "painless"
        }
      }
    },
    "string_with_empty_values_missing": {
      "missing": {
        "script": {
          "source": "params._source.string_with_empty_values",
          "lang": "painless"
        }
      }
    }
  }
}'

At the same time, an aggregation by the field directly:

curl -XPOST localhost:9200/test/_search -H "Content-Type: application/json" -d '{
  "size": 0,
  "aggregations": {
    "string_with_empty_values_terms": {
      "terms": {
        "field": "string_with_empty_values"
      }
    },
    "string_with_empty_values_missing": {
      "missing": {
        "field": "string_with_empty_values"
      }
    }
  }
}'

returns correct result:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "string_with_empty_values_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "",
          "doc_count": 1
        },
        {
          "key": "Not empty",
          "doc_count": 1
        }
      ]
    },
    "string_with_empty_values_missing": {
      "doc_count": 1
    }
  }
}

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions