Terms aggregation shows up irrelevant data #28044

@fvilpoix

Elasticsearch version (bin/elasticsearch --version): 6.1.1

Plugins installed: analysis-icu

JVM version (java -version):

openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

OS version (uname -a if on a Unix-like system):

Linux plop 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3 (2017-12-03) x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

TL;DR: Since upgrading to 6.0 (and then to the latest, 6.1.1), a terms aggregation on an integer field returns buckets for values that should not exist for the provided query.

Original post on forum

Here is a first request I run, to check that I do not have any data >= 60:

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "media_id": "aaa"
          }
        },
        {
          "range": {
            "eng.visu": {
              "gte": 60
            }
          }
        }
      ]
    }
  },
  "size": 9999
} 

eng.visu is an array of 1 to 5 integers, always < 60 for this media_id.

Result is as expected:

{
  "took": 483,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

But then I run a terms aggregation on the same data:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "media_id": "aaa"
          }
        }
      ]
    }
  },
  "aggs": {
    "__all__": {
      "terms": {
        "field": "eng.visu",
        "size": 9999
      }
    }
  },
  "size": 0
}

And the result:

{
  "took": 24,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 18670,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "__all__": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 690
        },
        {
          "key": 0,
          "doc_count": 674
        },
        {
          "key": 2,
          "doc_count": 655
        },
        ...
        {
          "key": 80,
          "doc_count": 298
        },
        {
          "key": 82,
          "doc_count": 298
        },
        ...
        {
          "key": 5276,
          "doc_count": 1
        }
      ]
    }
  }
}

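As you can see, the aggregation returns keys far greater than 60 (80, 82, and even 5276), although the range query above matched no document with eng.visu >= 60 for this media_id.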
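
To make the mismatch easy to check offline, here is a minimal sketch that lists the bucket keys at or above a threshold in a terms aggregation response. It assumes the response has already been loaded as a Python dict (for example with `json.load`). The helper name `keys_at_or_above` is hypothetical and not part of any Elasticsearch client, and the sample data contains only the buckets shown above, not the elided ones.

```python
from typing import Any, Dict, List


def keys_at_or_above(response: Dict[str, Any], agg_name: str, threshold: int) -> List[int]:
    """Return the sorted bucket keys of a terms aggregation that are >= threshold."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return sorted(b["key"] for b in buckets if b["key"] >= threshold)


# Trimmed copy of the aggregation response: only the buckets that were
# actually shown in the report (the elided ones are left out).
response = {
    "aggregations": {
        "__all__": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": 1, "doc_count": 690},
                {"key": 0, "doc_count": 674},
                {"key": 2, "doc_count": 655},
                {"key": 80, "doc_count": 298},
                {"key": 82, "doc_count": 298},
                {"key": 5276, "doc_count": 1},
            ],
        }
    }
}

# The range query (gte 60) returned 0 hits, so this list should be empty
# if the aggregation were consistent with the query.
unexpected = keys_at_or_above(response, "__all__", 60)
print(unexpected)  # [80, 82, 5276]
```

If the range query and the aggregation agreed, the printed list would be empty. Here it contains every key that contradicts the zero-hit result.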
