bucket_sort aggregation misplaces the first bucket #36322

@odespesse

Elasticsearch version: 6.5.1

Plugins installed: []

JVM version: 1.8.0_192

OS version: Debian 8.11

Description of the problem including expected versus actual behavior:

  • Actual behavior:
    When an aggregation is sorted with a bucket_sort on _count and all doc_count values are equal, the bucket that should be in first position ends up last; the other buckets are in the right order.
  • Expected:
    Every bucket should be in the right order, including the first one.

Steps to reproduce:

  1. Create a basic index:
curl -XPUT 'http://localhost:9200/messages' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas" : 0
  },
  "mappings": {
    "user": {
      "properties": {
        "rank": {
          "type": "keyword"
        }
      }
    }
  }
}'
  2. Insert four users, each with a different rank (from a to d):
  • user with rank a
curl -XPUT 'http://localhost:9200/messages/user/001' -H 'Content-Type: application/json' -d '
{
  "@timestamp": "2018-12-06T01:00:00+01:00",
  "rank": "a"
}'
  • user with rank b
curl -XPUT 'http://localhost:9200/messages/user/002' -H 'Content-Type: application/json' -d '
{
  "@timestamp": "2018-12-06T01:00:00+01:00",
  "rank": "b"
}'
  • user with rank c
curl -XPUT 'http://localhost:9200/messages/user/003' -H 'Content-Type: application/json' -d '
{
  "@timestamp": "2018-12-06T01:00:00+01:00",
  "rank": "c"
}'
  • user with rank d
curl -XPUT 'http://localhost:9200/messages/user/004' -H 'Content-Type: application/json' -d '
{
  "@timestamp": "2018-12-06T01:00:00+01:00",
  "rank": "d"
}'
  3. Aggregate on the rank property and sort by count with a bucket_sort:
curl -XGET 'http://localhost:9200/messages/_search?pretty=true' -H 'Content-Type: application/json' -d '
    {
      "size": 0,
      "aggregations": {
        "top_by_color": {
          "composite": {
            "size": 1000,
            "sources": [
              {
                "rank": {
                  "terms": {
                    "field": "rank",
                    "missing_bucket": true,
                    "order": "asc"
                  }
                }
              }
            ]
          },
          "aggregations": {
            "top_bucket_sort": {
              "bucket_sort": {
                "sort": [
                  {
                    "_count": {
                      "order": "asc"
                    }
                  }
                ],
                "size": 1000
              }
            }
          }
        }
      }
    }'

The aggregation result is:

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 4,
        "max_score": 0,
        "hits": [

        ]
      },
      "aggregations": {
        "top_by_color": {
          "after_key": {
            "rank": "d"
          },
          "buckets": [
            {
              "key": {
                "rank": "b"
              },
              "doc_count": 1
            },
            {
              "key": {
                "rank": "c"
              },
              "doc_count": 1
            },
            {
              "key": {
                "rank": "d"
              },
              "doc_count": 1
            },
            {
              "key": {
                "rank": "a"
              },
              "doc_count": 1
            }
          ]
        }
      }
    }

The ranks are sorted as b, c, d, a instead of a, b, c, d.
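
To make the expectation concrete, here is a minimal sketch outside Elasticsearch. It assumes that buckets with equal doc_count should keep the order in which the composite aggregation produced them, which is what a stable sort gives. The bucket list is a hypothetical stand-in for the four buckets above, and `sort -s` only illustrates stable ordering; it says nothing about how bucket_sort is implemented.

```shell
#!/bin/sh
# Hypothetical stand-in for the four buckets, in composite key order:
# one "rank doc_count" pair per line.
buckets='a 1
b 1
c 1
d 1'

# sort -s disables the last-resort comparison, so lines whose doc_count
# ties keep their input order; -k2,2n compares on doc_count only.
order=$(printf '%s\n' "$buckets" | sort -s -k2,2n | cut -d' ' -f1 | tr '\n' ' ')
order=${order% }   # drop the trailing space left by tr

echo "$order"
```

With all counts tied, this prints the ranks in their input order, a b c d, which is the order expected from the aggregation.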
