Highlighter no_match_size ignored with number_of_fragments 0 on 6.7 #41066

@benvand

Elasticsearch version:

elasticsearch-6.7.1

JVM version:

java version "1.8.0_201"

OS version:

17.7.0 Darwin Kernel Version 17.7.0

Expected / 5.x behaviour:

As per the documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html#highlighting-settings

If the number of fragments is set to 0, no fragments are returned. Instead, the entire field contents are highlighted and returned.

This was the behaviour on 5.x, where the highlight entry for the field contained the whole field contents (up to no_match_size).

On 6.x only part of the field is returned: the first sentence.

Steps to reproduce:

The query command below returns different responses on ES5 and ES6:

curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/test-index?pretty' -d '{"mappings": {
    "services": {
      "dynamic": "strict",
      "properties": {
        "someDescription": {
          "type": "text"
        }
      }
    }
  }
}'

curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/test-index/services/1?pretty' -d '{
  "someDescription": "Here is a description that is quite long and describes the problem that we have had with the new elasticsearch highlighting engine. The length of this field is 500 characters and the problem. Is that the whole description is not returned as a single fragment. Rather the description is split into its constituent sentances. And the first sentence is returned. Those thereafter are not. This was not the behaviour in elasticsearch 5 where the whole field would be returned but has been introduced in es6."
}'

curl -H 'Content-Type: application/json' -XGET 'http://localhost:9200/test-index/services/_search?pretty&search_type=dfs_query_then_fetch' -d '{
  "highlight": {
    "encoder": "html",
    "fields": {
      "someDescription": {
        "no_match_size": 500,
        "number_of_fragments": 0
      }
    },
    "post_tags": [
      "</mark>"
    ],
    "pre_tags": [
      "<mark class=\u0027search-result-highlighted-text\u0027>"
    ]
  },
  "query": {
    "match_all": {}
  },
  "size": 100
}'

curl -XDELETE 'http://localhost:9200/test-*?pretty' -d ''

The response on ES5:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test-index",
        "_type" : "services",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "someDescription" : "Here is a description that is quite long and describes the problem that we have had with the new elasticsearch highlighting engine. The length of this field is 500 characters and the problem. Is that the whole description is not returned as a single fragment. Rather the description is split into its constituent sentances. And the first sentence is returned. Those thereafter are not. This was not the behaviour in elasticsearch 5 where the whole field would be returned but has been introduced in es6."
        },
        "highlight" : {
          "someDescription" : [
            "Here is a description that is quite long and describes the problem that we have had with the new elasticsearch highlighting engine. The length of this field is 500 characters and the problem. Is that the whole description is not returned as a single fragment. Rather the description is split into its constituent sentances. And the first sentence is returned. Those thereafter are not. This was not the behaviour in elasticsearch 5 where the whole field would be returned but has been introduced in"
          ]
        }
      }
    ]
  }
}
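
A quick check of the "up to the no_match_size" claim, as a sketch: the ES5 fragment above is the `_source` text without the trailing ` es6.`, so its length can be compared with the `no_match_size` of 500. The variable names (`SOURCE`, `ES5_FRAGMENT`, `NO_MATCH_SIZE`) are hypothetical and only used for this illustration.

```shell
# Field value copied from the indexing step above.
SOURCE='Here is a description that is quite long and describes the problem that we have had with the new elasticsearch highlighting engine. The length of this field is 500 characters and the problem. Is that the whole description is not returned as a single fragment. Rather the description is split into its constituent sentances. And the first sentence is returned. Those thereafter are not. This was not the behaviour in elasticsearch 5 where the whole field would be returned but has been introduced in es6.'

# The ES5 highlight fragment equals the source without the final " es6.".
ES5_FRAGMENT="${SOURCE% es6.}"

# Value of no_match_size used in the query.
NO_MATCH_SIZE=500

echo "source length:   ${#SOURCE}"
echo "fragment length: ${#ES5_FRAGMENT}"
echo "no_match_size:   ${NO_MATCH_SIZE}"
```

If the fragment length comes out at or just under 500, the ES5 output is consistent with the whole field being returned and cut at a word boundary by no_match_size.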

The response on ES6:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test-index",
        "_type" : "services",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "someDescription" : "Here is a description that is quite long and describes the problem that we have had with the new elasticsearch highlighting engine. The length of this field is 500 characters and the problem. Is that the whole description is not returned as a single fragment. Rather the description is split into its constituent sentances. And the first sentence is returned. Those thereafter are not. This was not the behaviour in elasticsearch 5 where the whole field would be returned but has been introduced in es6."
        },
        "highlight" : {
          "someDescription" : [
            "Here is a description that is quite long and describes the problem that we have had with the new elasticsearch highlighting engine."
          ]
        }
      }
    ]
  }
}
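
A possible way to narrow this down, offered as a sketch rather than a confirmed workaround: 6.0 changed the default highlighter to `unified`, so re-running the same query with `"type": "plain"` on the field should show whether the difference comes from the highlighter implementation. The names `BODY` and `run_plain_query` are hypothetical, the host and index are the ones from the steps above, and whether this restores the 5.x output is an assumption that has not been verified here.

```shell
# Same request as in the steps above, with "type": "plain" added to the field.
# Assumption: the default highlighter on 6.x is "unified"; forcing "plain"
# is only a diagnostic, not a confirmed fix.
BODY='{
  "highlight": {
    "encoder": "html",
    "fields": {
      "someDescription": {
        "type": "plain",
        "no_match_size": 500,
        "number_of_fragments": 0
      }
    },
    "post_tags": ["</mark>"],
    "pre_tags": ["<mark class=\u0027search-result-highlighted-text\u0027>"]
  },
  "query": { "match_all": {} },
  "size": 100
}'

# Hypothetical helper: sends the request to the local node used above.
run_plain_query() {
  curl -s -H 'Content-Type: application/json' \
    -XGET 'http://localhost:9200/test-index/services/_search?pretty' \
    -d "$BODY"
}
```

Call `run_plain_query` after indexing the document and before the `-XDELETE` step, then compare the `highlight` section with the two responses above.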
