
The unified highlighter does not work correctly when a query is performed with a slop value different from 0 #122596

@CarlosLoboZamarro

Description


Elasticsearch Version

8.17.1

Installed Plugins

No response

Java Version

bundled

OS Version

20.04.1-Ubuntu

Problem Description

When a match_phrase query is run with a non-zero slop, the highlighted words in the result do not correspond to the words that match the query.

For example, if the field being searched contains "test with slop different from zero" and the match_phrase query is "test with from zero" with slop 2, the expected result is:

<em>test with</em> slop different <em>from zero</em>

but the actual result is:

<em>test with slop different from zero</em>
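To see why the document matches at all with slop 2, here is a simplified Python model of sloppy phrase matching. It assumes whitespace tokenization and that each query term occurs exactly once in the text. Lucene's real matcher also handles repeated terms and picks the tightest match, so treat this as a sketch. Each query term's offset is its position in the document minus its position in the query, and the phrase matches if the spread between the largest and smallest offset is within the slop:

```python
def sloppy_phrase_distance(doc_tokens, query_tokens):
    """Return the slop needed for query_tokens to match doc_tokens as a phrase.

    Simplified model: assumes every query term appears exactly once in the
    document (no repeated terms), so each term has a single position.
    """
    # Position of each term in the document (first occurrence).
    doc_pos = {}
    for i, tok in enumerate(doc_tokens):
        doc_pos.setdefault(tok, i)
    # Offset of each query term: where it landed minus where the phrase expects it.
    offsets = [doc_pos[term] - q_pos for q_pos, term in enumerate(query_tokens)]
    # The match length is the spread between the largest and smallest offset.
    return max(offsets) - min(offsets)


doc = "test with slop different from zero".split()
query = "test with from zero".split()
distance = sloppy_phrase_distance(doc, query)
print(distance)       # 2
print(distance <= 2)  # True: matches with "slop": 2
print(distance <= 1)  # False: would not match with "slop": 1
```

Here "test" and "with" have offset 0 and "from" and "zero" have offset 2, so the two matched pairs are separated by the two unmatched words "slop different".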

I think this is related to change #96068, because highlighting works correctly when Weight#matches mode is disabled with the index setting "index.highlight.weight_matches_mode.enabled": false.
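The output shapes discussed in this issue can be reproduced with a small Python sketch that wraps matched token positions in <em> tags. It only illustrates the shapes; it is not how the unified highlighter is implemented. `group_runs` marks each contiguous run of matched terms (the expected result), `whole_span` marks everything from the first to the last matched term (what is returned with weight matches enabled), and `per_term` marks each term separately (the output with the setting disabled). All three function names are hypothetical. The matched positions 0, 1, 4 and 5 are those of test, with, from and zero in the example text.

```python
def group_runs(tokens, matched):
    """Wrap each run of consecutive matched positions in a single <em> tag."""
    out, i = [], 0
    while i < len(tokens):
        if i in matched:
            j = i
            # Extend the run while the next position is also matched.
            while j + 1 < len(tokens) and j + 1 in matched:
                j += 1
            out.append("<em>" + " ".join(tokens[i:j + 1]) + "</em>")
            i = j + 1
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def whole_span(tokens, matched):
    """Wrap everything from the first to the last matched position in one tag."""
    lo, hi = min(matched), max(matched)
    inner = " ".join(tokens[lo:hi + 1])
    return " ".join(tokens[:lo] + ["<em>" + inner + "</em>"] + tokens[hi + 1:])


def per_term(tokens, matched):
    """Wrap each matched token in its own tag."""
    return " ".join(f"<em>{t}</em>" if i in matched else t for i, t in enumerate(tokens))


tokens = "test with slop different from zero".split()
matched = {0, 1, 4, 5}  # positions of test, with, from, zero
print(group_runs(tokens, matched))  # <em>test with</em> slop different <em>from zero</em>
print(whole_span(tokens, matched))  # <em>test with slop different from zero</em>
print(per_term(tokens, matched))    # <em>test</em> <em>with</em> slop different <em>from</em> <em>zero</em>
```

The difference between the expected and actual output is whether the gap between matched positions is kept outside the tags or absorbed into one span.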

Steps to Reproduce

Create an index:

PUT object-slop-case
{
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "1"
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "field1": {
        "type": "text"
      }
    }
  }
}

Create a document:

PUT object-slop-case/_doc/1
{
  "field1": "test with slop different from zero"
}

Request a search:

GET object-slop-case/_search
{
  "query": {
    "match_phrase": {
      "field1": {
        "slop": 2,
        "query": "test with from zero"
      }
    }
  },
  "highlight": {
    "fields": {
      "field1": {
        "type": "unified"
      }
    }
  }
}

Search result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.1006968,
    "hits": [
      {
        "_index": "object-slop-case",
        "_id": "1",
        "_score": 1.1006968,
        "_source": {
          "field1": "test with slop different from zero"
        },
        "highlight": {
          "field1": [
            "<em>test with slop different from zero</em>"
          ]
        }
      }
    ]
  }
}

In this result, the words "slop different" are highlighted even though they do not match the query.
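To catch this kind of regression automatically, a test could extract the highlighted words from a fragment and compare them with the query terms. Below is a minimal Python sketch. It assumes the default `<em>` / `</em>` tags and plain whitespace-separated words. `highlighted_words` is a hypothetical helper, not part of any Elasticsearch client:

```python
import re


def highlighted_words(fragment, pre="<em>", post="</em>"):
    """Return the words inside pre/post tag pairs of a highlight fragment."""
    pattern = re.escape(pre) + r"(.*?)" + re.escape(post)
    words = []
    for span in re.findall(pattern, fragment):
        words.extend(span.split())
    return words


fragment = "<em>test with slop different from zero</em>"  # from the search result above
query_terms = set("test with from zero".split())
extra = [w for w in highlighted_words(fragment) if w not in query_terms]
print(extra)  # ['slop', 'different']
```

A non-empty list flags words that are highlighted without matching the query. For the fragment returned with weight matches disabled, the list is empty.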

Now disable index.highlight.weight_matches_mode.enabled for the index:

PUT object-slop-case/_settings
{
  "index.highlight.weight_matches_mode.enabled": false
}

Then run the previous query again.
The result is:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.1006968,
    "hits": [
      {
        "_index": "object-slop-case",
        "_id": "1",
        "_score": 1.1006968,
        "_source": {
          "field1": "test with slop different from zero"
        },
        "highlight": {
          "field1": [
            "<em>test</em> <em>with</em> slop different <em>from</em> <em>zero</em>"
          ]
        }
      }
    ]
  }
}

In this case, the highlighted words are correct.

Thanks in advance

Logs (if relevant)

No response

Metadata

Assignees

No one assigned

    Labels

    :Search Relevance/Highlighting, >bug, Team:Search Relevance, priority:normal

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
