Open
Labels
:Search Relevance/Highlighting, >enhancement, Team:Search Relevance
Description
Elasticsearch Version
8.10.4
Installed Plugins
No response
Java Version
bundled
OS Version
Elastic Cloud - GCP - Iowa (us-central1)
Problem Description
Highlighting fails when the search uses a span_field_masking query. With the highlighter enabled, the request returns the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "field 'text' was indexed without offsets, cannot highlight"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test_mask",
        "node": "jUZ9p0ZtR6-xYevegW6O_Q",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "field 'text' was indexed without offsets, cannot highlight"
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "field 'text' was indexed without offsets, cannot highlight",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "field 'text' was indexed without offsets, cannot highlight"
      }
    }
  },
  "status": 400
}
If I set "index_options": "offsets" in the mapping of the masked field 'stem', highlighting works as expected. However, I'm puzzled as to why the highlighter requires indexing offsets. I'd like to understand why the highlighter doesn't re-analyze the text to calculate offsets dynamically. My concern is that indexing offsets increases the index size, which I want to avoid.
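For reference, the workaround described above could look roughly like the following. This is only a sketch: it reuses the two-field mapping from the reproduction steps, sets `index_options` on `stem` as the report states (whether `text` would also need it is not confirmed here), and uses a hypothetical index name, `test_mask_offsets`, because `index_options` generally cannot be changed on an existing field. The second request uses the analyze index disk usage API to see how much space each field takes, which may help quantify the size concern.

```console
# Hypothetical index name; same fields as test_mask, with offsets indexed on "stem"
PUT test_mask_offsets
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "whitespace"
      },
      "stem": {
        "type": "text",
        "analyzer": "whitespace",
        "index_options": "offsets"
      }
    }
  }
}

# After indexing the same documents, inspect per-field disk usage
# and compare it with the same call against test_mask
POST test_mask_offsets/_disk_usage?run_expensive_tasks=true
```

On a one-document test index the difference will be negligible, so a comparison is only meaningful on a representative data sample.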
Steps to Reproduce
PUT test_mask
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "whitespace"
      },
      "stem": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}

PUT test_mask/_doc/1
{
  "text": "a _ a b",
  "stem": "_ b _ _"
}

GET test_mask/_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_term": {
            "text": {
              "value": "a"
            }
          }
        },
        {
          "span_field_masking": {
            "field": "text",
            "query": {
              "span_term": {
                "stem": {
                  "value": "b"
                }
              }
            }
          }
        }
      ],
      "slop": 0,
      "in_order": true
    }
  },
  "highlight": {
    "pre_tags": "(",
    "post_tags": ")",
    "fields": {
      "*": {}
    },
    "type": "unified"
  }
}
Expected result
I was expecting the highlight to look like this:
"highlight": {
  "text": [
    "(a) (_) a b"
  ]
}