Highlight is slow after 8.10.0 with weight_matches_mode enabled #120565

@gregolsen

Elasticsearch Version

8.10.0

Installed Plugins

analysis-kuromoji,analysis-smartcn,analysis-nori

Java Version

bundled

OS Version

Linux es-data-i-xxxxx 6.1.119-129.201.amzn2023.aarch64 #1 SMP Tue Dec 3 21:06:52 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Problem Description

After upgrading from 8.6.1 to 8.16.1 we noticed a massive drop in query performance. Search requests that took ~200ms on 8.6.1 started to take over 9 seconds.

The search phase of the query on 8.16.1 still takes the same time as before; the slowdown comes from the highlight phase, which takes over 9 seconds:

{
  "type": "HighlightPhase",
  "description": "",
  "time_in_nanos": 9640635470,
  "breakdown": {
    "process_count": 10,
    "process": 9640634394,
    "next_reader": 1076,
    "next_reader_count": 2
  }
}
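To put the profile numbers in more readable units, here is a minimal Python sketch that converts the entry above to seconds. It assumes `process_count` is the number of hits the phase processed (it matches `size: 10` in the request); the helper name `summarize_phase` is made up for illustration.

```python
import json

# Fetch-phase profile entry copied from the report above.
PROFILE_ENTRY = json.loads("""
{
  "type": "HighlightPhase",
  "description": "",
  "time_in_nanos": 9640635470,
  "breakdown": {
    "process_count": 10,
    "process": 9640634394,
    "next_reader": 1076,
    "next_reader_count": 2
  }
}
""")

NANOS_PER_SECOND = 1_000_000_000


def summarize_phase(entry):
    """Return total seconds and average seconds per process() call."""
    breakdown = entry["breakdown"]
    total_s = entry["time_in_nanos"] / NANOS_PER_SECOND
    # Assumption: one process() call per hit, so this is time per hit.
    per_hit_s = breakdown["process"] / breakdown["process_count"] / NANOS_PER_SECOND
    return {"phase": entry["type"], "total_s": total_s, "per_hit_s": per_hit_s}


summary = summarize_phase(PROFILE_ENTRY)
print(f"{summary['phase']}: {summary['total_s']:.2f}s total, "
      f"{summary['per_hit_s']:.3f}s per hit")
```

Under that assumption, almost all of the 9.64 seconds is spent in `process`, at roughly one second per highlighted hit.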

After bisecting all the versions between 8.6.1 and 8.16.1 we discovered that the issue was introduced in 8.10.0, where the only change to the highlight code was PR #96068.

With require_field_match: false the query does get faster but still takes over a second. However, after fully disabling the feature with index.highlight.weight_matches_mode.enabled: false, the performance issue is fully mitigated: query time drops to ~200ms and CPU usage on the cluster data nodes goes down.
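For anyone who wants to try the same mitigation, the following Python sketch builds the settings request without sending it. The setting name comes from this report; the index name `my-index` is a placeholder, and whether the setting can be changed on a live index through the update index settings API is an assumption to verify for your version.

```python
import json

# Setting name taken from the report; everything else here is illustrative.
WEIGHT_MATCHES_SETTING = "index.highlight.weight_matches_mode.enabled"


def weight_matches_settings_body(enabled):
    """Build a settings body that sets the weight-matches highlight flag."""
    return {WEIGHT_MATCHES_SETTING: bool(enabled)}


def settings_request(index_name, enabled):
    """Describe the request as (method, path, body) without sending it."""
    body = json.dumps(weight_matches_settings_body(enabled))
    # Assumption: the setting is accepted by PUT /<index>/_settings.
    return "PUT", f"/{index_name}/_settings", body


method, path, body = settings_request("my-index", enabled=False)
print(method, path)
print(body)
```

The request is only described, not executed, so it can be pasted into whatever client or console is in use.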

Hot threads data:

   Hot threads at 2025-01-21T22:57:18.120Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   100.0% [cpu=92.7%, other=7.3%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[qMac-4.local][search][T#3]'
     2/10 snapshots sharing following 45 elements
       app/[email protected]/org.apache.lucene.util.automaton.Operations.getLiveStatesToAccept(Operations.java:974)
       app/[email protected]/org.apache.lucene.util.automaton.Operations.getLiveStates(Operations.java:926)
       app/[email protected]/org.apache.lucene.util.automaton.Operations.removeDeadStates(Operations.java:1007)
       app/[email protected]/org.apache.lucene.util.automaton.LevenshteinAutomata.toAutomaton(LevenshteinAutomata.java:220)
       app/[email protected]/org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)
       app/[email protected]/org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:391)
       app/[email protected]/org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:149)
       app/[email protected]/org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:126)
       app/[email protected]/org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:208)
       app/[email protected]/org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:67)
       app/[email protected]/org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:57)
       app/[email protected]/org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
       app/[email protected]/org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:325)
       app/[email protected]/org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:293)
       app/[email protected]/org.apache.lucene.search.DisjunctionMaxQuery.rewrite(DisjunctionMaxQuery.java:234)
       app/[email protected]/org.apache.lucene.search.BoostQuery.rewrite(BoostQuery.java:76)
       app/[email protected]/org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:293)
       app/[email protected]/org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:799)
       app/[email protected]/org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:145)
       app/[email protected]/org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
       app/[email protected]/org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.getOffsetsEnum(MemoryIndexOffsetStrategy.java:119)
       app/[email protected]/org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:80)

   Hot threads at 2025-01-21T22:58:05.892Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   100.0% [cpu=88.2%, other=11.8%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[qMac-4.local][search][T#11]'
     4/10 snapshots sharing following 43 elements
       app/[email protected]/org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)
       app/[email protected]/org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:391)
       app/[email protected]/org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:149)
       app/[email protected]/org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:126)
       app/[email protected]/org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:208)
       app/[email protected]/org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:67)
       app/[email protected]/org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:57)
       app/[email protected]/org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
       app/[email protected]/org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:325)
       app/[email protected]/org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:293)
       app/[email protected]/org.apache.lucene.search.DisjunctionMaxQuery.rewrite(DisjunctionMaxQuery.java:234)
       app/[email protected]/org.apache.lucene.search.BoostQuery.rewrite(BoostQuery.java:76)
       app/[email protected]/org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:293)
       app/[email protected]/org.apache.lucene.search.ConstantScoreQuery.rewrite(ConstantScoreQuery.java:44)
       app/[email protected]/org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:288)
       app/[email protected]/org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:799)
       app/[email protected]/org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:145)
       app/[email protected]/org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
       app/[email protected]/org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.getOffsetsEnum(MemoryIndexOffsetStrategy.java:119)
       app/[email protected]/org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:80)
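Read from bottom to top, both traces show the highlighter rewriting the query and rebuilding the fuzzy automaton while highlighting a field. The Python sketch below condenses a few of the frames above (copied with the module prefix dropped) into a caller-to-callee chain; the regular expression and the helper name `call_chain` are illustrative, not part of any Elasticsearch tooling.

```python
import re

# A few frames copied from the hot threads output above (innermost call first).
FRAMES = """
org.apache.lucene.util.automaton.LevenshteinAutomata.toAutomaton(LevenshteinAutomata.java:220)
org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)
org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:208)
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:799)
org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:145)
org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:80)
"""

# Captures the simple class name and the method name of each frame.
FRAME_RE = re.compile(r"([\w$]+)\.([\w$<>]+)\(\w+\.java:\d+\)")


def call_chain(trace):
    """Return 'Class.method' names ordered from outermost caller to innermost."""
    calls = [f"{m.group(1)}.{m.group(2)}" for m in FRAME_RE.finditer(trace)]
    return list(reversed(calls))


chain = call_chain(FRAMES)
print(" -> ".join(chain))
```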

Steps to Reproduce

Query:

{
    "profile": false,
    "query": {
        "bool": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "app_id": 123456789
                            }
                        },
                        {
                            "term": {
                                "state": "published"
                            }
                        },
                        {
                            "term": {
                                "visible": true
                            }
                        }
                    ],
                    "should": [
                        {
                            "multi_match": {
                                "fields": [
                                    "title",
                                    "summary",
                                    "heading_text",
                                    "subheading_text",
                                    "body_text"
                                ],
                                "query": "TEXT",
                                "type": "bool_prefix"
                            }
                        },
                        {
                            "multi_match": {
                                "fields": [
                                    "title.full_words^2.5",
                                    "summary.full_words^2",
                                    "heading_text.full_words^1.5",
                                    "subheading_text.full_words^1.5",
                                    "body_text.full_words"
                                ],
                                "query": "TEXT",
                                "type": "phrase",
                                "minimum_should_match": "50%",
                                "slop": 50,
                                "boost": 10
                            }
                        },
                        {
                            "multi_match": {
                                "fields": [
                                    "title.full_words^2.5",
                                    "summary.full_words^2",
                                    "heading_text.full_words^1.5",
                                    "subheading_text.full_words^1.5",
                                    "body_text.full_words"
                                ],
                                "query": "TEXT",
                                "type": "best_fields",
                                "boost": 7
                            }
                        },
                        {
                            "multi_match": {
                                "fields": [
                                    "title.analyzed_shingle^2.5",
                                    "summary.analyzed_shingle^2",
                                    "heading_text.analyzed_shingle^1.5",
                                    "subheading_text.analyzed_shingle^1.5",
                                    "body_text.analyzed_shingle"
                                ],
                                "query": "TEXT",
                                "type": "phrase",
                                "minimum_should_match": "50%",
                                "slop": 50,
                                "boost": 5
                            }
                        },
                        {
                            "multi_match": {
                                "fields": [
                                    "title.analyzed_unigram^2.5",
                                    "summary.analyzed_unigram^2",
                                    "heading_text.analyzed_unigram^1.5",
                                    "subheading_text.analyzed_unigram^1.5",
                                    "body_text.analyzed_unigram"
                                ],
                                "query": "TEXT",
                                "type": "best_fields",
                                "analyzer": "query_analyzer",
                                "boost": 1
                            }
                        },
                        {
                            "multi_match": {
                                "fields": [
                                    "title",
                                    "summary",
                                    "heading_text",
                                    "subheading_text",
                                    "body_text"
                                ],
                                "query": "TEXT",
                                "type": "best_fields",
                                "boost": 0.2,
                                "fuzziness": "auto",
                                "prefix_length": 3
                            }
                        }
                    ],
                    "minimum_should_match": 1
                }
            },
            "should": [
                {
                    "multi_match": {
                        "fields": [
                            "title",
                            "summary",
                            "heading_text",
                            "subheading_text",
                            "body_text"
                        ],
                        "query": "TEXT",
                        "type": "bool_prefix"
                    }
                },
                {
                    "multi_match": {
                        "fields": [
                            "title.full_words^2.5",
                            "summary.full_words^2",
                            "heading_text.full_words^1.5",
                            "subheading_text.full_words^1.5",
                            "body_text.full_words"
                        ],
                        "query": "TEXT",
                        "type": "phrase",
                        "minimum_should_match": "50%",
                        "slop": 50,
                        "boost": 10
                    }
                },
                {
                    "multi_match": {
                        "fields": [
                            "title.full_words^2.5",
                            "summary.full_words^2",
                            "heading_text.full_words^1.5",
                            "subheading_text.full_words^1.5",
                            "body_text.full_words"
                        ],
                        "query": "TEXT",
                        "type": "best_fields",
                        "boost": 7
                    }
                },
                {
                    "multi_match": {
                        "fields": [
                            "title.analyzed_shingle^2.5",
                            "summary.analyzed_shingle^2",
                            "heading_text.analyzed_shingle^1.5",
                            "subheading_text.analyzed_shingle^1.5",
                            "body_text.analyzed_shingle"
                        ],
                        "query": "TEXT",
                        "type": "phrase",
                        "minimum_should_match": "50%",
                        "slop": 50,
                        "boost": 5
                    }
                },
                {
                    "multi_match": {
                        "fields": [
                            "title.analyzed_unigram^2.5",
                            "summary.analyzed_unigram^2",
                            "heading_text.analyzed_unigram^1.5",
                            "subheading_text.analyzed_unigram^1.5",
                            "body_text.analyzed_unigram"
                        ],
                        "query": "TEXT",
                        "type": "best_fields",
                        "analyzer": "query_analyzer",
                        "boost": 1
                    }
                },
                {
                    "multi_match": {
                        "fields": [
                            "title",
                            "summary",
                            "heading_text",
                            "subheading_text",
                            "body_text"
                        ],
                        "query": "TEXT",
                        "type": "best_fields",
                        "boost": 0.2,
                        "fuzziness": "auto",
                        "prefix_length": 3
                    }
                }
            ]
        }
    },
    "from": 0,
    "size": 10,
    "track_total_hits": true,
    "timeout": "21s",
    "highlight": {
        "require_field_match": true,
        "number_of_fragments": 0,
        "fragment_size": 100,
        "pre_tags": [
            "<highlight>"
        ],
        "post_tags": [
            "</highlight>"
        ],
        "fields": {
            "title": {},
            "title.*": {},
            "summary": {},
            "summary.*": {},
            "heading_text": {},
            "heading_text.*": {},
            "subheading_text": {},
            "subheading_text.*": {},
            "body_text": {
                "number_of_fragments": 3
            },
            "body_text.*": {
                "number_of_fragments": 3
            }
        }
    }
}
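The request above repeats the same six multi_match clauses in the filter and in the top-level should. As a convenience for reproducing, here is a Python sketch that generates the same body and exposes `require_field_match` as a parameter so the two variants mentioned earlier can be compared. All function names are made up for illustration, and the sketch only builds the body; it does not send it.

```python
import json

BASE_FIELDS = ["title", "summary", "heading_text", "subheading_text", "body_text"]
BOOSTS = ["^2.5", "^2", "^1.5", "^1.5", ""]  # per-field boosts on sub-fields


def fields(suffix=None):
    """Return the plain field names, or boosted '<field>.<suffix>' names."""
    if suffix is None:
        return list(BASE_FIELDS)
    return [f"{name}.{suffix}{boost}" for name, boost in zip(BASE_FIELDS, BOOSTS)]


def multi_match(field_list, text, **options):
    return {"multi_match": {"fields": field_list, "query": text, **options}}


def should_clauses(text):
    """The six clauses used both in the filter and in the scoring should."""
    phrase = {"type": "phrase", "minimum_should_match": "50%", "slop": 50}
    return [
        multi_match(fields(), text, type="bool_prefix"),
        multi_match(fields("full_words"), text, **phrase, boost=10),
        multi_match(fields("full_words"), text, type="best_fields", boost=7),
        multi_match(fields("analyzed_shingle"), text, **phrase, boost=5),
        multi_match(fields("analyzed_unigram"), text, type="best_fields",
                    analyzer="query_analyzer", boost=1),
        # The fuzzy clause that shows up in the hot threads output.
        multi_match(fields(), text, type="best_fields", boost=0.2,
                    fuzziness="auto", prefix_length=3),
    ]


def highlight(require_field_match=True):
    spec = {}
    for name in BASE_FIELDS:
        options = {"number_of_fragments": 3} if name == "body_text" else {}
        spec[name] = dict(options)
        spec[f"{name}.*"] = dict(options)
    return {
        "require_field_match": require_field_match,
        "number_of_fragments": 0,
        "fragment_size": 100,
        "pre_tags": ["<highlight>"],
        "post_tags": ["</highlight>"],
        "fields": spec,
    }


def build_request(text, require_field_match=True):
    filters = [
        {"term": {"app_id": 123456789}},
        {"term": {"state": "published"}},
        {"term": {"visible": True}},
    ]
    return {
        "profile": False,
        "query": {"bool": {
            "filter": {"bool": {
                "must": filters,
                "should": should_clauses(text),
                "minimum_should_match": 1,
            }},
            "should": should_clauses(text),
        }},
        "from": 0,
        "size": 10,
        "track_total_hits": True,
        "timeout": "21s",
        "highlight": highlight(require_field_match),
    }


request = build_request("TEXT")
print(json.dumps(request, indent=4))
```

Calling `build_request("TEXT", require_field_match=False)` gives the faster but still slow variant described in the problem description.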

Logs (if relevant)

No response

Labels

:Search Relevance/Highlighting, >bug, Team:Search Relevance, priority:normal
