Wrap stacked tokens in `match` query in a BlendedTerms query for better scoring

Stacked tokens (tokens in the same position) in a `match` query usually represent alternatives, e.g. query-time synonym expansion or fuzzy terms.

Scoring these alternatives as independent terms tends to favour the rarer ones, which (especially with fuzzy queries) is likely to be the wrong choice (see #5883 and #3125).
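To make the problem concrete, here is a rough sketch rather than the actual Lucene or Elasticsearch code. It assumes a classic IDF of the form `1 + ln(numDocs / (docFreq + 1))`, made-up document frequencies, and a simple blending rule in which every stacked alternative gets the highest document frequency in its group. The real BlendedTermQuery may adjust statistics differently.

```python
import math


def idf(doc_freq, num_docs):
    """Classic-style IDF; the exact formula is an assumption for illustration."""
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))


def blended_doc_freqs(doc_freqs):
    """Give every stacked alternative the same (maximum) document frequency.

    Hypothetical blending rule: it treats the alternatives as 'one thing'
    so that no single variant wins just because it is rare.
    """
    max_df = max(doc_freqs.values())
    return {term: max_df for term in doc_freqs}


NUM_DOCS = 1_000_000

# Hypothetical statistics: the user typed "quick"; fuzzy expansion stacked
# the much rarer "quirk" in the same position.
stacked = {"quick": 50_000, "quirk": 120}

# Scored independently, the rare fuzzy variant gets the higher IDF.
per_term = {term: idf(df, NUM_DOCS) for term, df in stacked.items()}

# With blended statistics, both alternatives get the same IDF.
blended = {
    term: idf(df, NUM_DOCS)
    for term, df in blended_doc_freqs(stacked).items()
}

for term in stacked:
    print(f"{term}: per-term idf={per_term[term]:.2f}, blended idf={blended[term]:.2f}")
```

With per-term statistics, a document matching only the fuzzy variant `quirk` can outscore one matching the term the user actually typed. With blended statistics the two are on equal footing.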

From https://github.com/elasticsearch/elasticsearch/pull/8352#issuecomment-61847572

> The BlendedTermQuery should be used whenever two query terms are synonyms of each other and should be treated as 'one thing'. It tries to adjust statistics independently of the scoring function (which may have no concept of IDF) to deal with the problem.
> 
> But I think for it to work, it would need per-term boost support? Then we need a rewrite method that can build this instead of BooleanQuery, it would look a lot like the boolean one: https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/MultiTermQuery.java#L140

Per-term boost support is required to be able to take the fuzzy edit distance into account.
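As an illustration of what a per-term boost could look like, the sketch below derives a boost from the edit distance between the query term and each fuzzy alternative. The formula `1 - distance / min(len(query), len(candidate))` and the helper names are assumptions chosen for this example, not the API of Lucene or Elasticsearch.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_boost(query_term, candidate):
    """Hypothetical boost: 1.0 for an exact match, lower as edits increase."""
    distance = edit_distance(query_term, candidate)
    return 1.0 - distance / min(len(query_term), len(candidate))


def boosted_alternatives(query_term, candidates):
    """Map each stacked alternative to its per-term boost."""
    return {c: fuzzy_boost(query_term, c) for c in candidates}


boosts = boosted_alternatives("quick", ["quick", "quirk", "quit"])
for term, boost in boosts.items():
    print(f"{term}: boost={boost:.2f}")
```

Without a boost of this kind, blending the statistics would score an exact match and a two-edit variant identically, so the edit distance would be lost.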


Wrap stacked tokens in `match` query in a BlendedTerms query for better scoring #9103
