Replacing the function_score with discrete queries #27588

@polyfractal

Overview

The function_score query is powerful, but it can be unwieldy. It is a monolithic query with many parameters and options, which makes it hard for new users to learn and hard to tweak. Now that we've moved to BM25, its defaults (namely, multiplying scores) no longer work well.

We'd like to deprecate function_score and replace its functionality with a number of discrete queries. The guiding principle is that these replacement queries will be small, simple and single-purpose. Individual queries should be simpler to implement and maintain, easier for users to learn, and should provide the same functionality as function_score when mixed and matched as required.

New queries / needed functionality

Arbitrary numeric functions

This query would take an arbitrary numeric field and apply a mathematical function to it. For example, a document may have a popularity field that you wish to roll into the score somehow.

Functions should include those of field_value_factor, like logarithm, sqrt, reciprocal, etc. We'd also like to include a sigmoid and rational function. The query should also be able to return just the value itself. I think the random number functionality of function_score can be rolled into this too.

We could potentially implement this query entirely through scripting, assuming we can provide custom functions through the script context for more complicated operations like sigmoid.
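To make the idea concrete, here is a rough sketch of what such a query might compute per document. It is only an illustration: the function names, the `MODIFIERS` table and the exact sigmoid/rational forms are assumptions, not a proposed API. The `log1p`, `sqrt` and `reciprocal` entries follow the existing field_value_factor modifiers.

```python
import math

# Hypothetical modifier table; names follow field_value_factor's modifiers.
MODIFIERS = {
    "none": lambda v: v,                   # return the value itself
    "log1p": lambda v: math.log10(v + 1),  # common log of (value + 1)
    "sqrt": math.sqrt,
    "reciprocal": lambda v: 1.0 / v,
}

def rational(value, pivot):
    # Assumed rational form: saturates towards 1, equals 0.5 at value == pivot.
    return value / (value + pivot)

def sigmoid(value, pivot, exponent):
    # Assumed sigmoid form: like rational(), with an exponent controlling steepness.
    powered = value ** exponent
    return powered / (powered + pivot ** exponent)

def numeric_score(doc, field, modifier="none", factor=1.0):
    # Multiply the field value by factor, then apply the chosen modifier.
    return MODIFIERS[modifier](factor * doc[field])
```

For example, `numeric_score({"popularity": 9}, "popularity", "log1p")` takes the common logarithm of 10, and `rational(10, 10)` sits at the midpoint of the curve.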

Distance: Numeric and Time

This query would provide a set of decay functions that allow you to score the "distance" from a field value to some point. When dealing with dates this "distance" would essentially represent recency, while with numerics it'd just be a geometric distance.

We'll likely want numeric/time to be grouped together since the operations are essentially the same.
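As a sketch of what the decay functions could look like, the following assumes the new query keeps the shapes and the `scale`, `decay` and `offset` parameters of the existing function_score decay functions. The Python function names and the `days_between` helper are hypothetical. Dates reduce to the numeric case once the difference is expressed as a number.

```python
import math
from datetime import datetime

def _effective_distance(distance, offset):
    # Distances within the offset are treated as zero (no decay applied).
    return max(0.0, abs(distance) - offset)

def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    d = _effective_distance(distance, offset)
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-d ** 2 / (2.0 * sigma_sq))

def exp_decay(distance, scale, decay=0.5, offset=0.0):
    d = _effective_distance(distance, offset)
    return math.exp(math.log(decay) / scale * d)

def linear_decay(distance, scale, decay=0.5, offset=0.0):
    d = _effective_distance(distance, offset)
    s = scale / (1.0 - decay)
    return max(0.0, (s - d) / s)

def days_between(timestamp, origin):
    # Hypothetical helper: recency expressed as a numeric distance in days.
    return abs((origin - timestamp).total_seconds()) / 86400.0
```

In all three functions the score is 1 at the origin and drops to `decay` at a distance of `scale`, so scoring recency is just `exp_decay(days_between(doc_date, now), scale=10)` or similar.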

Distance: Geo

This would be similar to numeric/time distance, except operating on geo points and physical distance. The thinking is that geo would be separate, as we may have plans in the future for geo querying to become more robust in general. Geo points are also sufficiently different that the syntax will likely need to be a bit different.

Otherwise the functionality is similar, providing a set of decay functions.
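A minimal sketch of the geo variant, assuming great-circle (haversine) distance on a spherical Earth and an exponential decay. The function names, the `(lat, lon)` tuple convention and the radius constant are illustrative choices, not a proposed syntax.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_exp_decay(point, origin, scale_km, decay=0.5):
    # point and origin are (lat, lon) tuples; score is `decay` at scale_km away.
    distance = haversine_km(point[0], point[1], origin[0], origin[1])
    return math.exp(math.log(decay) / scale_km * distance)
```

The decay step is the same as in the numeric case; only the distance computation differs, which is the main argument for sharing the decay functions while keeping the query syntax separate.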

Potential queries

We kicked around some other potential queries, but we're unsure whether they are needed or quite how they would work. Some of these have a direct counterpart in function_score; others are only tangentially related.

Min/Max query

Function score allows taking the min or max score from the set of functions. We could potentially add a combination query that executes the children, then only passes on the min/max score from the children.
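Roughly, the combination step could behave like the sketch below. Representing a non-matching child as `None` and the name `combine_scores` are assumptions made for illustration.

```python
def combine_scores(child_scores, mode):
    # child_scores: one entry per child query; None means the child did not match.
    matched = [score for score in child_scores if score is not None]
    if not matched:
        return None  # no child matched, so the combination does not match either
    if mode == "min":
        return min(matched)
    if mode == "max":
        return max(matched)
    raise ValueError("mode must be 'min' or 'max'")
```

The chosen score is passed on unchanged, e.g. `combine_scores([1.5, None, 0.2], "max")` yields the first child's score.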

Bandpass query / cutoff score query

Function score allows setting max thresholds on scores generated by the functions. It may be useful to have a query that allows setting min or max or both, and the scores that come out of the query would be limited to those values. Similar to a constant score in that it wraps a set of queries, but instead of setting a single score it just limits the scores that are generated to the range.

Note this is different from the above min/max query, in that it limits the produced score to a range, whereas min/max simply takes the min or max score as-is.

This would allow us to remove min_score (#13115) and limit individual queries (#17348).
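One possible reading of the bandpass behaviour is a plain clamp, sketched below. The name `clamp_score` and its parameters are hypothetical, and the sketch assumes out-of-range scores are pulled to the nearest bound; whether such documents should instead be dropped is left open here.

```python
def clamp_score(score, min_score=None, max_score=None):
    # Either bound may be omitted; with both omitted the score passes through.
    if min_score is not None and max_score is not None and min_score > max_score:
        raise ValueError("min_score must not exceed max_score")
    if min_score is not None:
        score = max(score, min_score)
    if max_score is not None:
        score = min(score, max_score)
    return score
```

Unlike the min/max combination, this changes the score itself: `clamp_score(7.3, max_score=5.0)` produces the bound, not any child's original score.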

In-order Boolean

One unique aspect of function_score is the ability to use only the "first" matching function. There's no other place in the query DSL that allows "short-circuiting" evaluation. The equivalent "first" functionality would require a complex set of must/must_not boolean conditions.

So it may be useful to implement an "in-order" boolean query which evaluates child queries in their order and allows exiting after some criteria is met (first, etc).

It's not entirely clear how useful this functionality is outside of the function_score though, and we'd be interested to hear use-cases for this first behavior.
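For discussion, the "first" behaviour amounts to something like the following sketch. Modelling each clause as a `(matches, score)` pair of callables is an assumption made for illustration; it says nothing about how the query would actually be expressed in the DSL.

```python
def first_match_score(doc, clauses):
    # clauses: ordered list of (matches, score) callables, evaluated in order.
    for matches, score in clauses:
        if matches(doc):
            return score(doc)  # short-circuit: later clauses are never evaluated
    return None  # no clause matched

# Hypothetical clauses: featured documents win over the popularity fallback.
clauses = [
    (lambda d: d.get("featured", False), lambda d: 10.0),
    (lambda d: "popularity" in d, lambda d: float(d["popularity"])),
]
```

With these clauses, a document that is both featured and popular gets the score from the first clause only, which is what a must/must_not chain would otherwise have to spell out.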

Scripted Boolean

Related to all the above, it may be useful to have a boolean that executes all the child queries and provides those scores to a script, which would then decide how to combine the scores. This could allow very sophisticated behavior by letting the user script which scores are included (e.g. only if they meet a criterion, or only if the total boolean score exceeds a threshold) and how they are combined.

The downside is that all child queries must be run so that all scores can be collected, and the syntax/script interface would likely be complex.
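A sketch of the idea, with a Python callable standing in for the script. The names `scripted_bool` and `sum_if_above`, and the choice to hand the script a dict of named child scores, are assumptions; the real script interface is undecided.

```python
def scripted_bool(doc, children, combine):
    # children: dict of name -> callable returning a score, or None for no match.
    # Every child is executed so the script sees the full set of scores.
    scores = {name: query(doc) for name, query in children.items()}
    return combine(scores)

def sum_if_above(threshold):
    # Example "script": sum the matching scores, but only if the total
    # exceeds the threshold; otherwise report no match.
    def combine(scores):
        total = sum(s for s in scores.values() if s is not None)
        return total if total > threshold else None
    return combine
```

Note that `scripted_bool` cannot skip any child, which is the cost mentioned above.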

Related issues

#23850
#15670 (by boosting each individual decay query)

/cc @mayya-sharipova @colings86 @clintongormley did I miss anything?
