-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add a new query type - ScriptScoreQuery #34533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new query type - ScriptScoreQuery #34533
Conversation
script_score query uses script to calculate document scores.
Added as a substitute for function_score with an intentation
to deprecate function_scoreq query.
```http
GET /_search
{
"query": {
"script_score" : {
"query": {
"match": { "message": "elasticsearch" }
},
"script" : {
"source": "Math.log(2 + doc['likes'].value)"
},
"min_score" : 2
}
}
}
```
Add several functions to painless to be used inside script_score:
double rational(double, double)
double sigmoid(double, double, double)
double decayGeoLinear(String, String, String, double, GeoPoint)
double decayGeoExp(String, String, String, double, GeoPoint)
double decayGeoGauss(String, String, String, double, GeoPoint)
double decayNumericLinear(String, String, String, double, double)
double decayNumericExp(String, String, String, double, double)
double decayNumericGauss(String, String, String, double, double)
double decayDateLinear(String, String, String, double, JodaCompatibleZonedDateTime)
double decayDateExp(String, String, String, double, JodaCompatibleZonedDateTime)
double decayDateGauss(String, String, String, double, JodaCompatibleZonedDateTime)
Date functions only works on dates in the default format and default time zone
|
Pinging @elastic/es-search-aggs |
|
jenkins test this please |
colings86
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mayya-sharipova I left some comments.
| a function to be used to compute a new score for each document returned | ||
| by the query. A script can be written using either | ||
| <<modules-scripting-painless, painless>> (default) or | ||
| <<modules-scripting-expression, expression>> languages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this a bit more future proof maybe instead of listing the acceptable languages here it would be better to link to the scripting documentation by saying something like "For more information on scripting see https://www.elastic.co/guide/en/elasticsearch/reference/6.4/modules-scripting.html" ?
| Besides these functions, there are a number of predefined functions | ||
| that can help you with scoring. We suggest you to use them instead of | ||
| rewriting equivalent functions of your own, as these functions try | ||
| to be the most efficient by using the internal painless caching mechanism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried about telling users there is caching here when its essentially just an internal optimisation. Maybe instead we can say "these functions try to be the most efficient by using the internal mechanisms."?
|
|
||
| * `double decayNumericLinear(String originStr, String scaleStr, String offsetStr, double decay)` | ||
| * `double decayNumericExp(String originStr, String scaleStr, String offsetStr, double decay, double docValue)` | ||
| * `double decayNumericGauss(String originStr, String scaleStr, String offsetStr, double decay, double docValue)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use Strings here in the numeric decay functions? Is it not possible to use Number here?
| public final String getWriteableName() { | ||
| return getName(); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why this needed to change? Was there a bug in the script function in the function score query?
| @Override | ||
| protected void doXContent(XContentBuilder builder, Params params) throws IOException { | ||
| builder.startObject(NAME); | ||
| if (query != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check necessary? We have a null check for query in the constructor so I don't think this can be null here?
| ScriptScoreFunction function = (ScriptScoreFunction) scriptScoreFunctionBuilder.toFunction(context); | ||
| Query query = this.query.toQuery(context); | ||
| if (query == null) { | ||
| query = new MatchAllDocsQuery(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What conditions lead to the toQuery() call emitting null? I am concerned that assuming null can be converted to a MatchAllDocsQuery might be trappy here
| String currentFieldName = null; | ||
| XContentParser.Token token; | ||
|
|
||
| while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this parsing can't be done using either ObjectParser or ConstructingObjectParser? I think if we can do it with one of those it would be great as its easier to see what they are doing
|
@colings86 Colin, thanks so much for the review, I will address your comments shortly |
| } | ||
| }; | ||
| return context.factoryClazz.cast(factory); | ||
| } else if (context.instanceClazz.equals(ScoreScript.class)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused on why this case is added. I see a couple issues with it:
- GenericElasticsearchScript is about to be removed (see Scripting: Remove SearchScript #34730), and all the uses of it (through SearchScript) have already been removed
- We already have caching of the compiled script in ScriptService, so why do we need special handling of caching here?
|
@colings86 @rjernst I have pushed new commits that address your feedback, can you please continue to review when you have available time |
rjernst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This script changes look good to me now. Just a couple small nits.
| import java.util.Set; | ||
|
|
||
| /** | ||
| * A query that allows for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incomplete javadoc?
| } | ||
|
|
||
| /** Return the score of the current document. */ | ||
| public double getScore() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this added? This would make score available in addition to _score (which is already available with get_score() below). Why do we need the additional score accessor, that deviates from the access in other scripts?
|
@colings86 @rjernst I have tried to address your comments. Can you please have another round of review, when you have available time? |
colings86
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mayya-sharipova I left a few more comments but I think this is close assuming @rjernst is happy with the changes on the scripting side
| */ | ||
| public class ScriptScoreQuery extends Query { | ||
| final Query subQuery; | ||
| final ScriptScoreFunction function; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the ultimate aim is to remove the Function Score Query entirely it would be great if we could make it so this query doesn't rely on the ScriptScoreFunction either and inlines that logic. I don't think thats completely necessary for this PR but we should consider doing it in a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. I can address this in the follow-up PR.
| public Query rewrite(IndexReader reader) throws IOException { | ||
| Query rewritten = super.rewrite(reader); | ||
| if (rewritten != this) { | ||
| return rewritten; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most of the Lucene query implementations the call to super.rewrite() is done on the final return line (in this case line 66) so I wonder if its worth doing the same here for readability and consistency? I think the logic would end up being the same we would just be checking the internals of this subclass first?
| InnerHitContextBuilder.extractInnerHits(query(), innerHits); | ||
| } | ||
|
|
||
| private static ConstructingObjectParser<ScriptScoreQueryBuilder, Void> PARSER = new ConstructingObjectParser<>(NAME, false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could we put the static blocks, methods and constants at the top of the class?
4d4e18e to
a52e743
Compare
a52e743 to
2e0b947
Compare
|
@rjernst @colings86 I have tried to address your feedback. Would you have any other comments for this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only comment is around random scores and whether we should have a function that caches the instance of Random so the same instance can be used for the whole request across all the documents. Other than that I think this is ready
| [source,js] | ||
| -------------------------------------------------- | ||
| "script" : { | ||
| "source" : "Random rnd = new Random(); rnd.setSeed(doc['field'].value); rnd.nextFloat()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should use a seed from a parameter rather than from a document?
Also, I assume this uses the parameter caching so the instance of Random is created once and is kept the same through the execution of the script on each doc? If not, then maybe we should do this as it will make searches with the same seed reproducible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I guess this can't do the caching since we are not using our own function. I think we should implement our own random function that caches the instance of Random with the provided seed?
rjernst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have 2 small last questions
| } | ||
|
|
||
| /** The leaf lookup for the Lucene segment this script was created for. */ | ||
| protected final LeafSearchLookup getLeafLookup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this accessed from somewhere? I don't see any uses in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst Thanks Ryan, I was copying this from a former SearchScript, this is removed now
| * A script used for adjusting the score on a per document basis. | ||
| */ | ||
| public abstract class ScoreScript { | ||
| public abstract class ScoreScript implements ScorerAware { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for needing to add ScorerAware?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst Thanks Ryan, I was copying this from a former SearchScript, this is removed now
|
@colings86 I have added random functions as you suggested, can you please continue to review when you have available time. |
colings86
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
rjernst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| @@ -1,3 +1,4 @@ | |||
|
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errant newline?
script_score query uses script to calculate document scores.
Added as a substitute for function_score with an intention
to deprecate
function_scorequery.Add several functions to painless to be used inside script_score:
double rational(double, double)
double sigmoid(double, double, double)
double decayGeoLinear(String, String, String, double, GeoPoint)
double decayGeoExp(String, String, String, double, GeoPoint)
double decayGeoGauss(String, String, String, double, GeoPoint)
double decayNumericLinear(double, double, double, double, double)
double decayNumericExp(double, double, double, double, double)
double decayNumericGauss(double, double, double, double, double)
double decayDateLinear(String, String, String, double, JodaCompatibleZonedDateTime)
double decayDateExp(String, String, String, double, JodaCompatibleZonedDateTime)
double decayDateGauss(String, String, String, double, JodaCompatibleZonedDateTime)
Date functions only works on dates in the default format and default time zone
Relates to #23850
Closes #27588, #30303