Skip to content

Conversation

@mayya-sharipova
Copy link
Contributor

@mayya-sharipova mayya-sharipova commented Oct 16, 2018

script_score query uses script to calculate document scores.
Added as a substitute for function_score with an intention
to deprecate function_score query.

GET /_search
{
    "query": {
        "script_score" : {
            "query": {
                "match": { "message": "elasticsearch" }
            },
            "script" : {
              "source": "_score + Math.log(2 + doc['likes'].value)"
            },
            "min_score" : 2
        }
    }
}

Add several functions to painless to be used inside script_score:

double rational(double, double)
double sigmoid(double, double, double)

double decayGeoLinear(String, String, String, double, GeoPoint)
double decayGeoExp(String, String, String, double, GeoPoint)
double decayGeoGauss(String, String, String, double, GeoPoint)

double decayNumericLinear(double, double, double, double, double)
double decayNumericExp(double, double, double, double, double)
double decayNumericGauss(double, double, double, double, double)

double decayDateLinear(String, String, String, double, JodaCompatibleZonedDateTime)
double decayDateExp(String, String, String, double, JodaCompatibleZonedDateTime)
double decayDateGauss(String, String, String, double, JodaCompatibleZonedDateTime)

Date functions only works on dates in the default format and default time zone

Relates to #23850
Closes #27588, #30303

script_score query uses script to calculate document scores.
Added as a substitute for function_score with an intentation
to deprecate function_scoreq query.

```http
GET /_search
{
    "query": {
        "script_score" : {
            "query": {
                "match": { "message": "elasticsearch" }
            },
            "script" : {
              "source": "Math.log(2 + doc['likes'].value)"
            },
            "min_score" : 2
        }
    }
}
```

Add several functions to painless to be used inside script_score:

double rational(double, double)
double sigmoid(double, double, double)

double decayGeoLinear(String, String, String, double, GeoPoint)
double decayGeoExp(String, String, String, double, GeoPoint)
double decayGeoGauss(String, String, String, double, GeoPoint)

double decayNumericLinear(String, String, String, double, double)
double decayNumericExp(String, String, String, double, double)
double decayNumericGauss(String, String, String, double, double)

double decayDateLinear(String, String, String, double, JodaCompatibleZonedDateTime)
double decayDateExp(String, String, String, double, JodaCompatibleZonedDateTime)
double decayDateGauss(String, String, String, double, JodaCompatibleZonedDateTime)

Date functions only works on dates in  the default format and default time zone
@mayya-sharipova mayya-sharipova added the :Search/Search Search-related issues that do not fall into other categories label Oct 16, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@Conky5
Copy link

Conky5 commented Oct 16, 2018

jenkins test this please

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova I left some comments.

a function to be used to compute a new score for each document returned
by the query. A script can be written using either
<<modules-scripting-painless, painless>> (default) or
<<modules-scripting-expression, expression>> languages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this a bit more future proof maybe instead of listing the acceptable languages here it would be better to link to the scripting documentation by saying something like "For more information on scripting see https://www.elastic.co/guide/en/elasticsearch/reference/6.4/modules-scripting.html" ?

Besides these functions, there are a number of predefined functions
that can help you with scoring. We suggest you to use them instead of
rewriting equivalent functions of your own, as these functions try
to be the most efficient by using the internal painless caching mechanism.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about telling users there is caching here when its essentially just an internal optimisation. Maybe instead we can say "these functions try to be the most efficient by using the internal mechanisms."?


* `double decayNumericLinear(String originStr, String scaleStr, String offsetStr, double decay)`
* `double decayNumericExp(String originStr, String scaleStr, String offsetStr, double decay, double docValue)`
* `double decayNumericGauss(String originStr, String scaleStr, String offsetStr, double decay, double docValue)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use Strings here in the numeric decay functions? Is it not possible to use Number here?

public final String getWriteableName() {
return getName();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why this needed to change? Was there a bug in the script function in the function score query?

@Override
protected void doXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject(NAME);
if (query != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check necessary? We have a null check for query in the constructor so I don't think this can be null here?

ScriptScoreFunction function = (ScriptScoreFunction) scriptScoreFunctionBuilder.toFunction(context);
Query query = this.query.toQuery(context);
if (query == null) {
query = new MatchAllDocsQuery();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What conditions lead to the toQuery() call emitting null? I am concerned that assuming null can be converted to a MatchAllDocsQuery might be trappy here

String currentFieldName = null;
XContentParser.Token token;

while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this parsing can't be done using either ObjectParser or ConstructingObjectParser? I think if we can do it with one of those it would be great as its easier to see what they are doing

@mayya-sharipova
Copy link
Contributor Author

@colings86 Colin, thanks so much for the review, I will address your comments shortly

}
};
return context.factoryClazz.cast(factory);
} else if (context.instanceClazz.equals(ScoreScript.class)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused on why this case is added. I see a couple issues with it:

  • GenericElasticsearchScript is about to be removed (see Scripting: Remove SearchScript #34730), and all the uses of it (through SearchScript) have already been removed
  • We already have caching of the compiled script in ScriptService, so why do we need special handling of caching here?

@mayya-sharipova
Copy link
Contributor Author

@colings86 @rjernst I have pushed new commits that address your feedback, can you please continue to review when you have available time

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script changes look good to me now. Just a couple small nits.

import java.util.Set;

/**
* A query that allows for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incomplete javadoc?

}

/** Return the score of the current document. */
public double getScore() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this added? This would make score available in addition to _score (which is already available with get_score() below). Why do we need the additional score accessor, that deviates from the access in other scripts?

@mayya-sharipova
Copy link
Contributor Author

@colings86 @rjernst I have tried to address your comments. Can you please have another round of review, when you have available time?

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova I left a few more comments but I think this is close assuming @rjernst is happy with the changes on the scripting side

*/
public class ScriptScoreQuery extends Query {
final Query subQuery;
final ScriptScoreFunction function;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the ultimate aim is to remove the Function Score Query entirely it would be great if we could make it so this query doesn't rely on the ScriptScoreFunction either and inlines that logic. I don't think thats completely necessary for this PR but we should consider doing it in a follow up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I can address this in the follow-up PR.

public Query rewrite(IndexReader reader) throws IOException {
Query rewritten = super.rewrite(reader);
if (rewritten != this) {
return rewritten;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most of the Lucene query implementations the call to super.rewrite() is done on the final return line (in this case line 66) so I wonder if its worth doing the same here for readability and consistency? I think the logic would end up being the same we would just be checking the internals of this subclass first?

InnerHitContextBuilder.extractInnerHits(query(), innerHits);
}

private static ConstructingObjectParser<ScriptScoreQueryBuilder, Void> PARSER = new ConstructingObjectParser<>(NAME, false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could we put the static blocks, methods and constants at the top of the class?

@mayya-sharipova
Copy link
Contributor Author

@rjernst @colings86 I have tried to address your feedback. Would you have any other comments for this PR?

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only comment is around random scores and whether we should have a function that caches the instance of Random so the same instance can be used for the whole request across all the documents. Other than that I think this is ready

[source,js]
--------------------------------------------------
"script" : {
"source" : "Random rnd = new Random(); rnd.setSeed(doc['field'].value); rnd.nextFloat()"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should use a seed from a parameter rather than from a document?

Also, I assume this uses the parameter caching so the instance of Random is created once and is kept the same through the execution of the script on each doc? If not, then maybe we should do this as it will make searches with the same seed reproducible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I guess this can't do the caching since we are not using our own function. I think we should implement our own random function that caches the instance of Random with the provided seed?

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have 2 small last questions

}

/** The leaf lookup for the Lucene segment this script was created for. */
protected final LeafSearchLookup getLeafLookup() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this accessed from somewhere? I don't see any uses in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst Thanks Ryan, I was copying this from a former SearchScript, this is removed now

* A script used for adjusting the score on a per document basis.
*/
public abstract class ScoreScript {
public abstract class ScoreScript implements ScorerAware {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for needing to add ScorerAware?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst Thanks Ryan, I was copying this from a former SearchScript, this is removed now

@mayya-sharipova
Copy link
Contributor Author

@colings86 I have added random functions as you suggested, can you please continue to review when you have available time.

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -1,3 +1,4 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errant newline?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search/Search Search-related issues that do not fall into other categories v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants