Add functions to the Script context for a new Script Score Query #30303

@mayya-sharipova

We are designing a new Script Score Query (SSQ) to replace the Function Score Query (FSQ). The goal is for SSQ to offer the same functionality as FSQ (and possibly more), exposed entirely through Painless scripts. To that end, we would like to add the functions below to Painless. They can be made available either in SearchScript or in a new ScoringScript context designed specifically for scoring.

Random score

Similar to random_score in FSQ:

  • generates scores in the range [0, 1]
  • by default, uses Lucene doc ids as the source of randomness (efficient, but not reproducible). To make scores reproducible, we need a seed, a field (the min value for this doc), and a salt (a function of index name and shard).
"script" : {
  "source" : "random_score(params.seed, doc['field'])",
  "params": {"seed": 10}
}

Painless currently allows generating random values as shown below, but this is verbose and does not exactly reproduce random_score in FSQ:

"script" : {
  "source" : "Random rnd = new Random(); rnd.setSeed(doc['field'].value); rnd.nextFloat()"
}
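To make the reproducible variant described above more concrete, here is a minimal Java sketch of how a seed, a per-document field value, and a per-shard salt could be combined into a deterministic score. The class name, method signature, and mixing function (a SplitMix64-style finalizer) are assumptions chosen for illustration; this is not the hashing that FSQ's random_score uses.

```java
final class RandomScoreSketch {
    // Hypothetical helper, not the Elasticsearch implementation.
    // Combines seed, field value, and salt into a deterministic value in [0, 1).
    static double randomScore(int seed, long fieldValue, int salt) {
        long h = seed;
        h = mix(h ^ fieldValue); // fold in the per-document field value
        h = mix(h ^ salt);       // fold in the per-shard salt
        // Keep the top 53 bits so the result is an evenly spaced double in [0, 1).
        return (h >>> 11) * 0x1.0p-53;
    }

    // SplitMix64-style bit mixer (an assumption made for this sketch).
    private static long mix(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        // The same inputs always produce the same score.
        System.out.println(randomScore(10, 42L, 7));
        System.out.println(randomScore(10, 42L, 7));
    }
}
```

Because the score depends only on the three inputs, repeating a query with the same seed returns documents in the same order, which the doc-id-based default cannot guarantee.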

Math functions

We would like to introduce shorter versions of the following functions, which are useful for score calculations:

  • log: Math.log10(doc['f'].value) -> log(doc['f'].value)
  • log1p: Math.log10(doc['f'].value + 1) -> log1p(doc['f'].value)
  • log2p: Math.log10(doc['f'].value + 2) -> log2p(doc['f'].value)
  • ln: Math.log(doc['f'].value) -> ln(doc['f'].value)
  • ln1p: Math.log1p(doc['f'].value) -> ln1p(doc['f'].value)
  • ln2p: Math.log(doc['f'].value + 2) -> ln2p(doc['f'].value)
  • square: Math.pow(doc['f'].value, 2) -> square(doc['f'].value)
  • sqrt: Math.sqrt(doc['f'].value) -> sqrt(doc['f'].value)
  • reciprocal, 1 / value: 1.0 / doc['f'].value -> reciprocal(doc['f'].value)
  • rational, value / (k + value): doc['f'].value / (k + doc['f'].value) -> rational(doc['f'].value, k)
  • sigmoid, value^a / (k^a + value^a): Math.pow(doc['f'].value, a) / (Math.pow(k, a) + Math.pow(doc['f'].value, a)) -> sigmoid(doc['f'].value, k, a)
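As a rough illustration of what these shorthands could expand to, the following Java sketch implements each one as a static helper. The function names come from the list above; the class name ScoreMath and the use of plain double arguments (rather than doc values) are assumptions for this sketch.

```java
final class ScoreMath {
    // Base-10 logarithm family.
    static double log(double v)   { return Math.log10(v); }
    static double log1p(double v) { return Math.log10(v + 1); }
    static double log2p(double v) { return Math.log10(v + 2); }

    // Natural logarithm family. Math.log1p(v) already computes ln(1 + v).
    static double ln(double v)   { return Math.log(v); }
    static double ln1p(double v) { return Math.log1p(v); }
    static double ln2p(double v) { return Math.log(v + 2); }

    static double square(double v)     { return v * v; }
    static double sqrt(double v)       { return Math.sqrt(v); }
    static double reciprocal(double v) { return 1.0 / v; }

    // value / (k + value)
    static double rational(double v, double k) { return v / (k + v); }

    // value^a / (k^a + value^a)
    static double sigmoid(double v, double k, double a) {
        double va = Math.pow(v, a);
        return va / (Math.pow(k, a) + va);
    }
}
```

Note that sigmoid with a = 1 reduces to rational, and that none of these helpers guard against non-positive inputs to the logarithms; how to treat such values is left open here.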

Decay functions

Similar to decay functions in FSQ:

  • decay_gauss
  • decay_exp
  • decay_linear

Proposed API:

"script" : {
  "source" : "decay_gauss(doc['date'], params.origin, params.scale, params.offset, params.decay)",
  "params": {
    "origin": "2013-09-17", 
    "scale": "10d", 
    "offset": "5d",
    "decay" : 0.5
  }
}

"script" : {
  "source" : "decay_linear(doc['geo'], params.origin, params.scale, params.offset, params.decay)",
  "params": {
    "origin": "11, 12", 
    "scale": "2km", 
    "offset": "0km",
    "decay" : 0.33
  }
}
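For reference, the three decay shapes can be sketched in Java on plain numeric values, following the formulas documented for the FSQ decay functions. The sketch assumes that dates and geo points have already been reduced to a numeric value and origin (for example milliseconds or meters); the class and method names are hypothetical.

```java
final class DecaySketch {
    // Distance from origin beyond the offset; values within the offset do not decay.
    private static double distance(double value, double origin, double offset) {
        return Math.max(0.0, Math.abs(value - origin) - offset);
    }

    // exp(-d^2 / (2 * sigma^2)), with sigma^2 = -scale^2 / (2 * ln(decay))
    static double decayGauss(double value, double origin, double scale, double offset, double decay) {
        double d = distance(value, origin, offset);
        double sigmaSq = -(scale * scale) / (2.0 * Math.log(decay));
        return Math.exp(-(d * d) / (2.0 * sigmaSq));
    }

    // exp(lambda * d), with lambda = ln(decay) / scale
    static double decayExp(double value, double origin, double scale, double offset, double decay) {
        double d = distance(value, origin, offset);
        double lambda = Math.log(decay) / scale;
        return Math.exp(lambda * d);
    }

    // max(0, (s - d) / s), with s = scale / (1 - decay)
    static double decayLinear(double value, double origin, double scale, double offset, double decay) {
        double d = distance(value, origin, offset);
        double s = scale / (1.0 - decay);
        return Math.max(0.0, (s - d) / s);
    }
}
```

In all three cases the score is 1 within offset of the origin and equals decay at a distance of offset + scale, which is a convenient sanity check for any implementation.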

Investigate how to parse date and geo parameters only once per query, rather than repeating the parsing for every document (store in context?).
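One possible shape for this, sketched in Java: an object constructed once per query parses the string parameters and precomputes the constants, and a cheap per-document method reuses them. Everything here is an assumption for illustration, including the class name, the simplified parser that only understands day units such as "10d", and the use of UTC midnight for the origin date. It is not how Elasticsearch parses dates or time values.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

final class PreparsedDateDecay {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    private final long originMillis;
    private final long offsetMillis;
    private final double twoSigmaSq;

    // Runs once per query: all string parsing and constant folding happens here.
    PreparsedDateDecay(String origin, String scale, String offset, double decay) {
        this.originMillis = LocalDate.parse(origin)
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        double scaleMillis = parseDays(scale);
        this.offsetMillis = parseDays(offset);
        // 2 * sigma^2 = -scale^2 / ln(decay)
        this.twoSigmaSq = -(scaleMillis * scaleMillis) / Math.log(decay);
    }

    // Hypothetical, simplified parser: accepts only "<n>d".
    private static long parseDays(String s) {
        if (!s.endsWith("d")) {
            throw new IllegalArgumentException("only day units are handled in this sketch: " + s);
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * MILLIS_PER_DAY;
    }

    // Runs once per document: arithmetic only, no parsing.
    double score(long docMillis) {
        double d = Math.max(0L, Math.abs(docMillis - originMillis) - offsetMillis);
        return Math.exp(-(d * d) / twoSigmaSq);
    }
}
```

Whether such an object would live in the script context, or be built by a factory when the script is compiled for the query, is exactly the open question raised above.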

Normalization functions ???

  • _max_score in the rescore context?
  • Similar to ScaleFloatFunction in Lucene and Solr's scale function: scale(x, minTarget, maxTarget) scales the values of x so that all values fall between minTarget and maxTarget?
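The scaling idea in the last bullet amounts to min-max normalization. A minimal Java sketch over an in-memory array follows; the class name is hypothetical, and the sketch deliberately ignores the hard part for a query, namely that the minimum and maximum must be known across all matching documents before any single score can be scaled.

```java
final class ScaleSketch {
    // Linearly maps values so that the smallest becomes minTarget and the largest maxTarget.
    static double[] scale(double[] values, double minTarget, double maxTarget) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // If all values are equal there is nothing to spread; fall back to minTarget.
            out[i] = range == 0.0
                    ? minTarget
                    : minTarget + (values[i] - min) / range * (maxTarget - minTarget);
        }
        return out;
    }
}
```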

Other functions ???

  • _index Lucene term statistics (doc count, doc frequency, tf, total term frequency), e.g. _index['text']['word'].tf()
  • Index wide statistics similar to DFS_QUERY_THEN_FETCH
  • payloads
  • matches (see IntervalQuery)
