Use the bulk SimScorer#score API to compute impact scores. #15151
In apache#15039 we introduced a bulk `SimScorer#score` API and used it to compute scores with the leading conjunctive clause and "essential" clauses of disjunctive queries. With this PR, we are now also using this bulk API when translating (term frequency, length normalization factor) pairs into the maximum possible score that a block of postings may produce. To do it right, I had to change the impacts API so that it no longer returns a List of (term freq, norm) pairs, but instead two parallel arrays of term frequencies and norms that can (almost) directly be passed to the `SimScorer#score` bulk API. Unfortunately, this makes the change quite big, since many formats in backward-codecs had to be touched.
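To illustrate the shape of the new impacts API, here is a minimal, self-contained Java sketch that computes a block's maximum score from two parallel arrays with a single bulk call. It does not use Lucene: `SketchSimScorer`, `ToyBm25Scorer` and `ImpactMaxScore.maxScore` are hypothetical stand-ins, the bulk signature `score(int size, float[] freqs, long[] norms, float[] scores)` is an assumption modeled on the description above, and the toy scorer treats the norm as a raw document length, whereas Lucene stores encoded norms.

```java
/** Hypothetical stand-in for Lucene's SimScorer (not the real class). */
abstract class SketchSimScorer {
  /** Scores a single (freq, norm) pair. */
  abstract float score(float freq, long norm);

  /**
   * Assumed bulk variant: fills scores[0..size) from the parallel freqs/norms arrays.
   * This default just loops; a subclass could override it with a faster implementation.
   */
  void score(int size, float[] freqs, long[] norms, float[] scores) {
    for (int i = 0; i < size; i++) {
      scores[i] = score(freqs[i], norms[i]);
    }
  }
}

/** Hypothetical BM25-like scorer; the norm is used directly as the document length. */
class ToyBm25Scorer extends SketchSimScorer {
  private final float k1, b, avgLen, weight;

  ToyBm25Scorer(float k1, float b, float avgLen, float weight) {
    this.k1 = k1;
    this.b = b;
    this.avgLen = avgLen;
    this.weight = weight;
  }

  @Override
  float score(float freq, long norm) {
    // Longer documents get a larger denominator, hence a lower score.
    float lengthNorm = k1 * (1 - b + b * norm / avgLen);
    return weight * freq / (freq + lengthNorm);
  }
}

class ImpactMaxScore {
  /**
   * Hypothetical helper: maximum score a block may produce, given its impacts as
   * parallel arrays of term frequencies and norms.
   */
  static float maxScore(SketchSimScorer scorer, int size, int[] freqs, long[] norms) {
    // The bulk API in this sketch takes float frequencies, so copy the ints once.
    float[] freqBuf = new float[size];
    for (int i = 0; i < size; i++) {
      freqBuf[i] = freqs[i];
    }
    // One bulk call scores every (freq, norm) pair of the block.
    float[] scores = new float[size];
    scorer.score(size, freqBuf, norms, scores);
    // Scores are assumed non-negative, so 0 is a safe starting point (and the result for an empty block).
    float max = 0f;
    for (int i = 0; i < size; i++) {
      max = Math.max(max, scores[i]);
    }
    return max;
  }
}
```

Because impacts arrive as parallel arrays, one bulk call scores the whole block instead of one call per (freq, norm) pair. In this sketch the only adaptation is copying `int` frequencies into a `float` buffer; whether that is what "(almost) directly" refers to in the actual change is an assumption.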
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
Results of `wikibigall` on my machine: p-values are high due to quite high run-over-run variance, but the queries that we would have expected to get a speedup are at the bottom of the results, so this change may give a tiny speedup in practice.
gf2121 left a comment
This looks like the right direction to me, though the improvement does not seem very significant. Thank you!
lucene/core/src/java/org/apache/lucene/search/SloppyPhraseMatcher.java
lucene/core/src/test/org/apache/lucene/search/TestPhraseQuery.java
lucene/core/src/test/org/apache/lucene/search/TestSynonymQuery.java
…her.java Co-authored-by: Guo Feng
…java Co-authored-by: Guo Feng
….java Co-authored-by: Guo Feng