@martijnvg
Member

and extract all clauses from a conjunction query.

When clauses are extracted from a conjunction, the number of clauses is
also stored in an internal doc values field (the minimum_should_match field).
The CoveringQuery uses this field, which lets the percolator reduce the
number of false positives when selecting candidate matches. In certain
cases the percolator can even be certain that a conjunction candidate will
match and can then skip MemoryIndex validation. This can greatly improve
performance.
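
To illustrate the idea, here is a minimal sketch in plain Java of how a
required-match count can filter candidates. It does not use Lucene's
CoveringQuery or the percolator's real field layout; `StoredQuery`,
`CoveringSketch` and `isCandidate` are hypothetical names, and terms are
modelled as plain strings.

```java
import java.util.Set;

// Hypothetical stand-in for a stored percolator query: the terms extracted
// from its conjunction, plus the number of clauses that must match
// (the value kept in the minimum_should_match field).
final class StoredQuery {
    final String id;
    final Set<String> extractedTerms;
    final int minimumShouldMatch;

    StoredQuery(String id, Set<String> extractedTerms, int minimumShouldMatch) {
        this.id = id;
        this.extractedTerms = extractedTerms;
        this.minimumShouldMatch = minimumShouldMatch;
    }
}

final class CoveringSketch {
    // A stored query is selected as a candidate only if the document contains
    // at least minimumShouldMatch of the query's extracted terms.
    static boolean isCandidate(StoredQuery query, Set<String> documentTerms) {
        long matched = query.extractedTerms.stream()
                .filter(documentTerms::contains)
                .count();
        return matched >= query.minimumShouldMatch;
    }
}
```

With a three-clause conjunction stored as `new StoredQuery("q1", Set.of("quick", "brown", "fox"), 3)`,
a document containing only `quick` is rejected up front instead of being
selected and then discarded during MemoryIndex validation. The sketch does
not model the cases in which validation can be skipped entirely.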

Before this change, only a single clause was extracted from a conjunction
query. The percolator tried to extract the rarest clause (judged by term
length) so that fewer candidate queries would be selected in the first
place. However, with this method there is still a very high chance that
candidate query matches are false positives.
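
The weakness of single-clause extraction can be sketched in plain Java as
follows. This is an illustration only, not the percolator's actual
extraction code: `SingleClauseSketch`, `pickLongestTerm` and `isCandidate`
are hypothetical names, and the sketch assumes that a longer term is treated
as the rarer one.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class SingleClauseSketch {
    // Term length as a rough proxy for rarity: keep only the longest term
    // of the conjunction (assumption: longer means rarer).
    static String pickLongestTerm(List<String> conjunctionTerms) {
        return conjunctionTerms.stream()
                .max(Comparator.comparingInt(String::length))
                .orElseThrow(() -> new IllegalArgumentException("no terms"));
    }

    // With a single extracted term, every document that contains that term
    // becomes a candidate, whether or not the other clauses match.
    static boolean isCandidate(String extractedTerm, Set<String> documentTerms) {
        return documentTerms.contains(extractedTerm);
    }
}
```

For a conjunction over `elasticsearch` and `fast`, only `elasticsearch` is
kept, so a document containing `elasticsearch` and `slow` is still selected
as a candidate even though the full conjunction cannot match it.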

This change also removes the influencing query extraction added via #26081,
which is no longer needed now that all conjunction clauses are extracted.

https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extraction

PR for #26307

@martijnvg martijnvg added :Search Relevance/Percolator Reverse search: find queries that match a document >enhancement v6.1.0 v7.0.0 review labels Nov 6, 2017
@martijnvg martijnvg requested review from colings86 and jimczi November 7, 2017 16:54
@martijnvg martijnvg force-pushed the percolator_use_CoveringQuery branch from 67666ce to dec88e1 Compare November 9, 2017 11:20
Contributor

@colings86 colings86 left a comment

LGTM

Closes elastic#26307
@martijnvg martijnvg force-pushed the percolator_use_CoveringQuery branch from dec88e1 to b4048b4 Compare November 10, 2017 06:45
@martijnvg martijnvg merged commit b4048b4 into elastic:master Nov 10, 2017

Labels

>enhancement :Search Relevance/Percolator Reverse search: find queries that match a document v6.1.0 v7.0.0-beta1
