Add support for selecting percolator query candidate matches containing wildcard / prefix queries #25351

martijnvg · 2017-06-22T11:27:58Z

At index time, the percolator tries to extract the longest string that doesn't contain a `?` or `*` from the wildcard expression. At search time, each query term is expanded into all possible suffixes, and each suffix is then turned into all possible prefixes, so that it can match any extracted wildcard fragment.

This can speed up the evaluation of percolator queries containing wildcard queries. Without this change, all of these percolator queries often need to be evaluated every time, regardless of whether they have any chance of matching.
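The two steps described above can be illustrated with a minimal sketch. It is plain Python, not the percolator's actual Java code, and the function names (`longest_literal`, `all_substrings`, `is_candidate`) are hypothetical. It also assumes that a wildcard with no literal characters at all must always be treated as a candidate.

```python
import re
from typing import Set


def longest_literal(wildcard: str) -> str:
    """Index time: longest run of characters containing no '?' or '*'."""
    return max(re.split(r"[?*]", wildcard), key=len)


def all_substrings(term: str) -> Set[str]:
    """Search time: every suffix of the term, then every prefix of each suffix."""
    substrings = set()
    for start in range(len(term)):
        suffix = term[start:]
        for end in range(1, len(suffix) + 1):
            substrings.add(suffix[:end])
    return substrings


def is_candidate(wildcard: str, document_term: str) -> bool:
    """True if the wildcard query could match and therefore must be evaluated."""
    literal = longest_literal(wildcard)
    if not literal:
        # Assumption: nothing was extracted (e.g. "*"), so the query cannot be skipped.
        return True
    return literal in all_substrings(document_term)


print(longest_literal("elast*sea?ch"))           # elast
print(sorted(all_substrings("abc")))             # ['a', 'ab', 'abc', 'b', 'bc', 'c']
print(is_candidate("*stic*", "elasticsearch"))   # True
print(is_candidate("kib*", "elasticsearch"))     # False
```

A term of length n expands into n(n+1)/2 substrings before deduplication, which is the growth the rest of the discussion is concerned with.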


jpountz · 2017-07-06T10:26:02Z

I'm worried this could lead to very large candidate queries if the input document is not tiny?

martijnvg · 2017-07-06T12:02:10Z

@jpountz Good point. Perhaps we can build in a limitation? We could insert a special token in the wildcard query terms field to identify all percolator queries with prefix/wildcard queries. At query time, if we detect that we create too many suffix terms (50?) or that the suffix terms are too long (25?), then we just use the special token instead.
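One way to read this proposal is sketched below. The token value, the function names, and the exact rule for when the limits apply are all assumptions made for illustration. The defaults of 50 and 25 are the tentative numbers from the comment, not settled values.

```python
from typing import Iterable, List, Set

# Hypothetical marker that would be indexed with every percolator query
# containing a prefix/wildcard query.
FALLBACK_TOKEN = "__wildcard_fallback__"


def suffixes(term: str) -> List[str]:
    """All suffixes of a term, longest first."""
    return [term[i:] for i in range(len(term))]


def candidate_terms(
    document_terms: Iterable[str],
    max_suffix_terms: int = 50,
    max_suffix_length: int = 25,
) -> Set[str]:
    """Expand document terms into suffixes, or give up and use the fallback token."""
    expanded: Set[str] = set()
    for term in document_terms:
        for suffix in suffixes(term):
            if len(suffix) > max_suffix_length:
                # A suffix is too long: select all wildcard queries instead.
                return {FALLBACK_TOKEN}
            expanded.add(suffix)
            if len(expanded) > max_suffix_terms:
                # Too many suffix terms: select all wildcard queries instead.
                return {FALLBACK_TOKEN}
    return expanded


print(sorted(candidate_terms(["abc"])))                 # ['abc', 'bc', 'c']
print(candidate_terms(["a" * 26]) == {FALLBACK_TOKEN})  # True
```

With the fallback, a large document degrades to evaluating every wildcard percolator query, which is the behaviour without this PR.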

jpountz · 2017-07-06T12:41:49Z

I don't know. My gut feeling is that things can degrade pretty quickly. Even if we only extract substrings of length e.g. 4, a token of length 20 in a document would generate 20-4 = 16 underlying terms for the candidate query. My gut feeling is that even simple documents already trigger the creation of non-trivial candidate queries. I'm a bit worried about making them even more complex.

Maybe the right thing to do is to leave it up to the users? They would just have to use (edge) ngrams in their index analyzers?

martijnvg · 2017-07-06T18:09:50Z

> Maybe the right thing to do is to leave it up to the users? They would just have to use (edge) ngrams in their index analyzers?

Right, maybe that is better. It would also be clearer to users why percolation is slower than it would be if the percolator were silently doing what this PR does. I'll add some documentation around this. It does mean that wildcard and prefix queries would need to be substituted with term queries in the percolator queries.
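The substitution can be illustrated with a small simulation in plain Python. It does not use any Elasticsearch API. The helper `edge_ngrams` and its `min_gram`/`max_gram` parameters are hypothetical stand-ins for what an edge n-gram token filter would emit at index time.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> List[str]:
    """Prefixes of the token with lengths from min_gram to max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


# Terms the analyzer would emit for the token "elasticsearch".
indexed_terms = set(edge_ngrams("elasticsearch", min_gram=2, max_gram=8))

# The prefix query `elast*` is rewritten as a plain term query on "elast".
print("elast" in indexed_terms)          # True
# A prefix longer than max_gram is not covered by the emitted terms.
print("elasticse" in indexed_terms)      # False
```

The cost moves into the analyzer configuration chosen by the user, so the number of generated terms is visible and tunable. Wildcards that match inside a token would need full n-grams rather than edge n-grams.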

martijnvg · 2017-08-10T12:27:59Z

Closing this PR as it can have a negative performance impact.

martijnvg added :Search Relevance/Percolator Reverse search: find queries that match a document WIP >enhancement review v6.0.0 and removed WIP labels Jun 22, 2017

martijnvg force-pushed the percolator_wildcard_query_support branch from 8c32178 to e48de2c Compare June 26, 2017 18:19

martijnvg mentioned this pull request Jun 28, 2017

Improve percolator performance #25445

Closed

9 tasks

martijnvg force-pushed the percolator_wildcard_query_support branch from e48de2c to 252fdc2 Compare July 6, 2017 10:14

jpountz self-requested a review July 6, 2017 10:15

martijnvg closed this Aug 10, 2017

colings86 added 6.0.0-beta2 v6.0.0-beta2 and removed v6.0.0 6.0.0-beta2 labels Aug 24, 2017
