@markharwood
Contributor

@markharwood markharwood commented Mar 16, 2021

Wrap the wildcard queries produced by the wildcard field in a constant-score query, because the field has no index that could sensibly be used to score terms by their frequencies.
Added a note about ignoring rewrite parameters on wildcard queries.

Also includes a fix for a bug where case-insensitive and case-sensitive wildcard queries were cached under the same key. Run WildcardFieldMapperTest with -Dtests.seed=722EAA9077BDA6AE to reproduce the issue.

Closes #69604

@markharwood markharwood added >enhancement :Search Relevance/Ranking Scoring, rescoring, rank evaluation. labels Mar 16, 2021
@markharwood markharwood self-assigned this Mar 16, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Mar 16, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@markharwood
Contributor Author

Thanks to @jimczi for discovering an issue that surfaced while adding ConstantScoreQuery: AutomatonQueryOnBinaryDV objects were cached incorrectly because hashCode/equals did not account for the case-sensitivity option. This PR now also includes a fix for that, along with a related test.
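To illustrate the kind of bug described here, below is a minimal sketch in plain Java, not the code from this PR. `PatternQueryKey` and `CacheKeyDemo` are hypothetical names, and a `HashMap` stands in for the query cache. The point is only that a flag which changes the results has to take part in both `equals` and `hashCode`, otherwise two different queries collapse onto one cache entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a query object that is used as a cache key.
final class PatternQueryKey {
    private final String field;
    private final String pattern;
    private final boolean caseInsensitive;

    PatternQueryKey(String field, String pattern, boolean caseInsensitive) {
        this.field = field;
        this.pattern = pattern;
        this.caseInsensitive = caseInsensitive;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        PatternQueryKey other = (PatternQueryKey) o;
        // The case-sensitivity flag is compared too; leaving it out would
        // make the two variants of the same pattern look identical.
        return caseInsensitive == other.caseInsensitive
            && field.equals(other.field)
            && pattern.equals(other.pattern);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals, so the flag is hashed as well.
        return Objects.hash(field, pattern, caseInsensitive);
    }
}

final class CacheKeyDemo {
    // Stores results for the same pattern under both case modes and
    // reports how many distinct cache entries exist afterwards.
    static int distinctEntries() {
        Map<PatternQueryKey, String> cache = new HashMap<>();
        cache.put(new PatternQueryKey("f", "foo*", false), "case-sensitive results");
        cache.put(new PatternQueryKey("f", "foo*", true), "case-insensitive results");
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctEntries()); // prints 2
    }
}
```

If `caseInsensitive` were dropped from `equals` and `hashCode`, the second `put` would overwrite the first and the demo would report a single entry, which is the symptom described above.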

Contributor

@jimczi jimczi left a comment

I left some comments.

@markharwood
Contributor Author

markharwood commented Mar 18, 2021

One thing I wasn't sure about is when AutomatonQueryOnBinaryDV should convert the Automaton to a ByteRunAutomaton:
in the constructor, or on every call to hashCode/equals/createWeight?

A ByteRunAutomaton appears to use a lot more memory than the source Automaton, so it could be expensive to hold as a field when the query is used as a cache key. The alternative, converting the Automaton to a ByteRunAutomaton every time we need it, might slow down comparisons in hash look-ups. I've gone with the latter, but I'm not sure it's the better trade-off.
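The two options can be sketched roughly as follows. This is a simplified analogy in plain Java and not the Lucene or Elasticsearch code: `ExpandedTable`, `EagerKey` and `OnDemandKey` are hypothetical names, a small byte array stands in for the compact source form, and a 256-entry lookup table stands in for the larger runtime form. The sketch assumes equality is defined on the expanded form.

```java
import java.util.Arrays;

// Hypothetical stand-in for the larger, "expanded" runtime form.
final class ExpandedTable {
    final boolean[] accepts = new boolean[256];

    ExpandedTable(byte[] compactSource) {
        for (byte b : compactSource) {
            accepts[b & 0xFF] = true;
        }
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ExpandedTable
            && Arrays.equals(accepts, ((ExpandedTable) o).accepts);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(accepts);
    }
}

// Option A: expand once in the constructor and keep the large form as a
// field. Lookups and comparisons are cheap; every cached key pays the memory.
final class EagerKey {
    private final ExpandedTable table;

    EagerKey(byte[] source) {
        this.table = new ExpandedTable(source);
    }

    boolean matches(byte b) {
        return table.accepts[b & 0xFF];
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof EagerKey && table.equals(((EagerKey) o).table);
    }

    @Override
    public int hashCode() {
        return table.hashCode();
    }
}

// Option B: keep only the compact source and expand on every use.
// Cached keys stay small; each equals/hashCode/matches call pays to rebuild.
final class OnDemandKey {
    private final byte[] source;

    OnDemandKey(byte[] source) {
        this.source = source.clone();
    }

    private ExpandedTable expand() {
        return new ExpandedTable(source);
    }

    boolean matches(byte b) {
        return expand().accepts[b & 0xFF];
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof OnDemandKey && expand().equals(((OnDemandKey) o).expand());
    }

    @Override
    public int hashCode() {
        return expand().hashCode();
    }
}
```

Both variants behave identically as map keys; they differ only in where the cost lands. Which side wins depends on how many keys stay cached versus how often they are hashed and compared, which this sketch does not measure.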

@markharwood markharwood force-pushed the fix/69604 branch 2 times, most recently from 3c79796 to 7b5c86c Compare March 24, 2021 10:55
Contributor

@jimczi jimczi left a comment

I left one comment regarding the building of the ByteRunAutomaton. LGTM otherwise.

@markharwood markharwood merged commit 2f9c731 into elastic:master Mar 30, 2021
markharwood added a commit to markharwood/elasticsearch that referenced this pull request Mar 30, 2021
…d queries and caching fix (elastic#70452)

* Make wildcard field use constant scoring queries for wildcard queries. Add a note about ignoring rewrite parameters on wildcard queries.

Also fixes caching issue where case sensitive and case insensitive results were cached as the same

Closes elastic#69604
markharwood added a commit that referenced this pull request Mar 30, 2021
…d queries and caching fix (#70452) (#71043)

Backport of 2f9c731
* Make wildcard field use constant scoring queries for wildcard queries. Add a note about ignoring rewrite parameters on wildcard queries.

Also fixes caching issue where case sensitive and case insensitive results were cached as the same

Closes #69604

Labels

>bug >enhancement :Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Meta label for search team v7.13.0 v8.0.0-alpha1

Development

Successfully merging this pull request may close these issues.

Wildcard field should use constant_score and reject wildcard query rewrite param
