@romseygeek
Contributor

Currently, unordered interval matching does not check for duplicates,
which means that a query for "to be or not to be" can match a document
that contains only the phrase "to be or not": the second "to be" matches
at the same positions as the first, and the AND interval algorithm does not
check for overlaps. This is counter-intuitive.
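
As a rough illustration of the problem, the sketch below contrasts a presence-only check with one that counts repeated terms. It is a simplified model, not the Lucene interval algorithm: the class `NaiveUnorderedMatch` and both method names are hypothetical, and documents are modeled as plain token lists rather than positional intervals.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the bug; it does not use or mirror the Lucene intervals API.
final class NaiveUnorderedMatch {

    // Counts how many times each term occurs in the list.
    static Map<String, Integer> countTerms(List<String> terms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    // Presence-only check: a repeated query term is satisfied by a single occurrence,
    // which is the counter-intuitive behavior described above.
    static boolean matchesIgnoringDuplicates(List<String> query, List<String> doc) {
        return doc.containsAll(query);
    }

    // Duplicate-aware check: the document needs at least as many occurrences
    // of each term as the query asks for.
    static boolean matchesCountingDuplicates(List<String> query, List<String> doc) {
        Map<String, Integer> available = countTerms(doc);
        for (Map.Entry<String, Integer> needed : countTerms(query).entrySet()) {
            if (available.getOrDefault(needed.getKey(), 0) < needed.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

With the query tokens of "to be or not to be" and the document tokens of "to be or not", the presence-only check accepts the document while the duplicate-aware check rejects it.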

This commit adds a check to the interval builder: if it finds duplicates
when combining sources into an unordered AND, it first combines those
duplicates into an ORDERED interval, so "to be or not to be" becomes
UNORDERED(ORDERED(to, to), ORDERED(be, be), or, not)
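
The grouping rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the class `DuplicateGrouping` and the method `combineUnordered` are hypothetical, and the result is rendered as a string instead of being built from real interval sources.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; it only renders the rewritten structure as text.
final class DuplicateGrouping {

    // Wraps each repeated term in ORDERED(...) and combines all clauses in UNORDERED(...).
    static String combineUnordered(List<String> terms) {
        // LinkedHashMap keeps first-occurrence order, so the output is deterministic.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }

        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                // A unique term stays a plain clause.
                clauses.add(entry.getKey());
            } else {
                // A repeated term becomes ORDERED(term, term, ...), one copy per occurrence.
                List<String> copies = Collections.nCopies(entry.getValue(), entry.getKey());
                clauses.add("ORDERED(" + String.join(", ", copies) + ")");
            }
        }

        // A single clause needs no UNORDERED wrapper.
        if (clauses.size() == 1) {
            return clauses.get(0);
        }
        return "UNORDERED(" + String.join(", ", clauses) + ")";
    }
}
```

Calling `combineUnordered` with the tokens `to, be, or, not, to, be` yields the structure shown above. Because an ORDERED clause requires its copies at successive, distinct positions, the second "to" can no longer reuse the position of the first.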

@romseygeek added the >bug, :Search/Search, v8.0.0, and v7.6.0 labels on Dec 2, 2019
@romseygeek requested a review from jimczi on December 2, 2019 17:24
@romseygeek self-assigned this on Dec 2, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

Contributor

@jimczi left a comment

LGTM

@romseygeek
Contributor Author

This really needs to be handled in Lucene, as this solution doesn't correctly handle internal gaps in intervals with repeats. I've opened https://github.com/apache/lucene-solr/pull/1097/files
