@romseygeek
Contributor

When the query analyzer examines a conjunction containing both terms and ranges,
it should only include ranges in the minimum_should_match calculation if there are no
other range queries on the same field within the conjunction. This is because we cannot
build a selection query over disjoint ranges on the same field, and it is not easy to check
whether two range queries overlap.

The current logic simply sets minimum_should_match to 1 or 0, depending on whether
the current range is over a field that has already been seen. However, this can be
incorrect when terms in the same match group have already adjusted
minimum_should_match downwards. Instead, the logic should match the terms extraction:
we adjust minimum_should_match downwards if we have already seen a range field.
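A minimal sketch of the difference between the two rules, using a hypothetical, simplified model: the ClauseResult class and the oldMsm/newMsm methods are illustrative names, not Elasticsearch's QueryAnalyzer code. The old rule overwrites a clause's msm with 1 or 0 as soon as it holds a range, so a clause that terms had already lowered to 0 is bumped back up to 1. The new rule starts from the clause's own msm and only subtracts for ranges on already-seen fields.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MsmAdjustment {

    // Hypothetical, simplified clause result: its own minimum_should_match plus
    // the field name of each range extraction it contains.
    public static final class ClauseResult {
        final int minimumShouldMatch;
        final List<String> rangeFields;

        public ClauseResult(int minimumShouldMatch, List<String> rangeFields) {
            this.minimumShouldMatch = minimumShouldMatch;
            this.rangeFields = rangeFields;
        }
    }

    // Old rule as described in the PR: any range extraction overwrites the
    // clause's msm with 1 (field not yet seen) or 0 (field already seen).
    public static int oldMsm(ClauseResult r, Set<String> seenRangeFields) {
        int msm = r.minimumShouldMatch;
        for (String field : r.rangeFields) {
            msm = seenRangeFields.contains(field) ? 0 : 1;
        }
        return msm;
    }

    // New rule: start from the clause's own msm and subtract one for each
    // range on an already-seen field, never going below zero.
    public static int newMsm(ClauseResult r, Set<String> seenRangeFields) {
        int msm = r.minimumShouldMatch;
        for (String field : r.rangeFields) {
            if (seenRangeFields.contains(field)) {
                msm = Math.max(0, msm - 1);
            }
        }
        return msm;
    }

    public static void main(String[] args) {
        Set<String> none = new HashSet<>();
        Set<String> seenPrice = new HashSet<>(List.of("price"));

        // Terms already lowered this clause's msm to 0; the range field is new.
        ClauseResult lowered = new ClauseResult(0, List.of("price"));
        System.out.println(oldMsm(lowered, none) + " vs " + newMsm(lowered, none)); // 1 vs 0

        // One term plus one range (msm 2), and "price" was seen in an earlier clause.
        ClauseResult termAndRange = new ClauseResult(2, List.of("price"));
        System.out.println(oldMsm(termAndRange, seenPrice) + " vs " + newMsm(termAndRange, seenPrice)); // 0 vs 1
    }
}
```

In the first case the old rule asks for more matching clauses than the clause itself guarantees. In the second it throws away the term's contribution. The decrementing rule keeps both cases consistent with how duplicate term extractions are handled.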

Fixes #49684

@romseygeek romseygeek added >bug :Search Relevance/Percolator Reverse search: find queries that match a document v8.0.0 v7.6.0 labels Dec 3, 2019
@romseygeek romseygeek requested a review from jpountz December 3, 2019 17:29
@romseygeek romseygeek self-assigned this Dec 3, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Percolator)

Contributor

@jpountz jpountz left a comment

This logic is tricky. :)

.build();

Result r = analyze(disj, Version.CURRENT);
assertThat(r.minimumShouldMatch, equalTo(1));
Contributor

Can you count the number of extractions and check the verified flag for all queries?

.build();

result = analyze(q2, Version.CURRENT);
assertThat(result.minimumShouldMatch, equalTo(2));
Contributor

Let's maybe also have tests for multiple range queries on the same field, and for multiple range queries on different fields?

@romseygeek
Contributor Author

Thanks for the review @jpountz. I added more tests and discovered that the logic was still not quite correct (tricky, as you say...). We now only check whether the range fields from a particular Result's extractions have already been seen in other Results. Multiple identical range fields within a single Result's extraction set will already have been dealt with when that Result was built, so they don't require any adjustment to the msm.
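A minimal sketch of the conjunction-level loop this comment describes, again on a hypothetical, simplified model (ConjunctionMsm, ClauseResult and conjunctionMsm are illustrative names, and the sketch leaves out the verified flag and duplicate-term handling). The key point is that a Result's range fields are only added to the seen set after all of its extractions have been checked.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConjunctionMsm {

    // Hypothetical clause result: its own msm and the field of each range extraction.
    public static final class ClauseResult {
        final int minimumShouldMatch;
        final List<String> rangeFields;

        public ClauseResult(int minimumShouldMatch, List<String> rangeFields) {
            this.minimumShouldMatch = minimumShouldMatch;
            this.rangeFields = rangeFields;
        }
    }

    public static int conjunctionMsm(List<ClauseResult> clauses) {
        int msm = 0;
        Set<String> seenRangeFields = new HashSet<>();
        for (ClauseResult clause : clauses) {
            int clauseMsm = clause.minimumShouldMatch;
            // Only compare against fields seen in earlier clause results;
            // duplicates inside this result were handled when it was built.
            for (String field : clause.rangeFields) {
                if (seenRangeFields.contains(field)) {
                    clauseMsm = Math.max(0, clauseMsm - 1);
                }
            }
            msm += clauseMsm;
            // Record this result's range fields only after all of its
            // extractions have been checked.
            seenRangeFields.addAll(clause.rangeFields);
        }
        return msm;
    }

    public static void main(String[] args) {
        List<ClauseResult> clauses = new ArrayList<>();
        clauses.add(new ClauseResult(1, List.of("price", "price"))); // disjunction of two ranges on "price"
        clauses.add(new ClauseResult(1, List.of("price")));          // another range on "price"
        clauses.add(new ClauseResult(1, List.of("age")));            // range on a different field
        System.out.println(conjunctionMsm(clauses)); // 2
    }
}
```

If the seen set were updated eagerly inside the inner loop, the first clause would be decremented by its own second "price" range and the total would drop to 1. Deferring the update keeps that clause at 1, decrements only the later "price" clause, and leaves the "age" clause untouched.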

Contributor

@jpountz jpountz left a comment

LGTM. It's hard for me to reason about all possible corner cases, but the tests give me some confidence that it's at least more correct than the previous logic.

}
// add range fields from this Result to the seenRangeFields set so that minimumShouldMatch is correctly
// calculated for subsequent Results
result.extractions.stream().filter(e -> e.range != null).forEach(e -> seenRangeFields.add(e.range.fieldName));
Contributor

nit: I generally have a preference for using method refs when possible and using new lines when chaining, e.g.

Suggested change
result.extractions.stream().filter(e -> e.range != null).forEach(e -> seenRangeFields.add(e.range.fieldName));
result.extractions.stream()
    .map(e -> e.range)
    .filter(Objects::nonNull)
    .map(r -> r.fieldName)
    .forEach(seenRangeFields::add);
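A self-contained sketch of this chained, method-reference style, using hypothetical stand-in types (Range, Extraction and addRangeFields are illustrative, not the percolator's actual classes). The null check has to come after mapping to e.range, because it is the range, not the extraction, that is null for term extractions, and the set collects field names rather than Range objects.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class SeenRangeFields {

    // Hypothetical stand-in for a range extraction's metadata.
    public static final class Range {
        final String fieldName;
        public Range(String fieldName) { this.fieldName = fieldName; }
    }

    // Hypothetical extraction: holds either a term or a range.
    public static final class Extraction {
        final String term;  // null for range extractions
        final Range range;  // null for term extractions
        public Extraction(String term, Range range) { this.term = term; this.range = range; }
    }

    public static void addRangeFields(List<Extraction> extractions, Set<String> seenRangeFields) {
        extractions.stream()
            .map(e -> e.range)          // Extraction -> Range (null for terms)
            .filter(Objects::nonNull)   // keep only range extractions
            .map(r -> r.fieldName)      // Range -> field name
            .forEach(seenRangeFields::add);
    }

    public static void main(String[] args) {
        List<Extraction> extractions = new ArrayList<>();
        extractions.add(new Extraction("field:val1", null));
        extractions.add(new Extraction(null, new Range("price")));
        Set<String> seen = new HashSet<>();
        addRangeFields(extractions, seen);
        System.out.println(seen); // [price]
    }
}
```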

@romseygeek romseygeek merged commit fcae55a into elastic:master Dec 10, 2019
@romseygeek romseygeek deleted the percolator-test-failure branch December 10, 2019 10:44
romseygeek added a commit that referenced this pull request Dec 10, 2019
…49803)
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
…lastic#49803)

Labels

>bug :Search Relevance/Percolator Reverse search: find queries that match a document v7.6.0 v8.0.0-alpha1

Development

Successfully merging this pull request may close these issues.

[CI] org.elasticsearch.percolator.CandidateQueryTests.testDuel2 failure