Skip to content

Conversation

@cbuescher
Copy link
Contributor

Description

Currently a TermRangeQuery with the empty String ("") as lower bound and
includeLower=false leads internally constructs an Automaton that doesn't match
anything. This is unexpected expecially for open upper bounds where any string
should be considered to be "higher" than the empty string.

Solution

This PR changes "Automata#makeBinaryInterval" so that for an empty string lower
bound and an open upper bound, any String should match the query regardless or
the includeLower flag.

Tests

Added two new tests to TestAutomaton.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the master branch.
  • I have run ./gradlew check
  • I have added tests for my changes.

Currently a TermRangeQuery with the empty String ("") as lower bound and
includeLower=false leads internally constructs an Automaton that doesn't match
anything. This is unexpected expecially for open upper bounds where any string
should be considered to be "higher" than the empty string.

This PR changes "Automata#makeBinaryInterval" so that for an empty string lower
bound and an open upper bound, any String should match the query regardless or
the includeLower flag.
@cbuescher
Copy link
Contributor Author

@jpountz thanks for the review, I added a commit that rejects the empty string when "includeMin == false" and added tests for that edge case as well.

@cbuescher
Copy link
Contributor Author

@jpountz thanks for the review, I added another commit that should fix the problem you detected.

* Returns a new (deterministic) automaton that accepts all binary terms except
* the empty string.
*/
public static Automaton makeAnyBinaryExceptEmpty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call makeNonEmptyBinary?

@jpountz
Copy link
Contributor

jpountz commented Oct 16, 2020

I had a deeper look to check whether this could be an indicator of other bugs lurking, but it doesn't seem so and the fix looks good to me. For reference here's what the automaton looks like today with an empty string as a min bound and includeMin==false before the above fix.
empty

@cbuescher
Copy link
Contributor Author

@jpountz thanks, I renamed the method. btw. would this change also be backported to the latest 8.7 branch so we can use it in ES to fix elastic/elasticsearch#63386?

@jpountz jpountz merged commit 48348ae into apache:master Oct 16, 2020
@jpountz
Copy link
Contributor

jpountz commented Oct 16, 2020

The 8.7 branch has not been cut yet so this will be in 8.7. This change looked safe to me so I felt free to merge in spite of the branch being imminently cut.

jpountz pushed a commit that referenced this pull request Oct 16, 2020
Currently a TermRangeQuery with the empty String ("") as lower bound and
includeLower=false leads internally constructs an Automaton that doesn't match
anything. This is unexpected expecially for open upper bounds where any string
should be considered to be "higher" than the empty string.

This PR changes "Automata#makeBinaryInterval" so that for an empty string lower
bound and an open upper bound, any String should match the query regardless or
the includeLower flag.
asfgit pushed a commit that referenced this pull request Oct 26, 2020
Currently a TermRangeQuery with the empty String ("") as lower bound and
includeLower=false leads internally constructs an Automaton that doesn't match
anything. This is unexpected expecially for open upper bounds where any string
should be considered to be "higher" than the empty string.

This PR changes "Automata#makeBinaryInterval" so that for an empty string lower
bound and an open upper bound, any String should match the query regardless or
the includeLower flag.
msfroh pushed a commit to msfroh/lucene-solr that referenced this pull request Nov 18, 2020
…e#1976)

Currently a TermRangeQuery with the empty String ("") as lower bound and
includeLower=false leads internally constructs an Automaton that doesn't match
anything. This is unexpected expecially for open upper bounds where any string
should be considered to be "higher" than the empty string.

This PR changes "Automata#makeBinaryInterval" so that for an empty string lower
bound and an open upper bound, any String should match the query regardless or
the includeLower flag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants