Skip to content

LUCENE-9578: TermRangeQuery empty string lower bound edge case #1976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 16, 2020

Conversation

cbuescher
Copy link
Contributor

Description

Currently a TermRangeQuery with the empty String ("") as lower bound and
includeLower=false leads internally constructs an Automaton that doesn't match
anything. This is unexpected expecially for open upper bounds where any string
should be considered to be "higher" than the empty string.

Solution

This PR changes "Automata#makeBinaryInterval" so that for an empty string lower
bound and an open upper bound, any String should match the query regardless or
the includeLower flag.

Tests

Added two new tests to TestAutomaton.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the master branch.
  • I have run ./gradlew check
  • I have added tests for my changes.

Currently a TermRangeQuery with the empty String ("") as lower bound and
includeLower=false leads internally constructs an Automaton that doesn't match
anything. This is unexpected expecially for open upper bounds where any string
should be considered to be "higher" than the empty string.

This PR changes "Automata#makeBinaryInterval" so that for an empty string lower
bound and an open upper bound, any String should match the query regardless or
the includeLower flag.
@cbuescher
Copy link
Contributor Author

@jpountz thanks for the review, I added a commit that rejects the empty string when "includeMin == false" and added tests for that edge case as well.

@cbuescher
Copy link
Contributor Author

@jpountz thanks for the review, I added another commit that should fix the problem you detected.

* Returns a new (deterministic) automaton that accepts all binary terms except
* the empty string.
*/
public static Automaton makeAnyBinaryExceptEmpty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call makeNonEmptyBinary?

@jpountz
Copy link
Contributor

jpountz commented Oct 16, 2020

I had a deeper look to check whether this could be an indicator of other bugs lurking, but it doesn't seem so and the fix looks good to me. For reference here's what the automaton looks like today with an empty string as a min bound and includeMin==false before the above fix.
empty

@cbuescher
Copy link
Contributor Author

@jpountz thanks, I renamed the method. btw. would this change also be backported to the latest 8.7 branch so we can use it in ES to fix elastic/elasticsearch#63386?

@jpountz jpountz merged commit 48348ae into apache:master Oct 16, 2020
@jpountz
Copy link
Contributor

jpountz commented Oct 16, 2020

The 8.7 branch has not been cut yet so this will be in 8.7. This change looked safe to me so I felt free to merge in spite of the branch being imminently cut.

jpountz pushed a commit that referenced this pull request Oct 16, 2020
Currently a TermRangeQuery with the empty String ("") as lower bound and
includeLower=false leads internally constructs an Automaton that doesn't match
anything. This is unexpected expecially for open upper bounds where any string
should be considered to be "higher" than the empty string.

This PR changes "Automata#makeBinaryInterval" so that for an empty string lower
bound and an open upper bound, any String should match the query regardless or
the includeLower flag.
asfgit pushed a commit that referenced this pull request Oct 26, 2020
Currently a TermRangeQuery with the empty String ("") as lower bound and
includeLower=false leads internally constructs an Automaton that doesn't match
anything. This is unexpected expecially for open upper bounds where any string
should be considered to be "higher" than the empty string.

This PR changes "Automata#makeBinaryInterval" so that for an empty string lower
bound and an open upper bound, any String should match the query regardless or
the includeLower flag.
msfroh pushed a commit to msfroh/lucene-solr that referenced this pull request Nov 18, 2020
…e#1976)

Currently a TermRangeQuery with the empty String ("") as lower bound and
includeLower=false leads internally constructs an Automaton that doesn't match
anything. This is unexpected expecially for open upper bounds where any string
should be considered to be "higher" than the empty string.

This PR changes "Automata#makeBinaryInterval" so that for an empty string lower
bound and an open upper bound, any String should match the query regardless or
the includeLower flag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants