LUCENE-9578: TermRangeQuery empty string lower bound edge case #1976
Currently, a TermRangeQuery with the empty string ("") as its lower bound and includeLower=false internally constructs an Automaton that doesn't match anything. This is unexpected, especially for open upper bounds, where any string should be considered "higher" than the empty string. This PR changes "Automata#makeBinaryInterval" so that, for an empty string lower bound and an open upper bound, any string matches the query regardless of the includeLower flag.
@jpountz thanks for the review. I added a commit that rejects the empty string when "includeMin == false" and added tests for that edge case as well.
lucene/core/src/java/org/apache/lucene/util/automaton/Automata.java
@jpountz thanks for the review. I added another commit that should fix the problem you detected.
 * Returns a new (deterministic) automaton that accepts all binary terms except
 * the empty string.
 */
public static Automaton makeAnyBinaryExceptEmpty() {
Maybe call makeNonEmptyBinary?
@jpountz thanks, I renamed the method. By the way, would this change also be backported to the latest 8.7 branch so we can use it in ES to fix elastic/elasticsearch#63386?
The 8.7 branch has not been cut yet, so this will be in 8.7. This change looked safe to me, so I felt free to merge even though the branch is about to be cut.
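For readers unfamiliar with what "accepts all binary terms except the empty string" means as an automaton, here is a minimal table-driven sketch in plain Java. It is not Lucene's implementation and does not use Lucene's Automaton classes; the class name `NonEmptyBinarySketch` and everything in it are hypothetical. It assumes a two-state deterministic automaton: a non-accepting start state, and an accepting state reached on any byte value.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Lucene method discussed in this thread.
final class NonEmptyBinarySketch {
    private static final int START = 0;      // nothing consumed yet, not accepting
    private static final int SEEN_BYTE = 1;  // at least one byte consumed, accepting

    // NEXT[state][label] gives the next state for each byte label 0x00..0xFF.
    private static final int[][] NEXT = new int[2][256];
    private static final boolean[] ACCEPT = {false, true};

    static {
        // Every byte value leads to the accepting state, from either state.
        Arrays.fill(NEXT[START], SEEN_BYTE);
        Arrays.fill(NEXT[SEEN_BYTE], SEEN_BYTE);
    }

    /** Runs the automaton over the term and reports whether it ends in an accepting state. */
    static boolean accepts(byte[] term) {
        int state = START;
        for (byte b : term) {
            state = NEXT[state][b & 0xFF]; // treat the byte as an unsigned label
        }
        return ACCEPT[state];
    }

    public static void main(String[] args) {
        System.out.println(accepts(new byte[0]));          // false
        System.out.println(accepts(new byte[] {0}));       // true
        System.out.println(accepts(new byte[] {-1, 42}));  // true
    }
}
```

Because the start state is the only non-accepting state and no transition leads back to it, the empty term is the only input that is rejected.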
Description
Currently, a TermRangeQuery with the empty string ("") as its lower bound and
includeLower=false internally constructs an Automaton that doesn't match
anything. This is unexpected, especially for open upper bounds, where any string
should be considered "higher" than the empty string.
Solution
This PR changes "Automata#makeBinaryInterval" so that, for an empty string lower
bound and an open upper bound, any string matches the query regardless of
the includeLower flag.
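The ordering argument behind this can be sketched without Lucene. The snippet below is a hedged model of range membership, not the automaton construction itself: it assumes terms are compared lexicographically as unsigned bytes and that a null bound means "open". The names `TermRangeSketch`, `inRange` and `utf8` are hypothetical helpers for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of range semantics; not Lucene code.
final class TermRangeSketch {
    /** Returns true if term lies within the range; a null bound is open on that side. */
    static boolean inRange(byte[] term, byte[] lower, boolean includeLower,
                           byte[] upper, boolean includeUpper) {
        if (lower != null) {
            int c = Arrays.compareUnsigned(term, lower); // unsigned lexicographic order
            if (c < 0 || (c == 0 && !includeLower)) {
                return false;
            }
        }
        if (upper != null) {
            int c = Arrays.compareUnsigned(term, upper);
            if (c > 0 || (c == 0 && !includeUpper)) {
                return false;
            }
        }
        return true;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        // Exclusive empty lower bound, open upper bound: a non-empty term is in range.
        System.out.println(inRange(utf8("a"), empty, false, null, false)); // true
        // The empty term itself is excluded when the lower bound is exclusive.
        System.out.println(inRange(empty, empty, false, null, false));     // false
        // With an inclusive lower bound the empty term is in range too.
        System.out.println(inRange(empty, empty, true, null, false));      // true
    }
}
```

Under this model every non-empty term compares greater than the empty term, so an exclusive empty lower bound with an open upper bound excludes at most the empty term.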
Tests
Added two new tests to TestAutomaton.
Checklist
Please review the following and check all that apply:
master branch
./gradlew check