Disable graph analysis at query time for shingle and cjk filters producing tokens of different size #23920
Shingle filters that produce shingles of different sizes and CJK filters that produce both bigrams and unigrams are problematic when we analyze the graph they produce. The positions for each shingle size are not aligned, so each position has at least two side paths. To avoid an explosion in the number of paths, this change disables graph analysis at query time for field analyzers that contain these filters with a problematic configuration.
Closes #23918
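To make the path explosion concrete, here is a minimal sketch in plain Java. It is not the code from this change and does not use Lucene: it assumes a simplified model in which a token of every configured size starts at every position and advances the walk by its size. `ShinglePathSketch`, `countPaths`, and `hasMixedSizes` are hypothetical names introduced only for illustration.

```java
import java.math.BigInteger;

class ShinglePathSketch {

    /**
     * Counts the distinct walks from position 0 to position tokenCount in the
     * simplified model: a token of every size in [minSize, maxSize] starts at
     * every position and moves the walk forward by its size.
     */
    static BigInteger countPaths(int tokenCount, int minSize, int maxSize) {
        BigInteger[] paths = new BigInteger[tokenCount + 1];
        paths[0] = BigInteger.ONE; // exactly one (empty) path at the start
        for (int pos = 1; pos <= tokenCount; pos++) {
            BigInteger total = BigInteger.ZERO;
            // Sum the paths that reach pos by ending with a token of each allowed size.
            for (int size = minSize; size <= maxSize && size <= pos; size++) {
                total = total.add(paths[pos - size]);
            }
            paths[pos] = total;
        }
        return paths[tokenCount];
    }

    /**
     * Hypothetical predicate for a "problematic configuration" as described
     * above: the filter emits tokens of more than one size.
     */
    static boolean hasMixedSizes(int minSize, int maxSize, boolean outputUnigrams) {
        return outputUnigrams || minSize != maxSize;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 20, 40}) {
            System.out.println(n + " tokens, sizes 1-2: " + countPaths(n, 1, 2) + " paths");
            System.out.println(n + " tokens, size 2 only: " + countPaths(n, 2, 2) + " paths");
        }
    }
}
```

In this model, mixing sizes 1 and 2 makes the path count follow the Fibonacci sequence, so it grows exponentially with the number of tokens, while a single fixed size yields at most one path. That asymmetry is the motivation for keeping graph analysis only when all tokens have the same size.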
It looks good. I'm wondering whether we should also test that we keep processing the query as before when the shingles are sane?
builder.splitOnWhitespace(false);
Query query = builder.doToQuery(shardContext);
assertThat(expectedQuery, equalTo(query));
}
Should we also check that the correct query is built if the analyzer is sane and does not output unigrams?
@jpountz I pushed some changes to address your comment.
LGTM