Tokens generated after token filters ignore match query operator option #25746
Comments
@pmishev Could you elaborate on what you think is wrong with the behaviour in 5.5? Looking at your uax_url_email tokenizer and the subsequent pattern filter, the query looks okay to me:
As documented, the Pattern Capture Token Filter emits all the tokens it produces at the same position and with the same character offsets. This is the cause of the "+Synonym" in the explanation output.
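To make the "same position" point concrete, here is a minimal Python sketch of what a pattern-capture style filter does to one token. It is a simplified model, not the Lucene implementation: the function name `pattern_capture`, the sample address, the two patterns, and the de-duplication of repeated captures are all assumptions made for illustration.

```python
import re


def pattern_capture(token, position, patterns, preserve_original=True):
    """Simplified model: return (term, position) pairs for one input token.

    Every captured term keeps the position of the input token, which is
    the property discussed above. De-duplication is an assumption here.
    """
    terms = [token] if preserve_original else []
    for pattern in patterns:
        for match in re.finditer(pattern, token):
            for group in match.groups():
                if group and group not in terms:
                    terms.append(group)
    return [(term, position) for term in terms]


# Hypothetical input and patterns, not the ones used in this issue.
emitted = pattern_capture("john@foo.example", 0, [r"([^@]+)", r"(\w+)"])
positions = {position for _, position in emitted}
```

In this sketch `positions` contains a single value, so a consumer that relies only on positions cannot tell these terms apart from synonyms of one another.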
@cbuescher, to illustrate what I mean:
That query will return the document.
There is a note at the end of the documentation for the
So the query parser thinks that all these tokens are at the same position and builds them as synonyms. I think it should be clearly stated in the docs that each token will be considered a full replacement for the email address. Bottom line: this is the expected behavior with this token filter.
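The grouping described here can be sketched in a few lines of Python. This is a toy model of a position-aware query builder, assuming that tokens sharing a position are collapsed into one synonym clause and that the operator only applies between positions. The function `build_query` and its output format are illustrative and are not the actual Lucene or Elasticsearch code.

```python
from itertools import groupby


def build_query(field, tokens, operator="or"):
    """Toy model: tokens is a list of (term, position) pairs.

    Terms at the same position become one Synonym(...) clause; the
    operator decides only whether each position's clause is required.
    """
    by_position = sorted(tokens, key=lambda t: t[1])
    clauses = []
    for _, group in groupby(by_position, key=lambda t: t[1]):
        terms = sorted({term for term, _ in group})
        if len(terms) == 1:
            clauses.append(f"{field}:{terms[0]}")
        else:
            inner = " ".join(f"{field}:{term}" for term in terms)
            clauses.append(f"Synonym({inner})")
    prefix = "+" if operator == "and" else ""
    return " ".join(prefix + clause for clause in clauses)


# Tokenizer output: one token per position, so "and" requires each term.
distinct = build_query(
    "email2", [("somebody", 0), ("we", 1), ("example.com", 2)], "and"
)

# Filter output: every token at position 0, so "and" has only one clause.
stacked = build_query(
    "email1", [("somebody", 0), ("we", 0), ("example", 0), ("com", 0)], "and"
)
```

Under these assumptions `distinct` requires every term, while `stacked` contains a single required synonym clause, which mirrors the shape of the two explanation outputs quoted in the issue.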
Thank you for clarifying that. That explains a lot. Perhaps when Alternatively perhaps a
There was some confusion about the fact that tokens emitted from a Pattern Capture Token Filter are treated as synonyms when used to analyze a search query. This commit adds an explanation to the note in the docs to emphasize this behaviour. Closes elastic#25746
Elasticsearch version: 2.4, 5.5
Plugins installed: []
JVM version: 1.8.0_131
OS version: 4.4.0-81-generic #104-Ubuntu x86_64
Description of the problem including expected versus actual behavior:
When "operator": "and" is specified in a match query, ALL tokens generated by the search analyzer should be looked for in the indexed tokens. However, tokens generated by token filters behave differently: the query looks for ANY of those tokens.
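The difference between the expected and the observed matching can be stated as a small Python sketch. The helper `matches` and the sample terms are hypothetical and only model the two semantics being contrasted; they are not how Elasticsearch evaluates a query.

```python
def matches(indexed_terms, query_terms, require_all):
    """Return True if the document matches under the chosen semantics."""
    indexed = set(indexed_terms)
    check = all if require_all else any
    return check(term in indexed for term in query_terms)


# Hypothetical terms: the query shares "we" and "example" with the document.
indexed_terms = ["somebody", "we", "example", "com"]
query_terms = ["anybody", "we", "example"]

expected_behaviour = matches(indexed_terms, query_terms, require_all=True)
observed_behaviour = matches(indexed_terms, query_terms, require_all=False)
```

With these sample terms the ALL semantics rejects the document because "anybody" is not indexed, while the ANY semantics accepts it.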
Steps to reproduce:
gives
"explanation": "+(+email2:somebody +email2:we +email2:example.com) #ConstantScore(+ConstantScore(_type:emails))"
, which is correct.
gives
"explanation": "+(email1:[email protected] email1:somebody email1:we email1:example email1:com) #ConstantScore(+ConstantScore(_type:emails))"
in ES 2.4, or
"explanation": "+Synonym(email1:com email1:example email1:somebody email1:[email protected] email1:we) #_type:emails"
in ES 5.5.
I couldn't find a reason for such behaviour documented anywhere. I believe it is wrong and that the correct explanation should be:
+(+email1:[email protected] +email1:somebody +email1:we +email1:example +email1:com) #ConstantScore(+ConstantScore(_type:emails))