Synonyms break fuzziness #25518
Comments
I'm not sure we should do anything. I think a popular use-case of fuzzy queries is to search regardless of potential typos. If I'm assuming that the query is correct and I want to find docs that have the term mistyped, then applying synonyms might make sense. On the other hand, if the term is mistyped and I want to find documents that have the correct spelling, then applying synonyms does not sound like a good idea anymore. Another perspective on this issue: if you apply synonyms, it means that you want to search based on the meaning of words, and it is likely that you are also applying stemming so that, for instance, plurals and singulars are collapsed into the same token. However, stemming makes little sense in combination with fuzzy queries; otherwise you could end up with weird things such as jumping being considered at distance 2 from jam due to the removal of -ing suffixes. It gets even worse when the stemmer also changes letters, e.g. they often replace trailing …
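To make the stemming point above concrete, here is a minimal sketch in Python. The `toy_stem` function is a hypothetical, deliberately naive stand-in for a real stemmer (it only strips a trailing "ing"), and `levenshtein` is a plain textbook edit distance, not the automaton Lucene actually uses. It only illustrates how removing the suffix shrinks the distance between jumping and jam.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def toy_stem(token: str) -> str:
    """Hypothetical stemmer for illustration only: strips a trailing 'ing'."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    return token


# Distance between the raw tokens versus the stemmed tokens.
raw_distance = levenshtein("jumping", "jam")
stemmed_distance = levenshtein(toy_stem("jumping"), toy_stem("jam"))

print("raw:", raw_distance)
print("stemmed:", stemmed_distance)
```

Unstemmed, the two words are far apart; once "jumping" is reduced to "jump", the pair falls within an edit distance of 2, so a fuzzy query could treat them as a match.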
In which case surely synonyms and fuzziness should be completely incompatible, and throw an error, rather than silently not applying fuzziness to words with synonyms? In my use case, which prompted this bug report, we definitely need to do both. If synonyms and fuzziness become incompatible we would bool together one query for each, but this would confuse the scoring in correctly spelt cases, so it is not ideal.
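For readers wondering what "bool together one query for each" might look like, here is a rough sketch that builds such a request body as a Python dict. The field name `title` and the analyzer name `my_synonym_analyzer` are assumptions made up for illustration; they do not come from the original report. The sketch relies only on the `analyzer` and `fuzziness` parameters of the `match` query and on `bool`/`should`.

```python
import json


def synonyms_or_fuzzy(field, text, synonym_analyzer, plain_analyzer,
                      fuzziness="AUTO"):
    """Build a bool query with one synonym clause and one fuzzy clause.

    All names passed in are caller-supplied; nothing here is taken from
    the original issue's (missing) reproduction.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Clause 1: exact matching with synonym expansion.
                    {"match": {field: {"query": text,
                                       "analyzer": synonym_analyzer}}},
                    # Clause 2: fuzzy matching without synonym expansion.
                    {"match": {field: {"query": text,
                                       "analyzer": plain_analyzer,
                                       "fuzziness": fuzziness}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


# Hypothetical field and analyzer names, for illustration only.
body = synonyms_or_fuzzy("title", "laptop", "my_synonym_analyzer", "standard")
print(json.dumps(body, indent=2))
```

This also shows where the scoring concern comes from: a correctly spelt term can match both `should` clauses, and the scores of matching clauses are combined, so such documents are scored differently from those matching only one clause.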
Yeah, there are multiple cases like that: for instance, prefix queries make little sense if the analyzer has edge ngrams, wildcard queries make little sense if the analyzer has a stemmer, etc. But analyzers are totally opaque (for good reasons), so we can't check which filters they wrap.
Fair enough. It does seem, though, that there is a sensible way for synonyms and fuzziness to behave together, even if combining them is a bad idea in most cases.
@jpountz after reading your comments, I don't know if we should keep this issue open. Do you think there is anything we should do here? Or maybe add another round of discussion?
I understand this might be a bit controversial, so I did not want to close the issue right away. @jimczi Do you have an opinion about this?
FWIW I'd like to add that I also have a use case where I'd like to be able to perform synonym expansion followed by fuzzy matching. Basically I'd like to define the synonyms …
Sorry I missed the ping. I agree with @jpountz, applying fuzziness to query-time synonyms sounds weird to me. If a synonym rule matches an input, it means that the input is correctly spelt and that the expansion should match exact terms. The other problem is that we can't differentiate the original term(s) from their synonyms after the analysis, so we can't apply fuzziness to the input terms only (which IMO would be an acceptable solution), and it becomes even harder if the synonym is multi-word.
cc @elastic/es-search-aggs
Maybe make it a documentation issue @romseygeek. My mind hasn't changed but I can understand how this can be confusing.
I documented this in #40783
@romseygeek Suppose the query contains goooglerandom and the term is indexed as google. I apply edge_ngram to separate gooogle out of the string. Now I need to apply fuzziness to it so that it matches google. What would you suggest?
Elasticsearch version: 5.1
Plugins installed: []
JVM version (java -version): Oracle 1.8
OS version (uname -a if on a Unix-like system): OSX
Description of the problem including expected versus actual behavior:
When the search term appears in a list of synonyms, a query with a fuzziness value no longer returns documents whose terms would normally match through fuzziness. After removing the search term from the synonyms list, the query returns the expected document.
Below is a reproduction of the issue. This occurs if the search analyzer is specified in the query or if it is defined on the field in the mapping.