Synonyms break fuzziness #25518

jjfalling · 2017-07-03T13:31:32Z

Elasticsearch version: 5.1

Plugins installed: []

JVM version (java -version): Oracle 1.8

OS version (uname -a if on a Unix-like system): OSX

Description of the problem including expected versus actual behavior:

When searching for a term that is in a list of synonyms, the query will not give results for terms in documents that would normally match a query with a fuzziness value. The query returns the expected document after removing the search term from the synonyms list.

Below is a reproduction of the issue. This occurs if the search analyzer is specified in the query or if it is defined on the field in the mapping.

PUT /test_index
{  
   "settings":{  
      "analysis":{  
         "analyzer":{  
            "synonym":{  
               "tokenizer":"standard",
               "filter":[
                  "apostrophe",
                  "synonym"
               ]
            }
         },
         "filter":{  
            "synonym":{  
               "type":"synonym",
               "synonyms":[  
                  "alf,fred"
               ]
            }
         }
      }
   }
}

PUT /test_index/person/1
{
  "name": "ali"
}

GET /test_index/person/_search
{  
   "query":{  
      "match":{  
         "name":{  
            "query":"alf",
            "analyzer":"synonym",
            "fuzziness":"AUTO"
         }
      }
   }
}
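
As a diagnostic sketch (not part of the original report), the `_analyze` API can show what the `synonym` analyzer defined above emits for the query text. If the synonym rule applies, the response would be expected to list both `alf` and `fred` at the same position, which is the expansion the later comments discuss.

```
# Inspect the tokens the "synonym" analyzer produces for the query text.
# Assumes test_index was created with the settings shown above.
GET /test_index/_analyze
{
  "analyzer": "synonym",
  "text": "alf"
}
```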

jpountz · 2017-07-04T13:18:38Z

I'm not sure we should do anything. I think a popular use-case of fuzzy queries is to search regardless of potential typos. If I'm assuming that the query is correct and I want to find docs that have the term mistyped, then applying synonyms might make sense. On the other hand, if the term is mistyped and I want to find documents that have the correct typing, then applying synonyms does not sound like a good idea anymore.

Another perspective on this issue: if you apply synonyms, it means that you want to search based on the meaning of words, and it is likely that you are also applying stemming so that, for instance, plurals and singulars are collapsed into the same token. However, stemming makes little sense in combination with fuzzy queries; otherwise you could end up with weird results such as jumping being considered at distance 2 from jam due to the removal of the -ing suffix. It gets even worse when the stemmer also changes letters, e.g. stemmers often replace a trailing y with i in English.

tobymiller · 2017-07-04T13:27:36Z

In which case surely synonyms and fuzziness should be completely incompatible, and throw an error, rather than silently not applying fuzziness to words with synonyms?

In my use case, which prompted this bug report, we definitely need to do both. If synonyms and fuzziness become incompatible we would bool together one query for each, but this would confuse the scoring in correctly spelt cases so not ideal.
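
For illustration, the fallback described above (one query for each, combined in a `bool`) could be sketched as follows. This is an assumption about how such a query might look, not something posted in the thread: the first clause applies the `synonym` analyzer without fuzziness, and the second applies fuzziness with the built-in `standard` analyzer so that no synonym expansion takes place.

```
# Sketch only: two "should" clauses against the reproduction index.
# Clause 1: synonym expansion, exact terms.
# Clause 2: fuzzy matching on the unexpanded query term.
GET /test_index/person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "alf",
              "analyzer": "synonym"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "alf",
              "analyzer": "standard",
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}
```

A correctly spelt term can satisfy both clauses, so its score is the sum of the two, which is the scoring concern raised above.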

jpountz · 2017-07-04T13:34:23Z

Yeah, there are multiple cases like that, for instance prefix queries make little sense if the analyzer has edge ngrams, wildcard queries make little sense if the analyzer has a stemmer, etc. but analyzers are totally opaque (for good reasons) so we can't check for which filters they wrap.

tobymiller · 2017-07-04T13:37:41Z

Fair enough. It does seem that synonyms and fuzziness have a sensible way that they could behave together, even if in most cases it's a bad idea, though.

cbuescher · 2017-07-10T08:51:18Z

@jpountz after reading your comments, I don't know if we should keep this issue open. Do you think there is anything we should do here? Or maybe add another round of discussion?

jpountz · 2017-07-10T09:00:09Z

I understand this might be a bit controversial, so I did not want to close the issue right away. @jimczi Do you have an opinion about this?

al · 2017-07-25T13:39:46Z

FWIW I'd like to add that I also have a use case where I'd like to be able to perform synonym expansion followed by fuzzy matching. Basically I'd like to define the synonyms google,alphabet so that a search for google would match google, and alphabet, but also gooogle, alpabet, etc.

jimczi · 2017-07-28T07:16:14Z

Sorry, I missed the ping. I agree with @jpountz: applying fuzziness to query-time synonyms sounds weird to me. If a synonym rule matches an input, it means that the input is correctly spelt and that the expansion should match exact terms. The other problem is that we can't differentiate the original term(s) from their synonyms after analysis, so we can't apply fuzziness to the input terms only (which IMO would be an acceptable solution), and it becomes even harder if the synonym is multi-word.
The workaround is to use index-time synonyms that would index both terms google, alphabet when google or alphabet are encountered in the documents. The fuzziness would work for a query like google which is expected but also for a query like alpabet with fuzziness enabled.
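
A minimal sketch of this index-time workaround, adapted from the reproduction at the top of the issue, might look like the following. The index name `test_index_workaround` and the analyzer name `no_synonym` are hypothetical, and the mapping syntax assumes the 5.x line reported in the issue (mapping types, `text` fields). The synonym filter runs only at index time; the search analyzer omits it, so the query term is left unexpanded and fuzziness can apply to it.

```
# Hypothetical index: synonyms are applied when documents are indexed,
# while queries are analyzed without the synonym filter.
PUT /test_index_workaround
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonym": {
          "tokenizer": "standard",
          "filter": ["apostrophe", "synonym"]
        },
        "no_synonym": {
          "tokenizer": "standard",
          "filter": ["apostrophe"]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms": ["alf,fred"]
        }
      }
    }
  },
  "mappings": {
    "person": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "synonym",
          "search_analyzer": "no_synonym"
        }
      }
    }
  }
}

# Same document as in the original reproduction.
PUT /test_index_workaround/person/1
{
  "name": "ali"
}

# No analyzer is given in the query, so the field's search_analyzer is used.
GET /test_index_workaround/person/_search
{
  "query": {
    "match": {
      "name": {
        "query": "alf",
        "fuzziness": "AUTO"
      }
    }
  }
}
```

The trade-off is that changing the synonym list requires reindexing the affected documents, since the expansion is stored in the index.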

romseygeek · 2018-03-14T10:48:45Z

cc @elastic/es-search-aggs

romseygeek · 2019-04-03T12:13:07Z

It looks as though we have a consensus here that we want to keep the existing behaviour. Shall I close this @jimczi @jpountz ?

jpountz · 2019-04-03T13:07:24Z

Maybe make it a documentation issue @romseygeek. My mind hasn't changed but I can understand how this can be confusing.

romseygeek · 2019-04-04T08:11:25Z

I documented this in #40783

zainabtareen · 2021-04-06T06:52:51Z

@romseygeek Suppose we have goooglerandom in query, and term is indexed as google. I apply edge_gram to separate out gooogle from the string. Now I need to apply fuzzy on this to return match with google. What would you suggest?

cbuescher added :Search Relevance/Analysis How text is split into tokens :Search/Search Search-related issues that do not fall into other categories labels Jul 10, 2017

colings86 added the >bug label Apr 24, 2018

weberhofer mentioned this issue Jun 14, 2018

analyze_wildcard is ignored when using synonym_graph filter #31335

Closed

romseygeek added a commit to romseygeek/elasticsearch that referenced this issue Apr 3, 2019

Document restrictions on fuzzy matching when using synonyms

a3e7327

Relates to elastic#25518

romseygeek mentioned this issue Apr 3, 2019

Document restrictions on fuzzy matching when using synonyms #40783

Merged

romseygeek added a commit that referenced this issue Apr 4, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

8d5b75e

Relates to #25518

romseygeek added a commit that referenced this issue Apr 4, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

8ca7325

Relates to #25518

romseygeek closed this as completed Apr 4, 2019

rwaight mentioned this issue Apr 26, 2019

[DOCS] Update version-specific documents with restrictions on fuzzy matching when using synonyms #41592

Closed

debadair pushed a commit that referenced this issue Apr 26, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

d7ab86d

Relates to #25518 #41592

debadair pushed a commit that referenced this issue Apr 26, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

ca998d2

Relates to #25518 #41592

debadair pushed a commit that referenced this issue Apr 26, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

7d1db91

Relates to #25518 #41592

debadair pushed a commit that referenced this issue Apr 26, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

06bf8a8

Relates to #25518 #41592

debadair pushed a commit that referenced this issue Apr 26, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

dad6400

Relates to #25518 #41592

debadair pushed a commit that referenced this issue Apr 26, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

d05a0a2

Relates to #25518 #41592

debadair pushed a commit that referenced this issue Apr 26, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

33d6898

Relates to #25518 #41592

debadair pushed a commit that referenced this issue Apr 26, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

9896e51

Relates to #25518 #41592

debadair pushed a commit that referenced this issue Apr 26, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

bddec75

Relates to #25518 #41592

debadair pushed a commit that referenced this issue Apr 26, 2019

Document restrictions on fuzzy matching when using synonyms (#40783)

63108e8

Relates to #25518 #41592

gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019

Document restrictions on fuzzy matching when using synonyms (elastic#…

fe5fdf7

…40783) Relates to elastic#25518

javanna added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 16, 2024
