Issue querying #1505

Closed
zeitler opened this issue May 13, 2021 · 4 comments

Comments

@zeitler

zeitler commented May 13, 2021

Hi
I've been googling, and I'm having a lot of trouble working out how to query properly.

Having:

*********** models.py *******************
from django.db import models

class Account(models.Model):
    name = models.CharField(max_length=128, db_index=True)
    description = models.TextField(blank=True, null=True)
    email = models.EmailField(max_length=254, db_index=True)
    ....

********* analyzers.py *********************
from elasticsearch_dsl import analyzer, token_filter

def ngram_filter(min=2, max=15):
    return token_filter(
        f"ngram_{min}_{max}_filter",
        type="ngram",
        min_gram=min,
        max_gram=max
    )

def edge_gram_filter(min=2, max=15):
    return token_filter(
        f"edgegram_{min}_{max}_filter",
        type="edge_ngram",
        min_gram=min,
        max_gram=max
    )

def stop_words_filter():
    # The two stop-word filters are defined elsewhere (not shown in this post).
    return [
        english_stop_words_filter,
        portuguese_stop_words_filter,
    ]

filters = stop_words_filter()
filters.append(ngram_filter(3, 3))
filters.append(edge_gram_filter(2, 15))
full_searchable_analyzer = analyzer(
    "full_searchable_analyzer",
    tokenizer="keyword",
    filter=filters
)
string_sort_analyzer = analyzer(
    'string_sort',
    type="keyword",
    filter=[
        "lowercase",
    ]
)

************** documents.py ********************
from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry

from .analyzers import full_searchable_analyzer, string_sort_analyzer
from .models import Account

@registry.register_document
class AccountDocument(Document):
    name = fields.TextField(
        attr="name",
        fields={
            'raw': fields.TextField(
                analyzer=full_searchable_analyzer,
                search_analyzer=string_sort_analyzer
            ),
            'suggest': fields.CompletionField(),
        }
    )
    description = fields.TextField(
        fields={
            'raw': fields.TextField(
                analyzer=string_sort_analyzer,
                search_analyzer=string_sort_analyzer
            ),
            'suggest': fields.CompletionField(),
        }
    )

    class Index:
        name = 'accounts'
        settings = {'number_of_shards': 1,
                    'number_of_replicas': 0}

    class Django:
        model = Account
        fields = [
            'email',
        ]


objects data:
1-> {name: 'teste', description: 'testify a common taste', email: '[email protected]'}
2-> {name: 'testJonhy', description: ' asdasdkasçkdkldas', email: '[email protected]'}
3-> {name: 'Mariah', description: 'desctestsherealso', email: '[email protected]'}

s.query(MultiMatch(query='test', fields=fields, fuzziness=10)).execute()
returns record 1

s.query("match_phrase", query='test').execute()
returns nothing

q = s.filter("match_phrase", query='test').execute()
returns nothing also

How can I write these queries properly?
The goal is to query and get all of these documents back.

I also intend to add highlighting and a "Did you mean" feature.
I've already got suggestions working with:
s.suggest('name', 'test', completion={'field': 'name.suggest'}).execute()

Can someone help me, or point me to some documentation where I can figure this out?

Thanks

@Sachin-Kahandal

For this
s.query("match_phrase", query='test').execute()

Do this
s.query("match_phrase", name='test').execute()

Similarly, for the filter query, replace query with the name of the field you are looking into.
Elasticsearch expects you to give both the field name and the query text you want to search for.
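
A minimal sketch of why the keyword name matters, written in plain Python so it runs without a cluster. `build_query` is a hypothetical helper, not part of elasticsearch-dsl; it only mirrors the way the keyword argument name becomes the field name in the request body.

```python
def build_query(query_type, **field_to_text):
    """Hypothetical helper: turn one field=text pair into a query body dict."""
    if len(field_to_text) != 1:
        raise ValueError("pass exactly one field=text pair")
    # The keyword name is used as the field name in the request body.
    return {query_type: dict(field_to_text)}


# query='test' searches a field literally called "query", which does not exist.
wrong = build_query("match_phrase", query="test")
# name='test' searches the "name" field, which is what was intended.
right = build_query("match_phrase", name="test")

print(wrong)
print(right)
```

The first body targets a field named `query`, so it finds nothing in an index whose fields are `name`, `description` and `email`.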

Match phrase query is similar to the match query but is used to query text phrases. Phrase matching is necessary when the ordering of the words is important. Only the documents that contain the words in the same order as the search input are matched.
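
To make the ordering point concrete, here is a toy comparison in plain Python. It is only an illustration under simplifying assumptions: text is split on whitespace and lowercased, and `match_any` and `match_phrase` are hypothetical stand-ins, not Elasticsearch's real scoring or analysis.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_any(query, text):
    """Match-like behaviour: true if any query word occurs in the text."""
    doc = set(tokens(text))
    return any(word in doc for word in tokens(query))


def match_phrase(query, text):
    """Phrase-like behaviour: query words must appear adjacent and in order."""
    q, d = tokens(query), tokens(text)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


description = "testify a common taste"
print(match_phrase("common taste", description))  # same order as the text
print(match_phrase("taste common", description))  # reversed order
print(match_any("taste common", description))     # order is irrelevant here
```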

From your question, it looks like you just want to match "test" against the name field.
So try using match, like this:
s.query("match", name='test').execute()

For this
s.query(MultiMatch(query='test', fields=fields, fuzziness=10)).execute()

Try this,
s.query(MultiMatch(query='test', fields=['name', 'description'], fuzziness='AUTO')).execute()

Let Elasticsearch take care of the fuzziness.
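
A rough sketch of what "AUTO" allows, assuming the documented default thresholds (terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). The functions below are hypothetical helpers, and plain Levenshtein distance is used as a simplification; Elasticsearch's own edit-distance handling also covers transpositions and other options.

```python
def auto_fuzziness(term):
    """Allowed edits for a term under the assumed default AUTO thresholds."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def fuzzy_match(query, token):
    """True if the token is within the allowed edit distance of the query."""
    return levenshtein(query.lower(), token.lower()) <= auto_fuzziness(query)


for token in ["teste", "taste", "testjohny"]:
    print(token, levenshtein("test", token), fuzzy_match("test", token))
```

Under these assumptions "test" is within one edit of "teste" only, so fuzziness helps with near-misspellings but not with finding "test" inside a longer word.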

@zeitler
Author

zeitler commented May 17, 2021

Hi @Sachin-Kahandal.
Thank you very much for your help.

I'm still not getting the desired results.

Given these records:
ID | Name | Description | Email
1 | admin | | [email protected]
2 | Constatine | sad | [email protected]
3 | Mariah | desctestsherealso | [email protected]
4 | teste | testify a common taste | [email protected]
5 | testJohny | asdasdkasçkdkldas | [email protected]

The goal is to get 3 results:
Record 3, because the description contains "test" in "desctestsherealso"
Record 4, because the name contains "test" in "teste"
Record 5, because the name contains "test" in "testJohny"

Tests:

Testing: s.query("match", name="test").execute().hits.total
FAILED: expected: 3, obtained: 0

Testing: s.query("match", query="test").execute().hits.total
FAILED: expected: 3, obtained: 0

Testing: s.filter("match", name="test").execute().hits.total
FAILED: expected: 3, obtained: 0

Testing: s.query(MultiMatch(query="test", fields=["name", "description"])).execute().hits.total
FAILED: expected: 3, obtained: 0

Testing: s.query(MultiMatch(query="test", fields=["name", "description"], fuzziness="AUTO")).execute().hits.total
FAILED: expected: 3, obtained: 1

If I understood correctly, MultiMatch will return documents where "test" is in "name" AND in "description".

But what I want is documents where it is in the name OR in the description.

The application has a landing search page. It is intended to show all the documents that contain the search terms, and once I have the results I need to highlight the matching words.

Is the definition of the fields correct?
...
name = fields.TextField(attr="name", fields={
    'raw': fields.TextField(
        analyzer=full_searchable_analyzer,
        search_analyzer=string_sort_analyzer
    ),
    'suggest': fields.CompletionField(),
})
...

Kind regards,
Thank you very much

@Sachin-Kahandal

Sachin-Kahandal commented May 22, 2021

Hi @zeitler,
OK, now that I understand your problem:

  • What you need to solve this sort of problem is an nGram/edgeNGram tokenizer.
  • These tokenizers break up text into configurable-sized tuples of letters.
  • For instance, the word "news", run through a min_gram:1, max_gram:2 nGram tokenizer would be broken up into the tokens "n", "e", "w", "s", "ne", "ew", and "ws".
  • This sort of analysis does really well when it comes to imprecise matching.
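
The "news" example above can be reproduced with a few lines of plain Python. This is only a sketch of the idea: `ngrams` is a hypothetical function that slides a window over the whole string, and it ignores the tokenizer's other settings such as `token_chars`.

```python
def ngrams(text, min_gram, max_gram):
    """Return every substring of text whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


print(ngrams("news", 1, 2))

# The query term only matches if it is itself one of the indexed grams,
# so the gram sizes have to cover the length of the term being searched.
print("test" in ngrams("desctestsherealso", 3, 3))   # only 3-character grams
print("test" in ngrams("desctestsherealso", 2, 15))  # includes 4-character grams
```

With 3-character grams only, the 4-character term "test" is never produced as a token, which is worth checking against the `ngram_filter(3, 3)` setting in analyzers.py.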

Also, with MultiMatch you can pass the operator of your choice, using the field names exactly as they are defined in the document (they are case-sensitive):
query = MultiMatch(query=text, fields=['name', 'description'], operator="OR")
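
On the AND/OR question: as far as I know, a multi-field query returns a document when any one of the listed fields matches, and the operator only controls how the words inside the query text are combined. The sketch below simulates that "any field" behaviour over the five sample records from this thread, in plain Python. `grams` and `search` are hypothetical helpers, and emails are left out because they are not searched.

```python
RECORDS = {
    1: {"name": "admin", "description": ""},
    2: {"name": "Constatine", "description": "sad"},
    3: {"name": "Mariah", "description": "desctestsherealso"},
    4: {"name": "teste", "description": "testify a common taste"},
    5: {"name": "testJohny", "description": "asdasdkasçkdkldas"},
}


def grams(text, size):
    """Set of lowercase substrings of exactly `size` characters."""
    text = text.lower()
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def search(query, fields):
    """Return ids of records where ANY listed field contains the query as a gram."""
    q = query.lower()
    hits = []
    for doc_id, doc in RECORDS.items():
        if any(q in grams(doc[field], len(q)) for field in fields):
            hits.append(doc_id)
    return hits


print(search("test", ["name", "description"]))  # [3, 4, 5]
```

Record 3 matches through its description and records 4 and 5 through their names, which are the three results the question asks for.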

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
Examples: https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch

Hope this helps

@Brechard

@zeitler, did you manage to find the solution? If so, can you post it and close the issue?

@elastic elastic locked and limited conversation to collaborators Apr 5, 2024
@miguelgrinberg miguelgrinberg converted this issue into discussion #1740 Apr 5, 2024