Match query does not match tokens from path_hierarchy tokenizer #67225
Comments
Sorry this didn't get resolved in the forums; however, I believe it is a general usage question that belongs there. I quickly tried this on 7.10 with your examples and a document indexing
since that was the field name the "path_tree_rev" analyzer was used in. That retrieved the document for me. I will close this issue, but please let me know if you want to discuss this further in the forums.
I'm sorry, I bungled my reduced test case, since my real application has many other fields. Here is the actual creation of the index; you can see that it is indeed
Here is the request that inserts an entry:
You can see it in Kibana:
    "H.20180812.2343008.757.tif" : {
      "doc_freq" : 1,
      "ttf" : 1,
      "term_freq" : 1,
      "tokens" : [
        {
          "position" : 0,
          "start_offset" : 66,
          "end_offset" : 92
        }
      ]
    },
Yet a search on this field for this term fails:
Kibana also shows no results for the KQL query. I hope this does a better job of demonstrating that this could be a bug rather than user error, though this is my first time working with tokenizers.
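The term vector output above is consistent with how a reverse `path_hierarchy` tokenizer emits suffixes of the path: 92 minus 66 is 26, which is exactly the length of "H.20180812.2343008.757.tif", so that token appears to cover the final component at the end of the field value. The sketch below is a simplified Python model written for illustration, not Lucene's implementation; it assumes the default "/" delimiter and ignores the tokenizer's `skip`, `replacement`, and `buffer_size` options. The function name is hypothetical.

```python
def reverse_path_tokens(path: str, delimiter: str = "/") -> list[tuple[str, int, int]]:
    """Simplified model of a reverse path_hierarchy tokenizer.

    Returns (token, start_offset, end_offset) triples, longest suffix first.
    Every token ends at the end of the input, so the base name is the last one.
    """
    end = len(path)
    tokens = [(path, 0, end)]  # the full value is always emitted first
    for i, ch in enumerate(path):
        # Each delimiter followed by more text starts a shorter suffix.
        if ch == delimiter and i + 1 < end:
            tokens.append((path[i + 1:], i + 1, end))
    return tokens


for token, start, stop in reverse_path_tokens("/foo/bar/some_boring_filename"):
    print(f"{token!r:35} start={start:2} end={stop}")
```

For the base-name token, `end_offset - start_offset` equals the length of the base name, which is the relationship visible in the Kibana output. In other words, the term is present in the index; the failing search has to be explained by how the query is written.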
"source" is a nested object. You cannot directly query fields inside nested object, you need to use a "nested_query" around it. Again, the forum would be the right place to discuss this. |
This was posted to the discussion forum for one month without finding any resolution, so I am assuming it is a bug and posting here.
Elasticsearch version: 7.5.1
Plugins installed: []
JVM version (java -version): (whatever is in the Docker container)
OS version (uname -a if on a Unix-like system): host is 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
I have a field, source.file, containing a file path that is tokenized using a path_hierarchy tokenizer operating in "reverse" mode, so that I can query by the file's base name (the last path component). For instance, if I have a document with source.file set to /foo/bar/some_boring_filename, I should be able to match the field against simply some_boring_filename.
If I ask Elasticsearch what the terms for this document are,
Request to get _termvectors
I find that one of the tokens is indeed some_boring_filename:
Yet when I query it,
there are no hits.
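For anyone reproducing this, here is a minimal sketch of a create-index body of the kind described, as a Python dict. The analyzer name `path_tree_rev` comes from the thread; the tokenizer name `path_tree_rev_tokenizer` is hypothetical, and mapping `source` as `"type": "nested"` follows the maintainer's reply rather than the author's actual mapping, which is not shown here.

```python
import json

create_index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                # Hypothetical name; reverse=True emits path suffixes, ending with the base name.
                "path_tree_rev_tokenizer": {
                    "type": "path_hierarchy",
                    "delimiter": "/",
                    "reverse": True,
                }
            },
            "analyzer": {
                "path_tree_rev": {
                    "type": "custom",
                    "tokenizer": "path_tree_rev_tokenizer",
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "source": {
                # A nested object: its fields are only reachable through a `nested` query.
                "type": "nested",
                "properties": {
                    "file": {"type": "text", "analyzer": "path_tree_rev"}
                },
            }
        }
    },
}

print(json.dumps(create_index_body, indent=2))
```

The design choice that matters here is the `"type": "nested"` on `source`. With an ordinary `object` mapping, a top-level `match` on `source.file` could be used directly. With `nested`, the same `match` has to sit inside a `nested` query.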
Steps to reproduce:
Create the index according to:
I use the bulk API to insert lots of documents. I give ES plenty of time to index them; in fact, this problem manifests with documents that were inserted months ago.
Run the query above and try to match a file based on its base name.
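The bulk step can be sketched as building a newline-delimited JSON body for the `_bulk` endpoint: one action line followed by one source line per document, with a final trailing newline, which the bulk API requires. The index name `files`, the sample path, and the helper name are hypothetical.

```python
import json


def bulk_body(index: str, paths: list[str]) -> str:
    """Build an NDJSON _bulk body that indexes one document per file path."""
    lines = []
    for path in paths:
        # Action/metadata line.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: `file` lives inside the `source` object.
        lines.append(json.dumps({"source": {"file": path}}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("files", ["/foo/bar/some_boring_filename"])
print(body, end="")
```

The body would be sent with `Content-Type: application/x-ndjson`. Documents become searchable after a refresh, which by default happens about once per second, so for documents indexed months ago, indexing delay cannot explain the missing hits.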