Description
Elasticsearch version (bin/elasticsearch --version):
Version: 6.3.0, Build: default/tar/424e937/2018-06-11T23:38:03.357887Z, JVM: 10.0.1
Plugins installed: [analysis-icu, ingest-geoip, ingest-user-agent, mapper-size, repository-gcs]
JVM version (java -version):
openjdk version "10.0.1" 2018-04-17
OpenJDK Runtime Environment (build 10.0.1+10)
OpenJDK 64-Bit Server VM (build 10.0.1+10, mixed mode)
OS version (uname -a if on a Unix-like system): Linux es-master-0 4.4.111+ #1 SMP Sat May 5 12:48:47 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior: When searching an index with a query_string query, if a wildcard (prefix) term consists only of characters that the analyzer filters out (e.g. @*), Elasticsearch 5.5 dropped that term from the query entirely (the expected behavior). In 6.3.0 the term is preserved, so the query matches nothing when the same characters are filtered out at indexing time.
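To make the difference concrete, here is a toy model of the two behaviors in plain shell. It is only a sketch and not Elasticsearch's query parser: strip_filtered is a hypothetical stand-in for analysis that deletes every non-alphanumeric character (so it would not split a term such as baz@qux into two tokens), and the mode names drop and keep are made up here to label the 5.5 and 6.3.0 behaviors.

```shell
#!/bin/sh
# Toy model only; none of these functions exist in Elasticsearch.

# Hypothetical stand-in for analysis: delete every non-alphanumeric character.
strip_filtered() {
  printf '%s' "$1" | tr -cd '[:alnum:]'
}

# rewrite_query MODE QUERY
#   MODE=drop: omit a term whose analyzed form is empty (what the 5.5 explanation shows)
#   MODE=keep: keep such a term unanalyzed (what the 6.3.0 explanation shows)
rewrite_query() {
  mode=$1
  out=""
  set -f   # disable globbing so that "@*" is not expanded against file names
  for term in $2; do
    case $term in
      *\*) suffix='*'; prefix=${term%?} ;;   # trailing wildcard: analyze only the prefix
      *)   suffix='';  prefix=$term ;;
    esac
    analyzed=$(strip_filtered "$prefix")
    if [ -n "$analyzed" ]; then
      out="$out txt:$analyzed$suffix"
    elif [ "$mode" = keep ]; then
      out="$out txt:$term"
    fi
  done
  set +f   # assumes globbing was enabled before the call
  printf '%s\n' "${out# }"
}

rewrite_query drop 'foo @* @bar* baz@*'   # prints: txt:foo txt:bar* txt:baz*
rewrite_query keep 'foo @* @bar* baz@*'   # prints: txt:foo txt:@* txt:bar* txt:baz*
```

The two printed lines correspond to the 5.5 and 6.3.0 explanation strings returned by the validate step further down.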
Steps to reproduce:
- Create the index:
curl -X PUT localhost:9200/punct-wildcard-test -H 'Content-Type: application/json' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "icu_analyzer": {
          "type": "custom",
          "tokenizer": "icu_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "txt": {
          "type": "text",
          "analyzer": "icu_analyzer"
        }
      }
    }
  }
}'
- Analyze text (what would happen during indexing):
curl -X POST localhost:9200/punct-wildcard-test/_analyze -H 'Content-Type: application/json' -d '{
  "text": ["foo @bar baz@qux"],
  "tokenizer": "icu_tokenizer"
}'
Result:
{
  "tokens": [
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "bar",
      "start_offset": 5,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "baz",
      "start_offset": 9,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "qux",
      "start_offset": 13,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
- Validate/explain a problem query:
curl -X POST "localhost:9200/punct-wildcard-test/_validate/query?explain" -H 'Content-Type: application/json' -d '{
  "query": {
    "query_string": {
      "query": "foo @* @bar* baz@*",
      "analyzer": "icu_analyzer",
      "default_field": "txt",
      "analyze_wildcard": true
    }
  }
}'
Elasticsearch 5.5 Response:
{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "punct-wildcard-test",
      "valid": true,
      "explanation": "txt:foo txt:bar* txt:baz*"
    }
  ]
}
Elasticsearch 6.3.0 Response:
{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "punct-wildcard-test",
      "valid": true,
      "explanation": "txt:foo txt:@* txt:bar* txt:baz*"
    }
  ]
}
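
The validate output shows how the query is rewritten, but not whether it still matches a document. A minimal sketch for checking that on the same local node follows. The document id 1, the ES_URL variable, and the function names are choices made here, not part of the original steps. Since query_string combines terms with OR unless default_operator says otherwise, the sketch also assumes "default_operator": "AND"; which operator the affected queries actually used is not stated above.

```shell
#!/bin/sh
# Assumptions (not from the original steps): ES_URL default, document id 1,
# and "default_operator": "AND" in the search body.
ES_URL="${ES_URL:-localhost:9200}"
INDEX="punct-wildcard-test"

# Print the search body: the query_string from the validate step plus the
# assumed default_operator.
search_body() {
  cat <<'EOF'
{
  "query": {
    "query_string": {
      "query": "foo @* @bar* baz@*",
      "analyzer": "icu_analyzer",
      "default_field": "txt",
      "analyze_wildcard": true,
      "default_operator": "AND"
    }
  }
}
EOF
}

# Index the text from the analyze step and refresh so it is searchable at once.
index_sample_doc() {
  curl -s -X PUT "$ES_URL/$INDEX/doc/1?refresh=true" \
    -H 'Content-Type: application/json' \
    -d '{"txt": "foo @bar baz@qux"}'
}

# Run the search; hits.total in the response tells whether the document matched.
run_search() {
  curl -s -X POST "$ES_URL/$INDEX/_search" \
    -H 'Content-Type: application/json' \
    -d "$(search_body)"
}
```

Running `index_sample_doc && run_search` against each version and comparing hits.total in the two responses would show whether the preserved txt:@* clause changes the result.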