Skip to content

Custom analyzer filter scope #20479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mbarker opened this issue Sep 14, 2016 · 30 comments
Closed

Custom analyzer filter scope #20479

mbarker opened this issue Sep 14, 2016 · 30 comments
Labels
>bug :Data Management/Indices APIs APIs to create and manage indices and templates

Comments

@mbarker
Copy link

mbarker commented Sep 14, 2016

Elasticsearch version: 2.4.0

Plugins installed: [kopf]

JVM version: 1.8.0_60

OS version: OSX 10.11.6 locally, reproducible on CentOS 7.2.1511 as well

Description of the problem including expected versus actual behavior:

After upgrading to 2.4.0 our templates are no longer functioning from 2.3.3.

We have our template split up into several templates so we can specify default analyzers per language, and we share analyzers/filters/tokenizers between the templates.

Steps to reproduce:

  1. Push a shared template with common analyzers/filters
POST /_template/analyzers
{
  "template": "*",
  "order": 20,
  "settings": {
    "analysis": {
      "analyzer": {
        "untouched_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "max_length", "lowercase" ]
        },
        "no_stopwords_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "max_length", "standard", "lowercase" ]
        }
      },
      "tokenizer": {
        "standard": {
          "type": "standard",
          "version": "4.4"
        }
      },
      "filter": {
        "max_length": {
          "type": "length",
          "max": "32766"
        }
      }
    }
  }
}
  1. Push a language specific analyzer template, which is a czech analyzer referencing a filter from the analyzers template.
POST /_template/analyzer-cs
{
  "template": "*-cs",
  "order": 30,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "czech",
          "filter": [ "max_length" ]
        }
      }
    }
  }
}
  1. Push another language specific analyzer template which uses a custom analyzer, referencing some filters from the shared analyzers template.
POST /_template/analyzer-en
{
  "template": "*-en",
  "order": 30,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "max_length", "standard", "lowercase", "stop_en" ]
        }
      },
      "filter": {
        "stop_en": {
          "type": "stop",
          "stopwords": "_english_",
          "ignore_case": "true"
        }
      }
    }
  }
}

This action fails, resulting in:

2016-09-13 09:52:00,174][DEBUG][action.admin.indices.template.put] [Zartra] failed to put template [analyzers-en]
[kOccXLKbR5CwAsXCt0v_3w] IndexCreationException[failed to create index]; nested: IllegalArgumentException[Custom Analyzer [default] failed to find filter under name [max_length]];

However, as you see the first analyzer-cs was able to reference the max_length filter without issue.

These templates worked fine in 2.3.3 and 2.3.4, haven't tried 2.3.5 but it sounds like there weren't any changes to that release. Reading over release notes for 2.4.0 don't seem to indicate anything that would prevent this functionality from being supported.

We were seeing similar issues with referencing the shared analyzers in our mapping template file.

Provide logs (if relevant):

    2016-09-13 09:52:00,174][DEBUG][action.admin.indices.template.put] [Zartra] failed to put template [analyzers-en]
    [kOccXLKbR5CwAsXCt0v_3w] IndexCreationException[failed to create index]; nested: IllegalArgumentException[Custom Analyzer [default] failed to find filter under name [max_length]];
        at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
        at org.elasticsearch.cluster.metadata.MetaDataIndexTemplateService.validateAndAddTemplate(MetaDataIndexTemplateService.java:196)
        at org.elasticsearch.cluster.metadata.MetaDataIndexTemplateService.access$200(MetaDataIndexTemplateService.java:57)
        at org.elasticsearch.cluster.metadata.MetaDataIndexTemplateService$2.execute(MetaDataIndexTemplateService.java:157)
        at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
        at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:468)
        at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.IllegalArgumentException: Custom Analyzer [default] failed to find filter under name [max_length]
        at org.elasticsearch.index.analysis.CustomAnalyzerProvider.build(CustomAnalyzerProvider.java:76)
        at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:216)
        at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:70)
        at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
        at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
        at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
        at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
        at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
        at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
        at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
        at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
        at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
        at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
        at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
        at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
        at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
        at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
        at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
        at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
        at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
        at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:201)
        at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:879)
        at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
        at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
        at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
        at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:154)
        at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
        at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
        ... 11 more
@clintongormley
Copy link
Contributor

@johtani Could you take a look at this please?

@johtani
Copy link
Contributor

johtani commented Sep 16, 2016

Hi @mbarker

Before 2.4, we didn't validate any template on index template creation.
We added validation on index template creation in this PR.
In this PR, we validate each template as a complete mapping and settings,
we don't check combination because we cannot check all combination when template create.
Now, you should add the max_length filter setting each template for avoiding error.

@clintongormley Should we support to validate any template with only * template?
I think we can not predict templates combination except * template.

@mbarker
Copy link
Author

mbarker commented Sep 16, 2016

@johtani It seems to me like validating a template should be scoped to the context it should have available during indexing, unless I'm misunderstanding how ES supports multiple templates.

We certainly could replicate all of the analyzers/filters to all the templates where they are used, I was trying to avoid duplicating parts of the template.

If it would help I can give more details about how we are trying to compose the templates and the rationale behind it.

@johtani
Copy link
Contributor

johtani commented Sep 16, 2016

Thanks. I will think what the validation is more useful for users.

Validating template on index template creation is helpful for many users because users know the error earlier than validating creating index, especially logging use-case.
However in your situation, this validation is not useful because template is bigger than before 2.4.

e.g. If there is skip validation flag and simulate template with index name without creating index, is it useful?

@clintongormley
Copy link
Contributor

There is something else going on here. I (like @mbarker ) didn't understand why this template succeeded:

POST /_template/analyzer-cs
{
  "template": "*-cs",
  "order": 30,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "czech",
          "filter": [ "max_length" ]
        }
      }
    }
  }
}

while this template didn't:

POST /_template/analyzer-en
{
  "template": "*-en",
  "order": 30,
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "max_length"
          ]
        }
      }
    }
  }
}

The reason is that the first analyzer definition is invalid, but is not validated:

      "analyzer": {
        "default": {
          "type": "czech",
          "filter": [ "max_length" ]
        }
      }

You're using the czech analyzer and passing in a parameter filter, which the czech analyzer doesn't recognise but silently ignores. We should add validation to analyzers to complain about any left over parameters.

@clintongormley
Copy link
Contributor

If there is skip validation flag and simulate template with index name without creating index, is it useful?

I like the idea of a skip evaluation flag. Not sure the simulate bit is needed - easy to test by just creating an index.

@johtani
Copy link
Contributor

johtani commented Sep 21, 2016

Or elasticsearch can create template always and if a template has some errors elasticsearch only return warning message. what do you think? @clintongormley

@jpountz
Copy link
Contributor

jpountz commented Sep 21, 2016

I'm wondering how hard it would be to figure the parent templates of a given template (because the wildcard is more general) and apply them as well at validation time.

@mbarker
Copy link
Author

mbarker commented Sep 21, 2016

You could always make up name for the index to validate based on the template matching, ie replace * with test and resolve dependencies that way.

The only caveat would be users have to upload templates in dependency order.

@johtani
Copy link
Contributor

johtani commented Sep 21, 2016

@jpountz @mbarker Thanks for your comment. Ah, you are right. I try to fix it.

@qwerty4030
Copy link
Contributor

If templates are validated using their dependencies when creating or updating, then what will happen when a template's dependency is deleted or updated? For example template A depends on B:

  1. template B created
  2. template A created
  3. template B deleted
  4. ?

Heres another thought: what if a template is invalid by itself, but the index is created with explicit settings that the template depends on?

Another possible issue: What if a template overrides something in a another template that results in a final mapping that is invalid?

@johtani
Copy link
Contributor

johtani commented Sep 22, 2016

@qwerty4030 Thanks for your comment!

In your 1st case, I think it is OK that elasticsearch does not validate at template deletion time.
We get the error only when user creates index using template B.

In your 3rd case, we can validate overriding only the parent templates.

I' not sure your 2nd case. Could you explain any examples?

@qwerty4030
Copy link
Contributor

Just tried this in 2.3.2:

  1. create template that is invalid by itself (references custom_analyzer that is not defined).
  2. attempt to create index that matches this template (fails with mapper_parsing_exception as expected).
  3. attempt to create index that matches this template and specifies the custom_analyzer in the settings (this works as expected).

To reproduce:

PUT /_template/test-template
{
  "template": "data-test-*",
  "order": 0,
  "mappings": {
    "test-type": {
      "properties": {
        "field": {
          "type": "string",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

PUT data-test-1

PUT data-test-1
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}

@johtani
Copy link
Contributor

johtani commented Oct 2, 2016

Thanks. I think Elasticsearch should return the error in your cases, because that is tricky settings.

johtani added a commit to johtani/elasticsearch that referenced this issue Oct 14, 2016
Apply parent templates at creation time
Add some testcases

Closes elastic#20479
@acarstoiu
Copy link

acarstoiu commented Oct 14, 2016

As I explained in #20919, this is a design error: templates should be validated only as much as any incomplete object can be validated. Elasticsearch shoud reject a template only if it contains plain inconsistencies or grammar violations.

On top of that, this is a breaking change which you failed to document!

I have a patch that fixes the too hasty conclusion that a referred analyzer should have been declared in the same template. I consider it elegant enough for production, but I didn't bother with updating the tests.
If anyone is interested, I'm willing to publish it.

@acarstoiu
Copy link

The code is in this branch. GitHub doesn't let me to submit a pull request against a tag, neither from a particular commit. 👎

@clintongormley
Copy link
Contributor

@acarstoiu you don't seem to get how the open source world works. Frankly, your attitude makes me disinclined to converse with you further.

@acarstoiu
Copy link

@clintongormley you do that, I agree.
As for the open source world, I already did two things: served this project a patch and urged repeatedly its team to publish this breaking change here for the benefit of other users.

And by the way, you can find in there lots of deprecations under the "breaking change" label, but when a true one is discovered, you keep it hidden.

@mboudreau
Copy link

@clintongormley I for one agree with @acarstoiu and frankly, your dismissal of this extremely inconvenient non-documented breaking change that has completely affected our cluster and must now be downgraded across the board is very concerning in the least. You are literally costing us time and money.

you don't seem to get how the open source world works.

You don't seem to understand how businesses work and seem to have very little regard as to the users of your product when you inexplicably hide information and then dismiss users that hit the issue because you don't like their attitude when they're justifiably irritated?

Isn't one of the tenets of open source software transparency? If so, you've missed the mark on this one.

@marshall007
Copy link

@clintongormley as mentioned in #21105 (comment) I think this should be noted in the 2.4 migration docs as a breaking change. Took us a while to track down this issue while attempting to migrate our 2.3.x cluster.

@clintongormley
Copy link
Contributor

@marshall007 Yes I agree - I've just been rather put off working on this issue by the comments of others. If you'd like to submit a PR adding the breaking changes docs, I'd be happy to merge it.

@marshall007
Copy link

@clintongormley sure thing! I have it up in PR #21869.

@clintongormley
Copy link
Contributor

@clintongormley sure thing! I have it up in PR #21869.

thanks @marshall007 - merged

@qwerty4030
Copy link
Contributor

@clintongormley @marshall007 is that PR supposed to update this page?
I'm getting a 404 for that URL. However 2.3 breaking changes URL works fine: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/breaking-changes-2.3.html

Thanks!

@clintongormley
Copy link
Contributor

Thanks @qwerty4030 - that page wasn't being included in the docs. Should be fixed now: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/breaking-changes-2.4.html

@marshall007
Copy link

@clintongormley that page still appears to be out of date, FYI. Looks like it hasn't been updated since 537f4e1.

@clintongormley
Copy link
Contributor

@clintongormley
Copy link
Contributor

Closing in favour of #21105

johtani added a commit to johtani/elasticsearch that referenced this issue Mar 12, 2017
johtani added a commit to johtani/elasticsearch that referenced this issue Mar 12, 2017
johtani added a commit to johtani/elasticsearch that referenced this issue Mar 13, 2017
@clintongormley clintongormley added :Data Management/Indices APIs APIs to create and manage indices and templates and removed :Index Templates labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Data Management/Indices APIs APIs to create and manage indices and templates
Projects
None yet
Development

No branches or pull requests

8 participants