disable dynamic mapping on fields with dots instead of rejecting documents #15714


Closed · jillesvangurp opened this issue Dec 30, 2015 · 12 comments

@jillesvangurp (Contributor)

As of ES 2.0, field names with dots are no longer allowed. This is breaking things for a lot of people. I've seen plugins, scripts, etc. mentioned on Twitter and GitHub that are all intended to hack around this and make it less painful. The sheer amount of stuff like this floating around is indicative of how much of an issue this is. It's a showstopper for us and is the main reason we are still on 1.7.x. We are looking at breaking logging infrastructure, major code changes, and complex data migrations, all of which need lots of testing.

Disabling dynamic mapping for dotted fields is probably what most people would prefer over having the entire document rejected with some error about dots. In general, I don't think Elasticsearch should refuse to store any valid JSON. Whether it adds information from those documents to an index is a different matter: that requires a mapping, either explicit or dynamic, and I think people can live with adapting their mapping if it has options for dealing with dots. For dynamic mapping, you could make the case that it should fail more gracefully than outright rejecting the document with a bad request.

The options I had in mind are the following:

```yaml
# controls whether documents with dots in field names are rejected; setting this
# to false allows them but disables dynamic mapping on those fields
settings.index.mapper.reject_dotted_fields: false

# automatically maps fields with dots by replacing the dots with the value of
# auto_convert_dots_character
settings.index.mapper.auto_convert_dots: true

# controls what the dots are converted to; _ is probably a good default
settings.index.mapper.auto_convert_dots_character: _
```

Whether this should be on or off is up for debate, though I'd probably blindly turn it on just about anywhere in my own setups. Unless I'm missing something, none of this should be particularly hard to support. It merely internalizes the hacks people are currently forced to implement outside of elasticsearch.

PS: I left a comment on #15404 with a similar suggestion on how to fix things. I realize that ticket is closed, but it seems the workaround I suggest might at least be a viable improvement. So, I'm filing it as a separate issue in the hope that we might make some progress.

@TinLe commented Dec 30, 2015

+1

@rjernst (Member) commented Dec 31, 2015

Not allowing field names with dots has nothing to do with dynamic mappings. It is entirely about disambiguating field mappings in general. See the discussion on #14359 for an explanation of why it is important for ES 2.x.

@jillesvangurp (Contributor · Author)

@rjernst I get that you don't want to index fields with dots in them; so don't by default. That's a different and better solution than rejecting documents entirely because some of their fields have dots.

This wouldn't be an issue if Elasticsearch didn't have dynamic mappings, because then there would simply be no way to map field names with dots if the mapping didn't support it. One workaround that actually works is turning off dynamic mapping on nested fields with dots using a mapping template. So, this is entirely about dynamic mappings, and that's also where the fix is going to be.
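As a rough sketch of the mapping-template workaround mentioned above (the template pattern, field pattern, and option names here are illustrative assumptions; the exact syntax varies between ES versions), an index template could use a dynamic template to stop dotted field names from being mapped:

```json
{
  "template": "logs-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "dotted_fields_unmapped": {
            "match": "*.*",
            "mapping": {
              "index": "no",
              "enabled": false
            }
          }
        }
      ]
    }
  }
}
```

The idea is that any dynamically encountered field whose name contains a dot matches the wildcard and gets a mapping that effectively disables it, while the rest of the document indexes normally.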

I understand the argument why you don't want to map fields with dots because it complicates the query dsl. However, I don't see any argument for rejecting documents with such fields just because you can't map the field dynamically. If it is not mapped, it's not part of the mapping and there is no ambiguity. Problem solved.

Also, I don't see the complexity of having a mapping feature to replace . with _. This is different from escaping, which would indeed have implications for the query DSL, because then you would have to worry about unescaping and keeping dotted expressions unambiguous.

@rjernst (Member) commented Jan 5, 2016

> I get that you don't want to index fields with dots in them

That is entirely not the point. The restriction on field names with dots has nothing to do with actual indexing (and nothing to do with dynamic mapping, as I stated before). It is about how we store and look up mappings by field name internally. We must be able to distinguish between a field called foo.bar and an object field foo with a field bar under it. This is not necessarily a restriction that will always be around, but we opted to go with simplicity to start (which allowed us to lock down mappings to remove all ambiguity).

> Also, I don't see the complexity of having a mapping feature to replace . with _.

This is trivial enough for a client to do; no need to complicate the ES API with it. (And there is a Logstash plugin to do exactly that.)

I'm closing this issue, as there are sufficient examples and plugins, as the original description mentions, for how to "de-dot" field names. Again, this is not necessarily something that will always be a restriction, but it is for now, until the work can be scoped and done to e.g. allow escaping.
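Client-side de-dotting of the kind mentioned here can indeed be sketched in a few lines; the helper below is hypothetical (not part of any Elasticsearch client library) and only illustrates how cheap the transformation is before documents are sent for indexing:

```python
def dedot(doc, replacement="_"):
    """Return a copy of doc with '.' in every key replaced by replacement."""
    if isinstance(doc, dict):
        # rewrite keys at this level, then recurse into the values
        return {key.replace(".", replacement): dedot(value, replacement)
                for key, value in doc.items()}
    if isinstance(doc, list):
        return [dedot(item, replacement) for item in doc]
    return doc  # scalars pass through unchanged

print(dedot({"user.name": "jil", "meta": {"geo.lat": 52.0}}))
# → {'user_name': 'jil', 'meta': {'geo_lat': 52.0}}
```

A collision check (e.g. when a document contains both `a.b` and `a_b`) is left out for brevity; a real pipeline would want to decide how to handle that case.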

@rjernst closed this as completed Jan 5, 2016
@jillesvangurp (Contributor · Author)

If it is not indexed, it is by definition not part of the mapping. So there is no ambiguity whatsoever when looking up the field, because it is simply not there. So, if you don't dynamically map fields with dots in their name, this completely solves the problem.

Dynamic mapping throwing the error is the entire problem. If I make a static mapping with a template that disables indexing on affected fields, documents actually index fine. The only time this is a problem is when dynamic mapping attempts to create a Lucene index for a field with a dot in it and then throws an error instead of giving up its attempt to map that particular field.

So I disagree with your reason for closing this issue. Please explain how an unmapped field with dots in it causes ambiguity issues in any way.

@rjernst (Member) commented Jan 6, 2016

> Please explain how an unmapped field with dots in it causes ambiguity issues in any way

I never said this. But again, this does not have to do with dynamic mappings. It has to do with mapping a field name with dots, regardless of whether that happens dynamically or not (i.e. via a PUT mapping). You are free to do exactly what you said to avoid the issue (don't index the field names that contain dots).

@jillesvangurp (Contributor · Author)

This issue had two parts:

  • the title part: don't map dotted fields dynamically, so I can safely index JSON documents that have some fields with dots in them without having to worry about Elasticsearch throwing bad requests. They won't be searchable and will only be available through _source. There's no ambiguity for the query DSL or anything else that refers to mapped fields, because they simply never get created to begin with. This is IMHO much better than rejecting the JSON content with a bad request. Dynamic mapping is never going to be perfect; it's merely a best effort.
  • having a mechanism to escape or convert fields with dots. I get that this is a bit more messy and probably has been debated at length. I'm not arguing otherwise.

So, I can live with doing just part 1 of this issue. This is for me more important than actually getting the fields mapped. That would be extremely helpful for people like me who are planning 1.x to 2.x migration.

I don't get why part 1 is hard/impossible/controversial; all the arguments you provide apply to part 2 of the issue, not part 1. So, should I file a new ticket for this, can we just reopen this one, or am I really missing something here (this discussion seems very circular so far)?

@rjernst (Member) commented Jan 7, 2016

While I was focusing on what you described as part 2, I still think my original argument applies equally to part 1. It is easy enough for a user to remove fields themselves when serializing, rather than complicating the ES api with additional settings (which may go away in the future when we can support dots).

@jillesvangurp (Contributor · Author)

Sorry, but none of what you say applies to part 1. The only argument I'm hearing now is "we don't want to add complexity". To be clear, we are talking about one extra parameter to enable/disable dot tolerance. And I'd argue that parameter could default to true for the vast majority of users, meaning it might not actually be needed at all. Why would you actually want Elasticsearch to throw bad requests here? Just document that dotted fields won't be dynamically mapped and can't be statically mapped by default, until whenever it is that Elasticsearch implements escaping. That's actually less complexity, because it simplifies using Elasticsearch and removes the elaborate hacks you need left, right, and center to de-dot JSON content.

We've argued this at length and I can see we are not going to agree. So, I am not going to press this issue any further. But for the sake of other users experiencing the same pain, please escalate this internally inside Elasticsearch to get some broader discussion & consensus on whether it is really that necessary to inconvenience your users this much.

API purity is a nice goal but you also have a rather large installed user base that you are leaving out in the cold with this one (again this is a major PITA). IMHO using elasticsearch got more complex, not less complex because of this new behavior. I think a rather large portion of the unmigrated 1.x user base (i.e. at this point most of your user base) will run into this as well and fundamentally I don't see a good technical reason for this problem to exist. I probably won't be the last to bring this up.

@TinLe commented Jan 7, 2016

Just want to add a me-too. I have a very large user population and 1000+ physical nodes spread around the globe that are running 1.x. No one wants to upgrade because of these issues. It's very painful.

I've been using tribe nodes as a way to at least federate across the clusters. Unfortunately for me, that is no longer possible with ES 2.x, as it is not backward compatible protocol-wise. I've already tried it. That means I cannot do an incremental, cluster-by-cluster upgrade, but have to upgrade all of my downstream clusters at once (most of which I do not control :-( ). The alternative is to set up a separate ELK 2.x infrastructure and migrate clusters over to it. That's a lot of time, money, and people resources.

@rjernst (Member) commented Jan 18, 2016

Please see the discussion in #15951. As I stated before, we want to solve this problem, and we are trying to come up with a solution that works for at least most cases.

@jillesvangurp (Contributor · Author)

Good to see #15951 is moving forward. I think the third option of simply not indexing field names with dots in them, as I suggest above, could be added there as well. But I'd be happy with either of the two approaches discussed there.
