disable dynamic mapping on fields with dots instead of rejecting documents #15714
Comments
+1 |
Not allowing field names with dots has nothing to do with dynamic mappings. It is entirely about disambiguating field mappings in general. See the discussion on #14359 for an explanation of why it is important for ES 2.x. |
@rjernst I get that you don't want to index fields with dots in them; so don't, by default. That's a different and better solution than rejecting documents entirely because some of their fields have dots. This wouldn't be an issue if Elasticsearch didn't have dynamic mappings, because then there would simply be no way to map field names with dots in them if the mapping didn't support it. One workaround that actually works is turning off dynamic mapping on nested fields with dots using a mapping template. So, this is entirely about dynamic mappings and that's also where the fix is going to be. I understand the argument for why you don't want to map fields with dots: it complicates the query DSL. However, I don't see any argument for rejecting documents with such fields just because you can't map the field dynamically. If it is not mapped, it's not part of the mapping and there is no ambiguity. Problem solved. Also, I don't see the complexity of having a mapping feature to replace . with _. This is different from escaping, which would indeed have implications for the query DSL because then you have to worry about unescaping and keeping dotted expressions unambiguous. |
This is entirely not the point. The restriction on field names without dots has nothing to do with actual indexing (and nothing to do with dynamic mapping, as I stated before). It is about how we store and look up mappings by field name internally. We must be able to distinguish a field called
This is trivial enough for a client to do; there is no need to complicate the ES API with it. (And there is a Logstash plugin to do exactly that.) I'm closing this issue as there are sufficient examples and plugins, as the original description mentions, for how to "de dot" field names. Again, this is not necessarily something that will always be a restriction, but it is for now, until the work can be scoped and done to, e.g., allow escaping. |
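For readers looking for the client-side "de dot" step mentioned above, here is a minimal sketch in Python. It is not taken from the Logstash plugin or any Elasticsearch client; the function name `de_dot` and the choice of `_` as the replacement character are assumptions made for illustration only.

```python
def de_dot(value, replacement="_"):
    """Return a copy of a JSON-like value with '.' replaced in every dict key.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    if isinstance(value, dict):
        # Rewrite each key, then recurse into the value it points to.
        return {
            str(key).replace(".", replacement): de_dot(item, replacement)
            for key, item in value.items()
        }
    if isinstance(value, list):
        # Lists may contain nested objects, so recurse into each element.
        return [de_dot(item, replacement) for item in value]
    # Scalars (strings, numbers, booleans, None) are returned unchanged.
    return value


doc = {"user.name": "kim", "http": {"resp.code": 200}, "tags": [{"a.b": 1}]}
clean = de_dot(doc)
```

Note that this simple approach can silently merge keys: a document containing both `a.b` and `a_b` at the same level would keep only one of them, so a real pipeline would need to decide how to handle such collisions.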
If it is not indexed, it is by definition not part of the mapping. So there is no ambiguity whatsoever in looking up the field, because it is not in the mapping. So, if you don't dynamically map fields with dots in their name, this completely solves the problem. Dynamic mapping throwing the error is the entire problem. If I make a static mapping that has a template that disables indexing on affected fields, documents actually index fine. The only time this is a problem is when dynamic mapping attempts to create a Lucene index for a field with a dot in it and then throws an error instead of giving up its attempt to map that particular field. So I disagree with your reason for closing this issue. Please explain how an unmapped field with dots in it causes ambiguity issues in any way. |
I never said this. But again, this does not have to do with dynamic mappings. It has to do with mapping a field name with dots, regardless of whether it is dynamic or not (ie doing a PUT mapping). You are free to do exactly what you said to avoid the issue (don't index the field names that contain dots). |
This issue had two parts:
So, I can live with doing just part 1 of this issue. For me, this is more important than actually getting the fields mapped. It would be extremely helpful for people like me who are planning a 1.x to 2.x migration. I don't get why part 1 is hard/impossible/controversial; all the arguments you provide apply to part 2 of the issue, not part 1. So, should I file a new ticket for this, can we just reopen this ticket, or am I really missing something here (this discussion seems very circular so far)? |
While I was focusing on what you described as part 2, I still think my original argument applies equally to part 1. It is easy enough for a user to remove fields themselves when serializing, rather than complicating the ES api with additional settings (which may go away in the future when we can support dots). |
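As a rough illustration of removing such fields on the client before serializing, the following Python sketch drops every key that contains a dot. The helper name `drop_dotted` is hypothetical, and the sketch assumes the document is already a plain structure of dicts, lists, and scalars.

```python
import json


def drop_dotted(value):
    """Return a copy of a JSON-like value without any dict keys containing '.'.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    if isinstance(value, dict):
        # Skip dotted keys entirely; recurse into the values that remain.
        return {
            key: drop_dotted(item)
            for key, item in value.items()
            if "." not in str(key)
        }
    if isinstance(value, list):
        return [drop_dotted(item) for item in value]
    return value


event = {"a.b": 1, "c": {"d.e": 2, "f": 3}, "g": [{"h.i": 1, "j": 2}]}
body = json.dumps(drop_dotted(event), sort_keys=True)
```

Unlike renaming, this discards the data under the dotted keys, so it only suits cases where those fields are not needed in the stored document.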
Sorry, but none of what you say applies to part 1. The only argument I'm hearing now is "we don't want to add complexity". To be clear, we are talking about one extra parameter to enable/disable dot tolerance. And I'd argue that parameter could default to true for the vast majority of users, meaning that it might not actually be needed at all. Why would you actually want Elasticsearch to throw bad requests here? Just document the behavior that dotted fields won't be dynamically mapped and can't be statically mapped by default until whenever it is that Elasticsearch implements escaping. That's actually less complexity because it simplifies using Elasticsearch and removes the elaborate hacks you need left, right, and center to de-dot JSON content. We've argued this at length and I can see we are not going to agree. So, I am not going to press this issue any further. But for the sake of other users experiencing the same pain, please escalate this internally inside Elasticsearch to get some broader discussion and consensus on whether it is really that necessary to inconvenience your users this much. API purity is a nice goal, but you also have a rather large installed user base that you are leaving out in the cold with this one (again, this is a major PITA). IMHO using Elasticsearch got more complex, not less complex, because of this new behavior. I think a rather large portion of the unmigrated 1.x user base (i.e. at this point most of your user base) will run into this as well, and fundamentally I don't see a good technical reason for this problem to exist. I probably won't be the last to bring this up. |
Just want to add a "me too". I have a very large user population and over 1000 physical nodes spread around the globe that are running 1.x. No one wants to upgrade because of these issues. It's very painful. I've been using tribe nodes as a way to at least federate across the clusters. Unfortunately for me, that is no longer possible with ES 2.x because the protocol is not backward compatible. I've already tried it. That means I cannot upgrade incrementally, cluster by cluster, but have to upgrade all of my downstream clusters at once (most of which I do not control :-( ). The alternative is to set up a separate ELK 2.x infrastructure and migrate clusters over to it. That costs a lot of time, money, and people. |
Please see the discussion in #15951. As I stated before, we want to solve this problem, and we are trying to come up with a solution that works for at least most cases. |
Good to see #15951 is moving forward. I think the third option I suggest above, simply not indexing field names with dots in them, could be added there as well. But I'd be happy with either of the two approaches discussed there. |
As of ES 2.0, field names with dots are no longer allowed. This is breaking things for a lot of people. I've seen plugins, scripts, etc. mentioned on Twitter and GitHub that are all intended to hack around this and make it less painful. The sheer amount of such material floating around indicates how much of an issue this is. It's a showstopper for us and is the main reason we are still on 1.7.x. We are looking at breaking logging infrastructure, major code changes, and complex data migrations, all of which need lots of testing.
Disabling dynamic mapping for dotted fields is probably what most people would prefer rather than the entire document being rejected with some error about dots. In general, I don't think Elasticsearch should reject any valid json for storing. Whether it adds information from those documents to an index is a different matter. This requires a mapping either explicit or dynamic and I think people can live with adapting their mapping if it has options for dealing with dots. For dynamic mapping you could make the case that it should fail more gracefully than to outright reject the document with a bad request.
The options I had in mind are the following:
Whether this should be on or off is up for debate, though I'd probably blindly turn it on just about anywhere in my own setups. Unless I'm missing something, none of this should be particularly hard to support. It merely internalizes the hacks people are currently forced to implement outside of elasticsearch.
PS: I left a comment on #15404 with a similar suggestion on how to fix things. I realize that ticket is closed, but the workaround I suggest seems like it might at least be a viable improvement. So I am filing it as a separate issue in the hope that we might make some progress.