null buckets missing from terms aggregation #6273
Comments
You're looking for the Note that there is an open issue (#5324) to add support for the |
I'm not looking for the What you're suggesting (I think) is to do two aggregations in the same search (my fallback solution), one The reason why this is so important is that when a filter is applied to an aggregation, it has to be applied twice, then all the sub-aggregations have to be done twice, and so on, which ends up in a confusing mess. Then the results have to be juggled to get the dataset that contains the complete picture. Elasticsearch is an excellent tool and aggregations are very powerful; however, in this case jumping through hoops shouldn't be necessary. |
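For readers following along, the fallback described above can be sketched as a single search body that runs a terms aggregation and a missing aggregation side by side, plus the client-side "juggling" that stitches the two results together. This is only a sketch: the aggregation names `by_value` and `no_value`, the `_missing` key, and both helper functions are made up for illustration, and it assumes the two aggregations meant here are `terms` and `missing`.

```python
def build_search_body(field, query=None):
    """Build a search body with a terms agg and a missing agg on the same field."""
    body = {
        "size": 0,  # only aggregation results are needed, not hits
        "aggs": {
            # hypothetical aggregation names, chosen for this example
            "by_value": {"terms": {"field": field}},
            "no_value": {"missing": {"field": field}},
        },
    }
    if query is not None:
        body["query"] = query
    return body


def merge_buckets(aggregations, missing_key="_missing"):
    """Combine the terms buckets and the missing count into one list of (key, count)."""
    buckets = [
        (bucket["key"], bucket["doc_count"])
        for bucket in aggregations["by_value"]["buckets"]
    ]
    missing_count = aggregations["no_value"]["doc_count"]
    if missing_count > 0:
        buckets.append((missing_key, missing_count))
    return buckets
```

Any sub-aggregation or aggregation-level filter would have to be repeated under both `by_value` and `no_value`, which is the duplication being complained about.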
And I would love to see not just a missing bucket but also an _other bucket if requested. It is very important for many use cases to retain the entire data set as stats get rolled up. With facets I did it by using a stats facet and subtracting the sum of all buckets, including _missing. It was not bad, since facets do not allow sub-aggs and I had a nice API for it which hid all the complexities, but with sub-aggs it will be very ugly. |
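The subtraction approach described above might look roughly like this on the client side. It is a sketch under assumptions: the function name and the `_missing`/`_other` keys are illustrative, and it assumes a single-valued field, since a document with several values would be counted in several buckets and break the arithmetic.

```python
def remainder_counts(total_docs, bucket_counts, missing_count):
    """Derive the _other count by subtracting the returned buckets and _missing from the total."""
    other = total_docs - sum(bucket_counts) - missing_count
    if other < 0:
        # Can happen with multi-valued fields, where one doc lands in several buckets.
        raise ValueError("bucket counts exceed the total; field may be multi-valued")
    return {"_missing": missing_count, "_other": other}
```

With sub-aggregations the same bookkeeping would have to be repeated at every nesting level, which is where it gets ugly.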
Actually, If you want to be able to distinguish a missing value from an explicit But yes, for convenience's sake and efficiency, the missing count should probably be included in the agg itself, as requested in #5324, so I'll close this issue in favour of that one. |
The null_value mapping won't help much, as null can be introduced at the object level (say you agg on a person's country but his/her entire address is missing). I do not believe ES can handle that. I tried to use null_value, but it only works when the actual scalar value is null, not when one of its owning objects is. Please do not consider this a mere convenience. The lack of _missing and _other buckets is a major limitation when implementing a dynamic system with user-defined ad-hoc analytics, pivot tables, etc. |
@clintongormley "There is no way of storing the term null in Lucene." Aha, now many things make sense to me, thanks for the explanation. Given field values of |
@j0hnsmith The only way to make Then it'll work just like all other terms and have its own bucket, and the |
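As a rough illustration of the null_value mapping discussed in this thread, the sketch below builds a mapping that indexes an explicit null as a placeholder term, so it gets its own bucket in a terms aggregation. The placeholder `"NULL"`, the helper name, and the `keyword` field type are assumptions made for this example (the type assumes a recent Elasticsearch version; older versions used a not-analyzed string field instead).

```python
NULL_TOKEN = "NULL"  # hypothetical placeholder term, not an Elasticsearch default


def mapping_with_null_value(field, null_token=NULL_TOKEN):
    """Mapping fragment that indexes explicit nulls in `field` as `null_token`."""
    return {
        "properties": {
            field: {
                "type": "keyword",  # assumed field type for exact-value terms
                "null_value": null_token,
            }
        }
    }
```

The placeholder should be a value that can never occur as real data, otherwise real and substituted values end up in the same bucket.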
but as I said not if field is missing because its entire containing object is missing! |
@roytmana yes, but then you couldn't aggregate on the top level object anyway, as the "container" doesn't have an associated inverted index. But it's a reasonably easy workaround to have a "null" object be represented by an object with a concrete field which contains a |
@clintongormley for a fairly deep graph (say three levels, contract/party/address), compensating for party being null while still being able to aggregate on any party field, including address fields (say party country), will require producing super ugly JSON for _source: replacing a blank party with a party object with null substitution, and every object type within party with its own null object with its own null values, etc. And then people using _source will have to deal with it all in their application. I do it now in a few cases but I would not want to do it all the time. |
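To make the cost of that workaround concrete, here is a minimal sketch of the pre-indexing step it implies: walking a document and replacing every missing object or scalar with a placeholder. The schema, the field names, and the `"NULL"` token are all hypothetical, chosen only to mirror the contract/party/address example.

```python
NULL_TOKEN = "NULL"  # hypothetical placeholder term

# Hypothetical schema: a nested dict is an object, None marks a scalar leaf.
CONTRACT_SCHEMA = {"party": {"name": None, "address": {"country": None}}}


def fill_missing(doc, schema, null_token=NULL_TOKEN):
    """Return a copy of `doc` where absent objects and scalars get placeholder values."""
    result = dict(doc) if doc else {}
    for key, sub_schema in schema.items():
        value = result.get(key)
        if isinstance(sub_schema, dict):
            # Missing or null object: rebuild it level by level from the schema.
            result[key] = fill_missing(value, sub_schema, null_token)
        elif value is None:
            result[key] = null_token
    return result
```

For example, `fill_missing({"id": 1, "party": None}, CONTRACT_SCHEMA)` yields a document whose `party.address.country` is `"NULL"`, and that padded structure is what every consumer of _source would then have to handle.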
Closing: with #11042 you can now configure arbitrary keys for documents that are missing a value. |
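A minimal sketch of what that looks like in a request, assuming the change referred to is the `missing` parameter of the terms aggregation. The aggregation name `by_value`, the helper function, and the key `"N/A"` are illustrative choices, not part of the original thread.

```python
def terms_with_missing(field, missing_key="N/A"):
    """Search body whose terms agg puts documents without a value under `missing_key`."""
    return {
        "size": 0,  # aggregation results only
        "aggs": {
            "by_value": {  # hypothetical aggregation name
                "terms": {"field": field, "missing": missing_key}
            }
        },
    }
```

Because the documents without a value land in an ordinary bucket, any sub-aggregations apply to them without being declared a second time.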
The terms aggregation creates a bucket for each value but doesn't include a null bucket. When there is a doc with a key set to null the cardinality is currently incorrect.
Users may or may not want to include a null bucket so it should be configurable (default should be false in line with current behaviour), I propose
And the response