Add an option to create "other" bucket for Terms aggregation #6804

kostiklv · 2014-07-09T18:19:29Z

When using the "terms" aggregation, it is often useful to get the top X terms (via the size parameter) and also a separate bucket that groups all remaining terms together (possibly constrained by a minimum doc count).

The query syntax might be:

{
    "aggs" : {
        "tags" : {
            "terms" : { 
                "field" : "tag",
                "size": 3,
                "min_doc_count": 10,
                "other": "_other_terms"
             }
        }
    }
}

And the response might look like:

{
    ...
    "aggregations" : {
        "tags" : {
            "buckets" : [
                {
                    "key" : "soccer",
                    "doc_count" : 500
                },
                {
                    "key" : "hockey",
                    "doc_count" : 400
                },
                {
                    "key" : "basketball",
                    "doc_count" : 300
                },
                {
                    "key" : "_other_terms",
                    "doc_count" : 150
                }
            ]
        }
    }
}

The _other_terms bucket would cover all tags with a per-tag doc_count of at least 10 (matching min_doc_count), excluding the top 3 already listed.

Related to #5324


jpountz · 2014-07-18T11:15:45Z

One question related to this change is whether the "other" bucket should track only doc counts (cheap, and could be done by default) or also sub-aggregations (potentially costly, so it would require an option).

clintongormley · 2014-07-18T11:28:18Z

I'd say just the doc counts, at least by default.

kostiklv · 2014-07-22T18:26:59Z

The suggested syntax is already opt-in, so a developer who uses it should understand the cost. Based on that, I think the default should include all sub-aggregations.
Consider the following query:

"aggs": {
    "top_selling": {
       "terms": {
          "field": "make",
          "size": 5,
          "other": "_other_terms"
       },
       "aggs": {
          "avg_price": {
             "avg": { "field": "price" }
          }
       }
    }
 }

What's the point of using _other_terms if we don't get the average price for them? I also doubt that an option to disable sub-aggregations is needed at all. What's the use case for wanting sub-aggregations on specific terms but not on the "other" bucket?

Anyway, the syntax can be made future-proof: instead of "other": "_other_terms" it can be:

...
"other": {
   "bucket_key": "_other_terms",
   "some_future_option": "option_value"
}
...

The parser could also check whether the value of the other option is a string and, if so, use it as other.bucket_key, as syntactic sugar.
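The string-or-object sugar described above could be sketched roughly as follows. This is only an illustration of the proposal: the function name normalize_other_option is hypothetical, and the "other" option with its bucket_key field is the syntax suggested in this thread, not an existing Elasticsearch parameter.

```python
def normalize_other_option(value):
    """Return the proposed "other" option in its object form.

    Hypothetical helper: accepts either the short string form or the
    object form suggested in this thread.
    """
    if value is None:
        # Option not given: no "other" bucket requested.
        return None
    if isinstance(value, str):
        # Short form: the string is taken as the bucket key.
        return {"bucket_key": value}
    if isinstance(value, dict):
        if "bucket_key" not in value:
            raise ValueError('"other" object must contain "bucket_key"')
        # Copy so the caller's dict is not shared.
        return dict(value)
    raise TypeError('"other" must be a string or an object')


short_form = normalize_other_option("_other_terms")
long_form = normalize_other_option({"bucket_key": "_other_terms"})
```

Both calls produce the same object form, so later options could be added to the object without breaking the short form.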

jpountz · 2014-07-24T22:00:57Z

I have thought more about this issue, and computing the document count for the other buckets is not possible in the general case without doing another pass over the data. Think about multi-valued fields: a single document can belong to several term buckets, so per-term counts cannot simply be added or subtracted to get a document count.

The only thing it could do is return the number of other values (as opposed to documents). But we already have the value_count aggregation for that.
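To make the multi-valued point concrete, here is a small self-contained sketch in plain Python (no Elasticsearch involved). The toy documents and variable names are made up for illustration. It compares what can be derived from per-term counts with the document counts one might actually want.

```python
from collections import Counter

# Toy corpus: each entry is the multi-valued "tag" field of one document.
DOCS = [
    ["soccer", "hockey"],
    ["soccer", "hockey", "tennis"],
    ["soccer", "golf"],
    ["soccer"],
    ["hockey", "tennis", "golf"],
    ["curling"],
]

# Per-term document counts, as a terms aggregation would report them.
counts = Counter()
for tags in DOCS:
    counts.update(set(tags))

# Top 2 terms (analogous to "size": 2).
top_terms = {term for term, _ in counts.most_common(2)}

# Derivable from the per-term counts alone: number of "other" values.
other_value_count = sum(n for term, n in counts.items() if term not in top_terms)

# Document-level answers need a pass over the documents themselves.
docs_with_some_other_term = sum(1 for tags in DOCS if set(tags) - top_terms)
docs_with_no_top_term = sum(1 for tags in DOCS if not set(tags) & top_terms)

# Subtracting the top counts from the total does not work either.
naive_subtraction = len(DOCS) - sum(counts[t] for t in top_terms)

print(other_value_count)          # 5
print(docs_with_some_other_term)  # 4
print(docs_with_no_top_term)      # 1
print(naive_subtraction)          # -1
```

The four numbers disagree because documents holding several tags are counted once per tag, which is why only the value count comes for free.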

If a bucket or count for other docs is really needed, the right way to build it would be to run a first query with the terms aggregation, and a second query that would have a filter aggregation that would exclude the returned terms.
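The two-query workaround could look roughly like the sketch below, which builds the body of the second request from the first response. Some assumptions: the helper name build_other_request and the aggregation name "other" are made up, the exclusion is written as a bool query with must_not and terms (current query DSL, which may differ from what was available when this issue was filed), and sending the requests is left to whatever client you use.

```python
def build_other_request(first_response, agg_name, field, sub_aggs=None):
    """Build a second request that aggregates over documents containing
    none of the terms returned by the first terms aggregation.

    Hypothetical helper; only constructs the request body as a dict.
    """
    buckets = first_response["aggregations"][agg_name]["buckets"]
    top_keys = [bucket["key"] for bucket in buckets]

    other = {
        # filter aggregation: a single bucket of documents matching the query
        "filter": {
            "bool": {"must_not": [{"terms": {field: top_keys}}]}
        }
    }
    if sub_aggs:
        # Reuse the same sub-aggregations as in the first request.
        other["aggs"] = sub_aggs

    # size 0: only the aggregation result is needed, not the hits.
    return {"size": 0, "aggs": {"other": other}}


# First response, shaped like the example at the top of this issue.
first_response = {
    "aggregations": {
        "tags": {
            "buckets": [
                {"key": "soccer", "doc_count": 500},
                {"key": "hockey", "doc_count": 400},
                {"key": "basketball", "doc_count": 300},
            ]
        }
    }
}

second_request = build_other_request(
    first_response,
    agg_name="tags",
    field="tag",
    sub_aggs={"avg_price": {"avg": {"field": "price"}}},
)
```

Note that this counts documents that contain none of the returned terms. A document that has both a top term and some other term is excluded, and no min_doc_count constraint is applied to the remaining terms.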

ebuildy · 2021-07-09T07:17:02Z

This will be really useful with sub-aggregations:

"aggs" : {
    "country" : {
      "terms" : {
        "field" : "geoip.country_name.keyword"
      },
      "aggs" : {
        "response_time_avg" : {
          "avg" : {
            "field" : "message.upstream.response_time"
          }
        },
        "response_time_p95" : {
          "percentiles" : {
            "field" : "message.upstream.response_time",
            "percents": [ 95 ]
          }
        },
        "http_status" : {
          "terms" : {
            "field" : "message.request.status",
            "size" : 5
          }
        }
      }
    }
  }

martijnvg added the discuss label Jul 10, 2014

martijnvg assigned jpountz Jul 10, 2014

jpountz closed this as completed Jul 24, 2014

Kallin mentioned this issue Aug 18, 2014

Optionally include 'Set Difference' for filter aggregation. #7261

Closed
