Skip to content

Added Filters aggregation #6974

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 1, 2014
Merged

Added Filters aggregation #6974

merged 1 commit into from
Aug 1, 2014

Conversation

colings86
Copy link
Contributor

A multi-bucket aggregation where multiple filters can be defined (each filter defines a bucket). The buckets will collect all the documents that match their associated filter.

This aggregation can be very useful when one wants to compare analytics between different criterias. It can also be accomplished using multiple definitions of the single filter aggregation, but here, the user will only need to define the sub-aggregations only once.

(this is a continuation of Pull Request #6119)

Closes #6118,#6119

@kimchy
Copy link
Member

kimchy commented Jul 23, 2014

LGTM, @rashidkpc mind looking at it for the proposed API format?

@rashidkpc
Copy link

API looks good so far. I'm going to build this, try to integrate it into Kibana 4 and see how it goes.

@rashidkpc rashidkpc self-assigned this Jul 23, 2014
@rashidkpc
Copy link

Ok, I prototyped this out and got it working in Kibana 4. The only input I have into the API format is that it is, slightly, inconsistent with how we do other aggregations that require the buckets to be explicitly defined. The currently proposed filters agg format looks like this:

{
    "aggs" : {
        "messages" : {
            "filters" : {
                "errors" : { "query" : { "match" : { "body" : "error" } } },
                "warnings" : { "query" : { "match" : { "body" : "warning" } } }
            },
            "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }
        }
    }
}

Its a bit awkward, since the keys of the of filters agg are not configuration parameters, but rather names. We don't do that anywhere else. The only other place were we have explicit definitions of buckets like this in in the range agg, where we use an array to define the ranges:

{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 50 },
                    { "from" : 50, "to" : 100 },
                    { "from" : 100 }
                ]
            }
        }
    }
}

I wonder if we should do the same thing, and return the buckets as an array, like the range agg does. Granted if we did this with the filters agg we'd end up with filters.filters, which is also weird:

{
    "aggs" : {
        "messages" : {
            "filters" : {
                "filters" : [
                    { "query" : { "match" : { "body" : "error" } } },
                    { "query" : { "match" : { "body" : "warning" } } }
                ]
            }
            "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }
        }
    }
}

Of course, you bring up a good point, people might want the response to be keyed. In which case, we could again use the convention defined by the range agg, using the keyed property (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-range-aggregation.html#_keyed_response)

If we wanted to do the same thing in the filters agg we could use the _name convention from named filters/queries (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html). Eg:

{
    "aggs" : {
        "messages" : {
            "filters" : {
                "filters" : [
                    { "query" : { "match" : { "body" : "error", "_name": "errors" } } },
                    { "query" : { "match" : { "body" : "warning", "_name": "warnings" } } }
                ]
            }
            "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }
        }
    }
}

I don't love the filters.filters thing, but it is consistent.

@clintongormley
Copy link
Contributor

@rashidkpc I'm not sure we need the filters.filters. filters can either have an array as value, in which case it is ordered, or an object, in which case it is keyed. See #6119 (comment)

This assumes that Jackson and client languages can differentiate. @polyfractal - I know that PHP does some funky things with converting hashes to arrays, but I think that's only for empty hashes?

@clintongormley
Copy link
Contributor

Hmm, I may have to eat my words about not requiring the filters.filters setting. See #7020

@colings86
Copy link
Contributor Author

Updated so that you request is filters.filters again and can support providing an array of anonymous filters in the request which are returned in the same order in the response. Not sure if I like it however, since unlike the histogram aggregator the 'keyed: false' option does not have a key field which links back to the request so the only link between the bucket in the response and the filter in the request is its position in the array

@colings86
Copy link
Contributor Author

cc @rashidkpc @clintongormley

@clintongormley
Copy link
Contributor

@colings86 (Ignore my comments about #7020 )

Not sure we need the double level filters.filters? Surely just filters: {....} or filters: [...] would be sufficient?

@clintongormley
Copy link
Contributor

@rashidkpc can you confirm that you'd be happy with support for these two syntax options?

Results returned with keys:

"aggs" : {
    "messages" : {
        "filters" : {
            "errors" : { "query" : { "match" : { "body" : "error" } } },
            "warnings" : { "query" : { "match" : { "body" : "warning" } } }
        }
    },
    "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }

}

Results returned as an array:

"aggs" : {
    "messages" : {
        "filters" : [
            { "query" : { "match" : { "body" : "error" } }, "name": "errors"},
            { "query" : { "match" : { "body" : "warning" } }, "name" : "warnings" }
        ]
    },
    "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }        
}

@rashidkpc
Copy link

After reviewing the range aggregation, I understand why the syntax is how it is. I think the syntax as proposed by @clintongormley will be fine for the filters aggregation. filters.filters is probably not required here unless we plan to add any other parameters to the aggregation.

@clintongormley
Copy link
Contributor

w00t

@colings86 colings86 removed the review label Jul 30, 2014
@colings86 colings86 assigned colings86 and unassigned rashidkpc Jul 30, 2014
@colings86
Copy link
Contributor Author

Tried to implement this but the aggregation framework throws an error if the aggregation is not an object so does not work for the Array case. At least for the array case we may need to go back to having filters.filters as it doesn't seem correct to change the agg framework for what seems like a special case here.

thoughts?

@colings86
Copy link
Contributor Author

because of the above issue the syntax is going to stay as the following:

Results returned with keys:

"aggs" : {
    "messages" : {
        "filters" : {
            "filters" : {
                "errors" : { "query" : { "match" : { "body" : "error" } } },
                "warnings" : { "query" : { "match" : { "body" : "warning" } } }
            }
        }
    },
    "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }

}

Results returned as an array:

"aggs" : {
    "messages" : {
        "filters" : {
            "filters" : [
                { "query" : { "match" : { "body" : "error" } }, "name": "errors"},
                { "query" : { "match" : { "body" : "warning" } }, "name" : "warnings" }
            ]
        }
    },
    "aggs" : { "monthly" : { "histogram" : { "field" : "timestamp", "interval" : "1M" } } }        
}

*/
public class FiltersAggregationBuilder extends AggregationBuilder<FiltersAggregationBuilder> {

private Map<String, FilterBuilder> filters = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be a linked hash map so that the order would be the same in the response?

@jpountz jpountz removed the review label Jul 30, 2014
@jpountz
Copy link
Contributor

jpountz commented Jul 30, 2014

@colings86 Left some comments.

@colings86
Copy link
Contributor Author

@jpountz I implemented thhe changes you suggested and also I added tests and documentation for array type request as well as updating the Builder to be able to construct array requests since I realised these were missing

@jpountz
Copy link
Contributor

jpountz commented Aug 1, 2014

LGTM

@jpountz jpountz removed the review label Aug 1, 2014
A multi-bucket aggregation where multiple filters can be defined (each filter defines a bucket). The buckets will collect all the documents that match their associated filter.

This aggregation can be very useful when one wants to compare analytics between different criterias. It can also be accomplished using multiple definitions of the single filter aggregation, but here, the user will only need to define the sub-aggregations only once.

Closes #6118
@colings86 colings86 merged commit 3c9c9f3 into elastic:master Aug 1, 2014
@JasonPunyon
Copy link

@clintongormley
Copy link
Contributor

@JasonPunyon fixed, thanks

@colings86 colings86 deleted the pr/6119 branch August 13, 2014 08:11
@colings86 colings86 assigned colings86 and unassigned colings86 Aug 21, 2014
@clintongormley clintongormley changed the title Added Filters aggregation Aggregations: Added Filters aggregation Sep 10, 2014
@clintongormley clintongormley changed the title Aggregations: Added Filters aggregation Added Filters aggregation Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aggregations: Add filters aggregation
7 participants