exclude aggregator #7020
@miguelnmiranda I don't think you need to have the query in each aggregator. Users usually solve this problem by using:
{
"query": {
"filtered": {
"query": "your query goes here",
"filter": "filters to take into account for top-hits and aggs"
}
},
"post_filter" : "filters to take into account for top-hits only",
"aggs": {
"my_filter": {
"filter": "filter to take into account for aggs only"
}
}
}
Would it work for you? |
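A minimal Python sketch of which documents each tier of the request above affects, modelling documents as dicts and filters as predicates. `split_results` and the sample data are hypothetical; this only models the documented semantics and is not how Elasticsearch executes a request.

```python
# Hypothetical in-memory model of the three filter tiers in the request above.
from typing import Callable, Dict, List

Doc = Dict[str, str]
Pred = Callable[[Doc], bool]


def split_results(docs: List[Doc], query: Pred, query_filter: Pred,
                  post_filter: Pred, agg_filter: Pred):
    # The query and its filter restrict both hits and aggregations.
    matched = [d for d in docs if query(d) and query_filter(d)]
    # post_filter only narrows the hits; aggregations never see it.
    hits = [d for d in matched if post_filter(d)]
    # A filter aggregation only narrows the documents inside its own bucket.
    agg_docs = [d for d in matched if agg_filter(d)]
    return hits, agg_docs


docs = [
    {"type": "shirt", "colour": "red"},
    {"type": "shirt", "colour": "blue"},
    {"type": "pants", "colour": "red"},
]
hits, agg_docs = split_results(
    docs,
    query=lambda d: True,
    query_filter=lambda d: d["type"] == "shirt",
    post_filter=lambda d: d["colour"] == "red",
    agg_filter=lambda d: True,
)
print(len(hits), len(agg_docs))  # 1 2
```

With the colour selection placed in `post_filter`, the hits show only red shirts while the aggregation still sees both shirt colours. That is why a selection in a single facet can live in `post_filter`.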
@jpountz I believe that with post_filter you can only add more filters, not remove some of the query filters. In the example I wrote for the documentation, you have the case where you have a facet, let's say a colour facet, |
Wouldn't it work if the colour filter was a post_filter (but neither a query filter nor an aggregation filter)? |
Yes, it would, but if you have two or more facets, let's say colour and gender, you cannot follow that approach. |
I think I understand the issue now. Using |
Yes. Sorry for not putting it in those terms to start with. |
No worries, I was a bit slow to understand on my end as well. :-) |
I think my example for the documentation is incorrect. I used the cardinality aggregator when I just wanted the number of docs. I will fix that later if this goes forward. There is also a live example, the website of a brand, but it seems to be down for maintenance at the moment. I will post it as an example here when it goes back up. |
I don't much like having the ability to exclude filters from the query, since it breaks the expectation that aggregations apply to the documents that match the query (the only exception being the However, I think we could have an aggregation that would accept a list of filters and would build a bucket for every combination of (N-1) filters. Maybe this functionality could even be folded into the |
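One way to picture the "bucket for every combination of (N-1) filters" idea with existing building blocks is a `filters` aggregation whose named buckets each leave out one selected filter. The Python sketch below only builds the request body; `leave_one_out_filters`, the `all_but_*` bucket names and the `facets` aggregation name are hypothetical, and it assumes a version that has the `filters` aggregation and the `bool` query. Sub-aggregations (for example a terms agg on `colour` under `all_but_colour`) would still have to be attached per bucket.

```python
# Hypothetical helper: one `filters`-aggregation bucket per selected facet,
# where each bucket applies every selected filter except that facet's own.
import json
from typing import Dict, List


def leave_one_out_filters(selected: Dict[str, List[str]]) -> dict:
    buckets = {}
    for facet in selected:
        others = [
            {"terms": {field: values}}
            for field, values in selected.items()
            if field != facet
        ]
        # With only one selected facet there is nothing left to apply.
        buckets["all_but_" + facet] = (
            {"bool": {"must": others}} if others else {"match_all": {}}
        )
    return {"aggs": {"facets": {"filters": {"filters": buckets}}}}


body = leave_one_out_filters(
    {"colour": ["red"], "size": ["large"], "type": ["shirt"]}
)
print(json.dumps(body, indent=2))
```

A facet with no selected value gets no bucket here, which is the limitation raised in the reply that follows.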
Is it because of the semantics of
This would work, and is ideal for the case where you have selected at least one value for each available facet. But if you only filter on M (<N) of the available facets, meaning you don't have a filter for every facet yet, it won't generate buckets for the remaining N-M facets. |
If I may add, I would love a feature to control sub-aggs at the bucket level. One of my cases is to be able to produce output for a multilevel agg where the lower-level aggs are slightly different from each other, not in terms of agg type but in the following:
Imagine a drill-down tree UI where the user can start from the top-level agg, drill down into individual buckets, then change the query and see the changes to the expanded tree in one call to Elasticsearch. This is achievable now, but at the cost of multiple aggs. Say we have aggs on country, state and city, and I drilled down into United States/Montana and Alaska. In order to reload the data in one call, I would have to run aggs on the same level (countries, plus countries/states with a filter including Alaska and Montana) and then post-process the data to merge the second agg's results into the first. It is doable, but once you add the need to handle missing and "other" buckets, which ES does not support in a similar way, it becomes rather messy and hard to implement in a general way. What I would like to see is the ability to specify extra options inside the states agg per possible bucket. For example Bucket-config: { In short, I would like to be able to exercise some control over sub-aggs at the parent agg's bucket level, rather than all of them being exactly the same, such as drilling down into individual branches instead of all of them |
As I understand it, the current solution is to include Something like what is described here: http://stackoverflow.com/questions/8908325/elasticsearch-excluding-filters-while-faceting-possible-like-in-solr (it's for facets, but follows the same idea) |
Correct. @clintongormley explained it much better than me! |
@dmitry so in the countries/states agg definition I want to specify instructions for each country bucket, such as whether to calculate the matching country's sub-aggs, the max size of the sub-agg, etc. |
@miguelnmiranda and currently it's not possible to have the same behavior without including all the filters in the
In my case, I have something like this:
{
"body": {
"post_filter": {
"and": [{
"terms": {
"type": ["apartment"]
}
}, {
"terms": {
"location_ids": [386]
}
}]
},
"aggregations": {
"types": {
"filter": {
"and": [{
"terms": {
"location_ids": [386]
}
}]
},
"aggs": {
"types": {
"terms": {
"field": "type",
"size": 0
}
}
}
},
"locations": {
"filter": {
"and": [{
"terms": {
"type": ["apartment"]
}
}]
},
"aggs": {
"locations": {
"terms": {
"field": "location_ids",
"size": 0
}
}
}
}
}
},
"index": "properties",
"type": ["property"]
}
I thought there should be a better solution for this most common use case of Elasticsearch, or am I wrong? |
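The verbose body above follows a mechanical pattern: every selection goes into `post_filter`, and each facet's terms agg is wrapped in a filter agg that applies all the *other* selections. A hypothetical Python generator for that pattern is sketched below. `faceted_body` is an illustrative name, the aggregation names reuse the field names instead of `types`/`locations`, and it emits `bool`/`must` clauses instead of the `and` filter from the example (the `and` filter is not available in later Elasticsearch versions). `"size": 0` and the `index`/`type` wrapper are omitted.

```python
# Hypothetical generator for the post_filter + per-facet filter-agg pattern.
from typing import Dict, List


def terms_filter(field: str, values: List) -> dict:
    return {"terms": {field: values}}


def faceted_body(selected: Dict[str, List], facets: List[str]) -> dict:
    body: dict = {"aggregations": {}}
    if selected:
        # Hits must satisfy every selection.
        body["post_filter"] = {
            "bool": {"must": [terms_filter(f, v) for f, v in selected.items()]}
        }
    for facet in facets:
        # Each facet's agg applies every selection except its own.
        others = [terms_filter(f, v) for f, v in selected.items() if f != facet]
        body["aggregations"][facet] = {
            "filter": {"bool": {"must": others}} if others else {"match_all": {}},
            "aggs": {facet: {"terms": {"field": facet}}},
        }
    return body


body = faceted_body({"type": ["apartment"], "location_ids": [386]},
                    ["type", "location_ids"])
partial = faceted_body({"type": ["apartment"]}, ["type", "location_ids"])
```

In the second call, `location_ids` has no selection yet but still gets an aggregation filtered by the other selections, which covers facets that have not been filtered on.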
Thanks for the PR, but I agree wholeheartedly with @jpountz. We used to have named "scopes" in facets, back in the day, but they were removed. This PR suffers from the same problem that they did. The DSL allows for complicated nesting of clauses, while scopes refer to individual filters, regardless of their position in the query. You could apply a name to a filter which is a sub-clause of another filter, but in the aggregation, you'd get documents that you are not expecting because the filter is treated as though it were at the top. You include an option to exclude the query - same problem: which query are you referring to? there could be several. I like @jpountz 's idea of extending the
But you'd know up front which clauses don't have values, so you'd just specify these as normal aggs. (@roytmana you're talking about something completely separate, please don't hijack this issue) |
@miguelnmiranda yes, you have a point... The only thing I could come up with looks like this:
In this example, execution would iterate through the entries and apply all filters except the current one, for which it would calculate the aggs instead. This is completely different from any other aggs as they are today, so I'm not sure how well this API would fit. |
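The execution described above can be modelled in memory to check the intended counts: for each facet, apply every other facet's selection and count that facet's values. This is a hypothetical Python sketch over made-up documents, not a proposal for the actual aggregation API; `facet_counts` is an illustrative name.

```python
# Hypothetical in-memory model: for each facet, apply every other facet's
# selection and count the values of that facet.
from collections import Counter
from typing import Dict, List, Set


def facet_counts(docs: List[Dict[str, str]],
                 selected: Dict[str, Set[str]],
                 facets: List[str]) -> Dict[str, Counter]:
    result = {}
    for facet in facets:
        counts = Counter()
        for doc in docs:
            # A doc passes if it satisfies every selection except this facet's.
            if all(doc[f] in vals for f, vals in selected.items() if f != facet):
                counts[doc[facet]] += 1
        result[facet] = counts
    return result


docs = [
    {"colour": "red", "size": "large"},
    {"colour": "blue", "size": "large"},
    {"colour": "green", "size": "small"},
]
counts = facet_counts(docs, {"colour": {"red"}, "size": {"large"}}, ["colour", "size"])
print(counts["colour"])  # Counter({'red': 1, 'blue': 1})
print(counts["size"])    # Counter({'large': 1})
```

Selecting red does not hide blue in the colour facet, but green disappears because it is not available in large. A facet with no selection simply gets all the other selections applied.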
I deleted my previous comment by mistake! @clintongormley the behaviour seems odd and, as you say, does not fit well with the API.
The "current" exclude filter only looks at the names inside the top level I implemented a different approach where you could "cut" the filter tree at any point:
"filter": {
  "and": {
    "_name": "root",
    "filters": [
      {
        "or": {
          "_name": "orBranch",
          "filters": [
            { "filter": { "_name": "f2" } },
            { "filter": { "_name": "f3" } }
          ]
        }
      },
      { "filter": { "_name": "f1" } }
    ]
  }
}
But while writing it, I found that the behaviour when a sub-filter is null is not consistent across filters. |
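A hypothetical Python model of cutting a named node out of a tree shaped like the one above, with simple `term` leaves standing in for the placeholder filters `f1`..`f3`. It compares two ways of treating the removed node: as match-all, or as dropped from its parent. Under `and` the two policies agree. Under `or` they diverge, which is one way such an inconsistency can arise. Which policy the PR's filters actually used is not stated; `evaluate` and both policy names are illustrative.

```python
# Hypothetical model of "cutting" a named node out of an and/or filter tree.
from typing import Dict, Optional


def evaluate(node: dict, doc: Dict[str, str], cut: str, policy: str) -> Optional[bool]:
    """Evaluate a filter tree against one doc with the node named `cut` removed.

    policy="match_all": the cut node is treated as matching everything.
    policy="drop":      the cut node is removed from its parent's child list.
    Returns None when nothing is left to evaluate.
    """
    if node.get("_name") == cut:
        return True if policy == "match_all" else None
    if "term" in node:
        field, value = next(iter(node["term"].items()))
        return doc.get(field) == value
    op = "and" if "and" in node else "or"
    kept = [r for r in (evaluate(c, doc, cut, policy) for c in node[op]) if r is not None]
    if not kept:
        return None
    return all(kept) if op == "and" else any(kept)


# Same shape as the tree above, with simple term leaves standing in for f1..f3.
tree = {"_name": "root", "and": [
    {"_name": "orBranch", "or": [
        {"_name": "f2", "term": {"colour": "red"}},
        {"_name": "f3", "term": {"colour": "blue"}},
    ]},
    {"_name": "f1", "term": {"size": "large"}},
]}
doc = {"colour": "green", "size": "large"}

for cut in ("f1", "f2"):
    print(cut, evaluate(tree, doc, cut, "match_all"), evaluate(tree, doc, cut, "drop"))
# f1 False False  (inside "and", both policies agree)
# f2 True False   (inside "or", match-all swallows the whole branch)
```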
One way of getting counts for each dimension independently of that dimension's clauses would be to use a minimum_number_should_match value of 1 less than the number of clauses, e.g.
Each terms agg in the above would then collect all terms where only 2 of the 3 clauses were present. |
@markharwood I tried out your solution and it doesn't do quite what we're after. For instance, given the following query:
... we want to know what count we would get for:
While your approach actually gives us counts
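To make the difference concrete, here is a hypothetical in-memory comparison between counts from a single query whose clauses need only N-1 matches and exact per-facet counts that drop only that facet's own clause. The documents and selections are made up and Elasticsearch is not involved; the function names are illustrative.

```python
# Hypothetical comparison of the n-1 approach against exact per-facet counts.
from collections import Counter
from typing import Dict, List


def n_minus_one_counts(docs: List[Dict[str, str]], selected: Dict[str, str],
                       facet: str) -> Counter:
    # A doc is counted if it matches at least N-1 of the N clauses.
    n = len(selected)
    counts = Counter()
    for doc in docs:
        matched = sum(doc[f] == v for f, v in selected.items())
        if matched >= n - 1:
            counts[doc[facet]] += 1
    return counts


def exact_counts(docs: List[Dict[str, str]], selected: Dict[str, str],
                 facet: str) -> Counter:
    # A doc is counted only if it matches every clause except this facet's own.
    counts = Counter()
    for doc in docs:
        if all(doc[f] == v for f, v in selected.items() if f != facet):
            counts[doc[facet]] += 1
    return counts


docs = [
    {"type": "shirt", "size": "large", "colour": "blue"},
    {"type": "pants", "size": "large", "colour": "red"},
]
selected = {"type": "shirt", "size": "large", "colour": "red"}
print(exact_counts(docs, selected, "colour"))        # Counter({'blue': 1})
print(n_minus_one_counts(docs, selected, "colour"))  # Counter({'blue': 1, 'red': 1})
```

The red pants match two of the three clauses (size and colour), so the n-1 approach counts them in the colour facet even though they fail `type:shirt`.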
I don't see any concise way of doing this out of the box. That said, this is a common and very specific use case. We could possibly provide a simple (but inflexible) aggregation that does exactly what is needed here. I say inflexible because we want to keep it simple - if you want flexibility you can go the verbose route instead. What about something like this:
Of course, this syntax doesn't reduce the amount of work that has to be performed. 3 terms means 6 filters, 4 terms means 12 filters, 5 terms means 20 filters... |
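The arithmetic behind these numbers, assuming each of the N per-term aggregations has to apply the other N-1 filters:

$$
F(N) = N(N-1), \qquad F(3) = 3 \cdot 2 = 6, \quad F(4) = 4 \cdot 3 = 12, \quad F(5) = 5 \cdot 4 = 20.
$$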
Not sure I follow.
The OP primarily asked for a list, not a count: "I want to show all colours that I would display if no colour was selected". Each of these lists includes counts, e.g. how many large shirts are available in blue, but the count is perhaps not the primary concern: the typical shopper just wants to know the large shirt is also available in blue. |
Actually, in a later comment the OP says "...I just wanted the number of docs". And this is the typical use case: how many red, green, blue products do I have which are type:shirt and size:large, i.e. what will I see if I remove this particular filter. |
Still confused then. The full quote from the OP re numbers is
The cardinality agg is about counting distinct values, and I don't know what example he refers to. I think the typical requirement is simple: if I have a dimension that supports multiple selections (red OR blue OR green checkboxes), then I don't want the options for blue or green to immediately disappear when I select red. However, if I make a selection in a different dimension that says I'm only interested in Large shirts, then I don't want to see colours that are not available in large. That's the behaviour I assumed the user was after and which I think my example provides (along with the related counts). |
What is the status of this pull request? Is it possible to have this feature in the near future? |
It is clear that this PR isn't going to be merged as is. I haven't seen a good suggestion yet for how to implement this with term counts (although @markharwood's suggestion works without term counts). I'd welcome more suggestions in a new issue. |
We have this issue in https://github.com/searchkit/searchkit, where we need to put the filters applied by aggregators in post_filter. Solr solves this with tags, which is a bit like this PR's solution: http://yonik.com/multi-select-faceting/ |
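For comparison, Solr's multi-select faceting tags each filter query with `{!tag=...}` and excludes it from the matching facet with `{!ex=...}`. The hypothetical Python helper below only assembles those request parameters. Tagging each filter with its own field name, and the sample query and fields, are illustrative choices, and values containing spaces or special characters would need quoting.

```python
# Hypothetical builder for Solr tag/exclude faceting parameters: each filter
# query is tagged with its field name, and that field's facet excludes it.
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def solr_multiselect_params(q: str, selected: Dict[str, str],
                            facets: List[str]) -> List[Tuple[str, str]]:
    params = [("q", q), ("facet", "true")]
    for field, value in selected.items():
        params.append(("fq", "{!tag=%s}%s:%s" % (field, field, value)))
    for field in facets:
        if field in selected:
            # Ignore this field's own filter when computing its facet counts.
            params.append(("facet.field", "{!ex=%s}%s" % (field, field)))
        else:
            params.append(("facet.field", field))
    return params


params = solr_multiselect_params("shirt", {"colour": "red"}, ["colour", "size"])
print(urlencode(params))
```

The tag lives on the filter and the exclusion lives on the facet, so each facet declares which filters it ignores without repeating the query.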
I think my suggestion should give you the terms and counts you need in aggregation results but the end tail of docs in the hits it produces may have false positives (docs that match n-1 dimensions when you want them to match n dimensions). |
Thanks @markharwood, I will test out the n-1 approach on the root bool query |
This feature allows excluding parts of a query when defining the result set used by an aggregator.
Based on: https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
I work at a company that develops e-commerce solutions, and we use the Solr feature a lot to provide a better search experience while keeping the query size manageable. The one excuse people gave me for not using Elasticsearch was that this behaviour would be too verbose, having to write the whole query in each aggregator. I love Elasticsearch and want to use it in future projects.
I experimented with a more generic approach which allowed removing or cutting a branch of the filter tree at any point, but there were some inconsistencies in behaviour when excluding in an 'or' filter versus an 'and' filter, because a null filter is not treated the same way everywhere.