Skip to content

Ordering term aggregation based on scripted metric. #15718

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

lquerel
Copy link

@lquerel lquerel commented Dec 30, 2015

Hi,

Scripted metrics have been introduced to compute complex computations based on a map/reduce approach. Unfortunately the usage of scripted metrics is limited because they can't be used in the 'order' clause of an aggregation. See this discussion for a full description of the issue.

https://discuss.elastic.co/t/ordering-terms-aggregation-based-on-pipeline-metric/31839

This pull-request is an attempt to fix this issue for scripted metrics returning a number as the result of their reduce method (probably more than 90% of use cases).

That's my first contribution to ES, so there is probably room for improvements and even may be mistakes here and there. Don't hesitate to guide me in the right direction in order to make this pull request a success.

Thanks

Laurent

@lquerel
Copy link
Author

lquerel commented Dec 30, 2015

The CLA is now signed.

@lquerel
Copy link
Author

lquerel commented Jan 6, 2016

For info, this pull request fixes this issue.

#8486

@dakrone
Copy link
Member

dakrone commented Apr 6, 2016

@colings86 I think this is something you might be able to take a look at?

@lquerel
Copy link
Author

lquerel commented Aug 19, 2016

@dakrone, @colings86 is possible to move forward here? So many people complain that ElasticSearch is not able to order term aggregation based on scripted metric?

It's like a major missing feature IMO and for sure it's inconsistent with the rest of the aggregation framework.

@colings86
Copy link
Contributor

@lquerel sorry it's taken so long to get back to you on this PR.

This is a tricky feature as there are problems with sorting the terms aggregation by any aggregation type which is the focus of #17588. Given this I am personally hesistant to add more cases where your can sort by a sub-aggregation but have no idea whether you actually have the top N or even what the error in the result could be.

If we do end up implementing this feature I would also be keen for it to not change the aggregation framework in a way that is specific only to this aggregation since this IMO will make the aggregation framework tricky to understand and maintain. We actually want to move the scripted_metric aggregation to a plugin since it is an expert feature (#19821) adding to the case for not having special cases in the core aggregation code to support it.

@colings86
Copy link
Contributor

I have also added a comment to the issue asking for use cases where users need to order by a scripted_metric aggregation to better understand where users think the current set of aggregations are lacking and what we can do to fill these gaps #8486 (comment)

@lquerel
Copy link
Author

lquerel commented Aug 22, 2016

@colings86 Thanks for your response.

I understand the need to remove all special cases from the core aggregation framework. The fact that scripted metrics don't behave as any other metrics is disturbing for the user.

If scripted metrics are implemented as a plugin, then I hope the "order by" clause will be supported because otherwise ElasticSearch doesn't compete with the aggregation framework of a solution like crate.io (bases on ElasticSearch core) or even with the basic MongoDB aggregation framework. They support group by, order by on any computation.

IMO, with a better support of different variants of average functions we could avoid a lot scripted metrics use cases (weighted average + other forms explained in this thread #8486 (comment)).

Finally compute an accurate Top X on a distributed system is not easy and sometimes not really essential. A fast approximation is usually good enough in most case.

@rjernst
Copy link
Member

rjernst commented Jun 9, 2017

@colings86 Where do we stand on this PR? It has been open for 1.5 years, we should make a decision on it.

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@colings86
Copy link
Contributor

I am going to close this PR as any solution for this would need to work in a way that does not special case the scripted_metric aggregation. Also the issue for this PR (#8486) is closed as we want to solve the specific use-cases raised for sorting terms aggregations by the scripted_metric aggregation in other ways

@colings86 colings86 closed this Jun 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants