Skip to content

Feature gap between Java and HTTP APIs for Percentiles aggregation #23610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mrec opened this issue Mar 16, 2017 · 4 comments
Closed

Feature gap between Java and HTTP APIs for Percentiles aggregation #23610

mrec opened this issue Mar 16, 2017 · 4 comments
Labels

Comments

@mrec
Copy link

mrec commented Mar 16, 2017

This is a pretty minor edge case, but the blog post on retiring the old Java API asks for feedback and this is something we ran into recently. (In 2.3.2, though I don't think anything's changed in later versions.)

In the HTTP API, when you request e.g. [25, 50, 75] percentiles, all you get back is exactly that. In the Java API, on the other hand, you get back a serialized InternalTDigestPercentiles which can be queried for the specific percentiles you requested, but also for any other percentile you might be interested in.

We use Percentiles in a calibration request to find bucket boundaries for a subsequent Range aggregation, and this becomes relevant when you have a very skewed distribution swamped by a single value, so that the [25, 50, 75] percentiles might all be the same value. Since the Range agg is driving faceted drilldown UI and a single-bucket navigator is pretty pointless, in this case we use the InternalTDigestPercentiles to binary-chop around until we find some useful bucket boundaries. AIUI this would no longer be possible with the HTTP API.

It's not the end of the world; one alternative considered was requesting a lot more percentiles up front to improve our chances of getting 3 different ones, and that wouldn't depend on the API. In a perfect world there'd be a Range variant for "N optimally-balanced buckets" instead of for explicit bucket boundaries, which would do away with the need for the extra calibration round-trip, but that's probably wishful thinking.

@cbuescher
Copy link
Member

@mrec thanks for your feedback. You are right that this might be one of the cases where it will be hard to provide the same functionality that we currently have in the Java API when going through the REST layer. @javanna i will label this accordingly so we can track this when we get to parsing the aggregations on the client side.

@javanna
Copy link
Member

javanna commented Apr 18, 2017

Thanks for opening this @mrec . Good find, the transport client ends up exposing internal objects and state that is not available otherwise through REST.

Once you move to REST, we will only be able to return what we get back via the REST layer, hence the java REST client will only return what all the other REST language clients do.

We are making the effort to reuse the same request and response objects as the transport client to ease migration, but we will not be able to return part of objects that are not returned through REST.

In particular for aggregations, we are reusing the interfaces and adding client classes that implement the same interfaces. I understand that you are casting to the internal object at the moment in your application; once you'll move to the high level REST client that cast will throw ClassCastException.

I don't think I can do more than acknowledging this issue and closing it. Those internal objects are exposed as a side-effect of using the same code for internal stuff and the Java api, something that we have been addressing by developing a Java REST client.

@javanna javanna closed this as completed Apr 18, 2017
@mrec
Copy link
Author

mrec commented Apr 18, 2017

In particular for aggregations, we are reusing the interfaces and adding client classes that implement the same interfaces. I understand that you are casting to the internal object at the moment in your application; once you'll move to the high level REST client that cast will throw ClassCastException.

I'm not sure what interface you'll be reusing in this case. SearchResponse.getAggregations() doesn't type things beyond the uselessly vague Aggregation, so you need to cast it to something. Currently I'm casting it to Percentiles which is already an interface and exhibits the behaviour described. I'm not casting to or otherwise relying on InternalTDigestPercentiles outside of unit test setup.

I suspect the root problem here is that Percentiles.percentile(double percent) was only ever intended to support querying for one of the exact values you'd requested, but that this has never been documented or enforced.

@javanna
Copy link
Member

javanna commented Apr 18, 2017

Of course some casting will be needed, Percentiles is the interface that we are implementing, so casting to it will be ok. But we will only be able to return what gets returned via REST through its Percentiles.percentile(double percent) as you figured, rather than anything that is exposed through the internal state that is only serialized back via transport layer and not available when using REST. SO yes, in this specific case the issue is more subtle than a class cast exception, the behaviour of the percentile method will not be exactly the same as before once switched to REST.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants