
ValuesSource support for reading points #55552


Closed
nik9000 opened this issue Apr 21, 2020 · 5 comments · Fixed by #58769
Labels
:Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@nik9000
Member

nik9000 commented Apr 21, 2020

There are a few aggregations where we could really use some kind of clean way to get access to the "points" because we can iterate those in sorted order for the whole shard. Specifically composite, min, max, date_histogram and auto_date_histogram look at the points. We'd love to have some nice way for them to do it that tells folks when they implement a new type that this is something that they should think about.

min and max look up the minimum or maximum value and cast it to a double. date_histogram and, soon, auto_date_histogram look up the minimum and maximum dates so they can deal with rounding more efficiently. composite iterates values in sorted order.
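As a rough illustration of those three access patterns, here is a minimal sketch that assumes the shard's points can be modelled as a sorted array of longs. `SortedPointsSketch` and its methods are hypothetical names, not Elasticsearch or Lucene APIs, and the fixed-interval bucket computation is only one simple way that knowing the bounds up front could help with rounding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a shard's points: the field's values kept in ascending order. */
final class SortedPointsSketch {
    private final long[] sorted;

    SortedPointsSketch(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no points indexed");
        }
        this.sorted = values.clone();
        Arrays.sort(this.sorted);
    }

    /** min: read the first point and cast it to a double. */
    double min() {
        return (double) sorted[0];
    }

    /** max: read the last point and cast it to a double. */
    double max() {
        return (double) sorted[sorted.length - 1];
    }

    /** date_histogram-style use (assumed): with min and max known, bucket keys can be laid out once. */
    List<Long> bucketKeys(long intervalMillis) {
        long first = Math.floorDiv(sorted[0], intervalMillis) * intervalMillis;
        long last = Math.floorDiv(sorted[sorted.length - 1], intervalMillis) * intervalMillis;
        List<Long> keys = new ArrayList<>();
        for (long key = first; key <= last; key += intervalMillis) {
            keys.add(key);
        }
        return keys;
    }

    /** composite-style use: hand the values back in sorted order. */
    long[] inSortedOrder() {
        return sorted.clone();
    }
}
```

With values 2500, 100 and 1200 and a 1000 ms interval, this yields a min of 100.0, a max of 2500.0 and bucket keys 0, 1000 and 2000.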

@nik9000 nik9000 added the :Analytics/Aggregations Aggregations label Apr 21, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@iverase
Contributor

iverase commented Apr 21, 2020

There is an idea to use points to compute the bounding box aggregation for geo data (points and shapes are just multi-dimensional points). It would be good if the abstraction could hold the concept of dimensionality.

@nik9000
Member Author

nik9000 commented Apr 21, 2020

I think it'd need to be a fairly thin abstraction. One common need is preflight checks to make sure that the points are "valid". Like, "there isn't a script on this field" and "there isn't a missing value" and "we're indexing this field at all".

Part of my problem is I'm not sure what the right shape for the abstraction should be. It certainly should support date's resolution but how to line that up with everything else, I dunno.
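To make the preflight idea concrete, here is a minimal sketch of the three checks listed above. Everything in it is an assumption for illustration: `PointsPreflightSketch`, `FieldConfig` and its flags are hypothetical and do not correspond to real Elasticsearch classes, and it says nothing about how date resolution should fit in.

```java
import java.util.Optional;

/** Hypothetical preflight check deciding whether an aggregation may read a field's points. */
final class PointsPreflightSketch {

    /** Assumed summary of how the aggregation is configured to read the field. */
    static final class FieldConfig {
        final boolean hasScript;
        final boolean hasMissingValue;
        final boolean indexed;

        FieldConfig(boolean hasScript, boolean hasMissingValue, boolean indexed) {
            this.hasScript = hasScript;
            this.hasMissingValue = hasMissingValue;
            this.indexed = indexed;
        }
    }

    /** Returns the reason points cannot be used, or empty if all checks pass. */
    static Optional<String> whyNotUsable(FieldConfig config) {
        if (config.hasScript) {
            return Optional.of("there is a script on this field");
        }
        if (config.hasMissingValue) {
            return Optional.of("a missing value is configured");
        }
        if (!config.indexed) {
            return Optional.of("field is not indexed");
        }
        return Optional.empty();
    }

    static boolean canUsePoints(FieldConfig config) {
        return !whyNotUsable(config).isPresent();
    }
}
```

Returning a reason rather than a bare boolean is a design choice in this sketch only; it would let an aggregation report why it fell back to the slower path.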

@polyfractal
Contributor

polyfractal commented Apr 21, 2020

To add another data point, there's an old (stale) PR to add BKD optimization to range agg: #47712. We put it on hold because we wanted to investigate a similar optimization for date_histo, which would have a far greater impact.

It might not fit with the above abstraction since it would be rather more involved: not just getting a min/max, but building multiple ranges to intersect. That said, it would share a lot of the same upfront initialization stuff (preflight checks, getting the point reader, etc.).
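For a sense of what "building multiple ranges to intersect" could look like, here is a simplified sketch. It swaps in binary search over a sorted array for a real BKD tree intersection, so it shows the shape of the idea rather than the approach in #47712. `RangeCountSketch` and its methods are hypothetical, and ranges are assumed to be half-open, `[from, to)`.

```java
/** Hypothetical range counting over sorted values, standing in for a BKD intersection. */
final class RangeCountSketch {

    /** Index of the first element greater than or equal to key in an ascending array. */
    static int lowerBound(long[] sorted, long key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Counts values falling in each range; ranges[i] is {from, to} with from inclusive, to exclusive. */
    static int[] countPerRange(long[] sorted, long[][] ranges) {
        int[] counts = new int[ranges.length];
        for (int i = 0; i < ranges.length; i++) {
            int start = lowerBound(sorted, ranges[i][0]);
            int end = lowerBound(sorted, ranges[i][1]);
            counts[i] = Math.max(0, end - start);
        }
        return counts;
    }
}
```

Each range costs two binary searches instead of a pass over every document, which is the kind of saving a points-based optimization is after.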

@nik9000
Member Author

nik9000 commented Apr 24, 2020

I'd be happy to add something for this as sort of a follow-up to #55559.

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020

5 participants