Can date_histograms better take advantage of data locality? #90261

Open
jpountz opened this issue Sep 22, 2022 · 2 comments
Labels
:Analytics/Aggregations (Aggregations), >enhancement, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))

Comments

@jpountz
Contributor

jpountz commented Sep 22, 2022

Description

Date histograms are one of Elasticsearch's most-used aggregations. Given the way we index into data streams, documents with similar @timestamps are quite likely to be clustered together.

Could we take advantage of this to speed up date histograms (or, effectively, ranges, since date histograms often rewrite to ranges)? The idea would be to first check whether the current doc falls within the same bucket as the previous doc before doing more expensive operations. For instance, when a date histogram rewrites to a range, this could save the binary search on bucket boundaries.
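
To make the idea concrete, here is a minimal sketch of the "check the previous bucket first" fast path. It is not Elasticsearch code: the class `CachedBucketLocator`, its `bucketOf` method, and the `binarySearches` counter are all hypothetical names made up for illustration, and it assumes bucket boundaries are available as a strictly increasing array of lower bounds, with bucket `i` covering `[bounds[i], bounds[i + 1])`.

```java
import java.util.Arrays;

/** Hypothetical helper (not an Elasticsearch class): maps a timestamp to a bucket index. */
final class CachedBucketLocator {
    // Strictly increasing boundaries; bucket i covers [bounds[i], bounds[i + 1]).
    private final long[] bounds;
    // Index of the bucket that matched the previous in-range timestamp, or -1 if none yet.
    private int lastBucket = -1;
    // Counts how often the slow path ran, only so the effect can be observed.
    long binarySearches = 0;

    CachedBucketLocator(long[] sortedBounds) {
        this.bounds = sortedBounds.clone();
    }

    /** Returns the bucket index for the timestamp, or -1 if it falls outside every bucket. */
    int bucketOf(long timestamp) {
        // Fast path: two comparisons against the previous doc's bucket.
        if (lastBucket >= 0
                && timestamp >= bounds[lastBucket]
                && timestamp < bounds[lastBucket + 1]) {
            return lastBucket;
        }
        // Slow path: binary search over all boundaries.
        binarySearches++;
        int pos = Arrays.binarySearch(bounds, timestamp);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1,
        // so the bucket whose lower bound precedes the key is insertionPoint - 1.
        int bucket = pos >= 0 ? pos : -pos - 2;
        if (bucket < 0 || bucket >= bounds.length - 1) {
            return -1; // out of range; leave the cached bucket untouched
        }
        lastBucket = bucket;
        return bucket;
    }
}
```

When timestamps arrive roughly clustered, most calls should return from the fast path and skip the binary search. When they arrive in random order, the fast path mostly misses and adds a couple of comparisons per doc, so whether this is a net win would need benchmarking.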

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) on Sep 22, 2022
@wchaparro
Member

Added to: #65019


3 participants