Ideas for aggregation performance improvements #65019

polyfractal · 2020-11-12T20:44:07Z

This is a general meta issue to capture the intent to leverage index structures more often in aggregations. Today, we have some simple optimizations that "short circuit" agg execution by consulting the BKD tree (min/max aggs, for example), and there has recently been substantial work to rewrite date_histograms into ranges/filters.

In both cases, these optimizations can greatly accelerate the "hot path" by looking up data in the index, rather than iterating over each document and polling its doc values (DV). We think there are probably a number of such cases where we can accelerate certain scenarios or arrangements of aggs by reusing data in the index.
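To make the min/max short circuit concrete, here is a minimal sketch of the idea rather than Elasticsearch code. `Segment`, `build_segment`, `min_by_scanning`, and `min_from_index` are hypothetical names, and the sketch assumes every document matches the query and none are deleted, since otherwise the stored metadata would not describe the matching set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Toy stand-in for an index segment (hypothetical, not an Elasticsearch type)."""
    doc_values: List[int]  # per-document values, in doc-id order
    tree_min: int          # minimum recorded in the points-tree metadata
    tree_max: int          # maximum recorded in the points-tree metadata


def build_segment(values: List[int]) -> Segment:
    # The min and max are computed once, when the segment is written.
    return Segment(doc_values=list(values), tree_min=min(values), tree_max=max(values))


def min_by_scanning(segments: List[Segment]) -> Optional[int]:
    """Slow path: visit every document and read its doc value."""
    best = None
    for seg in segments:
        for value in seg.doc_values:
            if best is None or value < best:
                best = value
    return best


def min_from_index(segments: List[Segment]) -> Optional[int]:
    """Fast path: one metadata lookup per segment, no per-document work."""
    if not segments:
        return None
    return min(seg.tree_min for seg in segments)
```

Both functions return the same answer under the stated assumptions, but the second does work proportional to the number of segments rather than the number of documents.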

Index date field data with lower precision #64662
Speed up date_histogram without children #63643
Merge the implementation of filter into filters so it can share in all the performance improvements on filters.
Merge "filter-by-filter" execution with parent aggregations if possible. This would give a huge speedup if filters is under an agg that can run in filter-by-filter mode. It would be fairly helpful when two "filter-by-filter" compatible aggs are nested in one another.
filters aggregations on range queries without children could use the BKD index to count matches instead of enumerating all matches.
cardinality aggregations on match_all queries could build the HLL++ object from the terms dictionary instead of collecting all matches.
percentiles aggregations on match_all queries could build the HDR histogram from the BKD tree: for leaf nodes where the min and max values fall in the same bucket, we wouldn't need to collect all individual values one by one.
Can date_histograms better take advantage of data locality? #90261
Support dynamic pruning in the composite aggregation #88185
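The idea of counting range matches from the tree instead of enumerating them can be sketched with a toy one-dimensional tree. This is an illustration under stated assumptions, not the Lucene BKD implementation: `Node`, `build`, and `count_in_range` are hypothetical names, the input must be sorted and non-empty, and deleted documents are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: int                              # smallest value under this node
    hi: int                              # largest value under this node
    count: int                           # number of values under this node
    values: Optional[List[int]] = None   # set on leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(sorted_values: List[int], leaf_size: int = 4) -> Node:
    """Build a balanced tree over sorted, non-empty values (hypothetical helper)."""
    if len(sorted_values) <= leaf_size:
        return Node(sorted_values[0], sorted_values[-1], len(sorted_values),
                    values=sorted_values)
    mid = len(sorted_values) // 2
    left = build(sorted_values[:mid], leaf_size)
    right = build(sorted_values[mid:], leaf_size)
    return Node(left.lo, right.hi, left.count + right.count, left=left, right=right)


def count_in_range(node: Node, lo: int, hi: int, stats: dict) -> int:
    """Count values v with lo <= v <= hi.

    Individual values are read only on leaves that partially overlap the
    range; stats["values_read"] records how many were read.
    """
    if node.hi < lo or node.lo > hi:
        return 0                  # disjoint: skip the whole subtree
    if lo <= node.lo and node.hi <= hi:
        return node.count         # fully inside: use the stored count
    if node.values is not None:
        stats["values_read"] += len(node.values)
        return sum(1 for v in node.values if lo <= v <= hi)
    return (count_in_range(node.left, lo, hi, stats)
            + count_in_range(node.right, lo, hi, stats))


tree = build(list(range(1000)))
stats = {"values_read": 0}
matches = count_in_range(tree, 100, 899, stats)
```

Only the leaves that straddle a range boundary are read value by value. The percentiles idea in the list has the same shape: a leaf whose `lo` and `hi` map to the same histogram bucket could add its `count` to that bucket in one step.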


elasticmachine · 2020-11-12T20:44:08Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

elasticsearchmachine · 2023-07-28T14:20:01Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

elasticsearchmachine · 2025-02-13T14:45:25Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

polyfractal added the >enhancement, :Analytics/Aggregations, and Meta labels Nov 12, 2020

elasticmachine added the Team:Analytics label Nov 12, 2020

wchaparro added the >tech debt label Dec 21, 2021

wchaparro changed the title ~~Leverage index structures to accelerate aggregations~~ Ideas for aggregation performance improvements Jun 16, 2022

hendrikmuhs mentioned this issue Jun 28, 2022

InternalAggregations lack memory accounting #88128

Open

This was referenced Jul 28, 2023

Can date_histograms better take advantage of data locality? #90261

Open

Support dynamic pruning in the composite aggregation #88185

Open

Remove usage of SortedSetDocValues#NO_MORE_ORDS #88004

Closed

polyfractal commented Nov 12, 2020 • edited by javanna
