Ideas for aggregation performance improvements #65019
Labels: :Analytics/Aggregations, >enhancement, Meta, Team:Analytics, >tech debt
This is a general meta issue to track efforts to leverage index structures more often in aggregations. Today, we have some simple optimizations that "short circuit" agg execution by consulting the BKD tree (min/max aggs, for example), and there has recently been substantial work to rewrite date_histograms into ranges/filters.
In both cases, these optimizations can greatly accelerate the "hot path" by looking up data in the index rather than iterating over each document and reading its doc values (DV). We think there are probably a number of similar cases where we could accelerate certain aggs, or arrangements of aggs, by reusing data already in the index.
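To make the contrast concrete, here is a minimal sketch that assumes one segment's values for a numeric field are available as a sorted Python list, standing in for what the points (BKD) index can answer. The function names are hypothetical and are not Elasticsearch or Lucene APIs. The per-document version touches every value, while the range rewrite counts each bucket with two binary searches; the min and max that bound the buckets come straight from the ends of the sorted data, which is the same kind of shortcut the min/max aggs take.

```python
from bisect import bisect_left
from collections import Counter


def histogram_per_doc(doc_values, interval):
    """Baseline: visit every document's value and bump its bucket."""
    counts = Counter()
    for v in doc_values:
        counts[(v // interval) * interval] += 1
    return dict(counts)


def histogram_as_ranges(sorted_values, interval):
    """Rewrite: treat each bucket as a [lo, lo + interval) range and count it
    with two binary searches instead of visiting every value."""
    if not sorted_values:
        return {}
    lo = (sorted_values[0] // interval) * interval  # min is the first element
    last = sorted_values[-1]                         # max is the last element
    counts = {}
    while lo <= last:
        hi = lo + interval
        n = bisect_left(sorted_values, hi) - bisect_left(sorted_values, lo)
        if n:
            counts[lo] = n
        lo = hi
    return counts


values = [3, 7, 12, 15, 15, 28, 41]
print(histogram_as_ranges(sorted(values), 10))  # {0: 2, 10: 3, 20: 1, 40: 1}
```

The rewrite only pays off when there are few buckets relative to documents; with many sparse buckets, the per-bucket searches can cost more than a single pass over the values.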
Related:
- Rewrite `filter` into `filters` so it can share in all the performance improvements on `filters`.
- Handle the case where `filters` is under an agg that can run in filter-by-filter mode. It'd be fairly helpful when two "filter-by-filter" compatible aggs are nested in one another.
- `filters` aggregations on `range` queries without children could use the BKD index to count matches instead of enumerating all matches.
- `cardinality` aggregations on `match_all` queries could build the HLL++ object from the terms dictionary instead of collecting all matches.
- `percentiles` aggregations on `match_all` queries could build the HDR histogram from the BKD tree: for leaf nodes where the min and max values would fall in the same bucket, we wouldn't need to collect all individual values one by one.
- `composite` aggregation #88185
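The filter-by-filter idea behind the first two items can be sketched as follows, assuming each filter query can be resolved to a sorted list of matching doc ids, as a postings list or points query would yield. The helper names are hypothetical; a real implementation would more likely stream doc-id iterators than materialize lists. Nesting one filter-by-filter compatible agg under another then becomes one more intersection per bucket pair, with no per-document loop over every filter.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two sorted doc-id lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def filters_by_filter(query_docs, filters):
    """Count each named filter by intersecting its doc ids with the docs
    matched by the enclosing query, one whole filter at a time."""
    return {name: len(intersect_sorted(query_docs, docs))
            for name, docs in filters.items()}


def nested_filters_by_filter(query_docs, outer, inner):
    """Two filter-by-filter compatible aggs nested in one another: each
    (outer, inner) bucket pair is one more intersection."""
    result = {}
    for outer_name, outer_docs in outer.items():
        outer_hits = intersect_sorted(query_docs, outer_docs)
        result[outer_name] = filters_by_filter(outer_hits, inner)
    return result


all_docs = list(range(10))  # match_all over a 10-document segment
outer = {"errors": [1, 3, 5, 7], "warnings": [0, 2, 4]}
inner = {"eu": [1, 2, 3, 8], "us": [4, 5, 6, 7]}
print(nested_filters_by_filter(all_docs, outer, inner))
# {'errors': {'eu': 2, 'us': 2}, 'warnings': {'eu': 1, 'us': 1}}
```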
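For `filters` on `range` queries without children, only the count is needed. A minimal sketch, assuming a simplified one-dimensional tree in which every node records the min, max and count of the values below it (real BKD trees are multi-dimensional and lay this out differently; `Node`, `build` and `count_range` are hypothetical names). Cells fully inside the query contribute their stored count, cells outside are skipped, and only leaves that cross a query boundary are scanned value by value. The inside/outside/crossing split mirrors the cases of Lucene's `PointValues.Relation`, but the code is not Lucene's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LEAF_SIZE = 4  # hypothetical maximum number of values per leaf


@dataclass
class Node:
    lo: int      # smallest value below this node
    hi: int      # largest value below this node
    count: int   # number of values below this node
    values: Optional[List[int]] = None             # set on leaves only
    children: List["Node"] = field(default_factory=list)


def build(sorted_vals):
    """Build a 1-D tree over non-empty sorted values."""
    if len(sorted_vals) <= LEAF_SIZE:
        return Node(sorted_vals[0], sorted_vals[-1], len(sorted_vals),
                    values=sorted_vals)
    mid = len(sorted_vals) // 2
    left, right = build(sorted_vals[:mid]), build(sorted_vals[mid:])
    return Node(left.lo, right.hi, left.count + right.count,
                children=[left, right])


def count_range(node, q_lo, q_hi):
    """Count values in [q_lo, q_hi] without enumerating every match."""
    if node.hi < q_lo or node.lo > q_hi:
        return 0                    # cell outside the query: skip it
    if q_lo <= node.lo and node.hi <= q_hi:
        return node.count           # cell inside the query: use stored count
    if node.values is not None:     # leaf crossing a boundary: scan its values
        return sum(q_lo <= v <= q_hi for v in node.values)
    return sum(count_range(child, q_lo, q_hi) for child in node.children)


vals = sorted([5, 1, 9, 3, 7, 2, 8, 6, 4, 10])
tree = build(vals)
print(count_range(tree, 3, 8))  # 6
```

In one dimension, only the leaves at the two ends of the range can cross a boundary, so the work grows with the tree height rather than with the number of matches.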
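For `cardinality` under a `match_all` query on a segment without deleted documents, every term in the terms dictionary belongs to at least one matching document, and HyperLogLog insertion is idempotent (each register only keeps a max), so adding each distinct term once yields exactly the same sketch as adding every document's value. The sketch below uses a plain HyperLogLog with BLAKE2b hashing as a stand-in; it is not Elasticsearch's HLL++ implementation, and the precision `P` is an arbitrary assumption. Since the terms dictionary can still list terms whose only documents were deleted, the shortcut is presumably only exact for segments without deletions.

```python
import hashlib
import math

P = 10        # hypothetical precision: 2**P registers
M = 1 << P


def hash64(term):
    """64-bit hash of a term (BLAKE2b here, as a stand-in)."""
    digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    """Plain HyperLogLog, not Elasticsearch's HLL++."""

    def __init__(self):
        self.registers = [0] * M

    def add(self, term):
        h = hash64(term)
        idx = h >> (64 - P)                        # top P bits choose a register
        rest = h & ((1 << (64 - P)) - 1)           # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1    # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / M)
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * M and zeros:
            return M * math.log(M / zeros)         # small-range correction
        return raw


# Per-document collection: one add() per matching document, duplicates included.
doc_values = ["a", "b", "a", "c", "b", "a"] * 1000
per_doc = HyperLogLog()
for value in doc_values:
    per_doc.add(value)

# match_all shortcut: walk the terms dictionary once (simulated here by the
# sorted distinct values) and add each term a single time.
terms_dictionary = sorted(set(doc_values))
from_terms = HyperLogLog()
for term in terms_dictionary:
    from_terms.add(term)

print(per_doc.registers == from_terms.registers)  # True
```

The per-document path performs 6,000 insertions; the terms-dictionary path performs 3 and produces identical registers.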
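For `percentiles`, the saving comes from leaves whose whole value range maps to a single histogram bucket: such a leaf can be added with one increment of its count. The sketch below assumes sorted values grouped into fixed-size leaves whose first and last elements stand in for the min/max bounds a BKD leaf cell already knows, and it uses a fixed-width histogram (`BUCKET_WIDTH`, hypothetical) as a stand-in for the HDR histogram's value-to-bucket mapping. It also reports how many individual values had to be visited.

```python
import math
from collections import Counter

BUCKET_WIDTH = 10  # hypothetical stand-in for HDR's value-to-bucket mapping


def bucket_of(value):
    return value // BUCKET_WIDTH


def leaves_of(sorted_vals, leaf_size=4):
    """Group sorted values into leaves; leaf[0] and leaf[-1] play the role of
    the min/max bounds a BKD leaf cell already knows."""
    return [sorted_vals[i:i + leaf_size]
            for i in range(0, len(sorted_vals), leaf_size)]


def histogram_from_leaves(leaves):
    """Build bucket counts, bulk-adding leaves that fit inside one bucket."""
    counts = Counter()
    visited = 0  # individual values we actually had to look at
    for leaf in leaves:
        lo, hi = leaf[0], leaf[-1]
        if bucket_of(lo) == bucket_of(hi):
            counts[bucket_of(lo)] += len(leaf)   # whole leaf, one increment
        else:
            for value in leaf:                   # leaf straddles buckets
                counts[bucket_of(value)] += 1
            visited += len(leaf)
    return counts, visited


def percentile_bucket(counts, q):
    """Bucket containing the q-th percentile, for 0 < q <= 100."""
    target = math.ceil(q / 100 * sum(counts.values()))
    seen = 0
    for bucket in sorted(counts):
        seen += counts[bucket]
        if seen >= target:
            return bucket
    return None


vals = [1, 2, 3, 4, 11, 12, 13, 14, 18, 21, 22, 23, 31, 32, 33, 34]
counts, visited = histogram_from_leaves(leaves_of(vals))
print(dict(counts), visited)  # {0: 4, 1: 5, 2: 3, 3: 4} 4
```

Here only the one leaf that straddles a bucket boundary is scanned; the other three are each added with a single increment.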