Speed up date_histogram without children (backport of #63643) #64823
This speeds up `date_histogram` aggregations without a parent or children. This is quite common: it's the aggregation that Kibana's Discover uses all over the place. Also, we hope to be able to use the same mechanism to speed up aggs with children one day, but that day isn't today.
The kind of speedup we're seeing is fairly substantial in many cases:
This uses the work we did in #61467 to precompute the rounding points for a `date_histogram`. Now, when we know the rounding points, we execute the `date_histogram` as a `range` aggregation. This is nice for two reasons:

[...] `range` aggregation (see below)
[...] to ordinals.
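
As a rough illustration of the idea, here is a minimal Python sketch of precomputing rounding points and treating them as ranges. It assumes a fixed-width interval over integer timestamps; the function names are hypothetical, and the real `date_histogram` also has to deal with calendar intervals and time zones, which this ignores.

```python
from bisect import bisect_right


def rounding_points(min_ts, max_ts, interval):
    """Return every bucket start (a multiple of `interval`) needed to cover [min_ts, max_ts]."""
    first = (min_ts // interval) * interval
    return list(range(first, max_ts + 1, interval))


def as_ranges(points):
    """Turn sorted bucket starts into half-open (from, to) pairs; the last one is open-ended."""
    return list(zip(points, points[1:] + [None]))


def bucket_ordinal(points, ts):
    """Find the bucket index for `ts` by binary search over the precomputed points."""
    return bisect_right(points, ts) - 1


points = rounding_points(min_ts=130, max_ts=410, interval=100)
ranges = as_ranges(points)
```

Because the points are known up front and sorted, the position of a point in the list can serve directly as the bucket ordinal in this sketch.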
Points 2 and 3 above are nice, but most of the speed difference comes from point 1. Specifically, we now look into executing `range` aggregations as a `filters` aggregation. Normally the `filters` aggregation is quite slow, but when it doesn't have a parent or any children we can execute it "filter by filter", which is significantly faster. So fast, in fact, that it is faster than the original `date_histogram`.

The `range` aggregation is fairly careful in how it rewrites, giving up on the `filters` aggregation if it won't collect "filter by filter" and falling back to its original execution mechanism.
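
One way to picture why "filter by filter" collection can win is the toy comparison below. It is only a sketch under a loud assumption: that the index can count the documents matching a range directly from a sorted structure, modeled here as a sorted Python list. It is not a description of how Elasticsearch or Lucene actually implements this, and both function names are hypothetical.

```python
from bisect import bisect_left


def count_doc_by_doc(timestamps, ranges):
    """Visit every document and test it against every range."""
    counts = [0] * len(ranges)
    for ts in timestamps:
        for i, (lo, hi) in enumerate(ranges):
            if lo <= ts < hi:
                counts[i] += 1
    return counts


def count_filter_by_filter(sorted_timestamps, ranges):
    """Count each half-open range on its own with two binary searches (assumes sorted input)."""
    return [
        bisect_left(sorted_timestamps, hi) - bisect_left(sorted_timestamps, lo)
        for lo, hi in ranges
    ]


timestamps = [105, 130, 250, 260, 270, 399]
ranges = [(100, 200), (200, 300), (300, 400)]
slow = count_doc_by_doc(timestamps, ranges)
fast = count_filter_by_filter(sorted(timestamps), ranges)
```

This only works when nothing but the document count per bucket is needed, which matches the "no parent, no children" condition described above.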
So an aggregation like this:
is executed like:
Which in turn is executed like this:
And that is faster because we can execute it "filter by filter".
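
To make the chain of rewrites concrete, here is a hedged sketch that builds the three request shapes as Python dictionaries. The field name `timestamp`, the interval, the bucket boundaries, and the helper names are all hypothetical. The actual rewrite happens inside Elasticsearch at execution time rather than on the request body, so this only shows requests that are roughly equivalent.

```python
def date_histogram_request(field, interval):
    """The aggregation as a user would write it."""
    return {"date_histogram": {"field": field, "fixed_interval": interval}}


def as_range_request(field, points):
    """One half-open range per pair of consecutive rounding points."""
    ranges = [{"from": lo, "to": hi} for lo, hi in zip(points, points[1:])]
    return {"range": {"field": field, "ranges": ranges}}


def as_filters_request(range_request):
    """One named range query per range: `from` is inclusive, `to` is exclusive."""
    body = range_request["range"]
    filters = {
        f'{r["from"]}-{r["to"]}': {
            "range": {body["field"]: {"gte": r["from"], "lt": r["to"]}}
        }
        for r in body["ranges"]
    }
    return {"filters": {"filters": filters}}


original = date_histogram_request("timestamp", "100ms")
as_range = as_range_request("timestamp", [100, 200, 300])
as_filters = as_filters_request(as_range)
```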
Finally, notice the `range` query filtering the data. That is required for the data set that I'm using for testing. The "filter by filter" collection mechanism for the `filters` agg needs special case handling when the query is a `range` query and the filter is a `range` query and they are both on the same field. That special case handling "merges" the range query. Without it, "filter by filter" collection is substantially slower. It's still quite a bit quicker than the standard `filter` collection, but not nearly as fast as it could be.
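
A minimal sketch of what such a merge could look like, assuming that "merging" means intersecting the top-level query's range with each filter's range on the same field. The description above does not spell out the exact mechanism, so treat this as one plausible reading; `merge_ranges` is a hypothetical helper working on half-open `(from, to)` pairs.

```python
def merge_ranges(query_range, filter_range):
    """Intersect two half-open (from, to) ranges; return None when they do not overlap."""
    lo = max(query_range[0], filter_range[0])
    hi = min(query_range[1], filter_range[1])
    return (lo, hi) if lo < hi else None


query = (150, 350)
buckets = [(100, 200), (200, 300), (300, 400), (400, 500)]
merged = [merge_ranges(query, bucket) for bucket in buckets]
```

With the two ranges folded into one, each bucket needs only a single range lookup instead of checking the query and the filter separately.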