Speed up date_histogram by precomputing ranges (#61467) #62881
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A few of us were talking about ways to speed up the
date_histogram
using the index for the timestamp rather than the doc values. To do that
we'd have to pre-compute all of the "round down" points in the index. It
turns out that just precomputing those values speeds up rounding
fairly significantly:
That's a 46% speed up when there isn't a time zone and a 58% speed up
when there is.
This doesn't work for every time zone, specifically those that have two
midnights in a single day due to daylight savings time will produce wonky
results. So they don't get the optimization.
Second, this requires a few expensive computation up front to make the
transition array. And if the transition array is too large then we give
up and use the original mechanism, throwing away all of the work we did
to build the array. This seems appropriate for most usages of
round
,but this change uses it for all usages of
round
. That seems ok fornow, but it might be worth investigating in a follow up.
I ran a macrobenchmark as well which showed an 11% preformance
improvement. BUT the benchmark wasn't tuned for my desktop so it
overwhelmed it and might have produced "funny" results. I think it is
pretty clear that this is an improvement, but know the measurement is
weird: