Speed up terms agg when alone (backport #69377) #69632
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This speeds up the
terms
agg in a very specific case:That is a lot of restirctions! But the speed up pretty substantial because
in those cases we can serve the entire aggregation using metadata that
lucene precomputes while it builds the index. In a real rally track we
have we get a 92% speed improvement, but the index isn't that big:
In a rally track with a larger index I ran some tests by hand and the
aggregation went from 2200ms to 8ms.
Even though there are 7 restrictions on this, I expect it to come into
play enough to matter. Restriction 6 just means you are aggregating on
a
keyword
field. Or anip
. And its fairly common forkeyword
s tohave less than a thousand distinct values. Certainly not everywhere, but
some places.
I expect "cold tier" indices are very very likely not to have deleted
documents at all. And the optimization works segment by segment - so
it'll save some time on each segment without deleted documents. But more
time if the entire index doesn't have any.
The optimization builds on #68871 which translates
terms
aggregationsagainst low cardinality fields with global ordinals into a
filters
aggregation. This teaches the
filters
aggregation to recognize whenit can get its results from the index metadata. Rather, it creates the
infrastructure to make that fairly simple and applies it in the case of
the queries generated by the terms aggregation.