-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Properly size empty filters #71864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly size empty filters #71864
Conversation
Pinging @elastic/es-analytics-geo (Team:Analytics) |
This saves a few ArrayList realocations by fixing an order of operations bug issue in the "empty" handling arm for `filters`. I've also added a test so I can debug that branch.
This speeds up the `terms` agg in a very specific case: 1. It has no child aggregations 2. It has no parent aggregations 3. There are no deleted documents 4. You are not using document level security 5. There is no top level query 6. The field has global ordinals 7. There are less than one thousand distinct terms That is a lot of restirctions! But the speed up pretty substantial because in those cases we can serve the entire aggregation using metadata that lucene precomputes while it builds the index. In a real rally track we have we get a 92% speed improvement, but the index isn't *that* big: ``` | 90th percentile service time | keyword-terms-low-cardinality | 446.031 | 36.7677 | -409.263 | ms | ``` In a rally track with a larger index I ran some tests by hand and the aggregation went from 2200ms to 8ms. Even though there are 7 restrictions on this, I expect it to come into play enough to matter. Restriction 6 just means you are aggregating on a `keyword` field. Or an `ip`. And its fairly common for `keyword`s to have less than a thousand distinct values. Certainly not everywhere, but some places. I expect "cold tier" indices are very very likely not to have deleted documents at all. And the optimization works segment by segment - so it'll save some time on each segment without deleted documents. But more time if the entire index doesn't have any. The optimization builds on #68871 which translates `terms` aggregations against low cardinality fields with global ordinals into a `filters` aggregation. This teaches the `filters` aggregation to recognize when it can get its results from the index metadata. Rather, it creates the infrastructure to make that fairly simple and applies it in the case of the queries generated by the terms aggregation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -251,7 +251,7 @@ private FiltersAggregator(String name, AggregatorFactories factories, List<Query | |||
@Override | |||
public InternalAggregation buildEmptyAggregation() { | |||
InternalAggregations subAggs = buildEmptySubAggregations(); | |||
List<InternalFilters.InternalBucket> buckets = new ArrayList<>(filters.size() + otherBucketKey == null ? 0 : 1); | |||
List<InternalFilters.InternalBucket> buckets = new ArrayList<>(filters.size() + (otherBucketKey == null ? 0 : 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. I always say, if you have to think about operator precedence, you should add more parenthesis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Truth.
This saves a few ArrayList realocations by fixing an order of operations bug issue in the "empty" handling arm for `filters`. I've also added a test so I can debug that branch.
This saves a few ArrayList realocations by fixing an order of operations bug issue in the "empty" handling arm for `filters`. I've also added a test so I can debug that branch. Co-authored-by: Elastic Machine <[email protected]>
This saves a few ArrayList realocations by fixing an order of operations
bug issue in the "empty" handling arm for
filters
. I've also added atest so I can debug that branch.