Properly size empty filters #71864

nik9000 · 2021-04-19T16:59:01Z

This saves a few ArrayList realocations by fixing an order of operations
bug issue in the "empty" handling arm for filters. I've also added a
test so I can debug that branch.

elasticmachine · 2021-04-19T16:59:05Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

This saves a few ArrayList realocations by fixing an order of operations bug issue in the "empty" handling arm for `filters`. I've also added a test so I can debug that branch.

This speeds up the `terms` agg in a very specific case: 1. It has no child aggregations 2. It has no parent aggregations 3. There are no deleted documents 4. You are not using document level security 5. There is no top level query 6. The field has global ordinals 7. There are less than one thousand distinct terms That is a lot of restirctions! But the speed up pretty substantial because in those cases we can serve the entire aggregation using metadata that lucene precomputes while it builds the index. In a real rally track we have we get a 92% speed improvement, but the index isn't *that* big: ``` | 90th percentile service time | keyword-terms-low-cardinality | 446.031 | 36.7677 | -409.263 | ms | ``` In a rally track with a larger index I ran some tests by hand and the aggregation went from 2200ms to 8ms. Even though there are 7 restrictions on this, I expect it to come into play enough to matter. Restriction 6 just means you are aggregating on a `keyword` field. Or an `ip`. And its fairly common for `keyword`s to have less than a thousand distinct values. Certainly not everywhere, but some places. I expect "cold tier" indices are very very likely not to have deleted documents at all. And the optimization works segment by segment - so it'll save some time on each segment without deleted documents. But more time if the entire index doesn't have any. The optimization builds on #68871 which translates `terms` aggregations against low cardinality fields with global ordinals into a `filters` aggregation. This teaches the `filters` aggregation to recognize when it can get its results from the index metadata. Rather, it creates the infrastructure to make that fairly simple and applies it in the case of the queries generated by the terms aggregation.

not-napoleon

LGTM

not-napoleon · 2021-05-05T13:51:44Z

server/src/main/java/org/elasticsearch/search/aggregations/bucket/filter/FiltersAggregator.java

@@ -251,7 +251,7 @@ private FiltersAggregator(String name, AggregatorFactories factories, List<Query
    @Override
    public InternalAggregation buildEmptyAggregation() {
        InternalAggregations subAggs = buildEmptySubAggregations();
-        List<InternalFilters.InternalBucket> buckets = new ArrayList<>(filters.size() + otherBucketKey == null ? 0 : 1);
+        List<InternalFilters.InternalBucket> buckets = new ArrayList<>(filters.size() + (otherBucketKey == null ? 0 : 1));


Nice catch. I always say, if you have to think about operator precedence, you should add more parenthesis.

This saves a few ArrayList realocations by fixing an order of operations bug issue in the "empty" handling arm for `filters`. I've also added a test so I can debug that branch.

This saves a few ArrayList realocations by fixing an order of operations bug issue in the "empty" handling arm for `filters`. I've also added a test so I can debug that branch. Co-authored-by: Elastic Machine <[email protected]>

nik9000 added >bug :Analytics/Aggregations Aggregations v8.0.0 v7.14.0 labels Apr 19, 2021

nik9000 requested a review from not-napoleon April 19, 2021 16:59

elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 19, 2021

Properly size empty filters

9ee26dd

This saves a few ArrayList realocations by fixing an order of operations bug issue in the "empty" handling arm for `filters`. I've also added a test so I can debug that branch.

not-napoleon approved these changes May 5, 2021

View reviewed changes

nik9000 merged commit 47719f8 into elastic:master May 5, 2021

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Properly size empty filters #71864

Properly size empty filters #71864

Uh oh!

nik9000 commented Apr 19, 2021

Uh oh!

elasticmachine commented Apr 19, 2021

Uh oh!

not-napoleon left a comment

Uh oh!

not-napoleon May 5, 2021

Uh oh!

nik9000 May 5, 2021

Uh oh!

Uh oh!

Properly size empty filters #71864

Properly size empty filters #71864

Uh oh!

Conversation

nik9000 commented Apr 19, 2021

Uh oh!

elasticmachine commented Apr 19, 2021

Uh oh!

not-napoleon left a comment

Choose a reason for hiding this comment

Uh oh!

not-napoleon May 5, 2021

Choose a reason for hiding this comment

Uh oh!

nik9000 May 5, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!