Small speed up of date_histogram with children #67012

nik9000 · 2021-01-05T15:35:13Z

This allows us to run the optimization introduced in #63643 when the
date_histogram has children. It isn't a revolutionary performance
improvement, though, because children tend to be a lot heavier than the
date_histogram. It is faster, but only by a couple of percentage
points.

Mostly the idea is that we can build on this to do other optimizations.

elasticmachine · 2021-01-05T15:35:16Z

Pinging @elastic/es-analytics-geo (Team:Analytics)


nik9000 · 2021-01-05T15:39:51Z

server/src/main/java/org/elasticsearch/search/aggregations/bucket/filter/FiltersAggregator.java

+                    SubCollector collector = new SubCollector(filterOrd, filterLeafCollector);
+                    scorer.score(collector, live);
+                    incrementBucketDocCount(filterOrd, collector.total);
+                }


I think it's possible that this could actually be slower than the standard execution mechanism. I wonder if we need an escape hatch so folks can dodge this mechanism if it proves a bad idea.

Also: there is another possible implementation here that involves collecting a block of matches for each filter and then running all of the children in parallel. I'm not sure if it'll be faster or not. It kind of depends on the speed of iterating the doc values. It is a little more complex so I didn't do it.
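For readers following along, the block idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from this PR: `ChildCollector`, `BlockFilterSketch`, and `BLOCK_SIZE` are hypothetical names, filters are modelled as plain `IntPredicate`s instead of Lucene scorers, and the children consume each block one after another.

```java
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical stand-in for a child (sub) aggregation's per-segment collector. */
interface ChildCollector {
    void collect(int doc, int bucketOrd);
}

/** Sketch: buffer each filter's matches in a block, then replay the block through every child. */
final class BlockFilterSketch {
    // Tiny so the example flushes more than once; a real block would be much larger.
    static final int BLOCK_SIZE = 4;

    /** Returns the doc count per filter and feeds every match to every child. */
    static long[] collect(int maxDoc, List<IntPredicate> filters, List<ChildCollector> children) {
        long[] counts = new long[filters.size()];
        int[] block = new int[BLOCK_SIZE];
        for (int filterOrd = 0; filterOrd < filters.size(); filterOrd++) {
            IntPredicate filter = filters.get(filterOrd);
            int size = 0;
            for (int doc = 0; doc < maxDoc; doc++) {
                if (filter.test(doc) == false) {
                    continue;
                }
                block[size++] = doc;
                if (size == BLOCK_SIZE) {
                    flush(block, size, filterOrd, children);
                    counts[filterOrd] += size;
                    size = 0;
                }
            }
            // Flush whatever is left in the final, partially filled block.
            flush(block, size, filterOrd, children);
            counts[filterOrd] += size;
        }
        return counts;
    }

    private static void flush(int[] block, int size, int filterOrd, List<ChildCollector> children) {
        // Each child walks the whole block before the next child starts, so every
        // child sees doc ids in increasing order within the block.
        for (ChildCollector child : children) {
            for (int i = 0; i < size; i++) {
                child.collect(block[i], filterOrd);
            }
        }
    }
}
```

Whether replaying blocks like this beats collecting doc by doc would depend, as noted above, on how fast the children can iterate their doc values.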

After taking a couple of days away I think the block matching mechanism is probably better here. It's much simpler to estimate the cost. Also, I'd love to know why the old way is so slow: the block-based mechanism feels like it'd be fast, and it reads quite similarly to the Compatible mechanism. I think the big difference is that we don't get to join the main query with the filter query, so it can't skip matches effectively. Maybe. I've got to play.
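To illustrate what "joining" two queries buys in terms of skipping, here is a generic leapfrog intersection over two sorted doc id lists. It is a minimal sketch, not Lucene's or this PR's implementation: `LeapfrogSketch` and `SortedDocs` are hypothetical names, and `advance` is a simplified array-backed stand-in for a doc id iterator.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: leapfrog intersection of two sorted doc id lists. */
final class LeapfrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal sorted-array iterator; a hypothetical stand-in for a doc id iterator. */
    static final class SortedDocs {
        private final int[] docs;
        private int pos = -1;

        SortedDocs(int... docs) {
            this.docs = docs;
        }

        /** Moves to the first doc >= target. The target must be greater than the current doc. */
        int advance(int target) {
            int lo = pos + 1;
            int hi = docs.length;
            while (lo < hi) { // binary search jumps over docs without visiting them
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            pos = lo;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Returns the docs present in both iterators, letting each side skip ahead to the other. */
    static List<Integer> intersect(SortedDocs query, SortedDocs filter) {
        List<Integer> matches = new ArrayList<>();
        int q = query.advance(0);
        int f = filter.advance(0);
        while (q != NO_MORE_DOCS && f != NO_MORE_DOCS) {
            if (q == f) {
                matches.add(q);
                q = query.advance(q + 1);
                f = filter.advance(f + 1);
            } else if (q < f) {
                q = query.advance(f); // the query side skips everything below the filter's doc
            } else {
                f = filter.advance(q); // the filter side skips everything below the query's doc
            }
        }
        return matches;
    }
}
```

When the two sides are scored separately, each one has to visit all of its own matches, which is the cost the comment above speculates about.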

nik9000 · 2021-01-05T17:01:58Z

The test failure may be related. Fun.

nik9000 · 2021-01-05T21:37:19Z

The build failure found a bug whose fix I'd like to land in 7.11: #67043.

not-napoleon

I'm just leaving a comment, because it looks like this isn't quite ready for a full review? (still has NOCOMMITs, failing tests). Ping me when it's ready and I'll give it another pass.

not-napoleon · 2021-01-11T15:16:37Z

server/src/main/java/org/elasticsearch/search/aggregations/bucket/filter/FiltersAggregator.java

-                TotalHitCountCollector collector = new TotalHitCountCollector();
-                scorer.score(collector, live);
-                incrementBucketDocCount(filterOrd, collector.getTotalHits());
+                if (sub == LeafBucketCollector.NO_OP_COLLECTOR) {


The equality check feels brittle to me here. I wonder if we should put a method on LeafBucketCollector that returns a boolean indicating whether it's going to do any work, and check that. Might be premature abstraction on my part though, what do you think?
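A minimal sketch of that suggestion, assuming a simplified collector type: `SketchLeafCollector`, `DelegatingCollector`, and the method name `isNoop` are all hypothetical here and stand in for whatever shape the real `LeafBucketCollector` change would take. The point is that an identity check stops working as soon as the no-op collector is wrapped, while a method can be forwarded.

```java
/** Hypothetical, simplified stand-in for LeafBucketCollector. */
abstract class SketchLeafCollector {
    /** Shared instance that ignores every document. */
    static final SketchLeafCollector NO_OP = new SketchLeafCollector() {
        @Override
        void collect(int doc, long bucketOrd) {}

        @Override
        boolean isNoop() {
            return true;
        }
    };

    abstract void collect(int doc, long bucketOrd);

    /** True if collect does no work, so callers may skip invoking it entirely. */
    boolean isNoop() {
        return false;
    }
}

/** A wrapper that defeats an identity check against NO_OP but still answers isNoop correctly. */
final class DelegatingCollector extends SketchLeafCollector {
    private final SketchLeafCollector delegate;

    DelegatingCollector(SketchLeafCollector delegate) {
        this.delegate = delegate;
    }

    @Override
    void collect(int doc, long bucketOrd) {
        delegate.collect(doc, bucketOrd);
    }

    @Override
    boolean isNoop() {
        return delegate.isNoop();
    }
}
```

With this shape the call site would read `if (sub.isNoop())` instead of comparing against the shared instance, at the cost of one more method that every wrapper has to remember to forward.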

nik9000 · 2021-01-11T16:25:30Z

> I'm just leaving a comment, because it looks like this isn't quite ready for a full review? (still has NOCOMMITs, failing tests). Ping me when it's ready and I'll give it another pass.

Sorry about that! When I first opened it I was more comfortable with it, and then I kind of went backwards on that...

mark-vieira · 2021-02-03T00:16:33Z

@elasticmachine update branch

nik9000 · 2021-03-01T13:48:40Z

The master branch is quite different now. I'm going to try and revive this against master and reopen with the new block algorithm.

nik9000 added :Analytics/Aggregations Aggregations v8.0.0 v7.12.0 labels Jan 5, 2021

elasticmachine added the Team:Analytics label Jan 5, 2021

nik9000 requested review from not-napoleon and polyfractal January 5, 2021 15:35

nik9000 commented Jan 5, 2021


Merge branch 'master' into date_histo_as_range_with_sub

e17a89c

not-napoleon reviewed Jan 11, 2021


nik9000 marked this pull request as draft January 11, 2021 16:25

nik9000 requested review from not-napoleon and removed request for polyfractal and not-napoleon January 11, 2021 16:26

williamrandolph added v7.13.0 and removed v7.12.0 labels Feb 18, 2021

nik9000 removed v7.13.0 v8.0.0 labels Mar 1, 2021

nik9000 closed this Mar 1, 2021
