Skip to content

Small speed up of date_histogram with children #67012

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jan 5, 2021

This allows us to run the optimization introduced in #63643 when the
date_histogram has children. It isn't a revolutionary performance
improvement though because children tend to be a lot heavier than the
date_histogram. It is faster, but only by a couple of percentage
points.

Mostly the idea is that we can build on this to do other optimization.

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 5, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

This allows us to run the optimization introduced in elastic#63643 when the
`date_histogram` has children. It isn't a revolutionary performance
improvement though because children tend to be a lot heavier than the
`date_histogram`. It is faster, but only by a couple of percentage
points.
SubCollector collector = new SubCollector(filterOrd, filterLeafCollector);
scorer.score(collector, live);
incrementBucketDocCount(filterOrd, collector.total);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible that this could actually be slower than the standard execution mechanism. I wonder if we need an escape hatch so folks can dodge this mechanism if it proves a bad idea.

Also: there is another possible implementation here that involves collecting a block of matches for each filter and then running all of the children in parallel. I'm not sure if it'll be faster or not. It kind of depends on the speed of iterating the doc values. It is a little more complex so I didn't do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After taking a couple of days away I think the block matching mechanism is probably better here. Its much simpler to estimate the cost. Also - I'd love to know why the old way is so slow - the block based mechanism feels like it'd be fast and it reads quite similarly to the Compatible mechanism. I think the big difference is that we don't get to join the main query with the filter query. So it can't skip matches effectively. Maybe. I've got to play.

@nik9000
Copy link
Member Author

nik9000 commented Jan 5, 2021

The test failure may be related. Fun

@nik9000
Copy link
Member Author

nik9000 commented Jan 5, 2021

The build failure found a bug that I'd like to land the fix for into 7.11: #67043.

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just leaving a comment, because it looks like this isn't quite ready for a full review? (still has NOCOMMITs, failing tests). Ping me when it's ready and I'll give it another pass.

TotalHitCountCollector collector = new TotalHitCountCollector();
scorer.score(collector, live);
incrementBucketDocCount(filterOrd, collector.getTotalHits());
if (sub == LeafBucketCollector.NO_OP_COLLECTOR) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The equality check feels brittle to me here. I wonder if we should put a method on LeafBucketCollector to return a boolean if it's going to do any work, and check that. Might be premature abstraction on my part though, what do you think?

@nik9000
Copy link
Member Author

nik9000 commented Jan 11, 2021

I'm just leaving a comment, because it looks like this isn't quite ready for a full review? (still has NOCOMMITs, failing tests). Ping me when it's ready and I'll give it another pass.

Sorry about that! When I first opened it I was more comfortable with it and then I kind of went backwards on that.....

@nik9000 nik9000 marked this pull request as draft January 11, 2021 16:25
@nik9000 nik9000 requested review from not-napoleon and removed request for polyfractal and not-napoleon January 11, 2021 16:26
@mark-vieira
Copy link
Contributor

@elasticmachine update branch

@nik9000
Copy link
Member Author

nik9000 commented Mar 1, 2021

The master branch is quite different now. I'm going to try and revive this against master and reopen with the new block algorithm.

@nik9000 nik9000 closed this Mar 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants