-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Small speed up of date_histogram with children #67012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-analytics-geo (Team:Analytics) |
This allows us to run the optimization introduced in elastic#63643 when the `date_histogram` has children. It isn't a revolutionary performance improvement though because children tend to be a lot heavier than the `date_histogram`. It is faster, but only by a couple of percentage points.
SubCollector collector = new SubCollector(filterOrd, filterLeafCollector); | ||
scorer.score(collector, live); | ||
incrementBucketDocCount(filterOrd, collector.total); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's possible that this could actually be slower than the standard execution mechanism. I wonder if we need an escape hatch so folks can dodge this mechanism if it proves a bad idea.
Also: there is another possible implementation here that involves collecting a block of matches for each filter and then running all of the children in parallel. I'm not sure if it'll be faster or not. It kind of depends on the speed of iterating the doc values. It is a little more complex so I didn't do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After taking a couple of days away I think the block matching mechanism is probably better here. Its much simpler to estimate the cost. Also - I'd love to know why the old way is so slow - the block based mechanism feels like it'd be fast and it reads quite similarly to the Compatible mechanism. I think the big difference is that we don't get to join the main query with the filter query. So it can't skip matches effectively. Maybe. I've got to play.
The test failure may be related. Fun |
The build failure found a bug that I'd like to land the fix for into 7.11: #67043. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just leaving a comment, because it looks like this isn't quite ready for a full review? (still has NOCOMMITs, failing tests). Ping me when it's ready and I'll give it another pass.
TotalHitCountCollector collector = new TotalHitCountCollector(); | ||
scorer.score(collector, live); | ||
incrementBucketDocCount(filterOrd, collector.getTotalHits()); | ||
if (sub == LeafBucketCollector.NO_OP_COLLECTOR) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The equality check feels brittle to me here. I wonder if we should put a method on LeafBucketCollector
to return a boolean if it's going to do any work, and check that. Might be premature abstraction on my part though, what do you think?
Sorry about that! When I first opened it I was more comfortable with it and then I kind of went backwards on that..... |
@elasticmachine update branch |
The master branch is quite different now. I'm going to try and revive this against master and reopen with the new block algorithm. |
This allows us to run the optimization introduced in #63643 when the
date_histogram
has children. It isn't a revolutionary performanceimprovement though because children tend to be a lot heavier than the
date_histogram
. It is faster, but only by a couple of percentagepoints.
Mostly the idea is that we can build on this to do other optimization.