Gap policy for scripted subaggregates #28077
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
jenkins, test this
@fred84 I think that, instead of adding a new gap policy which is only going to be useful for the bucket script and bucket selector pipeline aggregations, we should have
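For context, here is a minimal sketch of how a gap policy decides what value a pipeline aggregation sees for each bucket, modeled loosely on the existing `skip` and `insert_zeros` policies. It is a standalone illustration, not the Elasticsearch `BucketHelper` source: the class, record and method names are hypothetical, and treating a bucket as a gap when its doc count is zero or its value is NaN/infinite is an assumption based on the behaviour discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical model of gap-policy resolution for a pipeline aggregation.
 * Not the Elasticsearch BucketHelper source; names are illustrative only.
 */
class GapPolicySketch {

    enum GapPolicy { SKIP, INSERT_ZEROS }

    /** Minimal stand-in for a histogram bucket: its document count and one referenced value. */
    record Bucket(long docCount, double value) {}

    /**
     * Returns the value a pipeline aggregation would see for one bucket.
     * Assumption: a bucket is a gap when it has no documents or its value is NaN/infinite.
     * SKIP reports the gap as NaN so the caller can drop it; INSERT_ZEROS substitutes 0.
     */
    static double resolve(Bucket bucket, GapPolicy policy) {
        boolean isGap = bucket.docCount() == 0
                || Double.isNaN(bucket.value())
                || Double.isInfinite(bucket.value());
        if (!isGap) {
            return bucket.value();
        }
        return policy == GapPolicy.INSERT_ZEROS ? 0.0 : Double.NaN;
    }

    /** Resolves a whole series, dropping buckets that SKIP marked as gaps. */
    static List<Double> resolveAll(List<Bucket> buckets, GapPolicy policy) {
        List<Double> out = new ArrayList<>();
        for (Bucket bucket : buckets) {
            double v = resolve(bucket, policy);
            if (!Double.isNaN(v)) {
                out.add(v);
            }
        }
        return out;
    }
}
```

Note that in this model a bucket with zero documents is a gap even when a script has already produced a real value for it (for example, `new Bucket(0, 5.0)` is dropped under `SKIP`), which is the scripted sub-aggregation case this PR is about.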
@colings86 PR updated. I decided to add an overloaded variant of
@fred84 I don't think we need to have another variant of
I'm not sure that we will need that, as the script in that case would probably be written to interpret Infinity and NaN values itself and probably output
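To make the alternative concrete, a minimal sketch in which the raw bucket value, including NaN or ±Infinity, is handed to the script unchanged and the script itself decides how to treat a missing value. A `java.util.function.DoubleUnaryOperator` stands in for the bucket script; the class and field names are hypothetical, and this is not the Painless or Elasticsearch API.

```java
import java.util.function.DoubleUnaryOperator;

/**
 * Hypothetical sketch: the raw bucket value (possibly NaN or +/-Infinity) is handed to the
 * script unchanged, and the script itself decides how to treat a missing value.
 */
class ScriptInterpretsGaps {

    /** Runs the script on the raw value with no gap substitution. */
    static double runScript(double rawValue, DoubleUnaryOperator script) {
        return script.applyAsDouble(rawValue);
    }

    /**
     * Example script: doubles finite values and maps NaN/Infinity (for example, the result
     * of a min or max over an empty bucket) to 0.
     */
    static final DoubleUnaryOperator DOUBLE_OR_ZERO =
            v -> Double.isFinite(v) ? v * 2 : 0.0;
}
```

The design choice here is that no gap policy runs before the script, so the script author, rather than the framework, owns the decision of what an empty bucket means.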
@colings86
Hmmm, that is true. My concern about
@colings86 IMHO it is more intuitive to check the bucket's document count only in the first aggregation and to rely only on the value in subsequent aggregations.
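A minimal sketch of the rule proposed in the comment above, assuming a histogram whose second bucket is empty but has already been given the value 5.0 by an upstream bucket script. The class, the method names and the `referencesPipelineAgg` flag are hypothetical; this is not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the proposal: only a pipeline aggregation that reads a metric
 * directly checks the bucket's doc count; one that reads another pipeline aggregation
 * relies solely on the value (NaN = gap).
 */
class FirstStageDocCountCheck {

    /** The input a pipeline aggregation would see for one bucket under this proposal. */
    static double inputFor(long docCount, double referencedValue, boolean referencesPipelineAgg) {
        if (referencesPipelineAgg) {
            // Downstream stage: the value alone decides; NaN propagates as a gap.
            return referencedValue;
        }
        // First stage over a metric: an empty bucket is a gap whatever its value.
        return docCount == 0 ? Double.NaN : referencedValue;
    }

    /** Cumulative sum over the resolved inputs; gaps (NaN) leave the running total unchanged. */
    static List<Double> cumulativeSum(long[] docCounts, double[] values, boolean referencesPipelineAgg) {
        List<Double> out = new ArrayList<>();
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            double v = inputFor(docCounts[i], values[i], referencesPipelineAgg);
            if (!Double.isNaN(v)) {
                sum += v;
            }
            out.add(sum);
        }
        return out;
    }
}
```

With `docCounts = {2, 0, 1}` and `values = {1.0, 5.0, 3.0}`, the running totals are `[1.0, 6.0, 9.0]` when the input comes from another pipeline aggregation but `[1.0, 1.0, 4.0]` when it comes straight from a metric. The same buckets give different results depending on where the value came from, which is the dependence questioned later in the thread.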
@colings86 Which way should we choose?
@fred84 Sorry for the delay on this. I am trying to come up with another way to solve #27377 and #27544 in which the behaviour of a pipeline aggregation does not depend on whether it is referencing another pipeline aggregation or a non-pipeline aggregation. I would prefer a solution where an aggregation's behaviour is independent of where its value came from, as this will be easier for users to reason about.
@colings86 One more attempt to solve this. I've changed the behavior:
PR is updated.
Any idea if this will get into 6.2?
Heya all, we chatted about this PR and the larger agg issue today. Ultimately, the issue with the framework right now is that we're overloading the meaning of a Double and doc count. E.g., min/max return positive/negative infinity if there are no documents, pipeline aggs infer the value based on doc counts, etc. What these are all trying to do is express "the bucket has some value" or "there was no data in this bucket". Doc count is a poor proxy for that because things like script and cumulative sum can/should be able to operate even when there is no value for a bucket. A potential fix that we talked about is adjusting the agg framework to return

If the framework uses Optionals, then individual aggs can decide what to do based on the presence/lack of value, rather than trying to infer it via doc count. If a script decides to emit a value for an empty bucket, downstream consumers of that value won't care about the doc count because they'll just see the Optional with a value. It's a big change though, and there may be thorny issues in the implementation that aren't apparent right now, but it seems a better option since we'll fix the underlying issue instead of just papering over it.
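One way to picture the Optional idea, as a rough sketch: every stage emits a `java.util.OptionalDouble`, a min over an empty bucket returns an empty optional instead of +Infinity, and consumers branch on presence rather than on doc count. The class and method names are hypothetical, and this is not how the Elasticsearch aggregation framework is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of the Optional-based idea: each stage emits an OptionalDouble per
 * bucket, and consumers decide based on presence of a value, never on doc count.
 */
class OptionalBucketValues {

    /** Min over a bucket's document values: empty when the bucket has no documents. */
    static OptionalDouble min(double[] docValues) {
        return Arrays.stream(docValues).min();
    }

    /** Script-like stage applied per bucket; it may turn an absent input into a present value. */
    static List<OptionalDouble> script(List<OptionalDouble> inputs, UnaryOperator<OptionalDouble> fn) {
        List<OptionalDouble> out = new ArrayList<>();
        for (OptionalDouble input : inputs) {
            out.add(fn.apply(input));
        }
        return out;
    }

    /** Cumulative sum that adds present values only; absent values leave the total unchanged. */
    static List<Double> cumulativeSum(List<OptionalDouble> inputs) {
        List<Double> out = new ArrayList<>();
        double sum = 0.0;
        for (OptionalDouble v : inputs) {
            if (v.isPresent()) {
                sum += v.getAsDouble();
            }
            out.add(sum);
        }
        return out;
    }
}
```

Given per-bucket values of `[2.0, empty, 7.0]`, `cumulativeSum` yields `[2.0, 2.0, 9.0]`; if a script first fills the empty bucket with `1.0`, the downstream sum becomes `[2.0, 3.0, 10.0]` without ever looking at a doc count.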
Regardless of the fix, I'm not sure this can go into 6.x. Any solution will be a fairly large breaking change, I think, since it'll fundamentally alter how people's aggs and scripts behave. I think we'd have to wait for 7.0.
@polyfractal @colings86 Should we close this PR? Or, with some guidance, I can try to fix it with Optionals.
Wow, this is a real bummer
@fred84 Hey, sorry for the long delay... travel and Elasticon and lots of work :) We can try to work on the Optional approach if you'd like to give it a shot. Fair warning, I haven't looked too closely yet but I suspect it will be quite big and involved. Lemme know what you want to do :) I think closing this PR for now makes sense, given the direction we are thinking it should take. Sorry again for the long delay, and thanks so much for helping out!
@polyfractal I want to work on this issue, but I won't be able to start for several weeks.
Sounds good! Feel free to ping me if you have questions.
New gap_policy to handle scripted sub-aggregates (#27377)
@colings86 Please take a look. If I'm heading in the right direction, I'll add tests for BucketHelper and update the javadocs/user docs. All existing tests are passing.