Skip to content

Gap policy for scripted subaggregates #28077

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

fred84
Copy link
Contributor

@fred84 fred84 commented Jan 4, 2018

New gap_policy to handle scripted sub-aggregates (#27377)

@colings86 please take a look. If I'm heading in right direction, I'll add tests for BucketHelper and update javadocs/userdocs. All existing tests are passing.

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@mayya-sharipova
Copy link
Contributor

jenkins, test this

@colings86
Copy link
Contributor

@fred84 I think instead of adding a new gap policy which is only going to be useful for the bucket script and bucket selector pipeline aggregations, instead we should have BucketScriptPipelineAggregator and BucketSelectorPipelineAggregator call bucket.getProperty() directly and then pass the result of that call directly to the script. The script will then need to be written by the user in a way to deal with null, NaN and Infinite values as described in #27377 (comment)

@fred84
Copy link
Contributor Author

fred84 commented Jan 5, 2018

@colings86 PR updated.

I decided to add overloaded variant of BucketHelpers.resolveBucketValue() (without gap) instead of directly calling bucket.getProperty() to avoid copy-pasting.

  1. I think we should also remove gapPolicy from BucketScriptPipelineAggregator and BucketSelectorPipelineAggregator and corresponding builders. I'm only concerned about cluster with nodes with different versions.
  2. BucketHelpers.isValueFromOtherAggregation() is intended to fix Cumilative Sum doesn't work with other pipeline aggregations #27544

@colings86
Copy link
Contributor

@fred84 I don't think we need to have another variant of BucketHelpers.resolveBucketValue() here since we really just need to call bucket.getProperty() and pass the returned value to the script. We don't need to check the returned value at all to make sure it is finite before passing it to the script as the proposed solution is to allow the script to determine what it wants to do with NaN or infinite values. I'd like to keep this change simple and not change BucketHelpers and instead have the BucketScriptPipelineAggregator directly call bucket.getProperty()

@fred84
Copy link
Contributor Author

fred84 commented Jan 9, 2018

@colings86

  1. I agree about replacing another variant of BucketHelpers.resolveBucketValue() with direct call to bucket.getProperty()
  2. Should we keep BucketHelpers.isValueFromOtherAggregation() to fix Cumilative Sum doesn't work with other pipeline aggregations #27544 ?

@colings86
Copy link
Contributor

Should we keep BucketHelpers.isValueFromOtherAggregation() to fix #27544 ?

I'm not sure that we will need that as the script in that case would probably be written to interpret Infinity and NaN values itself and probably output 0 in those case so the cumulative sum would have a value to use without the BucketHelpers.isValueFromOtherAggregation() implemented. Let's see what things look like without that method and we can always go back to it later if its needed.

@fred84
Copy link
Contributor Author

fred84 commented Jan 9, 2018

@colings86 BucketHelpers.isValueFromOtherAggregation() is intended to fix other problem: when script_pipeline_aggregator receive Infinity/NaN on input and produce valid value other then 0.
Without BucketHelpers.isValueFromOtherAggregation() cumulative_sum (and other non-script aggregators) will ignore any valid value produced by preceding script_pipeline_aggregator if bucket have 0 documents.

@colings86
Copy link
Contributor

Hmmm that is true. My concern about BucketHelpers.isValueFromOtherAggregation() is that the behaviour of the pipeline aggregation will change based on whether it references a pipeline aggregation or bucket/metric aggregation and that seems confusing for users to me. I think we need to think a bit more about how to solve the case in #27544 in a way that is going to be easy to explain and not surprising.

@fred84
Copy link
Contributor Author

fred84 commented Jan 11, 2018

@colings86 IMHO it is more intuitive to check bucket's document count only in first aggregation and rely only on value in following aggregations.
Or I can fix #27377 and NPE in #27544 and create new issue to discuss this behavior.

@fred84
Copy link
Contributor Author

fred84 commented Jan 22, 2018

@colings86 which way should we choose?

@colings86
Copy link
Contributor

@fred84 sorry for the delay on this, I am trying to come up with another way to solve #27377 and #27544 which doesn't mean that the behaviour of a pipeline aggregation depends on whether it is referencing another pipeline aggregation or a non-pipeline aggregation. I would prefer a solution where the aggregations behaviour is independent of where its value came from as this will be easy for users to reason about.

@fred84
Copy link
Contributor Author

fred84 commented Jan 23, 2018

@colings86 One more attempt to solve this. I've changed behavior:

  1. when gap_policy is not specified
    1.1. Initial behavior: always return NaN if bucket.getDocCount() == 0
    1.2. Modified behavior: return value when it is valid
  2. when gap_policy is insert_zeros
    2.1. Initial behavior: always return 0.0 if bucket.getDocCount() == 0
    2.2. Modified behavior: return value when it is valid

PR is updated.

@shaharmor
Copy link

Any idea if this will get into 6.2?

@polyfractal
Copy link
Contributor

Heya all, we chatted about this PR and the larger agg issue today.

Ultimately, the issue with the framework right now is that we're overloading the meaning of a Double and doc count. E.g. min/max return positive/negative infinity if there are no documents, pipeline aggs infer the value based on doc counts, etc.

What these are all trying to do is express "the bucket has some value" or "there was no data in this bucket". Doc count is a poor proxy for that because things like script and cumulative sum can/should be able to operate even when there is no value for a bucket.

A potential fix that we talked about is adjusting the agg framework to return Optional<Double> rather than Double. This way an agg can signal it has some value (Optional.of(123.4)) or no value at all (Optional.empty()). It also allows us to "fix" the framework to properly support NaN/+Infinity/-Infinity (e.g. Optional.of(Double.NaN)), since they are technically legit double values, unlike how we use them today.

If the framework uses Optionals, then individual aggs can decide what to do based on the presence/lack of value, rather than trying to infer it via doc count. If a script decides to emit a value for an empty bucket, downstream consumers of that value won't care about the doc count because they'll just see the Optional with a value.

It's a big change though, and there may be thorny issues in the implementation that aren't apparent right now, but it seems a better option since we'll fix the underlying issue instead of just papering over it.

Any idea if this will get into 6.2?

Regardless of the fix, I'm not sure this can go into 6.x. Any solution will be a fairly large breaking change I think, since it'll fundamentally alter how people's aggs and scripts behave. I think we'd have to wait for 7.0.

@fred84
Copy link
Contributor Author

fred84 commented Jan 30, 2018

@polyfractal @colings86 Should we close this PR? Or I can try to fix it with Optionals with some guidance.

@shaharmor
Copy link

Regardless of the fix, I'm not sure this can go into 6.x. Any solution will be a fairly large breaking change I think, since it'll fundamentally alter how people's aggs and scripts behave. I think we'd have to wait for 7.0.

Wow, this is a real bummer

@polyfractal
Copy link
Contributor

@fred84 Hey, sorry for the long delay... travel and Elasticon and lots of work :)

We can try to work on the Optional approach if you'd like to give it a shot. Fair warning, I haven't looked too closely yet but I suspect it will be quite big and involved.

Lemme know what you want to do :) I think closing this PR for now makes sense, given the direction we are thinking it should take.

Sorry again for the long delay, and thanks so much for helping out!

@fred84
Copy link
Contributor Author

fred84 commented Mar 24, 2018

@polyfractal I want to do this issue, but will be able to start several week later.

@polyfractal
Copy link
Contributor

Sounds good! Feel free to ping me if you have questions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants