Skip to content

Aggregation Framework Refactoring meta #41713

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
polyfractal opened this issue May 1, 2019 · 5 comments
Closed

Aggregation Framework Refactoring meta #41713

polyfractal opened this issue May 1, 2019 · 5 comments
Labels
:Analytics/Aggregations Aggregations Meta >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) team-discuss >tech debt

Comments

@polyfractal
Copy link
Contributor

polyfractal commented May 1, 2019

This is a running list of things in the agg framework we might like to refactor.

Agg framework

Agg boilerplate simplification.

Aggregations require a lot of boilerplate. Most aggregations will require a Builder, Aggregator, AggregatorFactory, result interface, parsed result interface and Internal result object. Simplifying this would make it easier to develop new aggs and collapse the agg package considerably

#43664

ValuesSource/ValuesSourceConfig/ValueType/ValueSourceType

We have a number of classes and enums that represent what kind of data a field is. They all exist for variously good reasons, but it's difficult to grok their purpose as a new developer and how they all interact. If we could simplify or unify them somehow it would be helpful.

Or at the very least come up with clearer names so their purpose is more obvious. And javadocs explaining their purpose

#42949

HashCode/doHashCode/doInnerHashCode

DONE: The agg framework has mixed style when it comes to asking child objects for their hashcode. Some override hashcode directly, some require doHashCode or even doInnerHashCode(). Doesn't really matter which style is used but it should be consistent

#43214

equals/doEquals/doInnerEquals

DONE: Same as hashcodes, but with equals

#43214

Replace DocValuesIndexFieldData builder with static methods?

The DocValuesIndexFieldData builder is a bit funky; instantiated by itself it returns a sortedSetDV, invoked with numericType() it returns a sortedNumericDV, with scriptFunction it returns a sortedSet with the script. And there's a catch-all override for special binary DVs. None of this is very builder'y since the parameters are mutually exclusive, and it's really just a thin wrapper for constructing DVs.

It might be simpler to replace this with a set of static helper methods that callers can use to generate DVs and ditch the builder model.

Separate XContent serialization from reduce logic?

Some classes, like agg buckets, combine both xcontent serialization and reduction logic. This leads to very large classes, and is a confusing mixture of concerns. Is it possible to separate this? Is it a good idea?

Allow aggs to "see" up the tree?

Aggs are isolated by design, and only know about their direct sub-aggs (and only in a limited manner). If aggs could look up the tree, it would open up new agg possibilities. And it would likely help with the pipeline agg refactoring. This would add complexity, and the isolation helps keep everything simple/composable.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

csoulios added a commit that referenced this issue Jun 19, 2019
…() (#43214)

A number of the aggregation base classes have an abstract doEquals() and doHashCode() (e.g. InternalAggregation.java, AbstractPipelineAggregationBuilder.java).

Theoretically this is so the sub-classes can add to the equals/hashCode and don't need to worry about calling super.equals(). In practice, it's mostly just confusing/inconsistent. And if there are more than two levels, we end up with situations like InternalMappedSignificantTerms which has to call super.doEquals() which defeats the point of having these overridable methods.

This PR removes the do versions and just use equals/hashCode ensuring the super when necessary.

This PR is part of #41713 refactoring meta

* Refactored all subclasses of InternalAggregation to remove doEquals() and doHashCode()
* Refactored all subclasses of AbstractPipelineAggregationBuilder to remove doEquals() and doHashCode()
* Refactored all subclasses of AbstractAggregationBuilder and CompositeValuesSourceBuilder to remove doEquals() and doHashCode()
csoulios added a commit to csoulios/elasticsearch that referenced this issue Jun 19, 2019
…() (elastic#43214)

A number of the aggregation base classes have an abstract doEquals() and doHashCode() (e.g. InternalAggregation.java, AbstractPipelineAggregationBuilder.java).

Theoretically this is so the sub-classes can add to the equals/hashCode and don't need to worry about calling super.equals(). In practice, it's mostly just confusing/inconsistent. And if there are more than two levels, we end up with situations like InternalMappedSignificantTerms which has to call super.doEquals() which defeats the point of having these overridable methods.

This PR removes the do versions and just use equals/hashCode ensuring the super when necessary.

This PR is part of elastic#41713 refactoring meta

* Refactored all subclasses of InternalAggregation to remove doEquals() and doHashCode()
* Refactored all subclasses of AbstractPipelineAggregationBuilder to remove doEquals() and doHashCode()
* Refactored all subclasses of AbstractAggregationBuilder and CompositeValuesSourceBuilder to remove doEquals() and doHashCode()

Backport from v8.0.0
@polyfractal
Copy link
Contributor Author

Procedural note: I've moved the test-related items to their own ticket: #51324

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@wchaparro
Copy link
Member

Adding team-discuss label to this meta issue to address the last three.

@nik9000
Copy link
Member

nik9000 commented Jan 5, 2022

Agg boilerplate simplification.

Aggregations require a lot of boilerplate. Most aggregations will require a Builder, Aggregator, AggregatorFactory, result interface, parsed result interface and Internal result object. Simplifying this would make it easier to develop new aggs and collapse the agg package considerably

I think most of what's left on this one is #82273.

@imotov
Copy link
Contributor

imotov commented Jan 5, 2022

I moved "Allow aggs to "see" up the tree" into its own issue #82290. This was the last outstanding issue in this meta issue, so we can now close it.

@imotov imotov closed this as completed Jan 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations Meta >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) team-discuss >tech debt
Projects
None yet
Development

No branches or pull requests

6 participants