Skip to content

Terms aggregation should remap global ordinal buckets when a sub-aggregator is used to sort the terms #24941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 30, 2017

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented May 29, 2017

terms aggregations at the root level use the global_ordinals execution hint by default.
When all sub-aggregators can be run in breadth_first mode the collected buckets for these sub-aggs are dense (remapped after the initial pruning).
But if a sub-aggregator is not deferrable and needs to collect all buckets before pruning we don't remap global ords and the aggregator needs to deal with sparse buckets.
Most (if not all) aggregators expect dense buckets and uses this information to allocate memories.
This change forces the remap of the global ordinals but only when there is at least one sub-aggregator that cannot be deferred.

Relates #24788

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left one comment but only for my own understanding, I don't think it will affect whether you merge this as is.

if (segmentOrds != null) {
mapSegmentCountsToGlobalCounts();
}
globalOrds = valuesSource.globalOrdinalsValues(ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed, I don't see where its used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jimczi added 4 commits May 30, 2017 14:07
…egator is used to sort the terms

`terms` aggregations at the root level use the `global_ordinals` execution hint by default.
When all sub-aggregators can be run in `breadth_first` mode the collected buckets for these sub-aggs are dense (remapped after the initial pruning).
But if a sub-aggregator is not deferrable and needs to collect all buckets before pruning we don't remap global ords and the aggregator needs to deal with sparse buckets.
Most (if not all) aggregators expect dense buckets and uses this information to allocate memories.
This change forces the remap of the global ordinals but only when there is at least one sub-aggregator that cannot be deferred.

Relates elastic#24788
@jimczi jimczi force-pushed the global_ordinals_hash_heuristic branch from 5b79cbe to 20d0ee9 Compare May 30, 2017 14:31
@jimczi jimczi merged commit ce7195d into elastic:master May 30, 2017
@jimczi jimczi deleted the global_ordinals_hash_heuristic branch May 30, 2017 17:13
jimczi added a commit that referenced this pull request May 30, 2017
…egator is used to sort the terms (#24941)

`terms` aggregations at the root level use the `global_ordinals` execution hint by default.
When all sub-aggregators can be run in `breadth_first` mode the collected buckets for these sub-aggs are dense (remapped after the initial pruning).
But if a sub-aggregator is not deferrable and needs to collect all buckets before pruning we don't remap global ords and the aggregator needs to deal with sparse buckets.
Most (if not all) aggregators expect dense buckets and uses this information to allocate memories.
This change forces the remap of the global ordinals but only when there is at least one sub-aggregator that cannot be deferred.

Relates #24788
jimczi added a commit that referenced this pull request May 30, 2017
…egator is used to sort the terms (#24941)

`terms` aggregations at the root level use the `global_ordinals` execution hint by default.
When all sub-aggregators can be run in `breadth_first` mode the collected buckets for these sub-aggs are dense (remapped after the initial pruning).
But if a sub-aggregator is not deferrable and needs to collect all buckets before pruning we don't remap global ords and the aggregator needs to deal with sparse buckets.
Most (if not all) aggregators expect dense buckets and uses this information to allocate memories.
This change forces the remap of the global ordinals but only when there is at least one sub-aggregator that cannot be deferred.

Relates #24788
@jimczi jimczi added the v5.4.2 label May 30, 2017
@IdanWo
Copy link

IdanWo commented Nov 11, 2017

Hey. Just to be sure, this bug is a performance issue right?
The results won't be affected without this fix?
By the way, it would be nice to make a label which is called "Performance" for noticing.

@IdanWo
Copy link

IdanWo commented Mar 17, 2018

Up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants