Skip to content

Use global_ordinals_hash execution mode when sorting by sub aggregations. #26014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Aug 2, 2017

This is a safer default since sorting by sub aggregations prevents these
aggregations from being deferred. global_ordinals_hash will at least
make sure that we do not use memory for buckets that are not collected.

Closes #24359

…tions.

This is a safer default since sorting by sub aggregations prevents these
aggregations from being deferred. `global_ordinals_hash` will at least
make sure that we do not use memory for buckets that are not collected.

Closes elastic#24359
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jpountz jpountz merged commit 0bf8a35 into elastic:master Aug 10, 2017
@jpountz jpountz deleted the fix/hash_ords_when_sorting_by_sub_agg branch August 10, 2017 10:28
jpountz added a commit that referenced this pull request Aug 10, 2017
…tions. (#26014)

This is a safer default since sorting by sub aggregations prevents these
aggregations from being deferred. `global_ordinals_hash` will at least
make sure that we do not use memory for buckets that are not collected.

Closes #24359
@colings86
Copy link
Contributor

Could this be back-ported to the 6.0 branch? Although this is labelled as an enhancement I think it will help to prevent OOMEs during the collect phases of aggregations especially when sorting by the cardinality agg which seems to be relatively popular.

@jpountz
Copy link
Contributor Author

jpountz commented Aug 11, 2017

Agreed, I'll backport!

jpountz added a commit that referenced this pull request Aug 11, 2017
…tions. (#26014)

This is a safer default since sorting by sub aggregations prevents these
aggregations from being deferred. `global_ordinals_hash` will at least
make sure that we do not use memory for buckets that are not collected.

Closes #24359
@jpountz
Copy link
Contributor Author

jpountz commented Aug 11, 2017

Done.

@jimczi
Copy link
Contributor

jimczi commented Aug 11, 2017

Sorry I missed this pr, a broader protection has been added with
#24941 which forces the remap of global ordinals when a sub-aggregator cannot be defered even when the global_ordinals mode is used.
These execution hint are really sensitive so I tried to make global_ordinals less trappy but I see that it can also be confusing to still have the distinction.
We can maybe deprecate global_ordinals_hash to make it clear that global_ordinals mode will choose the best option to collect sub aggregations ?

@jpountz
Copy link
Contributor Author

jpountz commented Aug 11, 2017

I think that makes sense. We'd need to apply the other heuristics that we have in the factory such as whether there is a parent bucket aggregation. But then I'm a bit annoyed by the fact that we'd be making most decisions at the Aggregator level while it really belongs to the factory. Not sure, maybe you have opinions/ideas around that?

@jimczi
Copy link
Contributor

jimczi commented Aug 11, 2017

I opened #26173 which deprecates global_ordinals_hash and global_ordinals_low_cardinality and moves all the decisions regarding remapping global ords and low cardinality to the factory.

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Aug 14, 2017
* master: (30 commits)
  Rewrite range queries with open bounds to exists query (elastic#26160)
  Fix eclipse compilation problem (elastic#26170)
  Epoch millis and second formats parse float implicitly (Closes elastic#14641) (elastic#26119)
  fix SplitProcessor targetField test (elastic#26178)
  Fixed typo in README.textile (elastic#26168)
  Fix incorrect class name in deleteByQuery docs (elastic#26151)
  Move more token filters to analysis-common module
  reindex: automatically choose the number of slices (elastic#26030)
  Fix serialization of the `_all` field. (elastic#26143)
  percolator: Hint what clauses are important in a conjunction query based on fields
  Remove unused Netty-related settings (elastic#26161)
  Remove SimpleQueryStringIT#testPhraseQueryOnFieldWithNoPositions.
  Tests: reenable ShardReduceIT#testIpRange.
  Allow `ClusterState.Custom` to be created on initial cluster states (elastic#26144)
  Teach the build about betas and rcs (elastic#26066)
  Fix wrong header level
  inner hits: Unfiltered nested source should keep its full path
  Document how to import Lucene Snapshot libs when elasticsearch clients (elastic#26113)
  Use `global_ordinals_hash` execution mode when sorting by sub aggregations. (elastic#26014)
  Make the README use a single type in examples. (elastic#26098)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants