Support dynamic pruning in the composite aggregation #88185

Open
jpountz opened this issue Jun 29, 2022 · 3 comments

Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@jpountz
Contributor

jpountz commented Jun 29, 2022

Description

Queries sorted by field have improved a lot over the years when it comes to dynamic pruning:

  1. Ancient versions of Elasticsearch would always collect all the matches to return a single page of data. This performed terribly when paging through all hits, since the total run time was essentially quadratic in the number of documents in a shard.
  2. Then we introduced index sorting, and queries whose sort order is congruent with the index sort could skip irrelevant data.
  3. Then we introduced dynamic pruning when the sort field is indexed with points, by leveraging the index to skip hits that cannot possibly make it to the page that we are retrieving. This yielded major speedups when paginating through all hits. This is the current state.
  4. In the future, we should look into supporting dynamic pruning when sorting on keyword fields too.
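
To make the gap between stages 1 and 2 concrete, here is a minimal plain-Java simulation that counts how many documents are visited while paging through a shard. It is not Elasticsearch or Lucene code: the class and method names are hypothetical, values are assumed to be distinct, and `after` stands in for a `search_after` key.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

final class PagingCost {

    /** Stage 1: visit every document and keep the pageSize smallest values greater than `after`. */
    static long[] pageByFullScan(long[] values, long after, int pageSize, long[] visited) {
        PriorityQueue<Long> top = new PriorityQueue<>(Collections.reverseOrder()); // max-heap
        for (long v : values) {
            visited[0]++; // every match is collected, competitive or not
            if (v > after) {
                top.add(v);
                if (top.size() > pageSize) {
                    top.poll(); // evict the largest value to keep the heap at pageSize
                }
            }
        }
        return top.stream().mapToLong(Long::longValue).sorted().toArray();
    }

    /** Stage 2: values are stored in sort order, so seek past `after` and stop after one page. */
    static long[] pageBySortedIndex(long[] sortedValues, long after, int pageSize, long[] visited) {
        int pos = Arrays.binarySearch(sortedValues, after);
        int from = pos >= 0 ? pos + 1 : -pos - 1; // first position whose value is > after
        int to = Math.min(sortedValues.length, from + pageSize);
        visited[0] += to - from; // only the documents of this page are read
        return Arrays.copyOfRange(sortedValues, from, to);
    }

    /** Pages through all values and returns the total number of documents visited. */
    static long totalVisited(long[] values, int pageSize, boolean sortedIndex) {
        long[] visited = {0};
        long after = Long.MIN_VALUE;
        while (true) {
            long[] page = sortedIndex
                ? pageBySortedIndex(values, after, pageSize, visited)
                : pageByFullScan(values, after, pageSize, visited);
            if (page.length == 0) {
                return visited[0];
            }
            after = page[page.length - 1];
        }
    }

    public static void main(String[] args) {
        long[] values = new long[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i; // distinct and already in sort order
        }
        System.out.println("full scan:    " + totalVisited(values, 10, false));
        System.out.println("sorted index: " + totalVisited(values, 10, true));
    }
}
```

With 1,000 documents and pages of 10, the full scan visits 101,000 documents (100 pages plus one final empty page, each scanning everything), while the sorted index visits 1,000.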

The composite aggregation is very similar to sorted queries, yet it is currently at stage 2 in the evolution of sorted queries with regard to dynamic pruning. Unless you are aggregating on the primary index sort field, computing a single page of data requires collecting all documents that match the query.

Can we add dynamic pruning support to the composite aggregation so that computing a single page of results wouldn't need to look at all matches? Ideally it would reuse the same logic that we are using for sorted queries via the LeafFieldComparator#competitiveIterator and LeafCollector#competitiveIterator APIs.
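
As a rough illustration of the idea behind a competitive iterator, here is a plain-Java sketch. It does not use Lucene's actual API, and every name in it is hypothetical: a `BitSet` plays the role of the iterator over documents that can still make it into the page, and an array of doc IDs ordered by value stands in for the points index. Once the top-k queue is full, documents whose value cannot beat the current worst entry are removed from the set, and the collection loop jumps straight to the next competitive document.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.IntStream;

final class CompetitivePruningSketch {

    /** Collects the k smallest values into pageOut; returns how many documents were visited. */
    static int collectTopK(long[] valueByDoc, int k, boolean prune, List<Long> pageOut) {
        int n = valueByDoc.length;
        // Stand-in for a points index: doc IDs ordered by ascending value.
        int[] docsByValue = IntStream.range(0, n).boxed()
            .sorted(Comparator.comparingLong((Integer d) -> valueByDoc[d]))
            .mapToInt(Integer::intValue)
            .toArray();
        BitSet competitive = new BitSet(n);
        competitive.set(0, n); // initially every document may be competitive
        int hi = n;            // docsByValue[0..hi) are still competitive
        PriorityQueue<Long> top = new PriorityQueue<>(Collections.reverseOrder()); // max-heap
        int visited = 0;
        int doc = 0;
        while (doc >= 0 && doc < n) {
            visited++;
            long v = valueByDoc[doc];
            if (top.size() < k) {
                top.add(v);
            } else if (v < top.peek()) {
                top.poll();
                top.add(v);
            }
            if (prune && top.size() == k) {
                long bound = top.peek(); // values >= bound can no longer enter the page
                while (hi > 0 && valueByDoc[docsByValue[hi - 1]] >= bound) {
                    competitive.clear(docsByValue[hi - 1]);
                    hi--;
                }
            }
            // With pruning, skip directly to the next competitive doc (-1 when none is left).
            doc = prune ? competitive.nextSetBit(doc + 1) : doc + 1;
        }
        pageOut.addAll(top);
        Collections.sort(pageOut);
        return visited;
    }
}
```

In this simulation, when values happen to increase with the doc ID, collecting a page of 10 out of 1,000 documents visits only 10 of them with pruning enabled, versus all 1,000 without. The real gain depends on how values are distributed across doc IDs, and a production implementation would update the competitive set far less eagerly than this sketch does.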

Relates to #85759.

@jpountz jpountz added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes labels Jun 29, 2022
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 29, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@jpountz
Contributor Author

jpountz commented Jun 30, 2022

We would probably need to figure out dynamic pruning on keyword fields if we want to get benefits on the composite aggregation since it's often used in combination with keyword fields. I opened LUCENE-10633.

@iverase iverase added :Analytics/Aggregations Aggregations and removed :Analytics/Geo Indexing, search aggregations of geo points and shapes labels Apr 21, 2023
@wchaparro
Member

Added to: #65019
