Batch query phase shard level requests per data node #112306

Closed
javanna opened this issue Aug 28, 2024 · 3 comments
Labels
>feature, Meta, :Search Foundations/Search (Catch all for Search Foundations), Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch)

Comments

@javanna (Member) commented Aug 28, 2024

The query phase fans out to all shards, sending as many shard-level requests to the relevant data nodes as there are shards involved. Years ago, as part of the many-shards effort, we reworked the can_match phase to group shard-level requests per data node, in order to decrease the number of roundtrips required (including authorization) and the overhead at the transport level. We would like to do the same for the query phase. We want to start small and scope this to the query phase only (no DFS or query after DFS, no scroll), and only when the provided search request contains aggregations, because those are the requests that potentially go through many shards and use a significant amount of memory on the coordinating node.
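To make the per-node grouping concrete, here is a minimal sketch of the idea (not the actual Elasticsearch implementation; the ShardTarget and NodeQueryRequest types are hypothetical stand-ins for the real routing and transport classes):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the real shard routing and per-node request types.
record ShardTarget(String nodeId, String index, int shardId) {}
record NodeQueryRequest(String nodeId, List<ShardTarget> shards) {}

class QueryBatcher {
    /**
     * Groups the shard-level targets of a search by the data node that hosts them,
     * producing one batched request per node instead of one request per shard.
     */
    static List<NodeQueryRequest> batchPerNode(List<ShardTarget> targets) {
        Map<String, List<ShardTarget>> byNode = targets.stream()
                .collect(Collectors.groupingBy(ShardTarget::nodeId));
        return byNode.entrySet().stream()
                .map(e -> new NodeQueryRequest(e.getKey(), e.getValue()))
                .toList();
    }
}
```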

We expect that changing the execution model will provide better stability as well as better resource usage. Currently the coordinating node throttles execution to 5 (configurable) concurrent shard requests per data node. If we group shard-level requests into a single request per data node, each data node gains more context about the portion of the search request it is asked to execute, and can run its shard-level requests at its own pace, depending on its current load and so on. We have seen that the current throttling mechanism can be a bottleneck that prevents maximizing resource usage on data nodes. At the same time, this improvement would drastically reduce the number of network roundtrips for the query phase from a factor of the number of shards to a factor of the number of data nodes involved in the search request: for example, a query over 5,000 shards spread across 50 data nodes would go from 5,000 shard-level requests to 50 node-level requests.
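For context, the existing throttling (the max_concurrent_shard_requests setting, default 5) behaves roughly like the following simplified sketch; this is an illustration of the mechanism, not the actual implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/**
 * Simplified illustration of per-node throttling on the coordinating node:
 * at most maxConcurrentShardRequests shard-level requests are in flight per data node.
 */
class PerNodeThrottle {
    private final int maxConcurrentShardRequests;
    private final Map<String, Semaphore> permitsPerNode = new ConcurrentHashMap<>();

    PerNodeThrottle(int maxConcurrentShardRequests) {
        this.maxConcurrentShardRequests = maxConcurrentShardRequests;
    }

    /** Blocks until the target node has a free slot, then runs the shard-level request. */
    void execute(String nodeId, Runnable shardRequest) throws InterruptedException {
        Semaphore permits = permitsPerNode.computeIfAbsent(
                nodeId, n -> new Semaphore(maxConcurrentShardRequests));
        permits.acquire();
        try {
            shardRequest.run();
        } finally {
            permits.release();
        }
    }
}
```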

@javanna added the :Search Foundations/Search and >feature labels on Aug 28, 2024
@elasticsearchmachine added the Team:Search Foundations label on Aug 28, 2024
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@original-brownbear (Member) commented Jan 20, 2025

List of outstanding tasks (WIP to some degree, but filling in the details over the next few hours):

The main PR that will close this issue is #118490; the goal is to merge that PR.

What is absolutely needed to merge this PR:

Good to have, to reduce risk or to make full use of the work:

The main bottleneck after this change is the coordinating node's networking. We send an enormous amount of data over the wire, and resolving the targeted indices is very slow as well (in fact, for querying O(50K) shards it is by far the slowest step for most non-aggregation searches). There are a number of open pull requests already addressing this by optimizing the logic:

Future ideas/steps to build on top of this:

  • Remove the can_match phase for queries covered by batched execution; it is entirely redundant for them.
  • Align on Introduce async search API and log activation #120024. Currently scheduling on the search pool is rather random; we can do a better job here and save a lot of blocked time and heap by ordering execution better.
  • lz4 or otherwise compress responses to reduce the message size in case data-node-side reduction isn't enough to create a reasonably small response; use cases that don't partial-reduce well still compress extremely well at the byte level (see the sketch after this list).
  • Optimize partial reduce to cover more cases where possible.
  • Reuse Lucene data structures (mostly aggs leaf collectors) across shards to hard-bound their memory use (this would be an extremely impactful memory saver).
  • Make use of this logic for CCS as well (we might do that in the initial PR; let's discuss) and remove minimize_roundtrips (that's a future step).
  • Batch the fetch phase as well (but also paginate in case of large results), along with other possible optimizations around fetch:
    • run aggregations (or at least their reduction) concurrently with fetch
    • return the fetch response directly in more cases than the single-shard scenario we currently optimize; a simple next step here would be to leverage the new batching to fetch right away when a query only targets a single data node
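On the compression idea above, here is a minimal sketch of what byte-level compression of a serialized per-node response could look like, assuming the lz4-java library (illustration only; the actual transport integration would differ):

```java
import java.nio.charset.StandardCharsets;

import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

/**
 * Illustration: compress a serialized per-node query response before sending it
 * over the wire, and decompress it again on the coordinating node.
 */
class ResponseCompression {
    public static void main(String[] args) {
        // Pretend this is a large serialized per-node query response.
        byte[] serialized = "large serialized query response ...".getBytes(StandardCharsets.UTF_8);

        LZ4Factory factory = LZ4Factory.fastestInstance();
        LZ4Compressor compressor = factory.fastCompressor();
        byte[] compressed = compressor.compress(serialized);

        LZ4FastDecompressor decompressor = factory.fastDecompressor();
        byte[] restored = decompressor.decompress(compressed, serialized.length);

        System.out.printf("original=%d bytes, compressed=%d bytes, roundtrip ok=%b%n",
                serialized.length, compressed.length,
                java.util.Arrays.equals(serialized, restored));
    }
}
```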

@javanna (Member, Author) commented Apr 2, 2025

Implemented by #121885.

@javanna javanna closed this as completed Apr 2, 2025