Point in time reader context for multiple (scroll) queries #25674
Labels
>feature
high hanging fruit
:Search/Search
Search-related issues that do not fall into other categories
Comments
Pinging @elastic/es-search-aggs |
I think it is weird for |
Otherwise, super ❤️ for the idea. |
Closing as duplicate of #26472 |
Currently a scroll context is a point in time snapshot of a set of indices associated with a query.
Each successive scroll query on this context moves the query forward until the query is exhausted. This is very useful when a single operation is performed (e.g. reindexing), but when multiple scrolls are needed it is difficult to synchronize them efficiently.
For instance, a multi-pass algorithm that needs to scan the data multiple times would have to use a different scroll for each pass to extract the results from ES, and each pass could return different results.
Similarly, if a batch of queries needs to be executed, the only way to make sure that they all see the same documents is to freeze the indices and use a different scroll for each query.
So instead of coupling a scroll context with a set of shards and a query, we could create a shareable reader context with no query associated. One or multiple users could then reference this reader to perform different queries using the `_search` endpoint. So for instance:
... creates a reader context for index1, index2 and index3 and returns the context id for this reader.
Then the context id can be used in a search query to retrieve results associated with this context:
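The intended behaviour can be illustrated with a toy in-memory model. This is only a sketch of the idea, not Elasticsearch code: `Cluster`, `ReaderContext`, `open_reader_context` and `search` are hypothetical names invented for the illustration, and a "query" is reduced to a Python predicate.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class ReaderContext:
    """Hypothetical query-free snapshot of a set of indices."""
    context_id: str
    docs: tuple  # copies of the documents visible when the context was opened


class Cluster:
    """Toy stand-in for a cluster; none of this is a real Elasticsearch API."""

    def __init__(self):
        self.indices = {}    # index name -> list of documents
        self.contexts = {}   # context id -> ReaderContext
        self._ids = itertools.count(1)

    def index(self, index_name, doc):
        self.indices.setdefault(index_name, []).append(doc)

    def open_reader_context(self, *index_names):
        # Copy every visible document so later writes cannot leak into the snapshot.
        snapshot = tuple(
            dict(doc)
            for name in index_names
            for doc in self.indices.get(name, [])
        )
        ctx = ReaderContext(f"ctx-{next(self._ids)}", snapshot)
        self.contexts[ctx.context_id] = ctx
        return ctx.context_id

    def search(self, context_id, predicate):
        # Any number of different queries can run against the same snapshot.
        return [doc for doc in self.contexts[context_id].docs if predicate(doc)]


cluster = Cluster()
cluster.index("index1", {"user": "a", "n": 1})
cluster.index("index2", {"user": "b", "n": 2})

ctx = cluster.open_reader_context("index1", "index2", "index3")
cluster.index("index1", {"user": "a", "n": 3})  # written after the snapshot

by_user = cluster.search(ctx, lambda doc: doc["user"] == "a")
by_value = cluster.search(ctx, lambda doc: doc["n"] >= 1)
```

Both queries run against the same frozen view, so neither of them sees the document indexed after the context was opened.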
... but instead of using the `_scroll` endpoint to paginate through this query we could leverage `search_after` and let users decide how they want to paginate. `search_after` is difficult to use when the reader can change since it requires a tie breaker, but inside a specific context `search_after` can simply use the `docID`/`shardID` as efficient tie breakers (no need to use `_id` or anything that needs extra memory), so any sort would be eligible for `search_after`.
Also, the reader context could have the same restrictions as a scroll query regarding expiration. The only change would be that different queries could hit the same context, so each of them would concurrently extend the context if needed.
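To make the tie-breaker argument concrete, here is a minimal sketch of `search_after`-style pagination over a frozen set of hits. It assumes that the pair of shard id and doc id is unique and stable for the lifetime of the context; `Hit`, `sort_key` and the `search_after` function are hypothetical names for this illustration, not Elasticsearch internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    shard_id: int
    doc_id: int   # per-shard document number, assumed stable inside one context
    value: int    # the user's sort value; duplicates are allowed


def sort_key(hit):
    # User sort value first, then (shard_id, doc_id) as the implicit tie breaker.
    return (hit.value, hit.shard_id, hit.doc_id)


def search_after(frozen_hits, size, after=None):
    """Return the next page and the key to pass as `after` for the following page."""
    ordered = sorted(frozen_hits, key=sort_key)
    if after is not None:
        # Strict comparison on the full key: no hit is repeated or skipped.
        ordered = [hit for hit in ordered if sort_key(hit) > after]
    page = ordered[:size]
    next_after = sort_key(page[-1]) if page else None
    return page, next_after


# Three hits share the sort value 5, so the user sort alone is ambiguous.
hits = [Hit(0, 0, 5), Hit(0, 1, 5), Hit(1, 0, 5), Hit(1, 1, 7), Hit(0, 2, 3)]

pages = []
after = None
while True:
    page, after = search_after(hits, size=2, after=after)
    if not page:
        break
    pages.append(page)
```

Because the full key is unique, the pages cover every hit exactly once even though the user-supplied sort value has ties. With a reader that can change between requests, the doc ids would not be stable and this shortcut would not be safe.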