Point in time reader context for multiple (scroll) queries #25674

Closed
jimczi opened this issue Jul 12, 2017 · 4 comments
Labels
>feature high hanging fruit :Search/Search Search-related issues that do not fall into other categories

Comments

@jimczi
Contributor

jimczi commented Jul 12, 2017

Currently a scroll context is a point-in-time snapshot of a set of indices associated with a query.
Each successive scroll query on this context moves the query forward until the query is exhausted. This is very useful when a single operation is performed (e.g. reindexing), but when multiple scrolls are needed it is difficult to synchronize them efficiently.
For instance, a multi-pass algorithm that needs to scan the data multiple times would have to use a different scroll for each pass to extract the results from ES, and each pass could return different results.
Similarly, if a batch of queries needs to be executed, the only way to make sure that they all see the same documents is to freeze the indices and use a different scroll for each of them.

So instead of coupling a scroll context with a set of shards and a query, we could create a shareable reader context with no query associated. One or multiple users could then reference this reader to perform different queries using the _search endpoint.

So for instance:

GET index1,index2,index3/_create_context

... creates a reader context for index1, index2 and index3 and returns the context id for this reader.

Then the context id can be used in a search query to retrieve results associated with this context:

GET _search
{
    "context_id": "DXF1ZXJ5QW5...",
    "query": { ... }
}

... but instead of using the _scroll endpoint to paginate through this query, we could leverage search_after and let the user decide how to paginate. search_after is difficult to use when the reader can change, since it requires a tie breaker, but inside a specific context search_after can simply use the docID/shardID as an efficient tie breaker (no need to use _id or anything that needs extra memory), so any sort would be eligible for search_after.
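
To illustrate the tie-breaker point, here is a minimal Python sketch. It is not Elasticsearch code: it assumes a frozen snapshot modeled as a dict from shard id to (doc id, sort value) pairs, and the names `search_after_page` and `paginate_all` are hypothetical. Because doc ids are stable inside one reader context, the tuple (sort value, shard id, doc id) gives a total order, so a cursor should never skip or repeat a hit even when sort values collide.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical model of a frozen reader context:
# shard id -> list of (doc id, sort value). Doc ids are only unique per shard.
Snapshot = Dict[int, List[Tuple[int, Any]]]
# A hit doubles as the cursor: (sort value, shard id, doc id).
Hit = Tuple[Any, int, int]


def search_after_page(
    snapshot: Snapshot, size: int, after: Optional[Hit] = None
) -> List[Hit]:
    """Return up to `size` hits that sort strictly after the `after` cursor."""
    hits = sorted(
        (value, shard_id, doc_id)
        for shard_id, docs in snapshot.items()
        for doc_id, value in docs
    )
    if after is not None:
        # Shard id and doc id break ties between equal sort values.
        hits = [hit for hit in hits if hit > after]
    return hits[:size]


def paginate_all(snapshot: Snapshot, size: int) -> List[List[Hit]]:
    """Walk the whole snapshot page by page, feeding the last hit back in."""
    pages: List[List[Hit]] = []
    cursor: Optional[Hit] = None
    while True:
        page = search_after_page(snapshot, size, cursor)
        if not page:
            return pages
        pages.append(page)
        cursor = page[-1]
```

A real implementation would not re-sort every hit on each page; the sketch only shows why a stable (shardID, docID) pair is enough to make any sort usable as a cursor.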

Also, the reader context could have the same restrictions as a scroll query regarding expiration. The only change would be that different queries could hit the same context, so each of them would concurrently extend the context if needed.
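
The expiration behaviour could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `ReaderContextRegistry` class, its method names, the `ctx-N` ids and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow). The one idea taken from the proposal is that every query hitting a context may extend its keep-alive, and that concurrent queries must never shorten it.

```python
import itertools
import threading
from typing import Dict


class ReaderContextRegistry:
    """Toy registry of shared reader contexts with scroll-like expiration."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()  # queries may touch a context concurrently

    def create(self, now: float, keep_alive: float) -> str:
        """Register a new context that expires at `now + keep_alive`."""
        with self._lock:
            context_id = f"ctx-{next(self._ids)}"
            self._expires_at[context_id] = now + keep_alive
            return context_id

    def touch(self, context_id: str, now: float, keep_alive: float) -> None:
        """Called by each query using the context; extends, never shortens."""
        with self._lock:
            expires_at = self._expires_at.get(context_id)
            if expires_at is None or expires_at <= now:
                self._expires_at.pop(context_id, None)
                raise KeyError(f"reader context {context_id} expired or unknown")
            self._expires_at[context_id] = max(expires_at, now + keep_alive)

    def is_alive(self, context_id: str, now: float) -> bool:
        """True while the context has not reached its expiration time."""
        with self._lock:
            return self._expires_at.get(context_id, float("-inf")) > now
```

Taking the maximum of the current and requested expiration means a query with a short keep-alive cannot cut the context short for another query that asked for a longer one.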

@jimczi jimczi added :Search/Search Search-related issues that do not fall into other categories discuss >feature labels Jul 12, 2017
@jimczi jimczi removed the discuss label Aug 4, 2017
@talevy talevy added the help wanted adoptme label Mar 23, 2018
@talevy
Contributor

talevy commented Mar 23, 2018

Pinging @elastic/es-search-aggs

@nik9000
Member

nik9000 commented Mar 27, 2018

GET index1,index2,index3/_create_context

I think it is weird for GET to make a thing on the server, even if all it is making is a context. Maybe POST?

@nik9000
Member

nik9000 commented Mar 27, 2018

Otherwise, super ❤️ for the idea.

@jimczi
Contributor Author

jimczi commented Apr 8, 2019

Closing as duplicate of #26472

@jimczi jimczi closed this as completed Apr 8, 2019
3 participants