Point in time reader context for multiple (scroll) queries #25674

jimczi · 2017-07-12T13:34:39Z

Currently a scroll context is a point in time snapshot of a set of indices associated with a query.
Each successive scroll query on this context moves the query forward until the query is exhausted. This is very useful when a single operation is performed (e.g. reindexing), but when multiple scrolls are needed it is difficult to synchronize them efficiently.
For instance, a multi-pass algorithm that needs to scan the data multiple times would have to use a different scroll for each pass to extract the results from ES, and each pass could return different results.
Similarly, if a batch of queries needs to be executed, the only way to make sure that they all see the same documents is to freeze the indices and use a different scroll for each of them.

So instead of coupling a scroll context with a set of shards and a query, we could create a shareable reader context with no query associated. One or multiple users could then reference this reader to perform different queries using the _search endpoint.
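To make the idea concrete, here is a minimal in-memory sketch of a query-less reader context shared by several queries. It is only an illustration of the proposal, not Elasticsearch code: `Index`, `ReaderContexts`, `create_context` and `search` are hypothetical names, and a "query" is modeled as a plain Python predicate.

```python
import uuid


class Index:
    """Toy index: a mutable list of documents (hypothetical stand-in for a real index)."""

    def __init__(self, docs=None):
        self.docs = list(docs or [])

    def add(self, doc):
        self.docs.append(doc)


class ReaderContexts:
    """Registry of point-in-time readers that carry no query of their own."""

    def __init__(self):
        self._contexts = {}

    def create_context(self, indices):
        # Freeze the set of documents visible right now; documents added
        # to the indices afterwards are not part of this snapshot.
        snapshot = tuple(doc for index in indices for doc in index.docs)
        context_id = uuid.uuid4().hex
        self._contexts[context_id] = snapshot
        return context_id

    def search(self, context_id, predicate):
        # Any number of different queries can run against the same snapshot.
        return [doc for doc in self._contexts[context_id] if predicate(doc)]


index1 = Index([{"user": "a", "n": 1}, {"user": "b", "n": 2}])
contexts = ReaderContexts()
ctx = contexts.create_context([index1])

index1.add({"user": "a", "n": 3})  # written after the context was created

# Two different queries against the same context see the same two documents.
first_pass = contexts.search(ctx, lambda d: d["user"] == "a")
second_pass = contexts.search(ctx, lambda d: d["n"] >= 1)
```

Neither pass sees the document indexed after the context was created, which is the property a multi-pass algorithm or a batch of queries would rely on.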

So for instance:

GET index1,index2,index3/_create_context

... creates a reader context for index1, index2 and index3 and returns the context id for this reader.

Then the context id can be used in a search query to retrieve results associated with this context:

GET _search
{
    "context_id": "DXF1ZXJ5QW5...",
    "query": { ... }
}

... but instead of using the _scroll endpoint to paginate through this query, we could leverage search_after and let users decide how they want to paginate. search_after is difficult to use when the reader can change, since it requires a tie breaker, but inside a specific context search_after can simply use the docID/shardID as an efficient tie breaker (no need to use _id or anything that needs extra memory), so any sort would be eligible for search_after.
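A rough sketch of why a fixed reader makes this work: if the snapshot never changes, the pair (shard id, doc id) is unique and stable, so appending it to any sort key gives a total order that a cursor can resume from. The function below is a hypothetical illustration in plain Python, not the search_after implementation; it re-sorts the whole snapshot on every call purely for brevity.

```python
def search_page(snapshot, sort_field, size, search_after=None):
    """Return one page of hits plus the cursor for the next page.

    snapshot: list of (shard_id, doc_id, source) tuples that never changes,
    which is what makes (shard_id, doc_id) a stable tie breaker.
    """
    def sort_key(hit):
        shard_id, doc_id, source = hit
        # User-chosen sort value first, then the tie breaker.
        return (source[sort_field], shard_id, doc_id)

    ordered = sorted(snapshot, key=sort_key)
    if search_after is not None:
        # Keep only hits strictly after the last hit of the previous page.
        ordered = [hit for hit in ordered if sort_key(hit) > tuple(search_after)]
    page = ordered[:size]
    next_cursor = list(sort_key(page[-1])) if page else None
    return page, next_cursor


# Several documents share the same sort value, across and within shards.
snapshot = [
    (0, 0, {"price": 10}),
    (0, 1, {"price": 20}),
    (1, 0, {"price": 10}),
    (1, 1, {"price": 10}),
    (2, 0, {"price": 20}),
]

seen = []
cursor = None
while True:
    page, cursor = search_page(snapshot, "price", size=2, search_after=cursor)
    if not page:
        break
    seen.extend((shard_id, doc_id) for shard_id, doc_id, _ in page)
```

Even though three documents have price 10 and two have price 20, every document is returned exactly once, because ties on the sort value are broken by (shard id, doc id).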

Also, the reader context could have the same restriction as a scroll query regarding expiration. The only change would be that different queries could hit the same context, so each of them would concurrently extend the context if needed.
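One possible way to handle concurrent extension, sketched under assumptions: each query passes its own keep-alive, and the context keeps the latest deadline requested so far. The `ExpiringContext` class, the `touch` method and the "keep the maximum deadline" policy are all hypothetical choices for illustration; the proposal only says that queries would extend the context if needed. Time is passed in explicitly so the example is deterministic.

```python
import threading


class ExpiringContext:
    """Reader context whose expiry deadline can only move forward."""

    def __init__(self, now, keep_alive):
        self._lock = threading.Lock()
        self._expires_at = now + keep_alive

    def touch(self, now, keep_alive):
        """Called by each query using the context; returns False if it already expired."""
        with self._lock:
            if now >= self._expires_at:
                return False
            # Concurrent queries may ask for different keep-alives;
            # a shorter request never shortens the existing deadline.
            self._expires_at = max(self._expires_at, now + keep_alive)
            return True

    def expires_at(self):
        with self._lock:
            return self._expires_at


ctx = ExpiringContext(now=0.0, keep_alive=60.0)
ok_long = ctx.touch(now=10.0, keep_alive=120.0)   # deadline moves to 130.0
ok_short = ctx.touch(now=20.0, keep_alive=30.0)   # 50.0 < 130.0, deadline unchanged
too_late = ctx.touch(now=200.0, keep_alive=60.0)  # context has already expired
```

The lock makes the read-compare-write on the deadline atomic, so two queries extending the same context at once cannot overwrite each other with an earlier deadline.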


talevy · 2018-03-23T22:51:10Z

Pinging @elastic/es-search-aggs

nik9000 · 2018-03-27T21:46:00Z

GET index1,index2,index3/_create_context

I think it is weird for GET to make a thing on the server, even if all it is making is a context. Maybe POST?

nik9000 · 2018-03-27T21:46:19Z

Otherwise, super ❤️ for the idea.

jimczi · 2019-04-08T11:18:37Z

Closing as duplicate of #26472

jimczi added the :Search/Search, discuss and >feature labels Jul 12, 2017

jpountz mentioned this issue Jul 28, 2017

Change the recommended tie-breaking fields from [_id] to [_seq_no, _shard]. #25797

Closed

jimczi removed the discuss label Aug 4, 2017

talevy added the help wanted adoptme label Mar 23, 2018

Bargs mentioned this issue Apr 3, 2018

[Context view] Incrementally increase context time window elastic/kibana#16878

Merged

jimczi added high hanging fruit and removed help wanted adoptme labels May 31, 2018

weltenwort mentioned this issue Jun 27, 2018

[context app] _doc as tiebreaker field can be unstable and cause error elastic/kibana#20219

Closed

simianhacker mentioned this issue Jul 11, 2018

[Infra UI] Config Settings elastic/kibana#20428

Closed

5 tasks

jimczi mentioned this issue Aug 6, 2018

Heap slowly filling up with org.elasticsearch.index.SearchSlowLog$SlowLogSearchContextPrinter #32537

Closed

imotov mentioned this issue Aug 6, 2018

Sort on _id causes OOM #32626

Closed

droberts195 mentioned this issue Mar 20, 2019

[ML] Make sort order for datafeeds deterministic #39187

Open

jimczi closed this as completed Apr 8, 2019
