Allow search requests to run on an older version of the index #7881

Pharmerino · 2014-09-25T20:04:24Z

When a user runs a query with scroll enabled, the response contains a data set (with from / size set in the query) and a scroll id. However, when you search on that scroll id, you can't use from / size: size is fixed to the original query's size, and from can't be applied to the data set. Instead, each request just returns the next (size) results until the array is empty.

My issue comes from a setup where Elasticsearch is ingesting data in real time, so re-doing the query with from / size every time for pagination can return different results for the same page. It would be nice to see (from) implemented in the scroll API so the user can grab a specific range of results from the original scroll request for pagination purposes.
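To make the problem concrete, here is a minimal Python sketch. It models the index as an in-memory list rather than calling Elasticsearch, and the `page` helper, the `rank` sort key, and the document shape are all hypothetical. It only illustrates how a from / size window can shift when a document that sorts earlier is ingested between two requests.

```python
def page(docs, from_, size):
    """Return the ids in one from/size window over docs sorted by 'rank'."""
    ordered = sorted(docs, key=lambda d: d["rank"])
    return [d["id"] for d in ordered[from_:from_ + size]]


# Toy "index"; 'rank' stands in for whatever the real query sorts on.
index = [
    {"id": "a", "rank": 10},
    {"id": "b", "rank": 20},
    {"id": "c", "rank": 30},
    {"id": "d", "rank": 40},
]

# Second page (from=2, size=2) before any new data arrives.
page_two_before = page(index, from_=2, size=2)

# A new document is ingested that sorts ahead of the existing ones.
index.append({"id": "x", "rank": 5})

# The same from/size now covers a different slice of the results.
page_two_after = page(index, from_=2, size=2)

print(page_two_before, page_two_after)
```

The second page differs between the two calls even though the request itself is unchanged, which is the inconsistency described above.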

clintongormley · 2014-09-30T16:42:02Z

Hi @Pharmerino

The idea behind scrolling is that, when you start a search, you get a snapshot of the index at that time. You keep pulling results until you're done, or there are no more results.

But this doesn't have anything to do with pagination. Why are you trying to use pagination with scroll?
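The snapshot behaviour described above can be modelled with a small toy class. This is not the Elasticsearch client or its scroll implementation; `ScrollSnapshot` and `next_batch` are hypothetical names used only to show the forward-only, fixed-size semantics.

```python
class ScrollSnapshot:
    """Toy model of a scroll: the hits are frozen when the search starts."""

    def __init__(self, docs, size):
        self._hits = list(docs)  # copy, so later writes to docs are invisible
        self._size = size
        self._pos = 0

    def next_batch(self):
        """Return the next `size` hits; an empty list means the scroll is done."""
        batch = self._hits[self._pos:self._pos + self._size]
        self._pos += len(batch)
        return batch


index = ["doc1", "doc2", "doc3"]
scroll = ScrollSnapshot(index, size=2)

first = scroll.next_batch()
index.append("doc4")  # indexed after the scroll started, so never returned
second = scroll.next_batch()
third = scroll.next_batch()

print(first, second, third)
```

Note that the model only ever moves forward: there is no way to ask for an arbitrary offset, which is why it does not map onto page-by-page navigation.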

Pharmerino · 2014-09-30T22:13:26Z

I put this in under scroll because it is the closest functionality to what I need.

I have a setup where I need to query and receive back a fairly large data set, then set up pagination within that data set (i.e. display only 100 results at a time per page). The issue is that the returned data is too large to keep the whole set in memory, and sorting all of the data on every query, just to get a fixed number of results at large "from" values, would be taxing on the ES server as well.

So in my effort to fix this I noticed two things that were similar to the functionality I require. One of those is a filter with caching. This would be just about perfect, except that the setup will constantly be ingesting data, and if newly indexed items fall within that data set instead of at the end, the contents of a page (i.e. a given from / size) could be different. That's where scroll came in. It, as you said, stores a snapshot of the index at that time, which is exactly what I need. Only I can't offset what I get back (i.e. start from a certain point), because it always returns the next "size" worth of data until the return set is empty.

Hopefully this clears up my issue; feel free to ask me for any other information.

clintongormley · 2014-10-14T13:55:54Z

OK - so what you are after (more or less) is a persistent read-only view of an index at a point in time. You want to be able to run normal search requests on an older version of the index, e.g. to give a single user a consistent view during their session.

This could end up using an enormous number of file handles, but may be an interesting idea. I'll put this up for discussion.

markharwood · 2014-11-28T11:40:06Z

While it is technically possible for Lucene to provide simultaneous access to multiple readers, each with a different point in time, in practice it can be resource-intensive (disk, memory, file handles) to provide this capability, especially if there are a large number of users who may require views held open for lengthy periods. An added concern is that in a distributed system where replicas can diverge (not necessarily in content but in how documents are physically organised into Lucene segments) it would not be possible to maintain a point-in-time view that could be migrated over if there was a change in the choice of replica used to service the user's request (e.g. due to an outage). For these reasons it's not a feature that we would feel comfortable offering as a part of the standard search API.

An alternative way of achieving your goals may be to use a filter on a timestamp field as part of the user search to lock down the time range of records under consideration. This would work if your index only has additions rather than updates or deletes, which would obviously change the items being considered.
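A rough sketch of the timestamp-filter idea follows. Several things here are assumptions rather than anything stated in this thread: the field name `timestamp`, integer epoch values, the helper names `build_page_request` and `run_locally`, and the `bool` / `filter` form of the query DSL, which is the shape used by Elasticsearch 2.0 and later (releases from the time of this thread expressed the same thing with a `filtered` query). The second helper evaluates the body against an in-memory list purely to show the effect; it is not how Elasticsearch executes the request.

```python
def build_page_request(cutoff, from_, size, ts_field="timestamp"):
    """Build a search body restricted to documents at or before `cutoff`."""
    return {
        "query": {"bool": {"filter": [{"range": {ts_field: {"lte": cutoff}}}]}},
        "sort": [{ts_field: "asc"}],
        "from": from_,
        "size": size,
    }


def run_locally(docs, body, ts_field="timestamp"):
    """Apply the cutoff, sort, and from/size window to an in-memory list."""
    cutoff = body["query"]["bool"]["filter"][0]["range"][ts_field]["lte"]
    kept = sorted(
        (d for d in docs if d[ts_field] <= cutoff),
        key=lambda d: d[ts_field],
    )
    start = body["from"]
    return [d["id"] for d in kept[start:start + body["size"]]]


docs = [
    {"id": "a", "timestamp": 100},
    {"id": "b", "timestamp": 200},
    {"id": "c", "timestamp": 300},
    {"id": "d", "timestamp": 400},
]

# The cutoff is chosen once, at the start of the user's session, and reused.
session_cutoff = 400
body = build_page_request(session_cutoff, from_=2, size=2)

before = run_locally(docs, body)
docs.append({"id": "e", "timestamp": 500})  # ingested after the session began
after = run_locally(docs, body)

print(before, after)
```

The page stays the same because every later request reuses the same cutoff. This relies on new documents always carrying a timestamp later than the cutoff, and a real sort would likely need a tiebreaker field when timestamps can collide.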

clintongormley added the feedback_needed label Sep 30, 2014

clintongormley changed the title ~~Add From field to scroll api~~ Allow search requests to run on an older version of the index Oct 14, 2014

clintongormley added discuss and removed feedback_needed labels Oct 14, 2014

clintongormley mentioned this issue Oct 22, 2014

Search: Expose Lucene's searchAfter in the search API #8192

Closed

markharwood closed this as completed Nov 28, 2014
