Allow search requests to run on an older version of the index #7881
Comments
Hi @Pharmerino, the idea behind scrolling is that, when you start a search, you get a snapshot of the index at that time. You keep pulling results until you're done, or there are no more results. But this doesn't have anything to do with pagination. Why are you trying to use pagination with scroll?
I put this under scroll because it has the closest functionality to what I need. I have a setup where I need to run a query that returns a fairly large data set, then paginate within that data set (i.e. display only 100 results per page). The issue is that the result set is too large to keep entirely in memory, and sorting the whole data set on every query just to fetch a fixed number of results at large "from" values would be taxing on the ES server as well. While trying to solve this I noticed two features close to what I require. One is a filter with caching. This would be almost perfect, except that the setup is constantly ingesting data, and if newly indexed items fall inside that data set rather than at the end, the contents of a given page (i.e. a given from/size) could change. That's where scroll came in. As you said, it stores a snapshot of the index at that time, which is exactly what I need. However, I can't offset what I get back (i.e. start from a certain point), because it always returns the next "size" worth of results until the result set is empty. Hopefully this clears up my issue; feel free to ask me for any other information.
OK - so what you are after (more or less) is a persistent read-only view of an index at a point in time. You want to be able to run normal search requests on an older version of the index, e.g. to give a single user a consistent view during their session. This could end up using an enormous number of file handles, but may be an interesting idea. I'll put this up for discussion.
While it is technically possible for Lucene to provide simultaneous access to multiple readers, each with a different point in time, in practice it can be resource-intensive (disk, memory, file handles) to provide this capability, especially if a large number of users require views held open for lengthy periods. An added concern is that in a distributed system, replicas can diverge (not necessarily in content, but in how documents are physically organised into Lucene segments). A point-in-time view therefore could not be migrated if the replica servicing the user's requests changed (e.g. due to an outage). For these reasons it's not a feature that we would feel comfortable offering as part of the standard search API. An alternative way of achieving your goals may be to add a filter on a timestamp field to the user's search, to lock down the time range of records under consideration. This would work if your index only has additions, rather than updates or deletes, which would obviously change the items being considered.
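As a rough illustration of the timestamp-filter suggestion, here is a minimal sketch that builds a from/size request body capped at the moment the user's session started. It assumes a recent query DSL (`bool` with a `filter` clause and a `range` query); the field names `indexed_at` and `doc_id` and the helper `build_page_request` are hypothetical, not part of any mapping or API mentioned in this thread.

```python
from datetime import datetime, timezone


def build_page_request(user_query, session_start, page, page_size=100,
                       timestamp_field="indexed_at", tiebreak_field="doc_id"):
    """Build a search body that ignores documents indexed after session_start.

    timestamp_field and tiebreak_field are assumed names; substitute whatever
    your mapping actually uses.
    """
    return {
        "query": {
            "bool": {
                "must": [user_query],
                # Only consider documents that existed when the session began.
                "filter": [
                    {"range": {timestamp_field: {"lte": session_start.isoformat()}}}
                ],
            }
        },
        # A fixed sort with a tie-breaker keeps page boundaries stable.
        "sort": [{timestamp_field: "asc"}, {tiebreak_field: "asc"}],
        "from": page * page_size,
        "size": page_size,
    }


# Record the cutoff once per user session, then reuse it for every page.
session_start = datetime(2014, 9, 26, 12, 0, tzinfo=timezone.utc)
body = build_page_request({"match": {"message": "error"}}, session_start, page=3)
```

Reusing the same `session_start` for every page is what keeps the pages consistent while new documents arrive. It does not reduce the cost of large "from" values, and it only holds for append-only indices, as noted above.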
When the user runs a query with scroll enabled, it returns a data set (with from / size set in the query) and a scroll id. However, when you search on that scroll id, it doesn't allow you to use from / size (size is equal to the original query size, and you can't use from on the data set). Instead it just pages to the next (size) number of results until the array is empty.
My issue comes from a setup where Elasticsearch is ingesting data in real time, so re-running the query with from / size every time for pagination can return different results for the same page. It would be nice to see (from) implemented in the scroll API so the user can just grab a specific range of results from the original scroll request for pagination purposes.
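To make the limitation concrete, here is a small sketch of what a client has to do today to reach an arbitrary offset through a forward-only stream of batches: read and discard everything before the offset. `fake_scroll` is a hypothetical stand-in for repeated scroll requests, not a real client call, and `page_from_scroll` is an illustrative helper.

```python
def fake_scroll(hits, batch_size):
    """Stand-in for repeated scroll calls: yields successive batches until empty."""
    for start in range(0, len(hits), batch_size):
        yield hits[start:start + batch_size]


def page_from_scroll(batches, offset, size):
    """Collect `size` hits starting at `offset` from a forward-only batch stream."""
    if size <= 0:
        return []
    page = []
    seen = 0
    for batch in batches:
        for hit in batch:
            if seen >= offset:
                page.append(hit)
                if len(page) == size:
                    return page
            # Hits before the offset are fetched but thrown away.
            seen += 1
    return page  # Stream ended before the page was full.


hits = list(range(10))
page = page_from_scroll(fake_scroll(hits, 3), offset=4, size=3)
```

Every hit before `offset` still has to be transferred, and going back to an earlier page means starting a new scroll, which is why a (from) parameter on scroll would help here.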