We already integrated IndexSearcher.searchAfter in #4940 to make deep pagination more efficient with the scroll API. However, in many cases the scroll API is not an option: it is heavy, and it requires associating a scroll context with each search request and clearing that context when it is no longer needed.
So if you have a user-facing application that needs deep pagination, performance is terrible because of the pagination itself, and you cannot really use the scroll API, since scroll contexts are costly and users typically don't explicitly tell the application when they no longer need the context.
A middle ground could be to allow passing an array of sort values with the search request, and we would only return hits that sort after these values. Compared to the scroll API, the downside is that successive requests would not always see the same point-in-time view of a shard, so you can miss documents because of deletes or see documents twice because of insertions, but you already have this issue when paginating with from/size. On the other hand, performance could be much better, since each shard would only have to manage a priority queue of size entries rather than from + size entries.
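To make the priority-queue argument concrete, here is a minimal Python sketch. It is not Elasticsearch code: each shard is modelled as a plain list of `(sort_value, uid)` tuples, and the function names `page_from_size` and `search_after` are hypothetical, chosen only for this illustration.

```python
import heapq


def page_from_size(shards, from_, size):
    """from/size pagination: every shard must collect its top (from_ + size) hits."""
    per_shard = [heapq.nsmallest(from_ + size, shard) for shard in shards]
    merged = heapq.nsmallest(from_ + size, (hit for hits in per_shard for hit in hits))
    return merged[from_:]


def search_after(shards, after, size):
    """Search-after pagination: every shard collects only `size` hits that sort
    strictly after `after` (the sort values of the last hit already returned)."""
    per_shard = [
        heapq.nsmallest(size, (hit for hit in shard if after is None or hit > after))
        for shard in shards
    ]
    return heapq.nsmallest(size, (hit for hits in per_shard for hit in hits))


# Two toy shards; hits are (sort_value, uid) tuples, sorted ascending.
shards = [
    [(3, "a"), (1, "b"), (7, "c")],
    [(2, "d"), (5, "e"), (1, "f")],
]

first_page = search_after(shards, None, 2)
second_page = search_after(shards, first_page[-1], 2)
same_page_with_offset = page_from_size(shards, 2, 2)
```

Both approaches return the same second page on this static data, but the per-shard queue in `search_after` stays at `size` entries no matter how deep the page is, whereas `page_from_size` grows with the offset.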
NOTE: for this feature to work well with pagination, the _uid should be used as the last element of the sort specification. Otherwise the relative order of documents that have the same sort values would be undefined. (In Lucene it is defined, because doc ids break ties, but this doesn't work with Elasticsearch because we do not always query the same shard, and if a merge happens between two requests, doc ids could be reordered.)
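The need for a tie-breaker can be illustrated with a small Python sketch. This is a toy model, not Elasticsearch behaviour: documents are plain dicts, the `price` field and the helper names are hypothetical, and "strictly after the last sort values" is just one way the undefined order among ties could show up in practice.

```python
def next_page(docs, sort_key, after, size):
    """Return the next `size` docs whose sort key is strictly greater than `after`."""
    ordered = sorted(docs, key=sort_key)
    return [d for d in ordered if after is None or sort_key(d) > after][:size]


def paginate(docs, sort_key, size):
    """Walk all pages, passing the last hit's sort values to the next request."""
    seen, after = [], None
    while True:
        page = next_page(docs, sort_key, after, size)
        if not page:
            return seen
        seen.extend(d["_uid"] for d in page)
        after = sort_key(page[-1])


def by_price(doc):
    # Sort values without a tie-breaker.
    return (doc["price"],)


def by_price_then_uid(doc):
    # Same sort, with _uid appended as the last element.
    return (doc["price"], doc["_uid"])


docs = [
    {"_uid": "doc#1", "price": 10},
    {"_uid": "doc#2", "price": 10},
    {"_uid": "doc#3", "price": 10},
    {"_uid": "doc#4", "price": 20},
]

without_tiebreaker = paginate(docs, by_price, 2)
with_tiebreaker = paginate(docs, by_price_then_uid, 2)
```

In this model, paging on `price` alone silently drops `doc#3`, because the page boundary falls between documents with identical sort values. Appending `_uid` makes every sort key unique, so each document appears exactly once.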