Improve scroll search by using Lucene's IndexSearcher#searchAfter(...) #4940

Closed
martijnvg opened this issue Jan 29, 2014 · 5 comments

Comments

@martijnvg
Member

Improve the regular scroll search by using Lucene's searchAfter, which allows subsequent scroll requests to always use a priority queue whose size equals the `size` specified in the first search request. (The priority queue is used to collect the competitive hits that match a query.)

Currently, the priority queue grows with each subsequent scroll request by the `size` specified in the first search request.
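To illustrate the difference, here is a minimal Python sketch, not the Lucene or Elasticsearch implementation. It models hits as hypothetical `(score, doc_id)` tuples, and the helper names `page_with_from`, `page_search_after`, and `scroll` are made up for this example. Both strategies return the same pages; they differ only in how many hits the collector is asked to retain per request.

```python
import heapq

def rank_key(hit):
    # Higher score ranks first; ties are broken by lower doc id.
    score, doc_id = hit
    return (-score, doc_id)

def page_with_from(hits, from_, size):
    # from-based paging: retain the best from_ + size hits, then skip from_.
    top = heapq.nsmallest(from_ + size, hits, key=rank_key)
    return top[from_:], from_ + size  # (page, requested queue capacity)

def page_search_after(hits, after, size):
    # searchAfter-style paging: only hits ranking strictly after the last
    # hit of the previous page compete, so `size` slots are enough.
    candidates = (h for h in hits
                  if after is None or rank_key(h) > rank_key(after))
    top = heapq.nsmallest(size, candidates, key=rank_key)
    return top, size

def scroll(hits, size, use_search_after):
    # Page through all hits, recording the capacity requested per page.
    pages, capacities = [], []
    from_, after = 0, None
    while True:
        if use_search_after:
            page, capacity = page_search_after(hits, after, size)
        else:
            page, capacity = page_with_from(hits, from_, size)
        if not page:
            return pages, capacities
        pages.append(page)
        capacities.append(capacity)
        from_ += size
        after = page[-1]  # cursor for the next searchAfter-style request

if __name__ == "__main__":
    # Made-up data: 50 docs with scores that contain ties.
    hits = [((doc * 7) % 13, doc) for doc in range(50)]
    print(scroll(hits, 5, use_search_after=False)[1])
    print(scroll(hits, 5, use_search_after=True)[1])
```

With 50 hits and a page size of 5, the from-based variant requests capacities of 5, 10, 15, and so on up to 50, while the searchAfter-style variant requests 5 on every page.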

Note: scan scroll is unaffected by this issue; it is already a highly optimized search for fetching a large part or all of the docs from a cluster. Scan scroll always forcibly sorts the hits by Lucene doc ID, whereas regular scroll can now support any sort efficiently.

@ghost ghost assigned martijnvg Jan 29, 2014
@nik9000
Member

nik9000 commented Jan 29, 2014

Is the idea to be able to scroll a non-scan search?

@martijnvg
Member Author

This is already possible; the scroll parameter can also be used on non-scan search requests.

@nik9000
Member

nik9000 commented Jan 29, 2014

Hey, neat. That is kinda documented on the scroll page but it is implied that scroll is a scan thing. So this enhancement will make it more efficient to scroll without scan?

@martijnvg
Member Author

Yes, this enhancement will make scroll without scan more efficient.

Memory usage will improve from O(from+size) to O(size), and collecting the competitive hits for a query will improve from O(numHits + log(from+size)) to O(numHits + log(size)). This improvement becomes really noticeable when scrolling deep into a result set.
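To put rough numbers on "scrolling deep", the arithmetic can be sketched as follows. This assumes zero-based page indexes and `from = page_index * size`; `queue_capacity` is a hypothetical helper for illustration, not an Elasticsearch or Lucene API.

```python
import math

def queue_capacity(page_index, size, use_search_after):
    # With searchAfter the queue holds `size` entries on every request;
    # with from-based paging it must hold from + size entries.
    from_ = page_index * size
    return size if use_search_after else from_ + size

if __name__ == "__main__":
    size = 10
    for page_index in (0, 9, 99, 999):
        with_from = queue_capacity(page_index, size, use_search_after=False)
        with_after = queue_capacity(page_index, size, use_search_after=True)
        # log2 of the capacity approximates the depth of a binary heap,
        # which bounds the cost of one queue update.
        print(page_index, with_from, with_after,
              round(math.log2(with_from), 1), round(math.log2(with_after), 1))
```

For the 1000th request with `size` set to 10, the from-based queue needs 10,000 entries compared with 10, and the heap depth is roughly 13 compared with roughly 3.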

@s1monw s1monw added v1.2.0 and removed v1.1.0 labels Mar 20, 2014
martijnvg added a commit that referenced this issue Mar 21, 2014
…of regular search methods which rely on `from` for pagination.

This prevents the creation of priority queues of size `from + size`; instead, the size of the priority queue will always be equal to `size`.

Closes #4940
@tlrx
Member

tlrx commented Mar 21, 2014

nice, thanks for this optimisation!
