Add search_after parameter in the SearchAPI #16125
@jpountz this is a work in progress; I still need to write more tests to cover scripts, geo, and nested sorts. I called it "search_from" rather than "search_after" because searching from a given position (rather than strictly after it) is also a use case, and because "search_from" with from=1 is equivalent to a search after.
@rjernst,
@@ -19,13 +19,8 @@

package org.elasticsearch.search.internal;

import org.apache.lucene.search.*;
beware that the build now bans wildcard imports
I would need to dive in a bit more but this looks rather good for a WIP! I agree with Ryan
This recommendation is problematic given that _uid does not have doc values and that adding doc values to _uid is controversial: #11887. I'm not sure what to say besides requiring that the set of provided sort values uniquely identify a document, given that documents that compare equal will be skipped?
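To make the "documents that compare equal will be skipped" point concrete, here is a minimal in-memory sketch. It is not Elasticsearch code: `DOCS`, `page_after`, and `paginate` are hypothetical names, and the sketch assumes that "after" means a strictly greater sort key in ascending order.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for an index; three documents share age=20.
DOCS: List[Dict[str, Any]] = [
    {"id": "a", "age": 18},
    {"id": "b", "age": 20},
    {"id": "c", "age": 20},
    {"id": "d", "age": 20},
    {"id": "e", "age": 25},
]


def sort_key(doc: Dict[str, Any], fields: List[str]) -> Tuple:
    """Build the tuple of sort values for one document."""
    return tuple(doc[f] for f in fields)


def page_after(docs, fields: List[str], after: Optional[Tuple], size: int):
    """Return up to `size` docs whose sort key is strictly greater than `after`."""
    ordered = sorted(docs, key=lambda d: sort_key(d, fields))
    if after is not None:
        ordered = [d for d in ordered if sort_key(d, fields) > after]
    return ordered[:size]


def paginate(docs, fields: List[str], size: int) -> List[str]:
    """Walk all pages, feeding the last hit's sort values into the next request."""
    seen: List[str] = []
    after: Optional[Tuple] = None
    while True:
        page = page_after(docs, fields, after, size)
        if not page:
            return seen
        seen.extend(d["id"] for d in page)
        after = sort_key(page[-1], fields)


# Sorting on age alone: "c" and "d" tie with the last hit of page 1 and are skipped.
print(paginate(DOCS, ["age"], 2))
# Adding a unique tiebreaker as the last sort field visits every document.
print(paginate(DOCS, ["age", "id"], 2))
```

In this sketch the unique `id` field plays the role that a unique last sort field would play in a real request.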
Hmmm I don't like the interaction between
Two possibilities:
I'm leaning towards the second option
@clintongormley I agree, I'll change the PR with the second option (
@jpountz @clintongormley I renamed
What do we recommend then? This is not only about search_after but about pagination in general. For a pure search use case we should have an easy way to provide a deterministic sort. IMO _uid is the simplest and safest solution. Should we activate doc_values by default on this field? Maybe not, but I don't see why the user could not activate them. The bottom line is that sorting on _uid would work even if doc_values are not activated; are we planning to remove the ability to sort on a field without doc_values?
We recommend using
lastEmittedDoc = null;
if (searchContext.searchAfter() != null) {
    after = searchContext.searchAfter();
}
Do we need the outer `if` statement, or could we just do `after = searchContext.searchAfter();`?
Yes sure, the if is useless.
I left some comments but it looks good overall!
@jpountz thanks for the review, I think I've covered all your comments.
},
"search_after": {
  "type" : "list",
  "description" : "A comma-separated list of sort values that indicates where the sort of the top hits should start"
s/A comma-separated list/An array/ ?
Actually, as a query string param, it'd have to be a comma-separated list of sort values. Honestly, I'm not sure we can reliably pass these values in via query string parameters, e.g. it is possible for a sort value to be `null`, but that would become the string `"null"` as a query string param.
@clintongormley right, but this PR does not handle `null` values. The sort values in the response from the previous request should contain the default value instead of `null`. I'll add a note in the documentation about this limitation.
@jimferenczi the default value returned in the REST API for a missing string value is null
@jimferenczi I had missed that, why can't it deal with nulls?
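The query-string concern raised above can be illustrated with a small sketch. The comma-joined encoding below is an assumption made purely for illustration (`encode_as_query_param` and `decode_query_param` are hypothetical helpers, not how Elasticsearch parses the parameter); it shows why a `null` sort value and the string `"null"` become indistinguishable in a query string, while a JSON array in the request body keeps them apart.

```python
import json
from urllib.parse import parse_qs, urlencode


def encode_as_query_param(sort_values):
    """Hypothetical encoding: join the sort values with commas in one parameter."""
    joined = ",".join("null" if v is None else str(v) for v in sort_values)
    return urlencode({"search_after": joined})


def decode_query_param(query):
    """Split the parameter back into values; every value comes back as a string."""
    raw = parse_qs(query)["search_after"][0]
    return raw.split(",")


with_null = [None, 5]        # a missing value followed by a number
with_string = ["null", 5]    # the literal string "null" followed by a number

# Both inputs produce the same query string, so the receiver cannot tell them apart.
print(encode_as_query_param(with_null) == encode_as_query_param(with_string))
print(decode_query_param(encode_as_query_param(with_null)))

# A JSON array in the request body preserves both null and the value types.
print(json.dumps(with_null), json.dumps(with_string))
print(json.loads(json.dumps(with_null)) == with_null)
```

The same loss applies to types in general: the number `5` also comes back as the string `"5"` after the query-string round trip.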
LGTM
The search_after parameter provides a way to efficiently paginate from one page to the next. This parameter accepts an array of sort values; the searcher uses those values to return the top hits starting from the first document that sorts after them. This parameter must be used in conjunction with the sort parameter, and it must contain exactly as many values as there are fields to sort on. NOTE: A field with one unique value per document should be used as the last element of the sort specification. Otherwise the sort order for documents that have the same sort values would be undefined. The recommended way is to use the field `_uid`, which is certain to contain one unique value for each document. Fixes #8192
Add `search_after` parameter in the SearchAPI
Pagination of results can be done by using the `from` and `size` parameters, but it can be very costly for indices with more than one shard if the `from` parameter is big. The `search_after` parameter has been added to address this problem: it provides a way to efficiently paginate from one page to the next. This parameter accepts an array of sort values; the searcher uses those values to return the top hits starting from the first document that sorts after them. This parameter must be used in conjunction with the `sort` parameter, and it must contain exactly as many values as there are fields to sort on.
NOTE: A field with one unique value per document should be used as the last element of the sort specification. Otherwise the sort order for documents that have the same sort values would be undefined. The recommended way is to use the field `_uid`, which is certain to contain one unique value for each document.
Relates to #8192
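As a rough illustration of the description above, the sketch below builds request bodies for successive pages without contacting a cluster. The helper names (`build_search_body`, `next_search_after`) and the sample response are hypothetical. The sketch assumes that each hit in the response carries its sort values in a `sort` array, and that these are passed back unchanged as `search_after`.

```python
import json
from typing import Any, Dict, List, Optional


def build_search_body(sort: List[Dict[str, str]], size: int,
                      search_after: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build a search body; search_after must have one value per sort field."""
    if search_after is not None and len(search_after) != len(sort):
        raise ValueError("search_after must contain one value per sort field")
    body: Dict[str, Any] = {"size": size, "sort": sort}
    if search_after is not None:
        body["search_after"] = search_after
    return body


def next_search_after(response: Dict[str, Any]) -> Optional[List[Any]]:
    """Take the sort values of the last hit, or None when the page is empty."""
    hits = response["hits"]["hits"]
    return hits[-1]["sort"] if hits else None


# Sort on a regular field, with a unique field last as the tiebreaker.
SORT = [{"age": "asc"}, {"_uid": "asc"}]

first_page = build_search_body(SORT, size=2)

# Hand-written sample response for illustration only; the values are made up.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "sort": [18, "tweet#1"]},
            {"_id": "2", "sort": [20, "tweet#2"]},
        ]
    }
}

second_page = build_search_body(SORT, size=2,
                                search_after=next_search_after(sample_response))
print(json.dumps(second_page))
```

The length check mirrors the stated requirement that `search_after` contain exactly as many values as there are sort fields. Unlike a growing `from` offset, each request here carries only the previous page's last sort values.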