Skip to content

Add search_after parameter in the SearchAPI #16125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 27, 2016
Merged

Add search_after parameter in the SearchAPI #16125

merged 1 commit into from
Jan 27, 2016

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Jan 20, 2016

Pagination of results can be done by using the from and size parameters but it can be very costly for indices with more than one shard if the from parameter is big. The search_after parameter has been added to address this problem, it provides a way to efficiently paginate from one page to the next. This parameter accepts an array of sort values, those values are then used by the searcher to sort the top hits from the first document that is equal to the sort values.
This parameter must be used in conjunction with the sort parameter, it must contain exactly the same number of values than the number of fields to sort on.

NOTE: A field with one unique value per document should be used as the last element of the sort specification. Otherwise the sort order for documents that have the same sort values would be undefined. The recommended way is to use the field _uid which is certain to contain one unique value for each document.

Relates to #8192

@jimczi jimczi added >feature review discuss :Search/Search Search-related issues that do not fall into other categories v5.0.0-alpha1 labels Jan 20, 2016
@jimczi
Copy link
Contributor Author

jimczi commented Jan 20, 2016

@jpountz this is a work in progress, I still need to write more tests to cover cases with scripts, geo and nested sort. I called it "search_from" rather than "search_after" because searching from somewhere (rather than search after) is also a use case and also because "search_from" with from=1 is equivalent to a search after.

@rjernst
Copy link
Member

rjernst commented Jan 20, 2016

search_from is a very confusing name, given the existing from parameter.

@jimczi
Copy link
Contributor Author

jimczi commented Jan 20, 2016

@rjernst, search_after with a parameter ìnclude_top to indicate if the top document (the one matching exactly the search_from values) must be included in the result ?

"search_after": {
    "values": ["tweet#3465"]
    "include_top": true|false
}

@@ -19,13 +19,8 @@

package org.elasticsearch.search.internal;

import org.apache.lucene.search.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beware that the build now bans wildcard imports

@jpountz
Copy link
Contributor

jpountz commented Jan 20, 2016

I would need to dive in a bit more but this looks rather good for a WIP! I agree with Ryan search_from would be confusing. Maybe we would need @clintongormley 's naming skills here. Something else I was wondering about is whether we want it as a top-level element or not (thinking of putting it under sort right now).

The recommended way is to use the field _uuid which is certain to contain one unique value for each document.

This recommendation is problematic given that _uid does not have doc values and that adding doc values to _uid is controversial: #11887. I'm not sure what to say besides requiring that the set of provided sort values should uniquely identify a document given that documents that compare equal will be skipped?

@clintongormley
Copy link
Contributor

search_from is a very confusing name, given the existing from parameter.

I called it "search_from" rather than "search_after" because searching from somewhere (rather than search after) is also a use case and also because "search_from" with from=1 is equivalent to a search after.

Hmmm I don't like the interaction between search_from and from. These settings should be exclusive. I think that, if you want to include the previous last result as the first result in the next page, then you should just choose the values from the previous second-last doc, instead of the last doc.

Something else I was wondering about is whether we want it as a top-level element or not (thinking of putting it under sort right now).

sort is pretty overloaded already - I don't think it is the right place to put it.

from today means "how many docs should I skip before i start returning results, and defaults to 0.

Two possibilities:

  • support "from": ["val1","val2"] to mean "start returning results after these sort values", or
  • "search_after": ["val1", "val2"] and prohibit the use of from in combination with search_after

I'm leaning towards the second option

@jimczi
Copy link
Contributor Author

jimczi commented Jan 21, 2016

@clintongormley I agree, I'll change the PR with the second option (search_after and prohibit from+ search_after combination).

@jimczi jimczi changed the title [WIP] Initial implementation of the search_from parameter in the Search API. Add search_after parameter in the SearchAPI Jan 26, 2016
@jimczi jimczi removed the discuss label Jan 26, 2016
@jimczi
Copy link
Contributor Author

jimczi commented Jan 26, 2016

@jpountz @clintongormley I renamed search_fromto search_after and prohibited the usage of from!= 0 when search_after is used.

This recommendation is problematic given that _uid does not have doc values and that adding doc values to _uid is controversial: #11887. I'm not sure what to say besides requiring that the set of provided sort values should uniquely identify a document given that documents that compare equal will be skipped?

What do we recommend then ? It's not only for search_after but for pagination in general. For a pure search use case we should have an easy way to provide a deterministic sort. IMO _uid is the simplest and the safest solution. Now, should we activate the doc_values per default on this field ? Maybe not but I don't see why it could not be activated by the user. Bottom line is that the sort would work on _uid even if the doc_values are not activated, are we planning to remove the ability to sort on a field without doc_values ?

@clintongormley
Copy link
Contributor

We recommend using _uid as a tiebreaker (and really there is no other readily available field that can be used), plus the random_score function in the function score query relies on the _uid... I think that using the _uid is the best advice we can give at the moment, and rather than blocking this feature, we should see if we can make progress on #11887

lastEmittedDoc = null;
if (searchContext.searchAfter() != null) {
after = searchContext.searchAfter();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the outer if statement or could we just do after = searchContext.searchAfter();?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure, the if is useless.

@jpountz
Copy link
Contributor

jpountz commented Jan 26, 2016

I left some comments but it looks good overall!

@jimczi
Copy link
Contributor Author

jimczi commented Jan 26, 2016

@jpountz thanks for the review, I think I've covered all your comments.

},
"search_after": {
"type" : "list",
"description" : "A comma-separated list of sort values that indicates where the sort of the top hits should start"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/A comma-separated list/An array/ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, as a query string param, it'd have to be a comma-separated list of sort values. Honestly, I'm not sure we can reliably pass these values in via query string parameters, eg it is possible for a sort value to be null, but that would become the string "null" as a query string param.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintongormley right but this PR does not handle null values. The sort values in the response from the previous request should contain the default value instead of null. I'll add a note in the documentation about this limitation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimferenczi the default value returned in the REST API for a missing string value is null

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimferenczi I had missed that, why can't it deal with nulls?

@jpountz
Copy link
Contributor

jpountz commented Jan 26, 2016

LGTM

The search_after parameter provides a way to efficiently paginate from one page to the next. This parameter accepts an array of sort values, those values are then used by the searcher to sort the top hits from the first document that is greater to the sort values.
This parameter must be used in conjunction with the sort parameter, it must contain exactly the same number of values than the number of fields to sort on.

NOTE: A field with one unique value per document should be used as the last element of the sort specification. Otherwise the sort order for documents that have the same sort values would be undefined. The recommended way is to use the field `_uuid` which is certain to contain one unique value for each document.

Fixes #8192
jimczi added a commit that referenced this pull request Jan 27, 2016
Add `search_after` parameter in the SearchAPI
@jimczi jimczi merged commit 9160095 into elastic:master Jan 27, 2016
@jimczi jimczi deleted the search_from branch January 27, 2016 10:09
@jimczi jimczi removed the review label Jan 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature release highlight :Search/Search Search-related issues that do not fall into other categories v5.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants