Paginate from _id #15799
That seems hard to achieve considering the way the query is executed. In a multi-shard environment you would need at least another round trip to retrieve the timestamp value associated with the `from` _id. Additionally, this does not solve the case where the timestamps are equal. If your problem is just to resolve the case where the timestamps are equal, then you can use the script sort, something like:
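A minimal sketch of such a script sort, assuming ES 2.x syntax and that `_uid` fielddata is available (the index name and timestamp field are illustrative):

```
# Sort by timestamp, breaking ties on the document _uid
GET /logs/_search
{
  "sort": [
    { "@timestamp": "asc" },
    {
      "_script": {
        "type": "string",
        "script": { "inline": "doc['_uid'].value" },
        "order": "asc"
      }
    }
  ]
}
```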
Unfortunately this would be very slow because the _id field is stored but has no doc values. You could activate doc values for the field, but the memory usage would make the feature very costly.
Related to #8192
As long as your UIDs increase, you could do the following to get the 10 results before and after a particular document:
Get 10 docs after doc 3:
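(A sketch, assuming ES 2.x and a mapping type named `doc`, so that `doc#3` is the `_uid` of the anchor document; the index name is illustrative:)

```
# Fetch the 10 documents whose _uid sorts after doc 3
GET /logs/_search
{
  "size": 10,
  "query": { "range": { "_uid": { "gt": "doc#3" } } },
  "sort": [ { "_uid": "asc" } ]
}
```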
Get 10 docs before doc 3:
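(Same assumptions; the hits come back in reverse order, so reverse them client side for display:)

```
# Fetch the 10 documents whose _uid sorts before doc 3
GET /logs/_search
{
  "size": 10,
  "query": { "range": { "_uid": { "lt": "doc#3" } } },
  "sort": [ { "_uid": "desc" } ]
}
```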
Is it possible to have a Kibana search string to show 10 lines of context around the line with _id="X"?
@clintongormley This solution is not very realistic for a real-world (distributed) system that is not single-threaded.
@simianhacker I don't follow, why? Also, the new search_after feature is relevant here.
@clintongormley I can't think of a way to ensure unique increasing UIDs across distributed writers, e.g., logstash. Is there an Elasticsearch option to ensure that?
Or does that not matter with search_after?
@rashidkpc It doesn't matter with search_after as long as the UIDs are unique.
Doesn't matter. The main sort is on (e.g.) timestamp; the UID is used purely as a tie breaker for documents that have the same timestamp.
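A sketch of the pattern being described, assuming the search_after syntax that shipped with the feature (index, field, and sort values are illustrative): sort on the timestamp with `_uid` as the tie breaker, then feed the last hit's sort values into the next request.

```
# Page forward from the last hit of the previous page
GET /logs/_search
{
  "size": 10,
  "sort": [
    { "@timestamp": "asc" },
    { "_uid": "asc" }
  ],
  "search_after": [1451606400000, "doc#event24587301"]
}
```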
Currently the `from` key on a request takes an integer, and paging picks up after chopping off that number of results from the top of the queued result set. This works well for static sets. However, in high write-load situations in which we're sorting by time, it becomes a problem. We may end up with missing results that seem like they should be there, and have a hard time expressing where we want the results to start.
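For reference, a minimal example of the offset paging described above (index and field names are illustrative):

```
# Offset paging: skip the first 20 hits, return hits 21-30
GET /logs/_search
{
  "from": 20,
  "size": 10,
  "sort": [ { "@timestamp": "desc" } ]
}
```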
Take, for example, a logging case in which I want to see the 10 records before and the 10 records after some event.
I could read the time from the event and do a sort in both directions. However, I'd have to hope that 10 things didn't happen at the same time, which is common in, say, error scenarios where a number of errors (e.g., one per shard in a distributed system) all happen at exactly the same time. If that were the case, there's a very good chance that my record wouldn't actually appear in the results.
Of course we could say the time resolution wasn't high enough, or that events would be in arbitrary order anyway, but the goal here really is to make sure that our "context" event is included and that the results around it are the same as they would be in any previous request.
Ideally I could request the following, ensuring that I get `event24587302` as the first result in the chronologically sorted list (see the sketch below).

Stems from issues in elastic/kibana#275
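A hypothetical sketch of such a request; `from_id` is not an existing option, only an illustration of the ask:

```
# Hypothetical: anchor the page at a known document instead of an offset
GET /logs/_search
{
  "size": 21,
  "from_id": "event24587302",
  "sort": [ { "@timestamp": "asc" } ]
}
```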