Paginate from _id #15799

Closed
rashidkpc opened this issue Jan 6, 2016 · 10 comments
Labels
discuss, :Search/Search (Search-related issues that do not fall into other categories)

Comments

@rashidkpc

Currently the from key on a request takes an integer, and paging picks up after skipping that number of results from the top of the queued result set.

This works well for static sets. However, under high write load, when we're sorting by time, it becomes a problem: the result set shifts between requests, so results that should be there can go missing, and an integer offset gives us no reliable way to express where we want the results to start.
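A minimal sketch of the shifting window, using plain Python lists rather than Elasticsearch. The `page` helper and the event ids are hypothetical; the only assumption is that an integer offset is applied to whatever the sorted result set looks like at request time.

```python
def page(results, offset, size):
    """Offset paging: skip `offset` hits from the top, return the next `size`."""
    return results[offset:offset + size]


# Newest-first listing of hypothetical event ids.
events = ["e5", "e4", "e3", "e2", "e1"]
first_page = page(events, 0, 2)

# Two new events are written before the second page is requested.
events_after_writes = ["e7", "e6"] + events
second_page = page(events_after_writes, 2, 2)

print(first_page)   # ['e5', 'e4']
print(second_page)  # ['e5', 'e4'] -- the same hits again, not e3 and e2
```

The offset is relative to the top of a list that keeps growing, so it cannot pin the window to a particular document.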

Take for example, a logging case in which I want to see the 10 records before and the 10 records after some event.

I could read the time from the event and do a sort in both directions. However, I'd have to hope that 10 things didn't happen at the same time, which is common in, say, error scenarios where a number of errors (e.g., one per shard in a distributed system) all happen at exactly the same time. If that were the case, there's a very good chance that my record wouldn't actually appear in the results.

Of course we could say the time resolution wasn't high enough, or that events would be of arbitrary order anyway, but the goal here really is to make sure that our "context" event is included and the results around it are the same as they would be in any previous request.
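The tie problem can be sketched with a small simulation. This is illustrative only: the ids, the shared timestamp, and the assumption that tied documents come back in insertion order are all made up for the example (from the caller's point of view, the order among ties is arbitrary).

```python
T = "2016-01-01T00:00:00Z"

# 15 hypothetical error events that all share one timestamp.
events = [{"_id": f"event{i}", "@timestamp": T} for i in range(15)]
anchor_id = "event12"

# "Sort by time, descending, take 10" starting at the anchor's timestamp.
# sorted() is stable, so tied events keep their insertion order here.
hits = sorted(
    (e for e in events if e["@timestamp"] <= T),
    key=lambda e: e["@timestamp"],
    reverse=True,
)[:10]

anchor_found = any(h["_id"] == anchor_id for h in hits)
print(anchor_found)  # False: the context event fell outside the 10 hits
```

With only the timestamp to sort on, nothing forces the context event into the window, which is why a second, unique sort key is needed as a tie breaker.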

Ideally I could request the following, ensuring that I get event24587302 as the first result in the chronologically sorted list:

{
 sort: { "@timestamp" : "desc" }, // And fire another with "asc" at the same time, in an _msearch
 from: "event24587302"
}

Stems from issues in elastic/kibana#275

@jimczi
Contributor

jimczi commented Jan 6, 2016

That seems hard to achieve considering the way the query is executed. In a multi-shard environment you would need at least one more round trip to retrieve the timestamp value associated with the from _id. Additionally, this does not solve the case where the timestamps are equal. If your problem is just to resolve the case where the timestamps are equal, then you can use a script sort, something like:

"sort": [
    {
      "timestamp": {
        "order": "desc"
      }
    },
    {
      "_script": {
        "type": "number",
        "script": {
          "inline": "if (_fields['_id'].value == fromId) { return 0 } else { return 1}",
          "params": {
            "fromId": "$eventId"
          }
        },
        "order": "asc"
      }
    }
  ] 

Unfortunately this would be very slow because the _id field is stored but has no doc values. You could activate doc values for the field, but the memory usage would make the feature very costly.

@jpountz
Contributor

jpountz commented Jan 6, 2016

related to #8192

clintongormley added the discuss and :Search/Search labels on Jan 10, 2016
@clintongormley
Contributor

As long as your UIDs increase, you could do the following to get the 10 results before and after a particular document:

POST t/t/_bulk
{"index": {"_id": "1"}}
{"timestamp":"2016-01-01T00:00:00Z"}
{"index": {"_id": "2"}}
{"timestamp":"2016-01-01T00:00:00Z"}
{"index": {"_id": "3"}}
{"timestamp":"2016-01-01T00:00:00Z"}
{"index": {"_id": "4"}}
{"timestamp":"2016-01-01T00:00:00Z"}
{"index": {"_id": "5"}}
{"timestamp":"2016-01-01T11:11:11Z"}

Get 10 docs after doc 3:

GET _search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {
              "range": {
                "timestamp": {
                  "gt": "2016-01-01T00:00:00Z"
                }
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "term": {
                      "timestamp": "2016-01-01T00:00:00Z"
                    }
                  },
                  {
                    "range": {
                      "_uid": {
                        "gt": "t#3"
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    {
      "timestamp": "asc"
    },
    {
      "_uid": "asc"
    }
  ]
}

Get 10 docs before doc 3:

GET _search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {
              "range": {
                "timestamp": {
                  "lt": "2016-01-01T00:00:00Z"
                }
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "term": {
                      "timestamp": "2016-01-01T00:00:00Z"
                    }
                  },
                  {
                    "range": {
                      "_uid": {
                        "lt": "t#3"
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    {
      "timestamp": "desc"
    },
    {
      "_uid": "desc"
    }
  ]
}
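The two requests above differ only in the comparison operator and the sort order, so they can be generated from one helper. The following is a sketch, not part of any Elasticsearch client: `context_query` and its parameters are hypothetical names, the field names are taken from the example above, and the explicit `size` is added for clarity (the requests above rely on the default page size).

```python
def context_query(timestamp, uid, direction, size=10,
                  ts_field="timestamp", uid_field="_uid"):
    """Build a search body for hits strictly after or before (timestamp, uid)."""
    if direction == "after":
        op, order = "gt", "asc"
    elif direction == "before":
        op, order = "lt", "desc"
    else:
        raise ValueError("direction must be 'after' or 'before'")

    return {
        "size": size,
        "query": {"bool": {"filter": {"bool": {"should": [
            # Strictly later (or earlier) timestamp ...
            {"range": {ts_field: {op: timestamp}}},
            # ... or the same timestamp, with the uid breaking the tie.
            {"bool": {"must": [
                {"term": {ts_field: timestamp}},
                {"range": {uid_field: {op: uid}}},
            ]}},
        ]}}}},
        "sort": [{ts_field: order}, {uid_field: order}],
    }


after_body = context_query("2016-01-01T00:00:00Z", "t#3", "after")
before_body = context_query("2016-01-01T00:00:00Z", "t#3", "before")
```

Both bodies could be sent together in a single _msearch, as suggested in the original request. The filter is a lexicographic comparison on the pair (timestamp, uid), which is why the sort must use the same two keys in the same order.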

@dtr2

dtr2 commented Jan 17, 2016

Is it possible to have a Kibana search string that shows 10 lines of context around the line with _id="X"?
Looking at the IDs, they don't seem numeric to me.

@simianhacker
Member

@clintongormley This solution is not very realistic for a real-world (distributed) system that is not single-threaded.

@clintongormley
Contributor

@simianhacker I don't follow. Why not? Also, the new search_after feature (#16125) makes this even easier. I think with search_after implemented, there is nothing else to do here.

@rashidkpc
Author

@clintongormley I can't think of a way to ensure unique, increasing UIDs across distributed writers, e.g. Logstash. Is there an Elasticsearch option to ensure that?

@rashidkpc
Author

Or does that not matter with search_after?

@jimczi
Contributor

jimczi commented Feb 29, 2016

@rashidkpc it doesn't matter with search_after as long as the UIDs are unique.

@clintongormley
Contributor

Doesn't matter. The main sort is on (e.g.) timestamp; the UID is used purely as a tie breaker for documents that have the same timestamp.
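For reference, a sketch of what a search_after request body looks like with this sort. The helper name and parameters are hypothetical, the field names come from the earlier example, and the anchor values are assumed to be copied from the sort array returned with the context hit (shown here as epoch milliseconds for 2016-01-01T00:00:00Z plus the UID).

```python
def search_after_body(last_sort_values, order="asc", size=10,
                      ts_field="timestamp", tiebreak_field="_uid"):
    """Build a body that resumes right after the hit with these sort values."""
    return {
        "size": size,
        # Primary sort on time; the unique field only breaks ties.
        "sort": [{ts_field: order}, {tiebreak_field: order}],
        # One value per sort key, in the same order as "sort".
        "search_after": list(last_sort_values),
    }


# Assumed sort values of the context document (doc 3 in the example above).
anchor_sort = [1451606400000, "t#3"]

newer = search_after_body(anchor_sort, order="asc")    # 10 hits after doc 3
older = search_after_body(anchor_sort, order="desc")   # 10 hits before doc 3
```

The UIDs only need to be unique, not increasing: any total order on the tie breaker gives a stable position for the context document.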
