Paginate from _id #15799

rashidkpc · 2016-01-06T18:51:25Z

Currently the from key on a request takes an integer and paging picks up after chopping off that number of results from the top of the queued result set.

This works well for static sets. However, under high write load, when we're sorting by time, it becomes a problem. Results that seem like they should be there can go missing, and we have a hard time expressing where we want the results to start.
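To make the drift concrete, here is a minimal in-memory sketch of offset paging while new events arrive. It is not Elasticsearch code: `page` and the event names are hypothetical stand-ins for a search sorted newest-first with an integer `from`.

```python
def page(results, offset, size):
    """Offset paging: drop `offset` hits from the top, return the next `size`."""
    return results[offset:offset + size]


# Newest-first result set, as with a descending sort on a timestamp.
events = ["e5", "e4", "e3", "e2", "e1"]
first_page = page(events, 0, 2)

# Two new events are indexed before the next request is sent.
events = ["e7", "e6"] + events
second_page = page(events, 2, 2)

print(first_page)   # ['e5', 'e4']
print(second_page)  # ['e5', 'e4']
```

The second request returns the same hits as the first because the offset is counted from the new top of the list, and the reader paging forward never sees e7 or e6. An integer offset has no way to say "start at this particular event".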

Take for example, a logging case in which I want to see the 10 records before and the 10 records after some event.

I could read the time from the event and sort in both directions. However, I'd have to hope that 10 things didn't happen at the same time, which is common in, say, error scenarios where a number of errors (e.g., one per shard in a distributed system) all happen at exactly the same time. If that were the case, there's a very good chance that my record wouldn't actually appear in the results.

Of course we could say the time resolution wasn't high enough, or that events would be of arbitrary order anyway, but the goal here really is to make sure that our "context" event is included and the results around it are the same as they would be in any previous request.

Ideally I could request the following, ensuring that I get event24587302 as the first result in the chronologically sorted list:

{
 sort: { "@timestamp" : "desc" }, // And fire another with "asc" at the same time, in an _msearch
 from: "event24587302"
}

Stems from issues in elastic/kibana#275

jimczi · 2016-01-06T20:51:23Z

That seems hard to achieve considering the way the query is executed. In a multi-shard environment you would need at least one more round trip to retrieve the timestamp value associated with the from _id. Additionally, this does not solve the case where the timestamps are equal. If your problem is just to resolve the case where the timestamps are equal, then you can use a script sort, something like:

"sort": [
    {
      "timestamp": {
        "order": "desc"
      }
    },
    {
      "_script": {
        "type": "number",
        "script": {
          "inline": "if (_fields['_id'].value == fromId) { return 0 } else { return 1}",
          "params": {
            "fromId": "$eventId"
          }
        },
        "order": "asc"
      }
    }
  ]

Unfortunately this would be very slow because the _id field is stored but has no doc values. You could activate doc values for the field, but the memory usage would make the feature very costly.

jpountz · 2016-01-06T22:11:11Z

related to #8192

clintongormley · 2016-01-10T11:50:46Z

As long as your UIDs increase, you could do the following to get the 10 results before and after a particular document:

POST t/t/_bulk
{"index": {"_id": "1"}}
{"timestamp":"2016-01-01T00:00:00Z"}
{"index": {"_id": "2"}}
{"timestamp":"2016-01-01T00:00:00Z"}
{"index": {"_id": "3"}}
{"timestamp":"2016-01-01T00:00:00Z"}
{"index": {"_id": "4"}}
{"timestamp":"2016-01-01T00:00:00Z"}
{"index": {"_id": "5"}}
{"timestamp":"2016-01-01T11:11:11Z"}

Get 10 docs after doc 3:

GET _search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {
              "range": {
                "timestamp": {
                  "gt": "2016-01-01T00:00:00Z"
                }
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "term": {
                      "timestamp": "2016-01-01T00:00:00Z"
                    }
                  },
                  {
                    "range": {
                      "_uid": {
                        "gt": "t#3"
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    {
      "timestamp": "asc"
    },
    {
      "_uid": "asc"
    }
  ]
}

Get 10 docs before doc 3:

GET _search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {
              "range": {
                "timestamp": {
                  "lt": "2016-01-01T00:00:00Z"
                }
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "term": {
                      "timestamp": "2016-01-01T00:00:00Z"
                    }
                  },
                  {
                    "range": {
                      "_uid": {
                        "lt": "t#3"
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    {
      "timestamp": "desc"
    },
    {
      "_uid": "desc"
    }
  ]
}
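The two filters above amount to comparing (timestamp, _uid) pairs against the anchor document. A small in-memory sketch of that logic, using the five documents from the bulk request, may make it easier to check. The helper names `docs_after` and `docs_before` are hypothetical, and the comparison relies on the timestamps sharing one ISO 8601 format so that string order matches time order.

```python
# (timestamp, _uid) pairs for the five documents indexed above.
DOCS = [
    ("2016-01-01T00:00:00Z", "t#1"),
    ("2016-01-01T00:00:00Z", "t#2"),
    ("2016-01-01T00:00:00Z", "t#3"),
    ("2016-01-01T00:00:00Z", "t#4"),
    ("2016-01-01T11:11:11Z", "t#5"),
]


def docs_after(docs, ts, uid, size=10):
    """Later timestamp, OR same timestamp and greater _uid; sorted ascending."""
    hits = [d for d in docs if d[0] > ts or (d[0] == ts and d[1] > uid)]
    return sorted(hits)[:size]


def docs_before(docs, ts, uid, size=10):
    """Earlier timestamp, OR same timestamp and smaller _uid; sorted descending."""
    hits = [d for d in docs if d[0] < ts or (d[0] == ts and d[1] < uid)]
    return sorted(hits, reverse=True)[:size]


anchor_ts, anchor_uid = "2016-01-01T00:00:00Z", "t#3"
print(docs_after(DOCS, anchor_ts, anchor_uid))   # t#4, then t#5
print(docs_before(DOCS, anchor_ts, anchor_uid))  # t#2, then t#1
```

Note that the _uid values are compared as strings here, so "t#10" would sort before "t#3". The tie-breaker only has to be unique and consistently ordered; it does not have to reflect arrival order.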

dtr2 · 2016-01-17T16:37:01Z

Is it possible to have a Kibana search string that shows 10 lines of context around the line with _id="X"?
Looking at the IDs, they don't seem numeric to me.

simianhacker · 2016-02-02T23:08:44Z

@clintongormley This solution is not very realistic for a real world (distributed) system that is not single threaded

clintongormley · 2016-02-13T22:19:14Z

@simianhacker I don't follow. Why? Also, the new search_after feature (#16125) makes this even easier. I think with search_after implemented, there is nothing else to do here.

rashidkpc · 2016-02-29T18:58:56Z

@clintongormley I can't think of a way to ensure unique, increasing UIDs across distributed writers, e.g., logstash. Is there an elasticsearch option to ensure that?

rashidkpc · 2016-02-29T19:03:28Z

Or does that not matter with search_after?

jimczi · 2016-02-29T19:11:27Z

@rashidkpc it doesn't matter with search_after as long as the UIDs are unique.

clintongormley · 2016-02-29T19:11:46Z

Doesn't matter. The main sort is on (e.g.) timestamp; the UID is used purely as a tie breaker for documents that have the same timestamp.
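As a rough sketch of how the context lookup might be expressed with search_after, the helper below builds the two request bodies (one per direction) from the sort values of the anchor hit. The function name `context_requests` is hypothetical, the field names follow the examples earlier in the thread, and the epoch-millisecond timestamp is an assumption about how the date sort value is returned for that hit.

```python
import json


def context_requests(anchor_sort_values, size=10):
    """Build 'after' and 'before' search bodies around one anchor hit.

    `anchor_sort_values` is the `sort` array returned with the anchor hit
    for a [timestamp, _uid] sort, e.g. [1451606400000, "t#3"].
    """
    def one(order):
        return {
            "size": size,
            # Primary sort on timestamp, _uid only as a unique tie breaker.
            "sort": [{"timestamp": order}, {"_uid": order}],
            "search_after": list(anchor_sort_values),
        }

    return {"after": one("asc"), "before": one("desc")}


requests = context_requests([1451606400000, "t#3"])
print(json.dumps(requests["after"], indent=2))
```

Both bodies could be sent together in one _msearch, as suggested in the original request. The anchor document itself is not part of either result, so the caller would place it between the two lists.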

tbragin mentioned this issue Jan 6, 2016

Extract log event context elastic/kibana#275

Closed

clintongormley added the discuss and :Search/Search labels Jan 10, 2016

clintongormley closed this as completed Feb 13, 2016
