
Search API response time breakdown #21073


Open
PhaedrusTheGreek opened this issue Oct 21, 2016 · 12 comments
Labels
>enhancement, high hanging fruit, :Search Foundations/Search (Catch all for Search Foundations), Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch)

Comments

@PhaedrusTheGreek
Contributor

There is currently no easy way to troubleshoot how much time each phase of an API request takes.

It is possible for the reported took time to be a few milliseconds while the actual API response takes 60 seconds. This can happen when there are network problems between nodes, or when integrated components such as AD or LDAP are slow to respond or are timing out. In some cases there is no logging or other way to identify the slow phases.

It would be awesome if we could do:

GET /index/_search?vtook

{
   "vtook": {
       "total_api": 60000,
       "security/ldap_realm": 59980,
       "total_query": 20,
       "fetch": 10,
       "collate": 10,
       "threadpool_queue": 0
   },
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1000000,
      "max_score": 1,
      "hits": [
        ...
      ]
   }
}
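For comparison, the existing search Profile API already exposes detailed timing inside the query phase; enabling it looks roughly like this (a sketch; the index and field names are placeholders):

```
GET /index/_search
{
  "profile": true,
  "query": {
    "match": { "message": "timeout" }
  }
}
```

The response then carries a "profile" section with per-shard query and collector timings, but nothing about security, queuing, or network time, which is exactly the gap this issue describes.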
@jpountz
Contributor

jpountz commented Oct 24, 2016

If we do this, I think this would belong to the profile API.

@clintongormley clintongormley added discuss :Search/Search Search-related issues that do not fall into other categories labels Nov 5, 2016
@evanvolgas

The Profile API makes sense but it would also be pretty useful if you could log this and do reporting on it, not just profile one-off cases. For example, maybe there could be a sample threshold in which every Nth query gets profiled with vtook and logged. That sample threshold parameter would be quite handy in general. To accomplish that now, I have to use tools like goreplay and rewrite the requests that I want to sample and profile.
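There is no sampling parameter today; the closest built-in mechanism is the search slow log, which records queries whose phases exceed configured thresholds (a sketch, threshold values are illustrative):

```
PUT /index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}
```

That captures outliers rather than a random sample, and it only covers the query and fetch phases, not the security or network stages discussed above.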

@talevy talevy added :Search/Search Search-related issues that do not fall into other categories and removed :Search/Search Search-related issues that do not fall into other categories labels Mar 26, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@polyfractal
Contributor

polyfractal commented Jul 19, 2018

Chatted about this in Fixit Friday Thursday. Adding some of the extra search phases (fetch, aggregation reduce, etc) wouldn't be too bad, just needs some work to implement.

Adding non-search phases like security or general network stages would probably need a dedicated framework to hook into phases of a request's lifecycle. The current search profiler is pretty hardwired to just work with Search... extending it would likely be more hassle than just sitting down to work on an overarching framework.

@PhaedrusTheGreek Was the interest more in the non-search phases (security, queuing, etc.), in adding the additional search phases (fetch, agg reduce, etc.), or all of the above?

@PhaedrusTheGreek
Contributor Author

@polyfractal this matter has come up again today, in regards to wanting to understand how much time a query spends in the queue. Previously, the matter was raised by a ~1m LDAP authentication delay.

@PhaedrusTheGreek
Contributor Author

PhaedrusTheGreek commented Dec 6, 2018

@polyfractal - adding a network stage metric could help resolve this matter; closed in favour of #36127

{
   "vtook": {
       "total" : 1024,
       "network" : 1020,
       "security" : 0,
       "query": 4,
       "fetch" : 2,
       "collate": 2,
       "threadpool_queue" : 0
    }
}

I wonder if something like this could be added to indices stats also, if so it would tie into monitoring well
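On that note, the indices stats API does already report cumulative search timings per index, which monitoring could aggregate over time, although it offers no breakdown beyond the query and fetch phases (illustrative request, with an abridged response excerpt):

```
GET /index/_stats/search

{
  ...
  "query_time_in_millis": 1234,
  "fetch_time_in_millis": 56,
  ...
}
```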

@polyfractal
Contributor

We're still unfortunately at where #21073 (comment) left off... the profiler framework is purely in the Search Phase and would need some fairly substantial overhaul to take into account other things like network. :(

I still think this is an enhancement that the Profiler should get, it'd be very useful (and also related to #23114 which could provide some of this data too)

That said, the new Adaptive Replica Selection sorta provides these details, albeit in a more roundabout manner.

E.g. ARS tracks the overall latencies between nodes so that it can choose better replicas for search.

"adaptive_selection" : {
        "<nodeID>" : {
          "outgoing_searches" : 0,
          "avg_queue_size" : 0,
          "avg_service_time" : "342.5ms",
          "avg_service_time_ns" : 342511123,
          "avg_response_time" : "1.2ms",
          "avg_response_time_ns" : 1256317,
          "rank" : "1.3"
        }
}

This essentially represents the node-to-node service and response time from a 10,000 ft view. It doesn't help with individual queries, but it might provide some insight into the effect of network (and load) on the cluster.
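For reference, the adaptive_selection snippet above comes from the node stats API; something like the following should return just that section (endpoint shape from memory, so treat it as a sketch):

```
GET /_nodes/stats/adaptive_selection
```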

@PhaedrusTheGreek
Contributor Author

PhaedrusTheGreek commented May 31, 2019

Another option to track the problem of unresponsive authentication services could be to track timeouts/errors in some counter instead of having to rely on finding errors in the logs.

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@kaliljoao

Hi guys, can I work on this issue?

@PhaedrusTheGreek
Contributor Author

Surfacing this information could potentially be solved with APM via elastic/apm-agent-java#1561

@javanna
Member

javanna commented Mar 7, 2022

I believe that what will be implemented as part of #84369 will help solve this issue. We'll have to evaluate whether more search-specific info needs to be exposed as a follow-up.

@javanna javanna changed the title API response time breakdown Search API response time breakdown Jun 17, 2022
@javanna javanna added :Search Foundations/Search Catch all for Search Foundations and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine elasticsearchmachine removed the Team:Search Meta label for search team label Jul 17, 2024
@javanna javanna removed the help wanted adoptme label Apr 10, 2025