Surface detailed/internal timings from Elasticsearch #1561
We intentionally don't measure lots of stuff. And a lot of what we do measure is exposed over the
@nik9000 I'd like to have per-request timings, not necessarily per cluster/node/index/shard, if that makes sense. I'll give an example for search to start the discussion, but please keep in mind that I'm not very familiar with ES internals, so it will not be the correct representation; hopefully it makes clearer what I'm looking for.

In the context of this discussion, a transaction is an event that describes the time it took Elasticsearch to respond to a search request. A span is an (async) operation that happens in the context of this request and is a child of the transaction. Both events have a start time (

Suppose service-a is calling

I think this would be useful in detecting network issues, thread pool exhaustion, and optimisation opportunities (e.g. a search request that hits many shards because no range query is included). The fact that these timings would be visible in the context of a trace can be very helpful, because it lets people correlate slowness for a specific request with Elasticsearch issues. While cluster/node stats are useful, they report averages, which might not help in analysing performance issues for a specific request. I also think this would help people working with Elasticsearch understand its architecture better.
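To make the transaction/span terminology above concrete, here is a minimal sketch of that trace model in Python. The class names, phase names, and durations are all hypothetical illustrations of the idea, not actual Elasticsearch output or an existing APM API.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    # A child operation inside a search request; phase names are illustrative.
    name: str            # e.g. "can_match", "query", "fetch"
    start_us: int        # start offset relative to the transaction, microseconds
    duration_us: int

@dataclass
class Transaction:
    # The whole search request, as seen by the coordinating node.
    name: str
    duration_us: int
    spans: list = field(default_factory=list)

    def unaccounted_us(self) -> int:
        # Time not covered by any span: queueing, network hops, coordination.
        # (Assumes non-overlapping spans; a real tracer would handle overlap.)
        return self.duration_us - sum(s.duration_us for s in self.spans)

tx = Transaction("POST /my-index/_search", duration_us=12_000)
tx.spans.append(Span("can_match", 100, 300))
tx.spans.append(Span("query", 500, 7_000))
tx.spans.append(Span("fetch", 8_000, 2_500))
print(tx.unaccounted_us())  # time spent outside the instrumented phases
```

The `unaccounted_us` gap is exactly the kind of signal the comment describes: it would surface queueing or network slowness that per-phase server timings alone cannot explain.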
Today, the tasks API and the slow query log are the two best ways to hook into this information, but tasks are transient and the slow query log is a narrow lens.
We've actually resorted to polling the tasks API on a schedule to help spot bulk requests that are taking too long.
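That polling workaround might look something like the sketch below: filter the tasks-API response for long-running bulk tasks. The response dict here is hand-written to mimic the shape of `GET _tasks?actions=indices:data/write/bulk*` (nodes → tasks → `action` / `running_time_in_nanos`); it is not real cluster output, and the threshold is arbitrary.

```python
THRESHOLD_NANOS = 5_000_000_000  # flag bulks running longer than 5 seconds

# Hand-written sample mimicking a tasks API response body.
sample_tasks_response = {
    "nodes": {
        "node-1": {
            "tasks": {
                "node-1:42": {
                    "action": "indices:data/write/bulk",
                    "running_time_in_nanos": 7_500_000_000,
                },
                "node-1:43": {
                    "action": "indices:data/write/bulk",
                    "running_time_in_nanos": 120_000_000,
                },
            }
        }
    }
}

def slow_bulk_tasks(tasks_response, threshold_nanos=THRESHOLD_NANOS):
    # Return ids of bulk tasks whose running time exceeds the threshold.
    slow = []
    for node in tasks_response["nodes"].values():
        for task_id, task in node["tasks"].items():
            if (task["action"].startswith("indices:data/write/bulk")
                    and task["running_time_in_nanos"] > threshold_nanos):
                slow.append(task_id)
    return slow

print(slow_bulk_tasks(sample_tasks_response))  # ['node-1:42']
```

The limitation the comment points at is visible here: because tasks are transient, a task that starts and finishes between two polls is never observed at all.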
It would be valuable to have more detailed and/or internal timings about Elasticsearch search requests, rather than just timings from the perspective of the client.
The profile API returns how much time is spent doing what on each shard, but it has a lot of overhead and doesn't give a great timeline of a search.
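As a rough illustration of what the profile API does give you, the sketch below sums top-level query time per shard from a `?profile=true` response. The nesting follows the documented output shape (`profile.shards[].searches[].query[].time_in_nanos`), but the shard ids and timings are made up, and real responses carry far more detail (children, breakdowns, collectors) that is ignored here.

```python
# Hand-written fragment mimicking a profiled search response.
sample_profile = {
    "profile": {
        "shards": [
            {"id": "[node][idx][0]",
             "searches": [{"query": [{"type": "TermQuery",
                                      "time_in_nanos": 3_200_000}]}]},
            {"id": "[node][idx][1]",
             "searches": [{"query": [{"type": "TermQuery",
                                      "time_in_nanos": 11_700_000}]}]},
        ]
    }
}

def shard_query_times(resp):
    # Map shard id -> total top-level query time in nanoseconds.
    times = {}
    for shard in resp["profile"]["shards"]:
        times[shard["id"]] = sum(
            q["time_in_nanos"]
            for search in shard["searches"]
            for q in search["query"]
        )
    return times

print(max(shard_query_times(sample_profile).values()))  # slowest shard's time
```

Note what this cannot show, which is the gap described above: per-shard execution time says nothing about when each shard started relative to the request, so it gives per-phase cost but no timeline.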
We can collect use cases and list them here and see what is feasible and what is needed from other teams to make it happen.
Examples:
canMatch