Add a way to downsample metrics #66247
Pinging @elastic/es-analytics-geo (Team:Analytics)
I think we should rename this feature/functionality/ticket, as it is causing confusion in relation to #64777. I've had three conversations in the last week where different folks were confusing the two issues :) The requested functionality isn't really downsampling or subsampling, since traditionally sampling involves selecting a subset of the original data points to use as a proxy for the overall population. The feature requested here is really about choosing an appropriate aggregate function to represent a bucket of time, rather than choosing a single point to represent the bucket. Perhaps something like "time series aware aggregate function" or similar?
The term "downsampling" is quite extended in time series databases, introducing a new term here for this could be confusing in the observability context (though perhaps this could be solved by documentation).
I think this feature actually fits quite well with the definition of downsampling in this link 🙂: "When the process is performed on a sequence of samples of a signal or other continuous function, it produces an approximation of the sequence that would have been obtained by sampling the signal at a lower rate." This is exactly what we are looking for: an approximation of the sequence as if its samples had been collected at a lower rate.
Usually "downsampling" needs of an aggregate function, so depending on how this is implemented perhaps two terms are needed, one for the feature itself, and another one for the process that produces each value of the new sequence, the function. |
It seems to me that the key ask here, from an aggregations perspective, is the ability to condense a bucket to a single value via a metric aggregation (aggregate function, or downsampling function if you like), and then run another metric aggregation over those values, bucketed by another level. In the example above, for each time-host bucket, you want the sum of the averages over the container name. This idea of being able to condense buckets and then run further aggregations on them strikes me as something that might be useful outside of the downsampling use case too. We've talked about a few related ideas (sub-queries, windowing functions) internally on the aggs team, and I think the generic metric-of-metrics idea is worth considering. I just want to validate that metric-of-metrics, as described above, would meet your needs here, or whether there's some other piece of the puzzle that I'm not seeing.
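A rough sketch of how this condense-then-aggregate pattern can be approximated today with a sibling pipeline aggregation; the field names are illustrative, and for brevity the per-series key is reduced to `container.name` rather than the full dimension set:

```json
{
  "size": 0,
  "aggs": {
    "per_time": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "20s" },
      "aggs": {
        "per_host": {
          "terms": { "field": "host.name" },
          "aggs": {
            "per_series": {
              "terms": { "field": "container.name" },
              "aggs": {
                "avg_memory": { "avg": { "field": "container.memory.usage" } }
              }
            },
            "sum_of_avgs": {
              "sum_bucket": { "buckets_path": "per_series>avg_memory" }
            }
          }
        }
      }
    }
  }
}
```

A real query would need the inner `terms` (or a composite key) over all dimensions that identify a series, which is where the performance concern in the next comment comes in.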
This is right: in order to obtain correct results we need to make sure we are not aggregating the same time series twice, hence we need to condense each series first (by all its dimensions). A way to do metric-of-metrics could work here; my only concern is around performance, as this would be widely used across all metric queries. I wonder if there are any optimizations we can do for this specific use case. @imotov mentioned this issue here: #65623 (comment) and how it relates to #60619 (to some extent).
related to #74660 |
@not-napoleon / @imotov / @ruflin are we comfortable closing this in favor of the wider TSDB effort? |
It is a part of the TSDB effort. We don't have a separate issue for that, so I am ok with keeping this one as a placeholder. |
We need to be careful not to get tripped up by terminology here. So far, TSDB has used …
Good point, @not-napoleon — this ticket is particularly focused on something very similar to your summary here: #66247 (comment) |
This aggregation (#85798) can solve the problem, and it can easily support PromQL.
Closing as not planned. |
We store metrics coming from Beats or APM using the following convention: Metrics are stored as numeric fields in documents, together with a set of other fields representing the dimensions for these metrics (these are normally of type keyword). For example:
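An illustrative document following this convention (the field names and values below are invented for the example; the issue's original sample was not preserved in this copy):

```json
{
  "@timestamp": "2020-12-14T10:00:00Z",
  "host.name": "host-1",
  "container.name": "container-a",
  "container.memory.usage": 10,
  "container.cpu.usage": 0.5
}
```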
We normally put together all metrics with the same dimensions in the same document, for storage & query efficiency.
Each combination of dimension key-value pairs creates a unique time series. This has some implications for the way we need to aggregate them. For example, take the following data points:
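(The values below are invented for illustration; the original table from the issue was not preserved in this copy.)

| time | host.name | container.name | container.memory.usage |
|-------|-----------|----------------|------------------------|
| t | host-1 | container-a | 10 |
| t | host-1 | container-b | 20 |
| t | host-2 | container-a | 30 |
| t | host-3 | container-c | 40 |
| t+10s | host-1 | container-a | 10 |
| t+10s | host-1 | container-b | 20 |
| t+10s | host-2 | container-a | 30 |
| t+10s | host-3 | container-c | 40 |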
We have 4 time series, across 3 hosts and 3 different container names. If we want to graph the "total container memory usage per host", we would do a sum aggregation, grouping (terms) by host.
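A sketch of that query, using the illustrative field names above (the exact field names are assumptions, not from the original issue):

```json
{
  "size": 0,
  "aggs": {
    "per_time": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "10s" },
      "aggs": {
        "per_host": {
          "terms": { "field": "host.name" },
          "aggs": {
            "total_memory": { "sum": { "field": "container.memory.usage" } }
          }
        }
      }
    }
  }
}
```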
This provides good results, as long as the date_histogram bucket size corresponds to the reporting period (10s). At time t:
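With the illustrative values, each 10s bucket contains exactly one point per time series, so the per-host sums come out as expected:

| host.name | sum(container.memory.usage) |
|-----------|-----------------------------|
| host-1 | 30 |
| host-2 | 30 |
| host-3 | 40 |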
Now, when the date_histogram bucket size is different from the reporting period, the query will provide "wrong" results, as it will be aggregating multiple points from the same time series. At time t with a 20s bucket:
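Again with the illustrative values, now with a 20s bucket:

| host.name | sum(container.memory.usage) |
|-----------|-----------------------------|
| host-1 | 60 |
| host-2 | 60 |
| host-3 | 80 |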
The reason is that we ended up with 2 points in the same bucket for each time series, so we are double counting them.
To get the expected results we need to downsample each time series first, so that we get a single data point per time series in each bucket, and then apply the aggregation. In this case we could use "avg" as the downsampling function:
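With the illustrative values, averaging each series inside the 20s bucket first gives one point per series:

| host.name | container.name | avg over 20s bucket |
|-----------|----------------|---------------------|
| host-1 | container-a | 10 |
| host-1 | container-b | 20 |
| host-2 | container-a | 30 |
| host-3 | container-c | 40 |

Summing these per host then yields 30, 30 and 40, matching the 10s results.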
The downsampling function may depend on the type of metric that is being queried, with `avg`, `last`/`max` or `sum` as possible options. It would be nice to have a way to automatically downsample time series based on a given set of dimensions and a downsampling function. It would also be interesting to discuss whether dimensions could be something known to Elasticsearch, so users/Kibana don't need to provide them at query time.