Add a metric aggs to support TSDB high performance computing #84930

weizijun · 2022-03-14T03:10:08Z

Description

A common case in TSDB is computing the total CPU cost of a cluster. The metric is collected per node and named node.cpu_percent. To get the metric line, the DSL is:

{
  "aggs": {
    "@timestamp": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "10m"
      },
      "aggs": {
        "tsid": {
          "time_series": {
          },
          "aggs": {
            "avg_value": {
              "avg": {
                "field": "node.cpu_percent"
              }
            }
          }
        },
        "sum_value": {
          "sum_bucket": {
            "buckets_path": "tsid>avg_value"
          }
        }
      }
    }
  }
}

It uses the sum_bucket pipeline aggregation to calculate the total cpu_percent across all nodes.
But this approach faces several problems:

We only need sum_value, but the response returns the results of every tsid bucket.
If there are too many tsid buckets, the search may fail because of the search.max_buckets limit.
If a size is set in the time_series aggregation, the result may be wrong.
The calculation on the coordinating node is slow and consumes a lot of memory.

This example is a common case in TSDB. We can describe the requirement as follows.
To calculate a metric from a bucket, suppose the data is:

time_series line\timestamp	t1	t2	t3
a	a1	a2	a3
b	b1	b2	b3
c	c1	c2	c3

If we want to get a metric from the bucket with a regular metric aggregation, all 9 numbers are calculated together in a single way, e.g.:

sum = a1+a2+a3+b1+b2+b3+c1+c2+c3
avg = (a1+a2+a3+b1+b2+b3+c1+c2+c3) / 9

But in the time series case, the requirement is always to first calculate the metric of each time series line, and then calculate the metric across all time series lines.
E.g. in the node.cpu_percent case above, we first calculate the avg value of each time series line:

a' = (a1 + a2 + a3) / 3
b' = (b1 + b2 + b3) / 3
c' = (c1 + c2 + c3) / 3

Then we get the sum of all time series lines: sum = a' + b' + c'.
The requirement can be implemented with pipeline aggregations, but that has the problems listed above.
To implement the requirement, we can add a new metric aggregation:

  "time_series_metric": {
    "field": "@m_aliyunes.ecs.node_stats_process_cpu_percent_raw",
    "downsample": "avg/max/min/sum/count/last/first",
    "aggregator": ["sum", "max"]
  }

time_series_metric contains three fields:

field: the metric field name.
downsample: how to downsample one time series line. Supported: avg/max/min/sum/value_count/last/first.
aggregator: how to aggregate the downsampled values of all time series lines. It accepts multiple values. Supported: avg/max/min/sum.

The result is:

{
  "sum" : 99.0,
  "max" : 20.0
}
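
To make the intended semantics concrete, here is a minimal Python sketch of the two steps described above: downsample each time series line to one value, then aggregate those values across lines. It is only an illustration, not an Elasticsearch implementation. The function name `time_series_metric`, the lookup tables, and the sample values are assumptions made up for this example.

```python
from statistics import mean

# Hypothetical lookup tables; the option names mirror those listed in the proposal.
DOWNSAMPLERS = {
    "avg": mean,
    "max": max,
    "min": min,
    "sum": sum,
    "value_count": len,
    "last": lambda values: values[-1],
    "first": lambda values: values[0],
}
AGGREGATORS = {"avg": mean, "max": max, "min": min, "sum": sum}


def time_series_metric(series, downsample, aggregators):
    """Downsample each time series to one value, then aggregate across series.

    `series` maps a time series id to its values in timestamp order.
    """
    per_series = [DOWNSAMPLERS[downsample](values) for values in series.values()]
    return {name: AGGREGATORS[name](per_series) for name in aggregators}


# Made-up sample data: three series (a, b, c) with three points each.
series = {
    "a": [10.0, 20.0, 30.0],
    "b": [40.0, 50.0, 60.0],
    "c": [5.0, 5.0, 5.0],
}
result = time_series_metric(series, downsample="avg", aggregators=["sum", "max"])
print(result)
```

With this sample data the per-series averages are 20.0, 50.0 and 5.0, so the sum over series is 75.0 and the max is 50.0.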

Since the data is sorted by _tsid, we can calculate the aggregator results on the data node, folding in the downsampled value of each _tsid one by one.
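
The streaming idea can be sketched in Python as follows. This is a rough illustration under stated assumptions, not how Elasticsearch is implemented: rows arrive as `(tsid, value)` pairs already sorted by tsid, each time series lives on exactly one shard, the downsample is fixed to avg, and only the sum and max aggregators are tracked. The names `shard_partial` and `merge_partials` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def shard_partial(rows):
    """Stream (tsid, value) rows sorted by tsid and keep only running totals.

    No per-tsid bucket is retained: once a series is downsampled, its value
    is folded into the running sum and max and then discarded.
    """
    partial = {"sum": 0.0, "max": float("-inf"), "series": 0}
    for _tsid, group in groupby(rows, key=itemgetter(0)):
        total, count = 0.0, 0
        for _, value in group:
            total += value
            count += 1
        downsampled = total / count  # avg of one time series line
        partial["sum"] += downsampled
        partial["max"] = max(partial["max"], downsampled)
        partial["series"] += 1
    return partial


def merge_partials(partials):
    """Combine per-shard partials, as a coordinating node would (assumed)."""
    return {
        "sum": sum(p["sum"] for p in partials),
        "max": max(p["max"] for p in partials),
    }


# Made-up sample data split across two hypothetical shards.
shard_1 = [("a", 10.0), ("a", 20.0), ("a", 30.0),
           ("b", 40.0), ("b", 50.0), ("b", 60.0)]
shard_2 = [("c", 5.0), ("c", 5.0), ("c", 5.0)]
merged = merge_partials([shard_partial(shard_1), shard_partial(shard_2)])
print(merged)
```

In this sketch each shard returns a constant-size partial regardless of how many time series it holds, which is the property the proposal relies on to avoid the bucket limit and the coordinating-node memory cost.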

elasticmachine · 2022-03-14T16:25:32Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

imotov · 2022-03-14T18:31:03Z

@weizijun thank you for opening the issue. We understand the issue, but we are planning to address it by adding support for aggregation results filtering (allowing users to specify that they are not interested in individual time series results, but only care about the summary result) and, as a follow-up, optimizing pipeline aggregations to take advantage of this knowledge. We are not planning to introduce a new aggregation since, as you mentioned, the same result can be achieved using pipeline aggregations. So instead we would like to figure out how to make the pipeline aggregation more efficient in this particular use case.

If you don't mind, I am going to close this as a duplicate of #66247 since, if I understood you correctly, you are describing the same issue here.

imotov · 2022-03-17T01:27:12Z

Closing as duplicate of #66247

weizijun added the >enhancement and needs:triage labels Mar 14, 2022

csoulios added the :Analytics/Aggregations and :StorageEngine/TSDB labels Mar 14, 2022

elasticmachine added the Team:Analytics label Mar 14, 2022

csoulios removed the needs:triage label Mar 14, 2022

imotov closed this as completed Mar 17, 2022
