
Add a metric aggs to support TSDB high performance computing #84930

Closed
weizijun opened this issue Mar 14, 2022 · 4 comments
Labels: :Analytics/Aggregations, >enhancement, :StorageEngine/TSDB, Team:Analytics

Comments

Contributor

weizijun commented Mar 14, 2022

Description

A common case in TSDB is computing the total CPU usage of a cluster. The metric is collected per node and is named node.cpu_percent. To get the metric line, the DSL is:

{
  "aggs": {
    "@timestamp": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "10m"
      },
      "aggs": {
        "tsid": {
          "time_series": {
          },
          "aggs": {
            "avg_value": {
              "avg": {
                "field": "node.cpu_percent"
              }
            }
          }
        },
        "sum_value": {
          "sum_bucket": {
            "buckets_path": "tsid>avg_value"
          }
        }
      }
    }
  }
}

This uses the sum_bucket pipeline aggregation to calculate the total cpu_percent across all nodes.
But it has several problems:

  • We only need sum_value, but the response returns the results of every tsid bucket.
  • If there are too many tsid buckets, the request may fail because of the search.max_buckets limit.
  • If a size is set in the time_series aggregation, the result may be wrong.
  • The calculation on the coordinating node is slow and uses a lot of memory.
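
The size problem can be illustrated with a small sketch. It assumes that a pipeline sum only sees the series buckets that survive truncation; the node names and values are made up for illustration, and `sum_bucket` here is a plain Python stand-in, not the Elasticsearch implementation.

```python
# Hypothetical per-series averages for one 10m interval (made-up values).
avg_by_tsid = {"node-a": 10.0, "node-b": 20.0, "node-c": 30.0, "node-d": 40.0}


def sum_bucket(avg_by_tsid, size=None):
    """Sum per-series averages, keeping only the first `size` series buckets."""
    values = list(avg_by_tsid.values())
    if size is not None:
        # Assumption: buckets beyond `size` are dropped before the pipeline runs.
        values = values[:size]
    return sum(values)


full_sum = sum_bucket(avg_by_tsid)               # all four series contribute
truncated_sum = sum_bucket(avg_by_tsid, size=2)  # only two series contribute

print(full_sum, truncated_sum)
```

With all four series the total is 100.0, but with size=2 it drops to 30.0, so the cluster total is silently understated.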

This example is a common case in TSDB. The requirement can be described as follows.
Suppose we want to calculate a metric from a bucket that contains the following data:

time_series line \ timestamp   t1   t2   t3
a                              a1   a2   a3
b                              b1   b2   b3
c                              c1   c2   c3

With an ordinary metric aggregation, the only way to get a metric from the bucket is to compute it over all 9 values at once, e.g.:

  • sum = a1+a2+a3+b1+b2+b3+c1+c2+c3
  • avg = (a1+a2+a3+b1+b2+b3+c1+c2+c3) / 9

But in the time series case, the requirement is always to first calculate the metric of each time series line, and then calculate a metric across all time series lines.
E.g. in the node.cpu_percent case above, we first calculate the avg value of each time series line:

  • a' = (a1 + a2 + a3) / 3
  • b' = (b1 + b2 + b3) / 3
  • c' = (c1 + c2 + c3) / 3

And then we get the sum over all time series lines: sum = a' + b' + c'.
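
As a worked example of the difference, the sketch below computes both the flat metrics and the two-stage result for a 3x3 grid. The numeric values are invented for illustration; only the shape (three series a, b, c with three samples each) comes from the table above.

```python
from statistics import mean

# Made-up samples for the three time series lines a, b, c at t1, t2, t3.
series = {
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [7.0, 8.0, 9.0],
}

# Flat aggregation: treat all 9 values as one set.
flat_values = [v for values in series.values() for v in values]
flat_sum = sum(flat_values)
flat_avg = mean(flat_values)

# Two-stage aggregation: avg within each series, then sum across series.
per_series_avg = {name: mean(values) for name, values in series.items()}
two_stage_sum = sum(per_series_avg.values())

print(flat_sum, flat_avg, per_series_avg, two_stage_sum)
```

Here the flat sum is 45.0 and the flat avg is 5.0, while the two-stage sum of per-series averages is 2.0 + 5.0 + 8.0 = 15.0, which is the quantity the cluster CPU case needs.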
This requirement can be implemented with pipeline aggs, but that has the problems listed above.
To address it, we can add a new metric aggs operator:

  "time_series_metric": {
    "field": "@m_aliyunes.ecs.node_stats_process_cpu_percent_raw",
    "downsample": "avg/max/min/sum/count/last/first",
    "aggregator": ["sum", "max"]
  }

time_series_metric takes three fields:

  • field : the metric field name.
  • downsample : how to downsample each time series line; supports avg/max/min/sum/value_count/last/first.
  • aggregator : how to aggregate the downsampled values of all time series lines; accepts multiple values; supports avg/max/min/sum.

The result is:

{
  "sum" : 99.0,
  "max" : 20.0
}

Since the data is sorted by _tsid, we can calculate the aggregator results on the data node: downsample each _tsid in turn and fold its downsampled value into the running aggregator results one by one.
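
A minimal sketch of that one-pass idea follows. It is not the proposed Elasticsearch implementation: the function name `time_series_metric`, the (tsid, value) input shape, and the `DOWNSAMPLERS` table are assumptions made for illustration. It also assumes the input is already sorted by tsid and, for first/last, ordered by time within each series.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical downsample functions, keyed by the names used in the proposal.
DOWNSAMPLERS = {
    "avg": lambda vs: sum(vs) / len(vs),
    "max": max,
    "min": min,
    "sum": sum,
    "count": len,
    "first": lambda vs: vs[0],
    "last": lambda vs: vs[-1],
}


def time_series_metric(sorted_docs, downsample, aggregators):
    """One pass over (tsid, value) pairs that are already sorted by tsid."""
    reduce_series = DOWNSAMPLERS[downsample]
    total, count = 0.0, 0
    lowest, highest = float("inf"), float("-inf")
    # groupby only merges adjacent equal keys, which is why sorting matters.
    for _tsid, group in groupby(sorted_docs, key=itemgetter(0)):
        # Only one series' values are held in memory at a time.
        value = reduce_series([v for _, v in group])
        total += value
        count += 1
        lowest = min(lowest, value)
        highest = max(highest, value)
    if count == 0:
        return {name: None for name in aggregators}
    results = {"sum": total, "max": highest, "min": lowest, "avg": total / count}
    return {name: results[name] for name in aggregators}


docs = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0)]
result = time_series_metric(docs, downsample="avg", aggregators=["sum", "max"])
print(result)
```

For this input the per-series averages are 2.0 and 15.0, so the result is a sum of 17.0 and a max of 15.0. Only the running totals and the current series are kept, so memory does not grow with the number of series.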

weizijun added the >enhancement and needs:triage labels Mar 14, 2022
csoulios added the :Analytics/Aggregations and :StorageEngine/TSDB labels Mar 14, 2022
elasticmachine added the Team:Analytics label Mar 14, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

csoulios removed the needs:triage label Mar 14, 2022
Contributor

imotov commented Mar 14, 2022

@weizijun thank you for opening the issue. We understand the issue, but we are planning to address it by adding support for aggregation results filtering (allowing users to specify that they are not interested in individual time series results, but only care about the summary result) and optimizing pipeline aggregation to take advantage of this knowledge as a follow-up. We are not planning to introduce a new aggregation since, as you mentioned, the same result can be achieved using pipeline aggregations. So, instead we would like to figure out how to make the pipeline aggregation more efficient in this particular use case.

If you don't mind, I am going to close this as a duplicate of #66247 since, if I understood you correctly, you are describing the same issue here.

Contributor

imotov commented Mar 17, 2022

Closing as duplicate of #66247

imotov closed this as completed Mar 17, 2022

4 participants