TSDB: add a time_series_aggregation to support TSDB query and promQL #85798

weizijun · 2022-04-12T02:23:43Z

Background

Issue #66247 describes the TSDB query plan.
A TSDB query always looks like this:

downsample the metric of each time series line (_tsid)
aggregate the _tsids by group
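
As a rough illustration of these two steps, here is a minimal Python sketch. It is not the Elasticsearch implementation: the function name, the tuple used as a stand-in for _tsid, and the choice of avg for downsampling and sum for grouping are assumptions made for the example.

```python
from collections import defaultdict


def two_stage_aggregate(samples, group_by, interval_ms):
    """samples: iterable of (dimensions: dict, timestamp_ms: int, value: float)."""
    # Step 1: downsample each time series (avg per tsid per time bucket).
    per_series = defaultdict(lambda: [0.0, 0])  # (tsid, bucket) -> [sum, count]
    for dims, ts, value in samples:
        tsid = tuple(sorted(dims.items()))  # hypothetical stand-in for the real _tsid
        bucket = ts - ts % interval_ms      # round the timestamp down to the interval
        acc = per_series[(tsid, bucket)]
        acc[0] += value
        acc[1] += 1

    # Step 2: aggregate the downsampled values of all series in a group (sum).
    result = defaultdict(float)
    for (tsid, bucket), (total, count) in per_series.items():
        dims = dict(tsid)
        group = tuple(dims.get(d) for d in group_by)
        result[(group, bucket)] += total / count
    return dict(result)
```

Note that step 1 has to keep one entry per (tsid, bucket) pair before step 2 can start, so memory grows with the number of time series.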

The query can be expressed as:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "from": xxx,
              "to": xxx
            }
          }
        },
        {
          // other query conditions
        }
      ]
    }
  },
  "aggregations": {
    "group": {
      "multi_terms": {
        "terms": [
          {
            "field": "dim1"
          },
          {
            "field": "dim2"
          },
          ......
        ]
      },
      "aggregations": {
        "time_bucket": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "1m"
          },
          "aggregations": {
            "tsid": {
              "time_series": {},
              "aggs": {
                "avg_value": {
                  "avg": {
                    "field": "metric"
                  }
                }
              }
            },
            "sum_value": {
              "sum_bucket": {
                "buckets_path": "tsid>avg_value"
              }
            }
          }
        }
      }
    }
  }
}

But this query is very slow and uses a lot of memory. To speed up TSDB queries, I created a new low-level aggregation. It is not meant to be used directly by users; it is used by the new QL storage API.

Low-level time_series_aggregation

The aggregation looks like this:

"time_series_aggregation" : {
  "metric" : "xxx",
  "group" : [],
  "without" : [],
  "interval" : "10m",
  "offset" : "",
  "downsample" : {
    "range" : "10m",
    "function" : "sum"
  },
  "aggregator" : ""
}

time_series_aggregation takes the following parameters:

metric: the metric name.
group: the dimension fields to group by, like PromQL by.
without: the dimension fields to exclude from grouping, like PromQL without.
interval: the time range interval, like PromQL step.
offset: the time offset, like PromQL offset.
downsample: downsampling of the time series data.
- range: the time range, like a PromQL range vector.
- function: the aggregator function applied within one time series line, e.g. sum, rate.
aggregator: the aggregator function applied across all time series lines in one group.
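
To make the group and without semantics concrete, here is a small Python sketch of how a bucket key could be derived from a series' dimensions. The function name is hypothetical. The sketch also assumes that group takes precedence over without, and that every series falls into one bucket when neither is set; the proposal does not specify either behavior.

```python
def bucket_key(dimensions, group=None, without=None):
    """Derive a grouping key from a dict of dimension name -> value."""
    if group:
        # Like PromQL `by`: keep only the listed dimensions.
        kept = {k: v for k, v in dimensions.items() if k in group}
    elif without:
        # Like PromQL `without`: drop the listed dimensions, keep the rest.
        kept = {k: v for k, v in dimensions.items() if k not in without}
    else:
        # Assumption: with no grouping, all series share a single bucket.
        kept = {}
    return tuple(sorted(kept.items()))
```

For example, a series with dimensions host, dc and pod grouped by dc yields a key containing only dc, while without pod yields a key containing dc and host.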

The result looks like this:

{
  "took": 109,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 351372,
    "max_score": null,
    "hits": [
      
    ]
  },
  "aggregations": {
    "group": {
      "buckets": {
        "group1": {
          "key": {
            xxx
          },
          "doc_count": 2,
          "values": {
            "1649689260000": {
              "value": 1.0
            },
            "1649689200000": {
              "value": 3.0
            },
            "1649689140000": {
              "value": 2.0
            }
          }
        },
        "group2": {
          "key": {
            xxxx
          },
          "doc_count": 2,
          "values": {
            "1649689260000": {
              "value": 2.0
            },
            "1649689200000": {
              "value": 1.0
            },
            "1649689140000": {
              "value": 1.0
            }
          }
        }
      }
    }
  }
}
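
Because the values object is keyed by epoch-millisecond strings, a client will usually want to turn each group bucket into an ordered series. A minimal Python sketch follows; the helper name is hypothetical, and the response shape is assumed to match the example above.

```python
def values_to_series(bucket):
    """Convert one group bucket into a list of (timestamp_ms, value), oldest first."""
    return sorted((int(ts), point["value"]) for ts, point in bucket["values"].items())
```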

Execute Flow

In the collect phase:

Because documents are collected in tsid order, we can compute the downsampled metric results of each tsid sequentially.
When the current tsid changes, we can begin to calculate the previous tsid's metric results.
Decode the tsid and filter its dimensions by the group and without parameters to get the bucket key.
Apply the aggregator function to the downsampled metric results.

In the post collection phase:

Apply the aggregator collect function to the downsampled metric results of the last tsid.
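
The collect and post collection phases can be sketched as a single streaming pass in Python. This is an illustration under assumptions, not the code in the PR: the tsid is modeled as a tuple of (dimension, value) pairs, both the downsample function and the aggregator are sum, and all names are hypothetical.

```python
def collect_stream(docs, interval_ms, group):
    """docs: iterable of (tsid, timestamp_ms, value), sorted by tsid."""
    buckets = {}      # (group key, time bucket) -> running aggregator value
    downsampled = {}  # time bucket -> downsampled value of the current tsid only
    current = None

    def flush():
        # "Decode" the tsid, keep the group dimensions, feed the aggregator.
        key = tuple((k, v) for k, v in current if k in group)
        for t, v in downsampled.items():
            buckets[(key, t)] = buckets.get((key, t), 0.0) + v
        downsampled.clear()

    for tsid, ts, value in docs:
        if tsid != current:
            if current is not None:
                flush()  # the tsid changed: finish the previous series
            current = tsid
        t = ts - ts % interval_ms
        downsampled[t] = downsampled.get(t, 0.0) + value

    if current is not None:
        flush()  # post collection: the last tsid has no successor to trigger it
    return buckets
```

Only the downsampled state of one tsid is held at a time, and that is where the memory saving comes from.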

In the buildAggregations phase:

Generate the group bucket results.
Apply the aggregator build function to each bucket.

In the reduce phase:

Reduce the bucket results.
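
For a sum aggregator, the reduce step amounts to merging per-shard bucket maps. The Python sketch below assumes each shard returns a mapping of group key to {timestamp: value}; that shape and the function name are illustrative only.

```python
def reduce_buckets(shard_results):
    """Merge per-shard results of the form {group_key: {timestamp_ms: partial_sum}}."""
    merged = {}
    for shard in shard_results:
        for group_key, values in shard.items():
            target = merged.setdefault(group_key, {})
            for ts, partial in values.items():
                # Partial sums are combined by adding them.
                target[ts] = target.get(ts, 0.0) + partial
    return merged
```

An aggregator such as avg cannot be merged this way. Each shard would have to return a (sum, count) pair so that the coordinating node can compute the final value.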

Special case

There is a special case: multiple indices may contain the same tsid, even though a tsid is routed within one index. So if a date_histogram bucket covers multiple indices, we can't apply the aggregator function on the data node.

We can check whether a date_histogram bucket covers multiple indices by using the settings index.time_series.start_time and index.time_series.end_time.

First, we round the timestamp value down to the interval.
Then we compare the rounded value with start_time and end_time.
If the rounded value is between start_time and (end_time - interval), the date_histogram bucket is definitely within one index, so we can apply the aggregator function on the data node.
But if the rounded value is smaller than start_time or bigger than (end_time - interval), the date_histogram bucket may span multiple indices. In that case we keep the original tsid downsampling results and calculate the aggregator results in the reduce phase on the coordinating node.
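
The check can be written as a small predicate. This Python sketch assumes all values are epoch milliseconds and that rounding means rounding down to a multiple of the interval; the function name is hypothetical.

```python
def bucket_within_index(timestamp_ms, interval_ms, start_time_ms, end_time_ms):
    """Return True if the bucket containing timestamp_ms lies inside one index's time range."""
    rounded = timestamp_ms - timestamp_ms % interval_ms  # start of the bucket
    # The bucket spans [rounded, rounded + interval), so it fits in the index
    # only if it starts at or after start_time and at or before end_time - interval.
    return start_time_ms <= rounded <= end_time_ms - interval_ms
```

When the predicate is False, the data node would keep the per-tsid downsampling results and leave the aggregator step to the reduce phase.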

How to Speed up

Compared with the long query DSL above, time_series_aggregation is faster for these reasons:

time_series_aggregation decodes the terms from the tsid, which is already used in the TimeSeriesIndexSearcher. This saves the IO cost of seeking terms.
It calculates the downsampling result of each tsid sequentially, so it can run the aggregator collect directly. This saves the memory needed to store the original tsid list and the cost of building aggregator results from that list.

Explain

This low-level aggregation is the MVP of TSDB queries. We can change the aggregation implementation later with a new TSDB execution framework. Possibly the distributed nested delayed execution framework will have the same functionality as the time series aggregation.
With the help of the time series aggregation, we can implement PromQL support.


elasticmachine · 2022-04-12T17:32:20Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

weizijun · 2022-05-16T14:11:08Z

Hi @nik9000, can you help review this issue? I implemented it in #86097.
It's WIP. This aggregation can be used to support PromQL. I have implemented a prototype of PromQL support internally. It can push down the aggregation to this low-level time_series aggregation and evaluate nested expressions on the coordinating node.

weizijun · 2022-10-26T14:18:14Z

Hi @nik9000, do you have any suggestions for this TSDB aggregation?

elasticsearchmachine · 2024-03-20T16:37:34Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

weizijun added >enhancement needs:triage Requires assignment of a team area label labels Apr 12, 2022

mayya-sharipova added the :StorageEngine/TSDB You know, for Metrics label Apr 12, 2022

elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 12, 2022

mayya-sharipova removed Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) needs:triage Requires assignment of a team area label labels Apr 12, 2022

weizijun mentioned this issue Apr 22, 2022

[WIP] TSDB: add a low-level time series aggregation to support promQL #86097

Draft

weizijun mentioned this issue May 19, 2022

Add a way to downsample metrics #66247

Closed

weizijun changed the title ~~TSDB: add a time_series_aggregation to support TSDB query~~ TSDB: add a time_series_aggregation to support TSDB query to support promQL May 27, 2022

weizijun changed the title ~~TSDB: add a time_series_aggregation to support TSDB query to support promQL~~ TSDB: add a time_series_aggregation to support TSDB query and promQL May 27, 2022

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 9, 2022

weizijun mentioned this issue Sep 28, 2022

Shortcut aggs for TSDB #90423

Closed

wchaparro removed the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 20, 2024

elasticsearchmachine added the Team:StorageEngine label Mar 20, 2024
