TSDB: add a time_series_aggregation to support TSDB query and promQL #85798

Open
weizijun opened this issue Apr 12, 2022 · 4 comments
@weizijun
Contributor

weizijun commented Apr 12, 2022

Background

Issue #66247 describes the TSDB query plan.
A TSDB query always looks like this:

  • downsample the metric of each time series (_tsid)
  • aggregate the _tsids by group

The query can be expressed like this:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "from": xxx,
              "to": xxx
            }
          }
        },
        {
          // other query conditions
        }
      ]
    }
  },
  "aggregations": {
    "group": {
      "multi_terms": {
        "terms": [
          {
            "field": "dim1"
          },
          {
            "field": "dim2"
          },
          ......
        ]
      },
      "aggregations": {
        "time_bucket": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "1m"
          },
          "aggregations": {
            "tsid": {
              "time_series": {},
              "aggs": {
                "avg_value": {
                  "avg": {
                    "field": "metric"
                  }
                }
              }
            },
            "sum_value": {
              "sum_bucket": {
                "buckets_path": "tsid>avg_value"
              }
            }
          }
        }
      }
    }
  }
}
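
To make the cost of this query concrete, here is a minimal Python sketch of what the DSL above computes: the average of the metric per _tsid inside each time bucket, then the sum of those averages per group. The function name `two_stage_query`, the document layout (dicts with `tsid`, `@timestamp` in epoch milliseconds, `metric`, and dimension fields) and the `interval_ms` parameter are assumptions for illustration, not Elasticsearch APIs.

```python
from collections import defaultdict


def two_stage_query(docs, group_fields, interval_ms):
    """Hypothetical model of multi_terms > date_histogram > time_series > avg + sum_bucket."""
    # Stage 1: one accumulator per (group, time bucket, tsid).
    # Every series stays in memory until the end, which is the expensive part.
    per_series = defaultdict(lambda: [0.0, 0])
    for doc in docs:
        group_key = tuple(doc[f] for f in group_fields)
        bucket = doc["@timestamp"] - doc["@timestamp"] % interval_ms
        acc = per_series[(group_key, bucket, doc["tsid"])]
        acc[0] += doc["metric"]
        acc[1] += 1

    # Stage 2: sum the per-series averages inside each (group, time bucket).
    result = defaultdict(float)
    for (group_key, bucket, _tsid), (total, count) in per_series.items():
        result[(group_key, bucket)] += total / count
    return dict(result)
```

The sketch keeps one entry per (group, bucket, tsid) until the second stage runs, which mirrors why the nested aggregation needs so much memory.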

But this query is very slow and uses a lot of memory. To speed up TSDB queries, I created a new low-level aggregation. It is not meant to be used directly by users; it is used by the new QL storage API.

Low-level time_series_aggregation

The aggregation looks like this:

"time_series_aggregation" : {
  "metric" : "xxx",
  "group" : [],
  "without" : [],
  "interval" : "10m",
  "offset" : "",
  "downsample" : {
    "range" : "10m",
    "function" : "sum"
  },
  "aggregator" : ""
}

time_series_aggregation takes the following parameters:

  • metric: the metric name.
  • group: dimension fields to group by, like PromQL by.
  • without: dimension fields to exclude from grouping, like PromQL without.
  • interval: the time interval between result points, like PromQL step.
  • offset: time offset, like PromQL offset.
  • downsample: downsamples the time series data.
    • range: time range, like a PromQL range vector.
    • function: aggregation function applied within one time series, e.g. sum, rate.
  • aggregator: aggregation function applied across all time series in one group.
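
As an illustration of how group and without could select the bucket key, here is a small Python sketch. The function name `bucket_key` and the rule that group takes precedence when both are given are assumptions; in PromQL itself, by and without are mutually exclusive.

```python
def bucket_key(dimensions, group=(), without=()):
    """Hypothetical helper: derive a bucket key from one series' dimensions."""
    if group:
        # Like PromQL `by`: keep only the listed dimensions.
        kept = {k: v for k, v in dimensions.items() if k in group}
    else:
        # Like PromQL `without`: drop the listed dimensions, keep the rest.
        kept = {k: v for k, v in dimensions.items() if k not in without}
    # Sort so that the key does not depend on dimension order.
    return tuple(sorted(kept.items()))
```

For example, with dimensions host, region and job, `group=["region"]` puts all series of one region into the same bucket, while `without=["host"]` buckets by the job and region pair.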

The result looks like this:

{
  "took": 109,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 351372,
    "max_score": null,
    "hits": [
      
    ]
  },
  "aggregations": {
    "group": {
      "buckets": {
        "group1": {
          "key": {
            xxx
          },
          "doc_count": 2,
          "values": {
            "1649689260000": {
              "value": 1.0
            },
            "1649689200000": {
              "value": 3.0
            },
            "1649689140000": {
              "value": 2.0
            }
          }
        },
        "group2": {
          "key": {
            xxxx
          },
          "doc_count": 2,
          "values": {
            "1649689260000": {
              "value": 2.0
            },
            "1649689200000": {
              "value": 1.0
            },
            "1649689140000": {
              "value": 1.0
            }
          }
        }
      }
    }
  }
}

Execution Flow

In the collect phase:

  • Because documents are collected in tsid order, we can compute and downsample the metric results of each tsid sequentially.
  • When the current tsid changes, we can finalize the metric results of the previous tsid.
  • Decode the tsid and filter its dimensions by the group and without parameters to get the bucket key.
  • Apply the aggregator function to the downsampled metric results.

In the post collection phase:

  • Apply the aggregator collect function to the downsampled metric results of the last tsid.
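
The collect and post-collection steps can be sketched in Python as a single pass over documents sorted by tsid. This is a simplified model, not the code in the PR: the function name `collect_sorted`, the representation of a tsid as a tuple of (dimension, value) pairs, and the `downsample` and `combine` parameters are all assumptions, and only group (not without) is handled.

```python
from collections import defaultdict
from operator import add


def collect_sorted(docs, group, interval_ms, downsample=sum, combine=add):
    """docs must be sorted by 'tsid'; timestamps are epoch milliseconds."""
    groups = {}                   # bucket key -> {time bucket -> aggregated value}
    current_tsid = None
    samples = defaultdict(list)   # time bucket -> raw values of the current tsid only

    def flush():
        # "Decode" the tsid, keep only the group dimensions, and fold the
        # downsampled points of the finished series into its group bucket.
        key = tuple((k, v) for k, v in current_tsid if k in group)
        values = groups.setdefault(key, {})
        for bucket, raw in samples.items():
            point = downsample(raw)
            values[bucket] = combine(values[bucket], point) if bucket in values else point
        samples.clear()

    for doc in docs:
        if doc["tsid"] != current_tsid:
            if current_tsid is not None:
                flush()           # tsid changed: the previous series is complete
            current_tsid = doc["tsid"]
        ts = doc["@timestamp"]
        samples[ts - ts % interval_ms].append(doc["metric"])

    if current_tsid is not None:
        flush()                   # post-collection: flush the last tsid
    return groups
```

Only the samples of the current tsid are held at any time; everything else is already folded into the per-group values.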

In the buildAggregations phase:

  • Generate the group bucket results.
  • Run the aggregator build function for each bucket.

In the reduce phase:

  • Reduce the bucket results.
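
A rough Python sketch of the reduce step, assuming each shard returns a mapping of bucket key to per-timestamp values and that the aggregator can be merged pairwise (sum, min, max). The function name `reduce_shards` and the result layout are assumptions for illustration.

```python
from operator import add


def reduce_shards(shard_results, combine=add):
    """Merge per-shard results of the form {bucket key: {time bucket: value}}."""
    merged = {}
    for shard in shard_results:
        for key, values in shard.items():
            target = merged.setdefault(key, {})
            for bucket, value in values.items():
                # Combine values that land on the same group and time bucket.
                target[bucket] = combine(target[bucket], value) if bucket in target else value
    return merged
```

An aggregator such as avg cannot be merged this way from final values alone; the shards would have to return partial state such as (sum, count) pairs.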

Special case

There is a special case where multiple indices may contain the same tsid, even though the tsid is routed within one index. So if a date_histogram bucket covers multiple indices, we can't apply the aggregator function on the data node.

We can check whether a date_histogram bucket covers multiple indices by using the settings index.time_series.start_time and index.time_series.end_time.

First we round the timestamp to its interval boundary.
Then we compare the rounded value with start_time and end_time.
If the rounded value is between start_time and (end_time - interval), we can confirm that the date_histogram bucket lies entirely within one index, so we can apply the aggregator function on the data node.
But if the rounded value is smaller than start_time or bigger than (end_time - interval), the date_histogram bucket may span multiple indices. In that case we keep the original per-tsid downsampling results and calculate the aggregator results in the reduce phase on the coordinating node.
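
The check described above can be sketched in Python as follows. The function name `bucket_in_single_index` is hypothetical, all values are assumed to be epoch milliseconds (the real settings are date values), and rounding is assumed to be a floor to the interval.

```python
def bucket_in_single_index(timestamp_ms, interval_ms, start_time_ms, end_time_ms):
    """Return True if the bucket containing timestamp_ms lies entirely within one index."""
    # Round the timestamp down to the start of its interval (assumed floor rounding).
    rounded = timestamp_ms - timestamp_ms % interval_ms
    # The bucket [rounded, rounded + interval) must fit in [start_time, end_time).
    return start_time_ms <= rounded <= end_time_ms - interval_ms
```

When this returns False, the data node would keep the per-tsid downsampling results and leave the aggregator to the reduce phase.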

How to Speed up

Compared with the long query DSL shown first, time_series_aggregation is faster for these reasons:

  • time_series_aggregation decodes the terms from the tsid, which is already used by the TimeSeriesIndexSearcher. This saves the IO cost of seeking terms.
  • It calculates the downsampling result of each tsid sequentially, so it can run the aggregator collect step directly. This saves the memory needed to store the original tsid list and the cost of building aggregator results from that list.

Explanation

This low-level aggregation is the MVP of TSDB queries. The aggregation implementation can be changed to use a new TSDB execution framework. Possibly the distributed nested delayed execution framework has the same functionality as the time series aggregation.
With the help of the time series aggregation, we can implement PromQL support.

@weizijun weizijun added >enhancement needs:triage Requires assignment of a team area label labels Apr 12, 2022
@mayya-sharipova mayya-sharipova added the :StorageEngine/TSDB You know, for Metrics label Apr 12, 2022
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 12, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@mayya-sharipova mayya-sharipova removed Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) needs:triage Requires assignment of a team area label labels Apr 12, 2022
@weizijun
Contributor Author

Hi @nik9000, could you help review this issue? I implemented it in #86097.
It's WIP. This aggregation can be used to support PromQL. I have implemented a PromQL support prototype internally. It pushes the aggregation down to this low-level time_series aggregation and evaluates nested expressions on the coordinating node.

@weizijun weizijun changed the title TSDB: add a time_series_aggregation to support TSDB query TSDB: add a time_series_aggregation to support TSDB query to support promQL May 27, 2022
@weizijun weizijun changed the title TSDB: add a time_series_aggregation to support TSDB query to support promQL TSDB: add a time_series_aggregation to support TSDB query and promQL May 27, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 9, 2022
@weizijun
Contributor Author

Hi @nik9000, do you have any suggestions for this TSDB aggregation?

@wchaparro wchaparro removed the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 20, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)
