
[Rollup] Add ability to delete rollup data #31347


Closed
polyfractal opened this issue Jun 14, 2018 · 6 comments
Labels
>enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@polyfractal
Contributor

Deleting rollup data (as opposed to the rollup job) is done manually today. If there is only a single job per rollup index, this is an easy process: just delete the index.

But if multiple jobs share a single index, the user needs to execute a Delete-By-Query against the job name to purge the relevant rollup docs. Theoretically, they should also update the _meta of the index to remove the job config, so that the job name can be reused.
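A minimal sketch of the Delete-By-Query step, assuming the rollup docs carry the job name in a `_rollup.id` field. The index name `my_rollup_index`, the job name `sensor_job`, and the helper `build_purge_request` are hypothetical; the function only builds the request and does not send it or touch the index `_meta`.

```python
import json


def build_purge_request(rollup_index: str, job_id: str):
    """Build (method, path, body) for a Delete-By-Query that targets one job's rollup docs.

    Assumption: each rollup doc stores its job name in the `_rollup.id` field.
    """
    body = {"query": {"term": {"_rollup.id": job_id}}}
    return "POST", f"/{rollup_index}/_delete_by_query", json.dumps(body)


# Hypothetical index and job names, for illustration only.
method, path, body = build_purge_request("my_rollup_index", "sensor_job")
print(method, path)
print(body)
```

The resulting method, path, and body can be sent with any HTTP client; removing the job config from the index _meta remains a separate manual step.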

This was documented in #31299, but it's really too much to ask users to do.

Our DeleteJob API should either have a flag to also delete the data, or a separate endpoint should be added which is dedicated to purging rollup data.

@polyfractal polyfractal added >enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data labels Jun 14, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@pyro2927

This may be included in the original intent, but it would be nice to have a convenient way to delete a subset of rolled-up data. As a specific example, I'm looking to have two levels of rollups: one with a 1-hour interval for the date histogram that I'd like to keep for 3 months, and a second with a 4-hour interval that I'd like to keep for a year. The retention period is a rolling window though, so deleting a rollup index and starting fresh won't work.
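Until such a feature exists, a rolling window like this could be approximated with a periodic Delete-By-Query. The sketch below is not an official API: it assumes rollup docs store the bucket time in `<date_field>.date_histogram.timestamp` and the job name in `_rollup.id`, and the index, job, and field names are hypothetical.

```python
import json


def build_retention_request(rollup_index: str, job_id: str, date_field: str, keep: str):
    """Build (path, body) for deleting one job's rollup docs older than `keep` (e.g. "90d")."""
    # Assumed naming of the bucket timestamp field inside rollup docs.
    ts_field = f"{date_field}.date_histogram.timestamp"
    body = {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"_rollup.id": job_id}},
                    {"range": {ts_field: {"lt": f"now-{keep}"}}},
                ]
            }
        }
    }
    return f"/{rollup_index}/_delete_by_query", json.dumps(body)


# Hypothetical names: keep roughly 3 months of the hourly job's data.
path, body = build_retention_request("my_rollup_index", "hourly_job", "timestamp", "90d")
print(path)
print(body)
```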

@fbaligand
Contributor

As an Elasticsearch user, I agree that removing the data associated with a rollup job is not easy.
The most complex part to do manually (and it is not documented) is removing the _meta section from the rollup index.

So I would appreciate a flag to remove the data and metadata associated with the rollup job!

@bravurasteve

bravurasteve commented Mar 20, 2019

Ideally, the user would have the option to also delete the index or not. I just experienced a case where I wanted to make a simple change to a rollup job to change the interval from "1h" to "60m" as a work-around to another issue, but the index itself was fine, so I didn't want to have to recreate the index just because I wanted to change the job.

@fbaligand
Contributor

I agree with @bravurasteve's need!

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@wchaparro
Member

With the 8.7 release of Elasticsearch, we have made a new downsampling capability associated with the new time series data streams functionality generally available (GA). This capability was in tech preview in ILM since 8.5. Downsampling provides a method to reduce the footprint of your time series data by storing it at reduced granularity. The downsampling process rolls up documents within a fixed time interval into a single summary document. Each summary document includes statistical representations of the original data: the min, max, sum, value_count, and average for each metric. Data stream time series dimensions are stored unchanged.
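To illustrate the idea of rolling documents in a fixed interval into one summary document, here is a small conceptual sketch in plain Python. It is not how Elasticsearch implements downsampling; the `downsample` helper and the sample points are made up for illustration, and the average is left to be derived as sum / value_count.

```python
from collections import defaultdict


def downsample(points, interval_s):
    """Group (epoch_seconds, value) pairs into fixed intervals and summarize each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its fixed interval.
        buckets[ts - ts % interval_s].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "value_count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


# Three made-up samples, rolled up into 1-hour (3600 s) buckets.
points = [(0, 2.0), (1800, 4.0), (3600, 10.0)]
summary = downsample(points, 3600)
print(summary)
```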

Downsampling is superior to rollup because:

  • Downsampled indices are searched through the _search API
  • It is possible to query multiple downsampled indices together with raw data indices
  • The pre-aggregation is based on the metrics and time series definitions in the index mapping, so very little configuration is required (i.e. it is much easier to add new time series)
  • Downsampling is managed as an action in ILM
  • It is possible to downsample a downsampled index, and reduce granularity as the index ages
  • The performance of the pre-aggregation process is superior in downsampling, as it builds on the time_series index mode infrastructure
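As a rough illustration of downsampling as an ILM action, with granularity reduced as the index ages, the sketch below builds an ILM policy body with a `downsample` action in two phases. The phase ages, intervals, and the helper `build_downsample_policy` are assumptions chosen for the example, not recommendations; check the ILM documentation for the constraints that apply to your version.

```python
import json


def build_downsample_policy(warm_interval: str = "1h", cold_interval: str = "1d") -> str:
    """Build an ILM policy body that downsamples in the warm phase and again in the cold phase."""
    policy = {
        "policy": {
            "phases": {
                # Assumed ages; the cold interval is a multiple of the warm interval.
                "warm": {
                    "min_age": "7d",
                    "actions": {"downsample": {"fixed_interval": warm_interval}},
                },
                "cold": {
                    "min_age": "30d",
                    "actions": {"downsample": {"fixed_interval": cold_interval}},
                },
            }
        }
    }
    return json.dumps(policy, indent=2)


policy_json = build_downsample_policy()
print(policy_json)
```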

Because of the introduction of this new capability, we are deprecating the rollups functionality, which never left the Tech Preview/Experimental status, in favor of downsampling and thus we are closing this issue. We encourage you to migrate your solution to downsampling and take advantage of the new TSDB functionality.

@wchaparro closed this as not planned Jun 23, 2023

7 participants