[Transform] Allow aligning checkpointing with date_histogram bucket boundaries #62746

Closed
hendrikmuhs opened this issue Sep 22, 2020 · 2 comments

Comments

@hendrikmuhs

hendrikmuhs commented Sep 22, 2020

spin-off from #61587

If a transform is used with date_histogram, the transform creates and updates intermediate buckets, because checkpointing is not aligned with the interval in the date histogram. For use cases with lots of data and/or small intervals, this creates extra load, because upserts (delete + create) are expensive.

If checkpointing was aligned with bucket boundaries, upserts could be avoided.
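A minimal sketch of the idea, assuming a fixed-interval date_histogram and epoch-millisecond timestamps. The function names are hypothetical and do not come from the transform code base: the checkpoint upper bound is rounded down to the last completed bucket boundary, so every bucket is written once, when it is complete.

```python
def align_to_bucket(timestamp_ms: int, interval_ms: int) -> int:
    """Round a timestamp down to the start of its date_histogram bucket."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return timestamp_ms - (timestamp_ms % interval_ms)


def next_checkpoint_range(last_upper_ms: int, now_ms: int, interval_ms: int):
    """Return the (lower, upper) range for the next checkpoint, or None.

    Hypothetical helper: the upper bound is aligned to a bucket boundary,
    so only complete buckets are processed. If no bucket has completed
    since the last checkpoint, there is nothing to do.
    """
    upper = align_to_bucket(now_ms, interval_ms)
    if upper <= last_upper_ms:
        return None
    return (last_upper_ms, upper)
```

With a one-second interval, a checkpoint taken at 7250 ms would cover data up to 7000 ms only; the partial bucket starting at 7000 ms is left for a later checkpoint instead of being created now and upserted later.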

Pros

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@przemekwitek
Contributor

przemekwitek commented Jun 30, 2021

To preserve backwards compatibility (BWC), we should add a per-transform setting that enables or disables this optimization.
Two questions come to mind:

What should the setting name be?

Propositions are:

  • (more generic) optimize_checkpoint_ranges, enable_checkpoint_ranges_optimization
  • (more specific) align_checkpoint_ranges_with_date_histogram_intervals.

My proposition is to go with the more generic name, e.g.: enable_checkpoint_ranges_optimization

What should the setting default value be?

Propositions are:

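| Default value                     | BWC | Adoption                                            |
|-----------------------------------|-----|-----------------------------------------------------|
| enabled from the beginning        | yes | biggest, all the new transforms get the optimization |
| enabled starting 7.x where x > 15 | yes |                                                     |
| enabled starting 8.0              | yes |                                                     |
| disabled                          | yes | marginal, only for support cases                    |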

My proposition is to go with the first option, i.e.: use the optimization by default from the start (7.15).

Special care should be taken during transform update when the default value changes between versions. A transform should always use a consistent value of this setting throughout its lifetime.
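One way to keep the value consistent could be to resolve the default once, from the version in which the transform was created, rather than from the version of the node that currently runs it. The sketch below is only an illustration under that assumption; `DEFAULT_ENABLED_SINCE`, `parse_version` and `effective_setting` are hypothetical names, and 7.15 is taken from the proposal above.

```python
from typing import Optional, Tuple

# Assumption: the optimization is enabled by default starting with 7.15.
DEFAULT_ENABLED_SINCE = (7, 15, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain 'major.minor.patch' string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def effective_setting(explicit: Optional[bool], created_version: str) -> bool:
    """Resolve the setting for one transform.

    An explicitly configured value always wins. Otherwise the default is
    derived from the version the transform was created in, so a later
    cluster upgrade or transform update does not flip the behavior.
    """
    if explicit is not None:
        return explicit
    return parse_version(created_version) >= DEFAULT_ENABLED_SINCE
```

A transform created in 7.14.0 without the setting would keep the old behavior after an upgrade, while an explicit value set by the user is respected in either direction.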
