
Removal of old apiVersions in CRDs #11894


Closed
4 of 5 tasks
sbueringer opened this issue Feb 24, 2025 · 4 comments · Fixed by #11889
Labels
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@sbueringer
Member

sbueringer commented Feb 24, 2025

Problem statement

As we evolve our CRDs over time, we regularly bump our apiVersions. While we have an established process for adding new apiVersions in Cluster API, removing old apiVersions is still problematic.

Before we can remove an apiVersion from a CRD we have to go through the following steps:

1. Ensure all custom resources can still be read from etcd after the apiVersion is removed ("Storage version migration")

This is typically done by writing all custom resources with the current storage version. More information can be found in Versions in CRDs.

In Cluster API today, this is only implemented as part of the clusterctl upgrade command. If clusterctl is not used, folks have to write their own full custom implementation or build on top of the built-in storage version migration (alpha since Kubernetes v1.30).

Note: Storage version migration should be run as soon as a new apiVersion becomes the storage version, to minimize conversion webhook calls.
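
To make this concrete, here is a minimal sketch of step 1 in Go, assuming a controller-runtime client; the package and function names are illustrative, not the #11889 implementation. It relies on the fact that any write makes the API server re-encode the object with the current storage version, so a no-op patch per custom resource is sufficient.

```go
// Hypothetical package/function names; a sketch only, not the #11889 code.
package crdmigration

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// migrateStorageVersion lists all custom resources of the given kind and
// writes each one back unchanged. Every write makes the API server decode
// the stored object and re-encode it with the current storage version.
func migrateStorageVersion(ctx context.Context, c client.Client, gvk schema.GroupVersionKind) error {
	list := &unstructured.UnstructuredList{}
	list.SetGroupVersionKind(gvk.GroupVersion().WithKind(gvk.Kind + "List"))
	if err := c.List(ctx, list); err != nil {
		return err
	}
	for i := range list.Items {
		obj := &list.Items[i]
		// An empty merge patch does not change the object, but it still
		// counts as a write and therefore triggers the re-encode in etcd.
		if err := c.Patch(ctx, obj, client.RawPatch(types.MergePatchType, []byte("{}"))); err != nil {
			return fmt.Errorf("storage version migration of %s %s/%s failed: %w",
				gvk.Kind, obj.GetNamespace(), obj.GetName(), err)
		}
	}
	return nil
}
```

A real implementation would additionally paginate the List call and tolerate objects that are deleted or modified while the migration runs.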

2. Remove the old apiVersion from managedFields of all custom resources ("ManagedField cleanup")

Kubernetes stores field ownership information per apiVersion in managedFields. Unfortunately, there is no built-in logic that removes the managedFields of an apiVersion when that apiVersion is removed from the CRD. If there are still managedFields entries for a removed apiVersion, any subsequent apply request will fail.

Note: managedField cleanup should be run as soon as an apiVersion is no longer served, to minimize conversion webhook calls. As long as an apiVersion is still served, users can still apply with that apiVersion, and the corresponding managedFields are then needed to properly execute the apply.
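
For illustration, a sketch of step 2 under the same assumptions (controller-runtime client; removeManagedFieldsFor is a hypothetical helper, not the #11889 code). One subtlety worth encoding: the API server deliberately ignores writes that set managedFields to an empty list, so fully resetting the field requires a list containing a single empty entry.

```go
// Hypothetical helper; a sketch only, not the #11889 code.
package crdmigration

import (
	"context"
	"encoding/json"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// removeManagedFieldsFor drops all managedFields entries that still reference
// the removed apiVersion, so subsequent apply requests don't fail.
func removeManagedFieldsFor(ctx context.Context, c client.Client, obj *unstructured.Unstructured, removedAPIVersion string) error {
	current := obj.GetManagedFields()
	kept := make([]metav1.ManagedFieldsEntry, 0, len(current))
	for _, entry := range current {
		if entry.APIVersion != removedAPIVersion {
			kept = append(kept, entry)
		}
	}
	if len(kept) == len(current) {
		return nil // nothing references the removed apiVersion
	}
	if len(kept) == 0 {
		// The API server intentionally ignores writes that set managedFields
		// to an empty list; a single empty entry is the documented way to
		// reset the field completely.
		kept = []metav1.ManagedFieldsEntry{{}}
	}
	value, err := json.Marshal(kept)
	if err != nil {
		return err
	}
	// managedFields cannot be changed via server-side apply itself; an
	// explicit JSON patch on /metadata/managedFields works.
	patch := fmt.Sprintf(`[{"op":"replace","path":"/metadata/managedFields","value":%s}]`, value)
	return c.Patch(ctx, obj, client.RawPatch(types.JSONPatchType, []byte(patch)))
}
```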

Why is this important now


For the following reasons we want to remove old apiVersions as soon as possible:

Maintenance effort

We have to keep the Go types of the old apiVersions around. We also have to adjust the conversion implementation whenever we add a new field to our current API.

Increased resource usage through conversion requests

As long as the old apiVersions are part of our CRDs, we will get a significant number of requests on the conversion webhooks.

Implementation

A few notes:

  • The solution should also be available for folks who don't use clusterctl
  • It should be easy for providers to re-use the implementation for their own CRDs
  • It should be possible to disable storage version migration and/or managedField cleanup for cases where folks want to take care of these steps themselves

Idea:

  • Implement a controller / reconciler that can be embedded in core CAPI / providers to run storage version migration and managedField cleanup for the providers' CRDs (a rough sketch follows below)

For more implementation details, see: #11889
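
Building on the idea above, a skeleton of how such an embeddable reconciler could look; the CRDMigrationReconciler and Options types here are assumptions for illustration, not the API introduced in #11889. The options mirror the requirement that either step can be disabled.

```go
// Hypothetical types; a skeleton only, not the API introduced in #11889.
package crdmigration

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Options lets providers skip either step if they handle it themselves.
type Options struct {
	SkipStorageVersionMigration bool
	SkipManagedFieldCleanup     bool
}

// CRDMigrationReconciler watches CRDs and runs both cleanup steps for the
// custom resources of every CRD the embedding provider owns.
type CRDMigrationReconciler struct {
	Client    client.Client
	Options   Options
	OwnedCRDs map[string]bool // CRD names, e.g. "clusters.cluster.x-k8s.io"
}

func (r *CRDMigrationReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	if !r.OwnedCRDs[req.Name] {
		return ctrl.Result{}, nil // not one of this provider's CRDs
	}
	crd := &apiextensionsv1.CustomResourceDefinition{}
	if err := r.Client.Get(ctx, req.NamespacedName, crd); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if !r.Options.SkipStorageVersionMigration {
		// Run storage version migration for the CRD's storage version,
		// e.g. with migrateStorageVersion from the first sketch.
	}
	if !r.Options.SkipManagedFieldCleanup {
		// Run managedField cleanup for every apiVersion that is no longer
		// served, e.g. with removeManagedFieldsFor from the second sketch.
	}
	return ctrl.Result{}, nil
}

// SetupWithManager registers the reconciler for CRD events.
func (r *CRDMigrationReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&apiextensionsv1.CustomResourceDefinition{}).
		Complete(r)
}
```

A provider would register this with its manager via SetupWithManager and list its own CRDs in OwnedCRDs, so core CAPI and providers can each run the migrator for just their resources.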

Removal of v1alpha3 & v1alpha4 and v1beta1 apiVersions

Context:

  • CAPI supports 3 release series at a time: for 2 of them we regularly publish patch releases, for the 3rd we create emergency patches on demand
  • CAPI currently tests upgrades up to n-3 => n.
  • We want to make sure that the removal of old apiVersions does not break the n-3 => n upgrade path. This means we have to keep apiVersions around long enough that "storage version migration" and "managedField cleanup" have been run.

v1alpha3 & v1alpha4

| CAPI  | Release date | v1alpha3 + v1alpha4 | Notes                       |
|-------|--------------|---------------------|-----------------------------|
| v1.9  | Dec 24       | Served: false       |                             |
| v1.10 | Apr 25       | Served: false       | CRD migrator added          |
| v1.11 | Aug 25       | Served: false       |                             |
| v1.12 | Dec 25       | Served: false       |                             |
| v1.13 | Apr 26       |                     | v1alpha3 + v1alpha4 removed |
| v1.14 | Aug 26       |                     |                             |
| v1.15 | Dec 26       |                     |                             |

Notes:

  • v1.10-v1.12: We have to keep v1alpha3 + v1alpha4 around for 3 versions after the CRD migrator has been added, to ensure that managedField cleanup is run even if someone upgrades from n-3 => n.

v1beta1

| CAPI  | Release date | v1beta1               | v1beta2               | Notes            |
|-------|--------------|-----------------------|-----------------------|------------------|
| v1.9  | Dec 24       | Served: true, Storage |                       |                  |
| v1.10 | Apr 25       | Served: true, Storage |                       |                  |
| v1.11 | Aug 25       | Served: true          | Served: true, Storage | v1beta2 added    |
| v1.12 | Dec 25       | Served: true          | Served: true, Storage |                  |
| v1.13 | Apr 26       | Served: true          | Served: true, Storage |                  |
| v1.14 | Aug 26       | Served: false         | Served: true, Storage | v1beta1 unserved |
| v1.15 | Dec 26       | Served: false         | Served: true, Storage |                  |
| v1.16 | Apr 27       | Served: false         | Served: true, Storage |                  |
| v1.17 | Aug 27       | Served: false         | Served: true, Storage |                  |
| v1.18 | Dec 27       |                       | Served: true, Storage | v1beta1 removed  |

Notes:

  • v1.11-v1.13: We have to keep v1beta1 served for 3 versions after the introduction of v1beta2, according to the Kubernetes deprecation policy.
  • v1.14-v1.17: We have to keep v1beta1 around for 3 versions after it was unserved, to ensure that managedField cleanup is run even if someone upgrades from n-3 => n.
    • Note: We want to keep v1beta1 around for one additional release so that folks have 1 buffer release where they can revert v1beta1 back to served if they need more time to pick up v1beta2 (we did the same for v1alpha3 + v1alpha4 in the past).

Tasks:

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 24, 2025
@sbueringer sbueringer added kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Feb 24, 2025
@k8s-ci-robot k8s-ci-robot removed needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Feb 24, 2025
@sbueringer
Member Author

sbueringer commented Feb 24, 2025

cc @fabriziopandini @chrischdi @enxebre @vincepri @JoelSpeed

(I'll bring it up on Wednesday in the office hours)

@fabriziopandini
Member

Thanks for doing the hard work on this issue, +1 for me

@chrischdi
Member

Also +1 from my side.

@sbueringer
Member Author

I created issues for v1alpha3 / v1alpha4 / v1beta1 removal and updated the task list. I think we can keep this issue closed.

Thx everyone for the quick reviews!
