Define clusterctl move process #1525

Closed
fabriziopandini opened this issue Oct 14, 2019 · 14 comments · Fixed by #2060
Labels: area/clusterctl, kind/feature, lifecycle/active, priority/important-soon

Comments

@fabriziopandini
Member

fabriziopandini commented Oct 14, 2019

[2019-11-26 Updated] according to #1730 (comment) Pivot is now rebranded into Move; issue updated accordingly

User Story

As an operator, I would like to move cluster objects and all the associated resources (Machines, MachineDeployments, etc.) from the current management cluster to another management cluster, for any reason.

Detailed Description

The clusterctl CAEP currently in flight assumes the user should bring their own management cluster, so technically the sequence bootstrap cluster -> pivot to -> management cluster is not necessary anymore.

However, the same CAEP considers pivoting a possible answer to different operational needs, e.g. maintenance or replacement of the management cluster, so pivoting is still in scope.

With v1alpha3 in flight and the new assumptions around clusterctl (one binary to rule all the providers), the implementation details should be re-validated, also keeping in mind the #1730 (comment) discussion that led to transforming pivot into move.

Goals

    1. To define a move process that can work on any clusterctl-generated management cluster (with any provider or any combination of providers)
        • To define how to move provider-specific objects (or eventually hierarchies of objects)
    2. To define conventions/requirements the above process should rely on e.g.
    3. To define move specializations, e.g. move only clusters in a given namespace or move only a specific cluster
    4. To define eventual move side effects (e.g. if forcing cluster-api controllers to not reconcile resources is implemented by scaling down controllers, all cluster objects stop being reconciled)
    5. To define preflight checks to be executed before move, e.g.
        • check that all the required providers exist in the target cluster (how to identify required providers TBD)
        • other checks, e.g. whether objects already exist in the target cluster
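
The preflight checks in the last goal could look roughly like the following minimal sketch. It models both clusters as plain Python data instead of calling a Kubernetes API. The function name `preflight_check`, the provider-name strings, and the `(kind, namespace, name)` object keys are all hypothetical, and how required providers are actually identified is still TBD in this issue.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical object key: (kind, namespace, name).
ObjectKey = Tuple[str, str, str]


def preflight_check(
    required_providers: Iterable[str],
    target_providers: Set[str],
    objects_to_move: Iterable[ObjectKey],
    target_objects: Set[ObjectKey],
) -> List[str]:
    """Return a list of problems; an empty list means the move may proceed."""
    problems = []
    # Check 1: every provider the source objects depend on is installed in the target.
    for provider in sorted(set(required_providers)):
        if provider not in target_providers:
            problems.append(f"provider {provider!r} is missing in the target cluster")
    # Check 2: none of the objects to move already exists in the target.
    for key in objects_to_move:
        if key in target_objects:
            kind, namespace, name = key
            problems.append(f"{kind} {namespace}/{name} already exists in the target cluster")
    return problems
```

Collecting every problem instead of failing on the first one lets an operator fix the target cluster in a single pass before retrying the move.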

Non-Goals

    1. to support move for DIY management clusters (not created with clusterctl)
    2. to support move for clusters < v1alpha3 (TO BE CONFIRMED)

Anything else you would like to add:

There are many lessons learned from past experience with pivoting, so I'm pasting some comments from different threads below. Feel free to add more.

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 14, 2019
@fabriziopandini
Member Author

from #1065 (comment)

Pivot is complicated for a few reasons, I'm not sure it could be simplified outside of an external tool.

  • cluster-api components need to be moved to a new cluster
  • the cluster-api controllers in the source cluster need to be scaled down before the target cluster's cluster-api controllers are running, to avoid multiple controllers running at the same time
  • cluster and machine* objects need to be deleted out of the source cluster without removing the underlying resources (currently called "force delete" and done by removing the finalizers before deleting)
  • cluster and machine* objects need to be created in the right order on the target cluster
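
The "force delete" step described above (remove the finalizers, then delete, so the underlying resources survive) can be sketched as follows. This is an illustration under stated assumptions, not the clusterctl implementation: the `Store` dictionaries are a hypothetical in-memory stand-in for the two clusters' API servers, and `force_delete` and `move_object` are made-up helper names.

```python
import copy
from typing import Dict

# Hypothetical in-memory stand-in for a cluster's API server: name -> object.
Store = Dict[str, dict]


def force_delete(store: Store, name: str) -> None:
    """Delete an object without letting controllers tear down the underlying
    infrastructure: clear the finalizers first, then delete."""
    obj = store[name]
    obj["metadata"]["finalizers"] = []  # nothing is left to block the deletion
    del store[name]


def move_object(source: Store, target: Store, name: str) -> None:
    """Copy an object to the target, then force delete it from the source."""
    obj = copy.deepcopy(source[name])
    # Server-assigned fields are dropped; the target assigns its own values.
    obj["metadata"].pop("resourceVersion", None)
    obj["metadata"].pop("uid", None)
    target[name] = obj
    force_delete(source, name)
```

The copy is taken before the finalizers are cleared, so the object in the target keeps them and is cleaned up normally if it is deleted there later.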

from comments in the GDOC for the clusterctl redesign proposal

We either need to:

  1. ensure controllers are not running in both source/target management clusters
    or:
  2. Ensure that all individual Machines are moved prior to all MachineSets, which are moved prior to all MachineDeployments. All pre-requisite resources for those will all need to be moved prior as well (Cluster, cluster infra, machine infra templates, machine bootstrap templates, secrets, etc).
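
The ordering in option 2 can be expressed as a sort by kind. The sketch below assumes objects are plain dictionaries with a `kind` field; the `MOVE_RANK` table and the `move_order` function are hypothetical names, and every kind not listed is treated as a prerequisite.

```python
from typing import Dict, List

# Hypothetical ranking that follows option 2: prerequisites first, then
# Machines, then MachineSets, then MachineDeployments.
MOVE_RANK: Dict[str, int] = {
    "Machine": 1,
    "MachineSet": 2,
    "MachineDeployment": 3,
}
PREREQUISITE_RANK = 0  # Cluster, infra objects, templates, secrets, ...


def move_order(objects: List[dict]) -> List[dict]:
    """Return objects in the order they should be created in the target cluster."""
    # sorted() is stable, so objects with the same rank keep their input order.
    return sorted(objects, key=lambda o: MOVE_RANK.get(o["kind"], PREREQUISITE_RANK))
```

Creating Machines before the MachineSets and MachineDeployments that manage them means a controller already running in the target finds the existing replicas instead of creating new ones.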

@detiber
Member

detiber commented Oct 14, 2019

I would also add a potential for:
3) Add an annotation to inform cluster-api controllers to not reconcile resources, that way the annotation could be applied prior to pivoting for all resources, and removed after pivoting all resources.
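
A minimal sketch of the annotation idea, assuming objects are plain dictionaries shaped like Kubernetes resources. The annotation key `example.cluster.x-k8s.io/paused` and the helper names are hypothetical and chosen only for illustration; this thread does not define the real key.

```python
# Hypothetical annotation key, used only for illustration.
PAUSED_ANNOTATION = "example.cluster.x-k8s.io/paused"


def is_paused(obj: dict) -> bool:
    """True if the object carries the pause annotation, whatever its value."""
    annotations = obj.get("metadata", {}).get("annotations") or {}
    return PAUSED_ANNOTATION in annotations


def set_paused(obj: dict, paused: bool) -> None:
    """Add the annotation before the move, or remove it afterwards."""
    metadata = obj.setdefault("metadata", {})
    annotations = metadata.get("annotations") or {}
    if paused:
        annotations[PAUSED_ANNOTATION] = "true"
    else:
        annotations.pop(PAUSED_ANNOTATION, None)
    metadata["annotations"] = annotations


def reconcile(obj: dict) -> str:
    """Stand-in for a controller's reconcile loop that honours the annotation."""
    if is_paused(obj):
        return "skipped"
    # ... normal reconciliation would happen here ...
    return "reconciled"
```

Because the check is made per object, the same mechanism works regardless of how the controllers are deployed.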

@fabriziopandini
Member Author

@detiber ACK, updated. WRT the implementation, we can consider also the option to scale down controller deployments

@detiber
Member

detiber commented Oct 14, 2019

we can consider also the option to scale down controller deployments

100%, I'm just thinking of ways that are potentially less error prone and generally more forgiving to changes such as the one from v1alpha1 to v1alpha2 where we switched from StatefulSets to Deployments.

@ncdc ncdc added this to the v0.3.0 milestone Oct 16, 2019
@ncdc ncdc added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 16, 2019
@fabriziopandini fabriziopandini changed the title Define a generic pivoting process Define clusterctl move process Nov 26, 2019
@fabriziopandini
Member Author

@ncdc @vincepri @detiber, considering that pivot changed into move, and that we are going to support partial moves (e.g. moving only the cluster objects that exist in a given namespace), IMO the best option for preventing cluster-api controllers from reconciling resources is the annotation suggested by @detiber.

WDYT?

@ncdc
Contributor

ncdc commented Nov 26, 2019

+1 to annotation

@vincepri
Member

Annotation sounds good, there was someone else asking for something similar to pause reconciliation on certain objects, which might be helpful.

@joonas

joonas commented Dec 6, 2019

/assign @fabriziopandini

@vincepri vincepri added the area/clusterctl Issues or PRs related to clusterctl label Dec 11, 2019
@fabriziopandini
Member Author

OK, the annotation addresses the problem of stopping controllers from reconciling objects before the move.

However, there are still two problems to be addressed:

  1. How to move "generic" hierarchies of objects for swappable control-plane/infrastructure/bootstrap providers (in v1alpha2 everything was a single-level unstructured object; now there are cases with nested hierarchies of objects, e.g. CAPV, CACPK)
  2. If/How to support objects being shared across clusters

@akutz @ncdc

@akutz
Contributor

akutz commented Jan 6, 2020

Thank you @fabriziopandini,

Number one is super important to CAPV. We have numerous resources unknown to CAPI's core CRDs, but our entire graph is still reachable via owner refs. We need the move operation to support descendant discovery via owner refs, or the move will not work for CAPV.
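
Descendant discovery via owner refs amounts to a graph traversal. The following is a rough sketch, assuming every candidate object has already been listed into memory as a dictionary with `metadata.uid` and the standard `metadata.ownerReferences` entries; `discover_descendants` is a hypothetical name, not a clusterctl function.

```python
from collections import defaultdict, deque
from typing import Dict, List


def discover_descendants(root_uid: str, objects: List[dict]) -> List[dict]:
    """Return every object reachable from root_uid by following ownerReferences
    from owner to dependent, in breadth-first order."""
    # Index: owner UID -> objects that list that UID in metadata.ownerReferences.
    dependents: Dict[str, List[dict]] = defaultdict(list)
    for obj in objects:
        for ref in obj["metadata"].get("ownerReferences", []):
            dependents[ref["uid"]].append(obj)

    found: List[dict] = []
    seen = {root_uid}
    queue = deque([root_uid])
    while queue:
        owner_uid = queue.popleft()
        for child in dependents[owner_uid]:
            child_uid = child["metadata"]["uid"]
            if child_uid not in seen:  # an object may have several owners
                seen.add(child_uid)
                found.append(child)
                queue.append(child_uid)
    return found
```

Starting from a Cluster's UID, this picks up provider-specific objects at any depth without the move logic knowing their kinds in advance, provided each one is linked by an owner reference.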

@fabriziopandini
Member Author

/lifecycle active
For the clusterctl part

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jan 7, 2020
@fabriziopandini
Member Author

@akutz descendant discovery via owner refs is in flight; I will ping you as soon as it is ready

@fabriziopandini
Member Author

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

this was implemented by

