Drain workload clusters of service Type=Loadbalancer on teardown #3075


Closed
randomvariable opened this issue May 19, 2020 · 28 comments
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@randomvariable
Member

User Story

As a cluster operator, I delete a cluster (e.g. in AWS), but the cluster won't delete its infrastructure objects: developers had created Services of Type=LoadBalancer that consumed the cluster's resources, the cloud provider controller has already been deleted and therefore can't remove them, and I have to clean up the resources manually.

Detailed Description

An extension of kubernetes-sigs/cluster-api-provider-aws#1718: there is logic in CAPA to delete all ELBv1s matching the cluster, but as soon as users add a different load balancer provisioner, say NLBs or ALBs, we would have to add additional code to CAPA to handle it.

This could be extended in the future to other AWS resources.

We might want to think of a mechanism that, as a first step, tears down Services of Type=LoadBalancer in the workload cluster prior to destroying the cluster.
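For illustration, a drain step along these lines could run against the workload cluster before the infrastructure is deleted. This is only a minimal client-go sketch (the package and function names are made up, and waiting for cloud resources to actually disappear is elided), not existing CAPI or CAPA code:

```go
package teardown

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// drainLoadBalancerServices deletes every Service of Type=LoadBalancer in the workload
// cluster so the cloud provider controller can release the backing cloud load balancers
// while it is still running, i.e. before the cluster itself is torn down.
func drainLoadBalancerServices(ctx context.Context, c kubernetes.Interface) error {
	svcs, err := c.CoreV1().Services(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return fmt.Errorf("listing services: %w", err)
	}
	for i := range svcs.Items {
		svc := &svcs.Items[i]
		if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
			continue
		}
		if err := c.CoreV1().Services(svc.Namespace).Delete(ctx, svc.Name, metav1.DeleteOptions{}); err != nil {
			return fmt.Errorf("deleting service %s/%s: %w", svc.Namespace, svc.Name, err)
		}
	}
	// A real implementation would also wait, with a timeout, for the cloud resources
	// to actually disappear before letting cluster deletion proceed.
	return nil
}
```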


/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 19, 2020
@vincepri
Member

vincepri commented May 19, 2020

This is an interesting one. @detiber @ncdc @CecileRobertMichon, thoughts on getting this into v0.3.x? It could be an optional flag on the Cluster object. I'd also discuss it at a community meeting to get more thoughts.

@detiber
Member

detiber commented May 19, 2020

@vincepri we could also introduce it as an experimental command-line flag to start with, rather than modifying the Cluster object.

@randomvariable
Member Author

randomvariable commented May 19, 2020

Feature gate seems appropriate to me. I've already put it down on the agenda for 2020/05/20.
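As a rough illustration of the feature-gate shape, using k8s.io/component-base/featuregate directly; the `ClusterResourceCleanup` gate name is purely hypothetical and does not exist in Cluster API:

```go
package feature

import (
	"k8s.io/component-base/featuregate"
)

const (
	// ClusterResourceCleanup is a hypothetical, alpha-gated feature that would drain
	// Services of Type=LoadBalancer from workload clusters before deletion.
	ClusterResourceCleanup featuregate.Feature = "ClusterResourceCleanup"
)

// MutableGates holds the experimental gates; controllers would check
// MutableGates.Enabled(ClusterResourceCleanup) before running the drain logic.
var MutableGates featuregate.MutableFeatureGate = featuregate.NewFeatureGate()

func init() {
	// Off by default while experimental.
	_ = MutableGates.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		ClusterResourceCleanup: {Default: false, PreRelease: featuregate.Alpha},
	})
}
```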

@ncdc
Contributor

ncdc commented May 20, 2020

We had talked about this previously as an alternative to infra providers having to manage this. I'm OK with experimenting with it.

@vincepri
Member

/milestone Next

@k8s-ci-robot k8s-ci-robot added this to the Next milestone May 20, 2020
@CecileRobertMichon
Contributor

I looked into this a bit more and I think this doesn't apply to CAPZ. Azure has resource groups, so all the resources for the cluster are contained within a resource group, including the load balancer created by the cloud provider for exposing services. Also, the Azure vnet doesn't have any dependencies on the load balancer; only network interfaces do, so since the AzureMachines get deleted first, it's all good. When the resource group gets deleted, all the remaining resources also get deleted, including the LB and public IP associated with the service.

The only scenario where resources created out of band wouldn't be deleted is if the user brings their own resource group, in which case the resource group is "unmanaged" and CAPZ won't delete it, so LBs created by the cloud provider and any other resources manually created by the user won't get deleted. I think that's okay, though, since CAPZ can still delete all the Azure resources it created and manages.

I just tried to repro this to make sure: I was able to create a CAPZ cluster, expose a simple Service of type LoadBalancer (and verified that an LB got provisioned in my resource group), and then delete the cluster with kubectl delete cluster, and all resources were deleted without a problem.

@randomvariable
Member Author

Thanks. I figured resource groups (if only AWS had that feature :( ) would mitigate having to do this in Azure.

If this is really AWS-specific, but we also don't want CAPA to have to keep up with any number of exotic AWS LB provisioners, then perhaps a variant of @michaelgugino's machine lifecycle hooks proposal is relevant, with CAPA issuing the deletes for LoadBalancer-type Services in the workload cluster.

@voor
Member

voor commented May 26, 2020

I've been kicking this around in the old noggin, and I'm wondering if we're taking what might become a more significant issue and trying to band-aid fix it. As more external dependencies are impacted by clusters, perhaps there should be a better way to handle graceful destruction in general. I can't think of other immediate comparables, but this is something that has come up in carvel-dev/kapp-controller#7, since certain things that were dependent on the cluster are now continually trying to remediate resources that no longer exist.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 24, 2020
@vincepri
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 24, 2020
@yastij
Member

yastij commented Aug 25, 2020

this issue should be generic to other first-class kubernetes objects such as PVs

@detiber
Member

detiber commented Aug 26, 2020

this issue should be generic to other first-class kubernetes objects such as PVs

I think we need to be much more conservative around PVs, especially since it could result in unexpected data loss.

@randomvariable
Member Author

💯 Intent around reclaim policies during the lifetime of a cluster and at the end of its life is tricky. We definitely should not delete PVs that were part of PVCs that don't have a Delete reclaim policy.
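To make that conservatism concrete, a cleanup pass could restrict itself to volumes the cluster would have reclaimed anyway. A minimal client-go sketch (not existing CAPI code, and the function name is invented):

```go
package cleanup

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deletablePersistentVolumes returns only the PVs a cleanup pass could safely remove:
// claimed volumes whose reclaim policy is already Delete. Anything with Retain (or
// Recycle) is left strictly alone to avoid unexpected data loss.
func deletablePersistentVolumes(ctx context.Context, c kubernetes.Interface) ([]corev1.PersistentVolume, error) {
	pvs, err := c.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	var out []corev1.PersistentVolume
	for _, pv := range pvs.Items {
		if pv.Spec.PersistentVolumeReclaimPolicy != corev1.PersistentVolumeReclaimDelete {
			continue
		}
		// Pre-provisioned volumes that were never claimed are the operator's call.
		if pv.Spec.ClaimRef == nil {
			continue
		}
		out = append(out, pv)
	}
	return out, nil
}
```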

@maplain

maplain commented Oct 8, 2020

I think a generic solution is needed as long as people use an agent in a cluster to manage external infrastructure resources. My personal understanding is that this is not CAPI-specific; it's more of a mismatch between the cluster lifecycle and the external infrastructure resource lifecycles. The former is bound to the "Cluster" CR (or other constructs in other scenarios), whereas the latter is bound to the agent (Pod or CNI?) running in the Cluster.

The disconnect is between the "Cluster" CR's lifecycle events and the agent.

Most of the time a simple signal to the agent might be enough, like the existing mechanisms: a non-blocking preStop hook, or SIGTERM + grace period + SIGKILL.
Sometimes, though, the signal should be rich enough for the receiver to distinguish lifecycle events originating from different sources. E.g. if a stateless agent receives a SIGTERM, should it clean up all resources on the external infrastructure or do nothing? If it's a cluster shutdown, it should go with the former, but if it's initiated by a rolling update, it should probably go with the latter.

There are three high-level approaches I can think of: rich "signaling" to the in-cluster subscriber, a higher-level construct like Azure's resource group mentioned above, or a webhook-like entity sitting in the critical path for executing various hooks.
The 2nd makes sense when it's easy to wrap everything's lifecycle into one big envelope, but it breaks the moment the user goes beyond its boundary, e.g. using another resource group, or provisioning resources on another cloud.
The 1st builds an interface and requires the agent's support. The 2nd and 3rd make good managed solutions and keep the agent simple (if not just moving the logic around).

For concrete examples:

  1. could be implemented as a CAPI-level controller, adding/removing a finalizer on the Cluster and communicating with the in-cluster agent out-of-band (a rough sketch follows below);
  2. Azure's resource group, as mentioned above;
  3. I've seen a proxy-based implementation in a non-CAPI world for SDN resource cleanup (not only LBs).
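A minimal sketch of what option 1 could look like with a controller-runtime reconciler; the finalizer name and the drainExternalResources helper are invented for illustration, and nothing here is an existing CAPI API:

```go
package cleanup

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Hypothetical finalizer owned by an external-resource cleanup controller.
const externalResourcesFinalizer = "cleanup.example.x-k8s.io/external-resources"

// Reconciler holds Cluster deletion back until external resources are drained.
type Reconciler struct {
	client.Client
}

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	cluster := &clusterv1.Cluster{}
	if err := r.Get(ctx, req.NamespacedName, cluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if cluster.DeletionTimestamp.IsZero() {
		// Cluster is alive: make sure deletion will be blocked until cleanup runs.
		if !controllerutil.ContainsFinalizer(cluster, externalResourcesFinalizer) {
			controllerutil.AddFinalizer(cluster, externalResourcesFinalizer)
			return ctrl.Result{}, r.Update(ctx, cluster)
		}
		return ctrl.Result{}, nil
	}

	// Cluster is being deleted: ask the in-cluster agent (out-of-band) to clean up
	// what it owns, e.g. Type=LoadBalancer Services, then release the finalizer.
	if err := drainExternalResources(ctx, cluster); err != nil {
		return ctrl.Result{}, err
	}
	controllerutil.RemoveFinalizer(cluster, externalResourcesFinalizer)
	return ctrl.Result{}, r.Update(ctx, cluster)
}

// drainExternalResources is a placeholder for the out-of-band communication with the
// in-cluster agent (e.g. deleting LoadBalancer Services as sketched earlier).
func drainExternalResources(ctx context.Context, cluster *clusterv1.Cluster) error {
	return nil
}
```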

@maplain

maplain commented Oct 21, 2020

Would something like a ClusterResourceGarbageCollector help in this situation?

This controller would simply add a finalizer on the Cluster CR when it's created, and create a provider-specific CR to request GC of any external resources for that Cluster when it's being deleted. Status would be synced from the provider-specific CR back to the Cluster, and once everything finishes successfully, the finalizer would be removed. Of course, this description over-simplifies the problem a bit, but I would love to hear your thoughts before going any further.
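Purely for illustration, such a provider-specific GC request CR might be shaped like the following hypothetical type (kubebuilder-style markers; none of these names exist in any provider today):

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// AWSClusterGarbageCollectionSpec asks the provider to clean up external resources
// (e.g. load balancers created by the in-cluster cloud provider) for one Cluster.
type AWSClusterGarbageCollectionSpec struct {
	// ClusterName is the name of the CAPI Cluster being deleted.
	ClusterName string `json:"clusterName"`
	// ResourceTypes lists which kinds of external resources to reclaim.
	// +optional
	ResourceTypes []string `json:"resourceTypes,omitempty"`
}

// AWSClusterGarbageCollectionStatus is synced back to the Cluster by the CRGC so the
// finalizer can be removed once cleanup is done.
type AWSClusterGarbageCollectionStatus struct {
	// Completed is true when all requested external resources are gone.
	// +optional
	Completed bool `json:"completed,omitempty"`
	// FailureMessage records a terminal cleanup error, if any.
	// +optional
	FailureMessage *string `json:"failureMessage,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// AWSClusterGarbageCollection is a hypothetical provider-specific GC request.
type AWSClusterGarbageCollection struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   AWSClusterGarbageCollectionSpec   `json:"spec,omitempty"`
	Status AWSClusterGarbageCollectionStatus `json:"status,omitempty"`
}
```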

@ncdc @vincepri @yastij what do you think?

@detiber
Member

detiber commented Oct 21, 2020

@maplain I think that could indeed be a potential solution. My main concern is that it seems to externalize the responsibility to users. I think it would be a fine approach for the short term; I would just like us to continue to pursue a better solution longer term.

I'm not sure we have a good way within Cluster API itself to know all the types that could exist that would need to be cleaned up on Cluster deletion (and which explicitly should not be, such as certain persistent storage).

Ideally I would love to see this be something that Kubernetes provided first-class support for: CRDs could register themselves as managing external resources along with appropriate deletion policies related to the cluster's lifecycle, there would be a way to trigger cleanup of said resources from a centralized endpoint, and a way of verifying that operations have completed.

Then it would just be a matter of us invoking the endpoint for cleanup and waiting for things to complete (with a possible timeout). We would have to make sure that this process doesn't yank the infrastructure we are running on out from underneath us, but it prevents us from putting the onus on users to explicitly configure what types of resources to clean up.
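To make the registration idea concrete, one hedged way to approximate it without new Kubernetes machinery would be a well-known annotation on the CRD itself; the annotation keys and policy values below are invented, and Kubernetes has no such convention today:

```go
package cleanup

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical convention: a CRD annotates itself as owning external resources and
// declares what should happen to them when the cluster goes away.
const (
	externalResourcesAnnotation = "cleanup.example.x-k8s.io/external-resources" // "true"
	deletionPolicyAnnotation    = "cleanup.example.x-k8s.io/deletion-policy"    // "Delete" | "Retain"
)

// crdsNeedingCleanup returns the CRDs that registered themselves for cleanup with a
// "Delete" policy; everything else (e.g. persistent storage marked "Retain") is skipped.
func crdsNeedingCleanup(ctx context.Context, c apiextensionsclient.Interface) ([]apiextensionsv1.CustomResourceDefinition, error) {
	crds, err := c.ApiextensionsV1().CustomResourceDefinitions().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	var out []apiextensionsv1.CustomResourceDefinition
	for _, crd := range crds.Items {
		ann := crd.GetAnnotations()
		if ann[externalResourcesAnnotation] == "true" && ann[deletionPolicyAnnotation] == "Delete" {
			out = append(out, crd)
		}
	}
	return out, nil
}
```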

@maplain

maplain commented Oct 21, 2020

thx for your reply @detiber !

Yes, you're right. I was thinking that's something users would have to configure at the very beginning, e.g. like the --infrastructure flag when initializing a management cluster.
I guess to make it more flexible (or complex), a ConfigMap could be added so the CRGC can discover the various providers at runtime (a rough sketch follows below).
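A rough sketch of that ConfigMap-based discovery; the ConfigMap name, namespace, and data layout are all invented for illustration:

```go
package crgc

import (
	"context"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// discoverGCProviders reads a hypothetical ConfigMap in the management cluster that
// maps provider names to the GroupVersionKind of their GC request CR, so the
// ClusterResourceGarbageCollector doesn't need to be compiled against any provider.
func discoverGCProviders(ctx context.Context, c kubernetes.Interface) (map[string]string, error) {
	cm, err := c.CoreV1().ConfigMaps("capi-system").Get(ctx, "cluster-resource-gc-providers", metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	providers := map[string]string{}
	for provider, gvk := range cm.Data {
		// e.g. "aws" -> "infrastructure.example.io/v1alpha1, Kind=AWSClusterGarbageCollection"
		providers[strings.TrimSpace(provider)] = strings.TrimSpace(gvk)
	}
	return providers, nil
}
```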

The CRD approach is very interesting. If my understanding is correct, it's similar to the ConfigMap mentioned above in that it allows the discovery of providers for a certain lifecycle event. What's better is that it's "decentralized", so CAPI controllers are agnostic of those providers, which register themselves in a "self-service" way as a "cleanup" service.
This requires the API to support semantics like expressing the relationship between objects, expressing actions/hooks/lifecycle events/triggers on that relationship, policies on the trigger, etc.
Does this understanding match what you have in mind?

Also, do you mind sharing some pointers so I could study more in this direction? Thx in advance!

@detiber
Member

detiber commented Oct 22, 2020

@maplain I wasn't necessarily thinking in terms of trying to encode the dependencies in the API so much as just a simple way to trigger deletion automatically. Dependency management, ordering, etc. can be left to the individual controllers, which could leverage finalizers and handle things as they need to.

I don't necessarily have any pointers to give, other than any such proposal would need to be created against Kubernetes itself and would likely involve SIG Cluster Lifecycle and SIG API Machinery.

@maplain

maplain commented Oct 23, 2020

gotcha. While thinking about ClusterResourceGarbageCollector, I discovered #3132, which seems a lot like what I want. Will play around with it first.

@vincepri
Member

/assign @randomvariable
to do some research

@sbueringer
Member

/cc @chrischdi Not sure if it's already implemented in CAPO, but it sounds like the same problem.

@vincepri
Member

/milestone v1.2

@yastij to start a document to outline the scope of the work

@k8s-ci-robot k8s-ci-robot modified the milestones: Next, v1.2 Feb 11, 2022
@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini removed this from the v1.2 milestone Jul 29, 2022
@fabriziopandini fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini
Member

/triage accepted
/unassign @randomvariable
/help

@k8s-ci-robot
Contributor

@fabriziopandini:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/triage accepted
/unassign @randomvariable
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Sep 30, 2022
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 19, 2024
@lubronzhan
Contributor

Taking a look 👀

@fabriziopandini
Member

/close

This issue has not been updated in over a year, and as discussed above there are provider-specific considerations that make it very complex to find a generalized solution unless someone invests time in this effort.

Also, as noted above there are existing options to allow each provider to handle deletion of resources/explore this problem space, e.g.:

  • using finalizers
  • using lifecycle hooks (a minimal sketch follows below)
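For the lifecycle-hook route, a Runtime SDK extension could implement a BeforeClusterDelete handler roughly like the sketch below. This assumes the Runtime SDK hook types under sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1 (exact package paths and fields may differ by version), and workloadClusterClean is a made-up placeholder:

```go
package handlers

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
)

// DoBeforeClusterDelete blocks Cluster deletion until external resources (e.g.
// LoadBalancer-backed Services) have been drained from the workload cluster.
func DoBeforeClusterDelete(ctx context.Context, req *runtimehooksv1.BeforeClusterDeleteRequest, resp *runtimehooksv1.BeforeClusterDeleteResponse) {
	done, err := workloadClusterClean(ctx, &req.Cluster)
	if err != nil {
		resp.SetStatus(runtimehooksv1.ResponseStatusFailure)
		resp.SetMessage(err.Error())
		return
	}
	if !done {
		// Ask Cluster API to call the hook again until the drain has finished.
		resp.SetStatus(runtimehooksv1.ResponseStatusSuccess)
		resp.RetryAfterSeconds = 30
		return
	}
	resp.SetStatus(runtimehooksv1.ResponseStatusSuccess)
}

// workloadClusterClean is a placeholder: it would list Services of Type=LoadBalancer
// in the workload cluster and report whether they (and their cloud resources) are gone.
func workloadClusterClean(ctx context.Context, cluster *clusterv1.Cluster) (bool, error) {
	return true, nil
}
```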

We can re-open (or re-create) this issue whenever the conditions and the required community consensus to work on it exist.

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
