feat: external load balancer garbage collection (part 2) - new gc service #3610
@richardcase: This issue is currently awaiting triage. If CAPA/CAPI contributors determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test ? |
@richardcase: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-cluster-api-provider-aws-e2e |
pkg/cloud/services/gc/ec2.go
Outdated
eksClusterName := getTagValue(eksClusterNameTag, mapping)
if eksClusterName != "" {
	s.scope.V(2).Info("Security group created by EKS directly, skipping deletion", "cluster_name", eksClusterName)

	return nil
}

//TODO: should we check for the security group name start with k8s-elb-
I find it confusing that the caller filters the set of resources to pass to the clean up function, e.g., https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3610/files#diff-ac45a93b8f5e87ab26c8de0dd3165846262b1cb89b0e318586f6ed2b33f61627R127, and then the clean up function filters the set further.
Can we make either the caller, or this function, responsible for all the filtering?
The caller is only filtering based on the resource type (i.e. security-group). I'm not sure this will be any clearer:
if strings.HasPrefix(parsedARN.Resource, "security-group/") {
	eksClusterName := getTagValue(eksClusterNameTag, mapping)
	if eksClusterName != "" {
		s.scope.V(2).Info("Security group created by EKS directly, skipping deletion", "cluster_name", eksClusterName)
		continue
	}
	s.scope.V(2).Info("Deleting Security group", "arn", parsedARN.String())
	return s.deleteSecurityGroup(ctx, &parsedARN, res)
}
But happy to change if you want.
I think what I am trying to say is this: if I had to call this function, I'm not sure how I would know that my input is correct.
Right now, the caller needs to filter resources before calling this function, but I think that's not enforced by the compiler.
For example, what happens if I call the function as it currently stands and pass a resource that is NOT a security group? Does groupID get an invalid value, and then the DeleteSecurityGroupWithContext call fails?
Ah, I see what you are saying. I guess I am relying on the fact that the only exported function is ReconcileDelete and from here we control what calls what.
The current call to deleteSecurityGroup is protected with a check (i.e. strings.HasPrefix(parsedARN.Resource, "security-group/")). But what you are saying is that someone could come along later and make a change in the gc package somewhere and call deleteSecurityGroup without this check.
It's a bit late in the day for me... will ponder this tomorrow.
@dlipovetsky - I have changed this now so that the cleanup functions get passed the list of all the resources; they then filter the resources themselves and delete them if required.
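For readers following the thread, here is a minimal sketch of the shape this change arrives at: a cleanup function that accepts every resource and selects the ones it handles, so a caller cannot pass it the wrong kind of input. The `AWSResource` type, the `del` callback, and the value of `eksClusterNameTag` are assumptions made for illustration; the real types and the AWS SDK call in the PR may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// AWSResource is a hypothetical, simplified stand-in for a converted
// tagging-API result; it is not the type used in the PR.
type AWSResource struct {
	ARNResource string            // resource part of the ARN, e.g. "security-group/sg-1"
	Tags        map[string]string // tags attached to the resource
}

// eksClusterNameTag is an assumed tag key, used only for this sketch.
const eksClusterNameTag = "aws:eks:cluster-name"

const securityGroupPrefix = "security-group/"

// deleteSecurityGroups receives all resources and filters them itself.
// Anything that is not a security group is ignored, and groups created
// by EKS directly are skipped. del is a hypothetical callback standing
// in for the real AWS delete call.
func deleteSecurityGroups(resources []*AWSResource, del func(groupID string) error) error {
	for _, res := range resources {
		if !strings.HasPrefix(res.ARNResource, securityGroupPrefix) {
			continue // not a security group, so not ours to handle
		}
		if res.Tags[eksClusterNameTag] != "" {
			continue // created by EKS directly, skip deletion
		}
		groupID := strings.TrimPrefix(res.ARNResource, securityGroupPrefix)
		if err := del(groupID); err != nil {
			return fmt.Errorf("deleting security group %s: %w", groupID, err)
		}
	}
	return nil
}

func main() {
	resources := []*AWSResource{
		{ARNResource: "security-group/sg-1"},
		{ARNResource: "loadbalancer/app/example"},
		{ARNResource: "security-group/sg-2", Tags: map[string]string{eksClusterNameTag: "eks-1"}},
	}
	err := deleteSecurityGroups(resources, func(groupID string) error {
		fmt.Println("would delete", groupID)
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

With this shape, the prefix check lives next to the code that derives the group ID, so the two cannot drift apart when someone adds a new call site inside the package.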
Really nice job, especially with all the new tests, which are complex to write! 🎉 I looked at everything, except the |
@dlipovetsky - i have updated this in line with the latest changes from the proposal (which was updated based on your suggestions). |
/test pull-cluster-api-provider-aws-e2e-eks |
22d651a to 0f6468c
Updated based on latest proposal changes....removing the extra finalizer we were using. |
/test pull-cluster-api-provider-aws-e2e-eks |
e72b63b to 12d5e8a
80a3a16 to 5cde6ae
The cleanup functions now accept the full list of AWS resources and then filter which resources they want to delete themselves. Signed-off-by: Richard Case <[email protected]>
5cde6ae to c25778c
Great work! Really cool. :) I can't seem to resolve my comments. :( So sorry about them littering too much. |
Thanks for the review @Skarlso - some great ideas on how to improve the implementation 🙇 |
Started reviewing this one, will leave comments by EOD today. |
Great work @richardcase. |
With 3 different reviews let's merge. We can make changes if needed in the future. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: richardcase The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@richardcase: #3610 failed to apply on top of branch "release-1.5":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-1.5 |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This introduces a new garbage collection service which will be used later by the controllers to do clean up of AWS resources for workload clusters that were created via the CCM. Initially, we support deleting load balancers (classic elb, nlb, alb), target groups and security groups.
The service is designed to be called by the infra cluster reconcilers when a cluster is deleted. The entry point into the service is ReconcileDelete.
When a cluster is deleted the cluster controllers will call ReconcileDelete. The first job is to determine if the workload cluster being deleted should be garbage collected. A cluster will be garbage collected if either of these is true:
- the aws.cluster.x-k8s.io/external-resource-gc annotation is absent
- the aws.cluster.x-k8s.io/external-resource-gc annotation exists and its value is set to true

If a cluster is to be garbage collected then we need to identify the AWS resources that were created by the CCM in the workload cluster. This is done by using the AWS resource tagging API and getting resources with the Kubernetes cluster owned label (i.e. kubernetes.io/cluster/[CLUSTERNAME]). The resources that are returned are converted and then the list of them is passed to the defined cleanup functions in order. The order of the functions matters because some AWS resources cannot be deleted before others. For example, target groups can only be deleted after load balancers. Each cleanup function will then iterate over the resources and decide if it wants to handle that resource and delete it from AWS.

The service will be used by a later PR to the controllers and so is initially unused.
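The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `shouldGC`, `ownedTagKey`, `resource`, `cleanupFunc` and `runCleanup` are hypothetical names, the annotation value is compared against the literal string "true", and placing security groups last in the order is an assumption (the description only states that target groups come after load balancers).

```go
package main

import "fmt"

// Annotation key quoted from the PR description.
const externalResourceGCAnnotation = "aws.cluster.x-k8s.io/external-resource-gc"

// shouldGC reports whether a cluster should be garbage collected:
// an absent annotation, or a value of "true", both enable it.
func shouldGC(annotations map[string]string) bool {
	val, found := annotations[externalResourceGCAnnotation]
	return !found || val == "true"
}

// ownedTagKey builds the tag key used to find cluster-owned resources.
func ownedTagKey(clusterName string) string {
	return "kubernetes.io/cluster/" + clusterName
}

// resource and cleanupFunc are hypothetical simplifications for this sketch.
type resource struct{ ARN string }

type cleanupFunc func(resources []resource) error

// runCleanup hands the full resource list to each cleanup function in
// order and stops at the first error.
func runCleanup(resources []resource, funcs []cleanupFunc) error {
	for _, fn := range funcs {
		if err := fn(resources); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// step returns a cleanup function that only logs; a real one would
	// filter the list and delete the resources it is responsible for.
	step := func(name string) cleanupFunc {
		return func(resources []resource) error {
			fmt.Printf("%s: inspecting %d resources\n", name, len(resources))
			return nil
		}
	}

	annotations := map[string]string{} // annotation absent
	if !shouldGC(annotations) {
		return
	}
	fmt.Println("tag filter key:", ownedTagKey("my-cluster"))

	resources := []resource{{ARN: "placeholder-arn"}}
	// Load balancers first, then target groups; security groups last is assumed.
	order := []cleanupFunc{step("load balancers"), step("target groups"), step("security groups")}
	if err := runCleanup(resources, order); err != nil {
		fmt.Println("cleanup failed:", err)
	}
}
```

Keeping the order in a plain slice makes the dependency between resource types visible in one place, which is why the order of the functions can be reviewed without reading each function.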
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Relates #1718
Special notes for your reviewer:
This is part 2 of 4 of the changes (i.e. a stack) to implement the garbage collection. This work splits up the original PR #3518.
Checklist: