feat: external load balancer garbage collection (part 1) - proposal #3609
@richardcase: This issue is currently awaiting triage. If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the tag. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
First initial read-through. :D
I will review the structure next, but I think this is looking good so far. :)
This is the proposal for the new garbage collection functionality. It was written post-implementation to make sure we captured the intent of the change and to aid with the review. Even though it's written post-implementation, it may merge first, as the gc work has been split into smaller PRs to aid review.
Signed-off-by: Richard Case <[email protected]>
23f9b93 to 91bd9f6
Initial review - looks good with minor nits.
One question - I have noticed that the terms "child cluster" and "tenant cluster" are used. Is there a reason these terms are used instead of "workload cluster", which is the term commonly used in CAPI (e.g. in the CAPI quickstart guide)?
When I looked at the original PR, I formed a different model, perhaps by mistake. At the time, I said:
The garbage collection controller always applies its finalizer, and always removes it, but it decides whether to perform garbage collection based on the annotation.
-- #3518 (comment)
This proposal describes a different mental model: The annotation controls whether the finalizer is applied.
I find this model more complex, because it allows these scenarios:
1. User creates cluster without the gc annotation.
2. Controller applies the gc finalizer.
3. User wants to disable the gc, so they add the gc annotation, its value set to `false`.
4. User deletes the cluster.
Observe that the gc runs during step 4, even though the user might believe that what they do in step 3 disables the gc.
1. User creates cluster with the gc annotation, its value set to `false`.
2. Controller does not apply the gc finalizer.
3. User wants to enable the gc, so they remove the gc annotation, or modify its value to `true`.
4. User deletes the cluster.
Observe that the gc does not run during step 4, even though the user might believe that what they do in step 3 enables the gc.
1. User creates cluster with the gc annotation, its value set to `false`.
2. Controller does not apply the gc finalizer.
3. User wants to enable the gc, so they add the gc finalizer.
4. User deletes the cluster.
Observe that the gc does run during step 4, just as the user wants, even though the annotation's value is `false`.
I think these scenarios violate the Principle of Least Surprise.
What do you think of the model I had in mind, where the controller always applies the gc finalizer but, at the time it reconciles the cluster delete, evaluates the annotation and decides how to act on the gc finalizer, i.e. whether to clean up AWS resources or to skip them?
I think another way of distinguishing the two models is this: In the proposal, the user decides whether to enable the gc when they create the cluster. In the model I describe, the user decides whether to enable the gc at any time before they delete the cluster.
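The difference between the two models can be sketched in a few lines of Go. This is only an illustration, not the CAPA implementation: the annotation key `example.com/external-resource-gc`, the `cluster` struct, and all function names are hypothetical, and the sketch assumes that a missing or unparsable annotation value means gc is enabled.

```go
package main

import (
	"fmt"
	"strconv"
)

// gcAnnotation is a hypothetical annotation key used only for this sketch.
const gcAnnotation = "example.com/external-resource-gc"

// gcWanted reports whether the annotations ask for garbage collection.
// Assumption: a missing or unparsable value means "enabled".
func gcWanted(annotations map[string]string) bool {
	v, ok := annotations[gcAnnotation]
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		return true
	}
	return enabled
}

// cluster is a stripped-down stand-in for a cluster object.
type cluster struct {
	annotations map[string]string
	gcFinalizer bool // only consulted by the create-time model
}

// createTimeModel decides once, at creation, by applying (or not) a finalizer.
func createTimeModel(c *cluster) { c.gcFinalizer = gcWanted(c.annotations) }

// runsGCCreateTime: gc runs on delete only if the finalizer was applied earlier.
func runsGCCreateTime(c *cluster) bool { return c.gcFinalizer }

// runsGCDeleteTime: gc runs on delete only if the annotation allows it right now.
func runsGCDeleteTime(c *cluster) bool { return gcWanted(c.annotations) }

func main() {
	// First scenario from the comment: created without the annotation,
	// then the user sets it to "false" before deleting the cluster.
	c := &cluster{annotations: map[string]string{}}
	createTimeModel(c)
	c.annotations[gcAnnotation] = "false"

	fmt.Println("create-time model runs gc:", runsGCCreateTime(c)) // true
	fmt.Println("delete-time model runs gc:", runsGCDeleteTime(c)) // false
}
```

In the create-time model the later annotation change is ignored, which is the surprise described above; in the delete-time model the last value the user set wins.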
Thanks @pydctw. No reason other than we use that term a lot... I have changed it to workload.
I agree this is a better model. I think we could simplify the implementation further. As we aren't using a separate controller, and the GC logic is being added to the existing reconciliation of the awsc & awsmcp, we don't specifically need another finalizer. If the gc feature is enabled, the reconcileDelete of the 2 controllers will call the gc service. The gc service will then decide at that point whether it should do GC, based on the existence of the annotation. That said, having a separate finalizer does allow us to remove it after doing gc, so that if reconcileDelete occurs again we don't try to do gc again.
596be6f to e9a07c1
@dlipovetsky - I have updated the proposal with the suggested model changes. Any chance you could take a look?
Agreed!
👍
While we can save a few "discover" API calls if we skip, I would prefer to call the gc every time; it has to be idempotent, since the controller could crash at any time.
Looks very good! Thank you for incorporating the feedback. I think not using a separate finalizer would be an improvement, so I'll wait to see if you want to make that change before giving an lgtm.
That was some very good thinking @dlipovetsky. I shall endeavour to be more vigilant. :)
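To illustrate why an idempotent gc makes the extra finalizer unnecessary, here is a small sketch of a discover-then-delete loop that is safe to run repeatedly. The `fakeCloud` type and its methods are hypothetical stand-ins for the cloud API, not real AWS SDK calls.

```go
package main

import "fmt"

// fakeCloud stands in for the cloud API. Keys are load balancer names,
// values are the name of the cluster that owns them.
type fakeCloud struct {
	loadBalancers map[string]string
}

// discover returns the load balancers that still exist for a cluster.
func (f *fakeCloud) discover(cluster string) []string {
	var found []string
	for name, owner := range f.loadBalancers {
		if owner == cluster {
			found = append(found, name)
		}
	}
	return found
}

// remove deletes a load balancer; removing a missing one is a no-op.
func (f *fakeCloud) remove(name string) {
	delete(f.loadBalancers, name)
}

// garbageCollect deletes whatever discover still finds and returns how
// many resources it removed. A second run finds nothing and does nothing.
func garbageCollect(c *fakeCloud, cluster string) int {
	removed := 0
	for _, name := range c.discover(cluster) {
		c.remove(name)
		removed++
	}
	return removed
}

func main() {
	cloud := &fakeCloud{loadBalancers: map[string]string{
		"lb-a": "workload-1",
		"lb-b": "workload-1",
		"lb-c": "workload-2",
	}}

	fmt.Println("first run removed:", garbageCollect(cloud, "workload-1"))  // 2
	fmt.Println("second run removed:", garbageCollect(cloud, "workload-1")) // 0
}
```

Because each run starts from a fresh discovery, a crash between two deletions only means the next reconcile picks up the remaining resources.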
e9a07c1 to 37cb3fe
@dlipovetsky @Skarlso - I have now removed the use of the additional finalizer based on the discussion above.
I'll re-read the entire thing in a bit with that in mind.
Seems like you have the same CI problem as this PR: #3483
/retest
/test pull-cluster-api-provider-aws-test
This proposal has been changed so that a user can decide whether a workload cluster opts in or out of garbage collection at any time before it's deleted. The default is that, if the gc feature is enabled, a workload cluster will be garbage collected. This change is based on feedback from review. To quote dlipovetsky: "In the proposal, the user decides whether to enable the gc when they create the cluster. In the model I describe, the user decides whether to enable the gc at any time before they delete the cluster."
Signed-off-by: Richard Case <[email protected]>
37cb3fe to 4d18d09
Thanks for a great proposal!
/lgtm
I just had one more thought: Would we ever want the user to choose which resources are garbage collected? If so, we may want to include the resource category in the annotation key. For example,
Since this is an experimental feature, I think we can make this change later, if necessary.
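As a purely hypothetical illustration of per-category annotation keys, the sketch below parses keys of the form `<prefix><category>` into a map of category to enabled flag. The prefix `example.com/external-resource-gc-` and the category names are invented for this example; they are not taken from the proposal or from CAPA.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gcPrefix is a hypothetical annotation key prefix; the part of the key
// after it names the resource category.
const gcPrefix = "example.com/external-resource-gc-"

// gcCategories collects the per-category gc settings from the annotations.
// Keys without the prefix and values that are not booleans are ignored.
func gcCategories(annotations map[string]string) map[string]bool {
	out := map[string]bool{}
	for key, value := range annotations {
		if !strings.HasPrefix(key, gcPrefix) {
			continue
		}
		enabled, err := strconv.ParseBool(value)
		if err != nil {
			continue
		}
		out[strings.TrimPrefix(key, gcPrefix)] = enabled
	}
	return out
}

func main() {
	settings := gcCategories(map[string]string{
		gcPrefix + "load-balancer":  "true",
		gcPrefix + "security-group": "false",
		"unrelated/annotation":      "true",
	})
	fmt.Println("load-balancer:", settings["load-balancer"])   // true
	fmt.Println("security-group:", settings["security-group"]) // false
	fmt.Println("categories found:", len(settings))            // 2
}
```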
Yes, I think we would, and I have created #3541 to look at this type of customization.
Thanks everyone for taking the time to review. And special thanks to @dlipovetsky for his diligence in the review and for helping to drive us to a better overall solution 🙇
/approve
And we will want all GC changes in 1.5, so:
/cherry-pick release-1.5
@richardcase: once the present PR merges, I will cherry-pick it on top of release-1.5 in a new PR and assign it to you. In response to this:
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: richardcase
The full list of commands accepted by this bot can be found here. The pull request process is described here.
@richardcase: new pull request created: #3625 In response to this:
Very nice work! Sorry I didn't get to re-reading it. :/
What type of PR is this?
/kind documentation
What this PR does / why we need it:
This is the proposal for the new garbage collection functionality. It was written post implementation to make sure we captured the intent of the change and to aid with the review.
Even though it's written post-implementation, it may merge first, as the gc work has been split into smaller PRs to aid review.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Relates #1718
Special notes for your reviewer:
This is part 1 of 4 of the changes (i.e. a stack) to implement the garbage collection. This work splits up the original PR #3518.