-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Allow machine controller to function without the need of a cluster object #644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The alternative is to integrate with the Cluster Registry. We would need a spec and status... |
Not sure I understand how integrating with the cluster registry would work in this context. Can you please explain? The idea here is to push the responsibility of checking whether the cluster object is nil to the provider specific machine actuator. This allows the machine actuator to take any actions necessary without being completely blocked. |
Fair enough. Please ignore my comment. Cluster status is an orthogonal issue. Speaking with folks at KubeCon there multiple implementations which do not rely on the Cluster CRD/Controller. And without this change, if a user deletes the Cluster before the Machines they cannot delete the Machines, which results in a poor user experience. |
/lgtm |
Does this change mean that actuators need to properly handle a nil cluster (which was not previously the case)? I think that should be noted as a release note (since the pre-1.0 releases are typically consumed by providers, not by end users). |
Added a release note. |
} | ||
|
||
// TODO: Assume single master for now. | ||
// TODO: Assume we never change the role for the machines. (Master->Node, Node->Master, etc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've lost these comments, but I agree I don't really see why they belong here
pkg/controller/machine/controller.go
Outdated
|
||
cluster, err := r.getCluster(ctx, m) | ||
if err != nil { | ||
klog.Warningf("Cluster not found, machine creation might fail: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message is going to get annoying for providers that don't need a cluster!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the provider should be the one to determine what a nil Cluster reference means for their implementation. Also, at present, it is not just Create() that will fail. Update() and Delete() will too...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Increased the verbosity of the log message.
pkg/controller/machine/controller.go
Outdated
@@ -185,35 +183,6 @@ func (r *ReconcileMachine) Reconcile(request reconcile.Request) (reconcile.Resul | |||
return reconcile.Result{}, nil | |||
} | |||
|
|||
func (c *ReconcileMachine) create(ctx context.Context, machine *clusterv1.Machine) error { | |||
cluster, err := c.getCluster(ctx, machine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup!
This PR was discussed in the meeting. |
An alternative to this change is for Providers to create/maintain the Cluster CRD even if they do not use the Cluster object. This would be an ongoing burden to some Providers, as opposed a one-time change to code that all Providers have to make to handle nil Cluster objects. Even if this change lands, we still have the option of evaluating the Machine actuator interface in the future. If there are no objections from Providers, I think this is ok. |
I'd prefer not to force providers to create / maintain a dummy cluster object just to keep the code the same. There was also the point above about deletion order, and I think most providers should be able to handle the delete flow for a machine if the cluster reference is nil. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me.
I think the change here is pretty small and allows more providers to operate without a Cluster object. For long term solutions we should consolidate ideas and use cases in #639.
/assign @sidharthsurana I think you had the most objections during the meeting yesterday. |
@roberthbailey Thanks a lot for providing me a chance to respond here :) |
What we discussed today was that making it optional makes it possible for implementations that don't need a cluster object (such as existing systems like kops) to adopt machines without needing to create a fake cluster object just to pass into the actuator to make it work. I agree that for many of the existing implementations, create won't work if nil is passed in. |
+1 for this change, Here i have started making cluster object optional in aws actuator, though WIP at the moment. I am testing these changes. Meanwhile early comments are welcome. |
/test pull-cluster-api-test |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: maisem, roberthbailey The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I think we have general consensus that we should merge this. |
Release note: