Skip to content

Allow machine controller to function without the need of a cluster object #644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 17, 2019

Conversation

maisem
Copy link

@maisem maisem commented Dec 18, 2018

Release note:

The machine controller can now call the machine actuators with a nil cluster object.
Provider specific actuators should be updated to handle that case.
The primary motivation is to have allow machine reconcilation in the absence of cluster objects.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 18, 2018
@davidewatson
Copy link
Contributor

The alternative is to integrate with the Cluster Registry. We would need a spec and status...

@maisem
Copy link
Author

maisem commented Dec 18, 2018

Not sure I understand how integrating with the cluster registry would work in this context. Can you please explain?

The idea here is to push the responsibility of checking whether the cluster object is nil to the provider specific machine actuator. This allows the machine actuator to take any actions necessary without being completely blocked.

@enxebre enxebre mentioned this pull request Dec 18, 2018
@davidewatson
Copy link
Contributor

davidewatson commented Dec 18, 2018

Fair enough. Please ignore my comment. Cluster status is an orthogonal issue.

Speaking with folks at KubeCon there multiple implementations which do not rely on the Cluster CRD/Controller. And without this change, if a user deletes the Cluster before the Machines they cannot delete the Machines, which results in a poor user experience.

@davidewatson
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 18, 2018
@davidewatson davidewatson mentioned this pull request Dec 18, 2018
@roberthbailey
Copy link

Does this change mean that actuators need to properly handle a nil cluster (which was not previously the case)? I think that should be noted as a release note (since the pre-1.0 releases are typically consumed by providers, not by end users).

@maisem
Copy link
Author

maisem commented Dec 18, 2018

Does this change mean that actuators need to properly handle a nil cluster (which was not previously the case)? I think that should be noted as a release note (since the pre-1.0 releases are typically consumed by providers, not by end users).

Added a release note.

}

// TODO: Assume single master for now.
// TODO: Assume we never change the role for the machines. (Master->Node, Node->Master, etc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've lost these comments, but I agree I don't really see why they belong here


cluster, err := r.getCluster(ctx, m)
if err != nil {
klog.Warningf("Cluster not found, machine creation might fail: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is going to get annoying for providers that don't need a cluster!

Copy link
Contributor

@davidewatson davidewatson Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the provider should be the one to determine what a nil Cluster reference means for their implementation. Also, at present, it is not just Create() that will fail. Update() and Delete() will too...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increased the verbosity of the log message.

@@ -185,35 +183,6 @@ func (r *ReconcileMachine) Reconcile(request reconcile.Request) (reconcile.Resul
return reconcile.Result{}, nil
}

func (c *ReconcileMachine) create(ctx context.Context, machine *clusterv1.Machine) error {
cluster, err := c.getCluster(ctx, machine)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup!

@dlipovetsky
Copy link
Contributor

This PR was discussed in the meeting.

@dlipovetsky
Copy link
Contributor

An alternative to this change is for Providers to create/maintain the Cluster CRD even if they do not use the Cluster object. This would be an ongoing burden to some Providers, as opposed a one-time change to code that all Providers have to make to handle nil Cluster objects.

Even if this change lands, we still have the option of evaluating the Machine actuator interface in the future.

If there are no objections from Providers, I think this is ok.

@roberthbailey
Copy link

I'd prefer not to force providers to create / maintain a dummy cluster object just to keep the code the same. There was also the point above about deletion order, and I think most providers should be able to handle the delete flow for a machine if the cluster reference is nil.

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 from me.

I think the change here is pretty small and allows more providers to operate without a Cluster object. For long term solutions we should consolidate ideas and use cases in #639.

@roberthbailey
Copy link

roberthbailey commented Dec 20, 2018

/assign @sidharthsurana

I think you had the most objections during the meeting yesterday.

@sidharthsurana
Copy link
Contributor

@roberthbailey Thanks a lot for providing me a chance to respond here :)
Looking at this change per say, I am okay with it. However, going over the past conversation involving this PR, I believe my main objection was around the fact that if we make cluster optional, what happens to the information that is indeed needed from the cluster object. For e.g. in the vsphere provider implementation the vCenter server URL and credentials are part of the cluster definition. Thus if the cluster object is missing, then the current vsphere implementation won't work. During the previous meeting the idea of defining a new object that would concretely hold all the common cluster level information (connection info, cloud provider configuration, etc.) was also brought up which I supported. Once something like this exists then the cluster object can truly be optional.

@roberthbailey
Copy link

What we discussed today was that making it optional makes it possible for implementations that don't need a cluster object (such as existing systems like kops) to adopt machines without needing to create a fake cluster object just to pass into the actuator to make it work. I agree that for many of the existing implementations, create won't work if nil is passed in.

@vikaschoudhary16
Copy link
Contributor

+1 for this change, Here i have started making cluster object optional in aws actuator, though WIP at the moment. I am testing these changes. Meanwhile early comments are welcome.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 17, 2019
@roberthbailey
Copy link

/test pull-cluster-api-test

@roberthbailey
Copy link

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 17, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maisem, roberthbailey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 17, 2019
@roberthbailey
Copy link

I think we have general consensus that we should merge this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants