Separate Machine APIs from Cluster APIs #490

Closed · 6 tasks
enxebre opened this issue Sep 5, 2018 · 19 comments

Labels
  • area/api: Issues or PRs related to the APIs
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@enxebre (Member) commented Sep 5, 2018

I'd like to bring up a conversation about evolving the Machine API at a separate cadence from the Cluster API. This ticket aims to facilitate and track a discussion around beta requirements for the machine API, distinct from the beta requirements for the Cluster API project as a whole. A beta API for machines should enable provisioning and deprovisioning machines via a Kubernetes-style API, support basic health management, and define clear policies for handling scale up and down of sets.

Proposed changes:

  • Evolve the Machine API at a separate cadence from Cluster API
  • Split the machine types into a separate API group (machines.k8s.io) from the cluster types (cluster.k8s.io) in order to support serving only the types in that group and to enable gradual adoption of those types in more deployment environments.
  • Machine health checking
  • Machine deletion strategy
  • Make machine.spec.versions and machine.status.versions optional fields
  • Remove the cluster assumption from actuators: move from a (Cluster, Machine) tuple to something like (Context, Machine); see the sketch below
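
For illustration, a minimal sketch of what a cluster-agnostic (Context, Machine) actuator interface could look like. Every name in it (MachineContext, the Machine stub, the field names) is hypothetical, not a released Cluster API type:

```go
// Package actuator sketches the proposed cluster-agnostic interface.
package actuator

import "context"

// Machine is a stand-in for the Machine type that would live in the
// proposed machines.k8s.io API group.
type Machine struct {
	Name         string
	ProviderSpec []byte // provider-specific configuration, opaque to the controller
}

// MachineContext carries whatever environment information an actuator needs
// without assuming a Cluster object exists, e.g. provider-level config that
// today comes from the Cluster's ProviderConfig/ProviderStatus.
type MachineContext struct {
	ProviderConfig []byte
	ProviderStatus []byte
}

// Actuator is the proposed (Context, Machine) shape replacing today's
// (Cluster, Machine) tuple.
type Actuator interface {
	Create(ctx context.Context, mctx *MachineContext, machine *Machine) error
	Delete(ctx context.Context, mctx *MachineContext, machine *Machine) error
	Update(ctx context.Context, mctx *MachineContext, machine *Machine) error
	Exists(ctx context.Context, mctx *MachineContext, machine *Machine) (bool, error)
}
```

Passing an explicit context struct instead of the full Cluster object would let deployments that have no Cluster at all still satisfy the interface.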

@dlipovetsky (Contributor) commented Sep 5, 2018

I'd like to understand what the Context object would be. If it does not have all the fields from the Cluster object, will it be able to provide the information needed by all the different machine actuators?

@dlipovetsky (Contributor) commented Sep 5, 2018

If we want to remove the cluster assumption from the machine actuator, there is an alternative to introducing a Context object: We can change the machine controller so that it passes only the Machine object to the actuator.

The actuator would need to identify and get the Cluster object from the API (both are done by the machine controller today). To identify it, the actuator could follow a reference to the Cluster object (adding this reference to the Machine object has been discussed in #41 and #252).
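
A minimal sketch of that alternative, assuming the cluster.k8s.io/v1alpha1 types of that era and a controller-runtime client; the cluster-name label used as the link is an assumed convention here, since the actual reference field was still under discussion in #41 and #252:

```go
package actuator

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/types"
	clusterv1 "sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// resolveCluster looks up the Cluster object a Machine references, if any.
// Returning (nil, nil) keeps cluster-less deployments working.
func resolveCluster(ctx context.Context, c client.Client, m *clusterv1.Machine) (*clusterv1.Cluster, error) {
	name := m.Labels["cluster.k8s.io/cluster-name"] // assumed linking convention
	if name == "" {
		return nil, nil // no cluster reference; the actuator must cope without one
	}
	cluster := &clusterv1.Cluster{}
	key := types.NamespacedName{Namespace: m.Namespace, Name: name}
	if err := c.Get(ctx, key, cluster); err != nil {
		return nil, fmt.Errorf("machine %q references cluster %q: %v", m.Name, name, err)
	}
	return cluster, nil
}
```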

@detiber (Member) commented Sep 5, 2018

> If we want to remove the cluster assumption from the machine actuator, there is an alternative to introducing a Context object: We can change the machine controller so that it passes only the Machine object to the actuator.

I also wonder how much this actually accomplishes. Considering that the provider implementations are still very much in flux, it seems like this Context object would still need to pass in the full ProviderConfig and ProviderStatus from the Cluster object to ensure that actuator implementations are able to retrieve whatever config/status they require.

@enxebre (Member Author) commented Sep 10, 2018

@dlipovetsky that sounds reasonable.

The Machine API seems close to feature complete. Having a dedicated API group and dropping the controller coupling will likely allow us to evolve it more quickly and make it consumable by tools that need beta status, which will also favour adoption of the project.

cc @dgoodwin @csrwng wdyt?

@csrwng (Contributor) commented Sep 10, 2018

> Remove the cluster assumption from the actuator interface
>
> Separately keep discussion on how this fetch/link would happen.

@enxebre I agree, this would allow us to separate concerns.

Given that the general agreement on referencing the cluster from the machine resource was to make it an optional reference, it makes sense that the actuator implementation be where the link is made if necessary.

@hardikdr (Member) commented Sep 14, 2018

IMHO we should try to avoid referencing the cluster object from the actuator interface in the machine controller: ref
I can see the following supporting points:

  • Users might actually want to use the machine-api stack (the machine-* controllers) independent of the cluster controller.

    • I recall this being discussed when someone asked about running the machine controller against an existing GKE cluster without worrying about a cluster object.
  • From a design standpoint, it would be nice to expect each controller to only use objects from the same or a lower layer: downward dependency only.

    • For instance, the machineset controller should have visibility of only Machine and MachineSet objects.
    • This would also let us fine-grain the responsibilities across controller layers and give users the freedom to choose a very specific subset of controllers if they don't intend to use the higher-level ones.

Regarding the overall flow, we could expect the cluster controller to generate the necessary MachineDeployment and MachineClass objects with a fully-contained providerConfig. The MachineDeployment controller would then process the MachineDeployment and generate MachineSets, which in turn generate Machines.
The actuator interface would only require the MachineClass/ProviderConfig and the Machine object as parameters, which should be sufficient for the actual machine lifecycle (see the sketch below).
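
For illustration, a sketch of that lowest-layer actuator shape, parameterized only by the MachineClass (carrying the providerConfig) and the Machine. The interface itself is hypothetical; the type names assume the cluster.k8s.io/v1alpha1 API of that era:

```go
package actuator

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"
)

// ClassBasedActuator is a hypothetical lowest-layer interface: enough to
// drive the machine lifecycle without any Cluster object in scope.
type ClassBasedActuator interface {
	Create(ctx context.Context, class *clusterv1.MachineClass, machine *clusterv1.Machine) error
	Delete(ctx context.Context, class *clusterv1.MachineClass, machine *clusterv1.Machine) error
	Update(ctx context.Context, class *clusterv1.MachineClass, machine *clusterv1.Machine) error
	Exists(ctx context.Context, class *clusterv1.MachineClass, machine *clusterv1.Machine) (bool, error)
}
```

A MachineSet controller built this way only ever reads MachineSets and Machines, matching the downward-dependency rule above.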

@derekwaynecarr (Contributor) commented:

@enxebre we also need to determine how we coordinate draining nodes prior to removing the machine. Not having a consistent drain will cause problems with stateful applications.

@roberthbailey commented:

@derekwaynecarr - I believe that the drain logic is intended to be part of the common code in the machine controller, with the goal of making it consistent. I don't recall if it's in there yet or not.

@enxebre (Member Author) commented Oct 1, 2018

@roberthbailey @derekwaynecarr we are currently patching the core machine controller to drain the node on a delete event. For this to work we also run a custom controller that links nodes with machines and populates machine.Status.NodeRef so the right node can be drained, hence the need for #497. A sketch of that drain step follows below.
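
For illustration, a minimal sketch of that drain-on-delete step, assuming machine.Status.NodeRef is already populated (the reason for #497). It uses the k8s.io/kubectl/pkg/drain helpers rather than the patched controller code, and field names vary across client versions:

```go
package machinecontroller

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
	clusterv1 "sigs.k8s.io/cluster-api/pkg/apis/cluster/v1alpha1"
)

// drainNodeForMachine cordons and drains the node linked to the machine
// before the actuator deletes the backing instance.
func drainNodeForMachine(ctx context.Context, kubeClient kubernetes.Interface, machine *clusterv1.Machine) error {
	if machine.Status.NodeRef == nil {
		return nil // nothing to drain; no node was ever linked to this machine
	}
	helper := &drain.Helper{
		Ctx:                 ctx,
		Client:              kubeClient,
		Force:               true, // also evict pods not owned by a controller
		IgnoreAllDaemonSets: true,
		GracePeriodSeconds:  -1, // respect each pod's own grace period
		Timeout:             5 * time.Minute,
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}
	node, err := kubeClient.CoreV1().Nodes().Get(ctx, machine.Status.NodeRef.Name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Cordon first so no new pods land on the node, then evict what is there.
	if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
		return err
	}
	return drain.RunNodeDrain(helper, node.Name)
}
```

Running a step like this before calling the actuator's Delete keeps eviction behaviour consistent regardless of provider.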

@roberthbailey commented:

The drain logic definitely isn't in the machine controller yet. We should add it. I might be able to put together a PR for this in the next week or so, since we have drain logic floating around in a number of places already.

@enxebre (Member Author) commented Nov 2, 2018

@roberthbailey we've been patching the core controller to include the logic here: https://github.com/openshift/cluster-api-provider-aws/pull/60/files
I'm happy to create a similar PR here if there's interest. We should probably also consider consuming this logic from a common place, e.g. something similar to https://github.com/openshift/kubernetes-drain

@roberthbailey commented:

Ideally we will get server-side drain at some point... I'll take a look at the OS drain code; it's probably similar to what we have and is already an easily consumable package.

enxebre mentioned this issue Dec 18, 2018
roberthbailey changed the title from "machine api beta - exit criteria" to "Separate Machine APIs from Cluster APIs" Jan 11, 2019
roberthbailey added this to the Next milestone Jan 11, 2019
roberthbailey added the priority/important-longterm label Jan 11, 2019
@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Apr 28, 2019
@vincepri (Member) commented:

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Apr 28, 2019
@vincepri (Member) commented:

/area api

k8s-ci-robot added the area/api label Jun 10, 2019
@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Sep 8, 2019
@vincepri (Member) commented:

Closing this issue given that the project's scope and objectives specify that this is out of scope.

/close

@k8s-ci-robot (Contributor) commented:

@vincepri: Closing this issue.

In response to this:

> Closing this issue given that the project's scope and objectives specify that this is out of scope.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
