# Cluster-API Provisioning Mechanism Consolidation Proposal

## Abstract

Cluster-API requires implementers to implement provisioning mechanisms (bootstrap scripts and the logic for running those scripts) themselves. However, the mechanism, including the bootstrap scripts, is the same or very similar for all cloud providers. Because implementing it is a time-consuming task, providing a reusable provisioning mechanism in the upstream Cluster-API would be very useful.

## Proposal

The upstream Cluster-API has neither bootstrap scripts nor a wrapper around kubeadm for handling cluster operations (initialization and join). This means implementers have to write the bootstrap scripts used by a specific Cluster-API provider and handle cluster initialization and node joining manually. For most cloud providers, both the bootstrap scripts and the steps for cluster initialization/joining are the same, except for provider specifics that could be supplied via template variables.

The bootstrap scripts usually do the following tasks (a sketch of such a template follows the list):

* Install dependencies required by Kubernetes
  * Includes dependencies such as socat and ebtables, which are the same for all providers.
* Install a container runtime
  * The steps for installing a container runtime such as Docker or containerd are the same for all providers.

> @roberthbailey: For a specific runtime. But different providers may choose not to implement support for all runtimes. If this code is standardized, does that imply a burden of supporting all runtimes that are upstream on a provider?
> @alvaroaleman: I think we will provide just one script for one runtime that works; if people want something else, they can choose not to use that script.

* Install Kubernetes packages: kubeadm, kubelet, kubectl, and kubernetes-cni
  * The required packages and the steps to install them are the same for all providers.
* Configure the kubelet
  * There are several provider-specific differences, but they can be handled via templating.
  * The kubeadm configuration file can handle some of the kubelet configuration tasks.
* Initialize the cluster or join a node to the cluster
  * Should be done using kubeadm configuration files written by the implementer or by the Cluster-API provider user.
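
To make the shared tasks above concrete, the following is a minimal sketch of what such a reusable template could look like for an Ubuntu worker node using Docker. The package list, repository setup, placeholder names ({{ .KubernetesVersion }}, {{ .CNIVersion }}), and the kubeadm configuration path are illustrative assumptions, not part of the proposal.

```go
package userdata

// ubuntuBootstrapTemplate is an illustrative bootstrap template. Provider- and
// machine-specific values are filled in through Go template placeholders.
const ubuntuBootstrapTemplate = `#!/bin/bash
set -euo pipefail

# Dependencies required by Kubernetes (the same for all providers).
apt-get update
apt-get install -y apt-transport-https curl socat ebtables

# Container runtime.
apt-get install -y docker.io

# Kubernetes packages from the upstream apt repository.
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y \
  kubelet={{ .KubernetesVersion }}-00 \
  kubeadm={{ .KubernetesVersion }}-00 \
  kubectl={{ .KubernetesVersion }}-00 \
  kubernetes-cni={{ .CNIVersion }}-00

# Join the cluster; the kubeadm configuration file is supplied by the user.
kubeadm join --config /etc/kubernetes/kubeadm-node.yaml
`
```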

Implementing the provisioning scripts in the upstream Cluster-API ensures that:

* Cluster-API implementers can get started easily, as they don't need to implement complex provisioning logic and scripts for many different setups and operating systems.

> @roberthbailey: How many variations do you expect we will maintain in the cluster-api repo?
> @xmudrii: I'm not expecting too many variations. For now we should go with a basic single-master setup with kubeadm. Regarding Kubernetes versions, it depends on how big the differences are and how much hassle it is to maintain them. Regarding operating systems, Ubuntu is the most popular one and should be supported; besides it, we can optionally choose one more.

* The bootstrap scripts are located in one place, which makes them easier to maintain, distribute, and test.

### Example use case

Han wants to write a new Cluster-API provider for his employer's cloud solution. Han knows the only difference is the cloud provider's API, hence Han only wants to implement those API calls. Because the Cluster-API provides templates for the actual provisioning, Han can easily import and re-use them and safely assume they will work, as they are well tested.

> @roberthbailey: How will we test that a change upstream to these scripts will work on all providers next time they vendor the latest upstream repo? And I'm not sure Han can assume they will "just work" because with required templating Han will need to pass values that are unique to the cloud environment.
> @alvaroaleman: So IMHO what we can test on the cluster-api main repo is that these scripts work when

## Goals

* Cluster-API provides re-usable provisioning scripts for bootstrapping a cluster on the most popular operating systems (Ubuntu, CentOS, and CoreOS).

> @roberthbailey: It feels like the scripts have two jobs that we could maybe separate. There is the "install stuff" step, which I think could be argued is mostly the same across providers, and then there is the "configure and run stuff" part, which either needs heavy templating or shouldn't be shared. If we focused on the first part, how much would that go towards solving the issues you are hoping to address?
> @roberthbailey: This is still my biggest outstanding question about this proposal. Re-reading the doc, it seems like we are going to end up with a lot of templating for the second part, which may not make it particularly easy for "Han" to pick up the scripts and just use them in a new environment.
> I have the same impression that it would be desirable to split this into two actions, to make it easier to reuse the install step and leave the configure part to each provider.

* Provisioning scripts are well tested.

## Non-Goals

* Implement a mechanism for executing the bootstrap scripts that is usable in all scenarios. This is not the subject of this proposal; a follow-up proposal will be written soon.

## Implementation

### Scope of Implementation

The provisioning scripts are supposed to perform the actions described in the Proposal section: install dependencies, a container runtime, and the Kubernetes packages, and configure the kubelet and the cloud provider. Initializing the Kubernetes cluster and joining nodes to the cluster may or may not be part of those scripts.

Cluster initialization and node joining are going to be configured using kubeadm configuration files, which provide a large set of options for configuring the cluster and all relevant cluster components (such as kube-controller-manager, kube-proxy, etc.). Cluster-API users would write the kubeadm configuration files themselves and pass them as a ConfigMap/Secret or pass them to the function for generating userdata.
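
The sketch below illustrates the second option, passing a user-written kubeadm configuration to the userdata-generating function; the apiVersion, field values, and file path are assumptions shown only for illustration.

```go
package userdata

// exampleKubeadmConfig is an example of a kubeadm configuration that a
// Cluster-API user might write and pass in as a ConfigMap/Secret or as a
// plain string parameter. The apiVersion and values are illustrative.
const exampleKubeadmConfig = `apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.0
networking:
  podSubnet: 192.168.0.0/16
`

// masterInitSnippet writes the user-supplied configuration to disk and hands
// it to kubeadm unchanged. It would be rendered with text/template, as shown
// under Implementation details.
const masterInitSnippet = `cat > /etc/kubernetes/kubeadm-config.yaml <<'EOF'
{{ .KubeadmConfig }}
EOF
kubeadm init --config /etc/kubernetes/kubeadm-config.yaml
`
```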

Installing any other packages or manifests, including CNI, is in the scope of neither those scripts nor Cluster-API. Cluster-API is not supposed to be a package manager, so everything else that is needed should be installed as an add-on.

### Implementation Prerequisites

1. clusterctl should be able to install CNI. This was discussed at the Cluster-API Breakout meeting held on 10/10. [Relevant issue #534](https://github.com/kubernetes-sigs/cluster-api/issues/534)

### Implementation details

The implementation will be based on [Go Templates](https://golang.org/pkg/text/template/). That allows implementers to source environment variables, pass data such as IP addresses, host names, and kubeadm tokens, and configure the kubelet and cloud provider, without any change to the Cluster-API source code.
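
As a minimal sketch of this templating mechanism (assuming hypothetical variable names such as MachineName, MasterIP, and BootstrapToken that a provider actuator would supply), a template can be parsed and executed with the standard text/template package:

```go
package userdata

import (
	"bytes"
	"text/template"
)

// nodeJoinTemplate is a tiny illustrative template; a real template would be
// the full bootstrap script described above.
const nodeJoinTemplate = `hostnamectl set-hostname {{ .MachineName }}
kubeadm join {{ .MasterIP }}:6443 --token {{ .BootstrapToken }} --discovery-token-unsafe-skip-ca-verification
`

// nodeJoinData carries the provider- and machine-specific values.
type nodeJoinData struct {
	MachineName    string
	MasterIP       string
	BootstrapToken string
}

// Render executes the template with the given data and returns the userdata.
func Render(data nodeJoinData) (string, error) {
	t, err := template.New("node-join").Parse(nodeJoinTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```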

A similar approach is used by many projects and providers, including:

* [Cluster-API provider for DigitalOcean](https://github.com/kubermatic/cluster-api-provider-digitalocean/blob/master/cloud/digitalocean/actuators/machine/userdata.go)
* [Cluster-API provider for GCP](https://github.com/kubernetes-sigs/cluster-api-provider-gcp/blob/master/pkg/cloud/google/metadata.go)
* [Cluster-API provider for AWS](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/4553a80b6337b4adcc378c07db943772d30fbc78/pkg/cloud/aws/services/ec2/bastion.go)
* [Cluster-API provider for vSphere](https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/master/cloud/vsphere/provisioner/common/templates.go)

The implementation will be realized using interfaces: for each supported operating system, the interface will be implemented. The interface has two functions, one for getting userdata for master instances and one for workers.

> @sidharthsurana: But would this mean that we are putting the Go templates inside the Go code?
> Yes, that is something we will always have to do, because there are some fields that are unique per machine, e.g. a join token or certificates. The idea to use a CRD to make adjusting them easier is probably a good one, but I think it would be a mechanism that would then work on top of what is proposed here.
> Well, it may not be necessary. The way I am thinking about it, the CRs for the custom CRD above would be a Go template with placeholders like "join token", "certificate", etc. The machine controller would read the specified CR and mash that template with the runtime information (via the machineSpec, clusterSpec, etc.) to render the final script to be used. The point I am trying to get to is: if we can avoid changing the Go code every time we want to customize that script, that would be great and make this much more useful.
> @roberthbailey: The GCP implementation moved towards storing these scripts in a config map (to make them easier to change). There is also the option to embed them into a bundle, which might make it easier to share them across providers.
> I find the idea of moving the scripts into config maps (or better, a CRD) more appealing, because it will offer a standard operational interface (how to set up/update configuration) without making any strong assumption about the specific OS or cloud platform to be supported. IMHO this should be the main goal: define the operational interface. Another example of this approach, besides the GCP one previously mentioned, is Gardener's Operating System controller used to set up CoreOS nodes. Offering a reference implementation is great, but I don't see it as the biggest benefit. This may be because, in my case, I plan to use a different distribution than suggested (Ubuntu) and it is unlikely I could reuse the scripts.
> I second this. Let's figure out the "standard operational interface (how to setup/update configuration)."

Both functions should take the following arguments:

* Environment variables, provided as a string map and then parsed and appended at the top of the script.
* Kubernetes version, including the kubelet, kubeadm, and kubernetes-cni versions.
* Kubelet and cloud provider configuration files.
  * To be discussed: how we want to provide those.
* Kubeadm configuration file.
  * Do we want to be able to pass it to clusterctl, as a Secret/ConfigMap, or as a function parameter?

```go
// Userdata returns rendered bootstrap userdata for a supported operating system.
type Userdata interface {
	// MasterUserData returns the userdata used to provision a master instance.
	MasterUserData(envVars map[string]string, kubeVersion string, cniVersion string) (string, error)
	// NodeUserData returns the userdata used to provision a worker node.
	NodeUserData(envVars map[string]string, kubeVersion, cniVersion string) (string, error)
}
```
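
As a rough illustration, the interface above could be satisfied per operating system roughly as follows; the ubuntu type, the helper function, and the template contents are hypothetical and only sketch the intended structure.

```go
package userdata

import (
	"bytes"
	"text/template"
)

// ubuntu is a hypothetical implementation of the Userdata interface for one
// supported operating system.
type ubuntu struct{}

// userdataInput groups everything the templates may reference.
type userdataInput struct {
	EnvVars           map[string]string
	KubernetesVersion string
	CNIVersion        string
}

func (ubuntu) MasterUserData(envVars map[string]string, kubeVersion string, cniVersion string) (string, error) {
	return renderUserdata(ubuntuMasterTemplate, userdataInput{EnvVars: envVars, KubernetesVersion: kubeVersion, CNIVersion: cniVersion})
}

func (ubuntu) NodeUserData(envVars map[string]string, kubeVersion, cniVersion string) (string, error) {
	return renderUserdata(ubuntuNodeTemplate, userdataInput{EnvVars: envVars, KubernetesVersion: kubeVersion, CNIVersion: cniVersion})
}

// renderUserdata parses and executes a bootstrap template with the given input.
func renderUserdata(tmpl string, input userdataInput) (string, error) {
	t, err := template.New("userdata").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, input); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// Placeholder templates: environment variables are exported at the top so the
// rest of the script can reference them. Real templates would contain the
// full bootstrap scripts discussed earlier.
const ubuntuMasterTemplate = `#!/bin/bash
{{ range $k, $v := .EnvVars }}export {{ $k }}="{{ $v }}"
{{ end }}echo "provisioning master: Kubernetes {{ .KubernetesVersion }}, CNI {{ .CNIVersion }}"
`

const ubuntuNodeTemplate = ubuntuMasterTemplate
```

A provider actuator would then pick the implementation for the configured operating system and call MasterUserData or NodeUserData with the machine-specific values.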

### Decisions to be made

* What Kubernetes versions should be supported?
  * Kubernetes supports three versions at the same time.

> @roberthbailey: I just reviewed kubernetes-sigs/cluster-api-provider-gcp#54 where we updated bootstrap scripts to a different k8s version and the scripts had to be changed. And if we look at the history of kubeadm, I expect constant changes in the flags / configuration that we need to pass into kubeadm itself as well as the system components. We are constantly adding new flags / behavior to k8s that needs to be configured as we move to new releases. That needs to be reflected somewhere in the configuration (at least for the control plane components).

  * It should be possible to provide scripts for all three versions. Usually there are no changes in the bootstrap scripts between older and newer versions.
* What operating systems should be supported? The following three are supported by most cloud providers:
  * Ubuntu 16.04 and Ubuntu 18.04: very popular and very widely used.
  * Container Linux
  * CentOS 7
* What container engine should be set up by the scripts?
  * Docker: very popular and very widely used in production.
  * containerd: a newer solution that seems to work well. Used by the AWS provider.
  * Opinion: we should choose one and stay with it.

> @sflxn: Where would we keep scripts used to set up other container runtimes? I assume you're not suggesting we just choose one and only support that, right? Strongly isolated container runtimes are coming.
> @roberthbailey: Also, even if we pick one today, we will need a way to upgrade to a newer version or a different one in the future. For the GCP provider we will want to track what the node folks qualify on GCE/GKE even if that isn't what is being tested on other environments.
> @alvaroaleman: The idea here would be to provide something that "just works" and can be used as a default; if people decide they want another runtime, they can choose not to use that script or add another script template for that other runtime.

## Alternatives Considered

* Custom images
  * Positive sides:
    * Cluster-API implementers can build images and redistribute them to end users.
    * Images have all components preinstalled and preconfigured. Usually no further interaction is needed from the user's side.
    * It is possible to use various frameworks and utilities for building, versioning, and testing images.
    * This approach is used by the Cluster-API provider for AWS.
  * Negative sides:
    * Heavily depends on the provider's ability to host and use custom images. This feature is still not available, or not fully implemented, for many providers.
    * Building and releasing images is time consuming. If you use images built by the implementer, you must fully trust the image.
    * It is not possible to modify an image without rebuilding and hosting it yourself.
    * Using custom images can introduce additional costs, depending on the cloud provider.
    * The build and publish process is unique for each cloud provider.

> @roberthbailey: What about a provider that decided not to build on top of kubeadm? The kops adoption of the machine types isn't going to use kubeadm for node joining, since kops already has node joining.
> @alvaroaleman: The purpose of these scripts is to provide a working default that can be used; if people decide they want it different, they are free to build and use something else.
> @roberthbailey: Have we considered putting the common bootstrapping scripts into a bundle definition that can be reused instead of adding code into the upstream repo?
> @xmudrii: I don't think we have considered that at all. I'm not really sure what a bundle definition is and how it looks, so would you mind giving me an example or a short introduction?