⚠️ Add ClusterClass types #4928


Merged
3 changes: 3 additions & 0 deletions PROJECT
@@ -13,6 +13,9 @@ resources:
- group: cluster
kind: MachineDeployment
version: v1alpha3
- group: cluster
kind: ClusterClass
Contributor:

What is our strategy for introducing new CRDs in terms of API versions? We should discuss this; there was also #4917 on the topic.

It seems like it would make more sense to introduce new APIs at alpha once we move to beta for the rest of the project.

Member Author (@fabriziopandini, Jul 19, 2021):

+1 on defining policies, but I think the two use cases have a key difference.
In the case of ClusterClass we are adding a new type to the main CAPI group, so it shares the API version of the existing types (the strategy is implicit here). In the case of the operator, instead, we are creating a new, separate API group, so we have the freedom to choose a different API version.

version: v1alpha4
- group: cluster
kind: Cluster
version: v1alpha4
23 changes: 21 additions & 2 deletions api/v1alpha3/conversion.go
@@ -38,6 +38,16 @@ func (src *Cluster) ConvertTo(dstRaw conversion.Hub) error {
conditions.MarkTrue(dst, v1alpha4.ControlPlaneInitializedCondition)
}

// Manually restore data.
restored := &v1alpha4.Cluster{}
if ok, err := utilconversion.UnmarshalData(src, restored); err != nil || !ok {
return err
}

if restored.Spec.Topology != nil {
dst.Spec.Topology = restored.Spec.Topology
}

return nil
}

@@ -53,6 +63,11 @@ func (dst *Cluster) ConvertFrom(srcRaw conversion.Hub) error {
dst.Status.ControlPlaneInitialized = true
}

// Preserve Hub data on down-conversion except for metadata
if err := utilconversion.MarshalData(src, dst); err != nil {
return err
}

return nil
}
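The MarshalData/UnmarshalData pair used above preserves hub-only fields (such as `spec.topology`) across a down-conversion and back. A simplified, self-contained sketch of that annotation round-trip idea follows; this is not the actual `utilconversion` implementation, and the annotation key and types here are made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dataAnnotation is a hypothetical key for this sketch; the real
// implementation in sigs.k8s.io/cluster-api uses its own annotation.
const dataAnnotation = "example.x-k8s.io/conversion-data"

// hubSpec stands in for the hub (v1alpha4) spec; Topology is a field
// the older API version cannot represent.
type hubSpec struct {
	Topology *string `json:"topology,omitempty"`
}

// marshalData serializes the hub spec into the spoke object's annotations
// on down-conversion, so no data is lost.
func marshalData(hub hubSpec, annotations map[string]string) error {
	b, err := json.Marshal(hub)
	if err != nil {
		return err
	}
	annotations[dataAnnotation] = string(b)
	return nil
}

// unmarshalData restores the hub spec from the annotations on up-conversion,
// reporting whether preserved data was found.
func unmarshalData(annotations map[string]string) (hubSpec, bool, error) {
	var hub hubSpec
	raw, ok := annotations[dataAnnotation]
	if !ok {
		return hub, false, nil
	}
	if err := json.Unmarshal([]byte(raw), &hub); err != nil {
		return hub, false, err
	}
	return hub, true, nil
}

func main() {
	topology := "my-cluster-class"
	annotations := map[string]string{}

	// Down-conversion: hub -> spoke, preserving hub-only fields.
	_ = marshalData(hubSpec{Topology: &topology}, annotations)

	// Up-conversion: spoke -> hub, restoring the preserved fields.
	restored, ok, _ := unmarshalData(annotations)
	fmt.Println(ok, *restored.Topology) // true my-cluster-class
}
```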

@@ -222,8 +237,12 @@ func (dst *MachineHealthCheckList) ConvertFrom(srcRaw conversion.Hub) error {
return Convert_v1alpha4_MachineHealthCheckList_To_v1alpha3_MachineHealthCheckList(src, dst, nil)
}

// Convert_v1alpha3_Bootstrap_To_v1alpha4_Bootstrap is an autogenerated conversion function.
func Convert_v1alpha3_Bootstrap_To_v1alpha4_Bootstrap(in *Bootstrap, out *v1alpha4.Bootstrap, s apiconversion.Scope) error { //nolint
func Convert_v1alpha4_ClusterSpec_To_v1alpha3_ClusterSpec(in *v1alpha4.ClusterSpec, out *ClusterSpec, s apiconversion.Scope) error {
// NOTE: custom conversion func is required because spec.Topology does not exist in v1alpha3
return autoConvert_v1alpha4_ClusterSpec_To_v1alpha3_ClusterSpec(in, out, s)
}

func Convert_v1alpha3_Bootstrap_To_v1alpha4_Bootstrap(in *Bootstrap, out *v1alpha4.Bootstrap, s apiconversion.Scope) error {
return autoConvert_v1alpha3_Bootstrap_To_v1alpha4_Bootstrap(in, out, s)
}

16 changes: 6 additions & 10 deletions api/v1alpha3/zz_generated.conversion.go

(Generated file; diff not rendered.)

67 changes: 67 additions & 0 deletions api/v1alpha4/cluster_types.go
@@ -60,6 +60,73 @@ type ClusterSpec struct {
// for provisioning infrastructure for a cluster in said provider.
// +optional
InfrastructureRef *corev1.ObjectReference `json:"infrastructureRef,omitempty"`

// This encapsulates the topology for the cluster.
// NOTE: It is required to enable the ClusterTopology
// feature gate flag to activate managed topologies support;
// this feature is highly experimental, and parts of it might still be not implemented.
// +optional
Topology *Topology `json:"topology,omitempty"`
Contributor:

General question about the PR: does it make master unreleasable? Should we be merging new types without corresponding implementation/controllers since we're not in a "breaking change" development phase anymore and might need to release v0.4.1 any time to fix bugs? Would it be a better strategy to start a new branch for ClusterClass like we did for the provider operator work? Apologies if this was already discussed and I missed it.

Member Author (@fabriziopandini, Jul 19, 2021):

> General question about the PR: does it make master unreleasable?

No. Master should always be ready for a release.

To make that possible while adding ClusterClass and managed topologies on the master branch, we are avoiding breaking changes and keeping all the new features behind a feature flag. If the flag is false (the default), Cluster API works just as it does today, with webhooks enforcing that you can't set Cluster.Topology or create/update ClusterClass objects. Also, keeping "strict" isolation between ClusterClass and the existing controllers is a driving principle for the next steps of this effort; see e.g. the notes on #4933.

Member:

+1, as long as webhooks are installed (which we always assume) we should be fine. While the fields show up within the CRDs, enabling the feature gate won't do much more than enable something non-functional.

One thing I'd add into the comment for now is "This feature is alpha, highly experimental, and parts of it might still be not implemented".

}

// Topology encapsulates the information of the managed resources.
type Topology struct {
// The name of the ClusterClass object to create the topology.
Class string `json:"class"`

// The Kubernetes version of the cluster.
Version string `json:"version"`

// RolloutAfter performs a rollout of the entire cluster one component at a time,
// control plane first and then machine deployments.
// +optional
RolloutAfter *metav1.Time `json:"rolloutAfter,omitempty"`

// ControlPlane describes the cluster control plane.
ControlPlane ControlPlaneTopology `json:"controlPlane"`

// Workers encapsulates the different constructs that form the worker nodes
// for the cluster.
// +optional
Workers *WorkersTopology `json:"workers,omitempty"`
}

// ControlPlaneTopology specifies the parameters for the control plane nodes in the cluster.
type ControlPlaneTopology struct {
Metadata ObjectMeta `json:"metadata,omitempty"`

// Replicas is the number of control plane nodes.
Replicas int `json:"replicas"`
}

// WorkersTopology represents the different sets of worker nodes in the cluster.
type WorkersTopology struct {
// MachineDeployments is a list of machine deployments in the cluster.
MachineDeployments []MachineDeploymentTopology `json:"machineDeployments,omitempty"`
}

// MachineDeploymentTopology specifies the different parameters for a set of worker nodes in the topology.
// This set of nodes is managed by a MachineDeployment object whose lifecycle is managed by the Cluster controller.
type MachineDeploymentTopology struct {
Metadata ObjectMeta `json:"metadata,omitempty"`

// Class is the name of the MachineDeploymentClass used to create the set of worker nodes.
// This should match one of the deployment classes defined in the ClusterClass object
// mentioned in the `Cluster.Spec.Class` field.
Class string `json:"class"`

// Name is the unique identifier for this MachineDeploymentTopology.
// The value is used with other unique identifiers to create a MachineDeployment's Name
// (e.g. cluster's name, etc). In case the name is greater than the allowed maximum length,
// the values are hashed together.
Name string `json:"name"`

// Replicas is the number of worker nodes belonging to this set.
// If the value is nil, the MachineDeployment is created without the number of Replicas (defaulting to zero)
// and it's assumed that an external entity (like cluster autoscaler) is responsible for the management
// of this value.
// +optional
Replicas *int `json:"replicas,omitempty"`
}

// ANCHOR_END: ClusterSpec
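To illustrate how the new types fit together, here is a hypothetical, trimmed-down mirror of the structs above (ObjectMeta, RolloutAfter, and JSON tags omitted), populated the way a managed-topology Cluster spec would be; the class and deployment names are made up for this sketch:

```go
package main

import "fmt"

// Trimmed-down stand-ins for the API types defined above.
type ControlPlaneTopology struct {
	Replicas int
}

type MachineDeploymentTopology struct {
	Class    string // must match a deployment class in the referenced ClusterClass
	Name     string // unique identifier within the topology
	Replicas *int   // nil means an external entity (e.g. autoscaler) manages replicas
}

type WorkersTopology struct {
	MachineDeployments []MachineDeploymentTopology
}

type Topology struct {
	Class        string // name of the ClusterClass object
	Version      string // Kubernetes version for the whole cluster
	ControlPlane ControlPlaneTopology
	Workers      *WorkersTopology
}

func main() {
	workerReplicas := 3
	topology := Topology{
		Class:        "quick-start",
		Version:      "v1.21.2",
		ControlPlane: ControlPlaneTopology{Replicas: 3},
		Workers: &WorkersTopology{
			MachineDeployments: []MachineDeploymentTopology{
				{Class: "default-worker", Name: "md-0", Replicas: &workerReplicas},
			},
		},
	}
	fmt.Println(topology.Class, topology.Version, len(topology.Workers.MachineDeployments))
	// quick-start v1.21.2 1
}
```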
162 changes: 159 additions & 3 deletions api/v1alpha4/cluster_webhook.go
@@ -17,9 +17,16 @@ limitations under the License.
package v1alpha4

import (
"fmt"
"strings"

"github.com/blang/semver"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/util/sets"
"k8s.io/apimachinery/pkg/util/validation/field"
"sigs.k8s.io/cluster-api/feature"
"sigs.k8s.io/cluster-api/util/version"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/webhook"
)
@@ -45,24 +52,36 @@ func (c *Cluster) Default() {
if c.Spec.ControlPlaneRef != nil && len(c.Spec.ControlPlaneRef.Namespace) == 0 {
c.Spec.ControlPlaneRef.Namespace = c.Namespace
}

// If the Cluster uses a managed topology
if c.Spec.Topology != nil {
// tolerate version strings without a "v" prefix: prepend it if it's not there
if !strings.HasPrefix(c.Spec.Topology.Version, "v") {
c.Spec.Topology.Version = "v" + c.Spec.Topology.Version
}
Comment on lines +58 to +61
Member:

Let's remove this assumption and only accept versions with v prefix?

Member Author:

This will make this version field behave differently from the other version fields; I would prefer consistency over being strict.

Contributor:

I'm with Fabrizio on this one; consistency is better. But we can open an issue to consider requiring the "v" prefix on all version fields across the API. Although IMO it's nice not to have to worry about it as a user, functionally it's the same.

Contributor:

There is a similar toleration currently on Machine and MachineDeployment. We should have a single approach for all version fields: either all or none!

Member:

I'll open an issue to not make it easier for users, but rather acknowledge that the version should always be prefixed with "v". The reason to drop the simplified approach is that we're making an assumption that isn't clear to the user until they get/describe the object. This can also break GitOps.

}
}
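The version defaulting above can be read in isolation as a small normalization step: tolerate version strings without a "v" prefix by prepending one, matching the behavior of the other version fields. A minimal standalone sketch (`normalizeVersion` is a made-up name for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeVersion mirrors the webhook defaulting above: if the version
// string lacks a "v" prefix, prepend one; otherwise leave it unchanged.
func normalizeVersion(v string) string {
	if !strings.HasPrefix(v, "v") {
		return "v" + v
	}
	return v
}

func main() {
	fmt.Println(normalizeVersion("1.21.2"))  // v1.21.2 (prefix added)
	fmt.Println(normalizeVersion("v1.21.2")) // v1.21.2 (unchanged)
}
```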

// ValidateCreate implements webhook.Validator so a webhook will be registered for the type.
func (c *Cluster) ValidateCreate() error {
return c.validate()
return c.validate(nil)
}

// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type.
func (c *Cluster) ValidateUpdate(old runtime.Object) error {
return c.validate()
oldCluster, ok := old.(*Cluster)
if !ok {
return apierrors.NewBadRequest(fmt.Sprintf("expected a Cluster but got a %T", old))
}
return c.validate(oldCluster)
}

// ValidateDelete implements webhook.Validator so a webhook will be registered for the type.
func (c *Cluster) ValidateDelete() error {
return nil
}

func (c *Cluster) validate() error {
func (c *Cluster) validate(old *Cluster) error {
var allErrs field.ErrorList
if c.Spec.InfrastructureRef != nil && c.Spec.InfrastructureRef.Namespace != c.Namespace {
allErrs = append(
@@ -86,8 +105,145 @@ func (c *Cluster) validate() error {
)
}

// Validate the managed topology, if defined.
if c.Spec.Topology != nil {
if topologyErrs := c.validateTopology(old); len(topologyErrs) > 0 {
allErrs = append(allErrs, topologyErrs...)
}
}

if len(allErrs) == 0 {
return nil
}
return apierrors.NewInvalid(GroupVersion.WithKind("Cluster").GroupKind(), c.Name, allErrs)
}

func (c *Cluster) validateTopology(old *Cluster) field.ErrorList {
// NOTE: ClusterClass and managed topologies are behind ClusterTopology feature gate flag; the web hook
// must prevent the usage of Cluster.Topology in case the feature flag is disabled.
if !feature.Gates.Enabled(feature.ClusterTopology) {
return field.ErrorList{
field.Forbidden(
field.NewPath("spec", "topology"),
"can be set only if the ClusterTopology feature flag is enabled",
),
}
}

var allErrs field.ErrorList

// class should be defined.
if len(c.Spec.Topology.Class) == 0 {
allErrs = append(
allErrs,
field.Invalid(
field.NewPath("spec", "topology", "class"),
c.Spec.Topology.Class,
"cannot be empty",
),
)
}

// version should be valid.
if !version.KubeSemver.MatchString(c.Spec.Topology.Version) {
allErrs = append(
allErrs,
field.Invalid(
field.NewPath("spec", "topology", "version"),
c.Spec.Topology.Version,
"must be a valid semantic version",
),
)
}

// MachineDeployment names must be unique.
if c.Spec.Topology.Workers != nil {
names := sets.String{}
for _, md := range c.Spec.Topology.Workers.MachineDeployments {
if names.Has(md.Name) {
allErrs = append(allErrs,
field.Invalid(
field.NewPath("spec", "topology", "workers", "machineDeployments"),
md,
fmt.Sprintf("MachineDeployment names should be unique. MachineDeployment with name %q is defined more than once.", md.Name),
),
)
}
names.Insert(md.Name)
}
}

switch old {
case nil: // On create
// c.Spec.InfrastructureRef and c.Spec.ControlPlaneRef could not be set
if c.Spec.InfrastructureRef != nil {
allErrs = append(
allErrs,
field.Invalid(
field.NewPath("spec", "infrastructureRef"),
c.Spec.InfrastructureRef,
"cannot be set when a Topology is defined",
),
)
}
if c.Spec.ControlPlaneRef != nil {
allErrs = append(
allErrs,
field.Invalid(
field.NewPath("spec", "controlPlaneRef"),
c.Spec.ControlPlaneRef,
"cannot be set when a Topology is defined",
),
)
}
default: // On update
// Class could not be mutated.
if c.Spec.Topology.Class != old.Spec.Topology.Class {
allErrs = append(
allErrs,
field.Invalid(
field.NewPath("spec", "topology", "class"),
c.Spec.Topology.Class,
"class cannot be changed",
),
)
}

// Version could only be increased.
inVersion, err := semver.ParseTolerant(c.Spec.Topology.Version)
if err != nil {
allErrs = append(
allErrs,
field.Invalid(
field.NewPath("spec", "topology", "version"),
c.Spec.Topology.Version,
"is not a valid version",
),
)
}
oldVersion, err := semver.ParseTolerant(old.Spec.Topology.Version)
if err != nil {
// NOTE: this should never happen. Nevertheless, handling this for extra caution.
allErrs = append(
allErrs,
field.Invalid(
field.NewPath("spec", "topology", "version"),
c.Spec.Topology.Class,
"cannot be compared with the old version",
),
)
}
if inVersion.NE(semver.Version{}) && oldVersion.NE(semver.Version{}) && !inVersion.GTE(oldVersion) {
allErrs = append(
allErrs,
field.Invalid(
field.NewPath("spec", "topology", "version"),
c.Spec.Topology.Version,
"cannot be decreased",
),
)
}
}

return allErrs
}