Skip to content

🐛 Bootstrap machine only if it conforms to the version skew policy #6044

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
79 changes: 79 additions & 0 deletions internal/controllers/machine/machine_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ import (
"fmt"
"time"

"github.com/blang/semver"
"github.com/pkg/errors"
corev1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
Expand All @@ -46,12 +47,14 @@ import (
"sigs.k8s.io/cluster-api/controllers/external"
"sigs.k8s.io/cluster-api/controllers/noderefutil"
"sigs.k8s.io/cluster-api/controllers/remote"
"sigs.k8s.io/cluster-api/internal/contract"
"sigs.k8s.io/cluster-api/util"
"sigs.k8s.io/cluster-api/util/annotations"
"sigs.k8s.io/cluster-api/util/collections"
"sigs.k8s.io/cluster-api/util/conditions"
"sigs.k8s.io/cluster-api/util/patch"
"sigs.k8s.io/cluster-api/util/predicates"
"sigs.k8s.io/cluster-api/util/version"
)

const (
Expand Down Expand Up @@ -252,6 +255,14 @@ func (r *Reconciler) reconcile(ctx context.Context, cluster *clusterv1.Cluster,
})
}

ok, err := r.isVersionAllowed(ctx, m, cluster)
if err != nil {
return ctrl.Result{}, errors.Wrap(err, "failed to check machine version")
}
if !ok {
return ctrl.Result{}, errors.Wrap(err, "machine version is not allowed")
}
Comment on lines +258 to +264
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right, wouldn't this code block reconciliation of a Machine as soon as the control plane bumps the version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we should have checks in a validation webhook, potentially this one can be carrying a client that looks at the control plane reference

Copy link
Member

@sbueringer sbueringer Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Webhook on which resource? MachineDeployment, MachinePool, MachineSet, Machine or all of them?

I assume only on create, which would solve the issue that this check should not be run once a Machine already exists (to avoid blocking reconciliation of existing machines because a control plane upgrade is in progress)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha I just commented something similar here: #6040 (comment)

IMO the validation should be in the webhook of every object that can't change before the ControlPlane does but that has a Version in spec that can be edited by the user: Machine, MachinePool, MachineDeployment, and MachineSet.

I was thinking we'd want to validate updates too. If a user tries to upgrade a MachinePool or MachineDeployment before upgrading the ControlPlane that's breaking. What's your concern with blocking reconciliation of existing machines because a control plane upgrade is in progress @sbueringer? Wouldn't the new version be reflected in the control plane spec if the control plane has already been updated? Also, we wouldn't block reconciliation, just updating the Machine/MD/MP/MS version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Machines, we need to make sure to not block Machine objects that are being created as part of the reconciliation of machine-based control planes, otherwise the control plane is never going to be rolled out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I think we can exclude control plane machines from this validation based on the control plane label?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm!


phases := []func(context.Context, *clusterv1.Cluster, *clusterv1.Machine) (ctrl.Result, error){
r.reconcileBootstrap,
r.reconcileInfrastructure,
Expand Down Expand Up @@ -488,6 +499,74 @@ func (r *Reconciler) isDeleteNodeAllowed(ctx context.Context, cluster *clusterv1
return nil
}

// isVersionAllowed returns true if the machine version conforms to the Kubernetes version skew policy.
// If the Machine is part of the control plane, then its major.minor version must be greater than or equal to the
// control plane actual major.minor version, and less than or equal to the control plane desired major.minor version.
Comment on lines +503 to +504
Copy link
Member

@sbueringer sbueringer Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong, but could it be that it's possible to downgrade with KCP today?
(I only found something that it's forbidden via ClusterClass)

If I'm correct we should probably start blocking downgrades in KCP? (separately from this PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm correct we should probably start blocking downgrades in KCP? (separately from this PR)

Probably.

The current version skew policy does not mention downgrade. The most recent discussion took place in 2019-2020: kubernetes/website#12327.

// If the Machine is not part of the control plane, then its major.minor version must be less than or equal to the control
// plane actual major.minor version.
func (r *Reconciler) isVersionAllowed(ctx context.Context, machine *clusterv1.Machine, cluster *clusterv1.Cluster) (bool, error) {
Copy link
Member

@sbueringer sbueringer Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would make sense to align some of the logic with what we are doing in the topology reconciler when deciding if we want to trigger MD rollouts (computeMachineDeploymentVersion).

E.g. when the control plane is currently upgrading (ControlPlaneContract.IsUpgrading) we are not triggering new MD upgrades.

Most of the logic does not apply, but I wonder if there are some common parts.

// Skip the check if Version nil, because it is an optional field.
if machine.Spec.Version == nil {
return true, nil
}

// Skip the check if ControlPlaneRef is nil, because it is an optional field.
if cluster.Spec.ControlPlaneRef == nil {
return true, nil
}

mv, err := semver.ParseTolerant(*machine.Spec.Version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mv, err := semver.ParseTolerant(*machine.Spec.Version)
machineVersion, err := semver.ParseTolerant(*machine.Spec.Version)

if err != nil {
return false, errors.Wrap(err, "failed to parse machine version")
}

controlPlane, err := external.Get(ctx, r.Client, cluster.Spec.ControlPlaneRef, cluster.Spec.ControlPlaneRef.Namespace)
if err != nil {
return false, errors.Wrap(err, "failed to get control plane object")
}

controlPlaneActualVersionStr, err := contract.ControlPlane().StatusVersion().Get(controlPlane)
if err != nil {
return false, errors.Wrap(err, "failed to read control plane actual version")
}
// controlPlaneVersionStr is not nil, because Get did not return an error.
controlPlaneActualVersion, err := semver.ParseTolerant(*controlPlaneActualVersionStr)
if err != nil {
return false, errors.Wrap(err, "failed to parse control plane actual version")
}
Comment on lines +528 to +536
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status.version was an optional field that was added later, not sure if it's in the current contract (cc @fabriziopandini), although spec.version should be there. If the current version in status is not available, can we fallback to only look at the desired one instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only .spec.version exists and .status.version doesn't, in most cases this means that the first control plane machine isn't up yet. In the topology reconcile we're interpreting this as "control plane is provisioning"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional context. I have a TODO to handle the absence of Status.Version.


controlPlaneDesiredVersionStr, err := contract.ControlPlane().Version().Get(controlPlane)
if err != nil {
return false, errors.Wrap(err, "failed to read control plane desired version")
}
// controlPlaneVersionStr is not nil, because Get did not return an error.
controlPlaneDesiredVersion, err := semver.ParseTolerant(*controlPlaneDesiredVersionStr)
if err != nil {
return false, errors.Wrap(err, "failed to parse control plane desired version")
}

if util.IsControlPlaneMachine(machine) {
if version.Compare(mv, controlPlaneDesiredVersion, version.IgnorePatchVersion()) == 1 {
return false, errors.Errorf("machine major.minor version (%d.%d) must be less than or equal to"+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return false, errors.Errorf("machine major.minor version (%d.%d) must be less than or equal to"+
return false, errors.Errorf("control plane machine major.minor version (%d.%d) must be less than or equal to"+

"the control plane desired major.minor (%d.%d) version", mv.Major, mv.Minor,
controlPlaneDesiredVersion.Major, controlPlaneDesiredVersion.Minor)
}
if version.Compare(mv, controlPlaneActualVersion, version.IgnorePatchVersion()) == -1 {
return false, errors.Errorf("machine major.minor version (%d.%d) must be greater than or equal to"+
"the control plane actual major.minor (%d.%d) version", mv.Major, mv.Minor,
controlPlaneActualVersion.Major, controlPlaneActualVersion.Minor)
}
} else {
if version.Compare(mv, controlPlaneActualVersion, version.IgnorePatchVersion()) == 1 {
return false, errors.Errorf("machine major.minor version (%d.%d) must be less than or equal to"+
"the control plane actual major.minor (%d.%d) version", mv.Major, mv.Minor,
controlPlaneActualVersion.Major, controlPlaneActualVersion.Minor)
}
}

return true, nil
}

func (r *Reconciler) drainNode(ctx context.Context, cluster *clusterv1.Cluster, nodeName string) (ctrl.Result, error) {
log := ctrl.LoggerFrom(ctx, "cluster", cluster.Name, "node", nodeName)

Expand Down
15 changes: 14 additions & 1 deletion util/version/version.go
Original file line number Diff line number Diff line change
Expand Up @@ -190,7 +190,8 @@ func CompareWithBuildIdentifiers(a semver.Version, b semver.Version) int {
}

type comparer struct {
buildTags bool
buildTags bool
ignorePatchVersion bool
}

// CompareOption is a configuration option for Compare.
Expand All @@ -217,6 +218,13 @@ func WithBuildTags() CompareOption {
}
}

// IgnorePatchVersion modifies the version comparison to ignore the patch versions.
func IgnorePatchVersion() CompareOption {
return func(c *comparer) {
c.ignorePatchVersion = true
}
}

// Compare 2 semver versions.
// Defaults to doing the standard semver comparison when no options are specified.
// The comparison logic can be modified by passing additional compare options.
Expand All @@ -228,6 +236,11 @@ func Compare(a, b semver.Version, options ...CompareOption) int {
o(c)
}

if c.ignorePatchVersion {
a.Patch = 0
b.Patch = 0
}

if c.buildTags {
return CompareWithBuildIdentifiers(a, b)
}
Expand Down