-
Notifications
You must be signed in to change notification settings - Fork 1.4k
🐛 Bootstrap machine only if it conforms to the version skew policy #6044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -21,6 +21,7 @@ import ( | |||||
"fmt" | ||||||
"time" | ||||||
|
||||||
"github.com/blang/semver" | ||||||
"github.com/pkg/errors" | ||||||
corev1 "k8s.io/api/core/v1" | ||||||
apierrors "k8s.io/apimachinery/pkg/api/errors" | ||||||
|
@@ -46,12 +47,14 @@ import ( | |||||
"sigs.k8s.io/cluster-api/controllers/external" | ||||||
"sigs.k8s.io/cluster-api/controllers/noderefutil" | ||||||
"sigs.k8s.io/cluster-api/controllers/remote" | ||||||
"sigs.k8s.io/cluster-api/internal/contract" | ||||||
"sigs.k8s.io/cluster-api/util" | ||||||
"sigs.k8s.io/cluster-api/util/annotations" | ||||||
"sigs.k8s.io/cluster-api/util/collections" | ||||||
"sigs.k8s.io/cluster-api/util/conditions" | ||||||
"sigs.k8s.io/cluster-api/util/patch" | ||||||
"sigs.k8s.io/cluster-api/util/predicates" | ||||||
"sigs.k8s.io/cluster-api/util/version" | ||||||
) | ||||||
|
||||||
const ( | ||||||
|
@@ -252,6 +255,14 @@ func (r *Reconciler) reconcile(ctx context.Context, cluster *clusterv1.Cluster, | |||||
}) | ||||||
} | ||||||
|
||||||
ok, err := r.isVersionAllowed(ctx, m, cluster) | ||||||
if err != nil { | ||||||
return ctrl.Result{}, errors.Wrap(err, "failed to check machine version") | ||||||
} | ||||||
if !ok { | ||||||
return ctrl.Result{}, errors.Wrap(err, "machine version is not allowed") | ||||||
} | ||||||
|
||||||
phases := []func(context.Context, *clusterv1.Cluster, *clusterv1.Machine) (ctrl.Result, error){ | ||||||
r.reconcileBootstrap, | ||||||
r.reconcileInfrastructure, | ||||||
|
@@ -488,6 +499,74 @@ func (r *Reconciler) isDeleteNodeAllowed(ctx context.Context, cluster *clusterv1 | |||||
return nil | ||||||
} | ||||||
|
||||||
// isVersionAllowed returns true if the machine version conforms to the Kubernetes version skew policy. | ||||||
// If the Machine is part of the control plane, then its major.minor version must be greater than or equal to the | ||||||
// control plane actual major.minor version, and less than or equal to the control plane desired major.minor version. | ||||||
Comment on lines
+503
to
+504
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I could be wrong, but could it be that it's possible to downgrade with KCP today? If I'm correct we should probably start blocking downgrades in KCP? (separately from this PR) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Probably. The current version skew policy does not mention downgrade. The most recent discussion took place in 2019-2020: kubernetes/website#12327. |
||||||
// If the Machine is not part of the control plane, then its major.minor version must be less than or equal to the control | ||||||
// plane actual major.minor version. | ||||||
func (r *Reconciler) isVersionAllowed(ctx context.Context, machine *clusterv1.Machine, cluster *clusterv1.Cluster) (bool, error) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if it would make sense to align some of the logic with what we are doing in the topology reconciler when deciding if we want to trigger MD rollouts (computeMachineDeploymentVersion). E.g. when the control plane is currently upgrading (ControlPlaneContract.IsUpgrading) we are not triggering new MD upgrades. Most of the logic does not apply, but I wonder if there are some common parts. |
||||||
// Skip the check if Version nil, because it is an optional field. | ||||||
if machine.Spec.Version == nil { | ||||||
return true, nil | ||||||
} | ||||||
|
||||||
// Skip the check if ControlPlaneRef is nil, because it is an optional field. | ||||||
if cluster.Spec.ControlPlaneRef == nil { | ||||||
return true, nil | ||||||
} | ||||||
|
||||||
mv, err := semver.ParseTolerant(*machine.Spec.Version) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
if err != nil { | ||||||
return false, errors.Wrap(err, "failed to parse machine version") | ||||||
} | ||||||
|
||||||
controlPlane, err := external.Get(ctx, r.Client, cluster.Spec.ControlPlaneRef, cluster.Spec.ControlPlaneRef.Namespace) | ||||||
if err != nil { | ||||||
return false, errors.Wrap(err, "failed to get control plane object") | ||||||
} | ||||||
|
||||||
controlPlaneActualVersionStr, err := contract.ControlPlane().StatusVersion().Get(controlPlane) | ||||||
if err != nil { | ||||||
return false, errors.Wrap(err, "failed to read control plane actual version") | ||||||
} | ||||||
// controlPlaneVersionStr is not nil, because Get did not return an error. | ||||||
controlPlaneActualVersion, err := semver.ParseTolerant(*controlPlaneActualVersionStr) | ||||||
if err != nil { | ||||||
return false, errors.Wrap(err, "failed to parse control plane actual version") | ||||||
} | ||||||
Comment on lines
+528
to
+536
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Both fields are required for "implementations using version" (https://cluster-api.sigs.k8s.io/developer/architecture/controllers/control-plane.html#required-status-fields-for-implementations-using-version) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If only .spec.version exists and .status.version doesn't, in most cases this means that the first control plane machine isn't up yet. In the topology reconcile we're interpreting this as "control plane is provisioning" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the additional context. I have a TODO to handle the absence of Status.Version. |
||||||
|
||||||
controlPlaneDesiredVersionStr, err := contract.ControlPlane().Version().Get(controlPlane) | ||||||
if err != nil { | ||||||
return false, errors.Wrap(err, "failed to read control plane desired version") | ||||||
} | ||||||
// controlPlaneVersionStr is not nil, because Get did not return an error. | ||||||
controlPlaneDesiredVersion, err := semver.ParseTolerant(*controlPlaneDesiredVersionStr) | ||||||
if err != nil { | ||||||
return false, errors.Wrap(err, "failed to parse control plane desired version") | ||||||
} | ||||||
|
||||||
if util.IsControlPlaneMachine(machine) { | ||||||
if version.Compare(mv, controlPlaneDesiredVersion, version.IgnorePatchVersion()) == 1 { | ||||||
return false, errors.Errorf("machine major.minor version (%d.%d) must be less than or equal to"+ | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
"the control plane desired major.minor (%d.%d) version", mv.Major, mv.Minor, | ||||||
controlPlaneDesiredVersion.Major, controlPlaneDesiredVersion.Minor) | ||||||
} | ||||||
if version.Compare(mv, controlPlaneActualVersion, version.IgnorePatchVersion()) == -1 { | ||||||
return false, errors.Errorf("machine major.minor version (%d.%d) must be greater than or equal to"+ | ||||||
"the control plane actual major.minor (%d.%d) version", mv.Major, mv.Minor, | ||||||
controlPlaneActualVersion.Major, controlPlaneActualVersion.Minor) | ||||||
} | ||||||
} else { | ||||||
if version.Compare(mv, controlPlaneActualVersion, version.IgnorePatchVersion()) == 1 { | ||||||
return false, errors.Errorf("machine major.minor version (%d.%d) must be less than or equal to"+ | ||||||
"the control plane actual major.minor (%d.%d) version", mv.Major, mv.Minor, | ||||||
controlPlaneActualVersion.Major, controlPlaneActualVersion.Minor) | ||||||
} | ||||||
} | ||||||
|
||||||
return true, nil | ||||||
} | ||||||
|
||||||
func (r *Reconciler) drainNode(ctx context.Context, cluster *clusterv1.Cluster, nodeName string) (ctrl.Result, error) { | ||||||
log := ctrl.LoggerFrom(ctx, "cluster", cluster.Name, "node", nodeName) | ||||||
|
||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is right, wouldn't this code block reconciliation of a Machine as soon as the control plane bumps the version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, we should have checks in a validation webhook, potentially this one can be carrying a client that looks at the control plane reference
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Webhook on which resource? MachineDeployment, MachinePool, MachineSet, Machine or all of them?
I assume only on create, which would solve the issue that this check should not be run once a Machine already exists (to avoid blocking reconciliation of existing machines because a control plane upgrade is in progress)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha I just commented something similar here: #6040 (comment)
IMO the validation should be in the webhook of every object that can't change before the ControlPlane does but that has a Version in spec that can be edited by the user: Machine, MachinePool, MachineDeployment, and MachineSet.
I was thinking we'd want to validate updates too. If a user tries to upgrade a MachinePool or MachineDeployment before upgrading the ControlPlane that's breaking. What's your concern with
blocking reconciliation of existing machines because a control plane upgrade is in progress
@sbueringer? Wouldn't the new version be reflected in the control plane spec if the control plane has already been updated? Also, we wouldn't block reconciliation, just updating the Machine/MD/MP/MS version.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Machines, we need to make sure to not block Machine objects that are being created as part of the reconciliation of machine-based control planes, otherwise the control plane is never going to be rolled out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I think we can exclude control plane machines from this validation based on the control plane label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm!