OCPNODE-3029: WIP: handle required minimum kubelet version featuregate rollout #4929
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: haircommander. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
some background: openshift/enhancements#1766
@haircommander: This pull request references OCPNODE-3029, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.19.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
}
go ctrl.writebackMinimumKubeletVersionIfAppropriate(updatedPools, renderedVersions, nodeConfig, func() ([]*mcfgv1.MachineConfigPool, error) {
any particular reason this is a separate goroutine?
I thought it could be async in case it takes a while for the MCPs to roll back, and I didn't know what might be blocked by having it run synchronously.
I think a better solution might be to push this onto one of the work queues by doing something like this:
func (ctrl *Controller) syncKubeletConfig(key string) error {
	// Key lookup stuff above here.
	// Here, we detect that we need to do this for the current kubeletconfig, so we just kick that off.
	if ctrl.writeMinimumKubeletVersion[kubeletCfg.Name] {
		defer func() {
			delete(ctrl.writeMinimumKubeletVersion, kubeletCfg.Name)
		}()
		return ctrl.writebackMinimumKubeletVersionIfAppropriate(...)
	}
	//
	// Bulk of function above here.
	//
	if ctrl.isMinimumKubeletVersionWritebackNeeded(...) {
		// Here, we update our internal state to indicate that we need to perform this action
		// and enqueue the action.
		ctrl.writeMinimumKubeletVersion[kubeletCfg.Name] = true
		return ctrl.enqueue(kubeletCfg)
	}
	// End of function
}
We'll probably want to use a sync.Map, since the work queue has multiple workers that could mutate the controller struct state at any given time.
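For illustration, a minimal sketch of that tracking using a sync.Map; the type and method names here are made up for the example, not the controller's actual fields:

package kubeletconfig

import "sync"

// Sketch: track which KubeletConfigs still need the minimum-version writeback.
// sync.Map is safe for concurrent use by multiple work-queue workers, unlike a
// plain map[string]bool with no locking.
type writebackTracker struct {
	pending sync.Map // kubeletconfig name -> struct{}
}

func (t *writebackTracker) mark(name string)  { t.pending.Store(name, struct{}{}) }
func (t *writebackTracker) clear(name string) { t.pending.Delete(name) }

func (t *writebackTracker) needed(name string) bool {
	_, ok := t.pending.Load(name)
	return ok
}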
if node.Spec.MinimumKubeletVersion == node.Status.MinimumKubeletVersion ||
	node.Status.MinimumKubeletVersion == renderedKubeletVersion ||
	node.Spec.MinimumKubeletVersion != renderedKubeletVersion {
	klog.InfoS("Skipping writeback to nodes.config.Status.MinimumKubeletVersion because situation not correct",
question: why are these conditions not correct? isn't this first one just saying that the spec and status match what we expect?
If the status matches the spec, there's no need to do the update (though I probably should also check renderedKubeletVersion there).
For the second: if the status already matches the rendered version, then we've already done the writeback for this rendered version.
For the third: we're rendering a different version than what is set in the spec.
I think I have to rework this condition though.
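To spell out how I'm reading those three cases, roughly, as a standalone helper (names are illustrative, and this is still the condition I said I need to rework):

// Sketch of the skip conditions as described above, not the final logic.
func shouldSkipMinimumKubeletVersionWriteback(spec, status, rendered string) bool {
	switch {
	case spec == status:
		// status already matches spec: nothing to write back
		// (arguably should also require status == rendered)
		return true
	case status == rendered:
		// writeback already done for this rendered version
		return true
	case spec != rendered:
		// we're rendering a different version than what's in the spec
		return true
	default:
		return false
	}
}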
for _, mcp := range mcps {
	if mcp.Status.UpdatedMachineCount > 0 {
		poolsUpdated[mcp.Name] = true
	} else if _, updated := poolsUpdated[mcp.Name]; !updated {
I'm not sure I understand the logic here. The if condition checks if at least 1 node is updated, and if not, whether a previous loop had updated it? Wouldn't it be easier to check if the mcp itself is updated directly, and then break out of the loop if any pool doesn't meet either condition?
Note that we should be careful of pools with no nodes in them, which I think would be covered if we copy the server check
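Roughly what I had in mind, as a sketch only: the condition check is hand-rolled here rather than using whatever helper the repo already has, the empty-pool handling is just one option, and the import paths are assumed to match what the file already aliases as mcfgv1 and corev1.

import (
	corev1 "k8s.io/api/core/v1"
	mcfgv1 "github.com/openshift/api/machineconfiguration/v1"
)

// Sketch: treat the rollout as complete only if every pool that actually has
// machines reports the Updated condition as true. Pools with zero machines are
// skipped so they neither block nor falsely satisfy the check.
func allPoolsUpdated(mcps []*mcfgv1.MachineConfigPool) bool {
	for _, mcp := range mcps {
		if mcp.Status.MachineCount == 0 {
			continue
		}
		if !isPoolUpdated(mcp) {
			return false
		}
	}
	return true
}

func isPoolUpdated(mcp *mcfgv1.MachineConfigPool) bool {
	for _, cond := range mcp.Status.Conditions {
		if cond.Type == mcfgv1.MachineConfigPoolUpdated {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}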
Force-pushed from bdece5e to b6c210d.
When a cluster admin rolls out a minimum kubelet version, the cluster-config-operator (eventually) will render a new set of featuregates based on the new minimum version (enabling ones that are safe given the new minimum). However, on a rollback, we need to make sure the MCS won't return a rendered config using the old features, in the very odd case that a cluster admin creates a new node with an old kubelet and somehow overrides the osImage (unlikely). Thus, we use a condition similar to the one the MCS uses to serve the config: once one node has updated, we treat the MCP as using the new config. Then, we write the status to the node config object so the node authorization plugin can allow older nodes that are now deemed safe. Signed-off-by: Peter Hunt <[email protected]>
Force-pushed from b6c210d to c0c5483.
@haircommander: The following tests failed.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
}

node.Status.MinimumKubeletVersion = renderedKubeletVersion
if _, err := ctrl.configClient.ConfigV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{}); err != nil {
issue: Since it's possible that multiple goroutines could be running this simultaneously, this object could be updated before we call .Update(). A potential solution is to wrap everything within this function in a RetryOnConflict(): https://github.com/kubernetes/client-go/blob/master/util/retry/util.go#L103
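For reference, a rough sketch of that pattern; only the retry wrapper is the actual client-go API, while the Get/mutate/Update body is illustrative (it assumes the nodes.config singleton is named "cluster" and mirrors the Update call from the snippet above):

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/util/retry"
)

// Sketch: re-fetch the object on every attempt so that a conflict caused by a
// concurrent writer is resolved against the latest resourceVersion.
func (ctrl *Controller) writebackMinimumKubeletVersion(renderedKubeletVersion string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		node, err := ctrl.configClient.ConfigV1().Nodes().Get(context.TODO(), "cluster", metav1.GetOptions{})
		if err != nil {
			return err
		}
		node.Status.MinimumKubeletVersion = renderedKubeletVersion
		_, err = ctrl.configClient.ConfigV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{})
		return err
	})
}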
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
- What I did
When a cluster admin rolls out a minimum kubelet version, the cluster-config-operator (eventually) will render a new set of featuregates based on the new minimum version (enabling ones that are safe given the new minimum).
- How to verify it
- Description for the changelog