Skip to content

DRAFT MCO-1580: MCO-1581: Achieving parity with MCO node disruption frequency #4871

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

dkhater-redhat
Copy link
Contributor

- What I did

- How to verify it

- Description for the changelog

@dkhater-redhat dkhater-redhat marked this pull request as draft February 20, 2025 17:48
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 20, 2025
@dkhater-redhat dkhater-redhat changed the title Spike on achieving parity with MCO node disruption frequency - DRAFT DRAFT - Spike on achieving parity with MCO node disruption frequency Feb 20, 2025
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 20, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 26, 2025
Copy link
Contributor

openshift-ci bot commented Mar 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkhater-redhat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 17, 2025
@dkhater-redhat dkhater-redhat force-pushed the skip-ignition-apply branch 5 times, most recently from 95ee82b to a12c4de Compare March 24, 2025 23:34
}
)

func HashMachineConfigSpec(spec mcfgv1.MachineConfigSpec) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Nicely done with this!

suggestion: Migrate rendercontroller to use this function instead of the one it's currently using. We can also migrate the MachineOSBuild function to use this too if we make it slightly more generic like this:

func GetHashForName(in interface{}) (string, error) {
	data, err := yaml.Marshal(in)
	if err != nil {
		return "", err
	}

	h, err := hashData(data)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", h), nil
}

@dkhater-redhat dkhater-redhat force-pushed the skip-ignition-apply branch 2 times, most recently from b328026 to 96a9e52 Compare March 25, 2025 20:37
@dkhater-redhat dkhater-redhat changed the title DRAFT - Spike on achieving parity with MCO node disruption frequency DRAFT MCO-1580: MCO-1581: Achieving parity with MCO node disruption frequency Mar 25, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 25, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 25, 2025

@dkhater-redhat: This pull request references MCO-1580 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

- What I did

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@dkhater-redhat dkhater-redhat force-pushed the skip-ignition-apply branch 4 times, most recently from a22eaf1 to 35c352e Compare March 31, 2025 16:52
containerMC.Spec.Config = runtime.RawExtension{
Raw: []byte(`{
"ignition": {
"version": "3.4.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix this

@dkhater-redhat dkhater-redhat force-pushed the skip-ignition-apply branch 2 times, most recently from 9c9e192 to 71240a9 Compare April 3, 2025 18:15
@@ -1258,6 +1294,11 @@ func (ctrl *Controller) updateCandidateNode(mosc *mcfgv1.MachineOSConfig, mosb *
// getAllCandidateMachines returns all possible nodes which can be updated to the target config, along with a maximum
// capacity. It is the reponsibility of the caller to choose a subset of the nodes given the capacity.
func getAllCandidateMachines(layered bool, config *mcfgv1.MachineOSConfig, build *mcfgv1.MachineOSBuild, pool *mcfgv1.MachineConfigPool, nodesInPool []*corev1.Node, maxUnavailable int) ([]*corev1.Node, uint) {
if layered && (config == nil || build == nil) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

@dkhater-redhat dkhater-redhat force-pushed the skip-ignition-apply branch 16 times, most recently from 6274734 to e928d4c Compare April 8, 2025 18:33
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 8, 2025
cheesesashimi and others added 3 commits April 8, 2025 14:00
Occasionally, there is a delay between the time that a new rendered
MachineConfig is produced and OCL begins a build. However, a couple of
things happen in the interim:

- The RenderController updates the MachineConfigPool. Because of the
  delay mentioned above, the NodeController begins updating all of the
  nodes with only the new rendered MachineConfig. The OS image remains
  the same because the NodeController is not ensuring that the image
  pullspec on the MachineOSConfig is the same as the MachineOSBuild.
- Because of the work done to the MCD in
  openshift#4825, the
  original check that we had to determine whether the image pullspecs
  were the same is no longer present. Additionally, the logic change
  there makes it possible for an OS update to always occur whenever OCL
  is enabled, further bypassing that check.

This fixes that by doing two things:

1. Update the Node Controller to ensure that the both the
   MachineOSBuild's MachineConfig reference matches the MCP's current
   rendered MachineConfig while also checking that the MachineOSConfig's
   image pullspec matches the MachineOSBuild's. In the situation where
   the MachineOSBuild's pullspec is empty, this check will fail and the
   Node Controller will requeue.
2. Update the MCD so that even when OCL is enabled, if the OS images are
   the same, the OS update process is skipped.
Comment on lines -202 to -204
Finalizers: []string{
metav1.FinalizerDeleteDependents,
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this need to be deleted

Copy link
Contributor

openshift-ci bot commented Apr 30, 2025

@dkhater-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-op-ocl 3a40dec link false /test e2e-gcp-op-ocl
ci/prow/e2e-azure-ovn-upgrade-out-of-change 3a40dec link false /test e2e-azure-ovn-upgrade-out-of-change
ci/prow/e2e-vsphere-ovn-upi 3a40dec link false /test e2e-vsphere-ovn-upi
ci/prow/e2e-aws-ovn-upgrade-out-of-change 3a40dec link false /test e2e-aws-ovn-upgrade-out-of-change
ci/prow/okd-scos-e2e-aws-ovn 3a40dec link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-gcp-op-techpreview 3a40dec link false /test e2e-gcp-op-techpreview
ci/prow/e2e-vsphere-ovn-upi-zones 3a40dec link false /test e2e-vsphere-ovn-upi-zones
ci/prow/e2e-vsphere-ovn-zones 3a40dec link false /test e2e-vsphere-ovn-zones
ci/prow/4.12-upgrade-from-stable-4.11-images 3a40dec link true /test 4.12-upgrade-from-stable-4.11-images
ci/prow/e2e-gcp-op-single-node 3a40dec link true /test e2e-gcp-op-single-node
ci/prow/images 3a40dec link true /test images
ci/prow/unit 3a40dec link true /test unit
ci/prow/e2e-aws-ovn 3a40dec link true /test e2e-aws-ovn
ci/prow/okd-images 3a40dec link true /test okd-images
ci/prow/verify 3a40dec link true /test verify
ci/prow/e2e-hypershift 3a40dec link true /test e2e-hypershift
ci/prow/e2e-gcp-op 3a40dec link true /test e2e-gcp-op
ci/prow/e2e-aws-ovn-upgrade 3a40dec link true /test e2e-aws-ovn-upgrade
ci/prow/cluster-bootimages 3a40dec link true /test cluster-bootimages
ci/prow/okd-scos-images 3a40dec link true /test okd-scos-images
ci/prow/verify-deps a50cdbc link true /test verify-deps

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2025
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants