Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-52656: Update the MCN PIS status of only the primary pool #4948

Merged
merged 1 commit into from
Apr 3, 2025

Conversation

RishabhSaini
Copy link
Contributor

@RishabhSaini RishabhSaini commented Mar 25, 2025

- What I did
Do not add the same PIS to the MCNStatusPIS for separate different MCPs

- How to verify it

  1. Create a custom MCP
  2. Add a worker node to the custom MCP
  3. Create a PIS targeting the Worker MCP
  4. Look at the MCD logs and ensure no error like "Failed to update MCN status" pops up

- Description for the changelog

@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 25, 2025
@openshift-ci-robot
Copy link
Contributor

@RishabhSaini: This pull request references Jira Issue OCPBUGS-52656, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

- What I did
Do not add the same PIS to the MCNStatusPIS for separate different MCPs

- How to verify it

  1. Create a custom MCP
  2. Add a worker node to the custom MCP
  3. Create a PIS targeting the Worker MCP
  4. Look at the logs and ensure no error like "Failed to update MCN status" pops up

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@RishabhSaini RishabhSaini changed the title OCPBUGS-52656: Update the status of only the primary pool OCPBUGS-52656: Update the MCN PIS status of only the primary pool Mar 25, 2025
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 25, 2025
@RishabhSaini
Copy link
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 25, 2025
@openshift-ci-robot
Copy link
Contributor

@RishabhSaini: This pull request references Jira Issue OCPBUGS-52656, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from sergiordlr March 25, 2025 17:46
@RishabhSaini
Copy link
Contributor Author

/jira refresh

@openshift-ci-robot
Copy link
Contributor

@RishabhSaini: This pull request references Jira Issue OCPBUGS-52656, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@RishabhSaini
Copy link
Contributor Author

/test unit

@RishabhSaini
Copy link
Contributor Author

/test e2e-hypershift

Copy link
Contributor

openshift-ci bot commented Mar 26, 2025

@RishabhSaini: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-op-ocl 63d4103 link false /test e2e-gcp-op-ocl
ci/prow/okd-scos-e2e-aws-ovn 63d4103 link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-azure-ovn-upgrade-out-of-change 63d4103 link false /test e2e-azure-ovn-upgrade-out-of-change

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +223 to +228
primaryPool, err := helpers.GetPrimaryPoolForNode(p.mcpLister, node)
if err != nil {
klog.Errorf("error getting primary pool for node: %v", node.Name)
return err
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Since this is is being added to replace the previous get pool functionality, are you able to remove the pool selection code up here?

Suggestion: Since GetPrimaryPoolForNode can return nil, nil (no pool, but also no error), it might be good to add a check to make sure you do get a pool (like the check here).

Copy link
Contributor Author

@RishabhSaini RishabhSaini Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Since this is is being added to replace the previous get pool functionality, are you able to remove the pool selection code up here?

We still want to sync the individual pools of the node. Just not update the status as it duplicates

Suggestion: Since GetPrimaryPoolForNode can return nil, nil (no pool, but also no error), it might be good to add a check to make sure you do get a pool (like the check here).

agreed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still want to sync the individual pools of the node. Just not update the status as it duplicates

Understood, thanks!

@hexfusion
Copy link
Contributor

/lgtm

thanks 🙏

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 27, 2025
Copy link
Contributor

openshift-ci bot commented Mar 27, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, RishabhSaini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@isabella-janssen
Copy link
Member

@RishabhSaini Are you able to fix the typo on this line in this PR also?
"Failed to updated machine config node: %v" --> "Failed to update machine config node: %v"

/hold
Holding for QE validation.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 28, 2025
@RishabhSaini
Copy link
Contributor Author

@RishabhSaini Are you able to fix the typo on this line in this PR also? "Failed to updated machine config node: %v" --> "Failed to update machine config node: %v"

Since this typo is a miscellaneous change rather than part of this bug fix, I believe it would be more appropriate to address it in PR #4934. I'll make the necessary changes there.

@ptalgulk01
Copy link

Pre-merge verification steps :
Verified using IPI based AWS cluster and AWS Disconnected cluster 4.19.

build 4.19,openshift/api#2257,openshift/machine-config-operator#4948,openshift/machine-config-operator#4962
  • Create the custom Pool
Infra pool template
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
EOF
- Add the node to custom pool
oc label node/ip-10-0-56-194.us-east-2.compute.internal node-role.kubernetes.io/infra=
node/ip-10-0-56-194.us-east-2.compute.internal labeled
- Apply the PIS on worker pool
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: PinnedImageSet
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-pinned-images
spec:
  pinnedImages:
  - name: "quay.io/openshifttest/busybox@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a4591f08019"
  - name: quay.io/openshifttest/alpine@sha256:be92b18a369e989a6e86ac840b7f23ce0052467de551b064796d67280dfa06d5
EOF
pinnedimageset.machineconfiguration.openshift.io/worker-pinned-images created
- Verify No error is seen in MCD logs and infra MCN does not report any error
for node in $(oc get nodes -l node-role.kubernetes.io/infra -o name)
do
   n=${node/node\//}
   echo $n
   oc get machineconfignode $n -ojsonpath='{.status.conditions[?(@.type=="PinnedImageSetsDegraded")]}' | jq
   oc get machineconfignode $n -ojsonpath='{.status.conditions[?(@.type=="PinnedImageSetsProgressing")]}' | jq
done
ip-10-0-56-194.us-east-2.compute.internal
{
  "lastTransitionTime": "2025-04-03T11:09:28Z",
  "message": "All is good",
  "reason": "AsExpected",
  "status": "False",
  "type": "PinnedImageSetsDegraded"
}
{
  "lastTransitionTime": "2025-04-03T11:09:28Z",
  "message": "All pinned image sets complete",
  "reason": "AsExpected",
  "status": "False",
  "type": "PinnedImageSetsProgressing"
}
oc logs machine-config-daemon-r5946 | tail -n 10
Defaulted container "machine-config-daemon" out of: machine-config-daemon, kube-rbac-proxy
I0403 10:03:23.063704    2636 certificate_writer.go:294] Certificate was synced from controllerconfig resourceVersion 109207
I0403 10:03:23.106638    2636 certificate_writer.go:294] Certificate was synced from controllerconfig resourceVersion 109208
I0403 10:03:23.898563    2636 certificate_writer.go:294] Certificate was synced from controllerconfig resourceVersion 109225
I0403 10:14:08.096306    2636 certificate_writer.go:294] Certificate was synced from controllerconfig resourceVersion 109225
I0403 10:34:16.735136    2636 pinned_image_set.go:308] Reconciling pinned image set: tc-80334-worker-pinned-images: generation: 1
I0403 10:34:22.360835    2636 pinned_image_set.go:432] Completed scheduling 50% of images
I0403 10:34:22.360859    2636 pinned_image_set.go:432] Completed scheduling 100% of images
I0403 10:34:23.767860    2636 file_writers.go:234] Writing file "/etc/crio/crio.conf.d/50-pinned-images"
I0403 10:34:23.770925    2636 update.go:2731] Running: systemctl reload crio
I0403 10:36:02.458515    2636 certificate_writer.go:294] Certificate was synced from controllerconfig resourceVersion 109225
- Verify images are pin to worker pool
oc debug -q node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}")  -- chroot host crictl images --pinned 
IMAGE                                                    TAG                 IMAGE ID            SIZE                PINNED
quay.io/openshifttest/alpine                                           45683da4f97c2       5.87MB              true
quay.io/openshifttest/busybox                                          b97242f89c8a2       1.45MB              true

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Apr 3, 2025
@isabella-janssen
Copy link
Member

/unhold
QE approved, PR does not need a hold anymore.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 3, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 7331a53 into openshift:main Apr 3, 2025
14 of 17 checks passed
@openshift-ci-robot
Copy link
Contributor

@RishabhSaini: Jira Issue OCPBUGS-52656: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-52656 has been moved to the MODIFIED state.

In response to this:

- What I did
Do not add the same PIS to the MCNStatusPIS for separate different MCPs

- How to verify it

  1. Create a custom MCP
  2. Add a worker node to the custom MCP
  3. Create a PIS targeting the Worker MCP
  4. Look at the MCD logs and ensure no error like "Failed to update MCN status" pops up

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-machine-config-operator
This PR has been included in build ose-machine-config-operator-container-v4.19.0-202504040242.p0.g7331a53.assembly.stream.el9.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants