OCPBUGS-45496: Prevent undesired MOSBs from building #4739

Merged

@RishabhSaini RishabhSaini commented Dec 9, 2024

- What I did

reconciler: only build MOSBs currently targeted by the MCP

In OCPBUGS-45496, the following sequence causes MOSB creation to fail when a new MC is added:

  1. An MC whose erroneous contents are not caught by the API validations produces a rendered-MC, which triggers a MOSB build.
    The build fails with an error, as expected.
  2. The erroneous MOSB keeps being re-added to the rate-limited worker
    queue until it hits the max retries. It is then forgotten by the
    queue and, after a backoff period, added back again.
  3. In the meantime, if the erroneous MC is deleted and a new, valid MC
    targeting the same MCP is added, a valid MOSB build starts.
  4. When the erroneous MOSB re-enters the queue and sees that another
    MOSB that has not yet succeeded is already building, it cancels all
    other builds.

Hence the valid MOSB is cancelled and the erroneous MOSB is retriggered. Because the erroneous MOSB can never start its build, it fails again, and steps 2 and 4 keep repeating.
As a result, any new MC fails to create a MOSB and trigger a build.
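
The loop described above can be illustrated with a small in-memory sketch. This is not the controller's code: `Build`, `BuildState`, and `naiveSync` are hypothetical names, and the sketch only assumes the behavior stated in step 4 (a synced build cancels every other in-progress build for the same pool).

```go
package main

import "fmt"

// BuildState is a hypothetical, simplified stand-in for a MOSB's status.
type BuildState string

const (
	Building  BuildState = "Building"
	Failed    BuildState = "Failed"
	Cancelled BuildState = "Cancelled"
)

// Build is a hypothetical in-memory model of a MachineOSBuild.
type Build struct {
	Name  string
	Pool  string
	State BuildState
}

// naiveSync models the pre-fix behavior from step 4: the synced build
// cancels every other in-progress build for the same pool and restarts
// itself, without checking whether it is still wanted.
func naiveSync(builds map[string]*Build, name string) {
	cur := builds[name]
	for _, b := range builds {
		if b.Name != name && b.Pool == cur.Pool && b.State == Building {
			b.State = Cancelled
		}
	}
	cur.State = Building
}

func main() {
	builds := map[string]*Build{
		// Steps 1-2: the erroneous build has failed and is waiting to be retried.
		"mosb-bad": {Name: "mosb-bad", Pool: "worker", State: Failed},
		// Step 3: the valid build has started in the meantime.
		"mosb-good": {Name: "mosb-good", Pool: "worker", State: Building},
	}

	// Step 4: the erroneous build re-enters the queue and is synced.
	naiveSync(builds, "mosb-bad")

	fmt.Println(builds["mosb-good"].State) // prints: Cancelled
	fmt.Println(builds["mosb-bad"].State)  // prints: Building
}
```

In this model the valid build ends up cancelled every time the stale one is retried, which matches the symptom reported in the bug.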

Thus the solution is to check, when a MOSB is synced, whether the MCP and rendered-MC that the MOSB targets still exist. If they do not, the MOSB does not need to be built.
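
A minimal sketch of such a guard, under stated assumptions: `Pool`, `Build`, `Cluster`, and `shouldBuild` are hypothetical names, the maps stand in for API lookups, and the final comparison against the pool's current rendered config is one reading of "currently targeted by the MCP", not a description of the merged code.

```go
package main

import "fmt"

// Pool is a hypothetical, simplified MachineConfigPool that records only
// the rendered MachineConfig it currently targets.
type Pool struct {
	Name           string
	RenderedConfig string
}

// Build is a hypothetical, simplified MachineOSBuild.
type Build struct {
	Name           string
	PoolName       string
	RenderedConfig string
}

// Cluster is a hypothetical in-memory stand-in for API lookups.
type Cluster struct {
	Pools           map[string]Pool
	RenderedConfigs map[string]bool
}

// shouldBuild reports whether a build is still worth running: its pool and
// rendered-MC must both exist, and (by assumption) the pool must still
// target that rendered-MC.
func shouldBuild(c Cluster, b Build) bool {
	pool, ok := c.Pools[b.PoolName]
	if !ok {
		return false // the pool no longer exists
	}
	if !c.RenderedConfigs[b.RenderedConfig] {
		return false // the rendered-MC was deleted
	}
	return pool.RenderedConfig == b.RenderedConfig
}

func main() {
	c := Cluster{
		Pools: map[string]Pool{
			"worker": {Name: "worker", RenderedConfig: "rendered-worker-good"},
		},
		RenderedConfigs: map[string]bool{"rendered-worker-good": true},
	}
	stale := Build{Name: "mosb-bad", PoolName: "worker", RenderedConfig: "rendered-worker-bad"}
	valid := Build{Name: "mosb-good", PoolName: "worker", RenderedConfig: "rendered-worker-good"}

	fmt.Println(shouldBuild(c, stale), shouldBuild(c, valid)) // prints: false true
}
```

With a check like this at the top of the sync path, a stale MOSB returns early instead of cancelling the builds that are still wanted.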

- How to verify it

  1. Create an erroneous MC that the API server still accepts as valid, for example one that adds an unsupported extension as a systemd unit.
  2. Wait until a rendered-MC and a MOSB appear and the MOSB fails.
  3. Remove the erroneous MC.
  4. Add a valid MC (for example, a supported extension or a change to the core user's SSH keys).
  5. The valid MC should trigger a MOSB that builds successfully.

- Description for the changelog

@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Dec 9, 2024
@openshift-ci-robot
Contributor

@RishabhSaini: This pull request references Jira Issue OCPBUGS-45496, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Dec 9, 2024
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 9, 2024
@RishabhSaini
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 9, 2024
@openshift-ci-robot
Contributor

@RishabhSaini: This pull request references Jira Issue OCPBUGS-45496, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

@openshift-ci openshift-ci bot requested a review from sergiordlr December 9, 2024 22:47
@openshift-ci-robot
Contributor

@RishabhSaini: This pull request references Jira Issue OCPBUGS-45496, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

@RishabhSaini
Contributor Author

/test unit

@RishabhSaini
Contributor Author

/retest-required

@umohnani8
Contributor

LGTM
Would it be possible to add a test for this so that we can catch any regressions in the future?

osbuildcontroller_test: Unit testing for cascading failure

In OCPBUGS-45496, the following sequence causes MOSB creation to fail
when a new MC is added:

1) An MC whose erroneous contents are not caught by the API validations
   produces a rendered-MC, which triggers a MOSB build. The build
   fails with an error, as expected.
2) The erroneous MOSB keeps being re-added to the rate-limited worker
   queue until it hits the max retries. It is then forgotten by the
   queue and, after a backoff period, added back again.
3) In the meantime, if the erroneous MC is deleted and a new, valid MC
   targeting the same MCP is added, a valid MOSB build starts.
4) When the erroneous MOSB re-enters the queue and sees that another
   MOSB that has not yet succeeded is already building, it cancels all
   other builds.

Hence the valid MOSB is cancelled and the erroneous MOSB is
retriggered. Because the erroneous MOSB can never start its build, it
fails again, and steps 2 and 4 keep repeating.
As a result, any new MC fails to create a MOSB and trigger a build.

Thus the solution is to check, when a MOSB is synced, whether the MCP and
rendered-MC that the MOSB targets still exist. If they do not, the MOSB
does not need to be built.
@RishabhSaini
Contributor Author

/test unit

@RishabhSaini
Contributor Author

/test e2e-gcp-op

@umohnani8
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 11, 2024
Contributor

openshift-ci bot commented Dec 11, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RishabhSaini, umohnani8

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [RishabhSaini,umohnani8]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD daef1c6 and 2 for PR HEAD aba18f3 in total

Contributor

openshift-ci bot commented Dec 11, 2024

@RishabhSaini: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | aba18f3 | link | false | /test e2e-azure-ovn-upgrade-out-of-change |
| ci/prow/e2e-vsphere-ovn-upi | aba18f3 | link | false | /test e2e-vsphere-ovn-upi |
| ci/prow/okd-scos-e2e-aws-ovn | aba18f3 | link | false | /test okd-scos-e2e-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD daef1c6 and 2 for PR HEAD aba18f3 in total

@RishabhSaini
Contributor Author

/test e2e-gcp-op

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD daef1c6 and 2 for PR HEAD aba18f3 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 52b26e7 into openshift:master Dec 12, 2024
16 of 19 checks passed
@openshift-ci-robot
Contributor

@RishabhSaini: Jira Issue OCPBUGS-45496: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-45496 has been moved to the MODIFIED state.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-machine-config-operator
This PR has been included in build ose-machine-config-operator-container-v4.19.0-202412120406.p0.g52b26e7.assembly.stream.el9.
All builds following this will include this PR.

cheesesashimi pushed a commit to cheesesashimi/machine-config-operator that referenced this pull request May 1, 2025
OCPBUGS-45496: Prevent undesired MOSBs from building
cheesesashimi pushed a commit to cheesesashimi/machine-config-operator that referenced this pull request May 21, 2025
OCPBUGS-45496: Prevent undesired MOSBs from building
cheesesashimi pushed a commit to cheesesashimi/machine-config-operator that referenced this pull request May 22, 2025
OCPBUGS-45496: Prevent undesired MOSBs from building
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.