COS-3013: overlay node image before bootstrapping if necessary #899

Merged (2 commits) on Jan 21, 2025

Conversation

jlebon
Member

@jlebon jlebon commented Sep 6, 2024

As per openshift/enhancements#1637, we're trying
to get rid of all OpenShift-versioned components from the bootimages.

This means that there will no longer be oc, kubelet, or crio
binaries for example, which bootstrapping obviously relies on.

To adapt to this, the OpenShift installer now ships a new
node-image-overlay.service in its bootstrap Ignition config. This
service takes care of pulling down the node image and overlaying it,
effectively updating the system to the node image version.

Here, we accordingly also adapt assisted-installer so that we run
node-image-overlay.service before starting e.g. kubelet.service and
bootkube.service.

See also: openshift/installer#8742
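
The flow described above can be sketched in Go as follows. This is a minimal sketch under stated assumptions, not the code from this PR: `systemOps`, `prepareBootstrap`, `fakeOps`, and the constant names are hypothetical, and starting `kubelet.service` and `bootkube.service` through a single `StartService` call is a simplification for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// systemOps is a hypothetical abstraction over host operations; the real
// assisted-installer interface may look different.
type systemOps interface {
	FileExists(path string) bool
	StartService(name string) error
}

const (
	ocBinaryPath            = "/usr/bin/oc"
	nodeImageOverlayService = "node-image-overlay.service"
)

// prepareBootstrap overlays the node image when the boot image carries no
// OpenShift binaries, then starts the services that depend on them.
func prepareBootstrap(ops systemOps) error {
	if !ops.FileExists(ocBinaryPath) {
		if err := ops.StartService(nodeImageOverlayService); err != nil {
			return fmt.Errorf("overlaying node image: %w", err)
		}
	}
	for _, svc := range []string{"kubelet.service", "bootkube.service"} {
		if err := ops.StartService(svc); err != nil {
			return fmt.Errorf("starting %s: %w", svc, err)
		}
	}
	return nil
}

// fakeOps records started services so the ordering can be inspected.
type fakeOps struct {
	hasOC   bool
	failOn  string
	started []string
}

func (f *fakeOps) FileExists(path string) bool {
	return f.hasOC && path == ocBinaryPath
}

func (f *fakeOps) StartService(name string) error {
	if name == f.failOn {
		return errors.New("unit failed")
	}
	f.started = append(f.started, name)
	return nil
}

func main() {
	// Boot image without OpenShift binaries: the overlay runs first.
	pure := &fakeOps{hasOC: false}
	fmt.Println(prepareBootstrap(pure), pure.started)

	// Boot image that already ships oc: the overlay is skipped.
	legacy := &fakeOps{hasOC: true}
	fmt.Println(prepareBootstrap(legacy), legacy.started)
}
```

Using the presence of `/usr/bin/oc` as the trigger keeps the behavior unchanged on boot images that still ship the OpenShift binaries.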

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 6, 2024

openshift-ci bot commented Sep 6, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 6, 2024
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2024
@jlebon
Member Author

jlebon commented Dec 13, 2024

/remove-lifecycle stale

Now that openshift/enhancements#1637 has merged, we will need this. I'll leave it in draft for now until I can rebase and retest it, but feel free to start reviewing.

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 13, 2024
@jlebon jlebon changed the title WIP: overlay node image before bootstrapping if necessary overlay node image before bootstrapping if necessary Dec 21, 2024
@jlebon jlebon marked this pull request as ready for review December 21, 2024 17:21
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 21, 2024
@openshift-ci openshift-ci bot requested review from jhernand and omertuc December 21, 2024 17:22
@jlebon
Member Author

jlebon commented Dec 21, 2024

Rebased and verified this still works! Requires openshift/installer#8742.

codecov bot commented Dec 21, 2024

Codecov Report

Attention: Patch coverage is 73.52941% with 9 lines in your changes missing coverage. Please review.

Project coverage is 55.39%. Comparing base (34b971a) to head (f6996dd).
Report is 11 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/installer/installer.go | 80.64% | 3 Missing and 3 partials ⚠️ |
| src/ops/ops.go | 0.00% | 3 Missing ⚠️ |

@@            Coverage Diff             @@
##           master     #899      +/-   ##
==========================================
+ Coverage   55.20%   55.39%   +0.19%     
==========================================
  Files          15       15              
  Lines        3286     3318      +32     
==========================================
+ Hits         1814     1838      +24     
- Misses       1292     1297       +5     
- Partials      180      183       +3     
| Files with missing lines | Coverage Δ |
|---|---|
| src/ops/ops.go | 43.24% <0.00%> (-0.15%) ⬇️ |
| src/installer/installer.go | 68.76% <80.64%> (+0.57%) ⬆️ |

// If we're in a pure RHEL/CentOS environment, we need to overlay the node image
// first to have access to e.g. oc, kubelet, cri-o, etc...
// https://github.com/openshift/enhancements/pull/1637
if !i.ops.FileExists("/usr/bin/oc") {
Contributor

Shouldn't we pull and overlay the node image first? We have already applied the bootstrap Ignition at this point, which may be a little late to pull another image layer. We should probably do it before that.

Member Author

The code to pull the node image is part of the bootstrap Ignition so we can't use it before that.

@andfasano
Contributor

/test e2e-agent-compact-ipv4

@jlebon jlebon changed the title overlay node image before bootstrapping if necessary COS-3013: overlay node image before bootstrapping if necessary Jan 17, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 17, 2025
@openshift-ci-robot

openshift-ci-robot commented Jan 17, 2025

@jlebon: This pull request references COS-3013 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.


@jlebon
Member Author

jlebon commented Jan 17, 2025

Updated this now!

  • Restored DryRebootHappened()
  • Switched to utils.WaitForPredicate()
  • Made various strings into constants
  • Added stronger error-handling. I've tested various error cases. At the openshift-install level, it looks like:
INFO Host: master-0, reached installation stage Installing: bootstrap
INFO Host: master-0, reached installation stage Failed: service node-image-overlay.service failed

and then the host logs have more information about the failure.

@jlebon
Member Author

jlebon commented Jan 18, 2025

/retest

Prep for next patch. Also use that in one spot where we were manually
calling `stat`.
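
The commit message above appears to refer to a file-existence helper (the reviewed diff calls `i.ops.FileExists("/usr/bin/oc")`). A minimal sketch of such a helper, assuming it simply wraps `os.Stat`, is shown below; `fileExists` is a hypothetical name and the real implementation may handle errors differently.

```go
package main

import (
	"fmt"
	"os"
)

// fileExists reports whether path can be stat'ed. Any Stat error, including
// a permission error, is reported as "does not exist" in this sketch.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	f, err := os.CreateTemp("", "exists-demo-*")
	if err != nil {
		fmt.Println("cannot create temp file:", err)
		return
	}
	f.Close()

	fmt.Println(fileExists(f.Name())) // the file was just created
	os.Remove(f.Name())
	fmt.Println(fileExists(f.Name())) // the file was just removed
}
```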
@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 20, 2025
@jlebon
Member Author

jlebon commented Jan 20, 2025

OK, fixed a lint issue and a unit test, and added two unit tests (one positive and one negative) that check the new path!

@carbonin
Member

Looks good to me, thanks for taking this on @jlebon!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 20, 2025

openshift-ci bot commented Jan 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carbonin, jlebon


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 20, 2025
@jlebon
Member Author

jlebon commented Jan 20, 2025

Hmm, edge-e2e-ai-operator-ztp is failing with:

 [2025-01-20 21:43:45] ++ echo '2025-01-20 21:43:45+00:00 ERROR: failed Waiting for crd/agentserviceconfigs.agent-install.openshift.io on namespace '
[2025-01-20 21:43:45] 2025-01-20 21:43:45+00:00 ERROR: failed Waiting for crd/agentserviceconfigs.agent-install.openshift.io on namespace 
[2025-01-20 21:43:45] ++ oc get crd/agentserviceconfigs.agent-install.openshift.io --namespace= -o json
[2025-01-20 21:43:45] Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "agentserviceconfigs.agent-install.openshift.io" not found 

I don't think this is related to this change. OK yeah, I see a similar-looking failure in e.g. #1003.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD ba4f15f and 2 for PR HEAD f6996dd in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD d17c2f6 and 1 for PR HEAD f6996dd in total

openshift-ci bot commented Jan 21, 2025

@jlebon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/edge-e2e-oci-assisted-4-18 | f6996dd | link | false | /test edge-e2e-oci-assisted-4-18 |
| ci/prow/okd-scos-e2e-aws-ovn | f6996dd | link | false | /test okd-scos-e2e-aws-ovn |


@openshift-merge-bot openshift-merge-bot bot merged commit e9ae12d into openshift:master Jan 21, 2025
19 of 21 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-agent-installer-orchestrator
This PR has been included in build ose-agent-installer-orchestrator-container-v4.19.0-202501211738.p0.ge9ae12d.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-agent-installer-csr-approver
This PR has been included in build ose-agent-installer-csr-approver-container-v4.19.0-202501211738.p0.ge9ae12d.assembly.stream.el9.
All builds following this will include this PR.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

7 participants