
Commit f3d1b1f

JM1 authored and aleskandro committed
OCPBUGS-19303: Changed OKD/FCOS workaround to also support Agent-based Installer
OKD/FCOS uses FCOS as its bootimage, i.e. the image that cluster nodes boot the first time during installation. FCOS does not provide tools such as the OpenShift Client (oc) or crio.service, which Agent-based Installer uses at the rendezvous host, e.g. to launch the bootstrap control plane. RHCOS and SCOS include these tools, but FCOS first has to pivot the root fs [1] to okd-machine-os [2] to make them available.

Pivoting uses 'rpm-ostree rebase', but on its first boot the rendezvous host runs from a FCOS Live ISO where the root fs and /sysroot are mounted read-only. Thus 'rpm-ostree rebase' fails and the necessary tools are not available, causing the setup to stall.

Until rpm-ostree has implemented support for rebasing Live ISOs [3], this patch adapts the workaround for SNO installations [4] to also support Agent-based Installer. In particular, the Go conditional {{- if .BootstrapInPlace }}, which is used to mark a SNO install, has been replaced with a shell if-else which checks at runtime whether the system was booted from a Live ISO.

Most code in the OpenShift ecosystem is written with RHCOS in mind and often assumes that tools like oc or crio.service are available. These assumptions can be satisfied by applying this workaround to all Live ISO boots. It will not remove functionality or overwrite configuration files in /etc, and thus side effects should be minimal.

The Go conditional {{- if .BootstrapInPlace }} in release-image-pivot.service has been dropped completely. This service is used in OKD only, so OCP will not be impacted at all. The 'Before=' option will not cause systemd to fail if a service does not exist. So, in case bootkube.service or kubelet.service do not exist, the option will have no effect. When bootkube.service or kubelet.service do exist, release-image-pivot.service must always be started first because it might reboot the system or change /usr in the Live ISO use case. So it is safe to drop the Go conditional and ask systemd to always launch release-image-pivot.service before bootkube.service and kubelet.service.

[0] https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template
[1] https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template
[2] https://github.com/openshift/okd-machine-os
[3] coreos/rpm-ostree#4547
[4] openshift#7445
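The runtime check that replaces the Go conditional can be sketched as a small shell function. This is a minimal illustration, not the patch itself: the function name `is_live_iso` and its optional file argument are assumptions added here so the check can be run against a sample file; the patch reads /proc/cmdline directly.

```shell
#!/bin/sh
# Sketch: detect a Fedora CoreOS Live ISO boot from the kernel command line.
# is_live_iso and its optional path argument are hypothetical; they exist only
# so the check can be exercised without booting a Live ISO.
is_live_iso() {
    cmdline_file="${1:-/proc/cmdline}"
    # Same pattern as in the patch: Live ISO boots carry a coreos.liveiso= argument.
    grep -q 'coreos.liveiso=' "$cmdline_file" 2>/dev/null
}

if is_live_iso; then
    echo "Live ISO boot: the overlay workaround applies"
else
    echo "Not a Live ISO boot: the regular pivot applies"
fi
```

Because the decision is made when the script runs rather than when the template is rendered, the same rendered script can serve SNO, Agent-based Installer and regular bootstrap hosts.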
1 parent 1c0d4dc commit f3d1b1f

3 files changed, +65 -21 lines changed


data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template

+59 -17
@@ -42,25 +42,67 @@ if [ ! -f /opt/openshift/.pivot-done ]; then
   record_service_stage_start "rebase-to-okd-os-image"
 {{if .IsFCOS -}}
   mnt="$(podman image mount "${MACHINE_OS_IMAGE}")"
-{{- if or (.BootstrapInPlace) (eq .Invoker "agent-installer") }}
-  # SNO setup boots into Live ISO which cannot be rebased
-  # https://github.com/coreos/rpm-ostree/issues/4547
-  mkdir /var/mnt/{upper,worker}
-  mount -t overlay overlay -o "lowerdir=/usr:$mnt/usr" /usr
-  mount -t overlay overlay -o "lowerdir=/etc:$mnt/etc,upperdir=/var/mnt/upper,workdir=/var/mnt/worker" /etc
-  systemctl daemon-reload

-  # Workaround for SELinux denials when launching crio.service from overlayfs
-  setenforce Permissive
+  # The bootstrap host during SNO installation and the rendezvous host of Agent-based Installer both boot into a Live
+  # ISO which cannot be rebased. Until rpm-ostree supports this live rebase [0], the following workaround will mount the
+  # proper OKD/FCOS Machine OS image over the existing mount at /usr and copy new config files to /etc.
+  # [0] https://github.com/coreos/rpm-ostree/issues/4547
+  if grep -q coreos.liveiso= /proc/cmdline; then
+    mount -t tmpfs -o size=50% none /var/mnt/
+    rsync -aHAXx "$mnt/" /var/mnt/
+    mount -t overlay overlay -o lowerdir=/usr:/var/mnt/usr /usr
+    rsync -rlt --ignore-existing /var/mnt/etc/ /etc/

-  systemctl start crio.service
-  # No reboot necessary because SNO setup will reboot system
-{{ else }}
-  pushd "${mnt}/bootstrap"
-  # shellcheck disable=SC1091
-  . ./pre-pivot.sh
-  popd
-{{ end -}}
+    # Agent-based Installer will launch an ephemeral control plane at the rendezvous host which will create and publish
+    # Ignition configs for the other master nodes. These Ignition configs must match what the in-cluster control plane
+    # would generate else machine config operator will fail [0]. Because the rendezvous host is booted with a FCOS Live
+    # ISO without any OKD/FCOS related changes, we have to copy the manifests from OKD Machine OS manually to the
+    # bootstrap manifests folder of the rendezvous host.
+    # [0] https://access.redhat.com/solutions/4970731
+    mkdir -p /var/opt/openshift/manifests
+    cp -av /var/mnt/manifests/*.* /var/opt/openshift/manifests/
+
+    # Load new systemd unit files and configuration such as crio.service after mounting the content of OKD/FCOS Machine
+    # OS over /usr and copying new files to /etc
+    systemctl daemon-reload
+
+    # On OKD/FCOS prior to commit e859a66 [0] systemd-resolved is used by default and NetworkManager's DNS handling is
+    # disabled. In this case, CoreDNS fails to listen to 127.0.0.53:53 when Agent-based Installer boots the
+    # rendezvous host with a Fedora CoreOS bootimage because by default FCOS' systemd-resolved already listens to this
+    # port. OKD/FCOS disables resolved's stub listener [1] but resolved must be restarted for this setting to take
+    # effect.
+    # On OKD/FCOS since commit e859a66 [0] systemd-resolved is disabled by default and NetworkManager's DNS handling is
+    # used. However, the bootimage is vanilla FCOS and thus uses systemd-resolved by default. The latter has to be
+    # disabled after rebasing to OKD Machine OS and NetworkManager as well as the service to fix /etc/resolv.conf have
+    # to be started.
+    # [0] https://github.com/openshift/okd-machine-os/commit/e859a6643330596a8a282aeb4bf853763a2d219e#diff-808ba069aeee05cbeb08aa7b8b5b4f6feb8aefe15ea9737339d07f8d7bf5d74a
+    # [1] https://github.com/openshift/okd-machine-os/blob/28dec35d60ea07069366b22ebdcb296d429b15e9/overlay.d/99okd/etc/systemd/resolved.conf.d/okd-no-dns-stub.conf
+    if [ -e /etc/systemd/resolved.conf.d/okd-no-dns-stub.conf ]; then
+      systemctl restart systemd-resolved.service
+    else
+      systemctl disable --now systemd-resolved.service
+    fi
+
+    if systemctl list-unit-files -q fix-resolvconf.service >/dev/null; then
+      systemctl start fix-resolvconf.service
+      systemctl restart NetworkManager.service
+    fi
+
+    # Workaround for SELinux denials when launching crio.service from overlayfs
+    setenforce Permissive
+
+    # crio.service is not part of FCOS but of OKD Machine OS. It will be loaded after systemctl daemon-reload above but
+    # has to be started manually
+    systemctl start crio.service
+
+    # No reboot necessary because setup will reboot the system automatically
+  else
+    pushd "${mnt}/bootstrap"
+    # shellcheck disable=SC1091
+    . ./pre-pivot.sh
+    popd
+  fi
   record_service_stage_success
 {{else if .IsSCOS -}}
   chmod 0644 /etc/containers/registries.conf
   rpm-ostree rebase --experimental "ostree-unverified-registry:${MACHINE_OS_IMAGE}"
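The choice between restarting and disabling systemd-resolved depends only on whether the OKD drop-in okd-no-dns-stub.conf arrived in /etc. The following sketch isolates that decision so it can be tried on a scratch directory. The function name `resolved_action` and its root-prefix argument are assumptions made for illustration; the script only prints the systemctl command it would choose and does not run it.

```shell
#!/bin/sh
# Sketch: decide how to handle systemd-resolved after OKD Machine OS config
# files have been copied to /etc. resolved_action and its root-prefix argument
# are hypothetical and exist only to make the decision testable.
resolved_action() {
    root="${1:-}"
    if [ -e "${root}/etc/systemd/resolved.conf.d/okd-no-dns-stub.conf" ]; then
        # Older OKD/FCOS: resolved stays in use, but the stub listener setting
        # only takes effect after a restart.
        echo restart
    else
        # Newer OKD/FCOS: NetworkManager handles DNS, so resolved is turned off.
        echo disable
    fi
}

case "$(resolved_action)" in
    restart) echo "would run: systemctl restart systemd-resolved.service" ;;
    disable) echo "would run: systemctl disable --now systemd-resolved.service" ;;
esac
```

Keying the decision on the presence of the drop-in file, rather than on a version number, lets the same script handle OKD Machine OS images from before and after commit e859a66.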

data/data/bootstrap/systemd/common/units/kubelet.service.template

+6
@@ -1,7 +1,13 @@
 [Unit]
 Description=Kubernetes Kubelet
 Wants=rpc-statd.service crio.service release-image.service
+{{if .IsOKD -}}
+Wants=release-image-pivot.service
+{{end -}}
 After=crio.service release-image.service
+{{if .IsOKD -}}
+After=release-image-pivot.service
+{{end -}}

 [Service]
 Type=notify
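The added lines rely on systemd combining repeated Wants= and After= assignments into one list, so the OKD-only lines extend the existing dependencies instead of replacing them. The sketch below shows the effective lists for a unit rendered with .IsOKD set. The helper `unit_values` is hypothetical and deliberately simplified: it does not read drop-in files and ignores systemd's rule that an empty assignment resets the list.

```shell
#!/bin/sh
# Sketch: print the combined values of one key from a unit file.
# unit_values is a hypothetical helper, not a systemd tool.
unit_values() {
    key="$1"
    file="$2"
    # Take every "key=..." line, split the values on whitespace,
    # drop empty entries and join everything with single spaces.
    sed -n "s/^${key}=//p" "$file" | tr -s '[:space:]' '\n' | grep -v '^$' | paste -s -d ' ' -
}

# [Unit] section as it would look when rendered for OKD (assumed rendering).
unit="$(mktemp)"
cat > "$unit" <<'EOF'
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service crio.service release-image.service
Wants=release-image-pivot.service
After=crio.service release-image.service
After=release-image-pivot.service
EOF

echo "Wants: $(unit_values Wants "$unit")"
echo "After: $(unit_values After "$unit")"
rm -f "$unit"
```

On a running system, `systemctl show -p Wants -p After kubelet.service` reports the lists as systemd actually resolved them.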

data/data/bootstrap/systemd/common/units/release-image-pivot.service.template

-4
@@ -3,11 +3,7 @@
 Description=Pivot bootstrap to the OpenShift Release Image
 Wants=release-image.service
 After=release-image.service
-{{- if or (.BootstrapInPlace) (eq .Invoker "agent-installer") }}
 Before=bootkube.service kubelet.service
-{{ else }}
-Before=bootkube.service
-{{ end -}}

 [Service]
 Type=oneshot
