Single-node bare-metal dual-stack installation fails #2120

Open
pshemk opened this issue Feb 25, 2025 · 2 comments
Comments


pshemk commented Feb 25, 2025

Describe the bug

I'm attempting to install OKD on a single bare-metal node. The node uses static IP address assignment for both IPv4 and IPv6. The IPv6 address doesn't get configured on the interface, which ultimately prevents keepalived from coming up; that in turn stops the VIPs from being assigned, which (I think) stalls the installation.

Version
4.17.0-okd-scos.0

How reproducible
I create a single-node ISO using the following install-config.yaml (IPv6 prefixes obscured):

apiVersion: v1
baseDomain: private.domain
compute:
- name: worker
  replicas: 0
  hyperthreading: Enabled
controlPlane:
  name: master
  replicas: 1
  hyperthreading: Enabled
metadata:
  name: okd1
networking:
  clusterNetwork:
  - cidr: 10.233.64.0/18
    hostPrefix: 22
  - cidr: dead:beef:a:f106::/63
    hostPrefix: 64
  machineNetwork:
  - cidr: 192.168.3.0/24
  - cidr: dead:beef:a:f103::/64
  networkType: OVNKubernetes
  serviceNetwork:
  - 10.233.0.0/18
  - dead:beef:a:f104::1000/116
bootstrapInPlace:
  installationDisk: /dev/nvme0n1
platform: 
  baremetal: 
    hosts:
    - name: node
      role: master
      rootDeviceHints:
        deviceName: "/dev/nvme0n1"
      bootMACAddress: 52:54:00:12:34:56
      networkConfig:
        dns-resolver:
          config:
            server:
              - 192.168.3.1
        interfaces:
          - name: enp7s0
            type: ethernet
            state: up
            ipv4:
              dhcp: false
              address: 
              - ip: 192.168.3.2
                prefix-length: 24
              enabled: true  
              routes:
                config:
                - destination: 0.0.0.0/0
                  next-hop-address: 192.168.3.1
                  next-hop-interface: enp7s0
            ipv6:
              enabled: true  
              dhcp: false
              autoconf: false
              accept-ra: false
              link-local: false
              address: 
              - ip: dead:beef:a:f103::2
                prefix-length: 64
              routes:
                config:
                - destination: ::/0
                  next-hop-address: dead:beef:a:f103::1
                  next-hop-interface: enp7s0
    apiVIPs:
      - 192.168.3.10
      - dead:beef:a:f103::10
    ingressVIPs:
      - 192.168.3.11
      - dead:beef:a:f103::11
    provisioningNetwork: "Disabled"       
pullSecret: 'xxx' 
sshKey: 'ssh-ed25519 xxx'
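For anyone reproducing this: with bootstrapInPlace set, the ISO is built roughly with the standard single-node bootstrap-in-place flow (a sketch; the working directory and live ISO filename below are placeholders):

# generate the bootstrap-in-place Ignition from the install-config above
openshift-install create single-node-ignition-config --dir=okd1

# embed it into the live ISO (ISO filename is a placeholder)
coreos-installer iso ignition embed -fi okd1/bootstrap-in-place-for-live-iso.ign scos-live.x86_64.iso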

During the initial boot I can see that the IPv4 address is configured on the enp7s0 interface as expected (and I can SSH to the machine), but the IPv6 address is not; the interface only comes up with a link-local address:

[core@node ~]$ ip a sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 04:92:26:da:66:23 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.2/24 brd 192.168.3.255 scope global dynamic noprefixroute enp7s0
       valid_lft 86375sec preferred_lft 86375sec
    inet6 fe80::d649:e799:2afe:8117/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

Once the system pivots and upgrades, I can see the static pods coming up (including keepalived and keepalived-monitor), but /etc/keepalived/keepalived.conf is missing.

keepalived-monitor logs:

[root@node core]# crictl logs 488688f93c4b5
time="2025-02-25T20:32:08Z" level=info msg="Monitor conf file doesn't exist" file=/etc/keepalived/unsupported-monitor.conf
time="2025-02-25T20:32:08Z" level=info msg="Failed to read ip from file /run/nodeip-configuration/ipv6" error="open /run/nodeip-configuration/ipv6: no such file or directory"
time="2025-02-25T20:32:08Z" level=info msg="Failed to get client config" err="invalid configuration: [unable to read client-cert /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory, unable to read client-key /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory]"

keepalived logs (addresses redacted):

[root@node core]# crictl logs 0426354b3f7ee
+ remove_vip 192.168.3.10
+ address=192.168.3.10
++ ip -o a
++ awk '/\s192.168.3.10\// {print $2}'
+ interface=
++ ip -o a
++ awk '/\s192.168.3.10\// {print $4}'
+ cidr=
+ '[' -n '' ']'
+ remove_vip dead:beef:a:f103::10
+ address=dead:beef:a:f103::10
++ ip -o a
++ awk '/\sdead:beef:a:f103::10\// {print $2}'
+ interface=
++ ip -o a
++ awk '/\sdead:beef:a:f103::10\// {print $4}'
+ cidr=
+ '[' -n '' ']'
+ remove_vip 192.168.3.11
+ address=192.168.3.11
++ ip -o a
++ awk '/\s192.168.3.11\// {print $2}'
+ interface=
++ ip -o a
++ awk '/\s192.168.3.11\// {print $4}'
+ cidr=
+ '[' -n '' ']'
+ remove_vip dead:beef:a:f103::11
+ address=dead:beef:a:f103::11
++ ip -o a
++ awk '/\sdead:beef:a:f103::11\// {print $2}'
+ interface=
++ ip -o a
++ awk '/\sdead:beef:a:f103::11\// {print $4}'
+ cidr=
+ '[' -n '' ']'
+ declare -r keepalived_sock=/var/run/keepalived/keepalived.sock
+ export -f msg_handler
+ export -f reload_keepalived
+ export -f sigterm_handler
+ '[' -f /run/nodeip-configuration/remote-worker ']'
+ trap sigterm_handler SIGTERM
+ '[' -s /etc/keepalived/keepalived.conf ']'
+ rm -f /var/run/keepalived/keepalived.sock
+ socat UNIX-LISTEN:/var/run/keepalived/keepalived.sock,fork 'system:bash -c msg_handler'

If I manually assign the VIPs (both apps and api) to the br-ex interface, the installation proceeds, but at some stage the IPs disappear (I think the keepalived pod removes them).
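"Manually assign" here means roughly the following, with prefix lengths taken from the machineNetwork CIDRs above:

# API VIPs
ip addr add 192.168.3.10/24 dev br-ex
ip -6 addr add dead:beef:a:f103::10/64 dev br-ex

# Ingress VIPs
ip addr add 192.168.3.11/24 dev br-ex
ip -6 addr add dead:beef:a:f103::11/64 dev br-ex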


pshemk commented Feb 26, 2025

Some further debugging:

Creating /etc/keepalived/keepalived.conf manually with the following basic content:

vrrp_instance API_VIP {
    state MASTER
    interface enp7s0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass abc
    }
    virtual_ipaddress {
        192.168.3.10/24
    }
    virtual_ipaddress_excluded {
        dead:beef:a:f103::10/64
    }
    notify_master "/usr/local/bin/keepalived-notify.sh master"
    notify_backup "/usr/local/bin/keepalived-notify.sh backup"
    notify_fault "/usr/local/bin/keepalived-notify.sh fault"
}

vrrp_instance INGRESS_VIP {
    state MASTER
    interface enp7s0
    virtual_router_id 52
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass abc
    }
    virtual_ipaddress {
        192.168.3.11/24
    }
    virtual_ipaddress_excluded {
        dead:beef:a:f103::11/64
    }
    notify_master "/usr/local/bin/keepalived-notify.sh master"
    notify_backup "/usr/local/bin/keepalived-notify.sh backup"
    notify_fault "/usr/local/bin/keepalived-notify.sh fault"
}

moves the process along and a proper configuration file for keepalived gets generated.
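With that file in place, whether keepalived actually claims the VIPs can be checked with something like:

# keepalived and keepalived-monitor containers should now be running with a non-empty config
crictl ps --name keepalived

# the VIPs should appear on the interface once the MASTER state is entered
ip -o addr show dev enp7s0 | grep -E '192\.168\.3\.1[01]|f103::1[01]'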

Once that's cleared, the installation stops at ovnkube-controller:

F0226 02:38:15.524646   49068 ovnkube.go:137] failed to run ovnkube: [failed to start network controller: failed to start default network controller - while waiting for any node to have zone: "node.okd1.private.domain", error: context canceled, failed to start node network controller: failed to start default node network controller: failed to find IPv6 address on interface br-ex]

Adding a /64 IPv6 address to br-ex fixes this issue and moves the installation along.
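For clarity, "adding a /64 IPv6 to br-ex" means something like the following, using the node's static address from the install-config (by this stage OVN-Kubernetes has moved the host addressing onto br-ex, so that is where the address needs to live):

# give br-ex the static IPv6 that never made it onto the host
ip -6 addr add dead:beef:a:f103::2/64 dev br-ex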


pshemk commented Feb 27, 2025

For anyone who encounters this issue: I ended up creating an additional MachineConfig that runs at boot time to make sure both the IPv4 and IPv6 addresses are assigned correctly. This fixes the problem of the node/cluster being unable to survive a reboot:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-custom-static-addresses
  labels:
    machineconfiguration.openshift.io/role: master  
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: configure-static-addresses.service
          enabled: true
          contents: |
            [Unit]
            Description=Configure static IPv4 and IPv6 addresses
            After=network-online.target
            Wants=network-online.target

            [Service]
            Type=oneshot
            ExecStart=/usr/bin/nmcli con mod "Wired connection 1" ipv6.address dead:beef:a:f103::2/64
            ExecStart=/usr/bin/nmcli con mod "Wired connection 1" ipv6.gateway dead:beef:a:f103::1
            ExecStart=/usr/bin/nmcli con mod "Wired connection 1" ipv6.dns "dead:beef:a:f103::1"
            ExecStart=/usr/bin/nmcli con mod "Wired connection 1" ipv6.routes "::/0 dead:beef:a:f103::1"
            ExecStart=/usr/bin/nmcli con mod "Wired connection 1" ipv6.method manual
            ExecStart=/usr/bin/nmcli con mod "Wired connection 1" ipv4.address 192.168.3.2/24
            ExecStart=/usr/bin/nmcli con mod "Wired connection 1" ipv4.gateway 192.168.3.1
            ExecStart=/usr/bin/nmcli con mod "Wired connection 1" ipv4.dns "192.168.3.1"
            ExecStart=/usr/bin/nmcli con mod "Wired connection 1" ipv4.routes "0.0.0.0/0 192.168.3.1"
            ExecStart=/usr/bin/nmcli con mod "Wired connection 1" ipv4.method manual
            RemainAfterExit=yes
            [Install]
            WantedBy=multi-user.target
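To have this picked up at install time, the manifest can be dropped into the openshift/ directory produced by openshift-install create manifests before generating the Ignition/ISO. On a running cluster it can also be applied afterwards (the file name is whatever the YAML above is saved as):

# apply the MachineConfig and confirm it is rendered for the master pool
oc apply -f 99-custom-static-addresses.yaml
oc get machineconfig 99-custom-static-addresses
oc get machineconfigpool master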
