Stuck on replacing controlplane nodes #921
Comments
It seems to be something related to kubernetes/cloud-provider-vsphere#326.
@farodin91 - what is the output when you try to get the new node on the target cluster?
kubectl get node node-cp-v2-92xc5 -o yaml

apiVersion: v1
kind: Node
metadata:
annotations:
csi.volume.kubernetes.io/nodeid: '{"csi.vsphere.vmware.com":"node-cp-v2-92xc5"}'
kubeadm.alpha.kubernetes.io/cri-socket: /run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
projectcalico.org/IPv4Address: 10.25.8.63/24
projectcalico.org/IPv4IPIPTunnelAddr: 10.15.247.128
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2020-06-02T12:18:01Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/instance-type: vsphere-vm.cpu-8.mem-8gb.os-linux
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: node-cp-v2-92xc5
kubernetes.io/os: linux
node-role.kubernetes.io/master: ""
name: node-cp-v2-92xc5
resourceVersion: "17595"
selfLink: /api/v1/nodes/node-cp-v2-92xc5
uid: cccd5cc7-a932-405f-9f38-3d05d143ba33
spec:
podCIDR: 10.15.232.0/24
podCIDRs:
- 10.15.232.0/24
providerID: vsphere://421241af-45c7-fbbf-2b7d-73d156d3c120
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/master
status:
addresses:
- address: 10.25.8.63
type: InternalIP
- address: node-cp-v2-92xc5
type: Hostname
allocatable:
cpu: "8"
ephemeral-storage: "18901337672"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 8064980Ki
pods: "110"
capacity:
cpu: "8"
ephemeral-storage: 20509264Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 8167380Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2020-06-02T12:18:51Z"
lastTransitionTime: "2020-06-02T12:18:51Z"
message: Calico is running on this node
reason: CalicoIsUp
status: "False"
type: NetworkUnavailable
- lastHeartbeatTime: "2020-06-02T13:20:52Z"
lastTransitionTime: "2020-06-02T12:18:00Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2020-06-02T13:20:52Z"
lastTransitionTime: "2020-06-02T12:18:00Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2020-06-02T13:20:52Z"
lastTransitionTime: "2020-06-02T12:18:00Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2020-06-02T13:20:52Z"
lastTransitionTime: "2020-06-02T12:18:41Z"
message: kubelet is posting ready status. AppArmor enabled
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- k8s.gcr.io/etcd@sha256:4afb99b4690b418ffc2ceb67e1a17376457e441c1f09ab55447f0aaf992fa646
- k8s.gcr.io/etcd:3.4.3-0
sizeBytes: 100947667
- names:
- docker.io/calico/node@sha256:dbebe7e01ae85af68673a8e0ce51200ab8ae2a1c69d48dff5b95969b17eca7c2
- docker.io/calico/node:v3.14.1
sizeBytes: 90581056
- names:
- docker.io/calico/cni@sha256:84113c174b979e686de32094e552933e35d8fc7e2d532efcb9ace5310b65088c
- docker.io/calico/cni:v3.14.1
sizeBytes: 77638089
- names:
- gcr.io/cloud-provider-vsphere/csi/release/driver@sha256:149e87faaacda614ee95ec271b54c8bfdbd2bf5825abc12d45c654036b798229
- gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.2
sizeBytes: 75130938
- names:
- k8s.gcr.io/kube-apiserver@sha256:33400ea29255bd20714b6b8092b22ebb045ae134030d6bf476bddfed9d33e900
- k8s.gcr.io/kube-apiserver:v1.17.3
sizeBytes: 50633771
- names:
- k8s.gcr.io/kube-controller-manager@sha256:2f0bf4d08e72a1fd6327c8eca3a72ad21af3a608283423bb3c10c98e68759844
- k8s.gcr.io/kube-controller-manager:v1.17.3
sizeBytes: 48808424
- names:
- k8s.gcr.io/kube-proxy@sha256:3a70e2ab8d1d623680191a1a1f1dcb0bdbfd388784b1f153d5630a7397a63fd4
- k8s.gcr.io/kube-proxy:v1.17.3
sizeBytes: 48700427
- names:
- docker.io/calico/pod2daemon-flexvol@sha256:d125b9f3c24133bdaf90eaf2bee1d506240d39a77bda712eda3991b6b5d443f0
- docker.io/calico/pod2daemon-flexvol:v3.14.1
sizeBytes: 37526807
- names:
- k8s.gcr.io/kube-scheduler@sha256:b091f0db3bc61a3339fd3ba7ebb06c984c4ded32e1f2b1ef0fbdfab638e88462
- k8s.gcr.io/kube-scheduler:v1.17.3
sizeBytes: 33820167
- names:
- gcr.io/cloud-provider-vsphere/cpi/release/manager@sha256:64de5c7f10e55703142383fade40886091528ca505f00c98d57e27f10f04fc03
- gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.1.0
sizeBytes: 16201394
- names:
- k8s.gcr.io/coredns@sha256:7ec975f167d815311a7136c32e70735f0d00b73781365df1befd46ed35bd4fe7
- k8s.gcr.io/coredns:1.6.5
sizeBytes: 13239960
- names:
- quay.io/k8scsi/csi-node-driver-registrar:v1.1.0
sizeBytes: 6939423
- names:
- quay.io/k8scsi/livenessprobe:v1.1.0
sizeBytes: 6690548
- names:
- k8s.gcr.io/pause@sha256:f78411e19d84a252e53bff71a4407a5686c46983a2c2eeed83929b888179acea
- k8s.gcr.io/pause:3.1
sizeBytes: 317164
nodeInfo:
architecture: amd64
bootID: 4cbff854-25c5-4829-b758-5575f35b12fe
containerRuntimeVersion: containerd://1.3.3
kernelVersion: 4.15.0-88-generic
kubeProxyVersion: v1.17.3
kubeletVersion: v1.17.3
machineID: db2b18283b014781b8f967f4f8566437
operatingSystem: linux
osImage: Ubuntu 18.04.4 LTS
systemUUID: AF411242-C745-BFFB-2B7D-73D156D3C120

Tried two things on this node: first, setting providerID manually, and second, tainting the node as uninitialized. A difference between an old node and a new node is that no externalIP is set. I could create the cluster without the manual changes.
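For reference, a minimal sketch of those two manual workarounds, reusing the node name and provider UUID from the output above (both are illustrative here; the taint key is the standard node.cloudprovider.kubernetes.io/uninitialized taint used by external cloud providers):

# Sketch: set spec.providerID by hand (the API server only allows this while the field is still empty)
kubectl patch node node-cp-v2-92xc5 --type merge -p '{"spec":{"providerID":"vsphere://421241af-45c7-fbbf-2b7d-73d156d3c120"}}'

# Sketch: re-apply the cloud-provider "uninitialized" taint so the vSphere CPI reprocesses the node
kubectl taint nodes node-cp-v2-92xc5 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule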
@farodin91 - can you also grab the logs from the KCP and CAPI controllers? Also, are you able to reproduce it?
It is reproducible. I can only confirm it for the steps in the issue, not for the manual changes.
KCP: kubeadm-control-plane controller logs
capi-controller-manager.log

I created a new management cluster using kind and then ran all the steps described above.
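For context, the reproduction roughly looks like this (a sketch only; the kind cluster name is made up, and clusterctl init assumes the vSphere provider is configured in the clusterctl config):

# Sketch: stand up a fresh management cluster and install the providers
kind create cluster --name capi-mgmt
clusterctl init --infrastructure vsphere
# then apply the workload cluster manifests and trigger the control plane rollout as described above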
vsphere-cloud-controller-manager.log (I only replaced some internal values.)
The issue seems to be resolved with 0.6.6 and cluster-api 0.3.7-rc1.
/kind bug
What steps did you take and what happened:
Waiting for control plane to pass control plane health check to continue reconciliation: control plane machine namespace/node-cp-v2-s8p8w has no status.nodeRef
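One way to observe the symptom on the management cluster is to list the Machines and check whether the replacement control plane Machine ever gets a nodeRef and providerID (a sketch; the column paths follow the v1alpha3 Machine API):

# Sketch: show nodeRef and providerID for all Machines
kubectl get machines -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,NODEREF:.status.nodeRef.name,PROVIDERID:.spec.providerID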
What did you expect to happen:
Anything else you would like to add:
logs of capi-controller-manager
Environment:
Kubernetes version (use kubectl version): 1.17.3
OS (e.g. from /etc/os-release): Ubuntu 18.04

I would like to help.