MachinePool ready state leading to not processing providerIDs in CAPI #4982
/priority backlog
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
FTR this is generating a lot of issues for us and I'll work on proposing a PR to fix this behavior.
Thank you for working on this @mweibel! Let us know if you need help with reviews or anything else.
/kind bug
What steps did you take and what happened:
The following code determines the ready state for an AzureMachinePool:
cluster-api-provider-azure/azure/scope/machinepool.go, lines 571 to 603 at commit 9079793
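Roughly, that block has the following shape (a simplified sketch with stand-in types and names, not the verbatim CAPZ source; the real method also sets conditions on the AzureMachinePool):

```go
package main

import "fmt"

// ProvisioningState is a stand-in for the CAPZ infrav1.ProvisioningState type.
type ProvisioningState string

const (
	Succeeded ProvisioningState = "Succeeded"
	Failed    ProvisioningState = "Failed"
)

// scope is a hypothetical stand-in for CAPZ's MachinePoolScope, reduced to
// the fields the ready-state decision actually looks at.
type scope struct {
	desiredReplicas int32 // MachinePool.Spec.Replicas
	currentReplicas int32 // AzureMachinePool.Status.Replicas
	ready           bool  // AzureMachinePool.Status.Ready
}

// setProvisioningState mirrors the shape of the referenced logic: the pool is
// only marked ready when the VMSS provisioning state is Succeeded AND the
// replica counts match; everything else (including a VMSS whose state is
// Failed because of a single failed instance) flips the pool to not ready.
func (s *scope) setProvisioningState(v ProvisioningState) {
	switch {
	case v == Succeeded && s.desiredReplicas == s.currentReplicas:
		s.ready = true
	default:
		// Succeeded-but-still-scaling, Updating, Failed, ... all land here.
		s.ready = false
	}
}

func main() {
	s := &scope{desiredReplicas: 3, currentReplicas: 3, ready: true}
	s.setProvisioningState(Failed)
	fmt.Println("ready:", s.ready) // ready: false
}
```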
The following CAPI code is not run if the AzureMachinePool is not ready:
https://github.com/kubernetes-sigs/cluster-api/blob/8d639f1fad564eecf5bda0a2ee03c8a38896a184/exp/internal/controllers/machinepool_controller_phases.go#L290-L319
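For illustration, a minimal sketch of that control flow (stand-in types and names; the real controller reads spec.providerIDList and replicas off the unstructured infra object and requeues rather than returning a plain error):

```go
package main

import (
	"errors"
	"fmt"
)

// machinePool is a hypothetical stand-in for CAPI's MachinePool, reduced to
// the fields involved in the linked block.
type machinePool struct {
	infrastructureReady bool     // mirrored from the infra object's status.ready
	providerIDList      []string // Spec.ProviderIDList
}

var errRequeue = errors.New("infrastructure not ready, requeuing")

// reconcileInfrastructure mirrors the control flow of the linked CAPI code:
// when the infra object (here, the AzureMachinePool) reports ready=false, the
// function returns early, so providerIDList (and replicas) are never copied
// from the infra resource onto the MachinePool.
func reconcileInfrastructure(mp *machinePool, infraProviderIDs []string) error {
	if !mp.infrastructureReady {
		// Early return: the propagation below never runs.
		return errRequeue
	}
	mp.providerIDList = infraProviderIDs
	return nil
}

func main() {
	mp := &machinePool{infrastructureReady: false}
	err := reconcileInfrastructure(mp, []string{"azure:///subscriptions/.../vmss/0"})
	fmt.Println("err:", err, "providerIDs:", mp.providerIDList)
}
```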
If I'm right, this logic together has the following effect: as soon as the AzureMachinePool reports not ready (e.g. a single VMSS instance has provisioningState: Failed), the MachinePool does not get reconciled anymore until the ready status changes back again. This is a bug which can lead to issues with the known machines in a cluster, e.g. cluster-autoscaler with the clusterapi provider doesn't know about certain machines.
I'm not sure whether the bug is in CAPZ or in CAPI:
- Should CAPZ avoid marking the whole AzureMachinePool not ready just because a single instance failed?
- Or should CAPI keep reconciling providerIDs even while the infrastructure is not ready?
What did you expect to happen:
Scaling up/down works without issues, and a single failed VM doesn't impact the functioning of the full VMSS.
Anything else you would like to add:
I guess this is initially more of a discussion point because there could be multiple facets of this issue.
Environment:
- Kubernetes version (use kubectl version): 1.28.5
- OS (e.g. from /etc/os-release): linux/windows