[KCP]: Error in machine selection logic during scale down #2760
Comments
/milestone v0.3.x
/area control-plane
@vincepri it should work with or without failure domains. It's been a while since I reviewed it, but the logic used to prioritize failure domains that no longer exist and Machines with no failure domain for scale down. It also used the group of "selected" machines to choose a failure domain for scale down, since a selected machine would already belong to the failure domain we picked: one that is no longer present, one not defined on the machine, or the most highly populated one. I suspect one of the more recent changes made those computations diverge.
The current logic works for scale down when the same set of machines is used for ownedMachines and selectedMachines, as is the case for the regular scale down in reconcile(). I am changing the logic to pick a failureDomain from the selectedMachines.
/assign
What steps did you take and what happened:
failed to pick control plane Machine to mark for deletion
What did you expect to happen:
I would expect it to pick one of the machines that has the old version and remove it.
Anything else you would like to add:
This happens when there are multiple failure domains:
KCP scaleDownControlPlane() finds the failure domain with the most machines, then, among the machines that have the UpgradeReplacementCreatedAnnotation, tries to find a machine in that failure domain. But such a machine does not always exist.
Say we have machines with the following failure domains: {Machine1: fd1, Machine2: fd2, Machine3: fd2}.
upgradeControlPlane() calls scaleDownControlPlane() with ownedMachines={Machine1: fd1, Machine2: fd2, Machine3: fd2} and selectedMachines={Machine1: fd1}
In scaleDownControlPlane(), fd will be calculated as fd2, and it will then try to find a machine in fd2 among selectedMachines. No selected machine is in fd2, so the selection fails with the error above.
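The mismatch can be reproduced outside of KCP with a minimal Go sketch. This is not the actual controller code: the `machine` type and the `mostPopulatedFD` and `pickForDeletion` helpers are hypothetical stand-ins that only model "compute the most populated failure domain from one set, then look for a machine in that domain in another set".

```go
package main

import (
	"fmt"
	"sort"
)

// machine is a hypothetical stand-in for a control plane Machine;
// only the name and the failure domain matter for this illustration.
type machine struct {
	name          string
	failureDomain string
}

// mostPopulatedFD returns the failure domain that holds the most machines.
// Ties are broken alphabetically so the result is deterministic.
func mostPopulatedFD(machines []machine) string {
	counts := map[string]int{}
	for _, m := range machines {
		counts[m.failureDomain]++
	}
	fds := make([]string, 0, len(counts))
	for fd := range counts {
		fds = append(fds, fd)
	}
	sort.Strings(fds)

	best, bestCount := "", -1
	for _, fd := range fds {
		if counts[fd] > bestCount {
			best, bestCount = fd, counts[fd]
		}
	}
	return best
}

// pickForDeletion computes the failure domain from fdSource and returns the
// first machine in candidates that lives in that failure domain, if any.
func pickForDeletion(fdSource, candidates []machine) (machine, bool) {
	fd := mostPopulatedFD(fdSource)
	for _, m := range candidates {
		if m.failureDomain == fd {
			return m, true
		}
	}
	return machine{}, false
}

func main() {
	owned := []machine{{"Machine1", "fd1"}, {"Machine2", "fd2"}, {"Machine3", "fd2"}}
	selected := []machine{{"Machine1", "fd1"}}

	// Failure domain taken from ownedMachines (fd2): no selected machine matches.
	_, ok := pickForDeletion(owned, selected)
	fmt.Println("fd from ownedMachines, machine found:", ok)

	// Failure domain taken from selectedMachines (fd1): Machine1 matches.
	m, ok := pickForDeletion(selected, selected)
	fmt.Println("fd from selectedMachines, machine found:", ok, m.name)
}
```

Computing the failure domain from the same set that the machine is picked from guarantees a match whenever that set is non-empty, which is the direction proposed in the comments.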
Example unit test for this: sedefsavas#5
This is the reason #2758 is failing.
/kind bug