PodDisruptionBudget #512
Should we set the value of `maxUnavailable`?
In a 5-node cluster, a quorum queue might only have 3 replicas. If we allowed 2 of these 3 replicas to be unavailable, the quorum would be lost. On the one hand, setting `maxUnavailable` is arguably redundant, because the preStop hook (cluster-operator/internal/resource/statefulset.go, lines 663 to 665 in 8f066d1) already blocks a pod from shutting down while that would endanger quorum.

On the other hand, it's a better user experience if the eviction request that kubectl submits on behalf of the user is temporarily rejected, rather than having the RabbitMQ pod blocked in the preStop hook. Therefore, I vote to hardcode `maxUnavailable: 1`.
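To make the proposal concrete, a hardcoded `maxUnavailable: 1` budget for a cluster named `my-rabbit` could look like the sketch below. The label selector is an assumption here; verify it against the labels the operator actually stamps on its pods.

```yaml
# Sketch only: a PodDisruptionBudget allowing at most one pod of the
# RabbitMQ cluster "my-rabbit" to be voluntarily evicted at a time.
# On K8s >= 1.21 use apiVersion: policy/v1 instead.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-rabbit-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: my-rabbit   # assumed label; check your pods
```

With this budget in place, the Eviction API rejects a second concurrent eviction until the first evicted pod is healthy again, which is the "temporarily rejected" behaviour described above.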
There are two different questions regarding supporting PDBs. One is whether to create a default PDB for every rabbitmqcluster. The second is whether we should have a top-level PDB property in our CRD spec. I'm responding to the first question in this comment.

I don't think we should create a default PDB for all rabbitmqclusters. The first reason is that, unlike something like a pod topology spread constraint, a PDB has a blocking effect. When unsatisfied, it can block eviction of pods and therefore block regular k8s worker node activities like drain and upgrade. This is problematic because our operator has no control over how rabbitmqcluster pods are going to be placed, since it's completely up to users to specify hard requirements on pod scheduling, such as affinity/anti-affinity rules. For example, with maxUnavailable set to 1 as suggested in this issue, if two rabbitmq pods (from the same cluster) are located on the same k8s worker node, this worker node will be undrainable, since according to the PDB you cannot evict both pods at the same time. Another example: given a dev k8s cluster with just one worker node, this single-node k8s cluster cannot be drained properly either if a rabbitmqcluster is deployed with a PDB.

The second reason I don't think a default PDB is a good idea is that we do not restrict the number of replicas users can specify, and we do not have strong recommendations/limits on how many rabbitmq clusters can be created on a single k8s cluster. A reasonable PDB for a three-node rabbitmq cluster is not going to be the same as for a five-node rabbitmq cluster. With no limitation on the number of replicas a rabbitmq cluster can have, setting maxUnavailable to one is not going to make sense for any rabbitmq cluster that has more than 4 nodes.
In summary: given that we place no hard requirements on how users can deploy a rabbitmqcluster, and we have no control over the k8s clusters that users run, it's extremely difficult to come up with a default PDB that is guaranteed to work for the majority of users, and a default PDB will likely limit the use cases that the operator supports. I don't think we should do it by default. I think leaving it to the user is the right option until we actually decide to limit the types of rabbitmq deployments that the operator supports. Thoughts? @ansd @mkuratczyk
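The point that one size does not fit all can be illustrated with `minAvailable` instead of `maxUnavailable`: a quorum-preserving budget for an n-node cluster keeps floor(n/2)+1 pods available, so the right value changes with cluster size. A sketch for a hypothetical three-node cluster (names and labels are assumptions):

```yaml
# Sketch only: quorum-preserving budgets differ per cluster size.
# 3 nodes -> minAvailable: 2; 5 nodes -> 3; 7 nodes -> 4.
# On K8s >= 1.21 use apiVersion: policy/v1 instead.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: rabbit-three-node-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbit-three-node   # assumed label
```

This is why a single hardcoded default cannot be correct for every replica count the operator allows.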
Looking at other operators, it seems like at least some of them set a default PDB or include the PDB in their chart. Given all the arguments I've seen so far, if we create a PDB at all, then the logic could look like this:

Is this something people would be comfortable with?
I agree with @ChunyiLyu here. We should not automate the creation of PDBs. On top of what Chunyi shared, enabling the Operator to create PDBs implies that we must allow the Operator to create/delete/update/watch PDBs across the whole cluster, given how we configure the Operator RBAC. I can foresee many human operators not being happy with this. We reached a similar conclusion with regard to Pod Security Policies.

Allowing the Operator to perform CRUD operations on PDBs cluster-wide may also allow privilege escalation for the end user. Alana (Alana who? 😄) might (likely) enable Cody (Cody who? 😄) to create RabbitmqCluster objects, but not PDBs.

We can instead document with an example how to leverage PDBs and set some recommendations. Perhaps even update the production-ready example with one PDB with some sensible defaults. I definitely oppose automating and owning PDBs in the Operator.
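To make the RBAC concern concrete: because the operator's permissions are granted via a cluster-wide role, owning PDBs would mean adding a rule along the following lines. This is a sketch, not the operator's actual manifest; the role name is assumed.

```yaml
# Sketch only: the extra ClusterRole rule the operator would need in
# order to own PDBs. Granting this cluster-wide is exactly what the
# comment above objects to.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rabbitmq-cluster-operator-role   # assumed name
rules:
- apiGroups: ["policy"]
  resources: ["poddisruptionbudgets"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
```

Any user who can create a custom resource that the operator reconciles would then indirectly exercise these verbs, which is the privilege-escalation path described above.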
@ChunyiLyu regarding your 1st reason:
That's desired behaviour for this feature.
Our topology spread constraint implementation (cluster-operator/internal/resource/statefulset.go, lines 471 to 483 in 6a15d4b) already mitigates the example of two pods landing on the same worker node, and we could additionally let the user override maxUnavailable as suggested by @mkuratczyk. Even if the user didn't override maxUnavailable, the K8s worker node would not become undrainable, since the user can either manually evict pods (as described here) or run `kubectl drain --disable-eviction=true`, which bypasses checking PodDisruptionBudgets.
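For reference, a best-effort spread of a cluster's pods across worker nodes, in the spirit of the statefulset.go lines referenced above, looks roughly like the following pod-spec fragment. The exact keys and label values here are assumptions, not a quote of the operator's code.

```yaml
# Sketch only: spread the cluster's pods across worker nodes so two
# replicas rarely share a node. whenUnsatisfiable: ScheduleAnyway keeps
# this best-effort rather than a hard scheduling requirement.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app.kubernetes.io/name: my-rabbit   # assumed label
```

Because the constraint is best-effort, co-location can still happen on small clusters, which is where the manual-eviction and `--disable-eviction` escape hatches matter.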
Making the RabbitMQ cluster operator production-safe by removing the burden on users to manually create and delete PodDisruptionBudgets feels more important to me than optimising for draining a dev cluster. Since it's only a dev cluster, a user can easily run `kubectl drain --disable-eviction=true`.

Regarding your 2nd reason:
Why doesn't it make sense to set the default PodDisruptionBudget to `maxUnavailable: 1`?

@Zerpet I'm not sure I understand why it gives the user (Cody) privilege escalation if the operator creates a PodDisruptionBudget object for a particular RabbitMQ cluster?

In summary, I think it's a good idea to have the operator create PodDisruptionBudgets by default, allowing the user to override the values. The Kafka operator is doing the same; see here and here.
The platform administrator (Alana) enables Cody to create RabbitmqCluster objects, not PDBs.

The same reasoning above can be applied to other resources, e.g. Service, which could be used to expose certain applications that Alana does not wish to expose; however, the Service object is essential to run RabbitMQ (an inaccessible RabbitMQ is almost the same as one that isn't running).
I disagree with this statement. We should not break any workflow, regardless of whether the cluster is tagged "dev" or "prod".
I would prefer to deliver this as part of the
This issue has been marked as stale due to 60 days of inactivity. Stale issues will be closed after a further 30 days of inactivity; please remove the stale label in order to prevent this occurring.

Closing stale issue due to further inactivity.
In #510 we added a PodDisruptionBudget to the production examples. Instead of having the user manually create and delete this object, let's have the cluster operator manage this object.

The PodDisruptionBudget feature is in beta stage (since K8s v1.5), i.e. enabled by default. Are there users who have beta features disabled? If so, we could hide this feature behind a new top-level property in the RabbitMQ spec.

See also #401 (comment).