Commit a56ecaa

Critical operation PDB (#2830)
Create a second PDB to cover Pods that carry a special "critical operation" label. The Operator assigns this label to all Pods of a PG cluster during a PG major version upgrade, and Patroni assigns it during a cluster/replica bootstrap. It can also be set manually or by any other automation tool.
1 parent f49b4f1 commit a56ecaa

11 files changed: +456 -123 lines changed
docs/administrator.md

+21-9
@@ -620,22 +620,34 @@ By default the topology key for the pod anti affinity is set to
620620
`kubernetes.io/hostname`, you can set another topology key e.g.
621621
`failure-domain.beta.kubernetes.io/zone`. See [built-in node labels](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#interlude-built-in-node-labels) for available topology keys.
622622

623-
## Pod Disruption Budget
623+
## Pod Disruption Budgets
624624

625-
By default the operator uses a PodDisruptionBudget (PDB) to protect the cluster
626-
from voluntarily disruptions and hence unwanted DB downtime. The `MinAvailable`
627-
parameter of the PDB is set to `1` which prevents killing masters in single-node
628-
clusters and/or the last remaining running instance in a multi-node cluster.
625+
By default the operator creates two PodDisruptionBudgets (PDB) to protect the cluster
626+
from voluntary disruptions and hence unwanted DB downtime: the so-called primary PDB
627+
and the PDB for critical operations.
628+
629+
### Primary PDB
630+
The `MinAvailable` parameter of this PDB is set to `1` and, if `pdb_master_label_selector`
631+
is enabled, label selector includes `spilo-role=master` condition, which prevents killing
632+
masters in single-node clusters and/or the last remaining running instance in a multi-node
633+
cluster.
634+
635+
### PDB for critical operations
636+
The `MinAvailable` parameter of this PDB is equal to the `numberOfInstances` set in the
637+
cluster manifest, while label selector includes `critical-operation=true` condition. This
638+
makes it possible to protect all pods of a cluster, provided they are labeled accordingly.
639+
For example, the operator labels all Spilo pods with `critical-operation=true` during the major
640+
version upgrade run. You may want to protect cluster pods during other critical operations
641+
by assigning the label to pods yourself or using other means of automation.
629642

630643
The PDB is only relaxed in two scenarios:
631644

632645
* If a cluster is scaled down to `0` instances (e.g. for draining nodes)
633646
* If the PDB is disabled in the configuration (`enable_pod_disruption_budget`)
634647

635-
The PDB is still in place having `MinAvailable` set to `0`. If enabled it will
636-
be automatically set to `1` on scale up. Disabling PDBs helps avoiding blocking
637-
Kubernetes upgrades in managed K8s environments at the cost of prolonged DB
638-
downtime. See PR [#384](https://github.com/zalando/postgres-operator/pull/384)
648+
The PDBs are still in place having `MinAvailable` set to `0`. Disabling PDBs
649+
helps avoid blocking Kubernetes upgrades in managed K8s environments at the
650+
cost of prolonged DB downtime. See PR [#384](https://github.com/zalando/postgres-operator/pull/384)
639651
for the use case.
640652

641653
## Add cluster-specific labels
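
The effect of the `critical-operation=true` condition described in the docs change above can be sketched in Go. This is a minimal illustration under stated assumptions, not the operator's code: the `matches` helper and the `cluster-name` label key are hypothetical and chosen only for the example, while `critical-operation` and `spilo-role` are the labels named in the diff. A pod falls under the PDB for critical operations only once it carries every label in the selector.

```go
package main

import "fmt"

// matches reports whether podLabels satisfy every key/value pair of a
// matchLabels-style selector (all pairs must be present and equal).
// Hypothetical helper for illustration; Kubernetes does this matching itself.
func matches(selector, podLabels map[string]string) bool {
	for key, want := range selector {
		got, ok := podLabels[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Assumed selector: the cluster label key is an example, not taken from the diff.
	selector := map[string]string{
		"cluster-name":       "acid-minimal-cluster",
		"critical-operation": "true",
	}

	unlabeled := map[string]string{
		"cluster-name": "acid-minimal-cluster",
		"spilo-role":   "replica",
	}
	labeled := map[string]string{
		"cluster-name":       "acid-minimal-cluster",
		"spilo-role":         "replica",
		"critical-operation": "true",
	}

	fmt.Println(matches(selector, unlabeled)) // false: pod is not covered
	fmt.Println(matches(selector, labeled))   // true: pod is covered
}
```

Because the selector is purely label-based, removing the label from a pod takes it out of the budget again without any change to the PDB object.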

docs/quickstart.md

+1-1
@@ -230,7 +230,7 @@ kubectl delete postgresql acid-minimal-cluster
230230
```
231231

232232
This should remove the associated StatefulSet, database Pods, Services and
233-
Endpoints. The PersistentVolumes are released and the PodDisruptionBudget is
233+
Endpoints. The PersistentVolumes are released and the PodDisruptionBudgets are
234234
deleted. Secrets however are not deleted and backups will remain in place.
235235

236236
When deleting a cluster while it is still starting up or got stuck during that

docs/reference/operator_parameters.md

+2-2
@@ -334,13 +334,13 @@ configuration they are grouped under the `kubernetes` key.
334334
pod namespace).
335335

336336
* **pdb_name_format**
337-
defines the template for PDB (Pod Disruption Budget) names created by the
337+
defines the template for the name of the primary PDB (Pod Disruption Budget) created by the
338338
operator. The default is `postgres-{cluster}-pdb`, where `{cluster}` is
339339
replaced by the cluster name. Only the `{cluster}` placeholder is allowed in
340340
the template.
341341

342342
* **pdb_master_label_selector**
343-
By default the PDB will match the master role hence preventing nodes to be
343+
By default the primary PDB will match the master role, hence preventing nodes from being
344344
drained if the node_readiness_label is not used. If this option is set to
345345
`false` the `spilo-role=master` selector will not be added to the PDB.
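
For orientation, the two PDB names that result from the templates in this commit can be sketched as follows. The `formatName` helper is a hypothetical, simplified stand-in for the operator's template formatting (whose implementation is not part of this diff); it only performs the `{cluster}` substitution described above. The primary template is the documented default of `pdb_name_format`, and the critical-operation template is the fixed string used in `pkg/cluster/k8sres.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// formatName replaces the {cluster} placeholder with the cluster name.
// Hypothetical stand-in for the operator's own template type.
func formatName(template, cluster string) string {
	return strings.ReplaceAll(template, "{cluster}", cluster)
}

func main() {
	cluster := "acid-minimal-cluster"

	// Default value of pdb_name_format (configurable).
	fmt.Println(formatName("postgres-{cluster}-pdb", cluster))
	// Fixed template for the critical-operation PDB in this commit.
	fmt.Println(formatName("postgres-{cluster}-critical-op-pdb", cluster))
}
```

Note that, as far as this diff shows, only the primary PDB name follows `pdb_name_format`; the critical-operation PDB name is not configurable.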
346346

e2e/tests/test_e2e.py

+4-1
@@ -2547,7 +2547,10 @@ def check_cluster_child_resources_owner_references(self, cluster_name, cluster_n
25472547
self.assertTrue(self.has_postgresql_owner_reference(config_ep.metadata.owner_references, inverse), "config endpoint owner reference check failed")
25482548

25492549
pdb = k8s.api.policy_v1.read_namespaced_pod_disruption_budget("postgres-{}-pdb".format(cluster_name), cluster_namespace)
2550-
self.assertTrue(self.has_postgresql_owner_reference(pdb.metadata.owner_references, inverse), "pod disruption owner reference check failed")
2550+
self.assertTrue(self.has_postgresql_owner_reference(pdb.metadata.owner_references, inverse), "primary pod disruption budget owner reference check failed")
2551+
2552+
pdb = k8s.api.policy_v1.read_namespaced_pod_disruption_budget("postgres-{}-critical-op-pdb".format(cluster_name), cluster_namespace)
2553+
self.assertTrue(self.has_postgresql_owner_reference(pdb.metadata.owner_references, inverse), "pod disruption budget for critical operations owner reference check failed")
25512554

25522555
pg_secret = k8s.api.core_v1.read_namespaced_secret("postgres.{}.credentials.postgresql.acid.zalan.do".format(cluster_name), cluster_namespace)
25532556
self.assertTrue(self.has_postgresql_owner_reference(pg_secret.metadata.owner_references, inverse), "postgres secret owner reference check failed")

pkg/cluster/cluster.go

+31-33
@@ -59,16 +59,17 @@ type Config struct {
5959
}
6060

6161
type kubeResources struct {
62-
Services map[PostgresRole]*v1.Service
63-
Endpoints map[PostgresRole]*v1.Endpoints
64-
PatroniEndpoints map[string]*v1.Endpoints
65-
PatroniConfigMaps map[string]*v1.ConfigMap
66-
Secrets map[types.UID]*v1.Secret
67-
Statefulset *appsv1.StatefulSet
68-
VolumeClaims map[types.UID]*v1.PersistentVolumeClaim
69-
PodDisruptionBudget *policyv1.PodDisruptionBudget
70-
LogicalBackupJob *batchv1.CronJob
71-
Streams map[string]*zalandov1.FabricEventStream
62+
Services map[PostgresRole]*v1.Service
63+
Endpoints map[PostgresRole]*v1.Endpoints
64+
PatroniEndpoints map[string]*v1.Endpoints
65+
PatroniConfigMaps map[string]*v1.ConfigMap
66+
Secrets map[types.UID]*v1.Secret
67+
Statefulset *appsv1.StatefulSet
68+
VolumeClaims map[types.UID]*v1.PersistentVolumeClaim
69+
PrimaryPodDisruptionBudget *policyv1.PodDisruptionBudget
70+
CriticalOpPodDisruptionBudget *policyv1.PodDisruptionBudget
71+
LogicalBackupJob *batchv1.CronJob
72+
Streams map[string]*zalandov1.FabricEventStream
7273
//Pods are treated separately
7374
}
7475

@@ -343,14 +344,10 @@ func (c *Cluster) Create() (err error) {
343344
c.logger.Infof("secrets have been successfully created")
344345
c.eventRecorder.Event(c.GetReference(), v1.EventTypeNormal, "Secrets", "The secrets have been successfully created")
345346

346-
if c.PodDisruptionBudget != nil {
347-
return fmt.Errorf("pod disruption budget already exists in the cluster")
347+
if err = c.createPodDisruptionBudgets(); err != nil {
348+
return fmt.Errorf("could not create pod disruption budgets: %v", err)
348349
}
349-
pdb, err := c.createPodDisruptionBudget()
350-
if err != nil {
351-
return fmt.Errorf("could not create pod disruption budget: %v", err)
352-
}
353-
c.logger.Infof("pod disruption budget %q has been successfully created", util.NameFromMeta(pdb.ObjectMeta))
350+
c.logger.Info("pod disruption budgets have been successfully created")
354351

355352
if c.Statefulset != nil {
356353
return fmt.Errorf("statefulset already exists in the cluster")
@@ -1081,9 +1078,9 @@ func (c *Cluster) Update(oldSpec, newSpec *acidv1.Postgresql) error {
10811078
}
10821079
}
10831080

1084-
// pod disruption budget
1085-
if err := c.syncPodDisruptionBudget(true); err != nil {
1086-
c.logger.Errorf("could not sync pod disruption budget: %v", err)
1081+
// pod disruption budgets
1082+
if err := c.syncPodDisruptionBudgets(true); err != nil {
1083+
c.logger.Errorf("could not sync pod disruption budgets: %v", err)
10871084
updateFailed = true
10881085
}
10891086

@@ -1228,10 +1225,10 @@ func (c *Cluster) Delete() error {
12281225
c.logger.Info("not deleting secrets because disabled in configuration")
12291226
}
12301227

1231-
if err := c.deletePodDisruptionBudget(); err != nil {
1228+
if err := c.deletePodDisruptionBudgets(); err != nil {
12321229
anyErrors = true
1233-
c.logger.Warningf("could not delete pod disruption budget: %v", err)
1234-
c.eventRecorder.Eventf(c.GetReference(), v1.EventTypeWarning, "Delete", "could not delete pod disruption budget: %v", err)
1230+
c.logger.Warningf("could not delete pod disruption budgets: %v", err)
1231+
c.eventRecorder.Eventf(c.GetReference(), v1.EventTypeWarning, "Delete", "could not delete pod disruption budgets: %v", err)
12351232
}
12361233

12371234
for _, role := range []PostgresRole{Master, Replica} {
@@ -1730,16 +1727,17 @@ func (c *Cluster) GetCurrentProcess() Process {
17301727
// GetStatus provides status of the cluster
17311728
func (c *Cluster) GetStatus() *ClusterStatus {
17321729
status := &ClusterStatus{
1733-
Cluster: c.Name,
1734-
Namespace: c.Namespace,
1735-
Team: c.Spec.TeamID,
1736-
Status: c.Status,
1737-
Spec: c.Spec,
1738-
MasterService: c.GetServiceMaster(),
1739-
ReplicaService: c.GetServiceReplica(),
1740-
StatefulSet: c.GetStatefulSet(),
1741-
PodDisruptionBudget: c.GetPodDisruptionBudget(),
1742-
CurrentProcess: c.GetCurrentProcess(),
1730+
Cluster: c.Name,
1731+
Namespace: c.Namespace,
1732+
Team: c.Spec.TeamID,
1733+
Status: c.Status,
1734+
Spec: c.Spec,
1735+
MasterService: c.GetServiceMaster(),
1736+
ReplicaService: c.GetServiceReplica(),
1737+
StatefulSet: c.GetStatefulSet(),
1738+
PrimaryPodDisruptionBudget: c.GetPrimaryPodDisruptionBudget(),
1739+
CriticalOpPodDisruptionBudget: c.GetCriticalOpPodDisruptionBudget(),
1740+
CurrentProcess: c.GetCurrentProcess(),
17431741

17441742
Error: fmt.Errorf("error: %s", c.Error),
17451743
}

pkg/cluster/k8sres.go

+37-3
@@ -109,10 +109,15 @@ func (c *Cluster) servicePort(role PostgresRole) int32 {
109109
return pgPort
110110
}
111111

112-
func (c *Cluster) podDisruptionBudgetName() string {
112+
func (c *Cluster) PrimaryPodDisruptionBudgetName() string {
113113
return c.OpConfig.PDBNameFormat.Format("cluster", c.Name)
114114
}
115115

116+
func (c *Cluster) criticalOpPodDisruptionBudgetName() string {
117+
pdbTemplate := config.StringTemplate("postgres-{cluster}-critical-op-pdb")
118+
return pdbTemplate.Format("cluster", c.Name)
119+
}
120+
116121
func makeDefaultResources(config *config.Config) acidv1.Resources {
117122

118123
defaultRequests := acidv1.ResourceDescription{
@@ -2207,7 +2212,7 @@ func (c *Cluster) generateStandbyEnvironment(description *acidv1.StandbyDescript
22072212
return result
22082213
}
22092214

2210-
func (c *Cluster) generatePodDisruptionBudget() *policyv1.PodDisruptionBudget {
2215+
func (c *Cluster) generatePrimaryPodDisruptionBudget() *policyv1.PodDisruptionBudget {
22112216
minAvailable := intstr.FromInt(1)
22122217
pdbEnabled := c.OpConfig.EnablePodDisruptionBudget
22132218
pdbMasterLabelSelector := c.OpConfig.PDBMasterLabelSelector
@@ -2225,7 +2230,36 @@ func (c *Cluster) generatePodDisruptionBudget() *policyv1.PodDisruptionBudget {
22252230

22262231
return &policyv1.PodDisruptionBudget{
22272232
ObjectMeta: metav1.ObjectMeta{
2228-
Name: c.podDisruptionBudgetName(),
2233+
Name: c.PrimaryPodDisruptionBudgetName(),
2234+
Namespace: c.Namespace,
2235+
Labels: c.labelsSet(true),
2236+
Annotations: c.annotationsSet(nil),
2237+
OwnerReferences: c.ownerReferences(),
2238+
},
2239+
Spec: policyv1.PodDisruptionBudgetSpec{
2240+
MinAvailable: &minAvailable,
2241+
Selector: &metav1.LabelSelector{
2242+
MatchLabels: labels,
2243+
},
2244+
},
2245+
}
2246+
}
2247+
2248+
func (c *Cluster) generateCriticalOpPodDisruptionBudget() *policyv1.PodDisruptionBudget {
2249+
minAvailable := intstr.FromInt32(c.Spec.NumberOfInstances)
2250+
pdbEnabled := c.OpConfig.EnablePodDisruptionBudget
2251+
2252+
// if PodDisruptionBudget is disabled or if there are no DB pods, set the budget to 0.
2253+
if (pdbEnabled != nil && !(*pdbEnabled)) || c.Spec.NumberOfInstances <= 0 {
2254+
minAvailable = intstr.FromInt(0)
2255+
}
2256+
2257+
labels := c.labelsSet(false)
2258+
labels["critical-operation"] = "true"
2259+
2260+
return &policyv1.PodDisruptionBudget{
2261+
ObjectMeta: metav1.ObjectMeta{
2262+
Name: c.criticalOpPodDisruptionBudgetName(),
22292263
Namespace: c.Namespace,
22302264
Labels: c.labelsSet(true),
22312265
Annotations: c.annotationsSet(nil),
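
The `MinAvailable` rule in `generateCriticalOpPodDisruptionBudget` above can be isolated into a dependency-free sketch. The function name `criticalOpMinAvailable` is hypothetical and the Kubernetes `intstr` wrapper is replaced by a plain `int32` so the snippet runs with the standard library only; the condition itself mirrors the one in the diff.

```go
package main

import "fmt"

// criticalOpMinAvailable mirrors the rule from the diff: the budget equals
// the number of instances unless PDBs are explicitly disabled or the
// cluster has no instances, in which case it is relaxed to 0.
// A nil pdbEnabled is treated as "enabled", as in the original condition.
func criticalOpMinAvailable(pdbEnabled *bool, numberOfInstances int32) int32 {
	if (pdbEnabled != nil && !*pdbEnabled) || numberOfInstances <= 0 {
		return 0
	}
	return numberOfInstances
}

func main() {
	enabled, disabled := true, false

	fmt.Println(criticalOpMinAvailable(&enabled, 3))  // 3
	fmt.Println(criticalOpMinAvailable(&disabled, 3)) // 0
	fmt.Println(criticalOpMinAvailable(nil, 2))       // 2
	fmt.Println(criticalOpMinAvailable(&enabled, 0))  // 0
}
```

With `MinAvailable` equal to the instance count, no labeled pod can be evicted voluntarily while the label is in place, which is the intended protection during a major version upgrade.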
