- [Cluster Health Checking](#cluster-health-checking)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
### Provider Implementors

- As a provider, I want the machine controller to reconcile a Machine in response to an event from some other resource in the cluster. Other controllers do this routinely, so the behavior itself is nothing unusual. However, after implementing a machine actuator, there is no easy way to get access to the machine controller object in order to call its `Watch` method.

## Cluster Health Checking

Cluster Health Checking is a service that provides the health status of a Kubernetes cluster and its components.

- As an operator, given I have created a Kubernetes-conformant cluster with Cluster API, I want to check the status of the cluster's nodes.
    - Describe the nodes and report whether each one is ready/healthy or not ready/healthy.
    - For any node that is `NotReady`, list its conditions and information about its allocated resources.

- As an operator, given I have created a Kubernetes-conformant cluster with Cluster API, I want to check the status of the kube-apiserver.

- As an operator, given I have created a Kubernetes-conformant cluster with Cluster API, I want to check the status of etcd.

- 🔭 As an operator, given I have created a Kubernetes-conformant cluster with Cluster API, I want to check the status of Kubernetes components such as the ingress controller and other add-ons.

- 🔭 As an operator, given I have created a Kubernetes-conformant cluster with Cluster API, I want to check the status of unhealthy Pods in a configured namespace.
    - Provide details on any unhealthy Pods in the `kube-system` namespace. Filter for unhealthy Pods by their status (`kubectl get pods --show-labels -n kube-system | grep -vE "Running|Completed"`).
    - Describe any Pods whose status is not `Completed` or `Running`, and list their Events to give hints about the failure.
    - Look for Pods that do not have all of their containers running.