manifests/0000_90_kube-controller-manager-operator_05_alerts: Template console links in alert descriptions #837
base: master
@@ -25,7 +25,8 @@ spec:
 - alert: PodDisruptionBudgetAtLimit
   annotations:
     summary: The pod disruption budget is preventing further disruption to pods.
-    description: The pod disruption budget is at the minimum disruptions allowed level. The number of current healthy pods is equal to the desired healthy pods.
+    description: |-
+      The {{ $labels.poddisruptionbudget }} pod disruption budget in the {{ $labels.namespace }} namespace is at the maximum allowed disruption. The number of current healthy pods is equal to the desired healthy pods.{{ with $console_url := "console_url" | query }}{{ if ne (len (label "url" (first $console_url))) 0}} For more information refer to {{ label "url" (first $console_url) }}/k8s/ns/{{ $labels.namespace }}/poddisruptionbudgets/{{ $labels.poddisruptionbudget }}{{ end }}{{ end }}
Then we would not need the console url and let the console handle all the link rendering for the
     runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/PodDisruptionBudgetAtLimit.md
   expr: |
     max by(namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_current_healthy == kube_poddisruptionbudget_status_desired_healthy and on (namespace, poddisruptionbudget) kube_poddisruptionbudget_status_expected_pods > 0)
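The guard around the console link in the new description can be read as two checks: `with` skips the block when the `console_url` query returns nothing, and the `if ne (len ...) 0` test skips it when the first sample has an empty `url` label. The following is only a rough Python model of that logic, not the Go template itself. The function name `console_suffix`, the host `console.example.com`, and the `example-ns` / `example-pdb` names are hypothetical, and it assumes a missing label reads as an empty string.

```python
def console_suffix(console_url_samples, namespace, pdb):
    """Model of the template's link suffix for a PDB alert description.

    console_url_samples stands in for the result of the `console_url` query:
    a list of samples, each represented here as a dict of labels.
    """
    # `{{ with $console_url := "console_url" | query }}`: an empty result
    # skips the whole block, so no link is appended.
    if not console_url_samples:
        return ""
    # `label "url" (first $console_url)`: read the url label of the first
    # sample (assumed to be "" when the label is absent).
    url = console_url_samples[0].get("url", "")
    # `{{ if ne (len ...) 0 }}`: an empty url also suppresses the link.
    if len(url) == 0:
        return ""
    return (
        " For more information refer to "
        f"{url}/k8s/ns/{namespace}/poddisruptionbudgets/{pdb}"
    )


if __name__ == "__main__":
    samples = [{"url": "https://console.example.com"}]  # hypothetical host
    print(console_suffix(samples, "example-ns", "example-pdb"))
    print(repr(console_suffix([], "example-ns", "example-pdb")))
```

On clusters where the query yields no usable URL, the description in this model simply ends after the sentence about healthy pods, which matches the intent of the two guards.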
@@ -35,17 +36,19 @@ spec:
 - alert: PodDisruptionBudgetLimit
   annotations:
     summary: The pod disruption budget registers insufficient amount of pods.
-    description: The pod disruption budget is below the minimum disruptions allowed level and is not satisfied. The number of current healthy pods is less than the desired healthy pods.
+    description: |-
+      The {{ $labels.poddisruptionbudget }} pod disruption budget in the {{ $labels.namespace }} namespace exceeds the maximum allowed disruption and is not satisfied. The number of current healthy pods is {{ $value }} less than the desired healthy pods.{{ with $console_url := "console_url" | query }}{{ if ne (len (label "url" (first $console_url))) 0}} For more information refer to {{ label "url" (first $console_url) }}/k8s/ns/{{ $labels.namespace }}/poddisruptionbudgets/{{ $labels.poddisruptionbudget }}{{ end }}{{ end }}
     runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/PodDisruptionBudgetLimit.md
   expr: |
-    max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy)
+    max by (namespace, poddisruptionbudget) (kube_poddisruptionbudget_status_desired_healthy - kube_poddisruptionbudget_status_current_healthy) > 0
   for: 15m
   labels:
     severity: critical
 - alert: GarbageCollectorSyncFailed
   annotations:
     summary: There was a problem with syncing the resources for garbage collection.
-    description: Garbage Collector had a problem with syncing and monitoring the available resources. Please see KubeControllerManager logs for more details.
+    description: |-
+      Garbage Collector had a problem with syncing and monitoring the available resources. Please see KubeControllerManager logs for more details: 'oc -n {{ $labels.namespace }} logs -c {{ $labels.container }} {{ $labels.pod }}'{{ with $console_url := "console_url" | query }}{{ if ne (len (label "url" (first $console_url))) 0}} For more information refer to {{ label "url" (first $console_url) }}/k8s/ns/{{ $labels.namespace }}/pods/{{ $labels.pod }}/logs?container={{ $labels.container }} {{ end }}{{ end }}.
This seems too verbose to me. How to invoke the logs should be a responsibility of the runbook IMO. But similar to the PDB case, we could link directly to the pod in the console: https://github.com/openshift/monitoring-plugin/blob/6f948e4323bdf7c68e6b625ce3020116b5b4571a/web/src/components/alerting/AlertsDetailPage.tsx#L450 without too much extra markup.
     runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/GarbageCollectorSyncFailed.md
   expr: |
     rate(garbagecollector_controller_resources_sync_error_total{}[5m]) > 0
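The `PodDisruptionBudgetLimit` expression change matters for the `{{ $value }}` now used in its description. Assuming standard PromQL semantics, where a comparison without `bool` filters series and keeps the left-hand value, the old expression would have carried the current healthy count, while the new subtraction carries the shortfall. A small Python model of the two expressions, with hypothetical function names and sample data, illustrates the difference:

```python
def old_rule(series):
    """Model of: max by (ns, pdb) (current_healthy < desired_healthy).

    series maps (namespace, pdb) to a non-empty list of
    (current, desired) pairs. The filtering comparison keeps the
    left-hand (current) value of each pair that passes.
    """
    out = {}
    for key, pairs in series.items():
        kept = [cur for cur, des in pairs if cur < des]
        if kept:
            out[key] = max(kept)
    return out


def new_rule(series):
    """Model of: max by (ns, pdb) (desired_healthy - current_healthy) > 0.

    The value that survives the final filter is the shortfall itself.
    """
    out = {}
    for key, pairs in series.items():
        shortfall = max(des - cur for cur, des in pairs)
        if shortfall > 0:
            out[key] = shortfall
    return out


if __name__ == "__main__":
    # Hypothetical data: pdb-a has 1 healthy pod of 3 desired, pdb-b is satisfied.
    data = {
        ("example-ns", "pdb-a"): [(1, 3)],
        ("example-ns", "pdb-b"): [(2, 2)],
    }
    print(old_rule(data))
    print(new_rule(data))
```

In this model only the new form produces a value that fits the sentence "is {{ $value }} less than the desired healthy pods".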
New thread for Cluster Bot testing. As of daae216, with a `launch 4.19,openshift/cluster-kube-controller-manager-operator#837 aws` cluster, make a PDB mad:
I didn't wait for the alert to kick over into `firing`, but checking on `pending`, this looks... almost good to me:
the issue is the `<span class="co-resource-item monitoring__resource-item--monitoring-alert co-resource-item--inline">` bit for the `NS` injected into my attempt at constructing a console link.
To trip `PodDisruptionBudgetLimit` I'll look to a different workload, since I don't want to completely break Prometheus (it would make it hard to test alert behavior):
In that case, the rendering looks great, although I'm not clear on why it's not seeing the `NS` rendering issue:
I'm also not clear on how to trigger `GarbageCollectorSyncFailed` to test its rendering.
daae216 -> 9331433 added some whitespace before a `}}` to try to get closer to what the working `PodDisruptionBudgetLimit` `description` is doing:
But sadly the `NS` markup injected into the middle of the console PDB link is still there:
Yeah, this is not really optimal. As far as I can see, it should be pretty simple to fix by adding the poddisruptionbudget resource here: https://github.com/openshift/monitoring-plugin/blob/6f948e4323bdf7c68e6b625ce3020116b5b4571a/web/src/components/alerting/AlertsDetailPage.tsx#L450