Commit abe25af

Adds telemeter alert TelemeterClientFailures

Issue: https://issues.redhat.com/browse/MON-2727

Problem: in-cluster admins and people monitoring submitted Insights need a way to tell that the cluster is trying, and failing, to submit Telemetry.

Solution: add an alert that triggers when the rate of failed requests exceeds 20% of the total request rate over a 15-minute window.

1 parent d6a64bd commit abe25af

2 files changed, +20 -2 lines changed

assets/telemeter-client/prometheus-rule.yaml

+18
@@ -9,3 +9,21 @@ spec:
     rules:
     - expr: max(federate_samples - federate_filtered_samples)
       record: cluster:telemetry_selected_series:count
+    - alert: TelemeterClientFailures
+      annotations:
+        description: |-
+          The telemeter client in namespace {{ $labels.namespace }} fails {{ $value | humanize }} of the requests to the telemeter service.
+          Check the logs of the telemeter-client pod with the following command:
+          oc logs -n openshift-monitoring deployment.apps/telemeter-client -c telemeter-client
+          If the telemeter client fails to authenticate with the telemeter service, make sure that the global pull secret is up to date, see https://docs.openshift.com/container-platform/latest/openshift_images/managing_images/using-image-pull-secrets.html#images-update-global-pull-secret_using-image-pull-secrets for more details.
+        summary: Telemeter client fails to send metrics
+      expr: |
+        sum by (namespace) (
+          rate(federate_requests_failed_total{job="telemeter-client"}[15m])
+        ) /
+        sum by (namespace) (
+          rate(federate_requests_total{job="telemeter-client"}[15m])
+        ) > 0.2
+      for: 1h
+      labels:
+        severity: warning
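
Reviewer note: the arithmetic behind the new rule can be sketched in a few lines of Python. This is only an illustration of the condition as written in the diff (failed rate divided by total rate, threshold 0.2, `for: 1h`), not code from the repository. The function names and the 30-second evaluation interval are assumptions made for the example; the real interval depends on the cluster's Prometheus configuration.

```python
from typing import Optional, Sequence

THRESHOLD = 0.2        # from the rule: "> 0.2"
FOR_SECONDS = 60 * 60  # from the rule: "for: 1h"


def failure_ratio(failed_increase: float, total_increase: float) -> Optional[float]:
    """Ratio of failed to total requests over one window.

    Both rate() calls in the rule use the same 15m window, so the window
    length cancels and the ratio of the two counter increases is enough.
    Returns None when there were no requests: in PromQL 0/0 is NaN, and
    NaN never passes the "> 0.2" filter.
    """
    if total_increase == 0:
        return None
    return failed_increase / total_increase


def condition_holds(failed_increase: float, total_increase: float) -> bool:
    """True when the rule expression would return a sample."""
    ratio = failure_ratio(failed_increase, total_increase)
    return ratio is not None and ratio > THRESHOLD


def fires(evaluations: Sequence[bool],
          eval_interval_s: float = 30.0,  # hypothetical evaluation interval
          for_s: float = FOR_SECONDS) -> bool:
    """True when the most recent evaluations have held for at least for_s.

    The alert stays pending until the condition has been true on every
    evaluation for the whole "for" duration; one false evaluation resets it.
    """
    consecutive = 0
    for ok in reversed(evaluations):
        if not ok:
            break
        consecutive += 1
    if consecutive == 0:
        return False
    # Time elapsed between the first and the latest true evaluation.
    return (consecutive - 1) * eval_interval_s >= for_s
```

With these assumptions, 25 failures out of 100 requests in a window satisfy the condition, while 10 out of 100 do not, and the alert only moves from pending to firing once the condition has held across a full hour of consecutive evaluations.
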

jsonnet/jsonnetfile.lock.json

+2 -2
@@ -120,8 +120,8 @@
           "subdir": "jsonnet/telemeter"
         }
       },
-      "version": "320b9a967574c0a57690dea1987e1f294dbc22e5",
-      "sum": "jPX3JQZndZSVPDmkW2HZEib7/oeuVpxGOB/rXSgyOcI=",
+      "version": "4d304019274307c21afefa108493c8af89a2429d",
+      "sum": "079UoqPnQJWKoVi2qMsVUANGD0cBkx25D+S7guvrcGc=",
       "name": "telemeter-client"
     },
     {
