
Commit 2d557d4

Merge pull request #20602 from bparees/telemetry
add recording rule for build+registry sucess rate
2 parents b60f9b7 + a27d56b commit 2d557d4

1 file changed, +10 -2 lines changed


examples/prometheus/README.md

@@ -165,11 +165,19 @@ builds where the fact they have not started could be cited as resulting from use
 
 NOTE: OpenShift Online monitors builds in a fashion similar to this today.
 
-> sum(rate(openshift_build_total{phase="Error"}[10m])) / sum((rate(openshift_build_total{phase="Complete"}[10m]) + rate(openshift_build_total{phase="Error"}[10m]))) * 100
+> sum(openshift_build_total{job="kubernetes-apiservers",phase="Error"})/(sum(openshift_build_total{job="kubernetes-apiservers",phase=~"Complete|Error"})) * 100
 
-Calculates the error rate for builds over the last 10 minutes, where the error might indicate issues with the cluster or namespace. Note, it ignores build in the "Failed" and "Cancelled" phases, as builds typically end up in
+Calculates the error rate for builds, where the error might indicate issues with the cluster or namespace. Note, it ignores builds in the "Failed" and "Cancelled" phases, as builds typically end up in
 one of those phases as the result of a user choice or error. Administrators after some experience with their cluster could decide what is an acceptable error rate and monitor when it is exceeded.
 
+> ((sum(openshift_build_total{job="kubernetes-apiservers",phase="Complete"})-
+> sum(openshift_build_total{job="kubernetes-apiservers",phase="Complete"} offset 1h)) /
+> (sum(openshift_build_total{job="kubernetes-apiservers",phase=\~"Failed|Complete|Error"}) -
+> (sum(openshift_build_total{job="kubernetes-apiservers",phase=\~"Failed|Complete|Error"} offset 1h)))) * 100
+
+Calculates the percentage of builds that were successful in the last hour. Note that this value is only accurate if no pruning of builds
+is performed, otherwise it is impossible to determine how many builds ran (successfully or otherwise) in the last hour.
+
 > predict_linear(openshift_build_total{phase="Error"}[1h],3600)
 
 Predicts what the error count will be in 1 hour, using last hours data.
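
The arithmetic behind the added success-rate query can be sketched in Python. This is only an illustration of the "current total minus total one hour ago" calculation, not code from the commit: the function name `success_percentage` and the sample counts are hypothetical, and the dictionaries stand in for `sum(openshift_build_total{...})` evaluated now and with `offset 1h`.

```python
# Hypothetical per-phase totals, standing in for sum(openshift_build_total{...})
# evaluated now and one hour earlier (offset 1h). The numbers are made up.
def success_percentage(now, hour_ago, phases=("Failed", "Complete", "Error")):
    """Percentage of builds that finished in the window and reached "Complete"."""
    completed = now["Complete"] - hour_ago["Complete"]
    finished = sum(now[p] - hour_ago[p] for p in phases)
    if finished <= 0:
        # Nothing finished in the window, or the totals shrank (e.g. pruning).
        return None
    return 100 * completed / finished

hour_ago = {"Complete": 90, "Failed": 6, "Error": 4}
now = {"Complete": 98, "Failed": 7, "Error": 5}
print(success_percentage(now, hour_ago))  # 8 of the 10 finished builds succeeded
```

The guard only catches the obvious case where the totals shrink. If pruning removes some builds while the totals still grow, the subtraction silently yields a wrong percentage, which seems to be the caveat the README text describes.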

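For the `predict_linear` query in the trailing context of the diff, a rough Python sketch of the idea follows. Prometheus documents `predict_linear` as using simple linear regression over the range vector; the helper below, `predict_linear_sketch`, is a hypothetical simplification that extrapolates from the last sample's timestamp rather than from the query evaluation time, and the sample data is invented.

```python
def predict_linear_sketch(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) pairs and
    extrapolate it seconds_ahead past the last sample's timestamp."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("need samples at two or more distinct timestamps")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + seconds_ahead)

# Invented error counts sampled every 20 minutes over one hour.
samples = [(0, 2.0), (1200, 3.0), (2400, 4.0), (3600, 5.0)]
predicted = predict_linear_sketch(samples, 3600)
print(round(predicted, 6))  # 8.0
```

With one more error every 20 minutes, the fitted line reaches 8 an hour after the last sample, which corresponds to `[1h]` for the input window and `3600` for the look-ahead in the query.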