[enterprise-4.15] OBSDOCS-1324: Improve 'troubleshooting monitoring issues: Investigati… #91394

@@ -17,14 +17,42 @@ endif::openshift-rosa,openshift-dedicated[]
ifdef::openshift-rosa,openshift-dedicated[]
* You have access to the cluster as a user with the `dedicated-admin` role.
endif::openshift-rosa,openshift-dedicated[]
* You have installed the {oc-first}.
* You have enabled and configured monitoring for user-defined projects.
* You have created a `ServiceMonitor` resource.

.Procedure

. Ensure that your project is not excluded from user workload monitoring. The following examples use the `ns1` project.

.. Verify that the project _does not_ have the `openshift.io/user-monitoring=false` label attached:
+
[source,terminal]
----
$ oc get namespace ns1 --show-labels | grep 'openshift.io/user-monitoring=false'
----
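+
If the command returns no output, the exclusion label is not set on the project and you can skip the step for removing the label.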
+
[NOTE]
====
The default label set for user workload projects is `openshift.io/user-monitoring=true`. However, the label is not visible unless you manually apply it.
====

.. If the label is attached, remove the label:
+
.Example of removing the label from the project
[source,terminal]
----
$ oc label namespace ns1 'openshift.io/user-monitoring-'
----
+
.Example output
[source,terminal]
----
namespace/ns1 unlabeled
----
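+
Removing the label is sufficient because the default value is `true`. If you prefer to make the opt-in explicit, you can apply the default label value noted above; a minimal sketch:
+
[source,terminal]
----
$ oc label namespace ns1 'openshift.io/user-monitoring=true'
----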

. Check that the corresponding labels match in the service and `ServiceMonitor` resource configurations. The following examples use the `prometheus-example-app` service, the `prometheus-example-monitor` service monitor, and the `ns1` project.
.. Obtain the label defined in the service.
+
[source,terminal]
----
$ oc -n ns1 get service prometheus-example-app -o yaml
@@ -38,7 +66,7 @@
app: prometheus-example-app
----
+
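For reference, the label shown above lives under `metadata.labels` in the `Service` manifest; a minimal sketch, with field values other than the label assumed for illustration:
+
[source,yaml]
----
apiVersion: v1
kind: Service
metadata:
  name: prometheus-example-app
  namespace: ns1
  labels:
    app: prometheus-example-app
----
+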
.. Check that the `matchLabels` definition in the `ServiceMonitor` resource configuration matches the label output in the preceding step.
+
[source,terminal]
----
$ oc -n ns1 get servicemonitor prometheus-example-monitor -o yaml
----

@@ -68,7 +96,7 @@ spec:
+
[NOTE]
====
You can check service and `ServiceMonitor` resource labels as a developer with view permissions for the project.
====
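+
The full example output is collapsed in this diff; as a minimal sketch, a `ServiceMonitor` whose selector matches the service label looks like this (the endpoint port name is an assumption):
+
[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-monitor
  namespace: ns1
spec:
  endpoints:
  - port: web # assumed: must match a named port on the service
  selector:
    matchLabels:
      app: prometheus-example-app
----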

. Inspect the logs for the Prometheus Operator in the `openshift-user-workload-monitoring` project.
.. List the pods in the `openshift-user-workload-monitoring` project:
+
[source,terminal]
----
$ oc -n openshift-user-workload-monitoring get pods
----

@@ -101,14 +129,14 @@
If there is an issue with the service monitor, the logs might include an error similar to this example:
+
[source,terminal]
----
level=warn ts=2020-08-10T11:48:20.906739623Z caller=operator.go:1829 component=prometheusoperator msg="skipping servicemonitor" error="it accesses file system via bearer token file which Prometheus specification prohibits" servicemonitor=eagle/eagle namespace=openshift-user-workload-monitoring prometheus=user-workload
----
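+
The sub-step that retrieves the logs is collapsed in this diff; as a sketch, you can tail the operator logs with `oc logs`, substituting a pod name from the listing above:
+
[source,terminal]
----
$ oc -n openshift-user-workload-monitoring logs <prometheus_operator_pod_name>
----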

. *Review the target status for your endpoint* on the *Metrics targets* page in the {product-title} web console UI.
. Review the target status for your endpoint on the *Metrics targets* page in the {product-title} web console UI.
.. Log in to the {product-title} web console and navigate to *Observe* → *Targets* in the *Administrator* perspective.

.. Locate the metrics endpoint in the list, and review the status of the target in the *Status* column.

.. If the *Status* is *Down*, click the URL for the endpoint to view more information on the *Target Details* page for that metrics target.
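+
If you prefer the CLI, the same target state is exposed by the Prometheus `/api/v1/targets` API. A sketch, assuming the user-workload Prometheus pod name and that the port is reachable through a port-forward in your cluster:
+
[source,terminal]
----
$ oc -n openshift-user-workload-monitoring port-forward pod/prometheus-user-workload-0 9090 &
$ curl -s http://localhost:9090/api/v1/targets
----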

. *Configure debug level logging for the Prometheus Operator* in the `openshift-user-workload-monitoring` project.
. Configure debug level logging for the Prometheus Operator in the `openshift-user-workload-monitoring` project.
.. Edit the `user-workload-monitoring-config` `ConfigMap` object in the `openshift-user-workload-monitoring` project:
+
[source,terminal]
----
$ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
----
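+
Inside the config map's `config.yaml`, debug logging for the operator is enabled under the `prometheusOperator` component. A sketch of the relevant fragment (structure follows the user workload monitoring config map format; `logLevel: debug` is the value this step sets):
+
[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheusOperator:
      logLevel: debug
----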