enable hwmon for sensor collection for bare metal clusters #971
Force-pushed from 25034f4 to 4eb7cd5.
@@ -23,7 +23,6 @@ spec:
- --path.sysfs=/host/sys
- --path.rootfs=/host/root
- --no-collector.wifi
- --no-collector.hwmon
This modifies the generated assets directly. You need to modify the jsonnet file to remove this flag instead; see this guide for how to then regenerate the assets: https://github.com/openshift/cluster-monitoring-operator/blob/master/CONTRIBUTING.md
Hmm, this was the only file in this repository that had that string in it. Where is this file coming from upstream?
The PR that removed hwmon from the enabled collectors in the upstream repository: prometheus-operator/kube-prometheus#381
@@ -7,6 +7,7 @@
- Adjusted NodeClockNotSynchronising, NodeNetworkReceiveErrs, and NodeNetworkTransmitErrs alerts.
- [#962](https://github.com/openshift/cluster-monitoring-operator/pull/962) Enable namespace by pod and pod total networking Grafana dashboards.
- [#959](https://github.com/openshift/cluster-monitoring-operator/pull/959) Remove memory limits from prometheus-config-reloader in user workload monitoring
- [#971](https://github.com/openshift/cluster-monitoring-operator/pull/971) Enable `hwmon` in node-exporter for hardware sensor data collection
I remember us disabling non-urgent collectors due to high cardinality. How many new series does this bring in on each cluster?
The number of series depends on the number of sensors visible on the node, so it's hard to pin that down. On my dev system, I only get some temperature sensors. On other hosts, we should see more.
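One way to put a number on this for a given node is to count the `node_hwmon_` series in that node's node-exporter output. Below is a minimal Python sketch, assuming the metrics text has already been fetched from the exporter. The function name is hypothetical and the sample lines are made up for illustration; they are not taken from any real host.

```python
def count_hwmon_series(metrics_text: str, prefix: str = "node_hwmon_") -> int:
    """Count sample lines whose metric name starts with the given prefix."""
    count = 0
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefix):
            count += 1
    return count


# Hypothetical sample in the Prometheus text exposition format (illustration only).
SAMPLE = """\
# HELP node_hwmon_temp_celsius Hardware monitor for temperature (input)
# TYPE node_hwmon_temp_celsius gauge
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp1"} 41
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp2"} 39
node_load1 0.42
"""

if __name__ == "__main__":
    print(count_hwmon_series(SAMPLE))
```

Running the same count against hosts with different sensor sets would show how much the per-node series count varies.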
@dhellmann Is this enablement part of some concrete initiative?
Here's some example output from one host: https://paste.centos.org/view/e2d24856
That includes temperature data, but that host is apparently not exposing fan speed or other sensors.
Force-pushed from 4eb7cd5 to e1c6c72.
/retest
I've updated the PR with a jsonnet expression to remove the flag from the upstream version of the deployment settings. Regenerating the assets modified manifests/0000_50_cluster-monitoring-operator_02-role.yaml, but that looks like it has to do with something not sorting output consistently. I can manually remove that change, but thought I should include all of the output of
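For readers unfamiliar with the approach: the effect of such an expression is to filter one flag out of the node-exporter container arguments inherited from upstream, so the generated manifest no longer disables the collector. The same idea is sketched here in Python rather than jsonnet. The function name is hypothetical and the argument list only mirrors the flags visible in the diff context; this is not the operator's actual code.

```python
def drop_flag(args, flag="--no-collector.hwmon"):
    """Return a new argument list without the given flag; the input is not mutated."""
    return [arg for arg in args if arg != flag]


# Illustrative list, mirroring the flags visible in the diff context.
UPSTREAM_ARGS = [
    "--path.sysfs=/host/sys",
    "--path.rootfs=/host/root",
    "--no-collector.wifi",
    "--no-collector.hwmon",
]

if __name__ == "__main__":
    for arg in drop_flag(UPSTREAM_ARGS):
        print(arg)
```

Filtering the inherited list, instead of editing the generated YAML by hand, keeps the change in place the next time the assets are regenerated.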
Force-pushed from e1c6c72 to f83ed78.
/retest
Force-pushed from 2a3a858 to d745180.
/test generate
@dhellmann this lgtm, but could you please rebase?
Enable the hwmon data collection so that hardware telemetry like CPU temperature and fan speeds are available for bare metal clusters.
Signed-off-by: Doug Hellmann <[email protected]>
Force-pushed from d745180 to 00db9df.
Rebased to resolve the changelog merge conflict.
/lgtm
/hold
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: dhellmann, s-urbaniak
Just holding to verify that @lilic's comments have been addressed.
/retest
/hold cancel
lgtm 👍
/retest
Please review the full test history for this PR and help us cut down flakes.