
Commit cf9d753

Authored by Traci Morrison
Merge pull request #7768 from openshift-cherrypick-robot/cherry-pick-7579-to-enterprise-3.9-stage
[enterprise-3.9-stage] Trello card: Added Kubernetes Storage Metrics via Prometheus section
2 parents 8eacc30 + 4f2d6e2 commit cf9d753


install_config/cluster_metrics.adoc

Lines changed: 143 additions & 28 deletions
@@ -37,6 +37,10 @@ each node individually through the `/stats` endpoint. From there, Heapster
scrapes the metrics for CPU, memory and network usage, then exports them into
Hawkular Metrics.

+The storage volume metrics available on the kubelet are not available through
+the `/stats` endpoint, but are available through the `/metrics` endpoint. See
+{product-title} Metrics via Prometheus for detailed information.
+
Browsing individual pods in the web console displays separate sparkline charts
for memory and CPU. The time range displayed is selectable, and these charts
automatically update every 30 seconds. If there are multiple containers on the
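
For example, a minimal way to read a node's kubelet `/metrics` endpoint is through the API server proxy; the node name is illustrative, and the command assumes a user with permission to access node metrics:

----
# Read the kubelet /metrics endpoint for one node through the API server proxy
$ oc get --raw /api/v1/nodes/node1.example.com/proxy/metrics
----
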
@@ -63,7 +67,7 @@ previous to v1.0.8, even if it has since been updated to a newer version, follow
the instructions for node certificates outlined in
xref:../install_config/upgrading/manual_upgrades.adoc#install-config-upgrading-manual-upgrades[Updating
Master and Node Certificates]. If the node certificate does not contain the IP
-address of the node, then Heapster will fail to retrieve any metrics.
+address of the node, then Heapster fails to retrieve any metrics.
====
endif::[]

@@ -102,9 +106,9 @@ volume].
=== Persistent Storage

Running {product-title} cluster metrics with persistent storage means that your
-metrics will be stored to a
+metrics are stored to a
xref:../architecture/additional_concepts/storage.adoc#persistent-volumes[persistent
-volume] and be able to survive a pod being restarted or recreated. This is ideal
+volume] and are able to survive a pod being restarted or recreated. This is ideal
if you require your metrics data to be guarded from data loss. For production
environments it is highly recommended to configure persistent storage for your
metrics pods.
@@ -205,7 +209,7 @@ storage space as a buffer for unexpected monitored pod usage.
[WARNING]
====
If the Cassandra persisted volume runs out of sufficient space, then data loss
-will occur.
+occurs.
====

For cluster metrics to work with persistent storage, ensure that the persistent
@@ -301,7 +305,7 @@ metrics-gathering solutions.
=== Non-Persistent Storage

Running {product-title} cluster metrics with non-persistent storage means that
-any stored metrics will be deleted when the pod is deleted. While it is much
+any stored metrics are deleted when the pod is deleted. While it is much
easier to run cluster metrics with non-persistent data, running with
non-persistent data does come with the risk of permanent data loss. However,
metrics can still survive a container being restarted.
@@ -313,16 +317,16 @@ to `emptyDir` in the inventory file.

[NOTE]
====
-When using non-persistent storage, metrics data will be written to
+When using non-persistent storage, metrics data is written to
*_/var/lib/origin/openshift.local.volumes/pods_* on the node where the Cassandra
-pod is running. Ensure *_/var_* has enough free space to accommodate metrics
+pod runs. Ensure *_/var_* has enough free space to accommodate metrics
storage.
====

[[metrics-ansible-role]]
== Metrics Ansible Role

-The OpenShift Ansible `openshift_metrics` role configures and deploys all of the
+The {product-title} Ansible `openshift_metrics` role configures and deploys all of the
metrics components using the variables from the
xref:../install_config/install/advanced_install.adoc#configuring-ansible[Configuring
Ansible] inventory file.
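
As a minimal sketch, the role's variables are set in the `[OSEv3:vars]` section of the inventory file; the hostname below is illustrative:

----
[OSEv3:vars]
# Deploy the cluster metrics stack with the openshift_metrics role
openshift_metrics_install_metrics=true
# Externally resolvable hostname used for the Hawkular Metrics route
openshift_metrics_hawkular_hostname=hawkular-metrics.example.com
----
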
@@ -501,7 +505,7 @@ Technology Preview and is not installed by default.

[NOTE]
====
-The Hawkular OpenShift Agent on {product-title} is a Technology Preview feature
+The Hawkular {product-title} Agent on {product-title} is a Technology Preview feature
only.
ifdef::openshift-enterprise[]
Technology Preview features are not
@@ -535,7 +539,7 @@ that it does not become full.

[WARNING]
====
-Data loss will result if the Cassandra persisted volume runs out of sufficient space.
+Data loss results if the Cassandra persisted volume runs out of sufficient space.
====

All of the other variables are optional and allow for greater customization.
@@ -556,8 +560,8 @@ running.
[[metrics-using-secrets]]
=== Using Secrets

-The OpenShift Ansible `openshift_metrics` role will auto-generate self-signed certificates for use between its
-components and will generate a
+The {product-title} Ansible `openshift_metrics` role auto-generates self-signed certificates for use between its
+components and generates a
xref:../architecture/networking/routes.adoc#secured-routes[re-encrypting route] to expose
the Hawkular Metrics service. This route is what allows the web console to access the Hawkular Metrics
service.
@@ -566,14 +570,14 @@ In order for the browser running the web console to trust the connection through
this route, it must trust the route's certificate. This can be accomplished by
xref:metrics-using-secrets-byo-certs[providing your own certificates] signed by
a trusted Certificate Authority. The `openshift_metrics` role allows you to
-specify your own certificates which it will then use when creating the route.
+specify your own certificates, which it then uses when creating the route.

The router's default certificate is used if you do not provide your own.

[[metrics-using-secrets-byo-certs]]
==== Providing Your Own Certificates

-To provide your own certificate which will be used by the
+To provide your own certificate, which is used by the
xref:../architecture/networking/routes.adoc#secured-routes[re-encrypting
route], you can set the `openshift_metrics_hawkular_cert`,
`openshift_metrics_hawkular_key`, and `openshift_metrics_hawkular_ca`
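
A minimal inventory sketch for these variables follows; the file paths are illustrative and must point to your own certificate, key, and CA files:

----
[OSEv3:vars]
# Certificate, key, and CA consumed by the Hawkular Metrics re-encrypting route
openshift_metrics_hawkular_cert=/path/to/hawkular-metrics.crt
openshift_metrics_hawkular_key=/path/to/hawkular-metrics.key
openshift_metrics_hawkular_ca=/path/to/ca.crt
----
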
@@ -592,7 +596,7 @@ route documentation].
== Deploying the Metric Components

Because deploying and configuring all the metric components is handled with
-OpenShift Ansible, you can deploy everything in one step.
+{product-title} Ansible, you can deploy everything in one step.

The following examples show you how to deploy metrics with and without
persistent storage using the default parameters.
@@ -675,8 +679,7 @@ For example, if your `openshift_metrics_hawkular_hostname` corresponds to
Once you have updated and saved the *_master-config.yaml_* file, you must
restart your {product-title} instance.

-When your {product-title} server is back up and running, metrics will be
-displayed on the pod overview pages.
+When your {product-title} server is back up and running, metrics are displayed on the pod overview pages.

[CAUTION]
====
@@ -698,16 +701,16 @@ Metrics API].

[NOTE]
====
-When accessing Hawkular Metrics from the API, you will only be able to perform
-reads. Writing metrics has been disabled by default. If you want for individual
+When accessing Hawkular Metrics from the API, you are only able to perform
+reads. Writing metrics is disabled by default. If you want individual
users to also be able to write metrics, you must set the
`openshift_metrics_hawkular_user_write_access`
xref:../install_config/cluster_metrics.adoc#metrics-ansible-variables[variable]
to *true*.

However, it is recommended to use the default configuration and only have
metrics enter the system via Heapster. If write access is enabled, any user
-will be able to write metrics to the system, which can affect performance and
+can write metrics to the system, which can affect performance and
cause Cassandra disk usage to unpredictably increase.
====
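
As a hedged sketch of a read-only request, assuming the Hawkular Metrics route is `hawkular-metrics.example.com` and the metrics of interest belong to the `MYPROJECT` project, the bearer token and the tenant (project) name are passed as headers:

----
# List metric definitions for the MYPROJECT tenant (read-only by default)
$ curl -H "Authorization: Bearer $(oc whoami -t)" \
       -H "Hawkular-Tenant: MYPROJECT" \
       https://hawkular-metrics.example.com/hawkular/metrics/metrics
----

To let individual users write metrics as described above, `openshift_metrics_hawkular_user_write_access=true` would be set in the inventory file.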

@@ -732,7 +735,7 @@ privileges to access.
[[cluster-metrics-authorization]]
=== Authorization

-The Hawkular Metrics service will authenticate the user against {product-title}
+The Hawkular Metrics service authenticates the user against {product-title}
to determine if the user has access to the project it is trying to access.

Hawkular Metrics accepts a bearer token from the client and verifies that token
@@ -748,8 +751,8 @@ ifdef::openshift-origin[]
[[cluster-metrics-accessing-heapster-directly]]
== Accessing Heapster Directly

-Heapster has been configured to be only accessible via the API proxy.
-Accessing it will required either a cluster-reader or cluster-admin privileges.
+Heapster is configured to only be accessible via the API proxy. Accessing
+Heapster requires either cluster-reader or cluster-admin privileges.

For example, to access the Heapster *validate* page, you need to access it
using something similar to:
@@ -774,8 +777,8 @@ Performance Guide].
== Integration with Aggregated Logging

Hawkular Alerts must be connected to the Aggregated Logging's Elasticsearch to
-react on log events. By default, Hawkular will try to find Elasticsearch on its
-default place (namespace `logging`, pod `logging-es`) at every boot. If the
+react to log events. By default, Hawkular tries to find Elasticsearch in its
+default location (namespace `logging`, pod `logging-es`) at every boot. If
Aggregated Logging is installed after Hawkular, the Hawkular Metrics pod might
need to be restarted in order to recognize the new Elasticsearch server. The
Hawkular boot log provides a clear indication if the integration could not be
@@ -810,7 +813,7 @@ available.
[[metrics-cleanup]]
== Cleanup

-You can remove everything deployed by the OpenShift Ansible `openshift_metrics` role
+You can remove everything deployed by the {product-title} Ansible `openshift_metrics` role
by performing the following steps:

----
@@ -827,7 +830,7 @@ system resources.

[IMPORTANT]
====
-Prometheus on OpenShift is a Technology Preview feature only.
+Prometheus on {product-title} is a Technology Preview feature only.
ifdef::openshift-enterprise[]
Technology Preview features are not supported with Red Hat production service
level agreements (SLAs), might not be functionally complete, and Red Hat does
@@ -968,7 +971,7 @@ The Prometheus server automatically exposes a Web UI at `localhost:9090`. You
can access the Prometheus Web UI with the `view` role.

[[openshift-prometheus-config]]
-==== Configuring Prometheus for OpenShift
+==== Configuring Prometheus for {product-title}
//
// Example Prometheus rules file:
// ----
@@ -1087,6 +1090,118 @@ Once `openshift_metrics_project: openshift-infra` is installed, metrics can be
gathered from the `http://${POD_IP}:7575/metrics` endpoint.
====

+[[openshift-prometheus-kubernetes-metrics]]
+=== {product-title} Metrics via Prometheus
+
+The state of a system can be gauged by the metrics that it emits. This section
+describes current and proposed metrics that identify the health of the storage
+subsystem and cluster.
+
+[[k8s-current-metrics]]
+==== Current Metrics
+
+This section describes the metrics currently emitted from the Kubernetes storage subsystem.
+
+*Cloud Provider API Call Metrics*
+
+These metrics report the time and count of successes and failures of all
+cloudprovider API calls. These metrics include `aws_attach_time` and
+`aws_detach_time`. The type of emitted metrics is a histogram, and hence,
+Prometheus also generates sum, count, and bucket metrics for these metrics.
+
+.Example summary of cloudprovider metrics from GCE
+----
+cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
+cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
+cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
+cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
+cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
+cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
+----
+
+.Example summary of cloudprovider metrics from AWS
+----
+cloudprovider_aws_api_request_duration_seconds { request = "attach_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "detach_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "create_tags"}
+cloudprovider_aws_api_request_duration_seconds { request = "create_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "delete_volume"}
+cloudprovider_aws_api_request_duration_seconds { request = "describe_instance"}
+cloudprovider_aws_api_request_duration_seconds { request = "describe_volume"}
+----
+
+See
+link:https://github.com/kubernetes/community/blob/master/contributors/design-proposals/cloud-provider/cloudprovider-storage-metrics.md[Cloud
+Provider (specifically GCE and AWS) metrics for Storage API calls] for more
+information.
+
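Because these are histogram metrics, Prometheus also exposes `_bucket`, `_sum`, and `_count` series for them. As a hedged example, assuming the AWS series above are being scraped, the following query reports the 99th percentile latency of each API call over the last five minutes:

----
# p99 latency per AWS cloud provider API call over the last 5 minutes
histogram_quantile(0.99,
  sum(rate(cloudprovider_aws_api_request_duration_seconds_bucket[5m])) by (le, request))
----
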
+*Volume Operation Metrics*
+
+These metrics report the time taken by a storage operation once it has started. These
+metrics keep track of operation time at the plug-in level, but do not include
+the time taken by the `goroutine` to run or for the operation to be picked up from the
+internal queue. These metrics are a type of histogram.
+
+.Example summary of available volume operation metrics
+----
+storage_operation_duration_seconds { volume_plugin = "aws-ebs", operation_name = "volume_attach" }
+storage_operation_duration_seconds { volume_plugin = "aws-ebs", operation_name = "volume_detach" }
+storage_operation_duration_seconds { volume_plugin = "glusterfs", operation_name = "volume_provision" }
+storage_operation_duration_seconds { volume_plugin = "gce-pd", operation_name = "volume_delete" }
+storage_operation_duration_seconds { volume_plugin = "vsphere", operation_name = "volume_mount" }
+storage_operation_duration_seconds { volume_plugin = "iscsi" , operation_name = "volume_unmount" }
+storage_operation_duration_seconds { volume_plugin = "aws-ebs", operation_name = "unmount_device" }
+storage_operation_duration_seconds { volume_plugin = "cinder" , operation_name = "verify_volumes_are_attached" }
+storage_operation_duration_seconds { volume_plugin = "<n/a>" , operation_name = "verify_volumes_are_attached_per_node" }
+----
+
+See
+link:https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/volume-metrics.md[Volume
+operation metrics] for more information.
+
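As a hedged example, assuming these series are being scraped, the `_sum` and `_count` series generated for the histogram can be combined to report the average duration of each operation per plug-in:

----
# Average storage operation duration per plug-in and operation over the last 5 minutes
sum(rate(storage_operation_duration_seconds_sum[5m])) by (volume_plugin, operation_name)
  /
sum(rate(storage_operation_duration_seconds_count[5m])) by (volume_plugin, operation_name)
----
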
+*Volume Stats Metrics*
+
+These metrics typically report usage statistics for a PVC, such as used space versus
+available space. The type of metric emitted is a gauge.
+
+.Volume Stats Metrics
+|===
+|Metric|Type|Labels/tags
+
+|volume_stats_capacityBytes
+|Gauge
+|namespace=<persistentvolumeclaim-namespace>
+persistentvolumeclaim=<persistentvolumeclaim-name>
+persistentvolume=<persistentvolume-name>
+
+|volume_stats_usedBytes
+|Gauge
+|namespace=<persistentvolumeclaim-namespace>
+persistentvolumeclaim=<persistentvolumeclaim-name>
+persistentvolume=<persistentvolume-name>
+
+|volume_stats_availableBytes
+|Gauge
+|namespace=<persistentvolumeclaim-namespace>
+persistentvolumeclaim=<persistentvolumeclaim-name>
+persistentvolume=<persistentvolume-name>
+
+|volume_stats_InodesFree
+|Gauge
+|namespace=<persistentvolumeclaim-namespace>
+persistentvolumeclaim=<persistentvolumeclaim-name>
+persistentvolume=<persistentvolume-name>
+
+|volume_stats_Inodes
+|Gauge
+|namespace=<persistentvolumeclaim-namespace>
+persistentvolumeclaim=<persistentvolumeclaim-name>
+persistentvolume=<persistentvolume-name>
+
+|volume_stats_InodesUsed
+|Gauge
+|namespace=<persistentvolumeclaim-namespace>
+persistentvolumeclaim=<persistentvolumeclaim-name>
+persistentvolume=<persistentvolume-name>
+|===
+
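As a hedged example of using these gauges, assuming they are scraped with the labels shown above, dividing used bytes by capacity bytes reports how full each claim is:

----
# Fraction of each PersistentVolumeClaim's capacity that is currently in use
volume_stats_usedBytes / volume_stats_capacityBytes
----
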
[[openshift-prometheus-undeploy]]
=== Undeploying Prometheus
