@@ -37,6 +37,10 @@ each node individually through the `/stats` endpoint. From there, Heapster
scrapes the metrics for CPU, memory and network usage, then exports them into
Hawkular Metrics.

+ The storage volume metrics available on the kubelet are not available through
+ the `/stats` endpoint, but are available through the `/metrics` endpoint. See
+ {product-title} Metrics via Prometheus for detailed information.
+
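Since the kubelet's `/metrics` endpoint serves Prometheus text exposition format, a consumer can filter its output for the volume metrics described later in this patch. A minimal sketch; the sample payload is invented for illustration, not captured from a real kubelet:

```python
# Sketch: filter Prometheus text-exposition output (as served by the kubelet
# /metrics endpoint) for volume metrics. The sample payload is invented.

def filter_metrics(exposition_text, prefix):
    """Return the sample lines whose metric name starts with prefix."""
    matches = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if line.startswith(prefix):
            matches.append(line)
    return matches

sample = """\
# HELP volume_stats_usedBytes Used bytes of the volume
volume_stats_usedBytes{namespace="default",persistentvolumeclaim="metrics"} 1048576
volume_stats_capacityBytes{namespace="default",persistentvolumeclaim="metrics"} 10485760
process_cpu_seconds_total 12.5
"""

volume_lines = filter_metrics(sample, "volume_stats_")
print(len(volume_lines))  # → 2
```

A real scrape of the kubelet endpoint additionally requires node credentials and TLS settings, which are deployment-specific.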
Browsing individual pods in the web console displays separate sparkline charts
for memory and CPU. The time range displayed is selectable, and these charts
automatically update every 30 seconds. If there are multiple containers on the
@@ -63,7 +67,7 @@ previous to v1.0.8, even if it has since been updated to a newer version, follow
the instructions for node certificates outlined in
xref:../install_config/upgrading/manual_upgrades.adoc#install-config-upgrading-manual-upgrades[Updating
Master and Node Certificates]. If the node certificate does not contain the IP
- address of the node, then Heapster will fail to retrieve any metrics.
+ address of the node, then Heapster fails to retrieve any metrics.
====
endif::[]
@@ -102,9 +106,9 @@ volume].
=== Persistent Storage

Running {product-title} cluster metrics with persistent storage means that your
- metrics will be stored to a
+ metrics are stored to a
xref:../architecture/additional_concepts/storage.adoc#persistent-volumes[persistent
- volume] and be able to survive a pod being restarted or recreated. This is ideal
+ volume] and can survive a pod being restarted or recreated. This is ideal
if you require your metrics data to be guarded from data loss. For production
environments, it is highly recommended to configure persistent storage for your
metrics pods.
@@ -205,7 +209,7 @@ storage space as a buffer for unexpected monitored pod usage.
[WARNING]
====
If the Cassandra persisted volume runs out of sufficient space, then data loss
- will occur .
+ occurs.
====

For cluster metrics to work with persistent storage, ensure that the persistent
@@ -301,7 +305,7 @@ metrics-gathering solutions.
=== Non-Persistent Storage

Running {product-title} cluster metrics with non-persistent storage means that
- any stored metrics will be deleted when the pod is deleted. While it is much
+ any stored metrics are deleted when the pod is deleted. While it is much
easier to run cluster metrics with non-persistent data, running with
non-persistent data does come with the risk of permanent data loss. However,
metrics can still survive a container being restarted.
@@ -313,16 +317,16 @@ to `emptyDir` in the inventory file.

[NOTE]
====
- When using non-persistent storage, metrics data will be written to
+ When using non-persistent storage, metrics data is written to
*_/var/lib/origin/openshift.local.volumes/pods_* on the node where the Cassandra
- pod is running. Ensure *_/var_* has enough free space to accommodate metrics
+ pod runs. Ensure *_/var_* has enough free space to accommodate metrics
storage.
====

[[metrics-ansible-role]]
== Metrics Ansible Role

- The OpenShift Ansible `openshift_metrics` role configures and deploys all of the
+ The {product-title} Ansible `openshift_metrics` role configures and deploys all of the
metrics components using the variables from the
xref:../install_config/install/advanced_install.adoc#configuring-ansible[Configuring
Ansible] inventory file.
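As a hedged sketch of what such an inventory fragment might look like: `openshift_metrics_hawkular_hostname` appears later in this document, while `openshift_metrics_install_metrics` and `openshift_metrics_cassandra_storage_type` are assumed from the same role and should be verified against the role's variables table; the hostname value is a placeholder.

```ini
[OSEv3:vars]
# Deploy cluster metrics via the openshift_metrics role.
# Variable names other than openshift_metrics_hawkular_hostname are
# assumptions; confirm them against the role's variables table.
openshift_metrics_install_metrics=true

# Externally resolvable hostname for the Hawkular Metrics route (placeholder).
openshift_metrics_hawkular_hostname=hawkular-metrics.example.com

# emptydir = non-persistent storage; a persistent-volume type would be
# used instead for the persistent storage setup described above.
openshift_metrics_cassandra_storage_type=emptydir
```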
@@ -501,7 +505,7 @@ Technology Preview and is not installed by default.

[NOTE]
====
- The Hawkular OpenShift Agent on {product-title} is a Technology Preview feature
+ The Hawkular {product-title} Agent on {product-title} is a Technology Preview feature
only.
ifdef::openshift-enterprise[]
Technology Preview features are not
@@ -535,7 +539,7 @@ that it does not become full.

[WARNING]
====
- Data loss will result if the Cassandra persisted volume runs out of sufficient space.
+ Data loss results if the Cassandra persisted volume runs out of sufficient space.
====

All of the other variables are optional and allow for greater customization.
@@ -556,8 +560,8 @@ running.
[[metrics-using-secrets]]
=== Using Secrets

- The OpenShift Ansible `openshift_metrics` role will auto-generate self-signed certificates for use between its
- components and will generate a
+ The {product-title} Ansible `openshift_metrics` role auto-generates self-signed certificates for use between its
+ components and generates a
xref:../architecture/networking/routes.adoc#secured-routes[re-encrypting route] to expose
the Hawkular Metrics service. This route is what allows the web console to access the Hawkular Metrics
service.
@@ -566,14 +570,14 @@ In order for the browser running the web console to trust the connection through
this route, it must trust the route's certificate. This can be accomplished by
xref:metrics-using-secrets-byo-certs[providing your own certificates] signed by
a trusted Certificate Authority. The `openshift_metrics` role allows you to
- specify your own certificates which it will then use when creating the route.
+ specify your own certificates, which it then uses when creating the route.

The router's default certificate is used if you do not provide your own.

[[metrics-using-secrets-byo-certs]]
==== Providing Your Own Certificates

- To provide your own certificate which will be used by the
+ To provide your own certificate, which is used by the
xref:../architecture/networking/routes.adoc#secured-routes[re-encrypting
route], you can set the `openshift_metrics_hawkular_cert`,
`openshift_metrics_hawkular_key`, and `openshift_metrics_hawkular_ca`
@@ -592,7 +596,7 @@ route documentation].
== Deploying the Metric Components

Because deploying and configuring all the metric components is handled with
- OpenShift Ansible, you can deploy everything in one step.
+ {product-title} Ansible, you can deploy everything in one step.

The following examples show you how to deploy metrics with and without
persistent storage using the default parameters.
@@ -675,8 +679,7 @@ For example, if your `openshift_metrics_hawkular_hostname` corresponds to
Once you have updated and saved the *_master-config.yaml_* file, you must
restart your {product-title} instance.

- When your {product-title} server is back up and running, metrics will be
- displayed on the pod overview pages.
+ When your {product-title} server is back up and running, metrics are displayed on the pod overview pages.

[CAUTION]
====
@@ -698,16 +701,16 @@ Metrics API].

[NOTE]
====
- When accessing Hawkular Metrics from the API, you will only be able to perform
- reads. Writing metrics has been disabled by default. If you want for individual
+ When accessing Hawkular Metrics from the API, you can only perform
+ reads. Writing metrics is disabled by default. If you want individual
users to also be able to write metrics, you must set the
`openshift_metrics_hawkular_user_write_access`
xref:../install_config/cluster_metrics.adoc#metrics-ansible-variables[variable]
to *true*.

However, it is recommended to use the default configuration and only have
metrics enter the system via Heapster. If write access is enabled, any user
- will be able to write metrics to the system, which can affect performance and
+ can write metrics to the system, which can affect performance and
cause Cassandra disk usage to unpredictably increase.
====

@@ -732,7 +735,7 @@ privileges to access.
[[cluster-metrics-authorization]]
=== Authorization

- The Hawkular Metrics service will authenticate the user against {product-title}
+ The Hawkular Metrics service authenticates the user against {product-title}
to determine if the user has access to the project it is trying to access.

Hawkular Metrics accepts a bearer token from the client and verifies that token
@@ -748,8 +751,8 @@ ifdef::openshift-origin[]
[[cluster-metrics-accessing-heapster-directly]]
== Accessing Heapster Directly

- Heapster has been configured to be only accessible via the API proxy.
- Accessing it will required either a cluster-reader or cluster-admin privileges.
+ Heapster is configured to be accessible only via the API proxy. Accessing
+ Heapster requires either cluster-reader or cluster-admin privileges.

For example, to access the Heapster *validate* page, you need to access it
using something similar to:
@@ -774,8 +777,8 @@ Performance Guide].
== Integration with Aggregated Logging

Hawkular Alerts must be connected to the Aggregated Logging's Elasticsearch to
- react on log events. By default, Hawkular will try to find Elasticsearch on its
- default place (namespace `logging`, pod `logging-es`) at every boot. If the
+ react to log events. By default, Hawkular tries to find Elasticsearch in its
+ default location (namespace `logging`, pod `logging-es`) at every boot. If
Aggregated Logging is installed after Hawkular, the Hawkular Metrics pod might
need to be restarted in order to recognize the new Elasticsearch server. The
Hawkular boot log provides a clear indication if the integration could not be
@@ -810,7 +813,7 @@ available.
[[metrics-cleanup]]
== Cleanup

- You can remove everything deployed by the OpenShift Ansible `openshift_metrics` role
+ You can remove everything deployed by the {product-title} Ansible `openshift_metrics` role
by performing the following steps:

----
@@ -827,7 +830,7 @@ system resources.

[IMPORTANT]
====
- Prometheus on OpenShift is a Technology Preview feature only.
+ Prometheus on {product-title} is a Technology Preview feature only.
ifdef::openshift-enterprise[]
Technology Preview features are not supported with Red Hat production service
level agreements (SLAs), might not be functionally complete, and Red Hat does
@@ -968,7 +971,7 @@ The Prometheus server automatically exposes a Web UI at `localhost:9090`. You
can access the Prometheus Web UI with the `view` role.

[[openshift-prometheus-config]]
- ==== Configuring Prometheus for OpenShift
+ ==== Configuring Prometheus for {product-title}
//
// Example Prometheus rules file:
// ----
@@ -1087,6 +1090,118 @@ Once `openshift_metrics_project: openshift-infra` is installed, metrics can be
gathered from the `http://${POD_IP}:7575/metrics` endpoint.
====

+ [[openshift-prometheus-kubernetes-metrics]]
+ === {product-title} Metrics via Prometheus
+
+ The state of a system can be gauged by the metrics that it emits. This section
+ describes current and proposed metrics that identify the health of the storage
+ subsystem and cluster.
+
+ [[k8s-current-metrics]]
+ ==== Current Metrics
+
+ This section describes the metrics currently emitted from the Kubernetes
+ storage subsystem.
+
+ *Cloud Provider API Call Metrics*
+
+ These metrics report the duration and count of successes and failures of all
+ cloud provider API calls. These metrics include `aws_attach_time` and
+ `aws_detach_time`. They are emitted as histograms, so Prometheus also
+ generates sum, count, and bucket metrics for them.
+
+ .Example summary of cloudprovider metrics from GCE
+ ----
+ cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
+ cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
+ cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
+ cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
+ cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
+ cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
+ ----
+
+ .Example summary of cloudprovider metrics from AWS
+ ----
+ cloudprovider_aws_api_request_duration_seconds { request = "attach_volume"}
+ cloudprovider_aws_api_request_duration_seconds { request = "detach_volume"}
+ cloudprovider_aws_api_request_duration_seconds { request = "create_tags"}
+ cloudprovider_aws_api_request_duration_seconds { request = "create_volume"}
+ cloudprovider_aws_api_request_duration_seconds { request = "delete_volume"}
+ cloudprovider_aws_api_request_duration_seconds { request = "describe_instance"}
+ cloudprovider_aws_api_request_duration_seconds { request = "describe_volume"}
+ ----
+
+ See
+ link:https://github.com/kubernetes/community/blob/master/contributors/design-proposals/cloud-provider/cloudprovider-storage-metrics.md[Cloud
+ Provider (specifically GCE and AWS) metrics for Storage API calls] for more
+ information.
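Because these duration metrics are histograms, each exposes `_sum` and `_count` series, so an average call latency over an interval can be derived by dividing their rates; in PromQL that is roughly `rate(cloudprovider_aws_api_request_duration_seconds_sum[5m]) / rate(cloudprovider_aws_api_request_duration_seconds_count[5m])`. A minimal sketch of the same arithmetic on two invented scrape samples:

```python
# Sketch: derive an average API-call duration from histogram _sum/_count
# counters taken at two scrape instants (values are invented for illustration).

def avg_duration(sum_t0, count_t0, sum_t1, count_t1):
    """Average seconds per call over the interval between two scrapes."""
    delta_sum = sum_t1 - sum_t0        # total seconds spent in calls
    delta_count = count_t1 - count_t0  # number of calls completed
    if delta_count == 0:
        return 0.0
    return delta_sum / delta_count

# e.g. attach_volume: 12.0 seconds spent over 8 calls in the interval
print(avg_duration(100.0, 40, 112.0, 48))  # → 1.5
```

Both counters must come from the same series (same labels); Prometheus handles counter resets in `rate()`, which this sketch does not attempt.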
+
+ *Volume Operation Metrics*
+
+ These metrics report the time taken by a storage operation once it has
+ started. They track operation time at the plug-in level, but do not include
+ the time taken for the `goroutine` to run or for the operation to be picked
+ up from the internal queue. These metrics are histograms.
+
+ .Example summary of available volume operation metrics
+ ----
+ storage_operation_duration_seconds { volume_plugin = "aws-ebs", operation_name = "volume_attach" }
+ storage_operation_duration_seconds { volume_plugin = "aws-ebs", operation_name = "volume_detach" }
+ storage_operation_duration_seconds { volume_plugin = "glusterfs", operation_name = "volume_provision" }
+ storage_operation_duration_seconds { volume_plugin = "gce-pd", operation_name = "volume_delete" }
+ storage_operation_duration_seconds { volume_plugin = "vsphere", operation_name = "volume_mount" }
+ storage_operation_duration_seconds { volume_plugin = "iscsi" , operation_name = "volume_unmount" }
+ storage_operation_duration_seconds { volume_plugin = "aws-ebs", operation_name = "unmount_device" }
+ storage_operation_duration_seconds { volume_plugin = "cinder" , operation_name = "verify_volumes_are_attached" }
+ storage_operation_duration_seconds { volume_plugin = "<n/a>" , operation_name = "verify_volumes_are_attached_per_node" }
+ ----
+
+ See
+ link:https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/volume-metrics.md[Volume
+ operation metrics] for more information.
+
+ *Volume Stats Metrics*
+
+ These metrics typically report usage statistics of a persistent volume claim
+ (PVC), such as used space versus available space. These metrics are emitted
+ as gauges.
+
+ .Volume Stats Metrics
+ |===
+ |Metric|Type|Labels/tags
+
+ |volume_stats_capacityBytes
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+
+ |volume_stats_usedBytes
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+
+ |volume_stats_availableBytes
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+
+ |volume_stats_InodesFree
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+
+ |volume_stats_Inodes
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+
+ |volume_stats_InodesUsed
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+ |===
+
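The capacity and used gauges above can be combined into a usage percentage, which is what an alert on a filling PVC would watch (in PromQL, roughly `volume_stats_usedBytes / volume_stats_capacityBytes`). A minimal sketch with invented sample values and hypothetical PVC names:

```python
# Sketch: turn the volume_stats gauges described above into a usage report.
# PVC names and byte values are invented; real values come from a scrape.

samples = {
    # (namespace, persistentvolumeclaim) -> (usedBytes, capacityBytes)
    ("openshift-infra", "metrics-cassandra-1"): (7_500_000_000, 10_000_000_000),
    ("default", "registry-claim"): (1_000_000_000, 4_000_000_000),
}

def usage_percent(used_bytes, capacity_bytes):
    """Percentage of the PVC's capacity currently in use."""
    if capacity_bytes == 0:
        return 0.0
    return 100.0 * used_bytes / capacity_bytes

report = {pvc: usage_percent(used, cap) for pvc, (used, cap) in samples.items()}
print(report[("openshift-infra", "metrics-cassandra-1")])  # → 75.0
```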
[[openshift-prometheus-undeploy]]
=== Undeploying Prometheus