@@ -37,6 +37,10 @@ each node individually through the `/stats` endpoint. From there, Heapster
scrapes the metrics for CPU, memory and network usage, then exports them into
Hawkular Metrics.

+ The storage volume metrics available on the kubelet are not available through
+ the `/stats` endpoint, but are available through the `/metrics` endpoint. See
+ {product-title} Metrics via Prometheus for detailed information.
+
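Because the kubelet's `/metrics` endpoint serves Prometheus exposition text, the storage series can be isolated with a simple prefix filter. The following sketch runs that filter over sample exposition text; the `oc` command and node name in the comments are illustrative placeholders, not output from a real cluster:

```shell
# The kubelet's /metrics endpoint can be reached through the API proxy, e.g.:
#   oc get --raw /api/v1/nodes/<node-name>/proxy/metrics
# (the node name is a placeholder). The storage series all share the
# volume_stats prefix, so a grep is enough to isolate them.

# Sample exposition text standing in for real kubelet output:
sample='volume_stats_usedBytes{namespace="default",persistentvolumeclaim="db"} 1024
kubelet_running_pod_count 12
volume_stats_capacityBytes{namespace="default",persistentvolumeclaim="db"} 4096'

# Keep only the storage series.
filtered=$(printf '%s\n' "$sample" | grep '^volume_stats')
printf '%s\n' "$filtered"
```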
Browsing individual pods in the web console displays separate sparkline charts
for memory and CPU. The time range displayed is selectable, and these charts
automatically update every 30 seconds. If there are multiple containers on the
@@ -63,7 +67,7 @@ previous to v1.0.8, even if it has since been updated to a newer version, follow
the instructions for node certificates outlined in
xref:../install_config/upgrading/manual_upgrades.adoc#manual-updating-master-and-node-certificates[Updating
Master and Node Certificates]. If the node certificate does not contain the IP
- address of the node, then Heapster will fail to retrieve any metrics.
+ address of the node, then Heapster fails to retrieve any metrics.
====
endif::[]
@@ -102,9 +106,9 @@ volume].
=== Persistent Storage

Running {product-title} cluster metrics with persistent storage means that your
- metrics will be stored to a
+ metrics are stored in a
xref:../architecture/additional_concepts/storage.adoc#persistent-volumes[persistent
- volume] and be able to survive a pod being restarted or recreated. This is ideal
+ volume] and survive a pod being restarted or recreated. This is ideal
if you require your metrics data to be guarded from data loss. For production
environments, it is highly recommended to configure persistent storage for your
metrics pods.
@@ -205,7 +209,7 @@ storage space as a buffer for unexpected monitored pod usage.
[WARNING]
====
If the Cassandra persisted volume runs out of space, then data loss
- will occur .
+ occurs.
====
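One way to stay ahead of that failure mode is to watch the volume's fill level and alert before it is full. A minimal sketch, assuming a generic POSIX environment; `/` stands in for the actual Cassandra persistent volume mount point, and the 5% free-space threshold is arbitrary:

```shell
# Print OK or WARNING depending on how full a mount point is.
# The mount point and minimum-free threshold are illustrative.
check_free_space() {
  mount_point="$1"
  min_free_pct="$2"
  # Column 5 of `df -P` (portable format, no line wrapping) is the use
  # percentage, e.g. "42%"; strip the percent sign before comparing.
  used_pct=$(df -P "$mount_point" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
  if [ "$used_pct" -gt $((100 - min_free_pct)) ]; then
    status="WARNING: $mount_point is ${used_pct}% full"
  else
    status="OK: $mount_point is ${used_pct}% full"
  fi
  echo "$status"
}

check_free_space / 5
```

A real deployment would run a check like this against the Cassandra PV mount from cron or a monitoring agent rather than by hand.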

For cluster metrics to work with persistent storage, ensure that the persistent
@@ -245,7 +249,7 @@ metrics-gathering solutions.
=== Non-Persistent Storage

Running {product-title} cluster metrics with non-persistent storage means that
- any stored metrics will be deleted when the pod is deleted. While it is much
+ any stored metrics are deleted when the pod is deleted. While it is much
easier to run cluster metrics with non-persistent data, running with
non-persistent data does come with the risk of permanent data loss. However,
metrics can still survive a container being restarted.
@@ -257,16 +261,16 @@ to `emptyDir` in the inventory file.

[NOTE]
====
- When using non-persistent storage, metrics data will be written to
+ When using non-persistent storage, metrics data is written to
*_/var/lib/origin/openshift.local.volumes/pods_* on the node where the Cassandra
- pod is running. Ensure *_/var_* has enough free space to accommodate metrics
+ pod runs. Ensure *_/var_* has enough free space to accommodate metrics
storage.
====
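In the inventory file, that choice looks roughly like the following. `openshift_metrics_cassandra_storage_type` is the variable named above; the install toggle is shown as commonly used with this role, so confirm both names against your installer version:

```ini
[OSEv3:vars]
openshift_metrics_install_metrics=true
openshift_metrics_cassandra_storage_type=emptyDir
```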

[[metrics-ansible-role]]
== Metrics Ansible Role

- The OpenShift Ansible `openshift_metrics` role configures and deploys all of the
+ The {product-title} Ansible `openshift_metrics` role configures and deploys all of the
metrics components using the variables from the
xref:../install_config/install/advanced_install.adoc#configuring-ansible[Configuring
Ansible] inventory file.
@@ -445,7 +449,7 @@ Technology Preview and is not installed by default.

[NOTE]
====
The Hawkular OpenShift Agent on {product-title} is a Technology Preview feature
only.
ifdef::openshift-enterprise[]
Technology Preview features are not
@@ -479,7 +483,7 @@ that it does not become full.

[WARNING]
====
- Data loss will result if the Cassandra persisted volume runs out of sufficient space.
+ Data loss results if the Cassandra persisted volume runs out of space.
====

All of the other variables are optional and allow for greater customization.
@@ -500,8 +504,8 @@ running.
[[metrics-using-secrets]]
=== Using Secrets

- The OpenShift Ansible `openshift_metrics` role will auto-generate self-signed certificates for use between its
- components and will generate a
+ The {product-title} Ansible `openshift_metrics` role auto-generates self-signed certificates for use between its
+ components and generates a
xref:../architecture/networking/routes.adoc#secured-routes[re-encrypting route] to expose
the Hawkular Metrics service. This route allows the web console to access the Hawkular Metrics
service.
@@ -510,14 +514,14 @@ In order for the browser running the web console to trust the connection through
this route, it must trust the route's certificate. This can be accomplished by
xref:metrics-using-secrets-byo-certs[providing your own certificates] signed by
a trusted Certificate Authority. The `openshift_metrics` role allows you to
- specify your own certificates which it will then use when creating the route.
+ specify your own certificates, which it then uses when creating the route.

The router's default certificate is used if you do not provide your own.
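When you do provide your own, the certificate files are referenced from the inventory with the role's Hawkular certificate variables; a sketch with placeholder file paths:

```ini
[OSEv3:vars]
openshift_metrics_hawkular_cert=/path/to/hawkular-metrics.crt
openshift_metrics_hawkular_key=/path/to/hawkular-metrics.key
openshift_metrics_hawkular_ca=/path/to/ca.crt
```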

[[metrics-using-secrets-byo-certs]]
==== Providing Your Own Certificates

- To provide your own certificate which will be used by the
+ To provide your own certificate, which is used by the
xref:../architecture/networking/routes.adoc#secured-routes[re-encrypting
route], you can set the `openshift_metrics_hawkular_cert`,
`openshift_metrics_hawkular_key`, and `openshift_metrics_hawkular_ca`
@@ -536,7 +540,7 @@ route documentation].
== Deploying the Metric Components

Because deploying and configuring all the metric components is handled with
- OpenShift Ansible, you can deploy everything in one step.
+ {product-title} Ansible, you can deploy everything in one step.

The following examples show you how to deploy metrics with and without
persistent storage using the default parameters.
@@ -619,8 +623,7 @@ For example, if your `openshift_metrics_hawkular_hostname` corresponds to
Once you have updated and saved the *_master-config.yaml_* file, you must
restart your {product-title} instance.

- When your {product-title} server is back up and running, metrics will be
- displayed on the pod overview pages.
+ When your {product-title} server is back up and running, metrics are displayed on the pod overview pages.

[CAUTION]
====
@@ -642,16 +645,16 @@ Metrics API].

[NOTE]
====
- When accessing Hawkular Metrics from the API, you will only be able to perform
- reads. Writing metrics has been disabled by default. If you want for individual
+ When accessing Hawkular Metrics from the API, you can only perform
+ reads. Writing metrics is disabled by default. If you want individual
users to also be able to write metrics, you must set the
`openshift_metrics_hawkular_user_write_access`
xref:../install_config/cluster_metrics.adoc#metrics-ansible-variables[variable]
to *true*.

However, it is recommended to use the default configuration and only have
metrics enter the system via Heapster. If write access is enabled, any user
- will be able to write metrics to the system, which can affect performance and
+ can write metrics to the system, which can affect performance and
cause Cassandra disk usage to unpredictably increase.
====
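Enabling write access is therefore a single inventory change, using the variable named above; a sketch:

```ini
[OSEv3:vars]
; Allows individual users to write metrics via the Hawkular Metrics API.
; Leaving this at the default (false) keeps Heapster as the only writer.
openshift_metrics_hawkular_user_write_access=true
```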
@@ -676,7 +679,7 @@ privileges to access.
[[cluster-metrics-authorization]]
=== Authorization

- The Hawkular Metrics service will authenticate the user against {product-title}
+ The Hawkular Metrics service authenticates the user against {product-title}
to determine if the user has access to the project that they are trying to access.

Hawkular Metrics accepts a bearer token from the client and verifies that token
@@ -692,8 +695,8 @@ ifdef::openshift-origin[]
[[cluster-metrics-accessing-heapster-directly]]
== Accessing Heapster Directly

- Heapster has been configured to be only accessible via the API proxy.
- Accessing it will required either a cluster-reader or cluster-admin privileges.
+ Heapster is configured to be accessible only via the API proxy. Accessing
+ Heapster requires cluster-reader or cluster-admin privileges.

For example, to access the Heapster *validate* page, you need to access it
using something similar to:
@@ -718,8 +721,8 @@ Performance Guide].
== Integration with Aggregated Logging

Hawkular Alerts must be connected to the Aggregated Logging's Elasticsearch to
- react on log events. By default, Hawkular will try to find Elasticsearch on its
- default place (namespace `logging`, pod `logging-es`) at every boot. If the
+ react to log events. By default, Hawkular tries to find Elasticsearch in its
+ default location (namespace `logging`, pod `logging-es`) at every boot. If
Aggregated Logging is installed after Hawkular, the Hawkular Metrics pod might
need to be restarted in order to recognize the new Elasticsearch server. The
Hawkular boot log provides a clear indication if the integration could not be
@@ -754,7 +757,7 @@ available.
[[metrics-cleanup]]
== Cleanup

- You can remove everything deployed by the OpenShift Ansible `openshift_metrics` role
+ You can remove everything deployed by the {product-title} Ansible `openshift_metrics` role
by performing the following steps:

----
@@ -771,7 +774,7 @@ system resources.

[IMPORTANT]
====
- Prometheus on OpenShift is a Technology Preview feature only.
+ Prometheus on {product-title} is a Technology Preview feature only.
ifdef::openshift-enterprise[]
Technology Preview features are not supported with Red Hat production service
level agreements (SLAs), might not be functionally complete, and Red Hat does
@@ -912,7 +915,7 @@ The Prometheus server automatically exposes a Web UI at `localhost:9090`. You
can access the Prometheus Web UI with the `view` role.

[[openshift-prometheus-config]]
- ==== Configuring Prometheus for OpenShift
+ ==== Configuring Prometheus for {product-title}
//
// Example Prometheus rules file:
// ----
@@ -1031,6 +1034,118 @@ Once `openshift_metrics_project: openshift-infra` is installed, metrics can be
gathered from the `http://${POD_IP}:7575/metrics` endpoint.
====

+ [[openshift-prometheus-kubernetes-metrics]]
+ === {product-title} Metrics via Prometheus
+
+ The state of a system can be gauged by the metrics that it emits. This section
+ describes current and proposed metrics that identify the health of the storage
+ subsystem and cluster.
+
+ [[k8s-current-metrics]]
+ ==== Current Metrics
+
+ This section describes the metrics currently emitted by the Kubernetes storage subsystem.
+
+ *Cloud Provider API Call Metrics*
+
+ These metrics report the time taken by, and the count of successes and failures
+ of, all cloud provider API calls. They include `aws_attach_time` and
+ `aws_detach_time`. The emitted metrics are histograms, so Prometheus also
+ generates sum, count, and bucket metrics for them.
+
+ .Example summary of cloudprovider metrics from GCE
+ ----
+ cloudprovider_gce_api_request_duration_seconds { request = "instance_list"}
+ cloudprovider_gce_api_request_duration_seconds { request = "disk_insert"}
+ cloudprovider_gce_api_request_duration_seconds { request = "disk_delete"}
+ cloudprovider_gce_api_request_duration_seconds { request = "attach_disk"}
+ cloudprovider_gce_api_request_duration_seconds { request = "detach_disk"}
+ cloudprovider_gce_api_request_duration_seconds { request = "list_disk"}
+ ----
+
+ .Example summary of cloudprovider metrics from AWS
+ ----
+ cloudprovider_aws_api_request_duration_seconds { request = "attach_volume"}
+ cloudprovider_aws_api_request_duration_seconds { request = "detach_volume"}
+ cloudprovider_aws_api_request_duration_seconds { request = "create_tags"}
+ cloudprovider_aws_api_request_duration_seconds { request = "create_volume"}
+ cloudprovider_aws_api_request_duration_seconds { request = "delete_volume"}
+ cloudprovider_aws_api_request_duration_seconds { request = "describe_instance"}
+ cloudprovider_aws_api_request_duration_seconds { request = "describe_volume"}
+ ----
+
+ See
+ link:https://github.com/kubernetes/community/blob/master/contributors/design-proposals/cloud-provider/cloudprovider-storage-metrics.md[Cloud
+ Provider (specifically GCE and AWS) metrics for Storage API calls] for more
+ information.
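Because these are histograms, latency percentiles can be derived in Prometheus from the generated `_bucket` series. A sketch in Prometheus's query language; the metric and label names are those listed above, and the query itself is illustrative:

```promql
histogram_quantile(0.99,
  rate(cloudprovider_aws_api_request_duration_seconds_bucket{request="attach_volume"}[5m]))
```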
+
+ *Volume Operation Metrics*
+
+ These metrics report the time taken by a storage operation once it has started.
+ They track operation time at the plug-in level, but do not include the time
+ taken for a goroutine to run or for an operation to be picked up from the
+ internal queue. These metrics are histograms.
+
+ .Example summary of available volume operation metrics
+ ----
+ storage_operation_duration_seconds { volume_plugin = "aws-ebs", operation_name = "volume_attach" }
+ storage_operation_duration_seconds { volume_plugin = "aws-ebs", operation_name = "volume_detach" }
+ storage_operation_duration_seconds { volume_plugin = "glusterfs", operation_name = "volume_provision" }
+ storage_operation_duration_seconds { volume_plugin = "gce-pd", operation_name = "volume_delete" }
+ storage_operation_duration_seconds { volume_plugin = "vsphere", operation_name = "volume_mount" }
+ storage_operation_duration_seconds { volume_plugin = "iscsi" , operation_name = "volume_unmount" }
+ storage_operation_duration_seconds { volume_plugin = "aws-ebs", operation_name = "unmount_device" }
+ storage_operation_duration_seconds { volume_plugin = "cinder" , operation_name = "verify_volumes_are_attached" }
+ storage_operation_duration_seconds { volume_plugin = "<n/a>" , operation_name = "verify_volumes_are_attached_per_node" }
+ ----
+
+ See
+ link:https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/volume-metrics.md[Volume
+ operation metrics] for more information.
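As histograms, these series also yield an average latency per plug-in from the generated `_sum` and `_count` series; an illustrative sketch using the names listed above:

```promql
sum by (volume_plugin) (rate(storage_operation_duration_seconds_sum{operation_name="volume_mount"}[5m]))
/
sum by (volume_plugin) (rate(storage_operation_duration_seconds_count{operation_name="volume_mount"}[5m]))
```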
+
+ *Volume Stats Metrics*
+
+ These metrics typically report usage statistics for a persistent volume claim
+ (PVC), such as used space versus available space. The emitted metrics are gauges.
+
+ .Volume Stats Metrics
+ |===
+ |Metric|Type|Labels/tags
+
+ |volume_stats_capacityBytes
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+
+ |volume_stats_usedBytes
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+
+ |volume_stats_availableBytes
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+
+ |volume_stats_InodesFree
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+
+ |volume_stats_Inodes
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+
+ |volume_stats_InodesUsed
+ |Gauge
+ |namespace=<persistentvolumeclaim-namespace>
+ persistentvolumeclaim=<persistentvolumeclaim-name>
+ persistentvolume=<persistentvolume-name>
+ |===
+
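The used and capacity gauges can be combined into a percent-utilization expression for a given PVC; a sketch in Prometheus's query language, with placeholder label values:

```promql
100 * volume_stats_usedBytes{namespace="default", persistentvolumeclaim="my-claim"}
    / volume_stats_capacityBytes{namespace="default", persistentvolumeclaim="my-claim"}
```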
[[openshift-prometheus-undeploy]]
=== Undeploying Prometheus