Merge pull request #45700 from openshift-cherrypick-robot/cherry-pick-45007-to-enterprise-4.11

JStickler · web-flow · commit 88edcdee7bd0 · 2022-05-13T15:54:28.000-04:00
[enterprise-4.11] RHDEVDOCS-3919 - document size-based configuration for Prometheus metrics retention
diff --git a/modules/monitoring-modifying-retention-time-and-size-for-prometheus-metrics-data.adoc b/modules/monitoring-modifying-retention-time-and-size-for-prometheus-metrics-data.adoc
@@ -0,0 +1,133 @@
+// Module included in the following assemblies:
+//
+// * monitoring/configuring-the-monitoring-stack.adoc
+
+:_content-type: PROCEDURE
+[id="modifying-retention-time-and-size-for-prometheus-metrics-data_{context}"]
+= Modifying the retention time and size for Prometheus metrics data
+
+By default, Prometheus automatically retains metrics data for 15 days.
+You can modify the retention time to change how soon data is deleted by specifying a time value in the `retention` field.
+You can also configure the maximum amount of disk space the retained metrics data uses by specifying a size value in the `retentionSize` field.
+If the data reaches this size limit, Prometheus deletes the oldest data first until the disk space used is again below the limit.
+
+Note the following behaviors of these data retention settings:
+
+* The size-based retention policy applies to all data block directories in the `/prometheus` directory, including persistent blocks, write-ahead log (WAL) data, and m-mapped chunks. 
+* Data in the `/wal` and `/head_chunks` directories counts toward the retention size limit, but Prometheus never purges data from these directories based on size- or time-based retention policies. 
+Thus, if you set a retention size limit lower than the maximum size set for the `/wal` and `/head_chunks` directories, you have configured the system not to retain any data blocks in the `/prometheus` data directories.
+* The size-based retention policy is applied only when Prometheus cuts a new data block, which occurs every two hours after the WAL contains at least three hours of data.
+* If you do not explicitly define values for either `retention` or `retentionSize`, retention time defaults to 15 days, and retention size is not set.
+* If you define values for both `retention` and `retentionSize`, both values apply.
+If any data blocks exceed the defined retention time or the defined size limit, Prometheus purges these data blocks.
+* If you define a value for `retentionSize` and do not define `retention`, only the `retentionSize` value applies.
+* If you do not define a value for `retentionSize` and only define a value for `retention`, only the `retention` value applies.
+
+.Prerequisites
+
+* *If you are configuring core {product-title} monitoring components*:
+** You have access to the cluster as a user with the `cluster-admin` role.
+** You have created the `cluster-monitoring-config` `ConfigMap` object.
+* *If you are configuring components that monitor user-defined projects*:
+** A cluster administrator has enabled monitoring for user-defined projects.
+** You have access to the cluster as a user with the `cluster-admin` role, or as a user with the `user-workload-monitoring-config-edit` role in the `openshift-user-workload-monitoring` project.
+** You have created the `user-workload-monitoring-config` `ConfigMap` object.
+* You have installed the OpenShift CLI (`oc`).
+
+[WARNING]
+====
+Saving changes to a monitoring config map might restart monitoring processes and redeploy the pods and other resources in the related project.
+The running monitoring processes in that project might also restart.
+====
+
+.Procedure
+
+. Edit the `ConfigMap` object:
+** *To modify the retention time and size for the Prometheus instance that monitors core {product-title} projects*:
+.. Edit the `cluster-monitoring-config` `ConfigMap` object in the `openshift-monitoring` project:
++
+[source,terminal]
+----
+$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
+----
+
+.. Add the retention time and size configuration under `data/config.yaml`:
++
+[source,yaml]
+----
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: cluster-monitoring-config
+  namespace: openshift-monitoring
+data:
+  config.yaml: |
+    prometheusK8s:
+      retention: <time_specification> <1>
+      retentionSize: <size_specification> <2>
+----
++
+<1> The retention time: a number directly followed by `ms` (milliseconds), `s` (seconds), `m` (minutes), `h` (hours), `d` (days), `w` (weeks), or `y` (years). You can also combine time values for specific times, such as `1h30m15s`.
+<2> The retention size: a number directly followed by `B` (bytes), `KB` (kilobytes), `MB` (megabytes), `GB` (gigabytes), `TB` (terabytes), `PB` (petabytes), and `EB` (exabytes).
++
+The following example sets the retention time to 24 hours and the retention size to 10 gigabytes for the Prometheus instance that monitors core {product-title} components:
++
+[source,yaml,subs=quotes]
+----
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: cluster-monitoring-config
+  namespace: openshift-monitoring
+data:
+  config.yaml: |
+    prometheusK8s:
+      retention: *24h*
+      retentionSize: *10GB*
+----
+
+** *To modify the retention time and size for the Prometheus instance that monitors user-defined projects*:
+.. Edit the `user-workload-monitoring-config` `ConfigMap` object in the `openshift-user-workload-monitoring` project:
++
+[source,terminal]
+----
+$ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
+----
+
+.. Add the retention time and size configuration under `data/config.yaml`:
++
+[source,yaml]
+----
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: user-workload-monitoring-config
+  namespace: openshift-user-workload-monitoring
+data:
+  config.yaml: |
+    prometheus:
+      retention: <time_specification> <1>
+      retentionSize: <size_specification> <2>
+----
++
+<1> The retention time: a number directly followed by `ms` (milliseconds), `s` (seconds), `m` (minutes), `h` (hours), `d` (days), `w` (weeks), or `y` (years).
+You can also combine time values for specific times, such as `1h30m15s`.
+<2> The retention size: a number directly followed by `B` (bytes), `KB` (kilobytes), `MB` (megabytes), `GB` (gigabytes), `TB` (terabytes), `PB` (petabytes), or `EB` (exabytes).
++
+The following example sets the retention time to 24 hours and the retention size to 10 gigabytes for the Prometheus instance that monitors user-defined projects:
++
+[source,yaml,subs=quotes]
+----
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: user-workload-monitoring-config
+  namespace: openshift-user-workload-monitoring
+data:
+  config.yaml: |
+    prometheus:
+      retention: *24h*
+      retentionSize: *10GB*
+----
+
+. Save the file to apply the changes. The pods affected by the new configuration restart automatically.
diff --git a/modules/monitoring-modifying-retention-time-for-prometheus-metrics-data.adoc b/modules/monitoring-modifying-retention-time-for-prometheus-metrics-data.adoc
diff --git a/monitoring/configuring-the-monitoring-stack.adoc b/monitoring/configuring-the-monitoring-stack.adoc
@@ -96,7 +96,7 @@ If you use a local volume for persistent storage, do not use a raw block volume,
 ====
 
 include::modules/monitoring-configuring-a-local-persistent-volume-claim.adoc[leveloffset=+2]
-include::modules/monitoring-modifying-retention-time-for-prometheus-metrics-data.adoc[leveloffset=+2]
+include::modules/monitoring-modifying-retention-time-and-size-for-prometheus-metrics-data.adoc[leveloffset=+2]
 
 [role="_additional-resources"]
 .Additional resources