The {product-title} 4 installation program provides only a small number of configuration options before installation. Most {product-title} framework components, including the cluster monitoring stack, are configured after installation.
This section explains what configuration is supported, shows how to configure the monitoring stack, and demonstrates several common configuration scenarios.
-
The monitoring stack imposes additional resource requirements. Consult the computing resources recommendations in Scaling the Cluster Monitoring Operator and verify that you have sufficient resources.
You can configure the monitoring stack by creating and updating monitoring config maps.
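As a minimal sketch, the two config maps usually have the following shape. The empty `config.yaml` bodies are placeholders to fill in with component settings. The object names and namespaces shown are the ones the Cluster Monitoring Operator conventionally reads, so verify them against the documentation for your release.

```yaml
# Config map for the core platform monitoring components.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # component settings go here
---
# Config map for the components that monitor user-defined projects.
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    # component settings go here
```

Each file can be applied with `oc apply -f <file>.yaml`. Settings are added under `data/config.yaml` as key-value pairs for each component.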
modules/monitoring-creating-cluster-monitoring-configmap.adoc modules/monitoring-creating-user-defined-workload-monitoring-configmap.adoc
-
See Preparing to configure the monitoring stack for steps to create monitoring config maps.
-
See the Kubernetes documentation for details on the nodeSelector constraint.
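For illustration, a node selector for a monitoring component might be set as in the following sketch. The component key `prometheusK8s` and the label `monitoring: "true"` are assumptions chosen for the example. Substitute the component you want to move and a label that actually exists on your target nodes.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      # Schedule Prometheus pods only on nodes carrying this label.
      # The label below is hypothetical.
      nodeSelector:
        monitoring: "true"
```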
-
See Preparing to configure the monitoring stack for steps to create monitoring config maps.
-
See the {product-title} documentation on taints and tolerations.
-
See the Kubernetes documentation on taints and tolerations.
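A toleration for a monitoring component could look like the following sketch. It assumes a node was tainted with `key1=value1:NoSchedule`. The key, value, and the choice of `alertmanagerMain` are illustrative only.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      # Allow Alertmanager pods onto nodes tainted with
      # key1=value1:NoSchedule (a hypothetical taint).
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"
```

The toleration fields follow the standard Kubernetes pod toleration schema.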
Running cluster monitoring with persistent storage means that your metrics are stored on a persistent volume (PV) and can survive a pod being restarted or recreated. This is ideal if you need your metrics or alerting data to be protected from data loss. For production environments, configuring persistent storage is highly recommended. Because of the high I/O demands, local storage is advantageous.
Important
If you are running cluster monitoring with an attached PVC for Prometheus, you might experience OOM kills during a cluster upgrade. When persistent storage is in use for Prometheus, Prometheus memory usage doubles during the cluster upgrade and for several hours after the upgrade is complete. To avoid OOM kills, provide worker nodes with double the memory that was available before the upgrade. For example, if you are running monitoring on the minimum recommended nodes, which have 2 cores and 8 GB of RAM, increase memory to 16 GB. For more information, see BZ#1925061.
-
Dedicate sufficient local persistent storage to ensure that the disk does not become full. How much storage you need depends on the number of pods. For information on system requirements for persistent storage, see Prometheus database storage requirements.
-
Make sure you have a persistent volume (PV) ready to be claimed by the persistent volume claim (PVC), one PV for each replica. Because Prometheus has two replicas and Alertmanager has three replicas, you need five PVs to support the entire monitoring stack. The PVs should be available from the Local Storage Operator. This does not apply if you enable dynamically provisioned storage.
-
Use the block type of storage.
-
Configure local persistent storage.
Note
If you use a local volume for persistent storage, do not use a raw block volume, which is described with volumeMode: block in the LocalVolume object. Prometheus cannot use raw block volumes.
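As a hedged sketch, a persistent volume claim and a retention time for Prometheus might be configured as follows. The storage class name `local-storage`, the `40Gi` request, and the `24h` retention are placeholder values, not recommendations. Size them according to the storage requirements for your cluster.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      # How long metrics are kept (placeholder value).
      retention: 24h
      # PVC template; one claim is created for each replica.
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage   # hypothetical storage class
          resources:
            requests:
              storage: 40Gi                 # placeholder size
```

A similar `volumeClaimTemplate` under `alertmanagerMain` would cover the Alertmanager replicas.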
modules/monitoring-configuring-a-local-persistent-volume-claim.adoc modules/monitoring-modifying-retention-time-for-prometheus-metrics-data.adoc
-
See Preparing to configure the monitoring stack for steps to create monitoring config maps.
-
See Setting up remote write compatible endpoints for steps to create a remote write compatible endpoint (such as Thanos).
-
See Tuning remote write settings for information about how to optimize remote write settings for different use cases.
-
For information about additional optional fields, see the API documentation.
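A remote write configuration might look like the following minimal sketch. The endpoint URL is a hypothetical placeholder, and authentication, TLS, and queue tuning options are deliberately left out.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      # Hypothetical remote write compatible endpoint.
      - url: "https://remote-write.example.com/api/v1/write"
```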
modules/monitoring-limiting-scrape-samples-in-user-defined-projects.adoc modules/monitoring-setting-a-scrape-sample-limit-for-user-defined-projects.adoc modules/monitoring-creating-scrape-sample-alerts.adoc
-
See Determining why Prometheus is consuming a lot of disk space for steps to query which metrics have the highest number of scrape samples.
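For user-defined projects, a scrape sample limit might be set as in the following sketch. The limit of `50000` is only an example value. Check that the `enforcedSampleLimit` field is supported in your release before relying on it.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      # Maximum number of samples accepted per target scrape
      # in user-defined projects (example value).
      enforcedSampleLimit: 50000
```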
-
See Preparing to configure the monitoring stack for steps to create monitoring config maps.
-
Learn about remote health reporting and, if necessary, opt out of it.