{product-title} exposes metrics that can be collected and stored in back-ends by Heapster. As an {product-title} administrator, you can view container and component metrics in one user interface. These metrics are also used by horizontal pod autoscalers to determine when and how to scale.
This topic provides information for scaling the metrics components.
- Run metrics pods on dedicated {product-title} infrastructure nodes.
- Use persistent storage when configuring metrics. Set USE_PERSISTENT_STORAGE=true.
- Keep the METRICS_RESOLUTION=30 parameter in {product-title} metrics deployments. Using a value lower than the default value of 30 for METRICS_RESOLUTION is not recommended. When using the Ansible metrics installation procedure, this is the openshift_metrics_resolution parameter (see the sample inventory settings after this list).
- Closely monitor {product-title} nodes with host metrics pods to detect early capacity shortages (CPU and memory) on the host system. These capacity shortages can cause problems for metrics pods.
- In {product-title} version 3.7 testing, test cases up to 25,000 pods were monitored in a {product-title} cluster.
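The following is a minimal inventory sketch of how these recommendations map to the Ansible installer. The openshift_metrics_resolution parameter is the one named above; the other variable names (openshift_metrics_install_metrics, openshift_metrics_cassandra_storage_type, openshift_metrics_duration) and the 30s value format are assumptions about the installer version in use, so verify them against your installer documentation.

[OSEv3:vars]
openshift_metrics_install_metrics=true
openshift_metrics_cassandra_storage_type=pv
openshift_metrics_resolution=30s
openshift_metrics_duration=7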
In tests performed with 210 and 990 {product-title} nodes, where 10,500 pods and 11,000 pods were monitored respectively, the Cassandra database grew at the speed shown in the table below:
Number of Nodes | Number of Pods | Cassandra storage growth speed | Cassandra storage growth per day | Cassandra storage growth per week
---|---|---|---|---
210 | 10,500 | 500 MB per hour | 15 GB | 75 GB
990 | 11,000 | 1 GB per hour | 30 GB | 210 GB
In the above calculation, approximately 20 percent of the expected size was added as overhead to ensure that the storage requirements do not exceed the calculated value.
If the METRICS_DURATION and METRICS_RESOLUTION values are kept at the default (7 days and 30 seconds respectively), it is safe to plan Cassandra storage size requirements for a week, as in the values above.
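As an illustration only, the weekly figures above can be used to size the Cassandra persistent volume claim in the same inventory. The openshift_metrics_cassandra_pvc_size variable is assumed to be the relevant installer setting, and 100Gi is simply an example that covers the 75 GB per week row with additional headroom.

openshift_metrics_cassandra_pvc_size=100Gi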
Warning
Because {product-title} metrics uses the Cassandra database as a datastore for metrics data, if USE_PERSISTENT_STORAGE=true is set during the metrics setup process, the persistent volume is created on network storage, with NFS as the default. However, using network storage in combination with Cassandra is not recommended.
One set of metrics pods (Cassandra/Hawkular/Heapster) is able to monitor at least 25,000 pods.
Caution
Pay attention to system load on nodes where {product-title} metrics pods run. Use that information to determine if it is necessary to scale out the number of {product-title} metrics pods and spread the load across multiple {product-title} nodes. Scaling the {product-title} metrics Heapster pods is not recommended.
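For example, one way to observe that load, assuming cluster metrics are already collecting data and the metrics pods run in the openshift-infra project used elsewhere in this topic:

$ oc adm top node
$ oc adm top pod -n openshift-infra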
If persistent storage was used to deploy {product-title} metrics, then you must create a persistent volume (PV) for the new Cassandra pod to use before you can scale out the number of {product-title} metrics Cassandra pods. However, if Cassandra was deployed with dynamically provisioned PVs, then this step is not necessary.
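For example, you can list the cluster's persistent volumes before scaling out and confirm that at least one suitable volume is in the Available state for the new Cassandra pod's claim to bind:

$ oc get pv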
Cassandra nodes use persistent storage. Therefore, scaling up or down is not possible with replication controllers. Scaling a Cassandra cluster requires modifying the openshift_metrics_cassandra_replicas variable and re-running the deployment. By default, the Cassandra cluster is a single-node cluster.
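A minimal sketch of that change, assuming your inventory file is /etc/ansible/hosts; the metrics playbook path is left as a placeholder because it differs between installer versions:

openshift_metrics_cassandra_replicas=2

Then re-run the metrics deployment:

# ansible-playbook -i /etc/ansible/hosts <openshift-metrics-playbook>.yml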
To scale up the number of {product-title} metrics hawkular pods to two replicas, run:
# oc scale -n openshift-infra --replicas=2 rc hawkular-metrics
Alternatively, update your inventory file and re-run the deployment.
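The corresponding inventory setting is assumed to be openshift_metrics_hawkular_replicas; verify the exact variable name for your installer version:

openshift_metrics_hawkular_replicas=2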
Note
If you add a new node to or remove an existing node from a Cassandra cluster, the data stored in the cluster rebalances across the cluster.
To scale down:

- If remotely accessing the container, run the following for the Cassandra node you want to remove:

  $ oc exec -it <hawkular-cassandra-pod> nodetool decommission

  If locally accessing the container, run the following instead:

  $ oc rsh <hawkular-cassandra-pod> nodetool decommission

  This command can take a while to run since it copies data across the cluster. You can monitor the decommission progress with nodetool netstats -H (see the example after this procedure).

- Once the previous command succeeds, scale down the rc for the Cassandra instance to 0:

  # oc scale -n openshift-infra --replicas=0 rc <hawkular-cassandra-rc>

  This will remove the Cassandra pod.
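For example, to watch the decommission from outside the pod, the same oc exec pattern shown above can be reused; the pod name is a placeholder and the -H option prints human-readable sizes:

$ oc exec -it <hawkular-cassandra-pod> nodetool netstats -H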
Important
If the scale down process completed and the existing Cassandra nodes are functioning as expected, you can also delete the rc for this Cassandra instance and its corresponding persistent volume claim (PVC). Deleting the PVC permanently removes any data associated with this Cassandra instance, so if the scale down did not fully and successfully complete, you cannot recover the lost data.
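If you decide to perform that cleanup, a hedged sketch of the commands follows; the rc and PVC names are placeholders, and deleting the PVC permanently removes the data it holds:

# oc delete rc <hawkular-cassandra-rc> -n openshift-infra
# oc delete pvc <hawkular-cassandra-pvc> -n openshift-infra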