
Add bodysize limit for metric scraping. #1467

Merged
merged 1 commit into openshift:master on May 18, 2022

Conversation

raptorsun
Contributor

@raptorsun raptorsun commented Nov 5, 2021

  • I added CHANGELOG entry for this change.
  • No user-facing changes, so no entry in CHANGELOG was needed.

This PR adds an option to the CMO config map to activate a body size limit on metrics scraping, which can prevent potential OOM problems when scraping metric endpoints that respond with an oversized HTTP body.

The dependency upgrade PR is PR #1468. This functionality requires Prometheus-Operator 0.51+, so I upgraded it to 0.52.

Here is the JIRA ticket.

Field prometheusK8s.enforcedBodySizeLimit is added to the CMO ConfigMap, accepting the size format from Prometheus:

  • An empty value and 0 mean no limit
  • The string "automatic" sets the limit automatically according to the cluster pod capacity.
  • [0-9]+[A-Z][a-z]*B for a customized size.
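As an illustration, here is a sketch of how the field would be set in the cluster monitoring ConfigMap (the ConfigMap name and surrounding layout follow the usual CMO conventions; the "64MB" value is just an example):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      # "" or 0: no limit; "automatic": derived from cluster pod capacity;
      # or a fixed Prometheus-style size such as "64MB".
      enforcedBodySizeLimit: "64MB"
```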

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 5, 2021
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2021
@raptorsun
Contributor Author

/retest-required

@raptorsun
Contributor Author

/test e2e-agnostic-upgrade

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 22, 2021
@raptorsun raptorsun changed the title WIP [MON-1838] Add bodysize limit for metric scraping. [MON-1838] Add bodysize limit for metric scraping. Nov 23, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 23, 2021
@raptorsun raptorsun changed the title [MON-1838] Add bodysize limit for metric scraping. Add bodysize limit for metric scraping. Nov 23, 2021
@raptorsun
Contributor Author

Tested with cluster-bot: the limit is correctly set to a 2MB body size when adding the option limitScrapeBodySize: true to the CMO config map.

$k exec -it prometheus-k8s-0 -n openshift-monitoring -- cat /etc/prometheus/config_out/prometheus.env.yaml  | grep body_size
  body_size_limit: 2MB
  body_size_limit: 2MB
  body_size_limit: 2MB

With the default 2MB limit, all targets are scraped correctly.

When setting the body size limit to 500KB, some of the targets report an error for exceeding the limit, as shown in the screenshot below (CMO running locally).
(screenshot: scrape targets failing with the body size limit exceeded)

@jan--f
Contributor

jan--f commented Nov 24, 2021

Nice, this looks good to me. Is the scenario of hitting a scrape limit already covered by an existing alert?

I think it should definitely be alerted upon, as otherwise users might not get relevant metrics without being aware of it.

@raptorsun
Contributor Author

Nice, this looks good to me. Is the scenario of hitting a scrape limit already covered by an existing alert?

I think it should definitely be alerted upon, as otherwise users might not get relevant metrics without being aware of it.

Thanks for pointing out the alert. I almost forgot it 😀
There is a metric prometheus_target_scrapes_exceeded_body_size_limit_total in Prometheus that can be used to trigger an alert when a scrape body exceeds the size limit. I have completed the PR with a commit adding this alert.
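For reference, a minimal sketch of such an alerting rule on that metric (the alert name, duration, and severity here are assumptions, not necessarily the exact rule added in the commit):

```yaml
- alert: PrometheusScrapeBodySizeLimitHit
  expr: increase(prometheus_target_scrapes_exceeded_body_size_limit_total[5m]) > 0
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: Prometheus has failed scrapes because some targets exceeded the configured body_size_limit.
```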

@@ -326,6 +326,26 @@ local patchedRules = [
},
];

local addedRules = [
Contributor

Can this be defined in upstream prometheus mixin?

Contributor Author

Yes, I will submit a change to the upstream Prometheus mixins later. Once it is accepted upstream, we can remove it from CMO.

Contributor Author

PR created in Prometheus to add this alert to mixins: prometheus/prometheus#9873
Hope it can be merged someday :)

Contributor

Can you add a TODO so we don't forget?

Contributor Author

Of course; when the PR is merged, we can safely remove this alert patch.

Contributor

Let's remove the patched alert and rely on upstream.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 25, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 25, 2021
@raptorsun
Contributor Author

A new version is pushed :)
limitScrapeBodySize is now a string accepting the following 3 values:

  1. a size in Prometheus config format, such as "10MB"
  2. the string "default" to use the default value we provide, "2MB"
  3. empty: no limit will be set

@@ -315,19 +325,18 @@ func (cfg *TelemeterClientConfig) IsEnabled() bool {
return true
}

func NewConfig(content io.Reader) (*Config, error) {
func NewConfig(content io.Reader) (res *Config, err error) {
Contributor

I'm not a big fan of named return values, as I find that they make the code less readable. Maybe it's only me, but I also don't feel that this change is required here.

}

func (c *Config) LoadEnforcedBodySizeLimit(pcr PodCapacityReader, ctx context.Context) error {
if c.ClusterMonitoringConfiguration.PrometheusK8sConfig.EnforcedBodySizeLimit == "automatic" {
Contributor

Maybe define a const for "automatic".

TelemetryMatches []string `json:"-"`
AlertmanagerConfigs []AdditionalAlertmanagerConfig `json:"additionalAlertmanagerConfigs"`
QueryLogFile string `json:"queryLogFile"`
EnforcedBodySizeLimit string `json:"enforcedBodySizeLimit,omitempty"`
Contributor

Can you add a comment about how to use the parameter?

  • an empty value means no enforcement
  • "automatic" means that CMO picks up a value based on the cluster capacity
  • A fixed size can be defined too.

}

func (c *Config) LoadEnforcedBodySizeLimit(pcr PodCapacityReader, ctx context.Context) error {
if c.ClusterMonitoringConfiguration.PrometheusK8sConfig.EnforcedBodySizeLimit == "automatic" {
Contributor

you can handle the "return-early" case first.

Suggested change
if c.ClusterMonitoringConfiguration.PrometheusK8sConfig.EnforcedBodySizeLimit == "automatic" {
if c.ClusterMonitoringConfiguration.PrometheusK8sConfig.EnforcedBodySizeLimit == "" {
return nil
}
if c.ClusterMonitoringConfiguration.PrometheusK8sConfig.EnforcedBodySizeLimit == "automatic" {

return nil
}

func (c *Config) UseMinimalEnforcedBodySizeLimit() {
Contributor

This doesn't seem to be used, but it should be incorporated into calculateBodySizeLimit(), I believe.

func calculateBodySizeLimit(podCapacity int) string {
const samplesPerPod = 400 // 400 samples per pod
const sizePerSample = 200 // 200 Bytes
const loadFactorPercentage = 60 // assume 60% of the maximum pod capacity per node is used
Contributor

I would assume that the full capacity can be used.

bodySize := loadFactorPercentage * podCapacity / 100 * samplesPerPod * sizePerSample
if bodySize < minimalSizeLimit {
bodySize = minimalSizeLimit
klog.Infof("Calculated scrape body size limit is too small, using default value %v instead", minimalSizeLimit)
Contributor

we could log both values (e.g. "calculated body size limit = ... is too small, using ... instead")
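Putting the pieces from this thread together, here is a self-contained sketch of the whole calculation (constants from the diff above; the 16,500,000-byte minimal limit is the value that appears later in the PR; logging both values per the suggestion, with fmt standing in for klog):

```go
package main

import "fmt"

const (
	samplesPerPod    = 400      // samples exposed per pod
	sizePerSample    = 200      // bytes per sample
	minimalSizeLimit = 16500000 // bytes; safe lower bound
)

// calculateBodySizeLimit derives a scrape body size limit from the cluster's
// pod capacity, clamped to a safe minimum.
func calculateBodySizeLimit(podCapacity int) string {
	bodySize := podCapacity * samplesPerPod * sizePerSample
	if bodySize < minimalSizeLimit {
		fmt.Printf("calculated body size limit %d is too small, using %d instead\n", bodySize, minimalSizeLimit)
		bodySize = minimalSizeLimit
	}
	return fmt.Sprintf("%dB", bodySize)
}

func main() {
	fmt.Println(calculateBodySizeLimit(110))  // small cluster: clamped to the minimum
	fmt.Println(calculateBodySizeLimit(5000)) // 5000 * 400 * 200 = 400,000,000 bytes
}
```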

err = c.LoadEnforcedBodySizeLimit(o.client, ctx)
if err != nil {
c.ClusterMonitoringConfiguration.PrometheusK8sConfig.EnforcedBodySizeLimit = ""
klog.Warningf("Error loading enforced body size limit, no body size limit will be enforced. Error: %v", err)
Contributor

Suggested change
klog.Warningf("Error loading enforced body size limit, no body size limit will be enforced. Error: %v", err)
klog.Warningf("Error loading enforced body size limit, no body size limit will be enforced: %v", err)

f.MustCreateOrUpdateConfigMap(t, configMapWithData(t, data))

f.PrometheusK8sClient.WaitForQueryReturn(
t, 5*time.Minute, `ceil(sum(increase(prometheus_target_scrapes_exceeded_body_size_limit_total{job="prometheus-k8s"}[5m])))`,
Contributor

Suggested change
t, 5*time.Minute, `ceil(sum(increase(prometheus_target_scrapes_exceeded_body_size_limit_total{job="prometheus-k8s"}[5m])))`,
t, 5*time.Minute, `sum(increase(prometheus_target_scrapes_exceeded_body_size_limit_total{job="prometheus-k8s"}[5m]))`,

Contributor Author

We need to keep the ceil function to round the result to an integer, as WaitForQueryReturn requires.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2022
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2022
@raptorsun
Contributor Author

/test e2e-agnostic-upgrade

Comment on lines 521 to 523
const loadFactorPercentage = 100 // assume 100% of the maximum pods capacity per node is used

bodySize := loadFactorPercentage * podCapacity / 100 * samplesPerPod * sizePerSample
Contributor

I think we can get rid of loadFactorPercentage.

Suggested change
const loadFactorPercentage = 100 // assume 100% of the maximum pods capacity per node is used
bodySize := loadFactorPercentage * podCapacity / 100 * samplesPerPod * sizePerSample
bodySize := podCapacity * samplesPerPod * sizePerSample

// Limit the body size from scrape queries
// Assumptions: one node has on average 110 pods, each pod exposes 400 metrics, and each metric takes on average 250 bytes.
// 1.5x the size for a safe margin, which gives 16.5MB (16,500,000 bytes).
minimalSizeLimit = 1.5 * 110 * 400 * 250
Contributor

We probably need a "safe" lower-bound value for the automatically computed value, but I'm not sure that this value is accurate. I would take a typical CI cluster, load it with as many pods/secrets/configmaps as possible, and measure the body size returned by kube-state-metrics /metrics.

Contributor Author

Having tested with 100k secrets and 100k config maps using kube-burner, the KSM scrape target gives us a ~32MB body size.
So this lower bound is tripled to 48MB in the new commit. It should be large enough, I suppose :)


@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2022
bodysize when scraping metric.
Empty value or 0 means no bodysize limit. "automatic" for an automatically
deduced bodysize limit.
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2022
@raptorsun
Contributor Author

/test e2e-agnostic-operator

1 similar comment

Contributor

@jan--f jan--f left a comment

This lgtm

Contributor

@simonpasquier simonpasquier left a comment

/lgtm

@simonpasquier
Contributor

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 18, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 18, 2022
@openshift-ci
Contributor

openshift-ci bot commented May 18, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f, raptorsun, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jan--f,raptorsun,simonpasquier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

4 similar comments

@openshift-ci
Contributor

openshift-ci bot commented May 18, 2022

@raptorsun: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 36c38fa into openshift:master May 18, 2022
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • docs-approved: Signifies that Docs has signed off on this PR
  • lgtm: Indicates that a PR is ready to be merged.
  • px-approved: Signifies that Product Support has signed off on this PR
  • qe-approved: Signifies that QE has signed off on this PR
9 participants