
Commit e1402c7

Automatically increase scrape interval for high_cardinality metrics if there are too many of them reported.

The goal is to keep the stats part of the cbcollect dump at a sane size. The prometheus_cfg process wakes up every 10 minutes and performs the following steps:

1) First, it gets the latest scrape information for each target from Prometheus. At this point we only need to know how many samples each service reports in each scrape; Prometheus keeps this information in the scrape_samples_scraped metric.

2) All samples are divided into two groups: those whose scrape interval is static and those whose scrape interval can be changed. The first group consists of all low-cardinality metrics plus the high-cardinality metrics for which the scrape interval is set explicitly. All other samples fall into the second group (the high-cardinality metrics whose scrape interval is not set explicitly).

3) It then calculates how many samples per second can be written while still satisfying the cbcollect dump size requirement, and subtracts the rate of the "static" samples (the first group from #2). The result is the maximum sample rate available to the metrics in the second group.

4) Now that the maximum sample rate and the number of samples per scrape are known, the scrape interval for each service is easy to calculate (see the sketch below). For example, if the maximum sample rate is 100 samples per second and a scrape returns 2000 samples, the scrape interval is 2000 samples / 100 samples per second = 20 s.

Change-Id: I383dacfaf88a0ba392c97a72bd809f9428469535
Reviewed-on: http://review.couchbase.org/c/ns_server/+/136831
Tested-by: Timofey Barmin <[email protected]>
Reviewed-by: Steve Watanabe <[email protected]>
Reviewed-by: Sam Cramer <[email protected]>
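
Below is a minimal Erlang sketch of the interval calculation from step 4. The module and function names (scrape_interval_sketch, interval/2) are hypothetical and are not part of the actual prometheus_cfg implementation; the sketch only illustrates the samples-per-scrape divided by samples-per-second arithmetic, and it assumes the result is rounded up to a whole second so the budget is never exceeded.

```erlang
%% Hypothetical sketch, not the actual prometheus_cfg code: compute a
%% scrape interval from a per-target sample-rate budget.
-module(scrape_interval_sketch).
-export([interval/2, example/0]).

%% Samples reported per scrape divided by the allowed samples per second
%% gives the scrape interval (in seconds) that keeps the target within
%% the budget. Round up so the actual rate never exceeds MaxRate.
interval(SamplesPerScrape, MaxRate) when MaxRate > 0 ->
    ceil(SamplesPerScrape / MaxRate).

%% Example from the commit message: 2000 samples per scrape with a
%% budget of 100 samples per second yields a 20 second scrape interval.
example() ->
    20 = interval(2000, 100),
    ok.
```

Calling scrape_interval_sketch:interval(2000, 100) returns 20, matching the worked example above; how the remaining budget is split across multiple dynamic targets is not specified here and would be a design choice of the real implementation.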
1 parent e5f4849 commit e1402c7

File tree

1 file changed (+453, -17 lines)


0 commit comments
