Skip to content

WIP: OCPBUGS-18282: Reject reserved labels used as external labels #2579

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

slashpai
Copy link
Member

API documentation for externalLabels is like this

// The labels to add to any time series or alerts when communicating with
// external systems (federation, remote storage, Alertmanager).
// Labels defined by `spec.replicaExternalLabelName` and
// `spec.prometheusExternalLabelName` take precedence over this list.

We also advise not to use reserved labels prometheus or prometheus_replica as externalLabels, because they are reserved and will be overwritten. It was not overwritten due to bug in PO which is fixed in upstream.

This change is for CMO to reject it beforehand and report to user if they use reserved labels as externalLabels

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

prometheus and prometheus_replica are reserved labels,
if a user configures this as a label in externalLabels
in cluster-monitoring-configmap warn user about this
and make operator status Degraded

Signed-off-by: Jayapriya Pai <[email protected]>
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 13, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Mar 13, 2025
@openshift-ci-robot
Copy link
Contributor

@slashpai: This pull request references Jira Issue OCPBUGS-18282, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @juzhao

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

API documentation for externalLabels is like this

// The labels to add to any time series or alerts when communicating with
// external systems (federation, remote storage, Alertmanager).
// Labels defined by `spec.replicaExternalLabelName` and
// `spec.prometheusExternalLabelName` take precedence over this list.

We also advise not to use reserved labels prometheus or prometheus_replica as externalLabels, because they are reserved and will be overwritten. It was not overwritten due to bug in PO which is fixed in upstream.

This change is for CMO to reject it beforehand and report to user if they use reserved labels as externalLabels

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from juzhao, jan--f and marioferh March 13, 2025 05:36
Copy link
Contributor

openshift-ci bot commented Mar 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: slashpai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 13, 2025
Copy link
Contributor

openshift-ci bot commented Mar 13, 2025

@slashpai: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/versions cf4366d link false /test versions
ci/prow/e2e-aws-ovn-single-node cf4366d link false /test e2e-aws-ovn-single-node

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@@ -34,6 +34,8 @@ const (
StorageNotConfiguredReason = "PrometheusDataPersistenceNotConfigured"
UserAlermanagerConfigMisconfiguredMessage = "Misconfigured Alertmanager: Alertmanager for user-defined alerting is enabled in the openshift-monitoring/cluster-monitoring-config configmap by setting 'enableUserAlertmanagerConfig: true' field. This conflicts with a dedicated Alertmanager instance enabled in openshift-user-workload-monitoring/user-workload-monitoring-config. Alertmanager enabled in openshift-user-workload-monitoring takes precedence over the one in openshift-monitoring, so please remove the 'enableUserAlertmanagerConfig' field in openshift-monitoring/cluster-monitoring-config."
UserAlermanagerConfigMisconfiguredReason = "UserAlertmanagerMisconfigured"
CannotUseReservedExternalLabelsMessage = "Reserved Labels cannot be as External Labels. `externalLabels` specified under `prometheusK8s` field in the `openshift-monitoring/cluster-monitoring-config` uses reserved labels `prometheus` or `prometheus_replica` which is not allowed, please remove these from externalLabels"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juzhao
Copy link
Contributor

juzhao commented Mar 26, 2025

from warning message in https://docs.redhat.com/en/documentation/openshift_container_platform/4.17/html/monitoring/configuring-core-platform-monitoring#attaching-additional-labels-to-your-time-series-and-alerts_configuring-alerts-and-notifications
Do not use cluster or managed_cluster as key names. Using them can cause issues where you are unable to see data in the developer dashboards.
do we need to consider it?

@jan--f
Copy link
Contributor

jan--f commented Mar 26, 2025

Yeah good catch, I think cluster and managed_cluster should be added here.

@machine424
Copy link
Contributor

Thanks for reviving this.
Given the latest enhancements in config validation, how about sth like

diff --git a/pkg/manifests/config.go b/pkg/manifests/config.go
index be013ce12..247cd2340 100644
--- a/pkg/manifests/config.go
+++ b/pkg/manifests/config.go
@@ -52,6 +52,7 @@ const (
 )
 
 var errAlertmanagerV1NotSupported = errors.New("alertmanager's apiVersion=v1 is no longer supported, v2 has been available since Alertmanager 0.16.0")
+var reservedPrometheusExternalLabels = []string{"prometheus", "promtheus_replica"}
 
 type Config struct {
        Images                               *Images `json:"-"`
@@ -713,3 +714,17 @@ func NewDefaultUserWorkloadMonitoringConfig() *UserWorkloadConfiguration {
        u.applyDefaults()
        return u
 }
+
+func (lb *ExternalLabels) UnmarshalJSON(data []byte) error {
+       var v map[string]string
+       if err := json.Unmarshal(data, &v); err != nil {
+               return err
+       }
+       for _, r := range reservedPrometheusExternalLabels {
+               if _, ok := v[r]; ok {
+                       return fmt.Errorf("XXX")
+               }
+       }
+       *lb = v
+       return nil
+}
diff --git a/pkg/manifests/types.go b/pkg/manifests/types.go
index ad06174ba..c60162b40 100644
--- a/pkg/manifests/types.go
+++ b/pkg/manifests/types.go
@@ -22,6 +22,8 @@ import (
 type CollectionProfile string
 type CollectionProfiles []CollectionProfile
 
+type ExternalLabels map[string]string
+
 const (
        FullCollectionProfile    = "full"
        MinimalCollectionProfile = "minimal"
@@ -191,7 +193,7 @@ type PrometheusK8sConfig struct {
        // Defines labels to be added to any time series or alerts when
        // communicating with external systems such as federation, remote storage,
        // and Alertmanager. By default, no labels are added.
-       ExternalLabels map[string]string `json:"externalLabels,omitempty"`
+       ExternalLabels ExternalLabels `json:"externalLabels,omitempty"`
        // Defines the log level setting for Prometheus.
        // The possible values are: `error`, `warn`, `info`, and `debug`.
        // The default value is `info`.
@@ -655,7 +657,7 @@ type PrometheusRestrictedConfig struct {
        // communicating with external systems such as federation, remote storage,
        // and Alertmanager.
        // By default, no labels are added.
-       ExternalLabels map[string]string `json:"externalLabels,omitempty"`
+       ExternalLabels ExternalLabels `json:"externalLabels,omitempty"`
        // Defines the log level setting for Prometheus.
        // The possible values are `error`, `warn`, `info`, and `debug`.
        // The default setting is `info`.

(implementing UnmarshalJSON for that field), in >=4.18 we'll have that caught in early validation (via the webhook)
also it doesn't require any extra/specific errors/changes.

machine424 added a commit to machine424/cluster-monitoring-operator that referenced this pull request May 21, 2025
…rnalLabels

supersedes openshift#2579

As this is done at the parsing level, the monitoringconfigmaps.openshift.io
webhook would catch this early.
@machine424
Copy link
Contributor

we agreed on implementing #2579 (comment) here #2604

/close

@openshift-ci openshift-ci bot closed this May 21, 2025
@openshift-ci-robot
Copy link
Contributor

@slashpai: This pull request references Jira Issue OCPBUGS-18282. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

API documentation for externalLabels is like this

// The labels to add to any time series or alerts when communicating with
// external systems (federation, remote storage, Alertmanager).
// Labels defined by `spec.replicaExternalLabelName` and
// `spec.prometheusExternalLabelName` take precedence over this list.

We also advise not to use reserved labels prometheus or prometheus_replica as externalLabels, because they are reserved and will be overwritten. It was not overwritten due to bug in PO which is fixed in upstream.

This change is for CMO to reject it beforehand and report to user if they use reserved labels as externalLabels

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

openshift-ci bot commented May 21, 2025

@machine424: Closed this PR.

In response to this:

we agreed on implementing #2579 (comment) here #2604

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

machine424 added a commit to machine424/cluster-monitoring-operator that referenced this pull request May 21, 2025
…rnalLabels

supersedes openshift#2579

As this is done at the parsing level, the monitoringconfigmaps.openshift.io
webhook would catch this early.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants