Allow user workload monitoring configuration ConfigMap to be created in UWM ns #804
Conversation
lilic commented Jun 9, 2020 (edited)
- I added CHANGELOG entry for this change.
Force-pushed dbef134 to f8f027e
/retest
strange failure...
/retest
Force-pushed f8f027e to 4e233a2
cc @openshift/openshift-team-monitoring this is ready for a first pass, please take a look. I still need to update the CHANGELOG and add an example once it looks good to you.
cmd/operator/main.go
Outdated
@@ -89,6 +89,7 @@ func Main() int {
	namespaceUserWorkload := flagset.String("namespace-user-workload", "openshift-user-workload-monitoring", "Namespace to deploy and manage user workload monitoring stack in.")
	namespaceSelector := flagset.String("namespace-selector", "openshift.io/cluster-monitoring=true", "Selector for namespaces to monitor.")
	configMapName := flagset.String("configmap", "cluster-monitoring-config", "ConfigMap name to configure the cluster monitoring stack.")
	userWorkloadConfigMapName := flagset.String("userWorkloadConfigmap", "user-workload-monitoring-config", "ConfigMap name to configure the user workload monitoring stack.")
why does this need to be configurable?
Just followed the existing pattern (see the line above); honestly, happy to leave it out as well!
My intuition is we should limit exposure here since we want to move to a CRD-based setup 🤔
This CLI flag increases the public API surface.
good point, removed it!
looking great so far! 🎉
Force-pushed 08b56d1 to cbd1e28
/lgtm
Force-pushed cbd1e28 to b1297c0
Force-pushed b1297c0 to 7b4146d
@s-urbaniak PTAL changes since last time:
Thanks!
/hold cancel
/retest
userCM, err := o.client.GetConfigmap(o.namespaceUserWorkload, o.userWorkloadConfigMapName)
if err != nil {
	if apierrors.IsNotFound(err) {
		klog.Warning("No User Workload Monitoring ConfigMap was found. Using defaults.")
Info rather than Warning?
We use the warning level for the cluster monitoring ConfigMap, so I just followed that pattern. It makes sense to me in a way, as you are warning users that they did not configure their stack at all. But I don't have too strong an opinion, so I can change it; in that case, though, we should change both?
I'd rather have the message at the info level because, for me, a warning is something I need to at least investigate, and not customizing the monitoring isn't suspicious. At info level the message would still be logged, so it would be available for support requests.
And +1000 that if we change it here, it should be consistent across the board.
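The fallback-to-defaults pattern under review can be sketched in plain Go. This is a minimal, self-contained sketch: a stdlib sentinel error stands in for the Kubernetes API "not found" condition that apierrors.IsNotFound detects in the real operator, and getConfigMap, defaultConfig, and loadUserWorkloadConfig are hypothetical helpers, not the operator's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errNotFound stands in for the Kubernetes API "not found" condition
// that apierrors.IsNotFound would detect in the real operator.
var errNotFound = errors.New("configmap not found")

type config struct{ LogLevel string }

// getConfigMap is a hypothetical stand-in for o.client.GetConfigmap.
func getConfigMap(namespace, name string) (*config, error) {
	return nil, errNotFound // simulate a missing ConfigMap
}

func defaultConfig() *config { return &config{LogLevel: "info"} }

// loadUserWorkloadConfig falls back to defaults when the ConfigMap is
// absent, but surfaces any other error to the caller.
func loadUserWorkloadConfig() (*config, error) {
	cm, err := getConfigMap("openshift-user-workload-monitoring", "user-workload-monitoring-config")
	if err != nil {
		if errors.Is(err, errNotFound) {
			log.Println("No User Workload Monitoring ConfigMap was found. Using defaults.")
			return defaultConfig(), nil
		}
		return nil, err
	}
	return cm, nil
}

func main() {
	cfg, err := loadUserWorkloadConfig()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(cfg.LogLevel)
}
```

The key design point from the thread applies regardless of naming: only the "not found" case is treated as "use defaults"; any other error still propagates, and the log level of the fallback message (info vs. warning) is a policy choice that should be consistent with the cluster monitoring ConfigMap path.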
	return uwc, nil
}
klog.Warning("No User Workload Monitoring ConfigMap was found. Using defaults.")
ditto?
	PrometheusK8sConfig            *PrometheusK8sConfig            `json:"prometheusK8s"`
	PrometheusUserWorkloadConfig   *PrometheusK8sConfig            `json:"prometheusUserWorkload"`
	ClusterMonitoringConfiguration *ClusterMonitoringConfiguration `json:"-"`
	UserWorkloadConfiguration      *UserWorkloadConfiguration      `json:"-"`
Having both UserWorkloadConfiguration and UserWorkloadConfig is confusing.
UserWorkloadConfig?
We have a UserWorkloadConfig struct:
cluster-monitoring-operator/pkg/manifests/config.go
Lines 157 to 159 in 882dc1b
type UserWorkloadConfig struct {
	Enabled *bool `json:"enabled"`
}
And another UserWorkloadConfiguration struct:
cluster-monitoring-operator/pkg/manifests/config.go
Lines 381 to 385 in 882dc1b
type UserWorkloadConfiguration struct {
	PrometheusOperator *PrometheusOperatorConfig `json:"prometheusOperator"`
	Prometheus         *PrometheusK8sConfig      `json:"prometheus"`
	ThanosRuler        *ThanosRulerConfig        `json:"thanosRuler"`
}
The names are so close that it gets confusing for me when reviewing the code.
I see, makes sense! Any suggestions for renaming either the new or the old one?
	ThanosQuerierConfig *ThanosQuerierConfig `json:"thanosQuerier"`
	UserWorkloadEnabled *bool                `json:"enableUserWorkload"`
	// TODO: Remove in 4.7 release.
	PrometheusUserWorkloadConfig *PrometheusK8sConfig `json:"prometheusUserWorkload"`
Instead of duplicating PrometheusUserWorkloadConfig, PrometheusOperatorUserWorkloadConfig and ThanosRulerConfig here and in UserWorkloadConfiguration, maybe it would be simpler to use only UserWorkloadConfiguration? IOW handle the "legacy" fields only when loading the config map, but don't expose them here.
The point was to be able to easily remove the current logic after 4.7 (hence the TODO comments), as we will stop supporting this in 4.7, and to separate out the two clear tenants we have right now: cluster and user workload.
If you have a better suggestion, can you clarify? Thanks! :)
IOW? :)
An alternative approach would be to remove the user workload monitoring fields from the ClusterMonitoringConfiguration struct and only use Config.UserWorkloadConfiguration. NewConfig() would try to load the uwm fields until we decide that it shouldn't. Something like this: simonpasquier@44bd3dd
The side effect is that uwm configuration will be taken either from the "legacy" fields or from the new configmap in openshift-user-workload-monitoring, but it won't be a merge of both. In the current state, PrometheusOperatorConfig can be defined in the openshift-monitoring configmap and PrometheusConfig in the openshift-user-workload-monitoring configmap; that can be confusing or desired, depending on how you see things :)
(IOW: in other words)
hmm... especially with configuration I much prefer duplicated structs. This is a lesson learned the hard way in Kubernetes. Just because things happen to be identical at this point in time doesn't mean they won't develop in different directions in the future, and they almost always do unless the concept is literally the same (e.g. TLS configuration in Prometheus). In this case we even know it will be deleted in the future, so I actually prefer @lilic's currently proposed way.
// TODO: remove after 4.7
if f.config.ClusterMonitoringConfiguration.PrometheusUserWorkloadConfig.LogLevel != "" {
When you have config fields defined both in f.config.ClusterMonitoringConfiguration and f.config.UserWorkloadConfiguration, the former takes precedence. Couldn't this cause confusion?
techPreview is what we primarily have to support right now, hence why it takes precedence here, but that is a good point to document.
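The precedence rule being discussed can be sketched in a few lines of Go. This is an illustrative sketch only: the type and field names (prometheusConfig, effectiveLogLevel) are hypothetical, not the operator's actual API; the point is that a field set in the legacy cluster config wins over the same field in the new UWM configmap, with a default as the final fallback.

```go
package main

import "fmt"

// prometheusConfig is a hypothetical stand-in for the operator's
// Prometheus configuration struct.
type prometheusConfig struct{ LogLevel string }

// effectiveLogLevel prefers the legacy (cluster monitoring) value when it
// is set, then the user workload monitoring value, then a default.
func effectiveLogLevel(legacy, uwm *prometheusConfig) string {
	if legacy != nil && legacy.LogLevel != "" {
		return legacy.LogLevel // legacy cluster config takes precedence
	}
	if uwm != nil && uwm.LogLevel != "" {
		return uwm.LogLevel
	}
	return "info" // default when neither configmap sets the field
}

func main() {
	legacy := &prometheusConfig{LogLevel: "debug"}
	uwm := &prometheusConfig{LogLevel: "warn"}
	fmt.Println(effectiveLogLevel(legacy, uwm)) // legacy wins
	fmt.Println(effectiveLogLevel(nil, uwm))    // falls through to uwm
}
```

As the thread notes, whichever precedence order is chosen, it should be documented, since a user setting the same field in both configmaps could otherwise be surprised by which value applies.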
openshift-user-workload-monitoring namespace
Force-pushed 7b4146d to 882dc1b
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: brancz, lilic, s-urbaniak. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest Please review the full test history for this PR and help us cut down flakes.
2 similar comments
/retest Please review the full test history for this PR and help us cut down flakes.
/retest Please review the full test history for this PR and help us cut down flakes.