
Add setting to decommission legacy monitoring cluster alerts #62668

Merged
jbaiera merged 9 commits into elastic:master on Oct 29, 2020

Conversation

jbaiera
Member

@jbaiera jbaiera commented Sep 18, 2020

Adds a setting that, when enabled, directs any currently running exporters in monitoring to treat any cluster alert definition as excluded from the list of allowed cluster alert watches. This is the first step in adding a migration path away from cluster alerts configured by the monitoring plugin and toward alerts managed by the stack monitoring solutions on the new alerting feature.

This will allow us to test if the exporters actually pull the watches back down when they aren't in use.
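
To make the behaviour concrete, here is a minimal sketch of the idea (the setting key matches the PR; the default value, class, and method names are assumptions for illustration, not the PR's actual code):

import java.util.Set;

import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;

// Illustrative sketch only; the surrounding class is hypothetical.
public class DecommissionAlertsSketch {

    // Dynamic, node-scoped flag; the false default is an assumption based on the diff context.
    static final Setting<Boolean> MIGRATION_DECOMMISSION_ALERTS =
        Setting.boolSetting("xpack.monitoring.migration.decommission_alerts", false,
            Setting.Property.Dynamic, Setting.Property.NodeScope);

    private volatile boolean decommissionAlerts;

    DecommissionAlertsSketch(Settings settings) {
        this.decommissionAlerts = MIGRATION_DECOMMISSION_ALERTS.get(settings);
    }

    // With the flag enabled, every cluster alert watch is treated as excluded, so the
    // exporter removes existing watches and never (re)installs them.
    boolean isWatchAllowed(String watchId, Set<String> configuredClusterAlerts) {
        return decommissionAlerts == false && configuredClusterAlerts.contains(watchId);
    }
}
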
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Monitoring)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Sep 18, 2020
@jbaiera jbaiera changed the title Monitoring decommission watch setting Add setting to decommission legacy monitoring cluster alerts Sep 18, 2020
@jbaiera
Member Author

jbaiera commented Sep 22, 2020

@elasticmachine update branch

Contributor

@jakelandis jakelandis left a comment

A couple comments

@@ -62,7 +63,9 @@ public Exporters(Settings settings, Map<String, Exporter.Factory> factories,

final List<Setting.AffixSetting<?>> dynamicSettings =
getSettings().stream().filter(Setting::isDynamic).collect(Collectors.toList());
clusterService.getClusterSettings().addSettingsUpdateConsumer(this::setExportersSetting, dynamicSettings);
final List<Setting<?>> updateSettings = new ArrayList<Setting<?>>(dynamicSettings);
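
For context, a hedged sketch of what this wiring seems to be doing (the helper shape and parameter names are assumptions, not the PR's exact code): the dynamic per-exporter settings plus the new global flag are registered under a single settings-update consumer, so toggling the flag re-runs exporter setup.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;

// Hypothetical helper mirroring the constructor wiring above.
final class ExporterSettingsWiringSketch {

    static void register(ClusterService clusterService,
                         List<Setting.AffixSetting<?>> dynamicSettings,
                         Setting<Boolean> decommissionAlerts,   // xpack.monitoring.migration.decommission_alerts
                         Consumer<Settings> setExportersSetting) {
        // All dynamic exporter settings plus the global decommission flag share one consumer.
        final List<Setting<?>> updateSettings = new ArrayList<>(dynamicSettings);
        updateSettings.add(decommissionAlerts);
        clusterService.getClusterSettings().addSettingsUpdateConsumer(setExportersSetting, updateSettings);
    }
}
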
Contributor

Should #getSettings() also return this new setting? (I think it is only really used for testing, but it seems more correct to return all the settings up through the plugin.)

Member Author

It seems I missed that step, good catch!

@@ -77,6 +77,9 @@
public static final Setting<Boolean> CLEAN_WATCHER_HISTORY = boolSetting("xpack.watcher.history.cleaner_service.enabled",
true, Setting.Property.Dynamic, Setting.Property.NodeScope, Setting.Property.Deprecated);

public static final Setting<Boolean> MIGRATION_DECOMMISSION_ALERTS = boolSetting("xpack.monitoring.migration.decommission_alerts",
Contributor

It feels kinda weird to have this AND xpack.monitoring.exporters.*.cluster_alerts.management.enabled setting.

I wonder if it makes sense to

  1. expose a POST _monitoring/cluster_alert/disable endpoint (or whatever the name is)
  2. when that is called iterate through all of the current exporters and set xpack.monitoring.exporters.<i>.cluster_alerts.management.enabled=false and set some cluster state to no longer try to install cluster alerts (and reach out to delete the cluster alerts for those exporters)
  3. If a new exporter is created, set xpack.monitoring.exporters.<new_exporter>.cluster_alerts.management.enabled=false based on the cluster state.
  4. expose a POST _monitoring/cluster_alert/enable

thoughts ?
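
As a rough illustration of step 2 in this proposal (hypothetical, not something implemented in this PR), such an endpoint could flip the per-exporter flag for every configured exporter via a persistent cluster settings update:

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.admin.cluster.settings.ClusterUpdateSettingsRequest;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

// Hypothetical sketch of the proposed _monitoring/cluster_alert/disable behaviour.
final class DisableClusterAlertsSketch {

    static void disableForAllExporters(Client client, Iterable<String> exporterNames) {
        final Settings.Builder persistent = Settings.builder();
        for (String name : exporterNames) {
            // Step 2: turn cluster alert management off for each existing exporter.
            persistent.put("xpack.monitoring.exporters." + name + ".cluster_alerts.management.enabled", false);
        }
        final ClusterUpdateSettingsRequest request =
            new ClusterUpdateSettingsRequest().persistentSettings(persistent.build());
        client.admin().cluster().updateSettings(request, ActionListener.wrap(
            response -> { /* cluster state could also record that alerts are decommissioned (step 3) */ },
            e -> { /* handle failure */ }));
    }
}

The enable endpoint (step 4) would be the mirror image, setting those flags back to true or clearing them.
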

Contributor

I'd be in favour of fewer settings, but OTOH the current flow with this setting seems simpler and less error-prone. And currently you can't enable a single exporter back on (with xpack.monitoring.exporters.<i>.cluster_alerts.management.enabled=true), which I think is a good thing (easier to reason about cluster state).
We can still expose a _monitoring/cluster_alert/disable endpoint if we want; it may be easier for the user to just call an API than to look up the setting to use.

Member Author

I consider this more of a first step, with a new action that performs the migration to follow. If the exporters are disabled, this setting only ensures that the alerts are not created if they are turned back on (since alerts are created lazily, much to my chagrin).

The main concern that was raised in the migration issue was that if a new exporter is configured, either on purpose or by accident, the cluster alerts for that exporter will not be installed. A global setting to control the existence of the alerts is the easiest way to fix that, even if it overlaps with other settings.
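
A minimal sketch of that gating logic (hypothetical helper and parameter names; not the PR's literal code):

// Hypothetical helper illustrating how the global flag overrides per-exporter management.
final class ClusterAlertGateSketch {

    // managementEnabled:  xpack.monitoring.exporters.<id>.cluster_alerts.management.enabled
    // decommissionAlerts: xpack.monitoring.migration.decommission_alerts
    static boolean shouldInstallClusterAlerts(boolean managementEnabled, boolean decommissionAlerts) {
        // Even if an exporter is (re)enabled later with management turned on, the global
        // migration flag keeps its cluster alert watches decommissioned.
        return managementEnabled && decommissionAlerts == false;
    }
}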

Contributor

that makes sense, thanks for explaining.

@@ -484,6 +488,56 @@ public void testWatchPublishBlocksAfterSuccessfulWatcherCheck() {
verifyNoMoreInteractions(client);
}

public void testWatchRemovalOnDecomissionAlerts() {
Contributor

I think this name is a bit misleading? (Maybe testDeployClusterAlerts?)

…/monitoring/exporter/local/LocalExporter.java

Co-authored-by: Przemko Robakowski <[email protected]>
Contributor

@jakelandis jakelandis left a comment

LGTM... a couple minor points that could be addressed (but no need for additional reviews)

@jbaiera
Member Author

jbaiera commented Oct 28, 2020

@elasticmachine run elasticsearch-ci/bwc

@jbaiera
Member Author

jbaiera commented Oct 28, 2020

@elasticmachine run elasticsearch-ci/default-distro

@jbaiera
Member Author

jbaiera commented Oct 29, 2020

@elasticmachine update branch

@jbaiera jbaiera merged commit 1695751 into elastic:master Oct 29, 2020
jbaiera added a commit that referenced this pull request Dec 14, 2020
…62668) (#64443)

Adds a setting that, when enabled, directs any currently running exporters in monitoring
to treat any cluster alert definition as excluded from the list of allowed cluster alert
watches. This is the first step in adding a migration path away from cluster alerts
configured by the monitoring plugin and toward alerts managed by the stack monitoring
solutions on the new alerting feature.

Co-authored-by: Przemko Robakowski <[email protected]>