Add setting to decommission legacy monitoring cluster alerts #62668

Merged (9 commits, Oct 29, 2020)
@@ -77,6 +77,9 @@ public class Monitoring extends Plugin implements ActionPlugin, ReloadablePlugin
public static final Setting<Boolean> CLEAN_WATCHER_HISTORY = boolSetting("xpack.watcher.history.cleaner_service.enabled",
true, Setting.Property.Dynamic, Setting.Property.NodeScope, Setting.Property.Deprecated);

public static final Setting<Boolean> MIGRATION_DECOMMISSION_ALERTS = boolSetting("xpack.monitoring.migration.decommission_alerts",
Contributor:

It feels kinda weird to have this AND the xpack.monitoring.exporters.*.cluster_alerts.management.enabled setting.

I wonder if it makes sense to:

  1. expose a POST _monitoring/cluster_alert/disable endpoint (or whatever the name is)
  2. when that is called, iterate through all of the current exporters, set xpack.monitoring.exporters.<i>.cluster_alerts.management.enabled=false, and set some cluster state to no longer try to install cluster alerts (and reach out to delete the cluster alerts for those exporters) — see the sketch below
  3. if a new exporter is created, set xpack.monitoring.exporters.<new_exporter>.cluster_alerts.management.enabled=false based on that cluster state
  4. expose a POST _monitoring/cluster_alert/enable

Thoughts?
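For illustration, a rough sketch of what step 2 of that proposal might look like, assuming the action runs with a Client handle and reads exporter names from the persistent cluster settings. The wrapper class, method name, and callback bodies are hypothetical, not part of this PR:

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

final class ClusterAlertsDisableSketch {

    // Hypothetical body for step 2: flip the per-exporter setting for every
    // configured exporter in one persistent cluster settings update, then
    // follow up by deleting the installed cluster alert watches.
    static void disableClusterAlertManagement(Client client, Settings persistentSettings) {
        final Settings.Builder disabled = Settings.builder();
        // getGroups() yields one entry per exporter name under the prefix
        for (final String exporterName : persistentSettings.getGroups("xpack.monitoring.exporters.").keySet()) {
            disabled.put("xpack.monitoring.exporters." + exporterName + ".cluster_alerts.management.enabled", false);
        }
        client.admin().cluster().prepareUpdateSettings()
            .setPersistentSettings(disabled.build())
            .execute(ActionListener.wrap(
                response -> { /* reach out and delete the cluster alerts for those exporters */ },
                exception -> { /* surface the failure to the caller */ }));
    }
}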

Contributor:

I'd be in favour of fewer settings, but OTOH the current flow with this setting seems simpler and less error-prone. And currently you can't enable a single exporter back (with xpack.monitoring.exporters.<i>.cluster_alerts.management.enabled=true), which I think is a good thing (easier to reason about cluster state).
We could still expose a _monitoring/cluster_alert/disable endpoint if we want, though; it may be easier for the user to just call an API than to look up the setting to use.

Member (Author):

I consider this more of a first step, with a new action that performs the migration to follow. If the exporters are disabled, this setting only ensures that the alerts are not re-created if the exporters are turned back on (since alerts are created lazily, much to my chagrin).

The main concern raised in the migration issue was that if a new exporter is configured, either on purpose or by accident, the cluster alerts for that exporter must not be installed. A global setting controlling the existence of the alerts is the easiest way to guarantee that, even if it overlaps with other settings.

Contributor:

That makes sense, thanks for explaining.
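For context: because the new setting is declared with Setting.Property.Dynamic and Setting.Property.NodeScope and registered through getSettings(), it should be updatable at runtime via a cluster settings update rather than requiring a restart. A minimal sketch, assuming a transport Client handle (the wrapper class and method name are illustrative):

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

final class DecommissionAlertsSketch {

    // Minimal sketch: because the setting is dynamic, the settings update
    // consumer registered in Exporters re-applies it without a node restart.
    static void decommissionAlerts(Client client) {
        client.admin().cluster().prepareUpdateSettings()
            .setPersistentSettings(Settings.builder()
                .put("xpack.monitoring.migration.decommission_alerts", true)
                .build())
            .get();
    }
}

Presumably, flipping the setting back to false would allow alerts to be lazily re-created for enabled exporters, which is the behavior the author describes guarding against during the migration.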

false, Setting.Property.Dynamic, Setting.Property.NodeScope);

protected final Settings settings;

private Exporters exporters;
@@ -146,6 +149,7 @@ public List<Setting<?>> getSettings() {
List<Setting<?>> settings = new ArrayList<>();
settings.add(MonitoringField.HISTORY_DURATION);
settings.add(CLEAN_WATCHER_HISTORY);
settings.add(MIGRATION_DECOMMISSION_ALERTS);
settings.add(MonitoringService.ENABLED);
settings.add(MonitoringService.ELASTICSEARCH_COLLECTION_ENABLED);
settings.add(MonitoringService.INTERVAL);
@@ -23,6 +23,7 @@
import org.elasticsearch.gateway.GatewayService;
import org.elasticsearch.license.XPackLicenseState;
import org.elasticsearch.xpack.core.ssl.SSLService;
import org.elasticsearch.xpack.monitoring.Monitoring;
import org.elasticsearch.xpack.monitoring.exporter.http.HttpExporter;
import org.elasticsearch.xpack.core.monitoring.exporter.MonitoringDoc;
import org.elasticsearch.xpack.monitoring.exporter.local.LocalExporter;
@@ -62,7 +63,9 @@ public Exporters(Settings settings, Map<String, Exporter.Factory> factories,

final List<Setting.AffixSetting<?>> dynamicSettings =
getSettings().stream().filter(Setting::isDynamic).collect(Collectors.toList());
clusterService.getClusterSettings().addSettingsUpdateConsumer(this::setExportersSetting, dynamicSettings);
final List<Setting<?>> updateSettings = new ArrayList<Setting<?>>(dynamicSettings);
Contributor:

Should #getSettings() also return this new setting? (I think it is only really used for testing, but it seems more correct to return all the settings up through the plugin.)

Member (Author):

It seems I missed that step, good catch!

updateSettings.add(Monitoring.MIGRATION_DECOMMISSION_ALERTS);
clusterService.getClusterSettings().addSettingsUpdateConsumer(this::setExportersSetting, updateSettings);
HttpExporter.registerSettingValidators(clusterService, sslService);
// this ensures that logging is happening by adding an empty consumer per affix setting
for (Setting.AffixSetting<?> affixSetting : dynamicSettings) {
@@ -43,6 +43,7 @@
import org.elasticsearch.xpack.core.ssl.SSLConfiguration;
import org.elasticsearch.xpack.core.ssl.SSLConfigurationSettings;
import org.elasticsearch.xpack.core.ssl.SSLService;
import org.elasticsearch.xpack.monitoring.Monitoring;
import org.elasticsearch.xpack.monitoring.exporter.ClusterAlertsUtil;
import org.elasticsearch.xpack.monitoring.exporter.ExportBulk;
import org.elasticsearch.xpack.monitoring.exporter.Exporter;
@@ -837,7 +838,7 @@ private static void configureClusterAlertsResources(final Config config, final S

// add a resource per watch
for (final String watchId : ClusterAlertsUtil.WATCH_IDS) {
final boolean blacklisted = blacklist.contains(watchId);
final boolean blacklisted = blacklist.contains(watchId) || Monitoring.MIGRATION_DECOMMISSION_ALERTS.get(config.settings());
// lazily load the cluster state to fetch the cluster UUID once it's loaded
final Supplier<String> uniqueWatchId = () -> ClusterAlertsUtil.createUniqueWatchId(clusterService, watchId);
final Supplier<String> watch = blacklisted ? null : () -> ClusterAlertsUtil.loadWatch(clusterService, watchId);
@@ -51,6 +51,7 @@
import org.elasticsearch.xpack.core.watcher.transport.actions.get.GetWatchResponse;
import org.elasticsearch.xpack.core.watcher.transport.actions.put.PutWatchAction;
import org.elasticsearch.xpack.core.watcher.watch.Watch;
import org.elasticsearch.xpack.monitoring.Monitoring;
import org.elasticsearch.xpack.monitoring.cleaner.CleanerService;
import org.elasticsearch.xpack.monitoring.exporter.ClusterAlertsUtil;
import org.elasticsearch.xpack.monitoring.exporter.ExportBulk;
@@ -105,6 +106,7 @@ public class LocalExporter extends Exporter implements ClusterStateListener, Cle
private final boolean useIngest;
private final DateFormatter dateTimeFormatter;
private final List<String> clusterAlertBlacklist;
private final boolean decommissionClusterAlerts;

private final AtomicReference<State> state = new AtomicReference<>(State.INITIALIZED);
private final AtomicBoolean installingSomething = new AtomicBoolean(false);
@@ -120,6 +122,7 @@ public LocalExporter(Exporter.Config config, Client client, CleanerService clean
this.licenseState = config.licenseState();
this.useIngest = USE_INGEST_PIPELINE_SETTING.getConcreteSettingForNamespace(config.name()).get(config.settings());
this.clusterAlertBlacklist = ClusterAlertsUtil.getClusterAlertsBlacklist(config);
this.decommissionClusterAlerts = Monitoring.MIGRATION_DECOMMISSION_ALERTS.get(config.settings());
this.cleanerService = cleanerService;
this.dateTimeFormatter = dateTimeFormatter(config);
// if additional listeners are added here, adjust LocalExporterTests#testLocalExporterRemovesListenersOnClose accordingly
@@ -158,8 +161,10 @@ public void licenseStateChanged() {
boolean isExporterReady() {
// forces the setup to occur if it hasn't already
final boolean running = resolveBulk(clusterService.state(), false) != null;
// Report on watcher readiness
boolean alertsProcessed = canUseWatcher() == false || watcherSetup.get();

return running && installingSomething.get() == false;
return running && installingSomething.get() == false && alertsProcessed;
}

@Override
@@ -453,7 +458,8 @@ private void getClusterAlertsInstallationAsyncActions(final boolean indexExists,

for (final String watchId : ClusterAlertsUtil.WATCH_IDS) {
final String uniqueWatchId = ClusterAlertsUtil.createUniqueWatchId(clusterService, watchId);
final boolean addWatch = canAddWatches && clusterAlertBlacklist.contains(watchId) == false;
final boolean addWatch = canAddWatches && clusterAlertBlacklist.contains(watchId) == false &&
decommissionClusterAlerts == false;

// we aren't sure if no watches exist yet, so add them
if (indexExists) {
@@ -102,7 +102,11 @@ public void setupResources() {
}

public void awaitCheckAndPublish(final Boolean expected) {
resources.checkAndPublish(client, listener);
awaitCheckAndPublish(resources, expected);
}

public void awaitCheckAndPublish(HttpResource resource, final Boolean expected) {
resource.checkAndPublish(client, listener);

verifyListener(expected);
}
@@ -484,6 +488,56 @@ public void testWatchPublishBlocksAfterSuccessfulWatcherCheck() {
verifyNoMoreInteractions(client);
}

public void testDeployClusterAlerts() {
final int successfulGetTemplates = randomIntBetween(0, EXPECTED_TEMPLATES);
final int unsuccessfulGetTemplates = EXPECTED_TEMPLATES - successfulGetTemplates;
final int successfulGetPipelines = randomIntBetween(0, EXPECTED_PIPELINES);
final int unsuccessfulGetPipelines = EXPECTED_PIPELINES - successfulGetPipelines;
final Exception exception = failurePutException();

whenValidVersionResponse();
whenGetTemplates(successfulGetTemplates, unsuccessfulGetTemplates);
whenSuccessfulPutTemplates(unsuccessfulGetTemplates);
whenGetPipelines(successfulGetPipelines, unsuccessfulGetPipelines);
whenSuccessfulPutPipelines(unsuccessfulGetPipelines);
// license needs to be valid, otherwise we'll do DELETEs, which are tested earlier
whenWatcherCanBeUsed(true);

// a number of watches are mocked as present
final int existingWatches = randomIntBetween(0, EXPECTED_WATCHES);

// For completeness's sake: GET/PUT watches won't be called by these resources.
// Instead, it tries to DELETE the watches, ignoring any that do not exist.
whenGetWatches(existingWatches, EXPECTED_WATCHES - existingWatches);
whenPerformRequestAsyncWith(client, new RequestMatcher(is("PUT"), startsWith("/_watcher/watch/")), exception);
whenPerformRequestAsyncWith(client, new RequestMatcher(is("DELETE"), startsWith("/_watcher/watch/")),
successfulDeleteResponses(EXPECTED_WATCHES));

// Create resources that are configured to remove all watches
Settings removalExporterSettings = Settings.builder()
.put(exporterSettings)
.put("xpack.monitoring.migration.decommission_alerts", true)
.build();
MultiHttpResource overrideResource = HttpExporter.createResources(
new Exporter.Config("_http", "http", removalExporterSettings, clusterService, licenseState));

assertTrue(overrideResource.isDirty());
awaitCheckAndPublish(overrideResource, true);
// Should proceed
assertFalse(overrideResource.isDirty());

verifyVersionCheck();
verifyGetTemplates(EXPECTED_TEMPLATES);
verifyPutTemplates(unsuccessfulGetTemplates);
verifyGetPipelines(EXPECTED_PIPELINES);
verifyPutPipelines(unsuccessfulGetPipelines);
verifyWatcherCheck();
verifyGetWatches(0);
verifyPutWatches(0);
verifyDeleteWatches(EXPECTED_WATCHES);
verifyNoMoreInteractions(client);
}

public void testSuccessfulChecksOnElectedMasterNode() {
final int successfulGetTemplates = randomIntBetween(0, EXPECTED_TEMPLATES);
final int unsuccessfulGetTemplates = EXPECTED_TEMPLATES - successfulGetTemplates;
@@ -32,7 +32,7 @@ public static void setupThreadPool() {
}

@AfterClass
public static void cleanUpStatic() throws Exception {
public static void cleanUpStatic() {
if (THREADPOOL != null) {
terminate(THREADPOOL);
}
@@ -57,6 +57,15 @@ protected Settings nodeSettings(int nodeOrdinal) {
.build();
}

/**
* Create a new {@link LocalExporter} with the default exporter settings and name.
*
* @return Never {@code null}.
*/
protected LocalExporter createLocalExporter() {
return createLocalExporter(exporterName, null);
}

/**
* Create a new {@link LocalExporter}. Expected usage:
* <pre><code>
@@ -68,12 +77,11 @@ protected Settings nodeSettings(int nodeOrdinal) {
*
* @return Never {@code null}.
*/
protected LocalExporter createLocalExporter() {
final Settings settings = localExporterSettings();
protected LocalExporter createLocalExporter(String exporterName, Settings exporterSettings) {
final XPackLicenseState licenseState = TestUtils.newTestLicenseState();
final Exporter.Config config = new Exporter.Config(exporterName, "local", settings, clusterService(), licenseState);
final Exporter.Config config = new Exporter.Config(exporterName, "local", exporterSettings, clusterService(), licenseState);
final CleanerService cleanerService =
new CleanerService(settings, clusterService().getClusterSettings(), THREADPOOL, licenseState);
new CleanerService(exporterSettings, clusterService().getClusterSettings(), THREADPOOL, licenseState);

return new LocalExporter(config, client(), cleanerService);
}