Commit e925346

Refresh cached phase policy definition if possible on new poli… (elastic#50820)
* Refresh cached phase policy definition if possible on new policy

There are some cases when updating a policy does not change the structure in a significant way. In these cases, we can reread the policy definition for any indices using the updated policy.

This commit adds this refreshing to the `TransportPutLifecycleAction` to allow this. It allows us to do things like change the configuration values for a particular step, even when on that step (for example, changing the rollover criteria while on the `check-rollover-ready` step).

There are more cases where the phase definition can be reread than just the ones checked here (for example, removing an action that has already been passed), and those will be added in subsequent work.

Relates to elastic#48431
1 parent 2dc23bd commit e925346
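
The refresh rule described in the commit message can be sketched in isolation. This is a minimal model, not the code from this commit: step keys are represented as plain `phase/action/step` strings (an assumption made for brevity; the real code compares `Step.StepKey` objects), and the class name, method names, and sample keys are hypothetical. It shows the two conditions the change appears to rely on: the index's current step must still exist in the new policy, and the step keys of the current phase must be unchanged.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified model of the refresh check; not the Elasticsearch implementation.
class PhaseRefreshSketch {

    // Collect the keys that belong to one phase. Keys are assumed to look like
    // "phase/action/step", which is an illustrative encoding only.
    static Set<String> keysForPhase(List<String> stepKeys, String phase) {
        return stepKeys.stream()
            .filter(k -> k.startsWith(phase + "/"))
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // True when the cached phase definition could be re-read from the new policy.
    static boolean canRefresh(String currentKey, List<String> cachedKeys, List<String> newPolicyKeys) {
        if (!newPolicyKeys.contains(currentKey)) {
            return false; // the index sits on a step the new policy no longer has
        }
        String phase = currentKey.substring(0, currentKey.indexOf('/'));
        // Same step keys for the current phase means the execution flow is unchanged.
        return keysForPhase(cachedKeys, phase).equals(keysForPhase(newPolicyKeys, phase));
    }

    public static void main(String[] args) {
        String current = "hot/rollover/check-rollover-ready";
        List<String> cached = Arrays.asList(
            "hot/rollover/check-rollover-ready", "hot/rollover/attempt-rollover");
        List<String> withExtraAction = Arrays.asList(
            "hot/set_priority/set_priority",
            "hot/rollover/check-rollover-ready", "hot/rollover/attempt-rollover");

        // Only configuration values changed: the keys are identical.
        System.out.println(canRefresh(current, cached, cached));
        // An action was added to the current phase: the keys differ.
        System.out.println(canRefresh(current, cached, withExtraAction));
    }
}
```

Changing only a rollover threshold leaves the step keys untouched, which is why that case qualifies for a refresh, while adding or removing an action in the current phase does not.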

5 files changed: +756 -11 lines changed

docs/reference/ilm/policy-definitions.asciidoc

Lines changed: 9 additions & 5 deletions
@@ -55,7 +55,7 @@ PUT _ilm/policy/my_policy
 }
 --------------------------------------------------
 
-The Above example configures a policy that moves the index into the warm
+The above example configures a policy that moves the index into the warm
 phase after one day. Until then, the index is in a waiting state. After
 moving into the warm phase, it will wait until 30 days have elapsed before
 moving to the delete phase and deleting the index.
@@ -76,10 +76,14 @@ check occurs.
 === Phase Execution
 
 The current phase definition, of an index's policy being executed, is stored
-in the index's metadata. The phase and its actions are compiled into a series
-of discrete steps that are executed sequentially. Since some {ilm-init} actions
-are more complex and involve multiple operations against an index, each of these
-operations are done in isolation in a unit called a "step". The
+in the index's metadata. This phase definition is cached to prevent changes to
+the policy from putting the index in a state where it cannot proceed from its
+current step. When the policy is updated we check to see if this phase
+definition can be safely updated, and if so, update the cached definition in
+indices using the updated policy. The phase and its actions are compiled into a
+series of discrete steps that are executed sequentially. Since some {ilm-init}
+actions are more complex and involve multiple operations against an index, each
+of these operations are done in isolation in a unit called a "step". The
 <<ilm-explain-lifecycle,Explain Lifecycle API>> exposes this information to us
 to see which step our index is either to execute next, or is currently
 executing.

x-pack/plugin/ilm/qa/multi-node/src/test/java/org/elasticsearch/xpack/ilm/TimeSeriesLifecycleActionsIT.java

Lines changed: 35 additions & 0 deletions
@@ -1304,6 +1304,41 @@ public void testRetryableInitializationStep() throws Exception {
         });
     }
 
+    public void testRefreshablePhaseJson() throws Exception {
+        String index = "refresh-index";
+
+        createNewSingletonPolicy("hot", new RolloverAction(null, null, 100L));
+        Request createIndexTemplate = new Request("PUT", "_template/rolling_indexes");
+        createIndexTemplate.setJsonEntity("{" +
+            "\"index_patterns\": [\""+ index + "-*\"], \n" +
+            " \"settings\": {\n" +
+            " \"number_of_shards\": 1,\n" +
+            " \"number_of_replicas\": 0,\n" +
+            " \"index.lifecycle.name\": \"" + policy+ "\",\n" +
+            " \"index.lifecycle.rollover_alias\": \"alias\"\n" +
+            " }\n" +
+            "}");
+        client().performRequest(createIndexTemplate);
+
+        createIndexWithSettings(index + "-1",
+            Settings.builder().put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, 1)
+                .put(IndexMetaData.SETTING_NUMBER_OF_REPLICAS, 0),
+            true);
+
+        // Index a document
+        index(client(), index + "-1", "1", "foo", "bar");
+
+        // Wait for the index to enter the check-rollover-ready step
+        assertBusy(() -> assertThat(getStepKeyForIndex(index + "-1").getName(), equalTo(WaitForRolloverReadyStep.NAME)));
+
+        // Update the policy to allow rollover at 1 document instead of 100
+        createNewSingletonPolicy("hot", new RolloverAction(null, null, 1L));
+
+        // Index should now have been able to roll over, creating the new index and proceeding to the "complete" step
+        assertBusy(() -> assertThat(indexExists(index + "-000002"), is(true)));
+        assertBusy(() -> assertThat(getStepKeyForIndex(index + "-1").getName(), equalTo(TerminalPolicyStep.KEY.getName())));
+    }
+
     // This method should be called inside an assertBusy, it has no retry logic of its own
     private void assertHistoryIsPresent(String policyName, String indexName, boolean success, String stepName) throws IOException {
         assertHistoryIsPresent(policyName, indexName, success, null, null, stepName);
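
The new test waits for ILM to catch up by wrapping each assertion in `assertBusy`. As a rough illustration of that polling pattern, here is a simplified stand-in written from scratch. The class `BusyAssertSketch` and the method `retryUntilPasses` are hypothetical names, and the backoff values are arbitrary. The real helper lives in the Elasticsearch test framework and may differ in its details.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical polling helper in the spirit of assertBusy; not the framework's code.
class BusyAssertSketch {

    // Re-run the check until it stops throwing AssertionError or the time budget is spent.
    static void retryUntilPasses(Runnable check, long maxWaitMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // the assertion held, stop polling
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadline) {
                    throw e; // out of time: surface the last failure
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", ie);
            }
            sleepMillis = Math.min(sleepMillis * 2, 100); // capped exponential backoff
        }
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        // Simulates a condition that becomes true on the third attempt.
        retryUntilPasses(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new AssertionError("not ready yet");
            }
        }, 1000);
        System.out.println("passed after " + attempts.get() + " attempts");
    }
}
```

Polling suits this test because the policy update and the resulting rollover happen asynchronously, so a single immediate assertion would be racy.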

x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/ilm/IndexLifecycleTransition.java

Lines changed: 2 additions & 2 deletions
@@ -251,8 +251,8 @@ private static LifecycleExecutionState updateExecutionStateToStep(LifecyclePolic
     /**
      * Given a cluster state and lifecycle state, return a new state using the new lifecycle state for the given index.
      */
-    private static ClusterState.Builder newClusterStateWithLifecycleState(Index index, ClusterState clusterState,
-                                                                          LifecycleExecutionState lifecycleState) {
+    public static ClusterState.Builder newClusterStateWithLifecycleState(Index index, ClusterState clusterState,
+                                                                         LifecycleExecutionState lifecycleState) {
         ClusterState.Builder newClusterStateBuilder = ClusterState.builder(clusterState);
         newClusterStateBuilder.metaData(MetaData.builder(clusterState.getMetaData())
             .put(IndexMetaData.builder(clusterState.getMetaData().index(index))

x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/ilm/action/TransportPutLifecycleAction.java

Lines changed: 211 additions & 4 deletions
@@ -8,34 +8,54 @@
 
 import org.apache.logging.log4j.LogManager;
 import org.apache.logging.log4j.Logger;
+import org.apache.logging.log4j.message.ParameterizedMessage;
 import org.elasticsearch.action.ActionListener;
 import org.elasticsearch.action.support.ActionFilters;
 import org.elasticsearch.action.support.master.TransportMasterNodeAction;
+import org.elasticsearch.client.Client;
 import org.elasticsearch.cluster.AckedClusterStateUpdateTask;
 import org.elasticsearch.cluster.ClusterState;
 import org.elasticsearch.cluster.block.ClusterBlockException;
 import org.elasticsearch.cluster.block.ClusterBlockLevel;
+import org.elasticsearch.cluster.metadata.IndexMetaData;
 import org.elasticsearch.cluster.metadata.IndexNameExpressionResolver;
 import org.elasticsearch.cluster.metadata.MetaData;
 import org.elasticsearch.cluster.service.ClusterService;
+import org.elasticsearch.common.Nullable;
+import org.elasticsearch.common.Strings;
 import org.elasticsearch.common.inject.Inject;
 import org.elasticsearch.common.io.stream.StreamInput;
+import org.elasticsearch.common.xcontent.DeprecationHandler;
+import org.elasticsearch.common.xcontent.NamedXContentRegistry;
+import org.elasticsearch.common.xcontent.XContentParser;
+import org.elasticsearch.common.xcontent.json.JsonXContent;
 import org.elasticsearch.threadpool.ThreadPool;
 import org.elasticsearch.transport.TransportService;
 import org.elasticsearch.xpack.core.ClientHelper;
+import org.elasticsearch.xpack.core.ilm.ErrorStep;
 import org.elasticsearch.xpack.core.ilm.IndexLifecycleMetadata;
+import org.elasticsearch.xpack.core.ilm.LifecycleExecutionState;
 import org.elasticsearch.xpack.core.ilm.LifecyclePolicy;
 import org.elasticsearch.xpack.core.ilm.LifecyclePolicyMetadata;
+import org.elasticsearch.xpack.core.ilm.LifecycleSettings;
+import org.elasticsearch.xpack.core.ilm.PhaseExecutionInfo;
+import org.elasticsearch.xpack.core.ilm.Step;
 import org.elasticsearch.xpack.core.ilm.action.PutLifecycleAction;
 import org.elasticsearch.xpack.core.ilm.action.PutLifecycleAction.Request;
 import org.elasticsearch.xpack.core.ilm.action.PutLifecycleAction.Response;
+import org.elasticsearch.xpack.ilm.IndexLifecycleTransition;
 
 import java.io.IOException;
 import java.time.Instant;
+import java.util.LinkedHashSet;
+import java.util.List;
 import java.util.Map;
+import java.util.Set;
 import java.util.SortedMap;
+import java.util.Spliterators;
 import java.util.TreeMap;
 import java.util.stream.Collectors;
+import java.util.stream.StreamSupport;
 
 /**
  * This class is responsible for bootstrapping {@link IndexLifecycleMetadata} into the cluster-state, as well
@@ -44,12 +64,17 @@
 public class TransportPutLifecycleAction extends TransportMasterNodeAction<Request, Response> {
 
     private static final Logger logger = LogManager.getLogger(TransportPutLifecycleAction.class);
+    private final NamedXContentRegistry xContentRegistry;
+    private final Client client;
 
     @Inject
     public TransportPutLifecycleAction(TransportService transportService, ClusterService clusterService, ThreadPool threadPool,
-                                       ActionFilters actionFilters, IndexNameExpressionResolver indexNameExpressionResolver) {
+                                       ActionFilters actionFilters, IndexNameExpressionResolver indexNameExpressionResolver,
+                                       NamedXContentRegistry namedXContentRegistry, Client client) {
         super(PutLifecycleAction.NAME, transportService, clusterService, threadPool, actionFilters, Request::new,
             indexNameExpressionResolver);
+        this.xContentRegistry = namedXContentRegistry;
+        this.client = client;
     }
 
     @Override
@@ -81,7 +106,7 @@ protected Response newResponse(boolean acknowledged) {
 
             @Override
             public ClusterState execute(ClusterState currentState) throws Exception {
-                ClusterState.Builder newState = ClusterState.builder(currentState);
+                ClusterState.Builder stateBuilder = ClusterState.builder(currentState);
                 IndexLifecycleMetadata currentMetadata = currentState.metaData().custom(IndexLifecycleMetadata.TYPE);
                 if (currentMetadata == null) { // first time using index-lifecycle feature, bootstrap metadata
                     currentMetadata = IndexLifecycleMetadata.EMPTY;
@@ -99,13 +124,195 @@ public ClusterState execute(ClusterState currentState) throws Exception {
                     logger.info("updating index lifecycle policy [{}]", request.getPolicy().getName());
                 }
                 IndexLifecycleMetadata newMetadata = new IndexLifecycleMetadata(newPolicies, currentMetadata.getOperationMode());
-                newState.metaData(MetaData.builder(currentState.getMetaData())
+                stateBuilder.metaData(MetaData.builder(currentState.getMetaData())
                     .putCustom(IndexLifecycleMetadata.TYPE, newMetadata).build());
-                return newState.build();
+                ClusterState nonRefreshedState = stateBuilder.build();
+                if (oldPolicy == null) {
+                    return nonRefreshedState;
+                } else {
+                    try {
+                        return updateIndicesForPolicy(nonRefreshedState, xContentRegistry, client,
+                            oldPolicy.getPolicy(), lifecyclePolicyMetadata);
+                    } catch (Exception e) {
+                        logger.warn(new ParameterizedMessage("unable to refresh indices phase JSON for updated policy [{}]",
+                            oldPolicy.getName()), e);
+                        // Revert to the non-refreshed state
+                        return nonRefreshedState;
+                    }
+                }
             }
         });
     }
 
+    /**
+     * Ensure that we have the minimum amount of metadata necessary to check for cache phase
+     * refresh. This includes:
+     * - An execution state
+     * - Existing phase definition JSON
+     * - A current step key
+     * - A current phase in the step key
+     * - Not currently in the ERROR step
+     */
+    static boolean eligibleToCheckForRefresh(final IndexMetaData metaData) {
+        LifecycleExecutionState executionState = LifecycleExecutionState.fromIndexMetadata(metaData);
+        if (executionState == null || executionState.getPhaseDefinition() == null) {
+            return false;
+        }
+
+        Step.StepKey currentStepKey = LifecycleExecutionState.getCurrentStepKey(executionState);
+        if (currentStepKey == null || currentStepKey.getPhase() == null) {
+            return false;
+        }
+
+        return ErrorStep.NAME.equals(currentStepKey.getName()) == false;
+    }
+
+    /**
+     * Parse the {@code phaseDef} phase definition to get the stepkeys for the given phase.
+     * If there is an error parsing or if the phase definition is missing the required
+     * information, returns null.
+     */
+    @Nullable
+    static Set<Step.StepKey> readStepKeys(final NamedXContentRegistry xContentRegistry, final Client client,
+                                          final String phaseDef, final String currentPhase) {
+        final PhaseExecutionInfo phaseExecutionInfo;
+        try (XContentParser parser = JsonXContent.jsonXContent.createParser(xContentRegistry,
+            DeprecationHandler.THROW_UNSUPPORTED_OPERATION, phaseDef)) {
+            phaseExecutionInfo = PhaseExecutionInfo.parse(parser, currentPhase);
+        } catch (Exception e) {
+            logger.trace(new ParameterizedMessage("exception reading step keys checking for refreshability, phase definition: {}",
+                phaseDef), e);
+            return null;
+        }
+
+        if (phaseExecutionInfo == null || phaseExecutionInfo.getPhase() == null) {
+            return null;
+        }
+
+        return phaseExecutionInfo.getPhase().getActions().values().stream()
+            .flatMap(a -> a.toSteps(client, phaseExecutionInfo.getPhase().getName(), null).stream())
+            .map(Step::getKey)
+            .collect(Collectors.toCollection(LinkedHashSet::new));
+    }
+
+    /**
+     * Returns 'true' if the index's cached phase JSON can be safely reread, 'false' otherwise.
+     */
+    static boolean isIndexPhaseDefinitionUpdatable(final NamedXContentRegistry xContentRegistry, final Client client,
+                                                   final IndexMetaData metaData, final LifecyclePolicy newPolicy) {
+        final String index = metaData.getIndex().getName();
+        if (eligibleToCheckForRefresh(metaData) == false) {
+            logger.debug("[{}] does not contain enough information to check for eligibility of refreshing phase", index);
+            return false;
+        }
+        final String policyId = newPolicy.getName();
+
+        final LifecycleExecutionState executionState = LifecycleExecutionState.fromIndexMetadata(metaData);
+        final Step.StepKey currentStepKey = LifecycleExecutionState.getCurrentStepKey(executionState);
+        final String currentPhase = currentStepKey.getPhase();
+
+        final Set<Step.StepKey> newStepKeys = newPolicy.toSteps(client).stream()
+            .map(Step::getKey)
+            .collect(Collectors.toCollection(LinkedHashSet::new));
+
+        if (newStepKeys.contains(currentStepKey) == false) {
+            // The index is on a step that doesn't exist in the new policy, we
+            // can't safely re-read the JSON
+            logger.debug("[{}] updated policy [{}] does not contain the current step key [{}], so the policy phase will not be refreshed",
+                index, policyId, currentStepKey);
+            return false;
+        }
+
+        final String phaseDef = executionState.getPhaseDefinition();
+        final Set<Step.StepKey> oldStepKeys = readStepKeys(xContentRegistry, client, phaseDef, currentPhase);
+        if (oldStepKeys == null) {
+            logger.debug("[{}] unable to parse phase definition for cached policy [{}], policy phase will not be refreshed",
+                index, policyId);
+            return false;
+        }
+
+        final Set<Step.StepKey> oldPhaseStepKeys = oldStepKeys.stream()
+            .filter(sk -> currentPhase.equals(sk.getPhase()))
+            .collect(Collectors.toCollection(LinkedHashSet::new));
+
+        final PhaseExecutionInfo phaseExecutionInfo = new PhaseExecutionInfo(policyId, newPolicy.getPhases().get(currentPhase), 1L, 1L);
+        final String peiJson = Strings.toString(phaseExecutionInfo);
+
+        final Set<Step.StepKey> newPhaseStepKeys = readStepKeys(xContentRegistry, client, peiJson, currentPhase);
+        if (newPhaseStepKeys == null) {
+            logger.debug(new ParameterizedMessage("[{}] unable to parse phase definition for policy [{}] " +
+                "to determine if it could be refreshed", index, policyId));
+            return false;
+        }
+
+        if (newPhaseStepKeys.equals(oldPhaseStepKeys)) {
+            // The new and old phase have the same stepkeys for this current phase, so we can
+            // refresh the definition because we know it won't change the execution flow.
+            logger.debug("[{}] updated policy [{}] contains the same phase step keys and can be refreshed", index, policyId);
+            return true;
+        } else {
+            logger.debug("[{}] updated policy [{}] has different phase step keys and will NOT refresh phase " +
+                "definition as it differs too greatly. old: {}, new: {}",
+                index, policyId, oldPhaseStepKeys, newPhaseStepKeys);
+            return false;
+        }
+    }
+
+    /**
+     * Rereads the phase JSON for the given index, returning a new cluster state.
+     */
+    static ClusterState refreshPhaseDefinition(final ClusterState state, final String index, final LifecyclePolicyMetadata updatedPolicy) {
+        final IndexMetaData idxMeta = state.metaData().index(index);
+        assert eligibleToCheckForRefresh(idxMeta) : "index " + index + " is missing crucial information needed to refresh phase definition";
+
+        logger.trace("[{}] updating cached phase definition for policy [{}]", index, updatedPolicy.getName());
+        LifecycleExecutionState currentExState = LifecycleExecutionState.fromIndexMetadata(idxMeta);
+
+        String currentPhase = currentExState.getPhase();
+        PhaseExecutionInfo pei = new PhaseExecutionInfo(updatedPolicy.getName(),
+            updatedPolicy.getPolicy().getPhases().get(currentPhase), updatedPolicy.getVersion(), updatedPolicy.getModifiedDate());
+
+        LifecycleExecutionState newExState = LifecycleExecutionState.builder(currentExState)
+            .setPhaseDefinition(Strings.toString(pei, false, false))
+            .build();
+
+        return IndexLifecycleTransition.newClusterStateWithLifecycleState(idxMeta.getIndex(), state, newExState).build();
+    }
+
+    /**
+     * For the given new policy, returns a new cluster with all updateable indices' phase JSON refreshed.
+     */
+    static ClusterState updateIndicesForPolicy(final ClusterState state, final NamedXContentRegistry xContentRegistry, final Client client,
+                                               final LifecyclePolicy oldPolicy, final LifecyclePolicyMetadata newPolicy) {
+        assert oldPolicy.getName().equals(newPolicy.getName()) : "expected both policies to have the same id but they were: [" +
+            oldPolicy.getName() + "] vs. [" + newPolicy.getName() + "]";
+
+        // No need to update anything if the policies are identical in contents
+        if (oldPolicy.equals(newPolicy.getPolicy())) {
+            logger.debug("policy [{}] is unchanged and no phase definition refresh is needed", oldPolicy.getName());
+            return state;
+        }
+
+        final List<String> indicesThatCanBeUpdated =
+            StreamSupport.stream(Spliterators.spliteratorUnknownSize(state.metaData().indices().valuesIt(), 0), false)
+                .filter(meta -> newPolicy.getName().equals(LifecycleSettings.LIFECYCLE_NAME_SETTING.get(meta.getSettings())))
+                .filter(meta -> isIndexPhaseDefinitionUpdatable(xContentRegistry, client, meta, newPolicy.getPolicy()))
+                .map(meta -> meta.getIndex().getName())
+                .collect(Collectors.toList());
+
+        ClusterState updatedState = state;
+        for (String index : indicesThatCanBeUpdated) {
+            try {
+                updatedState = refreshPhaseDefinition(updatedState, index, newPolicy);
+            } catch (Exception e) {
+                logger.warn(new ParameterizedMessage("[{}] unable to refresh phase definition for updated policy [{}]",
+                    index, newPolicy.getName()), e);
+            }
+        }
+
+        return updatedState;
+    }
+
     @Override
     protected ClusterBlockException checkBlock(Request request, ClusterState state) {
         return state.blocks().globalBlockedException(ClusterBlockLevel.METADATA_WRITE);
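
One detail worth noting about `isIndexPhaseDefinitionUpdatable`: it compares two `LinkedHashSet` instances with `equals`. Under the `java.util.Set` contract that comparison looks at membership only, not iteration order, so the check as written would presumably also accept a new phase whose step keys are the same but arrive in a different order. The sketch below demonstrates that standard-library behaviour with plain strings. The class name, method names, and sample keys are hypothetical and are not part of this commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Demonstrates Set equality semantics using illustrative string keys.
class StepKeySetEqualitySketch {

    // Build an insertion-ordered set, mirroring Collectors.toCollection(LinkedHashSet::new).
    static Set<String> orderedSet(String... keys) {
        return new LinkedHashSet<>(Arrays.asList(keys));
    }

    // Set.equals: true when both sets contain the same elements, regardless of order.
    static boolean sameMembers(Set<String> a, Set<String> b) {
        return a.equals(b);
    }

    // A stricter comparison that also requires the same iteration order.
    static boolean sameMembersInSameOrder(Set<String> a, Set<String> b) {
        return new ArrayList<>(a).equals(new ArrayList<>(b));
    }

    public static void main(String[] args) {
        Set<String> oldKeys = orderedSet("check-rollover-ready", "attempt-rollover");
        Set<String> reordered = orderedSet("attempt-rollover", "check-rollover-ready");

        System.out.println(sameMembers(oldKeys, reordered));            // true
        System.out.println(sameMembersInSameOrder(oldKeys, reordered)); // false
    }
}
```

Whether order matters in practice depends on how steps are generated from a phase; if it does, comparing the keys as lists would be the stricter option.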
