-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Add an index->step cache to the PolicyStepsRegistry #82316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 10 commits
23d02cc
e6e6118
92346ef
58ac8a0
ed38398
bdf4be8
38a1711
de52716
5f48e35
91fcef1
84a667e
0f37d1e
25605eb
68161e8
4954725
19ecdd3
8ed3286
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -298,12 +298,19 @@ public void clusterChanged(ClusterChangedEvent event) { | |
if (prevIsMaster != event.localNodeMaster()) { | ||
this.isMaster = event.localNodeMaster(); | ||
if (this.isMaster) { | ||
// we weren't the master, and now we are | ||
onMaster(event.state()); | ||
} else { | ||
// we were the master, and now we aren't | ||
cancelJob(); | ||
policyRegistry.clear(); | ||
} | ||
} | ||
|
||
for (Index index : event.indicesDeleted()) { | ||
policyRegistry.delete(index); | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you wrap this in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. huh come to think of it, can't we just return if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That works also, we can return after the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍, 4954725 I didn't end up going with an early |
||
|
||
final IndexLifecycleMetadata lifecycleMetadata = event.state().metadata().custom(IndexLifecycleMetadata.TYPE); | ||
if (this.isMaster && lifecycleMetadata != null) { | ||
triggerPolicies(event.state(), true); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,6 +18,7 @@ | |
import org.elasticsearch.common.io.stream.StreamOutput; | ||
import org.elasticsearch.core.Nullable; | ||
import org.elasticsearch.core.TimeValue; | ||
import org.elasticsearch.core.Tuple; | ||
import org.elasticsearch.index.Index; | ||
import org.elasticsearch.license.XPackLicenseState; | ||
import org.elasticsearch.xcontent.DeprecationHandler; | ||
|
@@ -49,20 +50,27 @@ | |
import java.util.Set; | ||
import java.util.SortedMap; | ||
import java.util.TreeMap; | ||
import java.util.concurrent.ConcurrentHashMap; | ||
import java.util.stream.Collectors; | ||
|
||
public class PolicyStepsRegistry { | ||
private static final Logger logger = LogManager.getLogger(PolicyStepsRegistry.class); | ||
|
||
private final NamedXContentRegistry xContentRegistry; | ||
private final Client client; | ||
private final XPackLicenseState licenseState; | ||
|
||
// keeps track of existing policies in the cluster state | ||
private final SortedMap<String, LifecyclePolicyMetadata> lifecyclePolicyMap; | ||
// keeps track of what the first step in a policy is, the key is policy name | ||
private final Map<String, Step> firstStepMap; | ||
// keeps track of a mapping from policy/step-name to respective Step, the key is policy name | ||
private final Map<String, Map<Step.StepKey, Step>> stepMap; | ||
private final NamedXContentRegistry xContentRegistry; | ||
|
||
// tracks an index->step cache, where the indexmetadata is also tracked for cache invalidation/eviction purposes. | ||
// for a given index, the step can be cached as long as the indexmetadata (and the policy!) hasn't changed. since | ||
// policies change infrequently, the entire cache is cleared on policy change. | ||
private final Map<Index, Tuple<IndexMetadata, Step>> cachedSteps = new ConcurrentHashMap<>(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lets add a line or two of Javadoc explaining what this is for please. This is far from obvious for anybody not involved in this work. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does 5f48e35 sparkle with you? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would it be possible to key the "changed state" key check that is based on It feels strange to me to hold on to things like the mappings inside this cache, when those should have no bearing on the current ILM step. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We only use it for instance equality and clear it out of the cache if the index gets removed => memory wise it's free, instance equality comparison is cheap/free and this seems like the safest way of checking whether or not index metadata has changed. We didn't just create a compound key here because we need/want the ability to quickly cleanup the cache if an index is removed without having to iterate a cache. Since we only care about the by index state this seemed fastest/easiest. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This has the side effect however, of any inconsequential change to the index metadata for the index (a dynamic field introduced, for example) invalidating the cache, whereas it doesn't actually need to be invalidated in that case? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that's ok. This is just supposed to help us with not running the expensive step computation on unchanged indices (which will always be the majority in almost any CS update). No need to super optimize this beyond that I'd say. Updating index metadata results in expensive operations anyway that even go as far as resulting in disk IO, the step calculation is probably trivial relatively speaking in all cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alright, that's fine then, my preference is still to make the keyed part as small as possible, but this is okay for now I think. |
||
|
||
public PolicyStepsRegistry(NamedXContentRegistry xContentRegistry, Client client, XPackLicenseState licenseState) { | ||
this(new TreeMap<>(), new HashMap<>(), new HashMap<>(), xContentRegistry, client, licenseState); | ||
|
@@ -99,6 +107,9 @@ Map<String, Map<Step.StepKey, Step>> getStepMap() { | |
public void update(IndexLifecycleMetadata meta) { | ||
assert meta != null : "IndexLifecycleMetadata cannot be null when updating the policy steps registry"; | ||
|
||
// since the policies (may have) changed, the whole steps cache needs to be thrown out | ||
cachedSteps.clear(); | ||
|
||
DiffableUtils.MapDiff<String, LifecyclePolicyMetadata, Map<String, LifecyclePolicyMetadata>> mapDiff = DiffableUtils.diff( | ||
lifecyclePolicyMap, | ||
meta.getPolicyMetadatas(), | ||
|
@@ -155,6 +166,22 @@ public LifecyclePolicyMetadata read(StreamInput in, String key) { | |
} | ||
} | ||
|
||
public void delete(Index deleted) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add javadoc to this too please, since it isn't as clear when this is intended to be used? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍, 68161e8 |
||
cachedSteps.remove(deleted); | ||
} | ||
|
||
/** | ||
* Clear internal maps that were populated by update (and others). | ||
*/ | ||
public void clear() { | ||
// this is potentially large, so it's important to clear it | ||
cachedSteps.clear(); | ||
// these are relatively small, but there's no harm in clearing them | ||
lifecyclePolicyMap.clear(); | ||
firstStepMap.clear(); | ||
stepMap.clear(); | ||
} | ||
|
||
/** | ||
* Return all ordered steps for the current policy for the index. Does not | ||
* resolve steps using the phase caching, but only for the currently existing policy. | ||
|
@@ -267,6 +294,11 @@ private List<Step> parseStepsFromPhase(String policy, String currentPhase, Strin | |
|
||
@Nullable | ||
public Step getStep(final IndexMetadata indexMetadata, final Step.StepKey stepKey) { | ||
final Tuple<IndexMetadata, Step> cachedStep = cachedSteps.get(indexMetadata.getIndex()); | ||
if (cachedStep != null && cachedStep.v1() == indexMetadata && cachedStep.v2().getKey().equals(stepKey)) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add a comment about why instance equality is used here and okay rather than object equality? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍, 68161e8 |
||
return cachedStep.v2(); | ||
} | ||
|
||
if (ErrorStep.NAME.equals(stepKey.getName())) { | ||
return new ErrorStep(new Step.StepKey(stepKey.getPhase(), stepKey.getAction(), ErrorStep.NAME)); | ||
} | ||
|
@@ -305,7 +337,9 @@ public Step getStep(final IndexMetadata indexMetadata, final Step.StepKey stepKe | |
+ phaseSteps; | ||
|
||
// Return the step that matches the given stepKey or else null if we couldn't find it | ||
return phaseSteps.stream().filter(step -> step.getKey().equals(stepKey)).findFirst().orElse(null); | ||
final Step s = phaseSteps.stream().filter(step -> step.getKey().equals(stepKey)).findFirst().orElse(null); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just a thought from looking at the profiling you posted: This might be happier as a normal loop :) The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ehhhhhhh, I don't think it's worth it. This reads pretty clearly as-is, and I don't think it's enough of a bottleneck anymore to warrant unrolling to a normal loop (due to this PR's optimization resulting in the code in question just being called a bajillion times less frequently). |
||
cachedSteps.put(indexMetadata.getIndex(), Tuple.tuple(indexMetadata, s)); | ||
return s; | ||
} | ||
|
||
/** | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is an (benign for sure) risk of a leak here: if a policy adds a step to the cache concurrently with it being deleted here, I think we may leave it there until next clear or master failover. Ideally we would retain only those that remain (thus curing this), but I can see how that is more expensive. A comment could be good though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, 19ecdd3