-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Associate translog with Lucene index commit for searchable snapshots shards #53459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Associate translog with Lucene index commit for searchable snapshots shards #53459
Conversation
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
@ywelsch and I talked about this via another channel and we'd like to investigate the possibility to add a pre-recovery hook that would allow to create the empty translog file we want without adding too many code paths based on a new index setting like this PR does in this first version. Such a hook could also help to warm up the cache before entering the recovery process, which is also very interesting. I'll investigate this option. |
This partially reverts commit 406d4de
@DaveCTurner @ywelsch I revisited this PR by using and extending the existing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one nit, o.w. looking good.
server/src/main/java/org/elasticsearch/indices/recovery/PeerRecoveryTargetService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, although I'd like Yannick's opinion too since I am not 100% comfortable with the IndexShard
lifecycle. I left a couple of comments.
@@ -1313,6 +1313,13 @@ public void close(String reason, boolean flushEngine) throws IOException { | |||
} | |||
} | |||
|
|||
public void preRecovery() { | |||
if (state != IndexShardState.RECOVERING) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More for my curiosity than something that needs changing: why can we not assert state == IndexShardState.RECOVERING
here? The lifecycle of an IndexShard
is still a bit opaque to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense to assert here, there's no reason to not do it.
return createEmptyTranslog(location, shardId, initialGlobalCheckpoint, primaryTerm, null, channelFactory); | ||
} | ||
|
||
public static String createEmptyTranslog(final Path location, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest adding a big warning in a comment here saying how dangerous it is to specify the translog UUID and that it should only be used for shards that will see no indexing.
I'm also idly wondering about how hard it would be to make this Translog
read-only when it's not been created with a fresh UUID. Probably too hard to be worth doing, but thought I'd mention it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest adding a big warning in a comment here saying how dangerous it is to specify the translog UUID and that it should only be used for shards that will see no indexing.
Sure, I added some doc in 557d924
I'm also idly wondering about how hard it would be to make this
Translog
read-only when it's not been created with a fresh UUID. Probably too hard to be worth doing, but thought I'd mention it anyway.
I find the idea interesting but I'm not sure if it worths it; I'd prefer to not create translogs at all if they were not to be used.
@elasticmachine update branch |
Thanks Yannick and David |
Today searchable snapshot shards can be restored from a snapshot, recovered from a peer or force-allocated as stale primary on data node without copying any files on disk. It works because
SearchableSnapshotDirectory
does not rely on local files on disk and because every time such aDirectory
is instantiated an empty translog is associated to it through a new Lucene commit (holds in memory).The translog/lucene commit association is done within a cluster state update when the
IndexShard
's directory is created and requires to open anIndexWriter
and to create the translog and translog checkpoint files on disk. Opening theIndexWriter
causes multiple files to be accessed (to verify checksums and to load field infos) and for some shards this can take a lot of time, causing cluster state applying timeouts. Creating the translog files triggers multiple accesses to disk in order to create or delete directories and fsync files.This PR moves the translog/lucene commit association out of the
Directory
instantiation - and therefore out of the cluster state update thread - in order to make it happen in a pre-recovery phaseat the beginning of the recovery process. It introduces a new hook method named
preRecovery()
that in turns execute the registeredIndexEventListener
s. This allows the searchable snapshot module to register a specificIndexEventListener
that will create a new empty translog with a given translog UUID so that it will be associated with the last lucene commit.This translog creation will happen when restoring the shard from a snapshot; right before recovering a shard from a peer; and when recovering the shard from the existing store after a node restart or a forced allocation. Associating a new translog with the Lucene index (and not the other way around like it is usually done during recoveries) prevent more Lucene commits to happen (as they required an
IndexWriter
, which triggers many file accesses).Relates #50999