Associate translog with Lucene index commit for searchable snapshots shards #53459

tlrx · 2020-03-12T09:25:17Z

Today, searchable snapshot shards can be restored from a snapshot, recovered from a peer, or force-allocated as a stale primary on a data node without copying any files to disk. This works because SearchableSnapshotDirectory does not rely on local files on disk, and because every time such a Directory is instantiated an empty translog is associated with it through a new Lucene commit (held in memory).

The translog/Lucene commit association is done within a cluster state update when the IndexShard's directory is created. It requires opening an IndexWriter and creating the translog and translog checkpoint files on disk. Opening the IndexWriter causes multiple files to be accessed (to verify checksums and to load field infos), and for some shards this can take a long time, causing timeouts when applying the cluster state. Creating the translog files triggers multiple disk accesses in order to create or delete directories and fsync files.

This PR moves the translog/Lucene commit association out of the Directory instantiation, and therefore out of the cluster state update thread, so that it happens in a pre-recovery phase at the beginning of the recovery process. It introduces a new hook method named preRecovery() that in turn executes the registered IndexEventListeners. This allows the searchable snapshot module to register a specific IndexEventListener that creates a new empty translog with a given translog UUID, so that it is associated with the last Lucene commit.
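
To make the hook mechanism concrete, here is a minimal, self-contained sketch of the idea. It does not use the real Elasticsearch classes: `Shard`, `ShardEventListener` and `EmptyTranslogListener` are hypothetical stand-ins, the method signatures are assumptions, and "creating the translog" is reduced to recording a UUID in a field. Only the names preRecovery() and beforeIndexShardRecovery() come from this PR.

```java
import java.util.ArrayList;
import java.util.List;

final class PreRecoveryHookSketch {

    /** Hypothetical stand-in for IndexEventListener; only the pre-recovery callback is modelled. */
    interface ShardEventListener {
        default void beforeIndexShardRecovery(Shard shard) {}
    }

    /** Hypothetical, heavily simplified shard: no store, no engine, no real translog files. */
    static final class Shard {
        final String commitTranslogUuid;   // translog UUID recorded in the last Lucene commit
        String localTranslogUuid;          // null until a local translog has been "created"
        private final List<ShardEventListener> listeners = new ArrayList<>();

        Shard(String commitTranslogUuid) {
            this.commitTranslogUuid = commitTranslogUuid;
        }

        void addListener(ShardEventListener listener) {
            listeners.add(listener);
        }

        /** Runs at the start of recovery, off the cluster state update thread. */
        void preRecovery() {
            for (ShardEventListener listener : listeners) {
                listener.beforeIndexShardRecovery(this);
            }
        }
    }

    /** What a searchable-snapshot-like module might register (assumption). */
    static final class EmptyTranslogListener implements ShardEventListener {
        @Override
        public void beforeIndexShardRecovery(Shard shard) {
            // Reuse the UUID from the existing commit instead of writing a new commit.
            shard.localTranslogUuid = shard.commitTranslogUuid;
        }
    }

    public static void main(String[] args) {
        Shard shard = new Shard("uuid-from-last-commit");
        shard.addListener(new EmptyTranslogListener());
        shard.preRecovery();
        System.out.println(shard.localTranslogUuid.equals(shard.commitTranslogUuid)); // prints "true"
    }
}
```

The point of the design is that the directory can be instantiated cheaply during the cluster state update, while the expensive work is deferred to the listener callback.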

This translog creation happens when restoring the shard from a snapshot, right before recovering a shard from a peer, and when recovering the shard from the existing store after a node restart or a forced allocation. Associating a new translog with the Lucene index (and not the other way around, as is usually done during recoveries) prevents additional Lucene commits from happening (they would require an IndexWriter, which triggers many file accesses).

Relates #50999

elasticmachine · 2020-03-12T09:25:19Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

tlrx · 2020-03-12T11:13:02Z

@ywelsch and I talked about this via another channel, and we'd like to investigate the possibility of adding a pre-recovery hook that would allow us to create the empty translog file we want without adding too many code paths based on a new index setting, as the first version of this PR does. Such a hook could also help warm up the cache before entering the recovery process, which is also very interesting. I'll investigate this option.

tlrx · 2020-03-12T16:05:20Z

@DaveCTurner @ywelsch I revisited this PR by using and extending the existing IndexEventListener infrastructure so that a listener can react to a pre-recovery event. I've documented the new beforeIndexShardRecovery() method; let me know whether you find this approach acceptable.

ywelsch

Left one nit, o.w. looking good.

server/src/main/java/org/elasticsearch/indices/recovery/PeerRecoveryTargetService.java

DaveCTurner

LGTM, although I'd like Yannick's opinion too since I am not 100% comfortable with the IndexShard lifecycle. I left a couple of comments.

DaveCTurner · 2020-03-12T17:09:37Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

@@ -1313,6 +1313,13 @@ public void close(String reason, boolean flushEngine) throws IOException {
        }
    }

+    public void preRecovery() {
+        if (state != IndexShardState.RECOVERING) {


More for my curiosity than something that needs changing: why can we not assert state == IndexShardState.RECOVERING here? The lifecycle of an IndexShard is still a bit opaque to me.

I think it makes more sense to assert here, there's no reason to not do it.
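
The two options discussed in this thread can be sketched as follows. This is an illustration only: the body of the `if` is not visible in the diff excerpt above, so throwing an `IllegalStateException` here is an assumption, and the `State` enum is a hypothetical stand-in for IndexShardState.

```java
final class RecoveryStateCheckSketch {

    /** Hypothetical stand-in for IndexShardState. */
    enum State { CREATED, RECOVERING, STARTED, CLOSED }

    /** Runtime check: always active, fails with an exception in production too. */
    static void preRecoveryChecked(State state) {
        if (state != State.RECOVERING) {
            // Assumption: the real code may throw a different, more specific exception.
            throw new IllegalStateException("expected RECOVERING but was " + state);
        }
    }

    /** Assertion: documents the invariant, but is only evaluated when the JVM runs with -ea. */
    static void preRecoveryAsserted(State state) {
        assert state == State.RECOVERING : "expected RECOVERING but was " + state;
    }

    public static void main(String[] args) {
        preRecoveryChecked(State.RECOVERING);   // passes silently
        preRecoveryAsserted(State.RECOVERING);  // passes silently
        try {
            preRecoveryChecked(State.CLOSED);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

An assertion fits when the caller is expected to guarantee the state, so a violation would indicate a bug rather than a condition to handle at runtime.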

DaveCTurner · 2020-03-12T17:13:27Z

server/src/main/java/org/elasticsearch/index/translog/Translog.java

+        return createEmptyTranslog(location, shardId, initialGlobalCheckpoint, primaryTerm, null, channelFactory);
+    }
+
+    public static String createEmptyTranslog(final Path location,


Suggest adding a big warning in a comment here saying how dangerous it is to specify the translog UUID and that it should only be used for shards that will see no indexing.

I'm also idly wondering about how hard it would be to make this Translog read-only when it's not been created with a fresh UUID. Probably too hard to be worth doing, but thought I'd mention it anyway.

> Suggest adding a big warning in a comment here saying how dangerous it is to specify the translog UUID and that it should only be used for shards that will see no indexing.

Sure, I added some doc in 557d924

> I'm also idly wondering about how hard it would be to make this Translog read-only when it's not been created with a fresh UUID. Probably too hard to be worth doing, but thought I'd mention it anyway.

I find the idea interesting, but I'm not sure it is worth it; I'd prefer not to create translogs at all if they are not going to be used.
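
The overload under discussion can be sketched roughly as below. This is a simplified illustration, not the actual Translog code: the parameter list is shortened, no translog or checkpoint files are written, and `java.util.UUID` is a stand-in for whatever UUID generator the real implementation uses. The only detail taken from the diff is that the existing entry point delegates to the new overload with a `null` UUID.

```java
import java.util.UUID;

final class TranslogUuidSketch {

    /** Existing-style entry point: always ends up with a freshly generated UUID. */
    static String createEmptyTranslog(long initialGlobalCheckpoint, long primaryTerm) {
        return createEmptyTranslog(initialGlobalCheckpoint, primaryTerm, null);
    }

    /**
     * WARNING: passing a non-null translogUuid ties the new, empty translog to an
     * existing Lucene commit. That is only safe for shards that will never see
     * indexing, because any operations from a previous translog are not present.
     *
     * @param translogUuid the UUID to reuse, or null to generate a fresh one
     * @return the UUID of the created translog
     */
    static String createEmptyTranslog(long initialGlobalCheckpoint, long primaryTerm, String translogUuid) {
        final String uuid = translogUuid != null ? translogUuid : UUID.randomUUID().toString();
        // A real implementation would write the translog and checkpoint files here (omitted).
        return uuid;
    }

    public static void main(String[] args) {
        String fresh = createEmptyTranslog(-1L, 1L);
        String reused = createEmptyTranslog(-1L, 1L, "uuid-from-last-commit");
        System.out.println("fresh UUID generated: " + (fresh != null));
        System.out.println("reused UUID: " + reused);
    }
}
```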

tlrx · 2020-03-13T09:55:55Z

@elasticmachine update branch

tlrx · 2020-03-13T11:46:38Z

Thanks Yannick and David

Associate translog with Lucene index commit

406d4de

tlrx added the >enhancement and :Distributed Coordination/Snapshot/Restore labels Mar 12, 2020

tlrx requested review from ywelsch and DaveCTurner March 12, 2020 09:25

tlrx added 2 commits March 12, 2020 14:29

Partial revert "Associate translog with Lucene index commit"

32df1c4

This partially reverts commit 406d4de

Add pre recovery hook

4ab94ce

ywelsch approved these changes Mar 12, 2020

server/src/main/java/org/elasticsearch/indices/recovery/PeerRecoveryTargetService.java (outdated)

DaveCTurner reviewed Mar 12, 2020

Feedback

557d924

Merge branch 'feature/searchable-snapshots' into associate-translog-w…

f3fc163

…ith-lucene-index

tlrx merged commit 4bde03a into elastic:feature/searchable-snapshots Mar 13, 2020

tlrx deleted the associate-translog-with-lucene-index branch March 13, 2020 11:46

tlrx mentioned this pull request Apr 6, 2020

Merge feature/searchable-snapshots branch into master #54803

Merged

DaveCTurner mentioned this pull request May 26, 2020

Lazy snapshot restores #50999

Closed

19 tasks
