Improve searchable snapshot mount time #66198


Conversation

henningandersen
Contributor

Reduce the range sizes we fetch during recovery to speed up the time from
mount until the shard is started.
On resource-constrained setups (rate limiter, disk, or network), the time
to mount multiple shards is proportional to the amount of data fetched,
and for most files in a snapshot we only need to fetch a small piece of
each file to start the shard.

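For illustration, here is a minimal sketch (hypothetical names, not Elasticsearch code) of how expanding a small startup read to an aligned cache range determines the bytes fetched from the repository, and why a smaller range size during recovery fetches far less data:

```java
// Hypothetical sketch: expand a small read [offset, offset + length) to the
// enclosing cache ranges aligned to rangeSize, as a cache-backed reader might.
public class RangeExpansion {
    static long bytesFetched(long offset, long length, long rangeSize) {
        long start = (offset / rangeSize) * rangeSize;                          // align down
        long end = ((offset + length + rangeSize - 1) / rangeSize) * rangeSize; // align up
        return end - start;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // Starting a shard reads ~100 bytes of a file footer at an arbitrary offset.
        System.out.println(bytesFetched(40 * mb + 5, 100, 32 * mb));    // 33554432: one full 32MB range
        System.out.println(bytesFetched(40 * mb + 5, 100, 128 * 1024)); // 131072: one 128KB range
    }
}
```

With many shards and several such reads per shard, the difference between fetching one 32MB range and one 128KB range per read dominates mount time on a constrained link.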
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Dec 11, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Member

@tlrx tlrx left a comment


LGTM (sorry for the conflicting file)

Contributor

@DaveCTurner DaveCTurner left a comment


LGTM2, I suggested a couple of comments.

@@ -71,6 +71,13 @@
        MAX_SNAPSHOT_CACHE_RANGE_SIZE, // max
        Setting.Property.NodeScope
    );
    public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RECOVERY_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
Contributor

@DaveCTurner DaveCTurner Dec 14, 2020


Suggest a comment so we remember why we are doing this:

Suggested change

    -    public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RECOVERY_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
    +    /**
    +     * Starting up a shard involves reading small parts of some files from the repository, independently of the pre-warming process. If we
    +     * expand those ranges using {@link CacheService#SNAPSHOT_CACHE_RANGE_SIZE_SETTING} then we end up reading quite a few 32MB ranges. If
    +     * we read enough of these ranges for the restore throttling rate limiter to kick in then all the read threads will end up waiting on
    +     * the throttle, blocking subsequent reads. By using a smaller read size during restore we avoid clogging up the rate limiter so much.
    +     */
    +    public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RECOVERY_RANGE_SIZE_SETTING = Setting.byteSizeSetting(

Also suggest a similar comment on the other setting since this came up as a question in the investigation that led to this PR.

    /**
     * If a search needs data from the repository then we expand it to a larger contiguous range whose size is determined by this setting,
     * in anticipation of needing nearby data in subsequent reads. Repository reads typically have quite high latency (think ~100ms) and
     * the default of 32MB for this setting represents the approximate point at which size starts to matter. In other words, reads of
     * ranges smaller than 32MB don't usually happen much quicker, so we may as well expand all the way to 32MB ranges.
     */
    public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
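The throttling effect described in the suggested comment can be put in rough numbers. A back-of-envelope sketch (all figures here are assumptions for illustration, not measurements from this PR): a shared restore rate limiter admits a fixed number of bytes per second in total, so the time until every shard's startup reads have drained through it scales with the total bytes read, and shrinking each expanded range shrinks that time proportionally.

```java
// Illustrative arithmetic only: a shared restore rate limiter admits `ratePerSec`
// bytes per second in total, so all startup reads together take totalBytes / rate.
public class MountTimeSketch {
    static double secondsUntilAllStarted(int shards, int readsPerShard, long bytesPerRead, long ratePerSec) {
        long totalBytes = (long) shards * readsPerShard * bytesPerRead;
        return (double) totalBytes / ratePerSec;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // Assumed workload: 20 shards, ~10 small startup reads each, 40 MB/s rate limit.
        System.out.println(secondsUntilAllStarted(20, 10, 32 * mb, 40 * mb));    // 160.0s with 32MB ranges
        System.out.println(secondsUntilAllStarted(20, 10, 128 * 1024, 40 * mb)); // 0.625s with 128KB ranges
    }
}
```

Under these assumed numbers, every read thread spends almost all of its time parked on the limiter in the 32MB case, which is the clogging behaviour the comment describes.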

@henningandersen henningandersen merged commit 6377c5e into elastic:master Dec 14, 2020
henningandersen added a commit that referenced this pull request Dec 14, 2020
@henningandersen
Contributor Author

Thanks Tanguy and David!

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 14, 2020
* elastic/master: (33 commits)
  Add searchable snapshot cache folder to NodeEnvironment (elastic#66297)
  [DOCS] Add dynamic runtime fields to docs (elastic#66194)
  Add HDFS searchable snapshot integration (elastic#66185)
  Support canceling cross-clusters search requests (elastic#66206)
  Mute testCacheSurviveRestart (elastic#66289)
  Fix cat tasks api params in spec and handler (elastic#66272)
  Snapshot of a searchable snapshot should be empty (elastic#66162)
  [ML] DFA _explain API should not fail when none field is included (elastic#66281)
  Add action to decommission legacy monitoring cluster alerts (elastic#64373)
  move rollup_index param out of RollupActionConfig (elastic#66139)
  Improve FieldFetcher retrieval of fields (elastic#66160)
  Remove unsed fields in `RestAnalyzeAction` (elastic#66215)
  Simplify searchable snapshot CacheKey (elastic#66263)
  Autoscaling remove feature flags (elastic#65973)
  Improve searchable snapshot mount time (elastic#66198)
  [ML] Report cause when datafeed extraction encounters error (elastic#66167)
  Remove suggest reference in some API specs (elastic#66180)
  Fix warning when installing a plugin for different ESversion (elastic#66146)
  [ML] make `xpack.ml.max_ml_node_size` and `xpack.ml.use_auto_machine_memory_percent` dynamically settable (elastic#66132)
  [DOCS] Add `require_alias` to Bulk API (elastic#66259)
  ...
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Dec 21, 2020
In elastic#66198 a setting was introduced to reduce the range size used for
searchable snapshots during recovery; unfortunately, it was not
registered and was therefore not settable.
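The follow-up fix matters because defining a Setting constant is not enough on its own; the setting only becomes usable once it is registered with the node's settings infrastructure. A minimal sketch of that behaviour (assumed names, not Elasticsearch internals):

```java
import java.util.Map;
import java.util.Set;

// Hypothetical registry: settings validation rejects any key that was never
// registered, so a defined-but-unregistered setting cannot be set by users.
public class SettingsRegistry {
    private final Set<String> registeredKeys;

    SettingsRegistry(Set<String> registeredKeys) {
        this.registeredKeys = registeredKeys;
    }

    void validate(Map<String, String> userSettings) {
        for (String key : userSettings.keySet()) {
            if (!registeredKeys.contains(key)) {
                throw new IllegalArgumentException("unknown setting [" + key + "]");
            }
        }
    }
}
```

In this sketch, a registry built without the new recovery range-size key would throw for exactly that key, matching the symptom the follow-up commits describe.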
henningandersen added a commit that referenced this pull request Dec 29, 2020
henningandersen added a commit that referenced this pull request Dec 29, 2020
henningandersen added a commit that referenced this pull request Dec 29, 2020
Labels
:Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs), >enhancement, Team:Distributed (Obsolete), v7.11.0, v8.0.0-alpha1
Projects
None yet
Development

5 participants