Improve searchable snapshot mount time #66198
Conversation
Reduce the range sizes we fetch during mounting to speed up the time from mount until the shard is started. On resource-constrained setups (rate limiter, disk, or network), the time to mount multiple shards is proportional to the amount of data fetched, and for most files in a snapshot we only need to fetch a small piece of the file to start the shard.
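For illustration, the two range sizes this change distinguishes might be configured as node settings like so (a sketch only; the setting keys and the 128kb value are assumptions, not confirmed anywhere in this thread):

import org.elasticsearch.common.settings.Settings;

// Sketch only: the general cache range size vs. the new, smaller range size
// used while a shard is recovering. Both keys below are assumed.
Settings nodeSettings = Settings.builder()
    .put("xpack.searchable.snapshot.cache.range_size", "32mb")           // assumed key
    .put("xpack.searchable.snapshot.cache.recovery_range_size", "128kb") // assumed key and value
    .build();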
Pinging @elastic/es-distributed (Team:Distributed)
LGTM (sorry for the conflicting file)
LGTM2, I suggested a couple of comments.
@@ -71,6 +71,13 @@
        MAX_SNAPSHOT_CACHE_RANGE_SIZE, // max
        Setting.Property.NodeScope
    );
    public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RECOVERY_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
Suggest a comment so we remember why we are doing this:
Suggested change:
/**
 * Starting up a shard involves reading small parts of some files from the repository, independently of the pre-warming process. If we
 * expand those ranges using {@link CacheService#SNAPSHOT_CACHE_RANGE_SIZE_SETTING} then we end up reading quite a few 32MB ranges. If
 * we read enough of these ranges for the restore throttling rate limiter to kick in then all the read threads will end up waiting on
 * the throttle, blocking subsequent reads. By using a smaller read size during restore we avoid clogging up the rate limiter so much.
 */
public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RECOVERY_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
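To make the throttling interaction concrete, here is a back-of-the-envelope sketch (plain Java, not Elasticsearch code; the throttle rate, read count, and range sizes below are illustrative assumptions, not values from this PR):

// Illustrative only: every read pays for its full expanded range through the
// shared restore rate limiter, so the total wait scales with the range size
// rather than with the few bytes actually needed to start the shard.
public class ThrottleSketch {
    public static void main(String[] args) {
        double throttleMbPerSec = 40.0;  // assumed restore throttle
        int readsToStartShard = 64;      // assumed small reads needed at startup
        double cacheRangeMb = 32.0;      // default cache range size (32MB)
        double recoveryRangeMb = 0.125;  // assumed smaller recovery range (128KB)

        double with32mb = readsToStartShard * cacheRangeMb / throttleMbPerSec;
        double withSmall = readsToStartShard * recoveryRangeMb / throttleMbPerSec;
        System.out.printf("32MB ranges: ~%.0fs of throttle wait; 128KB ranges: ~%.1fs%n",
                with32mb, withSmall);
    }
}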
Also suggest a similar comment on the other setting since this came up as a question in the investigation that led to this PR.
/**
* If a search needs data from the repository then we expand it to a larger contiguous range whose size is determined by this setting,
* in anticipation of needing nearby data in subsequent reads. Repository reads typically have quite high latency (think ~100ms) and
* the default of 32MB for this setting represents the approximate point at which size starts to matter. In other words, reads of
* ranges smaller than 32MB don't usually happen much quicker, so we may as well expand all the way to 32MB ranges.
*/
public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
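A minimal sketch of the range expansion described in that comment (editor's illustration, not the actual CacheService implementation):

// Sketch only: expand a small read at `position` into the enclosing aligned
// range of `rangeSize` bytes, clamped to the file length, so that nearby
// follow-up reads are served from the cache instead of the repository.
public class RangeExpansionSketch {
    static long[] expandToRange(long position, long rangeSize, long fileLength) {
        long start = (position / rangeSize) * rangeSize; // align down to a boundary
        long end = Math.min(start + rangeSize, fileLength);
        return new long[] { start, end };
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        long[] range = expandToRange(100 * mb + 5, 32 * mb, 120 * mb);
        // A one-byte read near the 100MB mark expands to the [96MB, 120MB) range.
        System.out.println(range[0] / mb + "MB.." + range[1] / mb + "MB");
    }
}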
Thanks Tanguy and David!
* elastic/master: (33 commits)
  Add searchable snapshot cache folder to NodeEnvironment (elastic#66297)
  [DOCS] Add dynamic runtime fields to docs (elastic#66194)
  Add HDFS searchable snapshot integration (elastic#66185)
  Support canceling cross-clusters search requests (elastic#66206)
  Mute testCacheSurviveRestart (elastic#66289)
  Fix cat tasks api params in spec and handler (elastic#66272)
  Snapshot of a searchable snapshot should be empty (elastic#66162)
  [ML] DFA _explain API should not fail when none field is included (elastic#66281)
  Add action to decommission legacy monitoring cluster alerts (elastic#64373)
  move rollup_index param out of RollupActionConfig (elastic#66139)
  Improve FieldFetcher retrieval of fields (elastic#66160)
  Remove unsed fields in `RestAnalyzeAction` (elastic#66215)
  Simplify searchable snapshot CacheKey (elastic#66263)
  Autoscaling remove feature flags (elastic#65973)
  Improve searchable snapshot mount time (elastic#66198)
  [ML] Report cause when datafeed extraction encounters error (elastic#66167)
  Remove suggest reference in some API specs (elastic#66180)
  Fix warning when installing a plugin for different ESversion (elastic#66146)
  [ML] make `xpack.ml.max_ml_node_size` and `xpack.ml.use_auto_machine_memory_percent` dynamically settable (elastic#66132)
  [DOCS] Add `require_alias` to Bulk API (elastic#66259)
  ...
In #66198 a setting was introduced to reduce the range size used for searchable snapshots during recovery; unfortunately it was not registered and is therefore not settable.
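For context, a node setting only becomes settable once it is registered; a sketch of what the follow-up fix likely involves (assuming the standard Plugin#getSettings registration mechanism; the exact plugin class and settings list are not shown in this thread):

import java.util.List;
import org.elasticsearch.common.settings.Setting;

// Inside the searchable-snapshots plugin class (sketch). Only settings returned
// from getSettings() are known to the node; anything else is rejected as an
// "unknown setting" when set in elasticsearch.yml or via the settings API.
@Override
public List<Setting<?>> getSettings() {
    return List.of(
        CacheService.SNAPSHOT_CACHE_RANGE_SIZE_SETTING,
        CacheService.SNAPSHOT_CACHE_RECOVERY_RANGE_SIZE_SETTING // previously missing
    );
}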
Reduce the range sizes we fetch during recovery to speed up the time
from mount until the shard is started.
On resource-constrained setups (rate limiter, disk, or network), the
time to mount multiple shards is proportional to the amount of data
fetched, and for most files in a snapshot we only need to fetch a small
piece of the file to start the shard.