Improve searchable snapshot mount time #66198


Conversation

henningandersen
Contributor

Reduce the range sizes we fetch during recovery to speed up the time from
mount until the shard is started.
On resource-constrained setups (rate limiter, disk, or network), the time
to mount multiple shards is proportional to the amount of data fetched,
and for most files in a snapshot we only need to fetch a small piece of
each file to start the shard.

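For illustration, here is a minimal sketch (hypothetical names, not Elasticsearch code) of how expanding a small startup read to an aligned cache range determines the bytes fetched from the repository, and why a smaller range size during recovery fetches far less data:

```java
// Hypothetical sketch: expand a small read [offset, offset + length) to the
// enclosing cache ranges aligned to rangeSize, as a cache-backed reader might.
public class RangeExpansion {
    static long bytesFetched(long offset, long length, long rangeSize) {
        long start = (offset / rangeSize) * rangeSize;                          // align down
        long end = ((offset + length + rangeSize - 1) / rangeSize) * rangeSize; // align up
        return end - start;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // Starting a shard reads ~100 bytes of a file footer at an arbitrary offset.
        System.out.println(bytesFetched(40 * mb + 5, 100, 32 * mb));    // 33554432: one full 32MB range
        System.out.println(bytesFetched(40 * mb + 5, 100, 128 * 1024)); // 131072: one 128KB range
    }
}
```

With many shards and several such reads per shard, the difference between fetching one 32MB range and one 128KB range per read dominates mount time on a constrained link.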
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Dec 11, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Member

@tlrx tlrx left a comment


LGTM (sorry for the conflicting file)

Contributor

@DaveCTurner DaveCTurner left a comment


LGTM2, I suggested a couple of comments.

@@ -71,6 +71,13 @@
        MAX_SNAPSHOT_CACHE_RANGE_SIZE, // max
        Setting.Property.NodeScope
    );
    public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RECOVERY_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
Contributor

@DaveCTurner DaveCTurner Dec 14, 2020


Suggest a comment so we remember why we are doing this:

Suggested change

    -    public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RECOVERY_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
    +    /**
    +     * Starting up a shard involves reading small parts of some files from the repository, independently of the pre-warming process. If we
    +     * expand those ranges using {@link CacheService#SNAPSHOT_CACHE_RANGE_SIZE_SETTING} then we end up reading quite a few 32MB ranges. If
    +     * we read enough of these ranges for the restore throttling rate limiter to kick in then all the read threads will end up waiting on
    +     * the throttle, blocking subsequent reads. By using a smaller read size during restore we avoid clogging up the rate limiter so much.
    +     */
    +    public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RECOVERY_RANGE_SIZE_SETTING = Setting.byteSizeSetting(

Also suggest a similar comment on the other setting since this came up as a question in the investigation that led to this PR.

    /**
     * If a search needs data from the repository then we expand it to a larger contiguous range whose size is determined by this setting,
     * in anticipation of needing nearby data in subsequent reads. Repository reads typically have quite high latency (think ~100ms) and
     * the default of 32MB for this setting represents the approximate point at which size starts to matter. In other words, reads of
     * ranges smaller than 32MB don't usually happen much quicker, so we may as well expand all the way to 32MB ranges.
     */
    public static final Setting<ByteSizeValue> SNAPSHOT_CACHE_RANGE_SIZE_SETTING = Setting.byteSizeSetting(
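The throttling effect described in the suggested comment can be put in rough numbers. A back-of-envelope sketch (all figures here are assumptions for illustration, not measurements from this PR): a shared restore rate limiter admits a fixed number of bytes per second in total, so the time until every shard's startup reads have drained through it scales with the total bytes read, and shrinking each expanded range shrinks that time proportionally.

```java
// Illustrative arithmetic only: a shared restore rate limiter admits `ratePerSec`
// bytes per second in total, so all startup reads together take totalBytes / rate.
public class MountTimeSketch {
    static double secondsUntilAllStarted(int shards, int readsPerShard, long bytesPerRead, long ratePerSec) {
        long totalBytes = (long) shards * readsPerShard * bytesPerRead;
        return (double) totalBytes / ratePerSec;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // Assumed workload: 20 shards, ~10 small startup reads each, 40 MB/s rate limit.
        System.out.println(secondsUntilAllStarted(20, 10, 32 * mb, 40 * mb));    // 160.0s with 32MB ranges
        System.out.println(secondsUntilAllStarted(20, 10, 128 * 1024, 40 * mb)); // 0.625s with 128KB ranges
    }
}
```

Under these assumed numbers, every read thread spends almost all of its time parked on the limiter in the 32MB case, which is the clogging behaviour the comment describes.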

@henningandersen henningandersen merged commit 6377c5e into elastic:master Dec 14, 2020
henningandersen added a commit that referenced this pull request Dec 14, 2020
@henningandersen
Contributor Author

Thanks Tanguy and David!

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 14, 2020
* elastic/master: (33 commits)
  Add searchable snapshot cache folder to NodeEnvironment (elastic#66297)
  [DOCS] Add dynamic runtime fields to docs (elastic#66194)
  Add HDFS searchable snapshot integration (elastic#66185)
  Support canceling cross-clusters search requests (elastic#66206)
  Mute testCacheSurviveRestart (elastic#66289)
  Fix cat tasks api params in spec and handler (elastic#66272)
  Snapshot of a searchable snapshot should be empty (elastic#66162)
  [ML] DFA _explain API should not fail when none field is included (elastic#66281)
  Add action to decommission legacy monitoring cluster alerts (elastic#64373)
  move rollup_index param out of RollupActionConfig (elastic#66139)
  Improve FieldFetcher retrieval of fields (elastic#66160)
  Remove unsed fields in `RestAnalyzeAction` (elastic#66215)
  Simplify searchable snapshot CacheKey (elastic#66263)
  Autoscaling remove feature flags (elastic#65973)
  Improve searchable snapshot mount time (elastic#66198)
  [ML] Report cause when datafeed extraction encounters error (elastic#66167)
  Remove suggest reference in some API specs (elastic#66180)
  Fix warning when installing a plugin for different ESversion (elastic#66146)
  [ML] make `xpack.ml.max_ml_node_size` and `xpack.ml.use_auto_machine_memory_percent` dynamically settable (elastic#66132)
  [DOCS] Add `require_alias` to Bulk API (elastic#66259)
  ...
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Dec 21, 2020
In elastic#66198 a setting was introduced to reduce the range size used for
searchable snapshots during recovery; unfortunately, it was not
registered and was therefore not settable.
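The follow-up fix matters because defining a Setting constant is not enough on its own; the setting only becomes usable once it is registered with the node's settings infrastructure. A minimal sketch of that behaviour (assumed names, not Elasticsearch internals):

```java
import java.util.Map;
import java.util.Set;

// Hypothetical registry: settings validation rejects any key that was never
// registered, so a defined-but-unregistered setting cannot be set by users.
public class SettingsRegistry {
    private final Set<String> registeredKeys;

    SettingsRegistry(Set<String> registeredKeys) {
        this.registeredKeys = registeredKeys;
    }

    void validate(Map<String, String> userSettings) {
        for (String key : userSettings.keySet()) {
            if (!registeredKeys.contains(key)) {
                throw new IllegalArgumentException("unknown setting [" + key + "]");
            }
        }
    }
}
```

In this sketch, a registry built without the new recovery range-size key would throw for exactly that key, matching the symptom the follow-up commits describe.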
henningandersen added a commit that referenced this pull request Dec 29, 2020
henningandersen added a commit that referenced this pull request Dec 29, 2020
henningandersen added a commit that referenced this pull request Dec 29, 2020
Labels
:Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs), >enhancement, Team:Distributed (Obsolete), v7.11.0, v8.0.0-alpha1
Projects
None yet
Development

5 participants