Additional docs for shared_cache searchable snapshots #70566
Pinging @elastic/es-docs (Team:Docs)
Pinging @elastic/es-distributed (Team:Distributed)
Left some minor suggestions, but otherwise LGTM.
repository. No replicas are needed, and no complicated monitoring or orchestration
is necessary to restore lost shards.

In the "full copy" mode, {es} restores a full copy of the {search-snap} shards

Suggested change:
- In the "full copy" mode, {es} restores a full copy of the {search-snap} shards
+ In the _full copy_ mode, {es} restores a full copy of the {search-snap} shards
restored just yet. If a search hits such a {search-snap} shard before it has been
fully restored, {es} eagerly retrieves the data needed to complete the search.

In the "shared cache" mode, {es} only stores small parts of the data locally on

Suggested change:
- In the "shared cache" mode, {es} only stores small parts of the data locally on
+ In the _shared cache_ mode, {es} only stores small parts of the data locally on
In the "shared cache" mode, {es} only stores small parts of the data locally on
a node in the cluster. This node-local shared cache has a fixed size and evicts
mapped file parts based on a "least-frequently-used" policy.

Suggested change:
- mapped file parts based on a "least-frequently-used" policy.
+ mapped file parts based on a least-frequently-used policy.
Using {search-snap} in the "shared cache" mode, where only parts of the
data are locally cached on the nodes in the cluster, requires configuring a
shared snapshot cache which is used to hold a copy of just the
frequently-accessed parts of shards of indices which are mounted with
`?storage=shared_cache`. The `shared_cache` storage option is for example used
by the <<ilm-searchable-snapshot,ILM searchable snapshot action>> in the
<<frozen-tier,frozen tier>>.

Suggested change:
- Using {search-snap} in the "shared cache" mode, where only parts of the
- data are locally cached on the nodes in the cluster, requires configuring a
- shared snapshot cache which is used to hold a copy of just the
- frequently-accessed parts of shards of indices which are mounted with
- `?storage=shared_cache`. The `shared_cache` storage option is for example used
- by the <<ilm-searchable-snapshot,ILM searchable snapshot action>> in the
- <<frozen-tier,frozen tier>>.
+ Using {search-snap} in the shared cache mode, where only parts of the
+ data are locally cached on the nodes in the cluster, requires configuring a
+ shared snapshot cache. The shared snapshot cache is used to hold a copy of just the
+ frequently-accessed parts of shards of indices, which are mounted with
+ `?storage=shared_cache`. The `shared_cache` storage option is used
+ by the <<ilm-searchable-snapshot,ILM searchable snapshot action>> in the
+ <<frozen-tier,frozen tier>>.

<<frozen-tier,frozen tier>>.

If you configure a node to have a shared cache (disabled by default) then
that node will fully reserve the specified amount of space for the cache at

Suggested change:
- that node will fully reserve the specified amount of space for the cache at
+ that node reserves the specified amount of space for the cache at
gigabytes up to 90% of available disk space, if the node is to be exclusively
used for indices mounted with the `shared_cache` option.

Suggested change:
- gigabytes up to 90% of available disk space, if the node is to be exclusively
- used for indices mounted with the `shared_cache` option.
+ gigabytes up to 90% of available disk space if the node is exclusively
+ used for indices mounted with the `shared_cache` option.
Configuring a shared cache that can hold up to 4 terabytes of data is done by
adding the following line to your `elasticsearch.yml` file:

Suggested change:
- Configuring a shared cache that can hold up to 4 terabytes of data is done by
- adding the following line to your `elasticsearch.yml` file:
+ To configure a shared cache that can hold up to 4 terabytes of data,
+ add the following entry to `elasticsearch.yml`:
On {ess}, the frozen tier is not fully integrated yet and requires a simple
manual configuration step.

Users in {ess} will have to chose one of the existing tiers in Cloud
(hot/warm/cold) to run the frozen tier functionality on. This can be configured
by link:{cloud}/ec-add-user-settings.html[adding the `xpack.searchable.snapshot.shared_cache.size` user setting]
to one of the existing tiers in the Elasticsearch Service Console.

Depending on whether the hot/warm/cold tier is to be exclusively used for the
new frozen functionality or whether it is to be shared with other data
on that tier, the shared_cache.size can be configured from just a few
gigabytes up to 90% of the available disk space.

Suggested change:
- On {ess}, the frozen tier is not fully integrated yet and requires a simple
- manual configuration step.
- Users in {ess} will have to chose one of the existing tiers in Cloud
- (hot/warm/cold) to run the frozen tier functionality on. This can be configured
- by link:{cloud}/ec-add-user-settings.html[adding the `xpack.searchable.snapshot.shared_cache.size` user setting]
- to one of the existing tiers in the Elasticsearch Service Console.
- Depending on whether the hot/warm/cold tier is to be exclusively used for the
- new frozen functionality or whether it is to be shared with other data
- on that tier, the shared_cache.size can be configured from just a few
- gigabytes up to 90% of the available disk space.
+ To use the frozen functionality in {ess}, you currently need to manually configure the shared cache on one of the existing tiers (hot/warm/cold).
+ From the Elasticsearch Service console, link:{cloud}/ec-add-user-settings.html[add the `xpack.searchable.snapshot.shared_cache.size` user setting]
+ to the selected tier.
+ If the tier is used exclusively as the frozen tier, the cache can be allotted up to 90% of the available disk space. If the tier is shared--for example, holds both cold and frozen data--the shared cache size might be just a few gigabytes.
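
For illustration only, the sizing guidance in this thread (up to 90% of disk space on a dedicated node) can be sketched as a small helper that renders the `elasticsearch.yml` entry. The function names and the `percent` parameter are hypothetical and not part of Elasticsearch; the `b` byte-unit suffix is assumed to be acceptable for this setting.

```python
# Hypothetical helper, not part of Elasticsearch: computes an upper bound for
# the shared cache on a dedicated node and renders the elasticsearch.yml entry.
SETTING = "xpack.searchable.snapshot.shared_cache.size"


def max_shared_cache_bytes(total_disk_bytes: int, percent: int = 90) -> int:
    """Return `percent` of the disk size, capped at the 90% discussed above."""
    if total_disk_bytes < 0:
        raise ValueError("total_disk_bytes must be non-negative")
    if not 0 < percent <= 90:
        raise ValueError("percent must be in the range 1..90")
    # Integer arithmetic avoids floating-point rounding surprises.
    return total_disk_bytes * percent // 100


def render_setting(size_bytes: int) -> str:
    """Render the setting line; the 'b' (bytes) unit suffix is an assumption."""
    return f"{SETTING}: {size_bytes}b"


if __name__ == "__main__":
    print(render_setting(max_shared_cache_bytes(1000)))
```

On a tier shared with other data, a much smaller value (a few gigabytes) would be chosen by hand instead of the 90% bound.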
Co-authored-by: debadair <[email protected]>
Thanks @ywelsch.
Your original copy LGTM with @debadair's edits.
I pushed a few commits to reformat and reorganize the docs. However, feel free to revert those commits if wanted.
It's outside the scope of this PR, but I think we'll want to look at refactoring the searchable snapshot docs soon. It seems that we repeat the same info a lot. The new storage options are also big game changers conceptually.
I did have two bigger questions related to this PR:
- We support searchable snapshots in ILM's hot phase. What is the default `storage` option used for that phase (which is presumably also the hot tier)? What option is used if I opt out of tiers entirely and just do custom allocation?
- If ESS users configure a cache using `xpack.searchable.snapshot.shared_cache.size`, does ES automatically turn on the cache option for the tier? Does this apply to on-prem ES users too?
[[full-copy]]
Full copy::
Loads a full, local copy of the snapshotted index's shards into the cluster. The
cold tier uses this option by default.
Does the hot tier also use this option by default? I was wondering what was used for searchable snapshots created during the hot phase.
Saying "The cold tier uses this option by default." is a bit tricky, as it might be unclear to a user what this means (how is it determined whether the API is called "in the cold tier" or "in the frozen tier"?). The more technically precise description is that ILM's searchable snapshot action (which internally calls the mount API) uses "full copy" in the hot/cold phase and "shared cache" in the frozen phase. ILM also links the phases to the tiers (but only very implicitly, which we can perhaps gloss over here).
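
As a rough sketch of the mapping described in this comment, the phase-to-storage relationship could be written as a lookup table. The names below are hypothetical; only the hot/cold to `full_copy` and frozen to `shared_cache` pairs come from this thread, and rejecting other phases is an assumption made for the example.

```python
# Illustrative mapping only; ILM itself does not expose such a table.
ILM_PHASE_STORAGE = {
    "hot": "full_copy",
    "cold": "full_copy",
    "frozen": "shared_cache",
}


def storage_for_phase(phase: str) -> str:
    """Return the mount storage option used by ILM's searchable snapshot action."""
    try:
        return ILM_PHASE_STORAGE[phase]
    except KeyError:
        # Assumption: phases not mentioned in the thread are rejected here.
        raise ValueError(f"no searchable snapshot storage known for phase {phase!r}")


if __name__ == "__main__":
    for phase in ("hot", "cold", "frozen"):
        print(phase, "->", storage_for_phase(phase))
```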
. Choose an existing tier to use. Typically, you'll use the cold tier, but the
hot and warm tiers are also supported. You can use this tier as a shared tier, or
you can dedicate the tier exclusively to the shared snapshot cache.
I felt that we should give some kind of recommendation here. Using the cold tier made sense to me, but I wasn't sure how well this would play with ILM defaults.
We support searchable snapshots in ILM's hot phase. What is the default storage option used for that phase (which is presumably also the hot tier)?
Answered in-line (see #70566 (comment))
What option is used if I opt out of tiers entirely and just do custom allocation?
The mount API defaults to "full_copy"
If ESS users configure a cache using xpack.searchable.snapshot.shared_cache.size, does ES automatically turn on the cache option for the tier? Does this apply to on-prem ES users too?
I'm not sure I understand the question. Configuring `xpack.searchable.snapshot.shared_cache.size` on a subset of nodes is a requirement for a snapshotted index to be successfully mounted with the shared_cache option (otherwise cluster health is red). As answered in the other thread (see #70566 (comment)), the tier connection is only made indirectly via ILM. This means that if someone wants to use the frozen tier (i.e. ILM's frozen phase), they currently need to set `xpack.searchable.snapshot.shared_cache.size` on a subset of the nodes (ideally the nodes with the frozen data role). This applies to both on-prem as well as Cloud.
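
To make the mount behaviour above concrete, here is a minimal sketch that only builds (and does not send) a mount request, defaulting to `full_copy` as stated in this reply. The function name and the example repository, snapshot, and index names are hypothetical; the request shape assumes the mount API's `POST /_snapshot/<repository>/<snapshot>/_mount` form with a `storage` query parameter.

```python
import json
from urllib.parse import quote, urlencode

VALID_STORAGE = ("full_copy", "shared_cache")


def build_mount_request(repository: str, snapshot: str, index: str,
                        storage: str = "full_copy"):
    """Build (method, path, body) for a mount call without sending it."""
    if storage not in VALID_STORAGE:
        raise ValueError(f"storage must be one of {VALID_STORAGE}")
    # Percent-encode path segments so unusual names cannot break the URL.
    path = f"/_snapshot/{quote(repository, safe='')}/{quote(snapshot, safe='')}/_mount"
    query = urlencode({"storage": storage})
    body = json.dumps({"index": index})
    return "POST", f"{path}?{query}", body


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_mount_request("my_repo", "my_snap", "my-index", "shared_cache"))
```

A `shared_cache` mount built this way would still need `xpack.searchable.snapshot.shared_cache.size` configured on some nodes, as explained above.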
regular index, with minimal need to access the snapshot repository.

[[shared-cache]]
Shared cache::
needs to be marked as experimental
=== Mount options

To search a snapshot, you must first mount it locally as an index. There are two
mounting options, each with a different local storage footprint:
Should we link to the mount API here?
to hold a copy of just the frequently-accessed parts of shards of indices which
are mounted with `?storage=shared_cache`. If you configure a node to have a
shared cache then that node will reserve space for the cache when it starts up.
To mount a shared cache of a snapshotted index, you must use the
I find the terminology of "mounting a shared cache" a bit odd and would rather go with something like "mounting a snapshotted index using the shared cache storage".

[discrete]
[[mount-options]]
=== Mount options
I would rather move this section below the "using searchable snapshots" section, maybe integrating it with "how searchable snapshots work" better, because we prefer users to use ILM for getting data into searchable snapshots and I think having this detail come first will guide users in the wrong direction.
Moves details of mount options into "how searchable snapshots work" section
Thanks for the feedback and changes @ywelsch @DaveCTurner. I left a few nits that can be ignored if wanted. Otherwise LGTM.
Co-authored-by: James Rodewig <[email protected]>
LGTM
Thank you all!
This adds additional documentation for shared_cache searchable snapshots that are targeting the frozen tier:
- it generalizes the introduction section on searchable snapshots, mentioning that they come in two flavors now as well as the relation to cold and frozen tiers,
- it expands the shared_cache section and
- it adds Cloud-specific instructions for getting started with the frozen tier

Co-authored-by: James Rodewig <[email protected]>
Co-authored-by: debadair <[email protected]>
Co-authored-by: David Turner <[email protected]>