[DOCS] Add searchable snapshots topic. #63040
I added a couple of comments. I am unsure how we want to frame searchable snapshots in the docs so that it is clear that only the full-copy version is available now, while leaving room to add the other variant in the future without too much confusion. I would be inclined not to introduce the fully remote variant in the docs now.
This initial version is something of an exploration of how we want to talk about searchable snapshots. I included the "fully-remote" option in this draft to make sure the descriptions we use now leave room for talking about it later. It introduces a number of new concepts/terms:
We might be able to come up with a better name than "fully-remote storage". I'll add the other terms to the glossary & include them in this PR so we can fine-tune the definitions.
bf45c7c to df1d881
I agree with Henning, this "fully-remote storage" is not implemented. By mentioning it, I'm afraid that it would confuse users or trigger questions that we don't have the answers to yet. I think it should be removed from the glossary and any doc.
@elasticmachine retest this please
I've expanded the conceptual docs a bit, see d4292ec. I think they cover everything I wanted to say, but please check for gaps. I have avoided the "snapshot-backed index" terminology in favour of just calling them "searchable snapshots" or "searchable snapshot indices". I realise this is not so technically correct and we could go back on that and make a clearer distinction between the index and the snapshot behind it if we'd prefer.
Thanks @DaveCTurner. I left a few comments, in particular on the terminology. Otherwise looking good.
{search-snaps-cap} let you reduce your operating costs by treating the snapshot
as the authoritative copy of some of your indices. The high reliability of the
snapshot repository removes the need to keep multiple copies of their data in
`their` seems wrong, did you mean `the`?
I meant `their` (as in "belonging to the indices") but it's awkward. Reworded in 29402a5
A searchable snapshot can be searched just like any other index.
{search-snaps-cap} are often used to access a large archive of historical data,
for which searches may sometimes be complex and time-consuming.
I wonder if we are here framing searchable snapshots as being slow, which is not necessarily the case. I think I get that the time span of the search will be large and this adds to query time, hence async search is necessary, but I would like to move that to the meat of the section more than this introducing paragraph.
I was trying not to, but yes it still suggests slowness. I moved this to a `TIP` in 8beb64b and replaced this sentence with one indicating that performance should be similar to a regular index.
other index. You can, for instance, use <<shard-allocation-filtering>> to
restrict these shards to a subset of your nodes.

Normally you will use {search-snaps-cap} via the
"use" implies "search" in my head, I would prefer to say "create" or "mount" here?
I wanted to talk about something more general than just creating them -- after all ILM does also take care of aliases which lets you search them too. How about `manage`? 90a2184
If a node fails while holding some zero-replica searchable snapshot then there
will be a brief window of time before {es} allocates these shards elsewhere.
During this window of time the cluster health will be `red` and searches that
I believe the cluster will currently only be yellow unless we worked on this recently?
Ah yes now that the recovery source is always "snapshot" this should be the case. Hedging my bets in c43841d by saying "not green".
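As an illustration of the "not green" wording, a monitoring check can avoid assuming a particular degraded colour. The sketch below is not part of this PR: the function name is hypothetical, and it assumes only that the cluster health API response carries a `status` field whose value is `green`, `yellow`, or `red`.

```python
from typing import Any, Dict


def is_fully_allocated(health: Dict[str, Any]) -> bool:
    """Return True only when the reported cluster health status is green.

    `health` is assumed to be the parsed JSON body of a cluster health
    response. Anything other than "green" (including a missing status)
    is treated as "not green", matching the hedged wording in the docs.
    """
    return health.get("status") == "green"
```

Written this way, the check behaves the same whether the cluster reports `yellow` or `red` during the window in which shards are being reallocated.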
docs/reference/glossary.asciidoc
Outdated
@@ -450,6 +456,12 @@ in the <<glossary-mapping,mapping>>.
// end::routing-def[]
--

[[glossary-searchable-snapshot]] searchable snapshot ::
// tag::searchable-snapshot-def[]
A <<glossary-snapshot, snapshot>> of an index or data stream that resides in a remote data store
I wonder if the "searchable snapshot" object is the index or the snapshot? We talked about "snapshot-backed index", which is clearly the index. I think of it as the index that makes the snapshot searchable.
We could also define "searchable snapshot index/indices" here instead.
I find the current definition here slightly confusing in that any snapshot can be made searchable.
I have reworked the glossary entries in c026c9f.
@elasticmachine run elasticsearch-ci/docs
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
Pinging @elastic/es-docs (>docs)
I think this is ready for a full review now; @debadair your input would be useful but you opened the PR so I can't request a review from you. The preview is up and running at https://elasticsearch_63040.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/searchable-snapshots.html
I think there's still some redundancy with the last section that could be cleaned up later.
@@ -450,6 +450,22 @@ in the <<glossary-mapping,mapping>>.
// end::routing-def[]
--

[[glossary-searchable-snapshot]] searchable snapshot ::
// tag::searchable-snapshot-def[]
I think we're still not quite there with these definitions. If a "searchable snapshot" is an index mounted from snapshot, do we even need the notion of the searchable snapshot index? As written, these definitions don't make the distinction between them clear.
I'd like us to be able to distinguish the index-in-the-snapshot from the index-in-the-cluster. In principle we are searching the index-in-the-snapshot, hence "searchable snapshot", and we implement this today by creating a corresponding index-in-the-cluster. I think the distinction is important since we may in future support searches directly against snapshots too. I've changed the wording slightly: "index in a snapshot" -> "snapshot of an index" -- does that help?
I think of it more like "searchable snapshot" is the concept, whereas a "searchable snapshot index" is a concrete index backed by a searchable snapshot. I.e., there is no object anywhere that is a "searchable snapshot", since all snapshots can be made searchable (through a "searchable snapshot index"). But I am also ok with the current text.
> do we even need the notion of the searchable snapshot index
I think referring to an index as just a "searchable snapshot" is unintuitive, since it is an index, not a snapshot.
docs/reference/glossary.asciidoc
Outdated
An index in a <<glossary-snapshot, snapshot>> that is mounted as a
<<glossary-searchable-snapshot-index, searchable snapshot index>> and can be
searched as if it were a regular index.
An index in a <<glossary-snapshot, snapshot>> that is mounted as a
<<glossary-searchable-snapshot-index, searchable snapshot index>> and can be
searched as if it were a regular index.
A read-only index mounted from a <<glossary-snapshot, snapshot>> that can be searched like any other index. Searchable snapshots do not need
<<glossary-replica-shard,replica shards>> for resilience, since their data is
reliably stored in the snapshot repository.
[[glossary-searchable-snapshot-index]] searchable snapshot index ::
// tag::searchable-snapshot-index-def[]
An <<glossary-index, index>> whose data is stored in a <<glossary-snapshot,
snapshot>> that resides in a separate <<glossary-snapshot-repository,snapshot
repository>> such as AWS S3. Searchable snapshot indices do not need
<<glossary-replica-shard,replica>> shards for resilience, since their data is
reliably stored outside the cluster.
// end::searchable-snapshot-index-def[]
[[glossary-searchable-snapshot-index]] searchable snapshot index ::
// tag::searchable-snapshot-index-def[]
An <<glossary-index, index>> whose data is stored in a <<glossary-snapshot,
snapshot>> that resides in a separate <<glossary-snapshot-repository,snapshot
repository>> such as AWS S3. Searchable snapshot indices do not need
<<glossary-replica-shard,replica>> shards for resilience, since their data is
reliably stored outside the cluster.
// end::searchable-snapshot-index-def[]
We recommend that you <<indices-forcemerge, force-merge>> indices to a single
segment per shard before mounting them as {search-snaps}. Each read from a
snapshot repository takes time and costs money, and the fewer segments there
are the fewer reads are needed to restore the snapshot.
Moved up
We recommend that you <<indices-forcemerge, force-merge>> indices to a single
segment per shard before mounting them as {search-snaps}. Each read from a
snapshot repository takes time and costs money, and the fewer segments there
are the fewer reads are needed to restore the snapshot.
I'd rather this was down here. It's not very important to force-merge things before mounting them, and if you're mounting an existing snapshot you basically have no choice since you can't do anything about the segment count without restoring each index, merging it and re-snapshotting it.
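For readers following the thread, the force-merge and mount steps under discussion might look roughly like the request builders below. This is a minimal sketch and not part of the PR: the function names and the index, repository, and snapshot names are hypothetical, and the code only assembles the REST requests (method, path, query parameters, JSON body) without sending them.

```python
from typing import Any, Dict, Tuple

# (HTTP method, path, query parameters, JSON body)
Request = Tuple[str, str, Dict[str, Any], Dict[str, Any]]


def force_merge_request(index: str) -> Request:
    """Build a force-merge request that merges each shard down to one segment."""
    return ("POST", f"/{index}/_forcemerge", {"max_num_segments": 1}, {})


def mount_request(repository: str, snapshot: str, index: str) -> Request:
    """Build a request that mounts one index from a snapshot as a searchable snapshot."""
    path = f"/_snapshot/{repository}/{snapshot}/_mount"
    # The index to mount is named in the request body.
    return ("POST", path, {"wait_for_completion": "true"}, {"index": index})
```

As the comment above points out, the force-merge only helps if it happens before the snapshot is taken, so a snapshot step would sit between these two requests; for an existing snapshot only the mount request applies.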
{search-snaps-cap} are ideal for managing a large archive of historical data.
Historical information is typically searched less frequently than recent data
and therefore may not need replicas for their performance benefits.

You can use <<async-search>> with {search-snaps}, which is especially useful
for more complex or time-consuming searches.
{search-snaps-cap} are ideal for managing a large archive of historical data.
Historical information is typically searched less frequently than recent data
and therefore may not need replicas for their performance benefits.
You can use <<async-search>> with {search-snaps}, which is especially useful
for more complex or time-consuming searches.
{search-snaps-cap} are ideal for managing large archives of historical data.
Historical information is typically searched less frequently than recent data
and performance is less important.
For more complex or time-consuming searches, you can use <<async-search>> with {search-snaps}.
I'd rather keep this wording as-is: "performance is less important" suggests to me that there's a general performance penalty for using searchable snapshots -- in fact the only drawback most of the time is the lack of replicas.
I'll apply the change to the wording re. async searches separately.
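To make the async search point concrete, a long-running search against a mounted index could be submitted and later polled roughly as follows. This is a sketch under stated assumptions, not text from the PR: the function names, index name, and timeout values are hypothetical, and the code only builds the requests rather than sending them.

```python
from typing import Any, Dict, Tuple

# (HTTP method, path, query parameters, JSON body)
Request = Tuple[str, str, Dict[str, str], Dict[str, Any]]


def submit_async_search(index: str, query: Dict[str, Any]) -> Request:
    """Build a request that starts an async search on the given index."""
    params = {
        # Return early if the search has not finished within this time.
        "wait_for_completion_timeout": "1s",
        # How long the search and its results stay available for polling.
        "keep_alive": "1d",
    }
    return ("POST", f"/{index}/_async_search", params, {"query": query})


def get_async_search(search_id: str) -> Request:
    """Build a request that fetches the status or results of an async search."""
    return ("GET", f"/_async_search/{search_id}", {}, {})
```

The request is the same as for a regular index, which fits the point made in this thread that searchable snapshots are not inherently slow; async search is simply convenient when a query spans a large archive.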
LGTM. Left a number of mostly minor comments.
You can control the allocation of the shards of {search-snap} indices using the
same mechanisms as for regular indices. For example, you could use
<<shard-allocation-filtering>> to restrict {search-snap} shards to a subset of
your nodes.
I wonder if this should go below the ILM section in the interest of explaining the "easy/normal" option first and then the more advanced option afterwards?
I'm ambivalent -- I put it here since we're starting off by talking about how these indices are mostly manipulated (searched & allocated) as if they were normal indices, but I've moved it in 990707b.
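The allocation-filtering example mentioned in the docs text could be sketched as an index settings update like the one below. This is illustrative only and not part of the PR: the function name, the index name, and the custom node attribute (`box_type` in the usage note) are hypothetical, and the code builds the request without sending it.

```python
from typing import Any, Dict, Tuple

# (HTTP method, path, JSON body)
Request = Tuple[str, str, Dict[str, Any]]


def allocation_filter_request(index: str, attribute: str, value: str) -> Request:
    """Build a settings update that requires shards to sit on matching nodes.

    `attribute` is a custom node attribute assumed to be configured on the
    nodes that should hold the searchable snapshot shards.
    """
    setting = f"index.routing.allocation.require.{attribute}"
    return ("PUT", f"/{index}/_settings", {setting: value})
```

For example, `allocation_filter_request("logs-archive", "box_type", "cold")` would restrict the shards of `logs-archive` to nodes tagged with that attribute value, which is the same mechanism used for regular indices.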
[[using-searchable-snapshots]]
=== Using {search-snaps}

Searching a {search-snap} is the same as searching any other index. Search
Should this be "Searching a searchable snapshot index", to be consistent with the glossary? I think it reads better too.
Same comment goes for a number of the "{search-snap}" mentions throughout.
Good point, I've added a few "index" or "shard" nouns throughout in 990707b.
* [DOCS] Add searchable snapshots topic.
* [DOCS] Add definitions & remove fully-remote storage.
* [DOCS] Fixed duplicate anchor.
* Expand conceptual docs for searchable snapshots
* Rewordings
* Glossary tidy-up
* Beta
* Reword
* More performance idea to a TIP
* use -> manage
* red -> not green
* Missing space?
* Update docs/reference/glossary.asciidoc
* Fix beta label
* Use more attributes, fix link titles
* Apply suggestions from code review
Co-authored-by: debadair <[email protected]>
* Reformat
* Minor rewordings
* More minor rewordings
* Address Henning's comments
Co-authored-by: David Turner <[email protected]>
Co-authored-by: James Rodewig <[email protected]>
* [DOCS] Add searchable snapshots topic. (#63040)
* [DOCS] Add searchable snapshots topic.
* [DOCS] Add definitions & remove fully-remote storage.
* [DOCS] Fixed duplicate anchor.
* Expand conceptual docs for searchable snapshots
* Rewordings
* Glossary tidy-up
* Beta
* Reword
* More performance idea to a TIP
* use -> manage
* red -> not green
* Missing space?
* Update docs/reference/glossary.asciidoc
* Fix beta label
* Use more attributes, fix link titles
* Apply suggestions from code review
Co-authored-by: debadair <[email protected]>
* Reformat
* Minor rewordings
* More minor rewordings
* Address Henning's comments
Co-authored-by: David Turner <[email protected]>
Co-authored-by: James Rodewig <[email protected]>
* Fixed glossary entries
Co-authored-by: David Turner <[email protected]>
Co-authored-by: James Rodewig <[email protected]>