
Commit e329f29

Add Package Level Documentation to o.e.r.blobstore
* Added verbose documentation for the `o.e.r.blobstore` package similar to that added for the snapshot package in elastic#38108
* Moved the documentation on the BlobStoreRepository to the package level to have things in a single place for easier readability.
1 parent ac34af5 commit e329f29


2 files changed

+203
-39
lines changed


server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

Lines changed: 3 additions & 39 deletions
@@ -121,45 +121,9 @@
121121
* <p>
122122
* This repository works with any {@link BlobStore} implementation. The blobStore could be (and preferred) lazy initialized in
123123
* {@link #createBlobStore()}.
124-
* <p>
125-
* BlobStoreRepository maintains the following structure in the blob store
126-
* <pre>
127-
* {@code
128-
* STORE_ROOT
129-
* |- index-N - JSON serialized {@link RepositoryData} containing a list of all snapshot ids and the indices belonging to
130-
* | each snapshot, N is the generation of the file
131-
* |- index.latest - contains the numeric value of the latest generation of the index file (i.e. N from above)
132-
* |- incompatible-snapshots - list of all snapshot ids that are no longer compatible with the current version of the cluster
133-
* |- snap-20131010.dat - SMILE serialized {@link SnapshotInfo} for snapshot "20131010"
134-
* |- meta-20131010.dat - SMILE serialized {@link MetaData} for snapshot "20131010" (includes only global metadata)
135-
* |- snap-20131011.dat - SMILE serialized {@link SnapshotInfo} for snapshot "20131011"
136-
* |- meta-20131011.dat - SMILE serialized {@link MetaData} for snapshot "20131011"
137-
* .....
138-
* |- indices/ - data for all indices
139-
* |- Ac1342-B_x/ - data for index "foo" which was assigned the unique id of Ac1342-B_x in the repository
140-
* | |- meta-20131010.dat - JSON Serialized {@link IndexMetaData} for index "foo"
141-
* | |- 0/ - data for shard "0" of index "foo"
142-
* | | |- __1 \ (files with numeric names were created by older ES versions)
143-
* | | |- __2 |
144-
* | | |- __VPO5oDMVT5y4Akv8T_AO_A |- files from different segments see snap-* for their mappings to real segment files
145-
* | | |- __1gbJy18wS_2kv1qI7FgKuQ |
146-
* | | |- __R8JvZAHlSMyMXyZc2SS8Zg /
147-
* | | .....
148-
* | | |- snap-20131010.dat - SMILE serialized {@link BlobStoreIndexShardSnapshot} for snapshot "20131010"
149-
* | | |- snap-20131011.dat - SMILE serialized {@link BlobStoreIndexShardSnapshot} for snapshot "20131011"
150-
* | | |- index-123 - SMILE serialized {@link BlobStoreIndexShardSnapshots} for the shard
151-
* | |
152-
* | |- 1/ - data for shard "1" of index "foo"
153-
* | | |- __1
154-
* | | .....
155-
* | |
156-
* | |-2/
157-
* | ......
158-
* |
159-
* |- 1xB0D8_B3y/ - data for index "bar" which was assigned the unique id of 1xB0D8_B3y in the repository
160-
* ......
161-
* }
162-
* </pre>
124+
* </p>
125+
* For in-depth documentation on how exactly implementations of this class interact with the snapshot functionality, please refer to the
126+
* documentation of the package {@link org.elasticsearch.repositories.blobstore}.
163127
*/
164128
public abstract class BlobStoreRepository extends AbstractLifecycleComponent implements Repository {
165129
private static final Logger logger = LogManager.getLogger(BlobStoreRepository.class);
Lines changed: 200 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,200 @@
1+
/*
2+
* Licensed to Elasticsearch under one or more contributor
3+
* license agreements. See the NOTICE file distributed with
4+
* this work for additional information regarding copyright
5+
* ownership. Elasticsearch licenses this file to you under
6+
* the Apache License, Version 2.0 (the "License"); you may
7+
* not use this file except in compliance with the License.
8+
* You may obtain a copy of the License at
9+
*
10+
* http://www.apache.org/licenses/LICENSE-2.0
11+
*
12+
* Unless required by applicable law or agreed to in writing,
13+
* software distributed under the License is distributed on an
14+
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
* KIND, either express or implied. See the License for the
16+
* specific language governing permissions and limitations
17+
* under the License.
18+
*/
19+
20+
/**
21+
* <p>This package exposes the blobstore repository used by Elasticsearch Snapshots.</p>
22+
*
23+
* <h1>Preliminaries</h1>
24+
*
25+
* <p>The {@link org.elasticsearch.repositories.blobstore.BlobStoreRepository} forms the basis of implementations of
26+
* {@link org.elasticsearch.repositories.Repository} on top of a blob store. A blobstore can be used as the basis for an implementation
27+
* as long as it provides GET, PUT and (except in the case of read-only repositories) LIST operations.
28+
* These operations are formally defined as specified by the {@link org.elasticsearch.common.blobstore.BlobContainer} interface that
29+
* any {@code BlobStoreRepository} implementation must provide via its implementation of
30+
* {@link org.elasticsearch.repositories.blobstore.BlobStoreRepository#getBlobContainer()}.</p>
31+
*
32+
* <p>The blob store is written to and read from by both the master node and the data nodes. All metadata related to the snapshots'
33+
* scope and health (i.e. the indices a snapshot contains, a snapshot of the cluster state, index metadata and the status of the snapshot)
34+
* is written by the master node.</p>
35+
* <p>For each shard, the data node holding the shard's primary writes the actual data in the form of the shard's segment files to the
36+
* repository as well as metadata about all the segment files that the repository stores for a given shard.</p>
37+
*
38+
* <p>For the specifics of how the operations on the repository are invoked during the snapshot process, please refer to the documentation
39+
* of the {@link org.elasticsearch.snapshots} package.</p>
40+
*
41+
* <p>BlobStoreRepository maintains the following structure of blobs containing data and metadata in the blob store. The exact operations
42+
* executed on these blobs are explained below.</p>
43+
* <pre>
44+
* {@code
45+
* STORE_ROOT
46+
* |- index-N - JSON serialized {@link org.elasticsearch.repositories.RepositoryData} containing a list of all snapshot ids
47+
* | and the indices belonging to each snapshot, N is the generation of the file
48+
* |- index.latest - contains the numeric value of the latest generation of the index file (i.e. N from above)
49+
* |- incompatible-snapshots - list of all snapshot ids that are no longer compatible with the current version of the cluster
50+
* |- snap-20131010.dat - SMILE serialized {@link org.elasticsearch.snapshots.SnapshotInfo} for snapshot "20131010"
51+
* |- meta-20131010.dat - SMILE serialized {@link org.elasticsearch.cluster.metadata.MetaData} for snapshot "20131010"
52+
* | (includes only global metadata)
53+
* |- snap-20131011.dat - SMILE serialized {@link org.elasticsearch.snapshots.SnapshotInfo} for snapshot "20131011"
54+
* |- meta-20131011.dat - SMILE serialized {@link org.elasticsearch.cluster.metadata.MetaData} for snapshot "20131011"
55+
* .....
56+
* |- indices/ - data for all indices
57+
* |- Ac1342-B_x/ - data for index "foo" which was assigned the unique id of Ac1342-B_x in the repository
58+
* | |- meta-20131010.dat - JSON Serialized {@link org.elasticsearch.cluster.metadata.IndexMetaData} for index "foo"
59+
* | |- 0/ - data for shard "0" of index "foo"
60+
* | | |- __1 \ (files with numeric names were created by older ES versions)
61+
* | | |- __2 |
62+
* | | |- __VPO5oDMVT5y4Akv8T_AO_A |- files from different segments see snap-* for their mappings to real segment files
63+
* | | |- __1gbJy18wS_2kv1qI7FgKuQ |
64+
* | | |- __R8JvZAHlSMyMXyZc2SS8Zg /
65+
* | | .....
66+
* | | |- snap-20131010.dat - SMILE serialized {@link org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot} for
67+
* | | | snapshot "20131010"
68+
* | | |- snap-20131011.dat - SMILE serialized {@link org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot} for
69+
* | | | snapshot "20131011"
70+
* | | |- index-123 - SMILE serialized {@link org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshots} for
71+
* | | | the shard
72+
* | |
73+
* | |- 1/ - data for shard "1" of index "foo"
74+
* | | |- __1
75+
* | | .....
76+
* | |
77+
* | |-2/
78+
* | ......
79+
* |
80+
* |- 1xB0D8_B3y/ - data for index "bar" which was assigned the unique id of 1xB0D8_B3y in the repository
81+
* ......
82+
* }
83+
* </pre>
84+
*
85+
* <h1>Getting the Repository's RepositoryData</h1>
86+
*
87+
* <p>Loading the {@link org.elasticsearch.repositories.RepositoryData} holding a list of all snapshots as well as the mapping of indices'
88+
* names to their repository {@link org.elasticsearch.repositories.IndexId} by invoking
89+
* {@link org.elasticsearch.repositories.blobstore.BlobStoreRepository#getRepositoryData} is implemented as follows:</p>
90+
* <ol>
91+
* <li>
92+
* <ol>
93+
* <li>The blobstore repository stores the {@code RepositoryData} in blobs named {@code /index-N} directly under the
94+
* repository's root.</li>
95+
* <li>The blobstore also stores the most recent {@code N} as a signed 64-bit long in the blob {@code /index.latest} directly
96+
* under the repository's root.</li>
97+
* </ol>
98+
* </li>
99+
* <li>
100+
* <ol>
101+
* <li>Find the most recent {@code RepositoryData} by getting a list of all index-N blobs through listing all blobs with prefix "index-"
102+
* under the repository root and selecting the one with the highest value for N.</li>
103+
* <li>If this operation fails because the repository's {@code BlobContainer} does not support list operations (as is the case for read-only
104+
* repositories), read the highest value of N from the index.latest blob.</li>
105+
* </ol>
106+
* </li>
107+
* <li>
108+
* <ol>
109+
* <li>Use the just determined value of {@code N} and get the "/index-N" blob and deserialize the {@code RepositoryData} from it.</li>
110+
* <li>If no value of {@code N} could be found since neither an {@code index.latest} nor any {@code index-N} blobs exist in the repository,
111+
* it is assumed to be empty and {@link org.elasticsearch.repositories.RepositoryData#EMPTY} is returned.</li>
112+
* </ol>
113+
* </li>
114+
* </ol>
115+
* <h1>Creating a Snapshot</h1>
116+
*
117+
* <h2>Initializing a Snapshot in the Repository</h2>
118+
*
119+
* <p>Creating a snapshot in the repository begins with a call to {@link org.elasticsearch.repositories.Repository#initializeSnapshot} which
120+
* the blob store repository implements as follows:</p>
121+
* <ol>
122+
* <li>Verify that no snapshot by the requested name exists.</li>
123+
* <li>Write a blob containing the cluster metadata to the root of the blob store repository at /meta-${snapshot-uuid}.dat</li>
124+
* <li>Write the metadata for each index to a blob in that index's directory at /indices/${index-snapshot-uuid}/meta-${snapshot-uuid}.dat
125+
* </li>
126+
* </ol>
127+
* TODO: This behavior is problematic, adjust these docs once https://github.com/elastic/elasticsearch/issues/41581 is fixed
128+
*
129+
* <h2>Writing Shard Data (Segments)</h2>
130+
*
131+
* <p>Once all the metadata has been written by the snapshot initialization, the snapshot process moves on to writing the actual shard data to the
132+
* repository by invoking {@link org.elasticsearch.repositories.Repository#snapshotShard} which is implemented as follows:</p>
133+
*
134+
* <p>Note:</p>
135+
* <ul>
136+
* <li>For each shard {@code i} in a given index its path in the blob store is located at {@code /indices/${index-snapshot-uuid}/${i}}</li>
137+
* <li>All the following steps are executed on the data node holding the shard's primary.</li>
138+
* </ul>
139+
* <ol>
140+
* <li>Create the {@link org.apache.lucene.index.IndexCommit} for each shard to snapshot on the shard's primary.</li>
141+
* <li>List all blobs in the shard's path.</li>
142+
* <li>Find the {@link org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshots} blob with name "index-${N}" for the
143+
* highest possible value of N in the list to get the information of what segment files are already available in the blobstore.</li>
144+
* <li>By comparing the files in the {@code IndexCommit} and the available file list from the previous step, determine the segment files
145+
* that need to be written to the blob store. For each segment to be written to the blob store, the logic generates a unique name by
146+
* combining the segment data blob prefix "__" and a UUID and writes the segment to the blobstore.</li>
147+
* <li>After completing all segment writes, a blob containing a
148+
* {@link org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot} with name "snap-${snapshot-uuid}.dat" is written to the
149+
* shard's path, containing a list of all the files referenced by the snapshot as well as some metadata about the snapshot.</li>
150+
* <li>Once all the segments and the {@code BlobStoreIndexShardSnapshot} blob have been written, an updated
151+
* {@code BlobStoreIndexShardSnapshots} blob is written to the shard's path with name "index-${(N+1)}".</li>
152+
* </ol>
153+
*
154+
* <h2>Finalizing the Snapshot</h2>
155+
*
156+
* <p>After all primaries have finished writing the necessary segment files to the blob store in the previous step, the master node moves on
157+
* to finalize the snapshot by invoking {@link org.elasticsearch.repositories.Repository#finalizeSnapshot}.</p>
158+
*
159+
* This method executes the following actions in order:
160+
* <ol>
161+
* <li>Write the {@link org.elasticsearch.snapshots.SnapshotInfo} blob for the given snapshot to the key {@code /snap-${snapshot-uuid}.dat}
162+
* directly under the repository root.</li>
163+
* <li>Write an updated {@code RepositoryData} blob to the key {@code /index-${N+1}} using the {@code N} derived when initializing the
164+
* snapshot in the first step.</li>
165+
* <li>Write the updated {@code /index.latest} blob containing the new repository generation {@code N + 1}.</li>
166+
* </ol>
167+
*
168+
* <h1>Deleting a Snapshot</h1>
169+
*
170+
* <p>Deleting a snapshot is an operation that is exclusively executed on the master node that runs through the following sequence of
171+
* actions when {@link org.elasticsearch.repositories.blobstore.BlobStoreRepository#deleteSnapshot} is invoked:</p>
172+
*
173+
* <ol>
174+
* <li>Get the current {@code RepositoryData} from the latest {@code index-N} blob at the repository root.</li>
175+
* <li>Write an updated {@code RepositoryData} blob with the deleted snapshot removed to key {@code index-${N+1}} under the repository
176+
* root.</li>
177+
* <li>Write an updated {@code index.latest} blob containing {@code N + 1}.</li>
178+
* <li>Delete the global {@code MetaData} blob {@code meta-${snapshot-uuid}.dat} stored directly under the repository root associated with the
179+
* snapshot as well as the {@code SnapshotInfo} blob at {@code /snap-${snapshot-uuid}.dat}.</li>
180+
* <li>For each index referenced by the snapshot:
181+
* <ol>
182+
* <li>Delete the snapshot's {@code IndexMetaData} at {@code /indices/${index-snapshot-uuid}/meta-${snapshot-uuid}.dat}.</li>
183+
* <li>Go through all shard directories {@code /indices/${index-snapshot-uuid}/${i}} and:
184+
* <ol>
185+
* <li>Remove the {@code BlobStoreIndexShardSnapshot} blob at {@code /indices/${index-snapshot-uuid}/${i}/snap-${snapshot-uuid}.dat}.</li>
186+
* <li>List all blobs in the shard path {@code /indices/${index-snapshot-uuid}/${i}} and build a new {@code BlobStoreIndexShardSnapshots} from
187+
* the remaining {@code BlobStoreIndexShardSnapshot} blobs in the shard, then write it to the next shard generation blob at
188+
* {@code /indices/${index-snapshot-uuid}/${i}/index-${N+1}} (The shard's generation is determined from the list of {@code index-N} blobs
189+
* in the shard directory).</li>
190+
* <li>Delete all segment blobs (identified by having the data blob prefix {@code __}) in the shard directory that are not referenced by the
191+
* just written {@code BlobStoreIndexShardSnapshots}.</li>
192+
* </ol>
193+
* </li>
194+
* </ol>
195+
* </li>
196+
* </ol>
197+
* TODO: The above sequence of actions can lead to leaking files when an index completely goes out of scope. Adjust this documentation once
198+
* https://github.com/elastic/elasticsearch/issues/13159 is fixed.
199+
*/
200+
package org.elasticsearch.repositories.blobstore;
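
The lookup described under "Getting the Repository's RepositoryData" can be sketched roughly as follows. This is an illustrative, self-contained sketch and not the code in `BlobStoreRepository`: the class name `LatestGenerationSketch`, its method names, and the `-1` sentinel for an empty repository are assumptions made for the example. A `null` listing stands in for a blob container that does not support LIST, in which case the value read from `index.latest` (if any) is used instead.

```java
import java.util.Collection;
import java.util.OptionalLong;

// Hypothetical helper, not part of Elasticsearch: resolves the generation N of the
// most recent index-N blob the way the package documentation describes it.
final class LatestGenerationSketch {

    static final String INDEX_BLOB_PREFIX = "index-";
    // Assumed sentinel meaning "no index-N blob exists, treat the repository as empty".
    static final long EMPTY_REPOSITORY_GENERATION = -1L;

    // Returns the highest N among blob names of the form index-N, if there is one.
    // Names such as "index.latest" do not match the "index-" prefix and are skipped.
    public static OptionalLong latestFromListing(Collection<String> blobNames) {
        return blobNames.stream()
            .filter(name -> name.startsWith(INDEX_BLOB_PREFIX))
            .map(name -> name.substring(INDEX_BLOB_PREFIX.length()))
            .filter(suffix -> !suffix.isEmpty() && suffix.chars().allMatch(Character::isDigit))
            .mapToLong(Long::parseLong)
            .max();
    }

    // listingOrNull: blob names under the repository root, or null if LIST is unsupported.
    // indexLatest:   the value stored in index.latest, or empty if that blob is missing.
    public static long resolveGeneration(Collection<String> listingOrNull, OptionalLong indexLatest) {
        if (listingOrNull != null) {
            // Preferred path: pick the highest N found by listing.
            return latestFromListing(listingOrNull).orElse(EMPTY_REPOSITORY_GENERATION);
        }
        // Fallback path for read-only repositories without LIST support.
        return indexLatest.orElse(EMPTY_REPOSITORY_GENERATION);
    }
}
```

With the resolved `N`, a caller would then fetch `/index-N` and deserialize the `RepositoryData`, or return the empty `RepositoryData` when the sentinel comes back.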

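
The two shard-level set comparisons in the documentation (deciding which segment files still need to be written during `snapshotShard`, and deciding which `__` data blobs are unreferenced after a delete) can be sketched as below. This is a simplified model under stated assumptions, not the repository's implementation: the class `ShardBlobSketch` and its methods are hypothetical, and files are compared by name and a checksum string only, which is an assumption of this example.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Elasticsearch.
final class ShardBlobSketch {

    // Prefix of segment data blobs, as named in the package documentation.
    static final String DATA_BLOB_PREFIX = "__";

    // commitFiles:   segment file name -> checksum for the files in the commit to snapshot.
    // alreadyStored: segment file name -> checksum for files the shard's latest index-N
    //                blob reports as already present in the blob store.
    // Returns the names of files that are missing or differ and so must be uploaded.
    public static Set<String> filesToUpload(Map<String, String> commitFiles,
                                            Map<String, String> alreadyStored) {
        return commitFiles.entrySet().stream()
            .filter(e -> !e.getValue().equals(alreadyStored.get(e.getKey())))
            .map(Map.Entry::getKey)
            .collect(Collectors.toCollection(TreeSet::new));
    }

    // blobsInShardDir: all blob names listed in the shard directory.
    // referenced:      data blob names still referenced by the newly written index-(N+1) blob.
    // Returns the data blobs that no remaining snapshot references and that can be deleted.
    public static Set<String> staleDataBlobs(Collection<String> blobsInShardDir,
                                             Set<String> referenced) {
        return blobsInShardDir.stream()
            .filter(name -> name.startsWith(DATA_BLOB_PREFIX))
            .filter(name -> !referenced.contains(name))
            .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

Blobs such as `snap-*.dat` and `index-N` never match the `__` prefix, so the cleanup step in this sketch leaves them untouched.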