Skip to content

Add Package Level Documentation to o.e.r.blobstore #42101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

original-brownbear
Copy link
Member

  • Added verbose documentation for the o.e.r.blobstore package similar to
    that added for the snapshot package in
    Add Package Level JavaDoc on Snapshots #38108
  • Moved the documentation on the BlobStoreRepository to the package
    level to have things in a single place for easier readability.

* Added verbose documentation for the `o.e.r.blobstore` package similar to
that added for the snapshot package in
elastic#38108
* Moved the documentation on the BlobStoreRepository to the package
level to have things in a single place for easier readability.
@original-brownbear original-brownbear added >docs General docs changes :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.2.0 labels May 11, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Armin, I left a few small suggestions.

*
* <p>The {@link org.elasticsearch.repositories.blobstore.BlobStoreRepository} forms the basis of implementations of
* {@link org.elasticsearch.repositories.Repository} on top of a blob store. A blobstore can be used as the basis for an implementation
* as long as it provides for GET, PUT and (except for in the case of read-only repositories) LIST operations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says that LIST is not required on read-only repositories, but PUT is. That sounds like it should be the other way round.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually neither is required for a read-only repo :) You only need GET for those, will adjust accordingly :)

* <li>
* <ol>
* <li>The blobstore repository stores the {@code RepositoryData} in blobs named with incrementing suffix {@code N} at {@code /index-N}
* directly under the repositories' root.</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/repositories'/repository's/ here and I think in most other places too, because we're only really talking about a single repository here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair point :) fixed

*
* <p>The blob store is written to and read from by both the master- as well as the data-nodes. All metadata related to a snapshots'
* scope and health (i.e. the indices a snapshot contains, a snapshot of the cluster state, index metadata and the status of the snapshot)
* are written by the master node.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the above sentence could be clearer if it were a few shorter sentences instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just dropped the part in brackets here now, it seemed really redundant? Given that the text below eventually just links to the classes that get serialized here anyway?

@original-brownbear
Copy link
Member Author

thanks for taking a look @DaveCTurner, all points addressed now I think :)

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrershov should have a look here as well before merging

Copy link
Contributor

@andrershov andrershov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@original-brownbear thanks for your PR, this documentation is really useful! I've asked for some clarifications/elaborations.

* {@link org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardSnapshot} with name {@code snap-${snapshot-uuid}.dat} is written to
* the shard's path and contains a list of all the files referenced by the snapshot as well as some metadata about the snapshot. See the
* documentation of {@code BlobStoreIndexShardSnapshot} for details on its contents.</li>
* <li>Once all the segments and the {@code BlobStoreIndexShardSnapshot} blob have been written, an updated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we elaborate here why we need both BlobStoreIndexShardSnapshot and BlobStoreIndexShardSnapshots? AFAIK, it's needed exclusively for delete snapshot operations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently use BlobStoreIndexShardSnapshots to determine the segments to upload when creating incremental snapshots of a shard as well as what to delete.
Technically speaking BlobStoreIndexShardSnapshots is redundant as you could rebuild it from all the BlobStoreIndexShardSnapshot blobs. In practice this might get tricky since it implies downloading potentially hundreds of them (you have exactly as many as you have snapshots referencing the shard).
I think I'd really just leave the details of what is in BlobStoreIndexShardSnapshot precisely to the link to that class (it contains some additional metadata about the time it took to snapshot the shard that we expose via an API). It seems to me it's easier to read this by not mentioning the duplication of data here, that just has the reader wondering why (and it's a fair question :D) when understanding that the BlobStoreIndexShardSnapshots holds the list of segments is all that really matters for the big picture here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@original-brownbear
Copy link
Member Author

@andrershov all addressed I think :)

Copy link
Contributor

@andrershov andrershov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@original-brownbear
Copy link
Member Author

Jenkins run
elasticsearch-ci/1

@original-brownbear
Copy link
Member Author

thanks @andrershov @ywelsch @DaveCTurner !

@original-brownbear original-brownbear merged commit db0fbf0 into elastic:master May 22, 2019
@original-brownbear original-brownbear deleted the document-blob-store branch May 22, 2019 05:36
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
* Add Package Level Documentation to o.e.r.blobstore

* Added verbose documentation for the `o.e.r.blobstore` package similar to
that added for the snapshot package in
elastic#38108
* Moved the documentation on the BlobStoreRepository to the package
level to have things in a single place for easier readability.
@original-brownbear original-brownbear restored the document-blob-store branch August 6, 2020 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >docs General docs changes v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants