Introduce repository test kit/analyser #67247

Merged

Conversation


@DaveCTurner DaveCTurner commented Jan 11, 2021

Today we rely on blob stores behaving in a certain way so that they can be used
as a snapshot repository. There is an increasing number of third-party blob
stores that claim to be S3-compatible but may not offer a suitably correct or
performant implementation of the S3 API. We rely on some subtle semantics with
concurrent readers and writers, which some blob stores may not implement
correctly. Hitting a corner case in the implementation may be rare in normal
use, and may be hard to reproduce or to distinguish from an Elasticsearch bug.

This commit introduces a new `POST /_snapshot/.../_analyse` API which exercises
the more problematic corners of the repository implementation, looking for
correctness bugs, and measures the details of the repository's performance
under concurrent load.
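For illustration, a request against such an API might look like the following. The parameter names shown here (`blob_count`, `max_blob_size`) are assumptions for illustration only; this PR does not pin down the final parameter set at this point:

```
POST /_snapshot/my_repository/_analyse?blob_count=100&max_blob_size=10mb
```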


@DaveCTurner DaveCTurner left a comment


Opening this draft for discussion on a few points before I proceed much further.

[[repo-speed-test-api-response-body]]
==== {api-response-body-title}

TODO
Contributor Author


Today we report the details of every read and write performed during the test. It would probably be useful to add some higher-level summary statistics too, perhaps only returning the low-level details if `?detailed` is passed. Suggestions for the higher-level stats are up for discussion.
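As a minimal sketch of what such a summary accumulator could look like (the class and method names here are illustrative, not the PR's actual implementation), a thread-safe per-operation summary tracking count, bytes, and latency:

```java
// Hypothetical sketch of a summary accumulator for per-operation statistics.
// Each read or write records its size and duration; the response can then
// report high-level stats instead of (or alongside) every individual operation.
public class OperationSummary {
    private long count;
    private long totalBytes;
    private long totalNanos;
    private long maxNanos;

    public synchronized void add(long bytes, long nanos) {
        count += 1;
        totalBytes += bytes;
        totalNanos += nanos;
        maxNanos = Math.max(maxNanos, nanos);
    }

    public synchronized long count() { return count; }

    public synchronized long totalBytes() { return totalBytes; }

    public synchronized long maxNanos() { return maxNanos; }

    public synchronized double meanNanos() {
        return count == 0 ? 0.0 : (double) totalNanos / count;
    }
}
```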

Contributor Author


Added simple accumulators in 0e904d8.

import static org.hamcrest.Matchers.lessThanOrEqualTo;
import static org.hamcrest.Matchers.startsWith;

public class RepositorySpeedTestIT extends AbstractSnapshotIntegTestCase {
Contributor Author


These tests are incomplete and do not currently ensure that the speed test detects various kinds of repository bug.

deleteContainerAndSendResponse();
} else {
logger.debug(
"expected blobs [{}] missing in [{}:{}], trying again; retry count = {}",
Contributor Author


Do we need this any more? AFAIK S3 was the only blob store that didn't have consistent listings, but it does now, doesn't it?

Contributor


I think we should require consistent listings (as well as test for it), as all currently officially supported blob stores support it.

Courtesy of http://weidagang.github.io/text-diagram/ with this input:

    object Writer Repo Readers

    note right of Writer: Write phase
    Writer -> Repo: Write blob with random content
    Writer -> Readers: Read range during write (rarely)
    Readers -> Repo: Read range
    Repo -> Readers: Contents of range, or "not found"
    Readers -> Writer: Acknowledge read, including checksum if found
    Repo -> Writer: Write complete
    space 5
    note right of Writer: Read phase
    Writer -> Readers: Read range [a,b)
    Readers -> Repo: Read range
    Writer -> Repo: Overwrite blob (rarely)
    Repo -> Readers: Contents of range
    Repo -> Writer: Overwrite complete
    Readers -> Writer: Ack read (with checksum)
    space 5
    note right of Writer: Verify phase
    Writer -> Writer: Confirm checksums
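The verify phase in the diagram hinges on the writer being able to recompute the checksum of any range a reader reports. A minimal sketch of that idea, assuming CRC32 as the checksum (the actual algorithm used in the PR is not stated here, and the class name is hypothetical):

```java
import java.util.zip.CRC32;

// Hypothetical sketch of the verify phase: the writer knows the blob's full
// contents, so it can recompute the checksum of any range [from, to) that a
// reader reported reading and confirm that the reader saw the same bytes.
public class RangeChecksum {

    public static long checksum(byte[] blob, int from, int to) {
        CRC32 crc = new CRC32();
        crc.update(blob, from, to - from); // checksum only the requested range
        return crc.getValue();
    }

    public static boolean verify(byte[] blob, int from, int to, long reportedChecksum) {
        return checksum(blob, from, to) == reportedChecksum;
    }
}
```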

@ywelsch ywelsch left a comment


I've done a first pass on this.

Regarding naming, I'm not sure I would phrase this as a "speed_test", and would like to bring the following suggestions into the discussion:

  • Repository compatibility checker (_compatibility_check)
  • Repository compatibility tester (_compatibility_test)

I think we should provide more documentation on what this tool is checking (e.g. that a read after a fresh write works as expected), describe its limitations (e.g. it can't check durability or data integrity over time, i.e. that data doesn't become corrupted, both of which are important properties of a repository), and say that it might be incomplete (it can only show the presence of issues, not their absence).

I would like the tool to better exercise the various upload techniques (e.g. single vs multi-part upload or resumable upload). This currently relies too implicitly on setting the right blob sizes.

Maybe I missed this in the docs, but should we advise to run this on a multi-node cluster? What size of cluster?

How much temp storage will be taken up in the repository if a user runs this? Let's be explicit about that in the docs.

Do we test anywhere that the deletes actually worked? (i.e. not only that the API says success).

There are quite a number of parameters to tune here, and even as someone familiar with the system I feel overloaded by the docs. Should we explicitly distinguish between reasonable settings to tune and expert settings?

);
}
final BlobContainer blobContainer = getBlobContainer();
final Map<String, BlobMetadata> blobsMap = blobContainer.listBlobs();
Contributor


should we check that sub-path semantics are properly implemented? (e.g. the trailing slash problem)
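To illustrate the trailing-slash problem mentioned above: naive path concatenation can yield `basechild` or `base//child` depending on whether the base path already ends in a separator. A sketch of the joining rule such a check would enforce (class and method names are hypothetical, not the repository's actual API):

```java
// Hypothetical sketch of sub-path joining. A repository analysis could write
// blobs under two sibling paths and verify that listing one path does not
// leak the other's blobs, which requires exactly one separator at the join.
public class BlobPaths {

    public static String join(String base, String child) {
        if (base.isEmpty()) {
            return child;
        }
        // ensure exactly one separator between the two components
        return base.endsWith("/") ? base + child : base + "/" + child;
    }
}
```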

return List.of(TestPlugin.class, LocalStateCompositeXPackPlugin.class, SnapshotRepositoryTestKit.class);
}

public void testFoo() {
Contributor


//TODO: better name

deleteContainerAndSendResponse();
} else {
logger.debug(
"expected blobs [{}] missing in [{}:{}], trying again; retry count = {}",
Contributor


I think we should require consistent listings (as well as test for it), as all currently officially supported blob stores support it.

@DaveCTurner
Contributor Author

Thanks for the review Yannick. +1 to most of the points you raise, some select comments inline:

Regarding naming, I'm not sure I would phrase this as a "speed_test", and would like to bring the following suggestions into the discussion

I know what you mean; I have the same concern with "compatibility test" or "compatibility check" as I did with "extended verification" in that if we return 200 OK then users will interpret that to mean "my repository is definitely compatible", regardless of any statements in the docs.

That said, I also find the speed test notion awkward, and I don't feel very strongly either way. I'd rather not go back and forth with the code, let's settle on a name we're happy with elsewhere and then move forward.

I would like the tool to better exercise the various upload techniques (e.g. single vs multi-part upload or resumable upload). This currently relies too implicitly on setting the right blob sizes.

Agreed, I think this needs a richer interface, something like BlobContainer#writeRandom, since each repository will need to implement this in its own way. Do you think this is needed for v1, however?

How much temp storage will be taken up in the repository if a user runs this? Let's be explicit about that in the docs.

WDYT about that being an input parameter? On reflection it sounds more useful for users to set the total size rather than the max blob size, and let us divide up the total as we see fit.
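To illustrate the total-size-as-input idea (the class, method, and parameter names here are hypothetical), the implementation could divide a total byte budget into randomly-sized, nonempty blobs whose sizes sum exactly to the budget:

```java
import java.util.Random;

// Hypothetical sketch of dividing a user-specified total size budget among
// the test blobs, rather than asking users to choose a max blob size.
// Requires totalBytes >= blobCount so that every blob is nonempty.
public class BlobSizes {

    public static long[] split(long totalBytes, int blobCount, Random random) {
        long[] sizes = new long[blobCount];
        long remaining = totalBytes;
        for (int i = 0; i < blobCount - 1; i++) {
            // leave at least one byte for each blob still to be sized
            long maxForThis = remaining - (blobCount - 1 - i);
            sizes[i] = 1 + (long) (random.nextDouble() * (maxForThis - 1));
            remaining -= sizes[i];
        }
        sizes[blobCount - 1] = remaining; // last blob takes whatever is left
        return sizes;
    }
}
```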

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 2, 2021
The repository analyzer does not write empty blobs to the repository,
and we assert that the blobs are nonempty, but the tests randomly check
for the empty case anyway. This commit ensures that the blobs used in
tests are always nonempty.

Relates elastic#67247
Labels: `:Distributed Coordination/Snapshot/Restore` (anything directly related to the `_snapshot/*` APIs), `>enhancement`, `release highlight`, `Team:Distributed (Obsolete)` (meta label for distributed team, obsolete; replaced by Distributed Indexing/Coordination), `v7.12.0`, `v8.0.0-alpha1`
6 participants