Introduce repository test kit/analyser #67247

Merged
Changes from 13 commits
Commits
50 commits
5ddd2db
Introduce repository test kit with speed test
DaveCTurner Jan 11, 2021
6dc193c
Include blob-level request in more failure messages
DaveCTurner Jan 12, 2021
5f48376
Everyone loves a sequence diagram
DaveCTurner Jan 12, 2021
6a9aea7
Merge branch 'master' into 2021-01-11-repository-speed-test
DaveCTurner Jan 15, 2021
d96d5ad
Apparently precommit doesn't like a sequence diagram
DaveCTurner Jan 15, 2021
b744d9e
Add detailed parameter
DaveCTurner Jan 15, 2021
0e904d8
Collect stats
DaveCTurner Jan 15, 2021
e32a377
Rename to summary
DaveCTurner Jan 15, 2021
c237aa5
Fix RandomBlobContentBytesReference
DaveCTurner Jan 15, 2021
36a6e52
Fixes
DaveCTurner Jan 15, 2021
280f982
Mark speed test actions as operator-only
DaveCTurner Jan 15, 2021
64c220d
Report blob-level request on all failures
DaveCTurner Jan 15, 2021
29f5d50
Reroute to arbitrary snapshot node, not the master
DaveCTurner Jan 15, 2021
75705ec
Add TODOs so the github comments don't get lost
DaveCTurner Jan 15, 2021
7180045
Expose magic read-node-count parameters
DaveCTurner Jan 16, 2021
0b78fe4
Remove another magic number
DaveCTurner Jan 16, 2021
40e7e74
Document defaults
DaveCTurner Jan 16, 2021
8db0096
Record max read waiting time
DaveCTurner Jan 16, 2021
24d7392
Document response format
DaveCTurner Jan 16, 2021
2f3094b
Merge branch 'master' into 2021-01-11-repository-speed-test
DaveCTurner Feb 5, 2021
cbe503b
License headers
DaveCTurner Feb 5, 2021
fbb0ed8
Rename speed test -> analysis
DaveCTurner Feb 5, 2021
32169f4
Implementation details etc
DaveCTurner Feb 5, 2021
452968a
Add TODO for writeBlobRandomly
DaveCTurner Feb 5, 2021
6076d34
Add TODO for delete verification
DaveCTurner Feb 5, 2021
a50c0d1
Add TODO to remove cleanup retries
DaveCTurner Feb 5, 2021
4b6d53b
Add plumbing for max_total_data_size
DaveCTurner Feb 5, 2021
2263d9f
Skip verification, it doesn't work
DaveCTurner Feb 5, 2021
6662e1c
prefer US spelling
DaveCTurner Feb 5, 2021
7b3dabe
Add bugs_detected response field
DaveCTurner Feb 5, 2021
9ecfe46
Respects max total size
DaveCTurner Feb 5, 2021
c4c3cfa
Fix action name
DaveCTurner Feb 5, 2021
d433948
Better tests, corresponding with docs
DaveCTurner Feb 5, 2021
bff99a0
Merge branch 'master' into 2021-01-11-repository-speed-test
DaveCTurner Feb 8, 2021
59ff6ec
No need for retries when listing
DaveCTurner Feb 8, 2021
b200616
Merge branch 'master' into 2021-01-11-repository-speed-test
DaveCTurner Feb 15, 2021
28b488b
Add integ tests for failure and success cases
DaveCTurner Feb 15, 2021
b383325
Unrelated change
DaveCTurner Feb 15, 2021
bda3abc
Permit listing a nonexistent FsBlobContainer
DaveCTurner Feb 15, 2021
622fe24
No-action TODOs
DaveCTurner Feb 15, 2021
a5ee4ca
Merge branch 'master' into 2021-01-11-repository-speed-test
DaveCTurner Feb 15, 2021
032a3a6
Merge branch 'master' into 2021-01-11-repository-speed-test-WIP
DaveCTurner Feb 16, 2021
4cae510
Docs improvements
DaveCTurner Feb 16, 2021
12ccaba
ActionRunnable just to be sure that exceptions are caught
DaveCTurner Feb 16, 2021
6308ccc
Human-readability
DaveCTurner Feb 16, 2021
f10e51c
Javadoc
DaveCTurner Feb 16, 2021
ffee1a7
Close input stream used for reading
DaveCTurner Feb 16, 2021
b3c6bee
TODO also non-powers-of-two
DaveCTurner Feb 16, 2021
d0887fa
Clearer comment
DaveCTurner Feb 16, 2021
505d8f0
Merge branch 'master' into 2021-01-11-repository-speed-test
DaveCTurner Feb 16, 2021

127 changes: 127 additions & 0 deletions docs/reference/snapshot-restore/apis/repo-speed-test-api.asciidoc
@@ -0,0 +1,127 @@
[role="xpack"]
[[repo-speed-test-api]]
=== Repository speed test API
++++
<titleabbrev>Repository speed test</titleabbrev>
++++

Measures the performance characteristics of a snapshot repository.

////
[source,console]
----
PUT /_snapshot/my_repository
{
  "type": "fs",
  "settings": {
    "location": "my_backup_location"
  }
}
----
// TESTSETUP
////

[source,console]
----
POST /_snapshot/my_repository/_speed_test?blob_count=10&concurrency=4&max_blob_size=1mb&timeout=120s
----

[[repo-speed-test-api-request]]
==== {api-request-title}

`POST /_snapshot/<repository>/_speed_test`

[[repo-speed-test-api-desc]]
==== {api-description-title}

Many third-party storage systems are available, but not all of them are
suitable for use as a snapshot repository by {es}. Some storage systems perform
poorly, or behave incorrectly, especially when accessed concurrently by multiple
clients as the nodes of an {es} cluster do.

The repository speed test API performs a collection of read and write
operations on your repository that are designed to detect incorrect behavior
and to measure the performance characteristics of your storage system.

Each speed test runs a wide variety of operations generated by a pseudo-random
process. You can seed this process using the optional `seed` parameter in order
to repeat the same set of operations in multiple experiments. Note that the
operations are performed concurrently, so they may not always happen in the
same order on each run.
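
To illustrate why a fixed `seed` repeats the same set of operations but not necessarily their order, here is a minimal Java sketch. It assumes a hypothetical generator, `SeededPlan.planBlobSizes`, that draws blob sizes from `java.util.Random`; it is not the test kit's real operation generator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical plan generator for illustration only; it is not the test
// kit's real operation generator.
class SeededPlan {
    // Returns `blobCount` blob sizes in the range [1, maxBlobSize]. The result
    // depends only on the arguments, so the same seed gives the same plan.
    static List<Long> planBlobSizes(long seed, int blobCount, long maxBlobSize) {
        Random random = new Random(seed);
        List<Long> sizes = new ArrayList<>(blobCount);
        for (int i = 0; i < blobCount; i++) {
            // floorMod keeps the remainder non-negative even for negative longs.
            sizes.add(Math.floorMod(random.nextLong(), maxBlobSize) + 1);
        }
        return sizes;
    }
}
```

Executing such a plan on a pool of `concurrency` threads would still let the operations finish in a different order from run to run, which is the caveat described above.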

The default values for the parameters to this API are deliberately low to
reduce the impact of running this API accidentally. A realistic experiment
should set `blob_count` to at least `2000` and `max_blob_size` to at least
`2gb`, and will almost certainly need to increase the `timeout` to allow time
for the process to complete successfully.

If the speed test is successful, this API returns details of the testing
process, including how long each operation took. You can use this information
to analyze the performance of your storage system. If any operation fails or
returns an incorrect result, this API returns an error. If the API returns an
error, then it may not have removed all the data it wrote to the repository.
The error indicates the location of any leftover data, and this path is also
recorded in the {es} logs. You should verify that this location has been
cleaned up correctly. If there is still leftover data at the specified
location, then you should remove it manually.
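
For a repository of type `fs`, one way to check the leftover location is to walk the corresponding directory on the shared filesystem. The following sketch assumes you have already mapped the path reported in the error or the logs to a local directory; `LeftoverCheck.leftoverFiles` is a hypothetical helper, and other repository types need their storage system's own tools.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for an `fs` repository: lists the regular files that
// remain under the leftover path reported in the error or the logs.
class LeftoverCheck {
    static List<Path> leftoverFiles(Path leftoverDir) {
        if (Files.exists(leftoverDir) == false) {
            return List.of(); // the path was removed completely
        }
        // Files.walk must be closed to release directory handles.
        try (Stream<Path> paths = Files.walk(leftoverDir)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An empty result suggests the cleanup succeeded; any paths returned are candidates for manual removal.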

If the connection from your client to {es} is closed while the client is
waiting for the result of the speed test, then the test is cancelled. Because a
speed test takes a long time to complete, you may need to configure your client
to wait longer than usual for a response. On cancellation, the speed test
attempts to clean up the data it was writing, but it may not be able to remove
all of it. The path to the leftover data is recorded in the {es} logs. You
should verify that this location has been cleaned up correctly. If there is
still leftover data at the specified location, then you should remove it
manually.
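
As a sketch of configuring a client to wait long enough, the following uses the JDK's `java.net.http` API (Java 11 or later) to build the request with a client-side timeout longer than the server-side `timeout` parameter, so that the client does not give up before Elasticsearch does. The base URL, the repository name, and the 60-second margin are placeholder assumptions, and authentication is omitted.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Sketch only: placeholder URL and repository, no authentication.
class SpeedTestRequest {
    static HttpRequest build(String baseUrl, String repository, Duration serverTimeout) {
        // Ask Elasticsearch to stop the test after `serverTimeout`...
        String uri = baseUrl + "/_snapshot/" + repository + "/_speed_test?timeout=" + serverTimeout.getSeconds() + "s";
        // ...and let the client wait a margin longer than that.
        return HttpRequest.newBuilder(URI.create(uri))
            .timeout(serverTimeout.plusSeconds(60))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, which waits for up to the client-side timeout.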

NOTE: A speed test writes a substantial amount of data to your repository and
then reads it back again. This consumes bandwidth on the network between the
cluster and the repository, and storage space and IO bandwidth on the
repository itself. You must ensure this load does not affect other users of
these systems. Speed tests respect the repository settings
`max_snapshot_bytes_per_sec` and `max_restore_bytes_per_sec` if available, and
the cluster setting `indices.recovery.max_bytes_per_sec`, which you can use to
limit the bandwidth they consume.
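
The following is a minimal sketch of how a bytes-per-second limit turns into pauses, using a simple "target time" calculation: after transferring some bytes, the reader sleeps for however long it is ahead of the permitted rate. It is an illustration only, not the rate limiter that Elasticsearch or Lucene actually use; `SimpleThrottle` and `pauseNanos` are hypothetical names.

```java
// Hypothetical simple rate limiter, for illustration only.
class SimpleThrottle {
    // After `bytes` have been transferred in `elapsedNanos`, returns how long
    // to pause so that the average rate stays at or below `maxBytesPerSec`.
    static long pauseNanos(long bytes, long maxBytesPerSec, long elapsedNanos) {
        // Minimum time the transfer should take at the permitted rate.
        // Computed in floating point to avoid overflowing a long for large byte counts.
        long targetNanos = (long) (bytes * 1_000_000_000.0 / maxBytesPerSec);
        return Math.max(0L, targetNanos - elapsedNanos);
    }
}
```

For example, at a limit of `40mb` per second a `10mb` blob needs at least 250 milliseconds, so tighter limits make a speed test take longer and may require a larger `timeout`.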

[[repo-speed-test-api-path-params]]
==== {api-path-parms-title}

`<repository>`::
(Required, string)
Name of the snapshot repository to test.

[[repo-speed-test-api-query-params]]
==== {api-query-parms-title}

`blob_count`::
(Optional, integer) The total number of blobs to write to the repository during
the test. Defaults to `100`. For realistic experiments you should set this to
at least `2000`.

`concurrency`::
(Optional, integer) The number of write operations to perform concurrently.
Defaults to `10`.

`seed`::
(Optional, integer) The seed for the pseudo-random number generator used to
generate the list of operations performed during the test. To repeat the same
set of operations in multiple experiments, use the same seed in each
experiment.

`max_blob_size`::
(Optional, <<size-units, size units>>) The maximum size of a blob to be written
during the test. Defaults to `10mb`. For realistic experiments you should set
this to at least `2gb`.

`timeout`::
(Optional, <<time-units, time units>>) Specifies the period of time to wait for
the test to complete. If no response is received before the timeout expires,
the test is cancelled and returns an error. Defaults to `30s`.
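
The parameters above map directly onto the query string. A minimal sketch, assuming a hypothetical `SpeedTestParams` holder that omits any parameter left at its documented default. Defaults are compared as plain strings here for simplicity, so equivalent spellings of the same value are not recognized.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical parameter holder; field defaults mirror the documented defaults.
class SpeedTestParams {
    int blobCount = 100;          // `blob_count`
    int concurrency = 10;         // `concurrency`
    Long seed = null;             // `seed`, sent only if set
    String maxBlobSize = "10mb";  // `max_blob_size`
    String timeout = "30s";       // `timeout`

    String toQueryString() {
        // LinkedHashMap keeps the parameters in the documented order.
        Map<String, String> params = new LinkedHashMap<>();
        if (blobCount != 100) params.put("blob_count", Integer.toString(blobCount));
        if (concurrency != 10) params.put("concurrency", Integer.toString(concurrency));
        if (seed != null) params.put("seed", seed.toString());
        if (!maxBlobSize.equals("10mb")) params.put("max_blob_size", maxBlobSize);
        if (!timeout.equals("30s")) params.put("timeout", timeout);
        StringJoiner joiner = new StringJoiner("&", "?", "");
        joiner.setEmptyValue(""); // no "?" when every parameter is at its default
        params.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }
}
```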

[role="child_attributes"]
[[repo-speed-test-api-response-body]]
==== {api-response-body-title}

TODO
Contributor Author

Today we report the details of every read and write performed during the test. It'd probably be useful to add some higher-level summary statistics too, maybe only returning the low-level ones if ?detailed is passed. Suggestions for the higher-level stats are up for discussion.

Contributor Author

Added simple accumulators in 0e904d8.
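
The accumulators added in 0e904d8 are not shown in this diff. As a hedged illustration only, here is a hypothetical `TimingSummary` that keeps a count, a total, and a maximum of elapsed times using `LongAdder` and `LongAccumulator`, which tolerate many concurrent writers.

```java
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical thread-safe summary of operation timings.
class TimingSummary {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    // Identity 0 is safe for the maximum because elapsed times are non-negative.
    private final LongAccumulator maxNanos = new LongAccumulator(Long::max, 0L);

    void record(long elapsedNanos) {
        count.increment();
        totalNanos.add(elapsedNanos);
        maxNanos.accumulate(elapsedNanos);
    }

    long count() { return count.sum(); }
    long totalNanos() { return totalNanos.sum(); }
    long maxNanos() { return maxNanos.get(); }
}
```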

@@ -27,6 +27,7 @@ For more information, see <<snapshot-restore>>.
 
 include::put-repo-api.asciidoc[]
 include::verify-repo-api.asciidoc[]
+include::repo-speed-test-api.asciidoc[]
 include::get-repo-api.asciidoc[]
 include::delete-repo-api.asciidoc[]
 include::clean-up-repo-api.asciidoc[]
3 changes: 3 additions & 0 deletions docs/reference/snapshot-restore/register-repository.asciidoc
@@ -258,6 +258,9 @@ POST /_snapshot/my_unverified_backup/_verify
 
 It returns a list of nodes where repository was successfully verified or an error message if verification process failed.
 
+If desired, you can also test a repository more thoroughly using the
+<<repo-speed-test-api>>.
+
 [discrete]
 [[snapshots-repository-cleanup]]
 === Repository cleanup
@@ -284,7 +284,7 @@ protected void doRun() {
         });
     }
 
-    static boolean isDedicatedVotingOnlyNode(Set<DiscoveryNodeRole> roles) {
+    public static boolean isDedicatedVotingOnlyNode(Set<DiscoveryNodeRole> roles) {
         return roles.contains(DiscoveryNodeRole.MASTER_ROLE) && roles.contains(DiscoveryNodeRole.DATA_ROLE) == false &&
             roles.stream().anyMatch(role -> role.roleName().equals("voting_only"));
     }
@@ -46,5 +46,11 @@ public RestStatus status() {
     public RepositoryVerificationException(StreamInput in) throws IOException{
         super(in);
     }
+
+    @Override
+    public synchronized Throwable fillInStackTrace() {
+        // stack trace for a verification failure is uninteresting, the message has all the information we need
+        return this;
+    }
 }

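The `fillInStackTrace` override above skips capturing the stack when the exception is created, which makes it cheaper to construct and keeps the failure output focused on the message. A minimal standalone sketch of the same idiom, using a hypothetical `QuietVerificationException`:

```java
// Hypothetical exception type demonstrating the idiom; not part of the PR.
class QuietVerificationException extends RuntimeException {
    QuietVerificationException(String message) {
        super(message);
    }

    @Override
    public synchronized Throwable fillInStackTrace() {
        return this; // do not walk the stack; the message carries the information
    }
}
```

The JDK also supports this through the protected `RuntimeException(String, Throwable, boolean, boolean)` constructor, whose last argument, `writableStackTrace`, can be set to `false`.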
@@ -2289,17 +2289,47 @@ private static ActionListener<Void> fileQueueListener(BlockingQueue<BlobStoreInd
         });
     }
 
-    private static InputStream maybeRateLimit(InputStream stream, Supplier<RateLimiter> rateLimiterSupplier, CounterMetric metric) {
-        return new RateLimitingInputStream(stream, rateLimiterSupplier, metric::inc);
+    private static InputStream maybeRateLimit(
+            InputStream stream,
+            Supplier<RateLimiter> rateLimiterSupplier,
+            RateLimitingInputStream.Listener throttleListener) {
+        return new RateLimitingInputStream(stream, rateLimiterSupplier, throttleListener);
     }
 
+    /**
+     * Wrap the restore rate limiter (controlled by the repository setting `max_restore_bytes_per_sec` and the cluster setting
+     * `indices.recovery.max_bytes_per_sec`) around the given stream. Any throttling is recorded in the value returned by {@link
+     * BlobStoreRepository#getRestoreThrottleTimeInNanos}.
+     */
     public InputStream maybeRateLimitRestores(InputStream stream) {
-        return maybeRateLimit(maybeRateLimit(stream, () -> restoreRateLimiter, restoreRateLimitingTimeInNanos),
-            recoverySettings::rateLimiter, restoreRateLimitingTimeInNanos);
+        return maybeRateLimitRestores(stream, restoreRateLimitingTimeInNanos::inc);
     }
 
+    /**
+     * Wrap the restore rate limiter (controlled by the repository setting `max_restore_bytes_per_sec` and the cluster setting
+     * `indices.recovery.max_bytes_per_sec`) around the given stream. Any throttling is reported to the given listener and not otherwise
+     * recorded in the value returned by {@link BlobStoreRepository#getRestoreThrottleTimeInNanos}.
+     */
+    public InputStream maybeRateLimitRestores(InputStream stream, RateLimitingInputStream.Listener throttleListener) {
+        return maybeRateLimit(maybeRateLimit(stream, () -> restoreRateLimiter, throttleListener),
+            recoverySettings::rateLimiter, throttleListener);
+    }
+
+    /**
+     * Wrap the snapshot rate limiter (controlled by the repository setting `max_snapshot_bytes_per_sec`) around the given stream. Any
+     * throttling is recorded in the value returned by {@link BlobStoreRepository#getSnapshotThrottleTimeInNanos()}.
+     */
     public InputStream maybeRateLimitSnapshots(InputStream stream) {
-        return maybeRateLimit(stream, () -> snapshotRateLimiter, snapshotRateLimitingTimeInNanos);
+        return maybeRateLimitSnapshots(stream, snapshotRateLimitingTimeInNanos::inc);
     }
 
+    /**
+     * Wrap the snapshot rate limiter (controlled by the repository setting `max_snapshot_bytes_per_sec`) around the given stream. Any
+     * throttling is reported to the given listener and not otherwise recorded in the value returned by {@link
+     * BlobStoreRepository#getSnapshotThrottleTimeInNanos()}.
+     */
+    public InputStream maybeRateLimitSnapshots(InputStream stream, RateLimitingInputStream.Listener throttleListener) {
+        return maybeRateLimit(stream, () -> snapshotRateLimiter, throttleListener);
+    }
+
     @Override
@@ -2539,6 +2569,10 @@ private static void failStoreIfCorrupted(Store store, Exception e) {
         }
     }
 
+    public boolean supportURLRepo() {
+        return supportURLRepo;
+    }
+
     /**
      * The result of removing a snapshot from a shard folder in the repository.
      */
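
The refactoring in the rate-limiting hunk keeps each single-argument method as a thin overload that passes a method reference to the shared counter (for example `restoreRateLimitingTimeInNanos::inc`), while the new two-argument overload lets a caller, presumably the test kit, collect throttling time separately. A minimal sketch of that pattern with hypothetical names: `ThrottleListener` stands in for `RateLimitingInputStream.Listener` (assumed here to have a single `onPause(long)` method), and `AtomicLong` stands in for `CounterMetric`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for RateLimitingInputStream.Listener.
interface ThrottleListener {
    void onPause(long nanos);
}

// Hypothetical class mirroring the shape of the overloads, without real streams.
class RestoreThrottling {
    // Shared counter, playing the role of restoreRateLimitingTimeInNanos.
    final AtomicLong restoreThrottleNanos = new AtomicLong();

    // Single-argument overload: throttling is recorded in the shared counter.
    void onThrottled(long nanos) {
        onThrottled(nanos, restoreThrottleNanos::addAndGet);
    }

    // Two-argument overload: throttling is reported only to the given listener.
    void onThrottled(long nanos, ThrottleListener throttleListener) {
        throttleListener.onPause(nanos);
    }
}
```

Because the shared counter is only touched through the single-argument overload, time reported to a caller's own listener does not inflate the repository-wide throttle statistics.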
@@ -24,7 +24,12 @@ public class OperatorOnlyRegistry {
         "cluster:admin/autoscaling/put_autoscaling_policy",
         "cluster:admin/autoscaling/delete_autoscaling_policy",
         "cluster:admin/autoscaling/get_autoscaling_policy",
-        "cluster:admin/autoscaling/get_autoscaling_capacity");
+        "cluster:admin/autoscaling/get_autoscaling_capacity",
+        // Repository speed test actions are not mentioned in core, literal strings are needed.
+        "cluster:admin/repository/speed_test",
+        "cluster:admin/repository/speed_test/blob",
+        "cluster:admin/repository/speed_test/blob/read"
+    );
 
     /**
      * Check whether the given action and request qualify as operator-only. The method returns
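
The registry change adds three literal action names, one per transport action. Assuming the registry checks exact membership in a set of names, as the list of separate literals suggests, each child action such as `cluster:admin/repository/speed_test/blob/read` must be listed explicitly rather than being covered by a prefix. A minimal sketch with a hypothetical `OperatorOnlyActions` class:

```java
import java.util.Set;

// Hypothetical, simplified registry: membership is an exact string match.
class OperatorOnlyActions {
    static final Set<String> ACTIONS = Set.of(
        "cluster:admin/repository/speed_test",
        "cluster:admin/repository/speed_test/blob",
        "cluster:admin/repository/speed_test/blob/read");

    static boolean isOperatorOnly(String action) {
        return ACTIONS.contains(action);
    }
}
```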
30 changes: 30 additions & 0 deletions x-pack/plugin/snapshot-repo-test-kit/build.gradle
@@ -0,0 +1,30 @@
apply plugin: 'elasticsearch.internal-cluster-test'
apply plugin: 'elasticsearch.esplugin'
esplugin {
  name 'snapshot-repo-test-kit'
  description 'A plugin for a test kit for snapshot repositories'
  classname 'org.elasticsearch.repositories.blobstore.testkit.SnapshotRepositoryTestKit'
  extendedPlugins = ['x-pack-core']
}
archivesBaseName = 'x-pack-snapshot-repo-test-kit'

dependencies {
  compileOnly project(path: xpackModule('core'), configuration: 'default')
  internalClusterTestImplementation project(path: xpackModule('core'), configuration: 'testArtifacts')
}

addQaCheckDependencies()

configurations {
  testArtifacts.extendsFrom testRuntime
  testArtifacts.extendsFrom testImplementation
}

def testJar = tasks.register("testJar", Jar) {
  appendix 'test'
  from sourceSets.test.output
}

artifacts {
  testArtifacts testJar
}
6 changes: 6 additions & 0 deletions x-pack/plugin/snapshot-repo-test-kit/qa/build.gradle
@@ -0,0 +1,6 @@
apply plugin: 'elasticsearch.build'
tasks.named("test").configure { enabled = false }

dependencies {
  api project(':test:framework')
}
26 changes: 26 additions & 0 deletions x-pack/plugin/snapshot-repo-test-kit/qa/rest/build.gradle
@@ -0,0 +1,26 @@
apply plugin: 'elasticsearch.testclusters'
apply plugin: 'elasticsearch.standalone-rest-test'
apply plugin: 'elasticsearch.rest-test'
apply plugin: 'elasticsearch.rest-resources'

dependencies {
  testImplementation project(path: xpackModule('snapshot-repo-test-kit'), configuration: 'testArtifacts')
}

final File repoDir = file("$buildDir/testclusters/repo")

tasks.named("integTest").configure {
  systemProperty 'tests.path.repo', repoDir
}

testClusters.matching { it.name == "integTest" }.configureEach {
  testDistribution = 'DEFAULT'
  setting 'path.repo', repoDir.absolutePath
}

restResources {
  restApi {
    includeCore 'indices', 'search', 'bulk', 'snapshot', 'nodes', '_common'
    includeXpack 'snapshot_repo_test_kit'
  }
}
@@ -0,0 +1,24 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License;
 * you may not use this file except in compliance with the Elastic License.
 */

package org.elasticsearch.repositories.blobstore.testkit.rest;

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.repositories.blobstore.testkit.AbstractSnapshotRepoTestKitRestTestCase;
import org.elasticsearch.repositories.fs.FsRepository;

public class FsSnapshotRepoTestKitIT extends AbstractSnapshotRepoTestKitRestTestCase {

    @Override
    protected String repositoryType() {
        return FsRepository.TYPE;
    }

    @Override
    protected Settings repositorySettings() {
        return Settings.builder().put("location", System.getProperty("tests.path.repo")).build();
    }
}
@@ -0,0 +1,23 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License;
 * you may not use this file except in compliance with the Elastic License.
 */

package org.elasticsearch.repositories.blobstore.testkit.rest;

import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;
import org.elasticsearch.test.rest.yaml.ClientYamlTestCandidate;
import org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase;

public class SnapshotRepoTestKitClientYamlTestSuiteIT extends ESClientYamlSuiteTestCase {

    public SnapshotRepoTestKitClientYamlTestSuiteIT(final ClientYamlTestCandidate testCandidate) {
        super(testCandidate);
    }

    @ParametersFactory
    public static Iterable<Object[]> parameters() throws Exception {
        return ESClientYamlSuiteTestCase.createParameters();
    }
}