
Add Caching for RepositoryData in BlobStoreRepository #52341


Conversation

@original-brownbear (Contributor) commented Feb 13, 2020

Cache latest RepositoryData on heap when it's absolutely safe to do so (i.e. when the repository is in strictly consistent mode).

RepositoryData can safely be assumed not to grow to a size that would cause trouble: we often have at least two copies of it loaded at the same time during repository operations anyway, and concurrent snapshot status API requests currently each load it independently, so caching it on heap and treating it as "small" is safe IMO.

The benefits of this move are:

  • Much faster repository status API calls
    • listing all snapshot names becomes instant
    • Other operations are sped up massively too because they mostly operate in two steps: load repository data then load multiple other blobs to get the additional data
  • Additional cloud cost savings
  • Better resiliency, saving another spot where an IO issue could break the snapshot
  • In follow-ups, we can simplify a number of spots in the current code that pass the repository data around in tricky ways just to avoid loading it multiple times.

I know we are thinking about caching other repository metadata in an internal index, but I think this blob is better served from heap since it is so performance-critical (this matters less right now, but becomes more interesting for concurrent repository operations).
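As a rough illustration of the idea only (not the exact code in this PR), the cache boils down to holding the latest RepositoryData next to its generation and hitting the blob store only when the cached generation is stale. The field and helper names below, in particular loadRepositoryDataFromBlobStore, are hypothetical placeholders:

```java
// Minimal conceptual sketch, not the PR's implementation: cache the latest
// RepositoryData together with the generation it was written under.
private final AtomicReference<Tuple<Long, RepositoryData>> latestKnownRepositoryData = new AtomicReference<>();

private RepositoryData getRepositoryData(long generation) {
    final Tuple<Long, RepositoryData> cached = latestKnownRepositoryData.get();
    if (cached != null && cached.v1() == generation) {
        return cached.v2(); // cache hit: no blob store round-trip
    }
    // hypothetical helper standing in for the existing index-N blob read
    final RepositoryData loaded = loadRepositoryDataFromBlobStore(generation);
    latestKnownRepositoryData.set(new Tuple<>(generation, loaded));
    return loaded;
}
```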

@original-brownbear added the >non-issue, WIP, and :Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs) labels on Feb 13, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@@ -36,7 +36,7 @@
  */
 public class MockSecureSettings implements SecureSettings {

-    private Map<String, SecureString> secureStrings = new HashMap<>();
+    private Map<String, String> secureStrings = new HashMap<>();
@original-brownbear (Contributor, Author) commented Feb 14, 2020

Just fixing a bug here that prevented node restarts in tests for nodes that used secure settings: if we always return the same SecureString from getString, then once a node closes that SecureString it can never get the setting again.
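A hedged sketch of what the fix amounts to (the field and method shapes are assumptions based on the diff above, not the exact MockSecureSettings code): store plain Strings and wrap them in a fresh SecureString per call.

```java
private final Map<String, String> secureStrings = new HashMap<>();

@Override
public SecureString getString(String setting) {
    final String value = secureStrings.get(setting);
    // return a new SecureString per call, so closing a previously returned
    // instance cannot invalidate later reads after a node restart in tests
    return value == null ? null : new SecureString(value.toCharArray());
}
```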

@@ -356,4 +367,9 @@ private void assertRepositoryBlocked(Client client, String repo, String existing
assertThat(repositoryException4.getMessage(),
containsString("Could not read repository data because the contents of the repository do not match its expected state."));
}

private void fullRestart() throws Exception {
@original-brownbear (Contributor, Author)

This is admittedly pretty dirty test-wise. Obviously, it also illustrates that this change slightly weakens our ability to detect concurrent repository writes from multiple clusters.
IMO this is a fair trade-off though, and one we've already made over and over in other spots (any time we optimised away a load of RepositoryData, we obviously weakened the ability to detect concurrent modification).

@@ -1359,6 +1398,7 @@ public void onFailure(String source, Exception e) {

@Override
public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
cacheRepositoryData(filteredRepositoryData.withGenId(newGen));
@original-brownbear (Contributor, Author)

This is a little dirty and we talked about it in other PRs: the generation here is -1 due to the weird way the repository data is loaded initially, and we have to set it eventually. I'll clean that up in a follow-up; for now, setting it here where it matters (it doesn't matter in the code above when serializing filteredRepositoryData to the repo) seemed the driest option.

Member

Maybe add a TODO then?

@original-brownbear (Contributor, Author)

Not sure what to even put in it though. In the end, this is simply the way RepositoryData works for now; we have the same kind of code for the cluster state version as well, I guess. I don't have a good idea for a better abstraction yet :(

@@ -42,16 +42,14 @@
private final int hashCode;

public IndexId(final String name, final String id) {
this.name = name;
this.id = id;
this.name = name.intern();
@original-brownbear (Contributor, Author)

Moving to interning here and for the snapshot id actually makes the on-heap RepositoryData about the same size as the serialized one (unfortunately, for the time being we serialize it as uncompressed JSON) => I think the 500kB safety check I added is valid.

Contributor

I would prefer to avoid JVM String interning. If we can deduplicate the Strings ourselves, that's fine, but adding more and more stuff to the intern pool feels dangerous (another source of OOM).
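For illustration, a minimal sketch of the "deduplicate ourselves" alternative (DEDUPLICATOR and deduplicate are made-up names, and in practice the map could be scoped to a single RepositoryData parse rather than being static):

```java
private static final ConcurrentMap<String, String> DEDUPLICATOR = new ConcurrentHashMap<>();

private static String deduplicate(String value) {
    // reuse a canonical instance without touching the JVM intern pool
    return DEDUPLICATOR.computeIfAbsent(value, Function.identity());
}
```

Unlike String.intern(), such a map is owned by the code and can be sized, cleared, or scoped as needed.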

try {
final int len =
BytesReference.bytes(updated.snapshotsToXContent(XContentFactory.jsonBuilder(), true)).length();
if (len > ByteSizeUnit.KB.toBytes(500)) {
@original-brownbear (Contributor, Author)

For context:

On heap, 1k shards in a snapshot means about 25kB of shard generations, which themselves make up ~50% of the size of RepositoryData on heap with the interning changes I added. The number of snapshots doesn't matter much by comparison, so in the Cloud case of 100 snapshots this would allow caching RepositoryData for up to ~15k shards, which I'm assuming should work fine for most users.

@original-brownbear (Contributor, Author)

@ywelsch @tlrx I optimized the on-heap size of RepositoryData a little and added a size limit for caching, as discussed yesterday :) Should be good for another review.


@@ -513,10 +514,13 @@ public void deleteSnapshot(SnapshotId snapshotId, long repositoryStateId, Versio
private RepositoryData safeRepositoryData(long repositoryStateId, Map<String, BlobMetaData> rootBlobs) {
final long generation = latestGeneration(rootBlobs.keySet());
final long genToLoad;
final RepositoryData cached;
Contributor

maybe we can just cache the serialized (and compressed) bytes instead of the object. Decompressing should still be fast.

@original-brownbear (Contributor, Author)

Yea, I experimented a little with this and it's really sweet, so I moved to that now. The compression ratio is massive (more than a factor of 10x for 100 snapshots with 1k shards each) and I can hardly imagine a repo that could break 500kB for the cached size (100 snapshots of 1k shards is only ~20kB, and adding additional snapshots is cheap as well, since each new snapshot effectively just adds that snapshot's UUID + name if it doesn't contain new shards).
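Roughly, the caching write path now looks like the following sketch; the cacheRepositoryData helper and the field are assumptions, only snapshotsToXContent and the 500kB limit appear in the snippets in this PR:

```java
private final AtomicReference<Tuple<Long, BytesReference>> latestKnownRepositoryData = new AtomicReference<>();

private void cacheRepositoryData(RepositoryData repositoryData) {
    try {
        final BytesStreamOutput out = new BytesStreamOutput();
        // serialize as JSON and compress before caching
        try (StreamOutput compressed = CompressorFactory.COMPRESSOR.streamOutput(out);
             XContentBuilder builder = XContentFactory.jsonBuilder(compressed)) {
            repositoryData.snapshotsToXContent(builder, true);
        }
        final BytesReference serialized = out.bytes();
        // only cache if the compressed blob stays under the safety limit
        if (serialized.length() <= ByteSizeUnit.KB.toBytes(500)) {
            latestKnownRepositoryData.set(new Tuple<>(repositoryData.getGenId(), serialized));
        }
    } catch (IOException e) {
        assert false : e; // no real IO happens here, see the discussion below
    }
}
```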

return;
}
} catch (IOException e) {
throw new AssertionError("Impossible, no IO happens here", e);
Contributor

If the node can't serialize the RepositoryData, it would die here. Perhaps just catch and log (while having an assert as well for our tests)

@original-brownbear (Contributor, Author)

Sure thing, though I figured this is actually impossible because the same code must have serialized that repository data to even make it visible here (since we only cache what we previously wrote to the repo).
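For reference, a hedged sketch of the suggested catch-assert-and-log shape (the logger message is illustrative, and the size check from the diff above is elided):

```java
try {
    final int len = BytesReference.bytes(
        updated.snapshotsToXContent(XContentFactory.jsonBuilder(), true)).length();
    // ... size check as in the diff above ...
} catch (IOException e) {
    assert false : e;                                       // fail hard in tests
    logger.warn("failed to serialize repository data", e);  // only log in production
}
```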

if (bestEffortConsistency == false) {
try {
final int len =
BytesReference.bytes(updated.snapshotsToXContent(XContentFactory.jsonBuilder(), true)).length();
Member

Maybe make RepositoryData implement Accountable? Or should we craft an estimating function instead of serializing the whole RepositoryData back again? If we keep serializing to XContent, we could pass a FilterOutputStream to the XContentBuilder so that it just counts bytes and doesn't build everything on heap.

Or, if we follow Yannick's suggestion of caching the serialized and compressed bytes, then maybe we should just keep track of the length of the original blob.
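A hedged sketch of the byte-counting idea (a plain OutputStream subclass works just as well as a FilterOutputStream here; this illustrates the suggestion, it is not code from the PR):

```java
final long[] counted = new long[1];
final OutputStream countingStream = new OutputStream() {
    @Override
    public void write(int b) { counted[0]++; }
    @Override
    public void write(byte[] b, int off, int len) { counted[0] += len; }
};
// stream the XContent into the counter so the full JSON never materializes on heap
try (XContentBuilder builder = XContentFactory.jsonBuilder(countingStream)) {
    updated.snapshotsToXContent(builder, true);
}
final long len = counted[0];
```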

@original-brownbear (Contributor, Author)

Jup, went with Yannick's approach now :)


@ywelsch (Contributor) left a comment

A few more small comments. Looking very good already though.

@@ -132,6 +132,8 @@ public void testEnforcedCooldownPeriod() throws IOException {
}
})));

// Master failover to clear RepositoryData cache
Contributor

I think I would prefer an explicit undocumented setting to disable the cache. This might also turn out to be useful if we see any issues with this new functionality

@original-brownbear (Contributor, Author)

Yea that's much nicer indeed :) added that setting now and used it in tests.
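A minimal sketch of such an escape hatch (the setting name cache_repository_data and the usage shown are assumptions, not necessarily what was added here):

```java
public static final Setting<Boolean> CACHE_REPOSITORY_DATA =
    Setting.boolSetting("cache_repository_data", true);

// read once per repository instance; when false, skip the cache entirely
private final boolean cacheRepositoryData = CACHE_REPOSITORY_DATA.get(metadata.settings());
```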

@@ -529,6 +536,9 @@ private RepositoryData safeRepositoryData(long repositoryStateId, Map<String, Bl
throw new RepositoryException(metadata.name(), "concurrent modification of the index-N file, expected current generation [" +
repositoryStateId + "], actual current generation [" + genToLoad + "]");
}
if (cached != null && cached.v1() == genToLoad && cached.v2() != null) {
Contributor

Should cached.v2() always be non-null if cached != null?

@original-brownbear (Contributor, Author)

Jup now it is.

@@ -1057,6 +1067,10 @@ public void endVerification(String seed) {
// and concurrent modifications.
private final AtomicLong latestKnownRepoGen = new AtomicLong(RepositoryData.UNKNOWN_REPO_GEN);

// Best effort cache of the latest known repository data and its generation, cached serialized as compressed json
private final AtomicReference<Tuple<Long, BytesReference>> latestKnownRepositoryData =
new AtomicReference<>(new Tuple<>(RepositoryData.EMPTY_REPO_GEN, null));
Contributor

perhaps just initialize to null?

@original-brownbear (Contributor, Author)

++

" repository behavior going forward.", metadata.name());
}
// Set empty repository data to not waste heap for an outdated cached value
latestKnownRepositoryData.set(new Tuple<>(RepositoryData.EMPTY_REPO_GEN, null));
Contributor

just set to null?

@original-brownbear (Contributor, Author)

++

}

private RepositoryData repositoryDataFromCachedEntry(Tuple<Long, BytesReference> cacheEntry) throws IOException {
if (cacheEntry.v1() == RepositoryData.EMPTY_REPO_GEN) {
Contributor

I think this optimization is unnecessary and complicates the null checks (see my suggestions above).

@original-brownbear (Contributor, Author)

++
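With that special case gone, the helper reduces to something like this sketch (the parser wiring is an assumption based on how index-N blobs are read elsewhere, not copied from the PR):

```java
private RepositoryData repositoryDataFromCachedEntry(Tuple<Long, BytesReference> cacheEntry) throws IOException {
    // decompress the cached bytes and parse them exactly like an index-N blob
    try (InputStream input = CompressorFactory.COMPRESSOR.streamInput(cacheEntry.v2().streamInput());
         XContentParser parser = XContentType.JSON.xContent()
             .createParser(NamedXContentRegistry.EMPTY, LoggingDeprecationHandler.INSTANCE, input)) {
        return RepositoryData.snapshotsFromXContent(parser, cacheEntry.v1());
    }
}
```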

@original-brownbear (Contributor, Author)

Thanks Yannick, all addressed in 533b4f8 now I hope :)

@ywelsch (Contributor) left a comment

LGTM

@original-brownbear (Contributor, Author)

Thanks Yannick + Tanguy!

@original-brownbear merged commit f5ca487 into elastic:master Feb 20, 2020
@original-brownbear deleted the cache-latest-repository-data branch February 20, 2020 11:58
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Feb 20, 2020
original-brownbear added a commit that referenced this pull request Feb 21, 2020