Abort non-fully consumed S3 input stream #62370

tlrx · 2020-09-15T11:11:25Z

Today when an S3RetryingInputStream is closed the remaining bytes
that were not consumed are drained right before closing the underlying
stream. In some contexts it might be more efficient to not consume the
remaining bytes and just drop the connection.

This is for example the case with snapshot backed indices prewarming,
where there is no point in reading potentially large blobs if we know
that the cache file we want to write the content of the blob to has
already been evicted. Draining all bytes here takes a slot in the
prewarming thread pool for nothing.

Backport of #62167 for 7.10.0
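
To illustrate the difference between draining and aborting on close, here is a minimal, self-contained sketch. It is not the code from this PR: `AbortableSource` and `AbortOnCloseStream` are hypothetical names, and an in-memory stream stands in for the S3 connection. The only real-world hook assumed is that the underlying stream exposes some form of `abort()` (the AWS SDK for Java v1 `S3ObjectInputStream` does).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

/** Hypothetical stand-in for a connection-backed stream that can be aborted. */
class AbortableSource extends ByteArrayInputStream {
    boolean aborted = false;
    boolean closed = false;

    AbortableSource(byte[] data) {
        super(data);
    }

    /** Simulates dropping the connection without reading the remaining bytes. */
    void abort() {
        aborted = true;
    }

    @Override
    public void close() {
        closed = true;
    }

    /** Number of bytes that were never read from this source. */
    int unreadBytes() {
        return available();
    }
}

/** Sketch: on close, abort the source if it was not fully consumed instead of draining it. */
class AbortOnCloseStream extends InputStream {
    private final AbortableSource source;
    private final long length;   // expected total number of bytes in the blob
    private long consumed = 0;   // bytes handed to the caller so far
    private boolean closed = false;

    AbortOnCloseStream(AbortableSource source, long length) {
        this.source = source;
        this.length = length;
    }

    @Override
    public int read() {
        int b = source.read();
        if (b != -1) {
            consumed++;
        }
        return b;
    }

    @Override
    public int read(byte[] buffer, int offset, int count) {
        int n = source.read(buffer, offset, count);
        if (n > 0) {
            consumed += n;
        }
        return n;
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (consumed < length) {
            // Not fully consumed: drop the connection rather than reading the rest.
            source.abort();
        }
        source.close();
    }
}
```

In this sketch, a caller that reads 100 bytes of a 1024-byte blob and then closes the stream triggers `abort()` and leaves the other 924 bytes unread, while a caller that reads everything gets a plain `close()`.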


elasticmachine · 2020-09-15T11:11:27Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

tlrx · 2020-09-15T11:54:41Z

@elasticmachine update branch

…2441) Today when a snapshot restore is aborted (for example when the index is explicitly deleted) while the restoration of the files from the repository has already started, the file restores are not interrupted. This means that Elasticsearch will continue to read the files from the repository and write them to disk until all files are restored; the store will then be closed and the files will be deleted from disk at some point, but this can take a while. It also takes up some slots in the SNAPSHOT thread pool. The Recovery API won't show any files actively being recovered; the only notable indicator would be the active threads in the SNAPSHOT thread pool. This commit adds a check before reading a file to restore and before writing bytes to disk so that a closing store can be detected more quickly and the file recovery process aborted. This way the file restores just stop, and for most of the repository implementations it means that no more bytes are read (see #62370 for S3), which also frees threads in the SNAPSHOT thread pool more quickly.

…2441) (#62607) Today when a snapshot restore is aborted (for example when the index is explicitly deleted) while the restoration of the files from the repository has already started, the file restores are not interrupted. This means that Elasticsearch will continue to read the files from the repository and write them to disk until all files are restored; the store will then be closed and the files will be deleted from disk at some point, but this can take a while. It also takes up some slots in the SNAPSHOT thread pool. The Recovery API won't show any files actively being recovered; the only notable indicator would be the active threads in the SNAPSHOT thread pool. This commit adds a check before reading a file to restore and before writing bytes to disk so that a closing store can be detected more quickly and the file recovery process aborted. This way the file restores just stop, and for most of the repository implementations it means that no more bytes are read (see #62370 for S3), which also frees threads in the SNAPSHOT thread pool more quickly.
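
The check described in the commit message above (before reading a file to restore and before writing bytes to disk) could look roughly like the following sketch. All names here (`FileRestoreSketch`, `restoreFile`, `ensureNotClosing`, `RestoreAbortedException`) are hypothetical, and an `AtomicBoolean` stands in for the store's closing state; the actual Elasticsearch code may be structured quite differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical exception signalling that the restore was aborted. */
class RestoreAbortedException extends RuntimeException {
    RestoreAbortedException(String message) {
        super(message);
    }
}

class FileRestoreSketch {

    /** Throws if the (simulated) store has started closing. */
    static void ensureNotClosing(AtomicBoolean storeClosing) {
        if (storeClosing.get()) {
            throw new RestoreAbortedException("store is closing, aborting file restore");
        }
    }

    /** Copies a blob to a file, checking for a closing store before reading and before each write. */
    static long restoreFile(InputStream blob, OutputStream file, AtomicBoolean storeClosing, int bufferSize) {
        ensureNotClosing(storeClosing); // check before starting to read the file
        byte[] buffer = new byte[bufferSize];
        long written = 0;
        try {
            int n;
            while ((n = blob.read(buffer, 0, buffer.length)) != -1) {
                ensureNotClosing(storeClosing); // check before writing bytes
                file.write(buffer, 0, n);
                written += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        AtomicBoolean storeClosing = new AtomicBoolean(false);
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        long written = restoreFile(new ByteArrayInputStream(new byte[10]), file, storeClosing, 4);
        System.out.println("restored " + written + " bytes");
    }
}
```

Because the check runs on every write, a restore that is aborted midway stops after at most one more buffer has been read, instead of continuing until the whole file has been copied.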

tlrx added the >enhancement, :Distributed Coordination/Snapshot/Restore, backport, and v7.10.0 labels Sep 15, 2020

elasticmachine added the Team:Distributed (Obsolete) label Sep 15, 2020

Merge branch '7.x' into abort-s3-retrying-input-streams-7.x

e0440ae

tlrx merged commit faf96c1 into elastic:7.x Sep 15, 2020

tlrx deleted the abort-s3-retrying-input-streams-7.x branch September 15, 2020 12:33

tlrx mentioned this pull request Sep 16, 2020

Also abort ongoing file restores when snapshot restore is aborted #62441

Merged

tlrx mentioned this pull request Sep 18, 2020

Also abort ongoing file restores when snapshot restore is aborted #62607

Merged
