Add support for range reads and retries to URL repositories #69521

Merged
26 commits merged into elastic:master on Mar 8, 2021

fcofdez
Contributor

@fcofdez fcofdez commented Feb 24, 2021

This PR adds support for range reads to URL-based repositories and improves resiliency for HTTP-based URLs by retrying failed reads. Since the built-in URL HTTP client is quite limited, I've decided to use the Apache HTTP client, which we already use in other plugins (S3). There's quite a bit of boilerplate (license headers, etc.), but the scope of the change itself is quite limited.
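
Over HTTP, a range read is usually an ordinary GET carrying a `Range` header. As a minimal sketch, assuming a hypothetical helper class `RangeHeaders` (not part of this PR), the header value for `length` bytes starting at `position` could be built like this; HTTP byte ranges are inclusive on both ends (RFC 7233), so the last offset is `position + length - 1`.

```java
/**
 * Hypothetical helper (not from the PR): builds the value of an HTTP Range
 * header requesting `length` bytes starting at byte offset `position`.
 */
final class RangeHeaders {
    private RangeHeaders() {}

    static String byteRange(long position, long length) {
        if (position < 0) {
            throw new IllegalArgumentException("position must be >= 0 but was " + position);
        }
        if (length <= 0) {
            throw new IllegalArgumentException("length must be > 0 but was " + length);
        }
        // HTTP byte ranges are inclusive on both ends, so the last byte is position + length - 1.
        // Math.addExact throws ArithmeticException instead of silently overflowing.
        final long lastByte = Math.addExact(position, length) - 1;
        return "bytes=" + position + "-" + lastByte;
    }
}
```

With Apache HttpClient 4.x the value would be attached to an `HttpGet` with something like `request.addHeader("Range", RangeHeaders.byteRange(position, length))`; a server that honors the range answers with 206 (Partial Content).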

As a follow-up, I'll add integration tests for searchable snapshots using a URL repository.

@fcofdez fcofdez force-pushed the url-repo-range-reads branch from dfe16f1 to be9ac38 Compare February 24, 2021 10:24
@fcofdez fcofdez marked this pull request as ready for review February 24, 2021 11:57
@fcofdez fcofdez added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Feb 24, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@fcofdez fcofdez added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.13.0 >enhancement labels Feb 24, 2021
@fcofdez fcofdez requested a review from ywelsch February 24, 2021 11:59
Contributor

@ywelsch ywelsch left a comment

Thank you for working on this @fcofdez. I've done a first pass and left some comments. I'm also wondering how the change to the HTTP client will affect https with custom certificates. Is the keystore used still the same?

            }
        }

        @Override
        public InputStream readBlob(String blobName, long position, long length) throws IOException {
    -       throw new UnsupportedOperationException();
    +       final InputStream inputStream = getInputStream(new URL(path, blobName));

Contributor

I would prefer to keep throwing UnsupportedOperationException here, as we should not support something with horrible performance characteristics.

In a separate PR, I think we should also make sure that users are using "fs" repository types when accessing shared filesystems instead of URL repositories (by changing docs).

In yet a separate PR, we can remove support for "file" in URL repositories in ES 8.0.

Contributor Author

👍

@fcofdez fcofdez requested a review from ywelsch February 25, 2021 10:27
@fcofdez
Contributor Author

fcofdez commented Feb 25, 2021

> I'm also wondering how the change to the HTTP client will affect https with custom certificates. Is the keystore used still the same?

By default, the Apache HTTP client obtains its `SSLContext` from `SSLContexts.createDefault()`:

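    // org.apache.http.ssl.SSLContexts#createDefault() (Apache HttpCore)
    public static SSLContext createDefault() throws SSLInitializationException {
        try {
            final SSLContext sslContext = SSLContext.getInstance(SSLContextBuilder.TLS);
            // null key managers, trust managers and SecureRandom: JSSE falls back to the
            // highest-priority installed provider implementations and the default SecureRandom
            sslContext.init(null, null, null);
            return sslContext;
        } catch (final NoSuchAlgorithmException ex) {
            throw new SSLInitializationException(ex.getMessage(), ex);
        } catch (final KeyManagementException ex) {
            throw new SSLInitializationException(ex.getMessage(), ex);
        }
    }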

To be honest I'm not sure if this behaves the same way as it does today.
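
One way to check locally which truststore is in play is to build the context the same way and inspect the fallback trust managers. This is a hedged sketch; `DefaultTlsCheck` is a hypothetical name, not part of Elasticsearch or Apache HttpClient. The first method mirrors the quoted `createDefault()`. The second shows what a null `TrustManager[]` resolves to with the SunJSSE provider: the default `TrustManagerFactory` algorithm initialised with a null `KeyStore`, which reads `javax.net.ssl.trustStore` if set and otherwise the JDK's `jssecacerts` or `cacerts` file. It only inspects the trust side and says nothing about client certificates.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical diagnostic helper, not part of Elasticsearch or Apache HttpClient. */
final class DefaultTlsCheck {
    private DefaultTlsCheck() {}

    /** Mirrors the quoted createDefault(): all-null arguments select the JSSE defaults. */
    static SSLContext defaultTlsContext() throws GeneralSecurityException {
        final SSLContext sslContext = SSLContext.getInstance("TLS");
        sslContext.init(null, null, null);
        return sslContext;
    }

    /**
     * The trust managers used when none are supplied: the default TrustManagerFactory
     * algorithm (normally PKIX) initialised with a null KeyStore, which makes SunJSSE load
     * javax.net.ssl.trustStore if set, else the JDK's jssecacerts or cacerts file.
     */
    static TrustManager[] defaultTrustManagers() throws GeneralSecurityException {
        final TrustManagerFactory factory =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        factory.init((KeyStore) null);
        return factory.getTrustManagers();
    }
}
```

Running the check with and without `-Djavax.net.ssl.trustStore=...` is one way to confirm that a custom CA added to the JVM truststore is still picked up.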

Contributor

@ywelsch ywelsch left a comment

I've left more comments. One point I want to emphasize is that I think the exception handling in RetryingInputStream needs to be reworked.
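
As one hedged illustration of careful exception handling in a retrying stream (not the PR's actual `RetryingInputStream`; `RangeOpener` and `RetryingRangeInputStream` are hypothetical names), this sketch tracks how many bytes it has already delivered and reopens the source at that offset after a failure. It treats a premature end of stream as a retryable failure. Once the retry budget is spent, it rethrows the latest exception with the earlier ones attached as suppressed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical: opens a stream over the blob starting at an absolute offset (e.g. a ranged GET). */
@FunctionalInterface
interface RangeOpener {
    InputStream open(long position) throws IOException;
}

/** Hypothetical sketch: reads [start, start + length) and resumes from the current offset on failure. */
final class RetryingRangeInputStream extends InputStream {
    private final RangeOpener opener;
    private final long end;                 // exclusive end offset
    private final int maxRetries;
    private final List<IOException> failures = new ArrayList<>();
    private InputStream current;            // null until (re)opened
    private long position;                  // next absolute offset to deliver
    private boolean closed;

    RetryingRangeInputStream(RangeOpener opener, long start, long length, int maxRetries) {
        this.opener = Objects.requireNonNull(opener);
        this.position = start;
        this.end = Math.addExact(start, length);
        this.maxRetries = maxRetries;
    }

    @Override
    public int read() throws IOException {
        final byte[] single = new byte[1];
        final int n = read(single, 0, 1);
        return n == -1 ? -1 : single[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, b.length);
        if (closed) {
            throw new IOException("stream is closed");
        }
        if (len == 0) {
            return 0;
        }
        while (position < end) {
            try {
                if (current == null) {
                    current = opener.open(position); // resume at the first undelivered byte
                }
                final int n = current.read(b, off, (int) Math.min(len, end - position));
                if (n == -1) {
                    // The source ended before the whole range was delivered.
                    throw new EOFException("stream ended at offset " + position + ", expected " + end);
                }
                position += n;
                return n;
            } catch (IOException e) {
                onFailure(e); // either rethrows or discards the broken stream so the loop reopens
            }
        }
        return -1;
    }

    private void onFailure(IOException e) throws IOException {
        if (failures.size() >= maxRetries) {
            // Out of retries: surface the latest failure, keeping earlier ones as suppressed.
            for (IOException earlier : failures) {
                if (earlier != e) {
                    e.addSuppressed(earlier);
                }
            }
            throw e;
        }
        failures.add(e);
        if (current != null) {
            try {
                current.close();
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            current = null;
        }
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

Over HTTP, `RangeOpener.open` would typically issue a new GET with a `Range` header starting at `position`. Counting retries per stream rather than per read keeps a persistently failing server from being retried forever. A production version would probably also want backoff between attempts and should stop retrying on non-transient errors such as a 404.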

> To be honest I'm not sure if this behaves the same way as it does today.

Can you follow up with the es-security team on that?

                throw new NoSuchFileException("blob object [" + blobName + "] not found");
            }

            if (response.getStatusCode() > 299) {

Contributor

I think we should be more restrictive here, and only allow 200 (OK) and 206 (Partial Content).
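
Here is a minimal sketch of the stricter check, assuming a hypothetical `checkStatus` helper that receives the numeric status code (the names are illustrative, not from the PR). Only 200 and 206 are accepted, and 404 keeps mapping to `NoSuchFileException` as in the snippet above. Everything else, including other 2xx codes such as 204, is rejected. It also reports whether the body is partial, because a 200 reply to a ranged request means the server ignored the `Range` header and sent the whole blob.

```java
import java.io.IOException;
import java.nio.file.NoSuchFileException;

/** Hypothetical helper: validates the HTTP status code of a blob GET. */
final class BlobResponseStatus {
    static final int OK = 200;
    static final int PARTIAL_CONTENT = 206;
    static final int NOT_FOUND = 404;

    private BlobResponseStatus() {}

    /**
     * Returns true if the body holds only the requested range (206), false if it holds
     * the full blob (200). Any other status is rejected instead of accepting "anything < 300".
     */
    static boolean checkStatus(String blobName, int statusCode) throws IOException {
        switch (statusCode) {
            case PARTIAL_CONTENT:
                return true;
            case OK:
                // Full body: either no range was requested or the server ignored the Range header,
                // so the caller must skip to the requested position itself.
                return false;
            case NOT_FOUND:
                throw new NoSuchFileException("blob object [" + blobName + "] not found");
            default:
                throw new IOException("unexpected HTTP status [" + statusCode + "] reading blob [" + blobName + "]");
        }
    }
}
```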

Contributor

@ywelsch ywelsch left a comment

LGTM. AFAICS it will use the default keystore, so there's no change in behavior.

@fcofdez
Contributor Author

fcofdez commented Mar 7, 2021

@elasticmachine update branch

@fcofdez
Contributor Author

fcofdez commented Mar 8, 2021

@elasticmachine update branch

@fcofdez fcofdez merged commit ae5308c into elastic:master Mar 8, 2021
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Mar 9, 2021
fcofdez added a commit that referenced this pull request Mar 9, 2021
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Mar 26, 2021
fcofdez added a commit that referenced this pull request Mar 26, 2021
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Sep 13, 2021
Today we document that you can use URL repositories with searchable
snapshots, but in fact it only works for HTTP(S) repositories. This
commit adjusts the docs to clarify.

Relates elastic#69521
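
The commit above notes that searchable snapshots only work with URL repositories that use HTTP(S), which matches this PR keeping range reads unsupported for other URL schemes. Here is a hedged sketch of a guard for that condition. `UrlRepositoryRangeSupport` and `supportsRangeReads` are hypothetical names, not Elasticsearch APIs.

```java
import java.net.URL;
import java.util.Locale;

/** Hypothetical check, not Elasticsearch code: only http(s) URL repositories serve range reads. */
final class UrlRepositoryRangeSupport {
    private UrlRepositoryRangeSupport() {}

    static boolean supportsRangeReads(URL url) {
        // java.net.URL already lower-cases the scheme; normalising again keeps the check self-contained.
        final String protocol = url.getProtocol().toLowerCase(Locale.ROOT);
        return protocol.equals("http") || protocol.equals("https");
    }
}
```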
DaveCTurner and elasticsearchmachine added nine further commits referencing this pull request on Sep 13, 2021, all with the same commit message.
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.13.0 v8.0.0-alpha1
4 participants