#31608 Add S3 Setting to Force Path Type Access #34721

Merged
original-brownbear merged 10 commits into elastic:master from the 31608 branch on Nov 9, 2018

Conversation

original-brownbear
Member

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@rjernst
Member

rjernst commented Oct 23, 2018

Why does this need to be configurable? Can we just always use path style access, since all bucket names are supported through it?
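For readers unfamiliar with the two addressing styles being discussed, here is a minimal sketch of how the request URL differs between them. It is illustrative only: the class and method names are made up for this example, the bucket and key are placeholders, and the real URL construction happens inside the AWS SDK.

```java
import java.net.URI;

/** Illustrative only: builds the two request URL shapes discussed in this thread. */
final class S3AddressingStyles {

    /** Path style: the host is the bare endpoint and the bucket is the first path segment. */
    static URI pathStyle(String endpointHost, String bucket, String key) {
        return URI.create("https://" + endpointHost + "/" + bucket + "/" + key);
    }

    /** Virtual hosted (DNS) style: the bucket becomes a label in front of the endpoint host. */
    static URI virtualHostedStyle(String endpointHost, String bucket, String key) {
        return URI.create("https://" + bucket + "." + endpointHost + "/" + key);
    }

    public static void main(String[] args) {
        // "my-bucket" and "index-0" are placeholder values.
        System.out.println(pathStyle("s3.eu-west-1.amazonaws.com", "my-bucket", "index-0"));
        System.out.println(virtualHostedStyle("s3.eu-west-1.amazonaws.com", "my-bucket", "index-0"));
    }
}
```

Because path style never puts the bucket name into the host name, it does not depend on the bucket name being usable as a DNS label, which is the property the question above relies on.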

@original-brownbear
Member Author

@rjernst but I think that for path style access to work against AWS S3 you have to supply the region, so it won't work unless you manually supply an endpoint?

@original-brownbear
Member Author

@rjernst that said ... if endpoint is supplied it seems that yes, path style access should always work ?

@rjernst
Member

rjernst commented Oct 23, 2018

I believe you can hit any region specific endpoint to find the region a bucket exists in. This is essentially how using dns based buckets with s3.amazonaws.com works. I think we could (if endpoint is not specified) do an initial request with a fixed endpoint (eg us-east-1's endpoint), just to find where the bucket exists, then create the "real" client with the calculated endpoint?

My concern is creeping back to several client settings which must be carefully set to be in sync with each other, like we used to have with s3 repository. IMO we should strive to keep the settings minimal.

@original-brownbear
Member Author

original-brownbear commented Oct 23, 2018

@rjernst

> I think we could (if endpoint is not specified) do an initial request with a fixed endpoint (eg us-east-1's endpoint), just to find where the bucket exists, then create the "real" client with the calculated endpoint?

Yea we could actually :) I'm just worried there may be other risks in not allowing dns style access ever that we don't see (which is a weak argument admittedly but it seems like this move could trigger similar proxy/firewall issues to those that started the issue in the first place?).

> My concern is creeping back to several client settings which must be carefully set to be in sync with each other, like we used to have with s3 repository. IMO we should strive to keep the settings minimal.

Generally I would agree here. In this case it's a little tricky. We'd be disabling the ability to use dns style access, which might have unforeseen consequences in similar situations to those that started this issue in the first place?
It seems to me that there is some value in not deviating from the default behaviour of the S3 client unless we really have to and living with the cost of simply offering this as an advanced setting to deal with firewall/proxy etc. issues? It's also not a costly setting to maintain since it maps directly to an S3 client setting imo.

@rjernst
Member

rjernst commented Oct 23, 2018

All settings are costly, because we need to test their use. This is the general problem with s3 repository. Unless we test the different ways these settings can be used, we can't confidently say they work. Keeping the settings minimal means we minimize our s3 test matrix (which is still lacking with the current exposed settings, this PR only adds to the untested cases...).

@original-brownbear
Member Author

@rjernst fair point with the tests (I could add another run to execute against 127.0.0.1 to bring the initial test run back). I guess it's a judgement call on how risky disabling DNS style access is.

@rjernst
Member

rjernst commented Oct 23, 2018

I don't see the risk (other than changing what has been "working" thus far). I see it as standardizing on the newer style requests for s3, which supports any bucket name.

@original-brownbear
Member Author

@rjernst

The risk would (imo) be that we are changing the endpoint used by our requests to S3 for everyone. I could see this causing trouble with some restrictive firewalls/proxies (though you could argue that they're in the same situation for newly created buckets which often respond with a 307 redirect ...).

... I guess aside from the below point you won me over (given the 307 redirect situation) with the idea of manually finding the bucket location and forcing path style access :)

Also, one additional risk/annoyance this introduces in tests is that instead of now having to run a loop over 127.0.0.1 and localhost to cover both cases, you're introducing a case where we add our own code to handle the default S3 use case and would have to write tests for the retrieving of the correct region.
That sounds a lot trickier to maintain than just having a setting for the access pattern.
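The "127.0.0.1 and localhost" loop mentioned above makes sense if the client picks the addressing style from the endpoint and bucket name when nothing is forced. The following is a simplified sketch of such a decision, not the SDK's actual code: the class name, the regular expressions, and the exact rules are assumptions made for illustration.

```java
import java.util.regex.Pattern;

/** Simplified, assumed approximation of an automatic addressing style choice. */
final class AddressingStyleChooser {

    // An IPv4 literal cannot take a bucket name as a sub-domain.
    private static final Pattern IPV4 = Pattern.compile("^\\d{1,3}(\\.\\d{1,3}){3}$");

    // Assumed rule: 3 to 63 characters, lowercase letters, digits, dots and hyphens,
    // starting and ending with a letter or digit.
    private static final Pattern DNS_BUCKET = Pattern.compile("^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$");

    static boolean useVirtualHostedStyle(boolean forcePathStyle, String endpointHost, String bucket) {
        return !forcePathStyle
            && !IPV4.matcher(endpointHost).matches()
            && DNS_BUCKET.matcher(bucket).matches();
    }

    public static void main(String[] args) {
        System.out.println(useVirtualHostedStyle(false, "localhost", "bucket"));
        System.out.println(useVirtualHostedStyle(false, "127.0.0.1", "bucket"));
        System.out.println(useVirtualHostedStyle(true, "localhost", "bucket"));
    }
}
```

Under these assumptions, a test run against a host name exercises the virtual hosted branch, a run against an IP literal exercises the path style branch, and forcing path style removes the branching entirely.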

@rjernst
Member

rjernst commented Oct 23, 2018

> That sounds a lot trickier to maintain than just having a setting for the access pattern.

I would argue the dns pattern is very difficult to test with our minio based integration tests, while path based access is relatively easy.

@original-brownbear
Member Author

@rjernst but we would have to test the functionality that figures out the correct region? (other than that I agree)

@rjernst
Member

rjernst commented Oct 23, 2018

Yes. But the get bucket location is a simple api that we can mimic, and we only need a basic integration test there (that we call the get bucket location api at all). The nuances of mapping region to endpoint can be done by unit tests (which is mostly straightforward for all new endpoints, only legacy endpoints are special iirc).
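As a rough illustration of the kind of region-to-endpoint mapping that could be covered by unit tests, here is a small sketch. It is not code from this PR: the class name is hypothetical, only the standard `amazonaws.com` partition is handled, and the two special cases (an empty location meaning us-east-1, and the legacy `EU` value meaning eu-west-1) are assumptions about what the bucket location API returns.

```java
/** Hypothetical helper: maps a bucket location value to an S3 endpoint host. */
final class S3RegionEndpoints {

    static String endpointFor(String location) {
        // Assumption: an empty or missing location means the bucket lives in us-east-1,
        // which is served by the global endpoint.
        if (location == null || location.isEmpty() || location.equals("us-east-1")) {
            return "s3.amazonaws.com";
        }
        // Assumption: the legacy location value "EU" refers to eu-west-1.
        String region = location.equals("EU") ? "eu-west-1" : location;
        return "s3." + region + ".amazonaws.com";
    }

    public static void main(String[] args) {
        System.out.println(endpointFor("eu-west-1"));
        System.out.println(endpointFor(""));
        System.out.println(endpointFor("EU"));
    }
}
```

A unit test for such a helper would only need a table of location values and expected hosts, with no network access involved.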

@original-brownbear
Member Author

@rjernst sounds good => happy to implement this.
@ywelsch are you ok with this too? :)

@tlrx
Member

tlrx commented Oct 23, 2018

@original-brownbear Can you summarize what the proposed solution is? Thanks

@original-brownbear
Member Author

@tlrx sure, it's just 2 steps :)

  1. Always use the path based access pattern
  2. If no endpoint is set, figure out the correct AWS region (=> endpoint) by calling the API that returns a bucket's location, and use that

Should work fine right?
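The two steps can be sketched as follows. This is only an outline of the control flow under stated assumptions: `EndpointResolution` and `bucketRegionLookup` are hypothetical names, the lookup stands in for whatever call would return the bucket's region, and the host name pattern assumes the standard `amazonaws.com` partition.

```java
import java.util.function.Function;

/** Hypothetical outline of the proposal: explicit endpoint wins, otherwise look up the region. */
final class EndpointResolution {

    static String resolve(String configuredEndpoint, String bucket, Function<String, String> bucketRegionLookup) {
        // An explicitly configured endpoint is used as-is and no lookup is made.
        if (configuredEndpoint != null && !configuredEndpoint.isEmpty()) {
            return configuredEndpoint;
        }
        // Otherwise ask for the bucket's region and derive the regional endpoint from it.
        String region = bucketRegionLookup.apply(bucket);
        return "s3." + region + ".amazonaws.com";
    }

    public static void main(String[] args) {
        // The lambda is a stub; a real lookup would issue a request to S3.
        System.out.println(resolve(null, "my-bucket", bucket -> "eu-west-1"));
        System.out.println(resolve("http://127.0.0.1:9000", "my-bucket", bucket -> "unused"));
    }
}
```

Passing the lookup in as a function keeps the sketch testable without any network call, which matches the earlier point that only a basic integration test would be needed for the lookup itself.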

@tlrx
Member

tlrx commented Oct 23, 2018

The SDK already takes care of finding the right region to use when no endpoint is defined and path style access is left at its default. I'm wondering if it works similarly when path style access is enabled. Do you know? That would be great, because it means that we'll get it for free and it would play nicely with how S3 clients are created with the lazy blobstore creation as well as when secure settings are reloaded.

@original-brownbear
Member Author

@tlrx Last thing I remember was that you had to set the region with path style access enabled. But I will try this and report back, maybe it works out of the box and I'm wrong.

@original-brownbear
Member Author

@tlrx huh turns out this does actually work (tested with master). If we force com.amazonaws.services.s3.AmazonS3Builder#withPathStyleAccessEnabled to true and don't set a region but just use our standard default endpoint new AwsClientBuilder.EndpointConfiguration(Constants.S3_HOSTNAME, null), it still worked fine with a bucket in eu-west for me just now :)

So maybe we can just enforce path style access without any additional code?

@rjernst
Member

rjernst commented Oct 23, 2018

Great! +1 to just use path style access.

@tlrx
Member

tlrx commented Oct 23, 2018

I ran some tests and they confirm what I suspected, i.e. the AWS SDK has a built-in mechanism to resolve the region to use for a given bucket, whatever the path style access configuration is.

The SDK hits the s3.amazonaws.com endpoint on the first request and receives a 400 or 301 response that contains the region to use for the given bucket. This region is then cached internally by the SDK:

With path style disabled and no endpoint:

[node-0] http-outgoing-0 >> "HEAD / HTTP/1.1[\r][\n]"
[node-0] http-outgoing-0 >> "Host: test.eu-west-1.elasticsearch.org.s3.amazonaws.com[\r][\n]"
[node-0] http-outgoing-0 << "HTTP/1.1 400 Bad Request[\r][\n]"
[node-0] http-outgoing-0 << "x-amz-bucket-region: eu-west-1[\r][\n]"
...
[node-0] http-outgoing-1 >> "HEAD / HTTP/1.1[\r][\n]"
[node-0] http-outgoing-1 >> "Host: test.eu-west-1.elasticsearch.org.s3.eu-west-1.amazonaws.com[\r][\n]"

With path style disabled and endpoint (http://s3.eu-west-1.amazonaws.com):

[node-0] http-outgoing-0 >> "HEAD / HTTP/1.1[\r][\n]"
[node-0] http-outgoing-0 >> "Host: test.eu-west-1.elasticsearch.org.s3.eu-west-1.amazonaws.com[\r][\n]"
[node-0] http-outgoing-0 << "HTTP/1.1 200 OK[\r][\n]"
[node-0] http-outgoing-0 << "x-amz-bucket-region: eu-west-1[\r][\n]"
...
[node-0] http-outgoing-0 >> "GET /?prefix=test-tlrx%2Findex-&encoding-type=url HTTP/1.1[\r][\n]"
[node-0] http-outgoing-0 >> "Host: test.eu-west-1.elasticsearch.org.s3.eu-west-1.amazonaws.com[\r][\n]"

With path style enabled and no endpoint:

[node-0] http-outgoing-0 >> "HEAD /test.eu-west-1.elasticsearch.org/ HTTP/1.1[\r][\n]"
[node-0] http-outgoing-0 >> "Host: s3.amazonaws.com[\r][\n]"
[node-0] http-outgoing-0 << "HTTP/1.1 301 Moved Permanently[\r][\n]"
[node-0] http-outgoing-0 << "x-amz-bucket-region: eu-west-1[\r][\n]"
...
[node-0] http-outgoing-1 >> "HEAD /test.eu-west-1.elasticsearch.org/ HTTP/1.1[\r][\n]"

With path style enabled and endpoint (http://s3.eu-west-1.amazonaws.com):

[node-0] http-outgoing-0 >> "HEAD /test.eu-west-1.elasticsearch.org/ HTTP/1.1[\r][\n]"
[node-0] http-outgoing-0 >> "Host: s3.eu-west-1.amazonaws.com[\r][\n]"
[node-0] http-outgoing-0 << "HTTP/1.1 200 OK[\r][\n]"
[node-0] http-outgoing-0 << "x-amz-bucket-region: eu-west-1[\r][\n]"
...
[node-0] http-outgoing-0 >> "GET /test.eu-west-1.elasticsearch.org/?prefix=test-tlrx%2Findex-&encoding-type=url HTTP/1.1[\r][\n]"

So I think there's no need to execute an initial GET bucket request as suggested and we can use the SDK default behavior.
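To make the mechanism in the traces concrete, here is a toy model of "read the `x-amz-bucket-region` header from the first response, cache it, and use the regional host afterwards". It is a sketch of the idea only and not how the SDK is implemented: the class and method names are invented, header names are assumed to be lower-cased already, and the regional host pattern is taken from the hosts visible in the traces.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of caching a bucket's region from a response header. Not SDK code. */
final class BucketRegionCache {

    private static final String REGION_HEADER = "x-amz-bucket-region";

    private final Map<String, String> regionByBucket = new ConcurrentHashMap<>();

    /** Remembers the region hint if the response (for example a 301 or 400) carries one. */
    void observe(String bucket, Map<String, String> responseHeaders) {
        String region = responseHeaders.get(REGION_HEADER);
        if (region != null) {
            regionByBucket.put(bucket, region);
        }
    }

    /** Returns the global host until a region is known, then the regional host. */
    String hostFor(String bucket) {
        String region = regionByBucket.get(bucket);
        return region == null ? "s3.amazonaws.com" : "s3." + region + ".amazonaws.com";
    }

    public static void main(String[] args) {
        BucketRegionCache cache = new BucketRegionCache();
        String bucket = "test.eu-west-1.elasticsearch.org";
        System.out.println(cache.hostFor(bucket)); // first request goes to the global host
        cache.observe(bucket, Collections.singletonMap(REGION_HEADER, "eu-west-1"));
        System.out.println(cache.hostFor(bucket)); // later requests use the cached region
    }
}
```

With path style access the bucket stays in the request path, so only the host changes between the first request and the retried one in this model.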

@tlrx
Member

tlrx commented Oct 23, 2018

I didn't see your update, but nice to see that we found the same results :)

Note that enabling path style access is a breaking change and should be documented.

* Use path style access pattern to fix elastic#31608
* closes elastic#31608
@original-brownbear
Member Author

@tlrx @rjernst I just reset this PR to simply set path style access to true now. Where do we want to document the change?

@original-brownbear
Member Author

@tlrx added a line that mentions the exclusive use of path style access now. Not sure where to put the breaking change documentation (or if it's even my job to put it anywhere :))?

@tlrx
Member

tlrx commented Oct 25, 2018

@elasticmachine test this please

@colings86 colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018
@@ -304,6 +304,8 @@ You may further restrict the permissions by specifying a prefix within the bucke
 The bucket needs to exist to register a repository for snapshots. If you did not create the bucket then the repository
 registration will fail.
+
+Note: All bucket operations are using the path style access pattern. The DNS style access pattern is never used.
Member

I think the AWS terminology is "virtual hosted style"

Member Author

Right changing :)

@@ -112,7 +112,8 @@ AmazonS3 buildClient(final S3ClientSettings clientSettings) {
 //
 // We do this because directly constructing the client is deprecated (was already deprecated in 1.1.223 too)
 // so this change removes that usage of a deprecated API.
-builder.withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpoint, null));
+builder.withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpoint, null))
+    .withPathStyleAccessEnabled(true);
Member

Nit: I think the SDK provides an enablePathStyleAccess() method?

Member Author

Right, that looks nicer :)

@tlrx
Member

tlrx commented Oct 26, 2018

> @tlrx added a line that mentions the exclusive use of path style access now. Not sure where to put the breaking change documentation (or if it's even my job to put it anywhere :))?

I'd put a mention in the migration doc. I also don't think this should go to 6.6.0 but only in 7.0

@original-brownbear
Member Author

@tlrx thanks for taking a look! All points addressed I think :)

@original-brownbear
Member Author

@tlrx ping :) can you take another look here?

@jijojv

jijojv commented Nov 2, 2018

+1 This is exactly the issue we're facing with an internal s3 like storage and was going to PR the same approach. @tlrx Please merge.

@original-brownbear
Member Author

@tlrx @rjernst given that people in the real world are affected by this, shouldn't we maybe backport this to 6.x (we could use a system property to enable the new behaviour for 6.x)?

@original-brownbear
Member Author

@tlrx ping :)

@tlrx
Member

tlrx commented Nov 7, 2018

@original-brownbear Sorry, I missed the notification. I tend to think that it could go in 7.0 only and I don't like to multiply system properties (I think we only have es.allow_insecure_settings so far) that will need specific tests too..

@original-brownbear
Member Author

@tlrx np I'm on vacation anyway :D I'm fine with only fixing 7.0, this was more of a product question I think :)
That said, are you ok with the code/docs the way they are now?

Member

@tlrx tlrx left a comment

I left some minor comments, thanks!

@@ -304,6 +304,8 @@ You may further restrict the permissions by specifying a prefix within the bucke
 The bucket needs to exist to register a repository for snapshots. If you did not create the bucket then the repository
 registration will fail.
+
+Note: All bucket operations are using the path style access pattern. The virtual hosted style access pattern is never used.
Member

Maybe Note: starting [7.0], all bucket...?

==== S3 Repository Plugin

* The plugin now uses the path style access pattern for all requests.
In previous versions it was automatically determining whether to use virtual hosted style or path style
Member

Maybe In previous versions the decision to use virtual hosted style or [..] was automatically determined by the AWS Java SDK

@@ -112,7 +112,7 @@ AmazonS3 buildClient(final S3ClientSettings clientSettings) {
 //
 // We do this because directly constructing the client is deprecated (was already deprecated in 1.1.223 too)
 // so this change removes that usage of a deprecated API.
-builder.withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpoint, null));
+builder.withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpoint, null)).enablePathStyleAccess();
Member

I think it deserves its own line :)

@original-brownbear
Member Author

Jenkins test this

@original-brownbear
Member Author

@tlrx thanks for the review!

@original-brownbear original-brownbear merged commit 02b4e28 into elastic:master Nov 9, 2018
@original-brownbear original-brownbear deleted the 31608 branch November 9, 2018 04:07
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement v7.0.0-beta1
Development

Successfully merging this pull request may close these issues.

Make possible to enforce path-style access method to buckets in S3 client settings
6 participants