S3 URI Parser #272

millems · 2017-11-02T23:54:27Z

1.11.x provides AmazonS3URI (http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3URI.html) to parse the interesting components out of an S3 URI.

A similar piece of functionality should be made available in 2.0.

In the short term, customers can still use it from 1.11.x.

ghost · 2019-03-13T13:57:10Z

Similarly, my project uses AmazonS3Client.getUrl(String bucketName, String key) to generate and save the URL of an object I've just created. Is there anything equivalent available in 2.0?

millems · 2019-03-13T17:16:21Z

@perihelion1 #860 tracks that feature.

ribeirux · 2019-10-23T11:50:50Z

Any update on this?

Should we still use the AmazonS3URI from 1.11.x to get the bucket/key/version/region from an S3 URL?

Is there a way of downloading an object from S3 using the URL with the v2 SDK for Java?

millems · 2019-10-23T19:14:59Z

@ribeirux In 2.x there's currently only a function to create a URL from the bucket/key/etc.; there is nothing that goes the other way.

Parsing the URI for bucket/key/etc. is surprisingly challenging, and our functionality in 1.11.x doesn't cover all scenarios. We'd like our 2.x implementation to work for all URLs, but things are in flight with S3's URI patterns right now (see: path style deprecation, among other things).

We'd like to see the dust settle on that chaos before we commit to being able to implement this functionality.

What's the reason you need parsing of the URI? That would help us when we're designing things out.

As for downloading an object using a URI in the SDK: not really. Is there a reason the JDK's URL connection (or something similar like Apache's HTTP client) isn't sufficient? We'd like to create a way to download objects using the SDK's retry policies (see: the downloading via presigned URLs design discussion), but that's lower in our backlog than removing other 2.x migration blockers.

mrog · 2019-10-24T19:06:50Z

@millems I can't speak for @ribeirux, but I'll give you my use case. We have a service that saves files to S3 and includes their HTTPS URLs in the output. These URLs are meant to be consumed both by people (via web browsers on our internal network) and by other services (in our data center).

Our security has been set up so that every service has its own AWS key. Services can't retrieve files from S3 without using the S3 client and authenticating with their keys. This provides better security than letting every service read from every bucket. It also means we have to extract the bucket names and keys from the URLs so we can provide them to the S3 client.

While we could write our own code to parse the URLs, having the AWS SDK do that step would be a much more maintainable solution.
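For illustration, here is a minimal sketch of what such hand-rolled parsing might look like, using only the JDK. The class name `S3HttpsUrlSketch`, the `Location` holder, and the bucket name in `main` are hypothetical, and the host pattern is an assumption that only covers `[bucket.]s3.amazonaws.com`, `[bucket.]s3.<region>.amazonaws.com`, and the `s3-<region>` form. It is not the 1.11.x `AmazonS3URI` logic and deliberately ignores other endpoint shapes (dualstack, access points, and so on).

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class S3HttpsUrlSketch {
    // Assumed host shapes only; anything else is rejected rather than guessed at.
    private static final Pattern HOST = Pattern.compile(
            "^(?:(?<bucket>.+)\\.)?s3(?:[.-](?<region>[a-z0-9-]+))?\\.amazonaws\\.com$");

    /** Hypothetical holder for the two values the S3 client needs. */
    static final class Location {
        final String bucket;
        final String key;

        Location(String bucket, String key) {
            this.bucket = bucket;
            this.key = key;
        }
    }

    /** Returns the bucket and key, or empty if the URL does not fit the assumed shapes. */
    static Optional<Location> parse(URI uri) {
        String host = uri.getHost();
        String path = uri.getPath(); // decoded path, e.g. "/bucket/key"
        if (host == null || path == null) {
            return Optional.empty();
        }
        Matcher m = HOST.matcher(host);
        if (!m.matches()) {
            return Optional.empty();
        }
        String rest = path.startsWith("/") ? path.substring(1) : path;
        String bucket = m.group("bucket");
        if (bucket != null) {
            // Virtual-hosted style: bucket is in the host, the whole path is the key.
            return rest.isEmpty() ? Optional.empty() : Optional.of(new Location(bucket, rest));
        }
        // Path style: first path segment is the bucket, the remainder is the key.
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(new Location(rest.substring(0, slash), rest.substring(slash + 1)));
    }

    public static void main(String[] args) {
        URI url = URI.create("https://my-bucket.s3.us-west-2.amazonaws.com/reports/2019/out.csv");
        parse(url).ifPresent(loc -> System.out.println(loc.bucket + " / " + loc.key));
    }
}
```

Returning `Optional.empty()` for anything outside the assumed shapes is a deliberate choice here: a caller that then passes the result to the S3 client should not silently treat an arbitrary host as S3.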

millems · 2019-10-24T19:33:09Z

That makes sense. I don't think S3 URLs were originally designed to be reversible, but I can see how it's easier to store one thing (the URL) than both the bucket and the key for later use.

ribeirux · 2019-10-24T22:21:52Z

Thanks @millems for the quick reply. @mrog that's precisely my use case :)

The AWS CLI supports downloads using the S3 URI. It would be great to have the same logic across the different SDKs and tools.

slobo · 2020-12-16T17:40:36Z

> Parsing the URI for bucket/key/etc. is surprisingly challenging, and our functionality in 1.11.x doesn't cover all scenarios

Are we talking about s3://bucket/path/to/key style URIs? Very curious to know what the caveats are, as in some projects we used a simple regex (^s3://([^/]+)/(.+)$), and I can't think of a situation where it would fail for a valid URI...
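As a sketch of the approach described above, the quoted pattern can be applied in Java with `java.util.regex`. The class name `S3SchemeUriSketch` and the `parse` helper are hypothetical names for illustration. Note what the pattern itself implies: it only accepts the `s3://` scheme, and it requires a non-empty key, so a bucket-only URI such as `s3://bucket` does not match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class S3SchemeUriSketch {
    // The pattern from the comment: bucket is everything up to the first '/', key is the rest.
    private static final Pattern S3_URI = Pattern.compile("^s3://([^/]+)/(.+)$");

    /** Returns {bucket, key}, or empty if the input does not match the pattern. */
    static Optional<String[]> parse(String uri) {
        Matcher m = S3_URI.matcher(uri);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }

    public static void main(String[] args) {
        String[] parts = parse("s3://bucket/path/to/key")
                .orElseThrow(IllegalArgumentException::new);
        System.out.println("bucket=" + parts[0] + " key=" + parts[1]);
    }
}
```

HTTPS-style S3 URLs (virtual-hosted or path-style) are outside what this pattern handles and come back as empty.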

carlspring · 2021-02-13T14:24:10Z

Are there any plans to prioritize this and look into it? We could really use it for our work on the s3fs-nio project.

Tvaroh · 2021-02-15T11:25:56Z

> > Parsing the URI for bucket/key/etc. is surprisingly challenging, and our functionality in 1.11.x doesn't cover all scenarios
>
> Are we talking about s3://bucket/path/to/key style URIs? Very curious to know what the caveats are, as in some projects we used a simple regex (^s3://([^/]+)/(.+)$), and I can't think of a situation where it would fail for a valid URI...

I use it to parse signed S3 links, which look quite different.

LikeLifeItself · 2021-03-30T07:39:39Z

Spring Cloud AWS (version 2.x.x) uses s3://blah-blah URIs in SimpleStorageProtocolResolver. I see that Spring Cloud AWS is working on SDK v2 integration, and I hope they will support these URIs too, but I don't see any way to build such URIs with SDK v2.

rejevichb · 2021-07-07T12:44:27Z

@millems any idea if this will end up being included in 2.x in the near future?

millems · 2021-07-12T18:07:22Z

It likely will not be included in the near (2021) future.

There are issues with the functionality as it exists in 1.x. Some customers expect us to validate that the URLs are actually AWS S3-owned URLs, and unintentionally introduce security issues into their service based on that assumption. It also doesn't support a myriad of S3 features, like access points and outposts.

We'd love to fix these issues, but that's a considerable amount of effort. We also know that most people don't really care about those issues, but as an SDK team that needs to deliver a comprehensive product we can't really ignore them.

I'm a bit inclined to encourage the open source community to take on this project in a separate repository since they can ignore the features that they don't care about (access points, URL validation) and deliver something much more quickly. I know that's not a good answer, and we still want to get to this issue some day, but it's not our top priority. If you need something quickly, it might be worth forking off the 1.x implementation so that you can ignore the issues that we can't really ignore.

rejevichb · 2021-07-12T20:30:36Z

@millems thanks for the context and transparency, the situation is understandable and I appreciate you getting back. For the time being we're using 1.x just for the AmazonS3URI functionality and 2.x for everything else.

To clarify, you're not suggesting that I fork v2 and open a PR addressing this issue into the v2 repo? I don't have much to offer yet on the security-related and feature-related issues you mentioned, and I haven't done much due diligence.

If we wanted to maintain our own fork (which of course we'd like to avoid), we'd want to fork off of v2, which I'd imagine would take much more work to keep up to date with the primary repo.

For context, we store URIs that reference S3 buckets, then create an AmazonS3URI from our internal URI and use getBucket and getKey to build the GetObjectRequest. We use URIs to resolve things like parameters in Parameter Store and secrets in Secrets Manager via SSM.

millems · 2021-07-12T20:39:58Z

Your clarification is correct. We can't just take the 1.x functionality into 2.x as-is, because people have the expectation that any solution we provide as part of the AWS SDK would be comprehensive across all S3 functionality, and would meet their security assumptions.

Sideloading 1.x for this functionality is a fine solution for now, but it might cause issues for people who do not want 1.x on their classpath. In that case, those people could copy the 1.x functionality into their application (it's fairly standalone and licensed for that use) or even create a separate third-party project outside of the AWS GitHub organization so that it doesn't have the "expectation" baggage of being a comprehensive solution for all possible S3 endpoints.

github-actions · 2023-04-06T21:01:47Z

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

davidh44 · 2023-04-06T21:35:04Z

Thanks for being patient with us. S3 URI parsing is now available in v2: we've added the parseUri() API to S3Utilities, which returns an S3Uri object. Although we had stated our intentions to add validation and AccessPoints/Outposts parsing, we ultimately decided to forgo them due to the complexity involved and lack of demand.

You'll need to convert a String to a URI to pass to the API. We did not include String preprocessing due to issues with edge cases. Specifically, keys/queries with unsafe/reserved characters must be encoded. Dots in bucket names in virtual-hosted-style URIs must not be encoded.
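To illustrate the encoding point with the JDK alone: `URI.create` rejects a raw string containing a space, while the multi-argument `java.net.URI` constructor percent-encodes characters that are illegal in a path. The sketch below is an assumption-laden example, not part of the SDK; the class name `S3UriEncodingSketch` and the helper `pathStyleUri` are hypothetical, and whether the JDK's quoting rules match what S3 expects for every character (for example `+`) is not verified here.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class S3UriEncodingSketch {
    /**
     * Builds a path-style HTTPS URI from unencoded parts. The four-argument
     * URI constructor quotes characters that are illegal in a path (a space
     * becomes %20); the host and bucket are passed through unchanged.
     */
    static URI pathStyleUri(String endpointHost, String bucket, String key) {
        try {
            return new URI("https", endpointHost, "/" + bucket + "/" + key, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for key: " + key, e);
        }
    }

    public static void main(String[] args) {
        // URI.create("https://s3.us-west-1.amazonaws.com/myBucket/reports/my file.txt")
        // would throw IllegalArgumentException because of the unencoded space.
        URI uri = pathStyleUri("s3.us-west-1.amazonaws.com", "myBucket", "reports/my file.txt");
        System.out.println(uri.getRawPath()); // encoded form
        System.out.println(uri.getPath());    // decoded form
    }
}
```

A URI built this way could then be handed to `parseUri()` in place of one produced by `URI.create` on a raw string.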

You can now retrieve all query parameters, not just the versionId.

The following snippet shows an example of the new APIs:

import java.net.URI;
import java.util.List;
import java.util.Map;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.S3Uri;
import software.amazon.awssdk.services.s3.S3Utilities;

S3Client s3Client = S3Client.create();
S3Utilities s3Utilities = s3Client.utilities();

String url = "https://s3.us-west-1.amazonaws.com/myBucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88";
URI uri = URI.create(url);
S3Uri s3Uri = s3Utilities.parseUri(uri);

Region region = s3Uri.region().orElse(null); // Region.US_WEST_1
String bucket = s3Uri.bucket().orElse(null); // "myBucket"
String key = s3Uri.key().orElse(null); // "resources/doc.txt"
boolean isPathStyle = s3Uri.isPathStyle(); // true

Map<String, List<String>> queryParams = s3Uri.rawQueryParameters(); // {versionId=["abc123"], partNumber=["77", "88"]}
String versionId = s3Uri.firstMatchingRawQueryParameter("versionId").orElse(null); // "abc123"
String partNumber = s3Uri.firstMatchingRawQueryParameter("partNumber").orElse(null); // "77"
List<String> partNumbers = s3Uri.firstMatchingRawQueryParameters("partNumber"); // ["77", "88"]

millems added 1.x Parity Feature Request labels Nov 2, 2017

brainstorm mentioned this issue Mar 1, 2019

Simple AmazonS3URI methods missing on V2 #1107

Closed

justnance added feature-request A feature should be added or improved. and removed Feature Request labels Apr 19, 2019

millems changed the title ~~Method for parsing S3 URI.~~ S3 URI Parser Jul 8, 2019

danielcweeks mentioned this issue Oct 15, 2020

Add AWS subproject and initial S3FileIO implementation apache/iceberg#1573

Merged

carlspring mentioned this issue Feb 13, 2021

Implement an S3 URI carlspring/s3fs-nio#213

Open

2 tasks

cold-pumpkin mentioned this issue May 12, 2021

[#18] 유저 프로필 수정 기능 추가 f-lab-edu/share-my-hobby#19

Merged

3 tasks

jtjeferreira mentioned this issue Aug 31, 2022

use AWS SDK v2 tpunder/fm-sbt-s3-resolver#75

Closed

yasminetalby added the p1 This is a high priority issue label Nov 12, 2022

klopfdreh mentioned this issue Nov 15, 2022

feat: S3PathMatchingResourcePatternResolver 3.x.x awspring/spring-cloud-aws#558

Merged

11 tasks

davidh44 mentioned this issue Mar 30, 2023

S3 URI Parser #3874

Merged

12 tasks

davidh44 closed this as completed Apr 6, 2023

michaeldavis-toast mentioned this issue Mar 6, 2024

S3Utilities.parseUri() doesn't work with localhost-based URIs (specifically, LocalStack) #4996

Closed
