S3 URI Parser #272

Closed
millems opened this issue Nov 2, 2017 · 17 comments
Labels
1.x Parity · feature-request (A feature should be added or improved.) · p1 (This is a high priority issue)

Comments

@millems
Contributor

millems commented Nov 2, 2017

1.11.x provides http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3URI.html to parse the interesting components out of an S3 URI.

A similar piece of functionality should be made available in 2.0.

In the short term, customers can still use it from 1.11.x.

@ghost

ghost commented Mar 13, 2019

Similarly, my project uses AmazonS3Client.getUrl(String bucketName, String key) to generate and save the URL of an object I've just created. Is there anything equivalent available in 2.0?

@millems
Contributor Author

millems commented Mar 13, 2019

@perihelion1 #860 tracks that feature.

justnance added the feature-request label (A feature should be added or improved.) and removed the Feature Request label on Apr 19, 2019
millems changed the title from "Method for parsing S3 URI." to "S3 URI Parser" on Jul 8, 2019
@ribeirux

ribeirux commented Oct 23, 2019

Any update on this?

Should we still use the AmazonS3URI from 1.11.x to get the bucket/key/version/region from an S3 URL?

Is there a way to download an object from S3 using the URL with the v2 SDK for Java?

@millems
Contributor Author

millems commented Oct 23, 2019

@ribeirux There's currently only a function to create a URL, given the bucket/key/etc in 2.x.

Parsing the URI for bucket/key/etc. is surprisingly challenging, and our functionality in 1.11.x doesn't cover all scenarios. We'd like our 2.x implementation to work for all URLs, but things are in flight with S3's URI patterns right now (see: path style deprecation, among other things).

We'd like to see the dust settle on that chaos before we commit to being able to implement this functionality.

What's the reason you need parsing of the URI? That would help us when we're designing things out.

As for downloading an object using a URI in the SDK: not really. Is there a reason the JDK's URL connection (or something similar like Apache's HTTP client) isn't sufficient? We'd like to create a way to download objects using the SDK's retry policies (see: the downloading via presigned URLs design discussion), but that's lower in our backlog than removing other 2.x migration blockers.
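For readers wondering what the plain-JDK route mentioned above might look like, here is a minimal sketch using `java.net.http.HttpClient` (Java 11+). It assumes the object can be fetched with an unsigned GET, for example a public object or a presigned URL. The class name `UrlDownloader` is hypothetical and not part of any SDK, and the sketch has none of the SDK's retry behavior.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical helper for illustration; not part of the AWS SDK.
final class UrlDownloader {
    private UrlDownloader() {}

    // Builds a plain GET request for the given URL. No AWS signing is applied.
    static HttpRequest buildRequest(URI url) {
        return HttpRequest.newBuilder(url)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Sends the request and writes the response body to the target file.
    // Note that the body is written before the status is checked, so on a
    // non-200 response the file may contain the error document.
    static Path download(URI url, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<Path> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

A call such as `UrlDownloader.download(URI.create(presignedUrl), Path.of("doc.txt"))` would fetch the object, where `presignedUrl` is whatever URL the caller already holds.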

@mrog

mrog commented Oct 24, 2019

@millems I can't speak for @ribeirux , but I'll give you my use case. We have a service that saves files to S3 and includes their HTTPS URLs in the output. These URLs are meant to be consumed both by people (via web browsers on our internal network) and other services (in our data center).

Our security has been set up so that every service has its own AWS key. Services can't retrieve files from S3 without using the S3 client and authenticating with their keys. This provides better security than letting every service read from every bucket. It also means we have to extract the bucket names and keys from the URLs so we can provide them to the S3 client.

While we could write our own code to parse the URLs, having the AWS SDK do that step would be a much more maintainable solution.
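As an illustration of the "write our own code" option described above, here is a minimal sketch that pulls the bucket and key out of the two most common HTTPS forms (virtual-hosted style and path style). The class `SimpleS3UrlParser` and its host pattern are assumptions made for this example, not SDK code. It deliberately ignores access points, dual-stack endpoints, other partitions, and any validation that the host really belongs to S3.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the AWS SDK.
final class SimpleS3UrlParser {
    // Bucket and key extracted from a URL.
    static final class Location {
        final String bucket;
        final String key;

        Location(String bucket, String key) {
            this.bucket = bucket;
            this.key = key;
        }
    }

    // Optional "<bucket>." prefix, then "s3", an optional ".<region>" or
    // "-<region>" part, then ".amazonaws.com". Anything else is rejected.
    private static final Pattern HOST =
            Pattern.compile("^(?:(.+)\\.)?s3(?:[.-][a-z0-9-]+)?\\.amazonaws\\.com$");

    private SimpleS3UrlParser() {}

    static Optional<Location> parse(URI uri) {
        if (uri.getHost() == null || uri.getPath() == null) {
            return Optional.empty();
        }
        Matcher host = HOST.matcher(uri.getHost());
        if (!host.matches()) {
            return Optional.empty();
        }
        // getPath() returns the percent-decoded path; drop the leading slash.
        String path = uri.getPath().startsWith("/") ? uri.getPath().substring(1) : uri.getPath();
        String bucket = host.group(1);
        if (bucket != null) {
            // Virtual-hosted style: bucket is in the host, the whole path is the key.
            return path.isEmpty() ? Optional.empty() : Optional.of(new Location(bucket, path));
        }
        // Path style: the first path segment is the bucket, the rest is the key.
        int slash = path.indexOf('/');
        if (slash <= 0 || slash == path.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(new Location(path.substring(0, slash), path.substring(slash + 1)));
    }
}
```

The resulting `bucket` and `key` values could then be handed to an S3 client request. Returning an empty `Optional` for anything unrecognized keeps unsupported URL shapes from being silently misparsed.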

@millems
Contributor Author

millems commented Oct 24, 2019

That makes sense. I don't think S3 URLs were originally designed to be reversible, but I can see how it's easier to store one thing (the URL) than both the bucket and the key for later use.

@ribeirux

ribeirux commented Oct 24, 2019

Thanks @millems for the quick reply. @mrog that's precisely my use case :)

The AWS CLI supports downloads using the S3 URI. It would be great to have the same logic across the different SDKs and tools.

@slobo

slobo commented Dec 16, 2020

> Parsing the URI for bucket/key/etc. is surprisingly challenging, and our functionality in 1.11.x doesn't cover all scenarios

Are we talking about s3://bucket/path/to/key style URIs? Very curious to know what the caveats are, as in some projects we used a simple regex (^s3://([^/]+)/(.+)$), and I can't think of a situation where it would fail for a valid URI...
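For reference, the pattern quoted above can be applied in Java as in the sketch below. The class name `S3SchemeUri` is hypothetical; only the regular expression itself comes from the comment.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the AWS SDK.
final class S3SchemeUri {
    // The pattern from the comment: group 1 is the bucket, group 2 is the key.
    private static final Pattern S3_URI = Pattern.compile("^s3://([^/]+)/(.+)$");

    final String bucket;
    final String key;

    private S3SchemeUri(String bucket, String key) {
        this.bucket = bucket;
        this.key = key;
    }

    // Returns the bucket and key, or empty if the input does not match the pattern.
    static Optional<S3SchemeUri> parse(String uri) {
        Matcher m = S3_URI.matcher(uri);
        return m.matches()
                ? Optional.of(new S3SchemeUri(m.group(1), m.group(2)))
                : Optional.empty();
    }
}
```

With this pattern, `s3://bucket/path/to/key` yields bucket `bucket` and key `path/to/key`. A bucket-only URI such as `s3://bucket` does not match, and neither do the `https://` forms.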

@carlspring

Are there any plans to prioritize this and look into it? We could really use it for our work on the s3fs-nio project.

@Tvaroh

Tvaroh commented Feb 15, 2021

> > Parsing the URI for bucket/key/etc. is surprisingly challenging, and our functionality in 1.11.x doesn't cover all scenarios
>
> Are we talking about s3://bucket/path/to/key style URIs? Very curious to know what the caveats are, as in some projects we used a simple regex (^s3://([^/]+)/(.+)$), and I can't think of a situation where it would fail for a valid URI...

I use it to parse signed S3 links, which look quite different.

@LikeLifeItself

LikeLifeItself commented Mar 30, 2021

Spring Cloud AWS (version 2.x.x) uses s3://blah-blah URIs in SimpleStorageProtocolResolver. I see that Spring Cloud AWS is working on SDK v2 integration and I hope they will support URIs too, but I don't see any way to build such URIs with SDK v2.

@rejevichb

@millems any idea if this will end up being included in 2.x in the near future?

@millems
Contributor Author

millems commented Jul 12, 2021

It likely will not be included in the near (2021) future.

There are issues with the functionality as it exists in 1.x. Some customers expect us to validate that the URLs are actually AWS S3-owned URLs, and unintentionally introduce security issues into their service based on that assumption. It also doesn't support a myriad of S3 features, like access points and outposts.

We'd love to fix these issues, but that's a considerable amount of effort. We also know that most people don't really care about those issues, but as an SDK team that needs to deliver a comprehensive product we can't really ignore them.

I'm a bit inclined to encourage the open source community to take on this project in a separate repository since they can ignore the features that they don't care about (access points, URL validation) and deliver something much more quickly. I know that's not a good answer, and we still want to get to this issue some day, but it's not our top priority. If you need something quickly, it might be worth forking off the 1.x implementation so that you can ignore the issues that we can't really ignore.
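To make the validation concern above concrete, here is a rough sketch of the kind of explicit host check a caller might add before trusting a parsed URL. Everything in it is an assumption for illustration: the class `S3HostCheck`, the decision to require HTTPS, and the trusted suffix, which covers only the standard `amazonaws.com` endpoints. It is a heuristic, not proof that a URL is S3-owned. Other partitions, custom endpoints, access points, and legacy dash-style regional hosts are not handled.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the AWS SDK.
final class S3HostCheck {
    // Assumption: only the standard "amazonaws.com" endpoints are trusted here.
    private static final String TRUSTED_SUFFIX = ".amazonaws.com";

    private S3HostCheck() {}

    // Heuristic check: HTTPS, no user-info component, host under the trusted
    // suffix, and an "s3" label somewhere in front of that suffix.
    static boolean looksLikeS3Endpoint(URI uri) {
        String host = uri.getHost();
        if (host == null
                || !"https".equalsIgnoreCase(uri.getScheme())
                || uri.getUserInfo() != null) {
            return false;
        }
        String lower = host.toLowerCase(Locale.ROOT);
        if (!lower.endsWith(TRUSTED_SUFFIX)) {
            return false;
        }
        String prefix = lower.substring(0, lower.length() - TRUSTED_SUFFIX.length());
        for (String label : prefix.split("\\.")) {
            if (label.equals("s3")) {
                return true;
            }
        }
        return false;
    }
}
```

Matching on the end of the host rather than searching for a substring is what rejects look-alikes such as `https://s3.amazonaws.com.evil.example/bucket/key`, where the S3 name appears in the host but is not the registered domain.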

@rejevichb

rejevichb commented Jul 12, 2021

@millems thanks for the context and transparency, the situation is understandable and I appreciate you getting back. For the time being we're using 1.x just for the AmazonS3URI functionality and 2.x for everything else.

To clarify, you're not suggesting that I fork v2 and open a PR addressing this issue into the v2 repo? I don't have much to show for addressing the security-related and feature-related issues you mentioned, but I also haven't done much due diligence.

If we wanted to maintain our own fork (which of course we'd like to avoid) we'd want to fork off of v2 which I'd imagine would take much more work to keep up to date with the primary repo.

For context, we store URIs that reference S3 buckets, then create an AmazonS3URI from our internal URI and use getBucket and getKey to create the GetObjectRequest. We use URIs to resolve things like parameters in Parameter Store and secrets in Secrets Manager via SSM.

@millems
Contributor Author

millems commented Jul 12, 2021

Your clarification is correct. We can't just take the 1.x functionality into 2.x as-is, because people have the expectation that any solution we provide as part of the AWS SDK would be comprehensive across all S3 functionality, and would meet their security assumptions.

Sideloading 1.x for this functionality is a fine solution for now, but it might cause issues for people who do not want 1.x on their classpath. In that case, those people could copy the 1.x functionality into their application (it's fairly standalone and licensed for that use) or even create a separate third-party project outside of the aws github organization so that it doesn't have the "expectation" baggage of being a comprehensive solution for all possible S3 endpoints.

@github-actions

github-actions bot commented Apr 6, 2023

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@davidh44
Contributor

davidh44 commented Apr 6, 2023

Thanks for being patient with us. S3 URI parsing is now available in v2. We've added the parseUri() API to S3Utilities, which returns an S3Uri object. Although we had stated our intention to add validation and AccessPoints/Outposts parsing, we ultimately decided to forgo them due to the complexity involved and the lack of demand.

You'll need to convert a String to a URI to pass to the API. We did not include String preprocessing due to issues with edge cases. Specifically, keys/queries with unsafe/reserved characters must be encoded. Dots in bucket names in virtual-hosted-style URIs must not be encoded.

You can now retrieve all query parameters, not just the versionId.

The following snippet shows an example of the new APIs:

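import java.net.URI;
import java.util.List;
import java.util.Map;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.S3Uri;
import software.amazon.awssdk.services.s3.S3Utilities;

S3Client s3Client = S3Client.create();
S3Utilities s3Utilities = s3Client.utilities();

// The String must already be a valid, encoded URI before it is passed to parseUri().
String url = "https://s3.us-west-1.amazonaws.com/myBucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88";
URI uri = URI.create(url);
S3Uri s3Uri = s3Utilities.parseUri(uri);

// Each component is an Optional because not every URI carries all of them.
Region region = s3Uri.region().orElse(null); // Region.US_WEST_1
String bucket = s3Uri.bucket().orElse(null); // "myBucket"
String key = s3Uri.key().orElse(null); // "resources/doc.txt"
boolean isPathStyle = s3Uri.isPathStyle(); // true

// Query parameters are returned raw; a repeated name maps to several values.
Map<String, List<String>> queryParams = s3Uri.rawQueryParameters(); // {versionId=["abc123"], partNumber=["77", "88"]}
String versionId = s3Uri.firstMatchingRawQueryParameter("versionId").orElse(null); // "abc123"
String partNumber = s3Uri.firstMatchingRawQueryParameter("partNumber").orElse(null); // "77"
List<String> partNumbers = s3Uri.firstMatchingRawQueryParameters("partNumber"); // ["77", "88"]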
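On the note above that keys with unsafe characters must be encoded before the String is converted to a URI: one possible way to do that with the JDK alone is the multi-argument `java.net.URI` constructor, which percent-encodes characters that are illegal in a path. The sketch below is an assumption about how a caller might prepare the input, and the helper `S3UriStrings.pathStyleUri` is hypothetical. Whether the resulting encoding matches what parseUri() expects for every possible character has not been verified here.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of the AWS SDK.
final class S3UriStrings {
    private S3UriStrings() {}

    // Builds a path-style HTTPS URI from its parts. The multi-argument URI
    // constructor quotes characters that are not legal in a path, such as spaces.
    static URI pathStyleUri(String endpointHost, String bucket, String key) {
        try {
            return new URI("https", endpointHost, "/" + bucket + "/" + key, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for key: " + key, e);
        }
    }
}
```

For example, `S3UriStrings.pathStyleUri("s3.us-west-1.amazonaws.com", "myBucket", "reports/my file.txt")` produces `https://s3.us-west-1.amazonaws.com/myBucket/reports/my%20file.txt`, whereas passing the unencoded string straight to `URI.create` throws an `IllegalArgumentException`.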
