S3 URI Parser #272
Comments
Similarly, my project uses |
@perihelion1 #860 tracks that feature. |
Any update on this? Should we still use the AmazonS3URI from 1.11.x to get the bucket/key/version/region from an S3 URL? Is there a way of downloading an object from S3 using the URL with the v2 SDK for Java? |
@ribeirux There's currently only a function to create a URL, given the bucket/key/etc in 2.x. Parsing the URI for bucket/key/etc. is surprisingly challenging, and our functionality in 1.11.x doesn't cover all scenarios. We'd like our 2.x implementation to work for all URLs, but things are in flight with S3's URI patterns right now (see: path style deprecation, among other things). We'd like to see the dust settle on that chaos before we commit to being able to implement this functionality. What's the reason you need parsing of the URI? That would help us when we're designing things out. As for downloading an object using a URI in the SDK: not really. Is there a reason the JDK's URL connection (or something similar like Apache's HTTP client) isn't sufficient? We'd like to create a way to download objects using the SDK's retry policies (see: the downloading via presigned URLs design discussion), but that's lower in our backlog than removing other 2.x migration blockers. |
@millems I can't speak for @ribeirux , but I'll give you my use case. We have a service that saves files to S3 and includes their HTTPS URLs in the output. These URLs are meant to be consumed both by people (via web browsers on our internal network) and other services (in our data center). Our security has been set up so that every service has its own AWS key. Services can't retrieve files from S3 without using the S3 client and authenticating with their keys. This provides better security than letting every service read from every bucket. It also means we have to extract the bucket names and keys from the URLs so we can provide them to the S3 client. While we could write our own code to parse the URLs, having the AWS SDK do that step would be a much more maintainable solution. |
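For readers in the same position, here is a minimal sketch of the "write our own code to parse the URLs" option, using only the JDK. It is not the SDK's `AmazonS3URI` and is not comprehensive: the class name `S3UrlSketch`, the holder type, and the host pattern are all hypothetical, and it assumes plain `amazonaws.com` endpoints in virtual-hosted or path style. It does not handle access points, outposts, dualstack endpoints, or other partitions, and it does not prove that a URL is safe to trust.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any AWS SDK.
final class S3UrlSketch {

    // Assumed host shapes: "s3.amazonaws.com", "s3.<region>.amazonaws.com",
    // "s3-<region>.amazonaws.com", optionally prefixed with "<bucket>.".
    private static final Pattern HOST =
            Pattern.compile("(?:(.+)\\.)?s3(?:[.-]([a-z0-9-]+))?\\.amazonaws\\.com");

    // Simple holder for the two values an S3 client request needs.
    static final class BucketAndKey {
        final String bucket;
        final String key;

        BucketAndKey(String bucket, String key) {
            this.bucket = bucket;
            this.key = key;
        }
    }

    static BucketAndKey parse(URI uri) {
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URI: " + uri);
        }
        Matcher m = HOST.matcher(host);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a recognized S3 host: " + host);
        }
        // getPath() returns the percent-decoded path, so "%20" becomes a space.
        String path = uri.getPath() == null ? "" : uri.getPath();
        String trimmed = path.startsWith("/") ? path.substring(1) : path;

        String bucketFromHost = m.group(1);
        if (bucketFromHost != null) {
            // Virtual-hosted style: the bucket is in the host, the whole path is the key.
            return new BucketAndKey(bucketFromHost, trimmed);
        }
        // Path style: the first path segment is the bucket, the rest is the key.
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("No bucket in URI: " + uri);
        }
        int slash = trimmed.indexOf('/');
        String bucket = slash < 0 ? trimmed : trimmed.substring(0, slash);
        String key = slash < 0 ? "" : trimmed.substring(slash + 1);
        return new BucketAndKey(bucket, key);
    }
}
```

Calling `S3UrlSketch.parse(URI.create("https://my-bucket.s3.us-west-2.amazonaws.com/photos/my%20cat.jpg"))` yields the bucket `my-bucket` and the decoded key `photos/my cat.jpg`, which could then be passed to an S3 client request.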
That makes sense. I don't think S3 URLs were originally designed to be reversible, but I can see how it's easier to store one thing (the URL) than both the bucket and the key for later use. |
Thanks @millems for the quick reply. @mrog that's precisely my use case :) The aws cli supports downloads using the S3 URI. It would be great to have the same logic across different sdks/tools. |
Are we talking about |
Are there any plans to prioritize this and look into it? We could really use it for our work on the s3fs-nio project. |
I use it to parse signed S3 links, which look quite different. |
Spring cloud aws (version 2.x.x) uses |
@millems any idea if this will end up being included in 2.x in the near future? |
It likely will not be included in the near (2021) future. There are issues with the functionality as it exists in 1.x. Some customers expect us to validate that the URLs are actually AWS S3-owned URLs, and unintentionally introduce security issues into their service based on that assumption. It also doesn't support a myriad of S3 features, like access points and outposts. We'd love to fix these issues, but that's a considerable amount of effort. We also know that most people don't really care about those issues, but as an SDK team that needs to deliver a comprehensive product we can't really ignore them. I'm a bit inclined to encourage the open source community to take on this project in a separate repository since they can ignore the features that they don't care about (access points, URL validation) and deliver something much more quickly. I know that's not a good answer, and we still want to get to this issue some day, but it's not our top priority. If you need something quickly, it might be worth forking off the 1.x implementation so that you can ignore the issues that we can't really ignore. |
@millems thanks for the context and transparency, the situation is understandable and I appreciate you getting back. For the time being we're using 1.x just for the AmazonS3URI functionality and 2.x for everything else. To clarify, you're not suggesting that I fork v2 and open a PR addressing this issue into the v2 repo? I don't have much to show for addressing the security-related issue and the feature-related issues you mentioned, but I haven't done much due diligence. If we wanted to maintain our own fork (which of course we'd like to avoid), we'd want to fork off of v2, which I'd imagine would take much more work to keep up to date with the primary repo. For context, we store URIs that reference S3 buckets, then create an AmazonS3URI from our internal URI and use getBucket and getKey to create the GetObjectRequest. We use URIs to resolve things like parameters in Parameter Store and secrets in Secrets Manager via SSM. |
Your clarification is correct. We can't just take the 1.x functionality into 2.x as-is, because people have the expectation that any solution we provide as part of the AWS SDK would be comprehensive across all S3 functionality, and would meet their security assumptions. Sideloading 1.x for this functionality is a fine solution for now, but it might cause issues for people who do not want 1.x on their classpath. In that case, those people could copy the 1.x functionality into their application (it's fairly standalone and licensed for that use) or even create a separate third-party project outside of the aws github organization so that it doesn't have the "expectation" baggage of being a comprehensive solution for all possible S3 endpoints. |
Thanks for being patient with us, S3 URI parsing is now available in v2. We've added the You'll need to convert a String to a URI to pass to the API. We did not include String preprocessing due to issues with edge cases. Specifically, keys/queries with unsafe/reserved characters must be encoded. Dots in bucket names in virtual-hosted-style URIs must not be encoded. You can now retrieve all query parameters, not just the versionId. The following snippet shows an example of the new APIs
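Separately from the SDK snippet referred to above, the String-to-URI step can be sketched with the JDK alone. This is a minimal illustration that makes no SDK calls; the class name `S3UriEncodingSketch` and the method `toUri` are hypothetical. It relies on the multi-argument `java.net.URI` constructor, which percent-encodes characters that are illegal in the path (such as spaces) while leaving the host, including dots in a bucket name, untouched. It is not a complete treatment of every edge case.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of any AWS SDK.
final class S3UriEncodingSketch {

    // Builds an https URI from a host and an unencoded object key.
    // The host is passed through as-is, so dots in a virtual-hosted
    // bucket name are never encoded; the key is encoded as a path.
    static URI toUri(String host, String unencodedKey) {
        try {
            return new URI("https", host, "/" + unencodedKey, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for key: " + unencodedKey, e);
        }
    }
}
```

By contrast, `URI.create("https://my.bucket.s3.us-west-2.amazonaws.com/reports/q1 summary.txt")` throws an `IllegalArgumentException` because of the raw space, which is why the key has to be encoded before the string is turned into a `URI`.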
|
1.11.x provides http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3URI.html to parse the interesting components out of an S3 URI.
A similar piece of functionality should be made available in 2.0.
In the short term, customers can still use it from 1.11.x.