Skip to content

UriComponentsBuilder '{' '}' may not be encoded although invalid characters #26466

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jonenst opened this issue Jan 28, 2021 · 13 comments
Closed
Assignees
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement
Milestone

Comments

@jonenst
Copy link

jonenst commented Jan 28, 2021

Using the UriComponentsBuilder, the { and } characters can end up in the result if you are not careful (they are the only ones from the invalid printable ascii chars which do this, most probably because they are used for templates, like in {city}).

jshell> UriComponentsBuilder.fromUriString(" \"<>\\^`|][%{}").encode().build().toUriString()
$2 ==> "%20%22%3C%3E%5C%5E%60%7C%5D%5B%25{}"
// {} not percent encoded at the end

jshell> UriComponentsBuilder.fromUriString(" \"<>\\^`|][%}{").encode().build().toUriString()
$3 ==> "%20%22%3C%3E%5C%5E%60%7C%5D%5B%25%7D%7B"
// }{ correctly percent encoded at the end

Using toUri() instead of toUriString() at least does check and throws an exception in the bad case.

 jshell> UriComponentsBuilder.fromUriString("}{").encode().build().toUri()
$4 ==> %7D%7B

jshell> UriComponentsBuilder.fromUriString("{}").encode().build().toUri()
|  Exception java.lang.IllegalStateException: Could not create URI object: Illegal character in path at index 0: {}

Using toUri() and removing .encode() actually makes it encode:

jshell> UriComponentsBuilder.fromUriString(" \"<>\\^`|][%{}").encode().build().toUriString()
$2 ==> "%20%22%3C%3E%5C%5E%60%7C%5D%5B%25{}"
// As seen before, with .encode() and .toUriString(): {} not encoded

jshell> UriComponentsBuilder.fromUriString(" \"<>\\^`|][{}").build().toUri();
$8 ==> %20%22%3C%3E%5C%5E%60%7C%5D%5B%7B%7D
// without .encode() and with .toUri(): {} encoded !?

With buildAndExand(), things are a bit safer, but still there are cases where it lets unencoded chars through.

 jshell> UriComponentsBuilder.fromUriString("{a}").buildAndExpand().toUriString()
|  Exception java.lang.IllegalArgumentException: Not enough variable values available to expand 'a'
// a bit safer, expand detects the missing argument

jshell> UriComponentsBuilder.fromUriString("{}").buildAndExpand().toUriString()
$29 ==> "{}"
// empty brackets are neither encoded nor detected as errors.
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Jan 28, 2021
@jonenst jonenst changed the title UriComponentsBuilder dangerous API: non encoded invalid characters UriComponentsBuilder '{' '}' may not be encoded although invalid characters Jan 28, 2021
@sbrannen
Copy link
Member

I've edited your comment to improve the formatting. You might want to check out this Mastering Markdown guide for future reference.

@sbrannen sbrannen added the in: web Issues in web modules (web, webmvc, webflux, websocket) label Jan 28, 2021
@jonenst
Copy link
Author

jonenst commented Jan 28, 2021

Thanks. Sorry for the bad formatting.

I found another place where '{' slips through (and in addition to '}' slipping through, the wrong name must be passed as the key in the uriVariable map to avoid getting an exception):

jshell>  UriComponentsBuilder.fromUriString("/{foo{}}").encode().build().expand("bar")
$132 ==> /bar}

jshell>  UriComponentsBuilder.fromUriString("/{foo{}}").encode().build().expand(Map.of("foo{","bar"))
$133 ==> /bar}
# the key in the map should be foo{} !

jshell>  UriComponentsBuilder.fromUriString("/{foo{}}").encode().build().expand(Map.of("foo{}","bar"))
|  Exception java.lang.IllegalArgumentException: Map has no value for 'foo{'
# throws :(

This one is because UriTemplateEncoder chooses the variable name to be "foo{}" by correctly tracking the nesting level and so correctly doesn't encode this variable name, but the expandUriComponent uses Pattern.compile("\{([^/]+?)}") to find names to replace and so stops at the first '}'. Normal regexps can't track arbitrary high nesting levels, so there's no easy fix for this one while still allowing names of variables to contain correctly nested brackets. Note that as soon as a variable name is incorrectly nested, it becomes considered as a literal (with it's surround '{' and '}'):

jshell>  UriComponentsBuilder.fromUriString("/{foo{}}").encode().build().expand("bar")
$140 ==> /bar}
# correctly nested

jshell>  UriComponentsBuilder.fromUriString("/{foo{}").encode().build().expand("bar")
$141 ==> /%7Bfoo%7B%7D
# incorrectly nested, treated as a literal, bar is not used

jshell>  UriComponentsBuilder.fromUriString("/{foo}}").encode().build().expand("bar")
$142 ==> /bar%7D
# incorrectly nested, closes too soon, the remaining '}' is treated as a literal

There are more weird manifestations of the discrepancy the UriTemplateEncoder code and the expandUriComponent code:

jshell>  UriComponentsBuilder.fromUriString("/{foo{}{}}").encode().build().expand("bar")
|  Exception java.lang.IllegalArgumentException: Not enough variable values available to expand '}'
# this should have worked, a single variable "foo{}{}" gets its value from the positional argument.
# But it throws about a bogus second variable.
jshell>  UriComponentsBuilder.fromUriString("/{foo{}{}}").encode().build().expand("bar", "xxx")
$144 ==> /barxxx
# correctly nested, so  UriTemplateEncoder treats it as one unencoded variable
# but the regexp in expandUriComponent matches 2 variables. Chaos ensues.

@poutsma
Copy link
Contributor

poutsma commented Jan 29, 2021

We are aware of the fact that it's impossible to correctly parse a string into a URL using a regex. But I would argue a even stronger case: it's impossible to correctly parse URL strings at all, for a number of reasons:

The typical developers idea of what a URL is differs significantly from the relevant specifications. We have no intention on trying to enforce these specs all of our users, and instead try to be more pragmatic instead. For instance, this leads to us supporting url template placeholders where strictly they are not allowed.

Secondly, there is URL encoding. It is impossible to distinguish between encoded and unencoded components, and even if you could, there are too many encoding choices available to guess correctly 100% of the time. Again, we could have decided to only support the specifications, but instead we try to be more lenient where we can.

Overall, the goal of the UriComponentsBuilder is to be useful for Spring Framework itself, and its users. The goal is not to strictly conform to URL RFCs, though we try to do so where we can. The parse methods on the UriComponentsBuilder are essentially best guesses, with absolutely no guarantee of parsing the given string correctly, because that is impossible anyway—whether using a regex or not.

Given the above, I am not at all surprised that the builder has difficulty parsing strings like \"<>\\^|][%{}or/{foo{}{}}`. In fact, I am surprised the former is even accepted at all, because it's not a valid URI. Both simply have too much ambiguity, as we do allow placeholders. If we would resolve this and not allow placeholders in those places, we would break the vast majority of the Spring apps out there. So that is not an option.

However, there is a way to resolve this. As the name of the class suggests, the UriComponentsBuilder can also be used as a builder, where you can supply individual components (scheme, host, path, etc) one at a time. Used in this way, there is less ambiguity to solve, so the builder generally does a better job.

@poutsma poutsma self-assigned this Jan 29, 2021
@poutsma poutsma added status: declined A suggestion or change that we don't feel we should currently apply and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Jan 29, 2021
@poutsma poutsma closed this as completed Jan 29, 2021
@rstoyanchev
Copy link
Contributor

rstoyanchev commented Jan 29, 2021

I think I see where the issue is and it's related to the encode() at the UriComponentsBuilder level. That encodes the template while it still contains URI variables, while URI variables themselves are encoded more strongly later (i.e. all URI reserved characters as opposed to just legal ones) when expanded. This means that if you don't expand, you can end up submitting a URI with what was expected to be placeholders.

@rstoyanchev rstoyanchev reopened this Jan 29, 2021
@rstoyanchev rstoyanchev assigned rstoyanchev and unassigned poutsma Jan 29, 2021
@rstoyanchev rstoyanchev added type: enhancement A general enhancement and removed status: declined A suggestion or change that we don't feel we should currently apply labels Jan 29, 2021
@rstoyanchev rstoyanchev added this to the 5.3.4 milestone Jan 29, 2021
@jonenst
Copy link
Author

jonenst commented Jan 29, 2021

@poutsma Thanks for taking the time to explain. I used \"<>\\^|][%{} just to show that all other characters work fine, not to trigger an obscure behavior. using "{}" alone is enough to trigger the bug. All in all, maybe you're right, and your explanation can be added to the docs ?

Thank you all for your time

@jonenst
Copy link
Author

jonenst commented Feb 1, 2021

Another case where '{' slips through is the UriComponentsBuilder.toUriString() shortcut when we forgot a variable (but supplied at least one other variable) :

jshell> UriComponentsBuilder.fromUriString("/{a}/").uriVariables(Map.of("a","X")).toUriString()
$39 ==> "/X/"
# OK

jshell> UriComponentsBuilder.fromUriString("/{a}/").uriVariables(Map.of("b","X")).toUriString()
$36 ==> "/{a}/"
# invallid

@rstoyanchev
Copy link
Contributor

rstoyanchev commented Feb 1, 2021

On closer look, UriComponents#toUriString() is nothing but concatenation of the current state of the individual parts:

 * Concatenate all URI components to return the fully formed URI String.
 * <p>This method does nothing more than a simple concatenation based on
 * current values. That means it could produce different results if invoked
 * before vs after methods that can change individual values such as
 * {@code encode}, {@code expand}, or {@code normalize}.

In other words it is expected that variables may not have been expanded nor encoded and that would be reflected in the resulting String. The toUri() method on the other hand does validate and reject illegal characters.

You're right there are some corner cases with nested and/or mismatched placeholders but I don't see any easy solutions.

@rstoyanchev rstoyanchev removed this from the 5.3.4 milestone Feb 1, 2021
@rstoyanchev rstoyanchev added status: waiting-for-feedback We need additional information before we can continue status: waiting-for-triage An issue we've not yet triaged or decided on labels Feb 1, 2021
@spring-projects-issues
Copy link
Collaborator

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-projects-issues spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Feb 8, 2021
@jonenst
Copy link
Author

jonenst commented Feb 9, 2021

hi @rstoyanchev , what feedback do you want from me ?

I agree that the behavior of toUriString() is clearly documented as beeing only a concatenation depending on whether you already called .encode() or not, so that's good. But to me the problem are those edge cases where '{' slips through even though you called .encode().
So to me, the solution is either to fix the edge cases, or to document the edge cases. And if the edge cases are documented, then also document that calling .toUri() instead of toUriString() does a final validation (by redecoding and reencoding) which allows to ensure that the user is not in one of the edge cases ?

I see 2 edge cases: unexpanded variables, and weird mismatches between the regex code and the escaping code.

unexpanded variables:

# using encode_template but not calling expand() but the template
# contains curly brackets (maybe the template author didn't think of variables)
# emtpy or not empty matched curly brackets slip through
jshell> UriComponentsBuilder.fromUriString("{}").encode().build().toUriString()
$2 ==> "{}"
jshell> UriComponentsBuilder.fromUriString("{a}").encode().build().toUriString()
$3 ==> "{a}"
# In this case, document that when you don't call expand(),
# you should encode the result instead of encode_template
# ie ".build().encode()" instead of "encode().build()"
# The current docs currently says that .encode().build is almost always better
# but when you are not calling expand() I think it's worse.


# using encode_template and calling expand, but still empty curly brackets slip through
UriComponentsBuilder.fromUriString("{}").encode().build().expand().toUriString()
$5 ==> "{}"
# document this or fix it ?


# The UriComponentsBuilder.toUriString() shortcut uses
# encode_template but allows for missing variables:
jshell>  UriComponentsBuilder.fromUriString("/{a}/").uriVariables(Map.of("b","X")).toUriString()
$15 ==> "/{a}/"
# fix it to disallow missing variables ? Or document that missing variables
# will result in invalid chars in the result string ?
# I don't see how a resulting string with missing variables is useful,
# you can't safely use it as a new template
# because it's been partially encoded, so you will either
# have doubly encoded parts or non encoded parts.

weird stuff, unfixable ? Remove the nesting parsing code and document that variable names should not contain '{' and '}' ?

#  mismatch between the parsing during encoding and the regexp for the variables
# last '}' slips through
jshell> UriComponentsBuilder.fromUriString("/{foo{}}").encode().build().expand("bar").toUriString()
$7 ==> "/bar}"

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue status: feedback-reminder We've sent a reminder that we need additional information before we can continue labels Feb 9, 2021
@rstoyanchev
Copy link
Contributor

rstoyanchev commented Feb 9, 2021

Thanks for the feedback.

I suppose we can tighten the encoding of the template, which has to work around placeholder, to disallow empty, nested, or mismatched braces, and expect those to be expanded via URI variables instead if actually needed.

That should only leave cases where there are actual URI variables, or what looks like it, including the case with partial expansion via UriComponentsBuilder#uriVariables, which by the way is a feature we can't disallow it.

That's probably good enough. I mean if code somehow forgets to do expand and ends up with a String that contains braces, wouldn't it still need to construct a java.net.URI which is needed at the end in most places? That should then reject it.

This was referenced Mar 11, 2021
lxbzmy pushed a commit to lxbzmy/spring-framework that referenced this issue Mar 26, 2022
This commit better aligns how URI variable placeholders are detected
in UriComponentsBuilder#encode (i.e. the pre-encoding of the literal
parts of a URI template) and how they are expanded later on.
The latter relies on a pattern that stops at the first closing '}'
which excludes the possibility for well-formed, nested placeholders
other than variables with regex syntax, e.g. "{year:\d{1,4}}".

UriComponentsBuilder#encode now also stops at the first closing '}' and
further ensures the placeholder is not empty and that it has '{' before
deciding to treat it as a URI variable.

Closes spring-projectsgh-26466
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

5 participants