UriComponentsBuilder '{' '}' may not be encoded although invalid characters #26466
I've edited your comment to improve the formatting. You might want to check out this Mastering Markdown guide for future reference. |
Thanks. Sorry for the bad formatting. I found another place where '{' slips through (and in addition to '}' slipping through, the wrong name must be passed as the key in the uriVariable map to avoid getting an exception):
jshell> UriComponentsBuilder.fromUriString("/{foo{}}").encode().build().expand("bar")
$132 ==> /bar}
jshell> UriComponentsBuilder.fromUriString("/{foo{}}").encode().build().expand(Map.of("foo{","bar"))
$133 ==> /bar}
# the key in the map should be foo{} !
jshell> UriComponentsBuilder.fromUriString("/{foo{}}").encode().build().expand(Map.of("foo{}","bar"))
| Exception java.lang.IllegalArgumentException: Map has no value for 'foo{'
# throws :(
This one is because UriTemplateEncoder chooses the variable name to be "foo{}" by correctly tracking the nesting level, and so correctly doesn't encode this variable name, but expandUriComponent uses Pattern.compile("\\{([^/]+?)}") to find names to replace and so stops at the first '}'. Regular expressions can't track arbitrarily deep nesting, so there's no easy fix for this one while still allowing variable names to contain correctly nested brackets. Note that as soon as a variable name is incorrectly nested, it is treated as a literal (with its surrounding '{' and '}'):
jshell> UriComponentsBuilder.fromUriString("/{foo{}}").encode().build().expand("bar")
$140 ==> /bar}
# correctly nested
jshell> UriComponentsBuilder.fromUriString("/{foo{}").encode().build().expand("bar")
$141 ==> /%7Bfoo%7B%7D
# incorrectly nested, treated as a literal, bar is not used
jshell> UriComponentsBuilder.fromUriString("/{foo}}").encode().build().expand("bar")
$142 ==> /bar%7D
# incorrectly nested, closes too soon, the remaining '}' is treated as a literal
There are more weird manifestations of the discrepancy between the UriTemplateEncoder code and the expandUriComponent code:
jshell> UriComponentsBuilder.fromUriString("/{foo{}{}}").encode().build().expand("bar")
| Exception java.lang.IllegalArgumentException: Not enough variable values available to expand '}'
# this should have worked, a single variable "foo{}{}" gets its value from the positional argument.
# But it throws about a bogus second variable.
jshell> UriComponentsBuilder.fromUriString("/{foo{}{}}").encode().build().expand("bar", "xxx")
$144 ==> /barxxx
# correctly nested, so UriTemplateEncoder treats it as one unencoded variable
# but the regexp in expandUriComponent matches 2 variables. Chaos ensues. |
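The mismatch described above can be reproduced without Spring. The following is a minimal sketch, not the library's code: `regexNames` uses the pattern quoted in the comment, while `nestedNames` is a hypothetical, simplified stand-in for a depth-tracking scanner (the class and method names are made up for illustration).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderMismatchDemo {

    // Pattern text as quoted in the comment above for expandUriComponent.
    static final Pattern NAMES = Pattern.compile("\\{([^/]+?)}");

    // Variable names as the lazy regex sees them: each match ends at the first '}'.
    static List<String> regexNames(String template) {
        List<String> names = new ArrayList<>();
        Matcher m = NAMES.matcher(template);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Variable names as a depth-tracking scanner sees them (hypothetical
    // simplification): a variable ends only when the brace depth is back at zero.
    static List<String> nestedNames(String template) {
        List<String> names = new ArrayList<>();
        int depth = 0;
        int start = -1;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c == '{') {
                if (depth == 0) {
                    start = i + 1;
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    names.add(template.substring(start, i));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String t : List.of("/{foo{}}", "/{foo{}{}}")) {
            System.out.println(t + " regex=" + regexNames(t) + " nested=" + nestedNames(t));
        }
    }
}
```

For "/{foo{}{}}" the regex reports two names, "foo{" and "}", while the depth-tracking scan reports the single name "foo{}{}", which lines up with the "Not enough variable values available to expand '}'" exception shown above.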
We are aware of the fact that it's impossible to correctly parse a string into a URL using a regex. But I would argue an even stronger case: it's impossible to correctly parse URL strings at all, for a number of reasons:
The typical developer's idea of what a URL is differs significantly from the relevant specifications. We have no intention of trying to enforce these specs on all of our users, and instead try to be more pragmatic. For instance, this leads to us supporting URL template placeholders where strictly they are not allowed.
Secondly, there is URL encoding. It is impossible to distinguish between encoded and unencoded components, and even if you could, there are too many encoding choices available to guess correctly 100% of the time. Again, we could have decided to only support the specifications, but instead we try to be more lenient where we can.
Overall, the goal of the
Given the above, I am not at all surprised that the builder has difficulty parsing strings like
However, there is a way to resolve this. As the name of the class suggests, the |
I think I see where the issue is and it's related to the |
@poutsma Thanks for taking the time to explain. I used Thank you all for your time |
Another case where '{' slips through is the UriComponentsBuilder.toUriString() shortcut when a variable is forgotten (but at least one other variable is supplied):
jshell> UriComponentsBuilder.fromUriString("/{a}/").uriVariables(Map.of("a","X")).toUriString()
$39 ==> "/X/"
# OK
jshell> UriComponentsBuilder.fromUriString("/{a}/").uriVariables(Map.of("b","X")).toUriString()
$36 ==> "/{a}/"
# invalid |
On closer look,
In other words it is expected that variables may not have been expanded nor encoded and that would be reflected in the resulting String. The You're right there are some corner cases with nested and/or mismatched placeholders but I don't see any easy solutions. |
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed. |
hi @rstoyanchev, what feedback do you want from me? I agree that the behavior of toUriString() is clearly documented as being only a concatenation depending on whether you already called .encode() or not, so that's good. But to me the problem is those edge cases where '{' slips through even though you called .encode(). I see 2 edge cases: unexpanded variables, and weird mismatches between the regex code and the escaping code.
unexpanded variables:
# using encode_template but not calling expand() but the template
# contains curly brackets (maybe the template author didn't think of variables)
# empty or not empty matched curly brackets slip through
jshell> UriComponentsBuilder.fromUriString("{}").encode().build().toUriString()
$2 ==> "{}"
jshell> UriComponentsBuilder.fromUriString("{a}").encode().build().toUriString()
$3 ==> "{a}"
# In this case, document that when you don't call expand(),
# you should encode the result instead of encode_template
# ie ".build().encode()" instead of "encode().build()"
# The docs currently say that .encode().build() is almost always better
# but when you are not calling expand() I think it's worse.
# using encode_template and calling expand, but still empty curly brackets slip through
UriComponentsBuilder.fromUriString("{}").encode().build().expand().toUriString()
$5 ==> "{}"
# document this or fix it ?
# The UriComponentsBuilder.toUriString() shortcut uses
# encode_template but allows for missing variables:
jshell> UriComponentsBuilder.fromUriString("/{a}/").uriVariables(Map.of("b","X")).toUriString()
$15 ==> "/{a}/"
# fix it to disallow missing variables ? Or document that missing variables
# will result in invalid chars in the result string ?
# I don't see how a resulting string with missing variables is useful,
# you can't safely use it as a new template
# because it's been partially encoded, so you will either
# have doubly encoded parts or non-encoded parts.
weird stuff, unfixable? Remove the nesting parsing code and document that variable names should not contain '{' and '}'?
# mismatch between the parsing during encoding and the regexp for the variables
# last '}' slips through
jshell> UriComponentsBuilder.fromUriString("/{foo{}}").encode().build().expand("bar").toUriString()
$7 ==> "/bar}" |
Thanks for the feedback. I suppose we can tighten the encoding of the template, which has to work around placeholders, to disallow empty, nested, or mismatched braces, and expect those to be expanded via URI variables instead if actually needed. That should only leave cases where there are actual URI variables, or what looks like it, including the case with partial expansion via
That's probably good enough. I mean if code somehow forgets to do expand and ends up with a String that contains braces, wouldn't it still need to construct a |
This commit better aligns how URI variable placeholders are detected in UriComponentsBuilder#encode (i.e. the pre-encoding of the literal parts of a URI template) and how they are expanded later on.
The latter relies on a pattern that stops at the first closing '}' which excludes the possibility for well-formed, nested placeholders other than variables with regex syntax, e.g. "{year:\d{1,4}}".
UriComponentsBuilder#encode now also stops at the first closing '}' and further ensures the placeholder is not empty and that it has '{' before deciding to treat it as a URI variable.
Closes spring-projectsgh-26466
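One reading of the rule in this commit message can be sketched as follows. This is an illustrative approximation, not the code from the commit: the class and method names are hypothetical, only braces are percent-encoded (a real encoder would also encode other illegal characters), and variables with regex syntax such as "{year:\d{1,4}}" are deliberately not handled.

```java
class BraceEncodingSketch {

    // Assumption for this sketch: a candidate counts as a URI variable only if
    // its name is non-empty and contains no further '{'.
    static boolean looksLikeVariable(String inner) {
        return !inner.isEmpty() && inner.indexOf('{') == -1;
    }

    // Copies "{name}" placeholders through untouched and percent-encodes every
    // other '{' or '}' as a literal.
    static String encodeStrayBraces(String template) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == '{') {
                // Stop at the first closing '}', as the expansion pattern does.
                int close = template.indexOf('}', i + 1);
                if (close != -1 && looksLikeVariable(template.substring(i + 1, close))) {
                    out.append(template, i, close + 1);
                    i = close + 1;
                    continue;
                }
                out.append("%7B");
            } else if (c == '}') {
                out.append("%7D");
            } else {
                out.append(c);
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String t : new String[] {"{}", "/{a}/", "/{foo}}", "/{foo{}}"}) {
            System.out.println(t + " -> " + encodeStrayBraces(t));
        }
    }
}
```

Under these assumptions, empty and nested braces end up encoded as literals, while a plain "{a}" is left in place for later expansion.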
Using the UriComponentsBuilder, the { and } characters can end up in the result if you are not careful (they are the only ones among the invalid printable ASCII chars which do this, most probably because they are used for templates, like in {city}).
Using toUri() instead of toUriString() at least does check and throws an exception in the bad case.
Using toUri() and removing .encode() actually makes it encode:
With buildAndExpand(), things are a bit safer, but still there are cases where it lets unencoded chars through.
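The toUri() observations are consistent with how java.net.URI itself treats braces. The sketch below uses only the JDK and assumes, without confirming it from the Spring source, that toUri() hands its result to java.net.URI; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class RawBraceCheck {

    // The single-argument constructor parses strictly and rejects raw '{' and '}'.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // The multi-argument constructors quote illegal characters in the path
    // instead of rejecting them.
    static String quotedPath(String path) {
        try {
            return new URI(null, null, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsUri("/X/"));
        System.out.println(parsesAsUri("/{a}/"));
        System.out.println(quotedPath("/{a}/"));
    }
}
```

A strict parse like this can serve as a last-line check on a string produced by toUriString() before it is used as a URL.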