This repository was archived by the owner on Apr 12, 2024. It is now read-only.

URL path and query encoding/decoding problems #13815

Open
dinofx opened this issue Jan 22, 2016 · 6 comments
Comments

@dinofx

dinofx commented Jan 22, 2016

SPACE is never encoded as '+' in a URL. '+' means '+'.

But the $http service incorrectly encodes SPACE as + in the URL query, instead of %20. For example:
$http.get('/someUrl', {params: {x: 'foo bar'}})....

Only in the BODY of a POST with content-type of "application/x-www-form-urlencoded" is SPACE encoded as '+'.

Similarly, the $location service will mangle addresses after the page loads, rewriting:
/my/page?app=Google+ to /my/page?app=Google%20
(when using html5 mode)
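
The difference between the two encodings can be reproduced with standard JavaScript APIs. This is a minimal sketch, not Angular's code: `encodeURIComponent` percent-encodes SPACE, while `URLSearchParams` follows the application/x-www-form-urlencoded rules and emits '+'. The variable names are illustrative.

```javascript
// Percent-encoding, as produced by encodeURIComponent: SPACE -> %20, '+' -> %2B.
const percentEncoded = encodeURIComponent('foo bar'); // 'foo%20bar'
const plusEncoded = encodeURIComponent('Google+');    // 'Google%2B'

// Form encoding (application/x-www-form-urlencoded), as produced by
// URLSearchParams: SPACE -> '+'.
const formEncoded = new URLSearchParams({ x: 'foo bar' }).toString(); // 'x=foo+bar'
```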

@gkalpak
Member

gkalpak commented Jan 22, 2016

Relevant comment: #3042 (comment)
So, according to W3's recommendations, it seems that the current behavior is fine.

@gkalpak gkalpak added this to the Purgatory milestone Jan 22, 2016
@nlwillia

From a very strict interpretation of the available standards, you may be correct. The URL RFC does not define a plus-to-space conversion. That concept exists only in the HTML standards, where, aside from a W3 blurb, the emphasis seems to be on application/x-www-form-urlencoded form submissions, which technically would apply to POST data and not to a GET URL.

However, in practice, across browsers and applications, the convention seems to be applied nearly universally. I can't find an example where it isn't.

<!doctype html>
<html>
<body>
    <form method="get">
        <input type="text" name="test" />
        <input type="submit" value="Submit" />
    </form>
</body>
</html>

In Firefox, Chrome and IE this always (for me in Win8.1 at least) navigates to ?test=a+b for an input value of "a b". For web searches, Google and Bing interpret + in their query term as a space. Servers receiving ?q=a+b typically expose the value to the application as "a b".

If Angular takes a hard "plus means plus" stance, then it's going to be in conflict with the environment in which it runs and the servers with which it interacts. Consider an application with a get form that submits to an Angular page. If the browser converts the user's space input to pluses, how is the Angular application supposed to distinguish space from an actual plus? The only way would be to parse the raw URL query string as supplied by the browser.

For Angular to be useful to developers, it needs to conform to reality. Ideally that would track against written standards, but sometimes there are situations where you have to settle for practicality. I'd be happy to see movement at the standards level to clarify this issue, but until that happens, it seems to me that this behavior is the right compromise for Angular to make.

There could be an argument for there to be a configuration option that would control whether Angular parses URLs in strict RFC mode or relaxed web mode, but I think there would have to be examples (which I'd be interested to see) of where browsers or applications needed it.
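
Such a switch could look roughly like the sketch below. `parseQuery` and its `plusAsSpace` option are hypothetical names invented for illustration; neither is part of Angular. Relaxed mode (the default here) treats a literal '+' as SPACE, and strict mode leaves it alone.

```javascript
// Hypothetical helper, not part of Angular: parse a raw query string with a
// switch that controls how a literal '+' is treated.
function parseQuery(rawQuery, options) {
  // Relaxed ("web") mode unless the caller explicitly passes plusAsSpace: false.
  const plusAsSpace = !options || options.plusAsSpace !== false;
  const result = {};
  for (const pair of rawQuery.replace(/^\?/, '').split('&')) {
    if (!pair) continue; // skip empty segments such as a trailing '&'
    const eq = pair.indexOf('=');
    let key = eq === -1 ? pair : pair.slice(0, eq);
    let value = eq === -1 ? '' : pair.slice(eq + 1);
    if (plusAsSpace) {
      // Rewrite '+' before percent-decoding so that %2B still decodes to '+'.
      key = key.replace(/\+/g, '%20');
      value = value.replace(/\+/g, '%20');
    }
    // Note: decodeURIComponent throws on malformed escapes; not handled here.
    result[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return result;
}

const relaxed = parseQuery('?app=Google+&q=a%20b');
// relaxed.app === 'Google ', relaxed.q === 'a b'
const strict = parseQuery('?app=Google+&q=a%20b', { plusAsSpace: false });
// strict.app === 'Google+', strict.q === 'a b'
```

In both modes %20 decodes to SPACE and %2B decodes to '+', so only a bare '+' is ambiguous.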

@dinofx
Author

dinofx commented Jan 22, 2016

I've never had to use a self-submitting form. Perhaps POST is used more often, and browsers are just reusing their application/x-www-form-urlencoded code to build the query string. But, the same browsers will return "a%20b" when calling encodeURIComponent("a b").

Servers that choose to interpret + as a space would also interpret "%20" as a space. But the opposite is not true. WebSphere Liberty is one J2EE server that follows the spec (by default). See:
http://www-01.ibm.com/support/knowledgecenter/#!/SSEQTP_8.5.5/com.ibm.websphere.wlp.doc/autodita/rwlp_metatype_4ic.html?cp=SSEQTP_8.5.5%2F1-0-2-2-0
(search for decodeUrlPlusSign, which defaults to false. Setting it to true is problematic, since it also decodes '+' in the pathInfo).

Angular could always encode both SPACE and PLUS using % encoding, which should work on any server. Or, an optional boolean property could be used to configure $locationProvider?

I've only come across 2 lines of code so far that would need to be changed based on this setting:
In the function parseKeyValue:

key = keyValue = keyValue.replace(/\+/g,'%20'); //avoid call to replace

and

function encodeUriQuery(val, pctEncodeSpaces) // pctEncodeSpaces ||= newSetting
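
To illustrate the encoding half of that change, here is a simplified sketch of how a `pctEncodeSpaces` flag can select between the two behaviors. `encodeQueryValue` is a hypothetical name; the sketch covers only the space handling and should not be read as Angular's implementation of `encodeUriQuery`.

```javascript
// Hypothetical, simplified encoder: only the SPACE handling is modelled.
function encodeQueryValue(val, pctEncodeSpaces) {
  // encodeURIComponent always yields %20 for SPACE and %2B for '+'.
  const encoded = encodeURIComponent(val);
  // Rewrite %20 to '+' only when percent-encoded spaces were not requested.
  return pctEncodeSpaces ? encoded : encoded.replace(/%20/g, '+');
}

const formStyle = encodeQueryValue('foo bar');      // 'foo+bar'
const pctStyle = encodeQueryValue('foo bar', true); // 'foo%20bar'
const plusKept = encodeQueryValue('a+b', true);     // 'a%2Bb'
```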

@dinofx
Author

dinofx commented Jan 22, 2016

I guess Angular is always encoding SPACE and PLUS, but the problem comes from the fact that it also decodes things. So it seems like a config property is the only way to prevent '+' from being decoded as a space and then re-encoded as %20.

This property could also be used to not encode '+' as %2b, but that's just icing.
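
The decode-then-re-encode round trip can be sketched with plain JavaScript. `decodeRelaxed` is a hypothetical stand-in for the relaxed decoding step, not Angular's actual function.

```javascript
// Hypothetical relaxed decode: treat '+' as SPACE before percent-decoding.
function decodeRelaxed(raw) {
  return decodeURIComponent(raw.replace(/\+/g, '%20'));
}

const original = 'Google+';
const decoded = decodeRelaxed(original);       // 'Google ' (the plus is lost here)
const reencoded = encodeURIComponent(decoded); // 'Google%20'

// Without the '+' substitution the value survives the round trip,
// although the '+' comes back as %2B.
const strictRoundTrip = encodeURIComponent(decodeURIComponent(original)); // 'Google%2B'
```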

@nlwillia

An example self-submitting use case is a static get form in the masthead of a site submitting to an Angular-driven search page. In that scenario you have to deal with what the browser is sending you and with what the target server expects for the search term.

Some language examples:

  • PHP - you have a choice of urldecode which interprets the plus and rawurldecode which doesn't.
  • Python - you have a choice of urllib.unquote and urllib.unquote_plus (urllib.parse.unquote and urllib.parse.unquote_plus in Python 3).
  • Node - querystring.parse('test=a+b') returns {test: 'a b'}

Given that servers and languages have configuration options and implementation choices that vary, going configurable is probably the only way to keep everyone happy.
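
The same split exists within JavaScript itself. As a small sketch using only built-in APIs: `decodeURIComponent` leaves '+' untouched, while `URLSearchParams` applies form decoding and turns it into a space.

```javascript
const raw = 'a+b';

// Percent-decoding only: '+' has no special meaning.
const rfcStyle = decodeURIComponent(raw); // 'a+b'

// Form decoding: '+' becomes SPACE, and %2B is needed for a literal plus.
const formDecoded = new URLSearchParams('test=' + raw).get('test'); // 'a b'
const literalPlus = new URLSearchParams('test=a%2Bb').get('test');  // 'a+b'
```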

@skeeto

skeeto commented Aug 6, 2016

I'm having real problems with this when using Wolfram Alpha. Vimperator messes up the URL encoding and changes all the + operators into spaces in the encoded query, both for yanking and for display. Fortunately I'm still taken to the correct page and it's only the yank and display that is wrong. For example, a query of "a+b" should be this URL:

https://www.wolframalpha.com/input/?i=a%2Bb

But Vimperator displays it as this (while quietly visiting the above):

https://www.wolframalpha.com/input/?i=a+b

Same thing happens in other search engines, including Google, though it doesn't matter much there.
