Skip to content

uri.parse doesn't seem to follow url.spec.whatwg.org #29420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
eseidelGoogle opened this issue Apr 21, 2017 · 11 comments
Open

uri.parse doesn't seem to follow url.spec.whatwg.org #29420

eseidelGoogle opened this issue Apr 21, 2017 · 11 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. core-2 library-core type-enhancement A request for a change that isn't a bug

Comments

@eseidelGoogle
Copy link

This was brought to my attention due to:
flutter/flutter#9529

For example:
Uri.parse('https://www.google.com/#q=[') throws.

https://url.spec.whatwg.org/#path-state
Otherwise, run these steps:
If c is not a URL code point and not U+0025 (%), validation error.
If c is U+0025 (%) and remaining does not start with two ASCII hex digits, validation error.
UTF-8 percent encode c using the path percent-encode set, and append the result to buffer.

which my understanding is validation error doesn't actually cause the parse to fail:
"A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement."

It looks like if we wanted to conform, there are tests:
https://github.com/w3c/web-platform-tests/tree/master/url

We would just need to write a dart-based harness for running them.

@rmacnak-google rmacnak-google added library-core area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. labels Apr 21, 2017
@rmacnak-google
Copy link
Contributor

@lrhn

@lrhn
Copy link
Member

lrhn commented Apr 24, 2017

We are trying to follow RFC 3986, while being more lenient than required (escaping some invalid characters, but not all) and performing some of the possible normalizations. We are currently not trying to match the whatwg version.
In this case, [ is a gen-delim token, which is not valid in the query part of an URI. Because it's a gen-delim token, the parser does not allow it.

We should consider whether the whatwg parsing algorithm is worth adapting to. It's likely to be what some users will expect (as this bug testifies to), but we won't match all parts of it (for example, we don't have blobs).

@eseidelGoogle
Copy link
Author

I would expect user expectations have changed a bit in the 12 years since RFC 3986. :) That said, this isn't a particularly high priority bug. But next time we do go looking at the url library the whatwg had a ton of tests for making it easy to conform to its url spec.

@welldone217
Copy link

It could be fixed ?

@eseidelGoogle
Copy link
Author

For no good reason (other than that I used to do this kind of thing as my day job) I wrote a test harness to see how well Dart's Uri.parse passes WHATWG's suite:
https://github.com/eseidelGoogle/dart_url_tests

The major difference is that Uri.parse is way more strict (throws exceptions) where WHATWG's seems to try real hard to always give you an answer even for "invalid" urls.

WHATWG also expects something like the Uri class to expose more of the component details, which as far as I can tell Dart's Uri class doesn't currently.

Regardless, this remains a low priority bug IMO, I just did this for fun tonight. :)

@eseidelGoogle
Copy link
Author

Here are the results for the curious: https://gist.github.com/eseidelGoogle/99681c9b362596f2b45427d16896864e

Caveats: There are also likely bugs in my harness. Certainly I'm not checking everything the tests expect me to due to limited component accessors on Uri. Also, I've not compared Chrome or Firefox or other Url implementations to know if anyone actually conforms to that suite.

@lrhn
Copy link
Member

lrhn commented Oct 25, 2017

Our current plan for Dart 2 is to change the Uri parser to match the WhatWG URL parser. This is a breaking change if anyone depends on the current behavior (including the automatic normalization), but it has two advantages. First, it allows more URIs to be parsed (as this bug documents), and it it allows Dart compiled to a browser to leverage the built-in functionality, which should improve performance and code-size).
The WhatWG URL parser only recognizes the authority section of "special" URI schemes (http, etc), so we'll inherit that restriction. Hopefully we can work around the remaining restrictions.

We do expose one property less than the WhatWG URL (we don't separate username and password, exposing both as just userInfo), but we should match the rest.

@natebosch
Copy link
Member

Is this still planned? Will it need to wait for Dart 3?

@illia-romanenko
Copy link

Apparently due to that issue it's hard to implement Google's encoded polyline https://developers.google.com/maps/documentation/utilities/polylinealgorithm
as it might produce "[" etc...

@lrhn
Copy link
Member

lrhn commented Nov 16, 2018

You'll probably have to wait for Dart 3, or us adding an alternative to Uri in some way. Breaking changes are hard.

As for [, the Uri parser is now more permissive and allows [, ] and # in more places. Still, characters have meaning, so you should escape random text when putting it into a URI, not just assume that it will work. If you use someUri.resolveUri(Uri(query: polyline)), it will allow any character in the polyline argument, and escape what needs to be escaped. Never build URIs by string concatenation unless you undestand and satsify the URI grammar.
Example:

main() {
  var uri = Uri.parse("http://example.org/whatnot");
  var v = uri.resolveUri(new Uri(query: "\x00\xff[]=&#/\\:?"));
  print(v);
  print(Uri.decodeQueryComponent(v.query));
}

@lrhn
Copy link
Member

lrhn commented Sep 5, 2022

More interest has been shown on making the Dart Uri class work more like the WhatWG URL standard, because sometimes web users get tripped up by the differences.

We should consider whether it's viable to change the Uri behavior. It'll likely be a breaking change in some regards, but also one that might reduce the deployment size of web code, if it can use the browser code instead of shipping our own version.
The biggest risk is likely that the WhatWG URL does not canonicalize the same way the Dart Uri does (it does some things for HTTP/HTTPS URIs, but not, e.g., escape case normaliation). The Dart tools rely on comparing URIs for equality to decide whether two URIs refer to the same library.

If we want to change the Dart Uri class to match the WhatWG URL behavior, we should definitely not implement it ourselves in Dart. Even just manually keeping up with updates on the "public suffix list" is a fool's errand.

Getting the functionality from Cronet would be optimal where it's available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. core-2 library-core type-enhancement A request for a change that isn't a bug
Projects
None yet
Development

No branches or pull requests

7 participants