Support URL canonicalization/normalization #229

Closed
sanmai-NL opened this issue Oct 16, 2016 · 3 comments

@sanmai-NL

sanmai-NL commented Oct 16, 2016

This is a feature request. For a project I need to iterate over the path segments that make up the path component of a URL, relative to a base URL.

It appears that this library (currently at 1.2.1) does not help with this. To do it manually, I think I must canonicalize both URLs in the pair so that I can compare their path components directly as strings and determine whether and where they diverge, or where one extends the other.
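As a rough illustration of the comparison described above, here is a minimal stdlib-only sketch. The helper names `segments` and `segments_relative_to` are hypothetical and not part of the `url` crate, and the sketch assumes both paths have already been normalized (no `.` or `..` segments, consistent percent-encoding).

```rust
/// Splits a URL path into its segments, ignoring the leading slash.
/// Hypothetical helper; assumes `path` is already normalized.
fn segments(path: &str) -> Vec<&str> {
    path.strip_prefix('/').unwrap_or(path).split('/').collect()
}

/// Returns the segments of `target` that extend beyond `base`, or `None`
/// if `target` diverges from `base` or is shorter than it.
fn segments_relative_to<'a>(base: &str, target: &'a str) -> Option<Vec<&'a str>> {
    let mut base_segs = segments(base);
    // A trailing slash yields an empty final segment; drop it so that
    // "/a/b/" and "/a/b" are treated as the same directory here.
    if base_segs.last() == Some(&"") {
        base_segs.pop();
    }
    let target_segs = segments(target);
    if target_segs.len() < base_segs.len() {
        return None;
    }
    // Compare segment by segment rather than by string prefix, so that
    // "/a/bc" is not mistaken for an extension of "/a/b".
    let (head, tail) = target_segs.split_at(base_segs.len());
    if head == base_segs.as_slice() {
        Some(tail.to_vec())
    } else {
        None
    }
}
```

For example, `segments_relative_to("/a/b/", "/a/b/c/d")` yields the segments `c` and `d`, while `segments_relative_to("/a/b", "/a/x/c")` yields `None` because the paths diverge at the second segment.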

See these tests to get a taste of what I mean.

@SimonSapin
Member

With this library, parsed_base_url.join(str) parses the given string relative to the given base URL. Parsing does some normalization along the way. For example (from the first test case you linked), the scheme is ASCII-lowercased and . and .. path segments are resolved.
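To make the dot-segment resolution mentioned above concrete, here is a simplified stdlib-only sketch in the spirit of RFC 3986 section 5.2.4. It is not the `url` crate's implementation; the function name `remove_dot_segments` is chosen for illustration, and the input is assumed to be an absolute path starting with `/`.

```rust
/// Resolves `.` and `..` segments in an absolute path.
/// Simplified illustration only; assumes `path` starts with '/'.
fn remove_dot_segments(path: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    // Skip the empty piece before the leading slash.
    let segs: Vec<&str> = path.split('/').skip(1).collect();
    let last = segs.len().saturating_sub(1);
    for (i, seg) in segs.iter().enumerate() {
        match *seg {
            "." => {
                // A trailing "." still denotes a directory: keep the slash.
                if i == last {
                    out.push("");
                }
            }
            ".." => {
                // Go up one level; popping an empty stack stays at the root.
                out.pop();
                if i == last {
                    out.push("");
                }
            }
            s => out.push(s),
        }
    }
    format!("/{}", out.join("/"))
}
```

With this sketch, `remove_dot_segments("/a/b/../c/./d")` gives `/a/c/d`, and a trailing `..` as in `/a/b/..` gives `/a/`.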

https://tools.ietf.org/html/rfc3986#section-6 discusses how there are many possible ways to define equivalence and normalization, with a "comparison ladder" although this is not strictly one-dimensional. https://url.spec.whatwg.org/#url-equivalence picks one way, and this library implements it with PartialEq for Url (that is, overloading the == operator).

Does this do what you want? If not, how precisely?

@sanmai-NL
Author

sanmai-NL commented Nov 6, 2017

I think my original issue had to do with the special behavior of the join method based on the presence of a trailing slash in the parsed URL.
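The trailing-slash effect follows from how a relative reference is merged with the base path (RFC 3986 section 5.2.3): everything after the last `/` in the base path is dropped before the reference is appended. A minimal stdlib-only sketch, with a hypothetical `merge_paths` helper and the assumption that `reference` is a plain relative path with no leading slash or dot segments:

```rust
/// Merges a relative reference into a base path by dropping whatever
/// follows the last '/' of the base and appending the reference.
/// Hypothetical helper for illustration, not the `url` crate's code.
fn merge_paths(base_path: &str, reference: &str) -> String {
    // Index just past the right-most '/', or 0 if the base has none.
    let dir_end = base_path.rfind('/').map_or(0, |i| i + 1);
    format!("{}{}", &base_path[..dir_end], reference)
}
```

So `merge_paths("/a/b", "c")` produces `/a/c` (the last segment `b` is replaced), whereas `merge_paths("/a/b/", "c")` produces `/a/b/c` (the reference is appended inside the directory).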

I've made an example that shows how to do what I was looking for back then. @SimonSapin: Do you find it educational enough to add as an example?

But the usefulness of my solution hinges on the quality of the normalization. See also iron/staticfile#90 (comment). @SimonSapin: Are you interested in improving the quality/degree of normalization further?

@SimonSapin
Member

I’ve filed #461 to have a look at iron/staticfile#90 (comment) specifically. Closing as it is unlikely that this crate will provide normalization beyond what is specified in https://url.spec.whatwg.org/, if only because there is no single answer to what the behavior should be.
