Skip to content

Don’t remove leading dots in domain names #171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

SimonSapin
Copy link
Member

Fixes #166, closes #170.

http://www.unicode.org/reports/tr46/#ToASCII has algorithms like:

  • Break the result [a domain] into labels at U+002E FULL STOP.
  • […]
  • join the labels using U+002E FULL STOP as a separator

Before this PR, the IDNA code does "join" with code like:

     let mut result = String::new();
     for label in /* ... */ {
         if result.len() > 0 {
             result.push('.');
         }
         result.push_str(/* ... */)
     }

This writes a dot before every label, except the first. But this code seems to be buggy: if all labels so far are empty, it won’t push dots to separate empty labels. In other words, leading dots are stripped.

I’ve fixed this apparent bug by using a boolean, but a number of tests started failing. They apparently expect leading dots to be stripped. I don’t see any thing in the spec to support this behavior.

@valenting did you mean to strip leading dots in this code? Am I missing something in the spec?

Review on Reviewable

@valenting
Copy link
Collaborator

I don't recall if my implementation happened to match the unit tests, or if I changed the implementation to match the unit tests.
But I do agree that the spec doesn't specify this, and your patch makes more sense.
We should raise this issue with the unicode group, to get the unit tests fixed.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #172) made this pull request unmergeable. Please resolve the merge conflicts.

@valenting
Copy link
Collaborator

Reviewed 3 of 3 files at r1.
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


a discussion (no related file):
Sorry I never completed this review. As I said, this makes sense, and I think we should land it.


tests/tests.rs, line 119 [r1] (raw file):
Interestingly, not how Firefox parses this URL :)


Comments from the review on Reviewable.io

@Manishearth
Copy link
Member

Manishearth commented Dec 18, 2016

Just realized that this is the same as #256 .

Both chrome and firefox do not strip any dots, and the spec agrees with this. We should land either this or #256

We're not running the IDNA tests on CI, btw, so we already have a bunch that fail.

The tests should be changed and also CId.

@Manishearth
Copy link
Member

But I do agree that the spec doesn't specify this

The spec says to "join using full stops", and the "join" operation usually means putting the given delimeter between the segments, even if they are empty. ["","",""].join('.') == ".."

@SimonSapin
Copy link
Member Author

SimonSapin commented Dec 19, 2016

What bothers me about this change, and why this PR isn’t merged yet, is that it makes some IDNA tests fail. It looks like these tests might be incorrect? Still, just commenting them out (as done in the PR at the moment) feels wrong. But the Unicode Technical Committee has been completely unresponsive to reports of spec issues so far, so I don’t know what to do here.

@Manishearth
Copy link
Member

Manishearth commented Dec 19, 2016

The tests are wrong -- they don't agree with the spec or the implementations.

@Manishearth Manishearth added this to the rust-url oxidation milestone Dec 19, 2016
@Manishearth
Copy link
Member

Anyway, r=me on this change

@dtolnay
Copy link
Contributor

dtolnay commented May 12, 2017

This is still waiting for a response from the Unicode Technical Committee right?

behnam added a commit to behnam/rust-url that referenced this pull request May 12, 2017
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS servo#46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS servo#46, and to some level, anticipated.

As mentioned in servo#166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.
behnam added a commit to behnam/rust-url that referenced this pull request May 12, 2017
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS servo#46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS servo#46, and to some level, anticipated.

As mentioned in servo#166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.

To keep the side-effects of ignoring errors during test runs as minimum
as possible, I have separated `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.
behnam added a commit to behnam/rust-url that referenced this pull request May 12, 2017
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS servo#46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS servo#46, and to some level, anticipated.

As mentioned in servo#166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.

To keep the side-effects of ignoring errors during test runs as minimum
as possible, I have separated `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.

Fix servo#166
@behnam
Copy link
Contributor

behnam commented May 12, 2017

I have submitted a new PR, based on this, which also has a fix for the conformance tests: #337

About the spec and test data issues, please see discussion on the issue: #166

With those, I recommend closing this old PR.

behnam added a commit to behnam/rust-url that referenced this pull request May 12, 2017
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS servo#46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS servo#46, and to some level, anticipated.

As mentioned in servo#166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.

To keep the side-effects of ignoring errors during test runs as minimum
as possible, I have separated `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.

Fix servo#166
@Hoverbear
Copy link

Closing this as requested.

@Hoverbear Hoverbear closed this May 17, 2017
behnam added a commit to behnam/rust-url that referenced this pull request May 19, 2017
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS servo#46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS servo#46, and to some level, anticipated.

As mentioned in servo#166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.

To keep the side-effects of ignoring errors during test runs as minimum
as possible, I have separated `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.

Fix servo#166
behnam added a commit to behnam/rust-url that referenced this pull request May 23, 2017
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS servo#46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS servo#46, and to some level, anticipated.

As mentioned in servo#166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.

To keep the side-effects of ignoring errors during test runs as minimum
as possible, I have separated `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.

Fix servo#166
behnam added a commit to behnam/rust-url that referenced this pull request May 29, 2017
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS servo#46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS servo#46, and to some level, anticipated.

As mentioned in servo#166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.

To keep the side-effects of ignoring errors during test runs as minimum
as possible, I have separated `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.

Fix servo#166
behnam added a commit to behnam/rust-url that referenced this pull request Jun 15, 2017
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS servo#46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS servo#46, and to some level, anticipated.

As mentioned in servo#166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.

To keep the side-effects of ignoring errors during test runs as minimum
as possible, I have separated `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.

Fix servo#166
behnam added a commit to behnam/rust-url that referenced this pull request Jun 20, 2017
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS servo#46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS servo#46, and to some level, anticipated.

As mentioned in servo#166, a feedback
is submitted for this inconsistency and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw less errors and maintain passing conformance
tests with this diff.

To keep the side-effects of ignoring errors during test runs as minimum
as possible, I have separated `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.

Fix servo#166
bors-servo pushed a commit that referenced this pull request Jun 21, 2017
[idna] Update data to Unicode 10.0 and fix logic

* Change the behavior of ToASCII step to match the spec and prevent failures on some cases when a domain name starts with leading dots (FULL STOPs), as requested in #166. (Another attempt on #337 and #171)

* Update `IdnaTest.txt` file to UCD 10.0 and fix Validation Rules, specially Bidi Rules, for the tests to pass.

* Add TODO marks for new flags introduced in Unicode 10.0 version of UTS#46. (http://www.unicode.org/reports/tr46/proposed.html)

* Add integration test for `rust-url` crate for the new behavior.

Fix #166

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-url/351)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic when parsing a . in file URLs
7 participants