Don’t remove leading dots in domain names #171
I don't recall if my implementation happened to match the unit tests, or if I changed the implementation to match the unit tests.
☔ The latest upstream changes (presumably #172) made this pull request unmergeable. Please resolve the merge conflicts.
The spec says to "join using full stops", and the "join" operation usually means putting the given delimiter between the segments, even if they are empty.
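For illustration, Rust's standard slice `join` already behaves this way. A minimal sketch, where the helper name `join_labels` is hypothetical and not part of the crate:

```rust
/// Joins labels with U+002E FULL STOP. Empty labels are kept, so each
/// one still contributes a separator to the result.
fn join_labels(labels: &[&str]) -> String {
    labels.join(".")
}

/// Splits a domain at full stops and joins it again. With a join that
/// keeps empty labels, this returns the input unchanged.
fn split_and_rejoin(domain: &str) -> String {
    let labels: Vec<&str> = domain.split('.').collect();
    join_labels(&labels)
}
```

Under this reading, two empty leading labels followed by `example` and `com` come out as `..example.com`, not `example.com`.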
What bothers me about this change, and why this PR isn’t merged yet, is that it makes some IDNA tests fail. It looks like these tests might be incorrect? Still, just commenting them out (as done in the PR at the moment) feels wrong. But the Unicode Technical Committee has been completely unresponsive to reports of spec issues so far, so I don’t know what to do here.
The tests are wrong -- they don't agree with the spec or the implementations.
Anyway, r=me on this change
This is still waiting for a response from the Unicode Technical Committee, right?
A retry of servo#171

This diff changes the behavior of the ToASCII step to match the spec and prevent failures in some cases when a domain name starts with leading dots (FULL STOPs), as requested in servo#166.

The change in the code results in a few failures for test cases of the conformance testing data provided with UTS #46. But, as the header of the test data file (IdnaTest.txt) says: "If the file does not indicate an error, then the implementation must either have an error, or must have a matching result." Therefore, failing on those test cases does not break conformance with UTS #46 and is, to some extent, anticipated.

As mentioned in servo#166, feedback has been submitted about this inconsistency, and the test logic can be improved later if the data file addresses the comments. Until then, with this diff we can throw fewer errors and keep the conformance tests passing.

To keep the side effects of ignoring errors during test runs to a minimum, I have separated the `TooShortForDns` error from `TooLongForDns`. The `Error` struct has been kept private, so the change won't affect any library users.

Fix servo#166
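A minimal sketch of what splitting the two length errors could look like. Only the names `TooShortForDns` and `TooLongForDns` come from the commit message; the `verify_dns_length` function and the enum layout are assumptions for illustration, not the crate's actual code. The limits (1 to 253 for the domain excluding the root label and its dot, 1 to 63 per label) are the DNS length limits that UTS #46 checks.

```rust
/// Hypothetical error type with the two length errors kept apart.
#[derive(Debug, PartialEq)]
enum Error {
    TooShortForDns,
    TooLongForDns,
}

/// Checks the DNS length limits on an already-ASCII domain name.
fn verify_dns_length(domain: &str) -> Result<(), Error> {
    // The root label and its dot are excluded from the length checks.
    let domain = domain.strip_suffix('.').unwrap_or(domain);
    if domain.is_empty() {
        return Err(Error::TooShortForDns);
    }
    if domain.len() > 253 {
        return Err(Error::TooLongForDns);
    }
    for label in domain.split('.') {
        // An empty label (for example from a leading dot) is too short.
        if label.is_empty() {
            return Err(Error::TooShortForDns);
        }
        if label.len() > 63 {
            return Err(Error::TooLongForDns);
        }
    }
    Ok(())
}
```

With the errors separated like this, a test harness could ignore only `TooShortForDns` for the affected test cases while still treating `TooLongForDns` as a real failure.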
Closing this as requested.
[idna] Update data to Unicode 10.0 and fix logic

* Change the behavior of the ToASCII step to match the spec and prevent failures in some cases when a domain name starts with leading dots (FULL STOPs), as requested in #166. (Another attempt at #337 and #171.)
* Update the `IdnaTest.txt` file to UCD 10.0 and fix the Validation Rules, especially the Bidi Rules, so that the tests pass.
* Add TODO marks for new flags introduced in the Unicode 10.0 version of UTS #46. (http://www.unicode.org/reports/tr46/proposed.html)
* Add an integration test for the `rust-url` crate for the new behavior.

Fix #166
Fixes #166, closes #170.
http://www.unicode.org/reports/tr46/#ToASCII has algorithms like:
Before this PR, the IDNA code does "join" with code like:
This writes a dot before every label, except the first. But this code seems to be buggy: if all labels so far are empty, it won’t push dots to separate empty labels. In other words, leading dots are stripped.
I’ve fixed this apparent bug by using a boolean to track whether a label has already been written, but a number of tests started failing. They apparently expect leading dots to be stripped. I don’t see anything in the spec to support this behavior.
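A minimal sketch of the difference, assuming the old join decided whether to write a dot by checking if the output was still empty. Both function names are hypothetical and this is not the crate's actual code:

```rust
/// Assumed shape of the old behavior: a dot is written only when the
/// output is already non-empty, so the dots that should separate
/// leading empty labels are never written.
fn join_stripping_leading_dots(labels: &[&str]) -> String {
    let mut out = String::new();
    for label in labels {
        if !out.is_empty() {
            out.push('.');
        }
        out.push_str(label);
    }
    out
}

/// Sketch of the fix: a boolean tracks whether this is the first label,
/// so empty labels still get their separators.
fn join_keeping_leading_dots(labels: &[&str]) -> String {
    let mut out = String::new();
    let mut first = true;
    for label in labels {
        if !first {
            out.push('.');
        }
        first = false;
        out.push_str(label);
    }
    out
}
```

For the labels `""`, `""`, `a`, `com`, the first function returns `a.com` and the second returns `..a.com`. The two agree whenever the first label is non-empty.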
@valenting did you mean to strip leading dots in this code? Am I missing something in the spec?