Panic when parsing a . in file URLs #166

Closed
alexcrichton opened this issue Feb 3, 2016 · 7 comments · Fixed by #351

Comments

@alexcrichton
Contributor

Parsing the URL file://./foo causes rust-url to panic. For example:

let _url = url::Url::parse("file://./foo");

yields:

thread '<main>' panicked at 'a non-empty list of numbers', ../src/libcore/option.rs:335
stack backtrace:
   1:     0x7fda97eb0f40 - sys::backtrace::tracing::imp::write::haa19c02b4de52f3bG0t
   2:     0x7fda97eb3075 - panicking::log_panic::_<closure>::closure.41218
   3:     0x7fda97eb2af0 - panicking::log_panic::h527fe484e9de8fe1W7x
   4:     0x7fda97ead303 - sys_common::unwind::begin_unwind_inner::h51f64b1a34c60827fTs
   5:     0x7fda97ead418 - sys_common::unwind::begin_unwind_fmt::h0845853a1913f45blSs
   6:     0x7fda97eb05a1 - rust_begin_unwind
   7:     0x7fda97ede0cf - panicking::panic_fmt::h3967afc085fe8067LFK
   8:     0x7fda97e8ffb5 - option::_<impl>::expect::expect::h6499254404020416113
                        at ../src/libcore/macros.rs:29
   9:     0x7fda97e8c02d - host::parse_ipv4addr::h8bc077b4a80e7f06FIa
                        at src/host.rs:186
  10:     0x7fda97e886b8 - host::_<impl>::parse::hd99c67cae0091fc0Uya
                        at src/host.rs:61
  11:     0x7fda97e98e65 - parser::parse_file_host::ha2c614830ab90204nKb
                        at src/parser.rs:469
  12:     0x7fda97e926b0 - parser::parse_relative_url::h4e24ca8208217f655mb
                        at src/parser.rs:217
  13:     0x7fda97e85497 - parser::parse_url::hdcd76e516d93c3d01eb
                        at src/parser.rs:122
  14:     0x7fda97e80a2b - _<impl>::parse::h6996f44ab86e41c8w1p
                        at src/lib.rs:437
  15:     0x7fda97e809eb - _<impl>::parse::h6fc311753b000c53hlq
                        at src/lib.rs:566
  16:     0x7fda97e80998 - main::h13ef09143d6a2ad6faa
                        at src/main.rs:4
  17:     0x7fda97eb2894 - sys_common::unwind::try::try_fn::h11901883998771707766
  18:     0x7fda97eb03e8 - __rust_try
  19:     0x7fda97eb2536 - rt::lang_start::hc150f651dd2af18b44x
  20:     0x7fda97e81be9 - main
  21:     0x7fda9722cec4 - __libc_start_main
  22:     0x7fda97e80878 - <unknown>
  23:                0x0 - <unknown>
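
The trace points at an `expect` call reached from `host::parse_ipv4addr` that received `None`. For illustration only, here is a minimal, standard-library-only sketch of a dotted-number host parser that returns `None` for inputs such as `.` instead of panicking. The name `parse_ipv4_like` and its exact rules are assumptions made for this example; it is not rust-url's actual code.

```rust
/// Hypothetical, simplified stand-in for an IPv4 host parser (not rust-url's code).
/// Accepts 1 to 4 decimal numbers separated by dots, with an optional trailing dot.
fn parse_ipv4_like(input: &str) -> Option<u32> {
    let mut parts: Vec<&str> = input.split('.').collect();
    // Tolerate one trailing dot, as in "127.0.0.1.".
    if parts.last() == Some(&"") {
        parts.pop();
    }
    if parts.len() > 4 {
        return None;
    }
    let mut numbers: Vec<u32> = Vec::with_capacity(parts.len());
    for part in &parts {
        // An empty label (as in "." or "1..2") or a non-digit is not a number.
        if part.is_empty() || !part.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        // `parse` fails on overflow; `?` turns that failure into `None`.
        numbers.push(part.parse().ok()?);
    }
    // `pop` returns `None` for an empty list; `?` propagates it rather than
    // panicking the way `expect` would.
    let last = numbers.pop()?;
    // Every number before the last one must fit in a single byte.
    if numbers.iter().any(|&n| n > 255) {
        return None;
    }
    // The last number fills all remaining bytes of the address.
    let remaining_bytes = 4 - numbers.len() as u32;
    if u64::from(last) >= 256u64.pow(remaining_bytes) {
        return None;
    }
    let mut addr = last;
    for (i, n) in numbers.iter().enumerate() {
        addr += *n << (8 * (3 - i as u32));
    }
    Some(addr)
}
```

With this shape, `parse_ipv4_like(".")` is rejected by the empty-label check and `parse_ipv4_like("")` by the `pop()?` line, so a caller can fall back to treating the host as a name or report an error.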
@dlrobertson

After a quick git bisect, it looks like it doesn't panic prior to fee35ca.

dlrobertson added a commit to dlrobertson/rust-url that referenced this issue Feb 21, 2016
When parsing a url such as file://./foo the parser should not panic.
@SimonSapin
Member

This led me to find "interesting" behavior that may or may not be a bug: #171. I’m waiting to hear from Valentin.

@alexcrichton Is this blocking anything? If so, we can land a workaround in the meantime.

@alexcrichton
Contributor Author

Ah, no, this isn't blocking anything on my end. It was just something surprising that I ran into at some point (I forget even how...)

@dtolnay
Contributor

dtolnay commented May 12, 2017

This no longer panics as of the current version of url.

extern crate url;

fn main() {
    println!("{:?}", url::Url::parse("file://./foo"));
}

Bisect shows it was fixed in 9e759f1.

@behnam
Contributor

behnam commented May 12, 2017

@dtolnay, the rust-url crate doesn't panic with the patch, but the integration tests under idna still fail.

One problem is that the spec does NOT say whether VerifyDnsLength is expected to be set for the Conformance Test.

Another problem is that, even when VerifyDnsLength (which is the reason for some of the failures) is disabled, there are still cases where the expected fields are missing the leading dots that are present in the source field. For example, line 4956, which starts with a U+3002 IDEOGRAPHIC FULL STOP in the source:

B;  。\u0635\u0649\u05B0\u0644\u0627。岓\u0F84𝩋ᡂ;    \u0635\u0649\u05B0\u0644\u0627.岓\u0F84𝩋ᡂ;   xn--7cb2vlb7cxa.xn--3ed095b9x3dbd8t #   صىְلا.岓྄𝩋ᡂ
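
To make the columns of that line easier to compare, here is a small sketch that splits a data line into its fields. It assumes the layout visible in the quoted line (a type, the source, two expected fields, then a `#` comment). The struct, its field names, and the function name are hypothetical, and the `\u....` escapes are left undecoded.

```rust
/// One data line of the conformance file, split into its columns.
/// The field names are assumptions made for this sketch.
#[derive(Debug, PartialEq)]
struct TestLine<'a> {
    kind: &'a str,
    source: &'a str,
    to_unicode: &'a str,
    to_ascii: &'a str,
}

/// Splits a line into four `;`-separated columns, ignoring a trailing `#` comment.
/// Returns `None` if fewer than four columns are present.
fn parse_test_line(line: &str) -> Option<TestLine<'_>> {
    let data = line.split('#').next()?;
    let mut fields = data.split(';').map(str::trim);
    Some(TestLine {
        kind: fields.next()?,
        source: fields.next()?,
        to_unicode: fields.next()?,
        to_ascii: fields.next()?,
    })
}

/// Line 4956 as quoted above; the `\u....` escapes are kept as literal text.
const LINE_4956: &str = r"B;  。\u0635\u0649\u05B0\u0644\u0627。岓\u0F84𝩋ᡂ;    \u0635\u0649\u05B0\u0644\u0627.岓\u0F84𝩋ᡂ;   xn--7cb2vlb7cxa.xn--3ed095b9x3dbd8t #   صىְلا.岓྄𝩋ᡂ";
```

Parsing `LINE_4956` this way gives a `source` that begins with U+3002 while neither expected field begins with a dot, which is the mismatch described above.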

I'm going to write up feedback for the authors and ask for clarification.

@behnam
Contributor

behnam commented May 12, 2017

I didn't see feedback regarding this on the UTC (internal) mailing list or in the submitted feedback, so I just submitted some, as follows.

I'm going to submit a new PR here and work on a fix for the test data file, as I want to send that to the spec editors.


Hi there,

We have faced a couple of issues with implementing the UTS #46 Conformance Testing for the rust-url library:

  1. Neither Section 8 Conformance Testing nor the header of the data file (IdnaTest.txt) specifies which flags need to be set for the Section 4 Processing algorithms, especially VerifyDnsLength, or any of the proposed flags: CheckHyphens, CheckBidi, and CheckJoiners.

  2. When VerifyDnsLength is not set, many cases that refer to Processing Step 4.2 of Section 4.2 ToASCII fail, which implies that VerifyDnsLength is expected to be set.

For example, line 169:

B;  。; [A4_2]; [A4_2]

(The current implementation of rust-url sets flag VerifyDnsLength because it results in a smaller failure rate for the test data.)

  3. When VerifyDnsLength is set, there are unexpected failures in the test data for those cases with the source field starting with FULL STOP or a replacement character.

For example, line 4956:

B;  。\u0635\u0649\u05B0\u0644\u0627。岓\u0F84𝩋ᡂ;    \u0635\u0649\u05B0\u0644\u0627.岓\u0F84𝩋ᡂ;   xn--7cb2vlb7cxa.xn--3ed095b9x3dbd8t #   صىְلا.岓྄𝩋ᡂ

Because the source starts with U+3002 IDEOGRAPHIC FULL STOP, the Section 4.2 ToASCII algorithm should fail at Processing Step 4.2, since the first label has length zero. But no failure is anticipated in the data file.

The test data appears to expect empty labels (or leading FULL STOPs) to be dropped from the domain name (which would allow the test cases to pass), but there is no step under Section 4 Processing or Section 4.2 ToASCII describing this behavior.
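
As a rough illustration of the check in question, the sketch below applies the length limits (1 to 63 bytes per label, 1 to 253 bytes overall, trailing root dot ignored) to an already-ASCII domain name. It assumes U+3002 has already been mapped to `.` earlier in processing. The function and enum are hypothetical stand-ins, not rust-url's code.

```rust
/// Hypothetical error type for this sketch (not the crate's own error type).
#[derive(Debug, PartialEq)]
enum LengthError {
    TooShortForDns,
    TooLongForDns,
}

/// Sketch of the DNS length check on an already-ASCII domain name:
/// 1..=253 bytes overall (ignoring a trailing root dot) and 1..=63 bytes per label.
fn verify_dns_length(domain: &str) -> Result<(), LengthError> {
    // The root label and its dot are excluded from the length check.
    let domain = domain.strip_suffix('.').unwrap_or(domain);
    if domain.is_empty() {
        return Err(LengthError::TooShortForDns);
    }
    if domain.len() > 253 {
        return Err(LengthError::TooLongForDns);
    }
    for label in domain.split('.') {
        // A leading dot or two adjacent dots produce a zero-length label.
        if label.is_empty() {
            return Err(LengthError::TooShortForDns);
        }
        if label.len() > 63 {
            return Err(LengthError::TooLongForDns);
        }
    }
    Ok(())
}
```

Under these assumptions, `.xn--7cb2vlb7cxa.xn--3ed095b9x3dbd8t` (leading dot kept) is rejected as too short, while the same name without the leading dot passes.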

Please see these for original discussion and more info:

behnam added a commit to behnam/rust-url that referenced this issue May 12, 2017
A retry of servo#171

This diff changes the behavior of ToASCII step to match the spec and
prevent failures on some cases when a domain name starts with leading
dots (FULL STOPs), as requested in
servo#166.

The change in the code results in a few failures for test cases of the
Conformance Testing data provided with UTS #46. But, as the header of
the test data file (IdnaTest.txt) says: "If the file does not indicate
an error, then the implementation must either have an error, or must
have a matching result."

Therefore, failing on those test cases does not break conformance with
UTS #46 and is, to some extent, anticipated.

As mentioned in servo#166, feedback
has been submitted for this inconsistency, and the test logic can be improved
later if the data file addresses the comments.

Until then, we can throw fewer errors and maintain passing conformance
tests with this diff.
behnam added a commit to behnam/rust-url that referenced this issue May 12, 2017
To keep the side effects of ignoring errors during test runs as small
as possible, I have separated the `TooShortForDns` error from
`TooLongForDns`. The `Error` struct has been kept private, so the change
won't affect any library users.
behnam added a commit to behnam/rust-url that referenced this issue May 12, 2017
Fix servo#166
behnam added a commit to behnam/rust-url that referenced this issue May 19, 2017
behnam added a commit to behnam/rust-url that referenced this issue May 23, 2017
@behnam
Contributor

behnam commented May 24, 2017

This issue was discussed on the UTC list. Based on this and other feedback about the IdnaTest.txt file, it's now updated for the Unicode 10.0 release.

Apparently one of the problems in the reference implementation was using Java's regex split() on the domain name, eating the empty labels before they fail.
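
The effect is easy to reproduce in Rust. In the sketch below (hypothetical helper names, standard library only), one splitter keeps empty labels and the other discards them, which hides a leading dot from any later length check. It only illustrates the kind of behavior described; it is not the reference implementation's code.

```rust
/// Splits on '.', keeping empty labels so a later length check can reject them.
fn labels_keeping_empty(domain: &str) -> Vec<&str> {
    domain.split('.').collect()
}

/// Splits on '.', silently discarding empty labels.
fn labels_dropping_empty(domain: &str) -> Vec<&str> {
    domain
        .split('.')
        .filter(|label| !label.is_empty())
        .collect()
}

/// True if any label has length zero.
fn has_empty_label(labels: &[&str]) -> bool {
    labels.iter().any(|label| label.is_empty())
}
```

For `".a.b"`, the first splitter yields an empty first label that a length check would flag, while the second yields only `a` and `b`, so the same check passes.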

Anyway, the new data file is here: http://unicode.org/Public/idna/10.0.0/IdnaTest.txt

There are many failures with this data file, which I'll try to fix before submitting a full PR.

Unicode 10.0.0 will be released in about a month. So, in the meantime, we can still land #337, or just wait until the new data file becomes official.

behnam added a commit to behnam/rust-url that referenced this issue May 29, 2017
behnam added a commit to behnam/rust-url that referenced this issue Jun 15, 2017
behnam added a commit to behnam/rust-url that referenced this issue Jun 20, 2017
bors-servo pushed a commit that referenced this issue Jun 21, 2017
[idna] Update data to Unicode 10.0 and fix logic

* Change the behavior of ToASCII step to match the spec and prevent failures on some cases when a domain name starts with leading dots (FULL STOPs), as requested in #166. (Another attempt on #337 and #171)

* Update the `IdnaTest.txt` file to UCD 10.0 and fix the Validation Rules, especially the Bidi Rules, for the tests to pass.

* Add TODO marks for new flags introduced in Unicode 10.0 version of UTS#46. (http://www.unicode.org/reports/tr46/proposed.html)

* Add integration test for `rust-url` crate for the new behavior.

Fix #166
