Make it easier to debug path parsing failures #869

notgull · 2023-10-07T23:20:27Z

I have a non-trivial bit of code using rustix that is returning EINVAL. I thought it was an OS error, but it turns out that I was getting rust-lang/rust#116523. However it was being masked by the fact that Arg impl returns EINVAL if there is a \0 in the path bytes. It would be nice if rustix returned an error code less plausible than EINVAL when a path failure like this happens.

The text was updated successfully, but these errors were encountered:

sunfishcode · 2023-10-08T14:20:08Z

Do you have any suggestions for which code to use? I can see how EINVAL is confusing like this, but I imagine any error code we could return could also be confusing. Even if we picked something as silly as ENOANO, if I were debugging that I think I might guess that's coming from the kernel just as well.

For what it''s worth, Errno::INVAL corresponds to std::io::ErrorKind::InvalidInput in Rust, which is what the standard library uses if you pass a filesystem function a string with embedded NULs. That said, the standard library puts it in an Error which also contains a string message explaining what happened.

So, another option would be to have functions that take path arguments use a different error type, which could be an enum containing either an Errno for an OS error or some form of "invalid path" error. That'd be a semver bump, but we could consider it next time we do a semver bump.

notgull · 2023-12-06T01:34:14Z

I think that this could be solved by taking advantage of the fact that there are many unused error codes.

Error codes are represented on linux_raw as a u16 but only ever take up the -4096..0 range. Which leaves the rest of the u16 space open.
Error codes are represented on libc as a c_int. But errno can only ever be positive, which leaves an additional 1 << 31 error codes to be used here.

Specifically, in both cases, there is a surefire way to tell if something is an actual system error code:

For linux_raw, the highest bit will always be set to 1. So we can match on this bit to tell if it is a system error code or if it's something we've created. This leaves the remaining 15 bits of space to do what we want with.
For libc, the highest bit will always be 0. So we can match on this bit in a similar way, giving us 31 bits of space left over.

For the conversions in arg.rs, these errors can occur:

Conversion errors caused by null bytes in CString::new.
os_str() was unable to be properly converted to a UTF-8 string.
The name is too long for the stack based buffer... although I'd argue that ENAMETOOLONG covers this well enough already.

Ignoring the third error, let's say we use a one-bit determinant to decide between the first two errors. This leaves us with 14 bits to hold the index at which path processing failed; either the zero in the CString or the invalid UTF-8 cluster. This can hold a maximum index of 1 << 14 = 16384, which is probably high enough for most cases, and we can just use 1 << 14 as a sentinel value for "this length can't be properly represented".

Thoughts on this? We potentially ran into this issue as a part of dbus2/zbus#517, which makes it the second time I've stepped on this particular footgun.

notgull mentioned this issue Oct 7, 2023

Replace socket2 calls with rustix smol-rs/async-io#146

Merged

notgull mentioned this issue Dec 3, 2023

Actual smol rs update dbus2/zbus#517

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make it easier to debug path parsing failures #869

Make it easier to debug path parsing failures #869

notgull commented Oct 7, 2023

sunfishcode commented Oct 8, 2023

Uh oh!

notgull commented Dec 6, 2023

Uh oh!

Make it easier to debug path parsing failures #869

Make it easier to debug path parsing failures #869

Comments

notgull commented Oct 7, 2023

sunfishcode commented Oct 8, 2023

Uh oh!

notgull commented Dec 6, 2023

Uh oh!