Skip to content

Make it easier to debug path parsing failures #869

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
notgull opened this issue Oct 7, 2023 · 2 comments
Open

Make it easier to debug path parsing failures #869

notgull opened this issue Oct 7, 2023 · 2 comments

Comments

@notgull
Copy link
Contributor

notgull commented Oct 7, 2023

I have a non-trivial bit of code using rustix that is returning EINVAL. I thought it was an OS error, but it turns out that I was getting rust-lang/rust#116523. However it was being masked by the fact that Arg impl returns EINVAL if there is a \0 in the path bytes. It would be nice if rustix returned an error code less plausible than EINVAL when a path failure like this happens.

@sunfishcode
Copy link
Member

Do you have any suggestions for which code to use? I can see how EINVAL is confusing like this, but I imagine any error code we could return could also be confusing. Even if we picked something as silly as ENOANO, if I were debugging that I think I might guess that's coming from the kernel just as well.

For what it''s worth, Errno::INVAL corresponds to std::io::ErrorKind::InvalidInput in Rust, which is what the standard library uses if you pass a filesystem function a string with embedded NULs. That said, the standard library puts it in an Error which also contains a string message explaining what happened.

So, another option would be to have functions that take path arguments use a different error type, which could be an enum containing either an Errno for an OS error or some form of "invalid path" error. That'd be a semver bump, but we could consider it next time we do a semver bump.

@notgull
Copy link
Contributor Author

notgull commented Dec 6, 2023

I think that this could be solved by taking advantage of the fact that there are many unused error codes.

  • Error codes are represented on linux_raw as a u16 but only ever take up the -4096..0 range. Which leaves the rest of the u16 space open.
  • Error codes are represented on libc as a c_int. But errno can only ever be positive, which leaves an additional 1 << 31 error codes to be used here.

Specifically, in both cases, there is a surefire way to tell if something is an actual system error code:

  • For linux_raw, the highest bit will always be set to 1. So we can match on this bit to tell if it is a system error code or if it's something we've created. This leaves the remaining 15 bits of space to do what we want with.
  • For libc, the highest bit will always be 0. So we can match on this bit in a similar way, giving us 31 bits of space left over.

For the conversions in arg.rs, these errors can occur:

  • Conversion errors caused by null bytes in CString::new.
  • os_str() was unable to be properly converted to a UTF-8 string.
  • The name is too long for the stack based buffer... although I'd argue that ENAMETOOLONG covers this well enough already.

Ignoring the third error, let's say we use a one-bit determinant to decide between the first two errors. This leaves us with 14 bits to hold the index at which path processing failed; either the zero in the CString or the invalid UTF-8 cluster. This can hold a maximum index of 1 << 14 = 16384, which is probably high enough for most cases, and we can just use 1 << 14 as a sentinel value for "this length can't be properly represented".

Thoughts on this? We potentially ran into this issue as a part of dbus2/zbus#517, which makes it the second time I've stepped on this particular footgun.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants