Skip to content

fix(udp/fast-apple): ignore empty cmsghdr #2154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 14, 2025
Merged

Conversation

mxinden
Copy link
Collaborator

@mxinden mxinden commented Feb 12, 2025

On MacOS < 14, with fast-apple-datapath feature, calls to libc::CMSG_NXTHDR might continuously return empty (i.e. all zero) libc::cmsghdr instead of a null pointer. This results in a busy loop in decode_recv:

let cmsg_iter = unsafe { cmsg::Iter::new(hdr) };
for cmsg in cmsg_iter {
	match (cmsg.cmsg_level, cmsg.cmsg_type) {

https://github.com/quinn-rs/quinn/blob/b4378bb39dab4b58a1e6a3fea4fff9f87033dab6/quinn-udp/src/unix.rs#L685C1-L687C50

This commit fixes the above, returning a null_mut() pointer on an empty libc::cmsgdhr, thus terminating the cmsg_iter.

See also mozilla/neqo#2427 for details.

@mxinden
Copy link
Collaborator Author

mxinden commented Feb 12, 2025

For the record, here is a Mozilla CI run, testing this patch on MacOS 10.15:

https://treeherder.mozilla.org/jobs?repo=try&revision=cc4cbf5f7bd0a48dcab741bb09c0c99cc94fa6da

@larseggert
Copy link
Contributor

larseggert commented Feb 12, 2025

Eijatuuli@eija quinn % RUST_LOG=trace cargo test -p quinn-udp --features fast-apple-datapath
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.16s
     Running unittests src/lib.rs (target/debug/deps/quinn_udp-921787c1926eb24c)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/tests.rs (target/debug/deps/tests-ba553e2506d92958)

running 6 tests
test basic ... ok
test ecn_v6 ... ok
test ecn_v4 ... ok
test ecn_v4_mapped_v6 ... ok
test gso ... ignored
test ecn_v6_dualstack ... ok

test result: ok. 5 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests quinn_udp

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

@mxinden mxinden changed the title fix(udp/fast-apple): ignore empty cmsg fix(udp/fast-apple): ignore empty cmsghdr Feb 12, 2025
On MacOS < 14, with `fast-apple-datapath` feature, calls to
`libc::CMSG_NXTHDR` might continuously return empty (i.e. all zero)
`libc::cmsghdr` instead of a null pointer. This results in a busy loop
in `decode_recv`:

``` rust
let cmsg_iter = unsafe { cmsg::Iter::new(hdr) };
for cmsg in cmsg_iter {
	match (cmsg.cmsg_level, cmsg.cmsg_type) {
```
https://github.com/quinn-rs/quinn/blob/b4378bb39dab4b58a1e6a3fea4fff9f87033dab6/quinn-udp/src/unix.rs#L685C1-L687C50

This commit fixes the above, returning a `null_mut()` pointer on an
empty `libc::cmsgdhr`, thus terminating the `cmsg_iter`.

See also mozilla/neqo#2427 for details.
@mxinden
Copy link
Collaborator Author

mxinden commented Feb 12, 2025

For the record, here is a Mozilla CI run, testing this patch on MacOS 10.15:

https://treeherder.mozilla.org/jobs?repo=try&revision=cc4cbf5f7bd0a48dcab741bb09c0c99cc94fa6da

All green 🎉

With quinn-udp unit tests succeeding and and Mozilla CI succeeding, this pull request is ready for review.

@mxinden mxinden marked this pull request as ready for review February 12, 2025 13:23
@mxinden
Copy link
Collaborator Author

mxinden commented Feb 13, 2025

@Ralith any objections?

@djc given that this is not a breaking change and given that this change is behind the fast-apple-datapath feature only, would you be opposed to merging and publishing here soon? I can then include this patch in Firefox and thus we get more data.

@djc
Copy link
Member

djc commented Feb 13, 2025

@djc given that this is not a breaking change and given that this change is behind the fast-apple-datapath feature only, would you be opposed to merging and publishing here soon? I can then include this patch in Firefox and thus we get more data.

Sounds good to me!

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the likelihood that other BSDs have a similar bug? Should we perhaps guard against this in the Iterator implementation?

let current = self.cmsg.take()?;
self.cmsg = unsafe { self.hdr.cmsg_nxt_hdr(current).as_ref() };
Some(current)

@mxinden
Copy link
Collaborator Author

mxinden commented Feb 14, 2025

  • On MacOS < 14 with fast-apple-datapath feature, thus using recvmsg_x, this bug happens consistently, i.e. on each recv call. Note that on versions >= 14 this issue no longer occurs.
  • On MacOS without fast-apple-datapath feature, thus using recvmsg, we have not seen this bug happen on any version.
  • The various BSD versions are tested as part of Quinn's CI, Neqo's CI as well as various BSD based Firefox Nightly users. We have not seen this issue occur.

Thus I assume this bug is strictly related to how recvmsg_x writes to the control messages on Macos < 14, and thus I don't think we need a general fix in Iterator.

Would you agree @thomaseizinger?

@thomaseizinger
Copy link
Contributor

Thus I assume this bug is strictly related to how recvmsg_x writes to the control messages on Macos < 14, and thus I don't think we need a general fix in Iterator.

Would you agree @thomaseizinger?

Makes sense to me!

@djc
Copy link
Member

djc commented Feb 14, 2025

@mxinden I have invited you as a crates.io owner for quinn-udp. Want to take care of merging/publishing?

@mxinden
Copy link
Collaborator Author

mxinden commented Feb 14, 2025

That is great news. Thank you @djc. I will merge, tag and publish now.

@mxinden mxinden added this pull request to the merge queue Feb 14, 2025
Merged via the queue into quinn-rs:main with commit f4bd4c2 Feb 14, 2025
20 checks passed
@mxinden mxinden deleted the empty-cmsg branch February 14, 2025 12:01
@mxinden
Copy link
Collaborator Author

mxinden commented Feb 14, 2025

Tagged and published.

Thank you for the help everyone!

thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 15, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that - within
the `apple-fast-datapath` code inspects the `cmsg_len` property and
checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 15, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that - within
the `apple-fast-datapath` code inspects the `cmsg_len` property and
checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we split the `cmsg_nxt_hdr` API into two:

- One for decoding the next cmsg
- One for simply advancing it when we are in the process of writing
  `cmsghdr`s.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 15, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that - within
the `apple-fast-datapath` code - inspects the `cmsg_len` property and
checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we split the `cmsg_nxt_hdr` API into two:

- One for decoding the next cmsg
- One for simply advancing it when we are in the process of writing
  `cmsghdr`s.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 15, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that -
within the `apple-fast-datapath` code inspects the `cmsg_len` property
and checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we move the check to the `Iterator` implementation,
guarded by the `apple_fast` cfg. This allows us to use the
`cmsg_nxt_hdr` function for both reading and writing cmsgs.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 15, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that - within
the `apple-fast-datapath` code - inspects the `cmsg_len` property and
checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we split the `cmsg_nxt_hdr` API into two:

- One for decoding the next cmsg
- One for simply advancing it when we are in the process of writing
  `cmsghdr`s.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 15, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that -
within the `apple-fast-datapath` code inspects the `cmsg_len` property
and checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we move the check to the `Iterator` implementation,
guarded by the `apple_fast` cfg. This allows us to use the
`cmsg_nxt_hdr` function for both reading and writing cmsgs.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 15, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that -
within the `apple-fast-datapath` code inspects the `cmsg_len` property
and checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we move the check to the `Iterator` implementation,
guarded by the `apple_fast` cfg. This allows us to use the
`cmsg_nxt_hdr` function for both reading and writing cmsgs.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 15, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that -
within the `apple-fast-datapath` code inspects the `cmsg_len` property
and checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we move the check to the `Iterator` implementation,
guarded by the `apple_fast` cfg. This allows us to use the
`cmsg_nxt_hdr` function for both reading and writing cmsgs.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 15, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that -
within the `apple-fast-datapath` code inspects the `cmsg_len` property
and checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we move the check to the `Iterator` implementation,
guarded by the `apple_fast` cfg. This allows us to use the
`cmsg_nxt_hdr` function for both reading and writing cmsgs.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 15, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that -
within the `apple-fast-datapath` code - inspects the `cmsg_len`
property and checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we move the check to the `Iterator` implementation,
guarded by the `apple_fast` cfg. This allows us to use the
`cmsg_nxt_hdr` function for both reading and writing cmsgs.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 16, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that -
within the `apple-fast-datapath` code - inspects the `cmsg_len`
property and checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we move the check to the `Iterator` implementation,
guarded by the `apple_fast` cfg. This allows us to use the
`cmsg_nxt_hdr` function for both reading and writing cmsgs.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 16, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that -
within the `apple-fast-datapath` code - inspects the `cmsg_len`
property and checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we move the check to the `Iterator` implementation,
guarded by the `apple_fast` cfg. This allows us to use the
`cmsg_nxt_hdr` function for both reading and writing cmsgs.
thomaseizinger added a commit to thomaseizinger/quinn that referenced this pull request Apr 16, 2025
In quinn-rs#2154, we added a piece of logic to `cmsg_nxt_hdr` that -
within the `apple-fast-datapath` code - inspects the `cmsg_len`
property and checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we move the check to the `Iterator` implementation,
guarded by the `apple_fast` cfg. This allows us to use the
`cmsg_nxt_hdr` function for both reading and writing cmsgs.
github-merge-queue bot pushed a commit that referenced this pull request Apr 18, 2025
In #2154, we added a piece of logic to `cmsg_nxt_hdr` that -
within the `apple-fast-datapath` code - inspects the `cmsg_len`
property and checks if we can actually fit another `cmsghdr` into it.

This however only makes sense when we are _reading_ a buffer of cmsgs
that has been initialised by the kernel.

When we are _sending_ datagrams, the cmsg buffer is initialised with
all zeros and for every cmsg we want to attach to a datagram, we
advance the buffer using CMSG_NXTHDR. In this case, all the fields
within `cmsghdr` are 0 initialised and it doesn't make sense to check
their length. In fact, doing so results in a false-positive panic
triggered by `Encoder::push` because it thinks that the cmsg buffer is
too short to accomodate the message it wants to write.

To fix this issue, we move the check to the `Iterator` implementation,
guarded by the `apple_fast` cfg. This allows us to use the
`cmsg_nxt_hdr` function for both reading and writing cmsgs.
mxinden added a commit to mxinden/neqo that referenced this pull request May 16, 2025
The `quinn-udp` `fast-apple-datapath` Cargo feature allows Neqo to use
the private MacOS multi-message IO functions `sendmsg_x` and
`recvmsg_x`.

We have seen issues with `fast-apple-datapath`:

- quinn-rs/quinn#2214
- quinn-rs/quinn#2154

To reduce the risk of the upcoming larger roll out of the Fast UDP for
Firefox project to Firefox Release, disable `fast-apple-datapath` for
now.  Enabling it can then be a separate project afterwards.
mxinden added a commit to mxinden/neqo that referenced this pull request May 16, 2025
The `quinn-udp` `fast-apple-datapath` Cargo feature allows Neqo to use
the private MacOS multi-message IO functions `sendmsg_x` and
`recvmsg_x`.

We have seen issues with `fast-apple-datapath`:

- quinn-rs/quinn#2214
- quinn-rs/quinn#2154

To reduce the risk of the upcoming larger roll out of the Fast UDP for
Firefox project to Firefox Release, disable `fast-apple-datapath` for
now.  Enabling it can then be a separate project afterwards.
github-merge-queue bot pushed a commit to mozilla/neqo that referenced this pull request May 16, 2025
The `quinn-udp` `fast-apple-datapath` Cargo feature allows Neqo to use
the private MacOS multi-message IO functions `sendmsg_x` and
`recvmsg_x`.

We have seen issues with `fast-apple-datapath`:

- quinn-rs/quinn#2214
- quinn-rs/quinn#2154

To reduce the risk of the upcoming larger roll out of the Fast UDP for
Firefox project to Firefox Release, disable `fast-apple-datapath` for
now.  Enabling it can then be a separate project afterwards.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants