nRF70: TX packets silently dropped #88857

Open
JordanYates opened this issue Apr 21, 2025 · 2 comments
Assignees
Labels
area: Wi-Fi Wi-Fi bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: low Low impact/importance bug

Comments

@JordanYates
Collaborator

JordanYates commented Apr 21, 2025

Describe the bug

The TX path of the nRF70 WiFi driver silently* drops TX packets if too many packets are queued at once.
https://github.com/zephyrproject-rtos/nrf_wifi/blob/e2c9a783448d919d191591be04ae9f3dd0643027/fw_if/umac_if/src/system/tx.c#L1102

if (qlen >= NRF70_MAX_TX_PENDING_QLEN) {
	goto out;
}

* an error code is returned, but in practice is ignored all the way back up the stack
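A minimal, self-contained sketch of why this kind of drop ends up silent. It is not the driver's code: `tx_enqueue`, `send_ignoring_errors`, `send_checked`, the queue struct, the limit of 4 and the `-ENOBUFS` value are all hypothetical stand-ins chosen for illustration. It only shows that a full-queue check combined with a caller that discards the return value loses packets without anyone noticing, while a caller that propagates the error could react.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_NRF70_MAX_TX_PENDING_QLEN. */
#define MAX_TX_PENDING_QLEN 4

struct tx_queue {
	size_t qlen;    /* packets currently pending */
	size_t dropped; /* packets rejected because the queue was full */
};

/* Same shape as the check quoted above: refuse the packet when the queue is full. */
static int tx_enqueue(struct tx_queue *q)
{
	assert(q != NULL);
	if (q->qlen >= MAX_TX_PENDING_QLEN) {
		q->dropped++;
		return -ENOBUFS; /* assumption: the real driver's error value may differ */
	}
	q->qlen++;
	return 0;
}

/* Caller that discards the return value, so the drop never reaches the application. */
static void send_ignoring_errors(struct tx_queue *q, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		(void)tx_enqueue(q);
	}
}

/* Caller that stops at the first error and reports it, which would allow a backoff. */
static int send_checked(struct tx_queue *q, size_t count, size_t *sent)
{
	*sent = 0;
	for (size_t i = 0; i < count; i++) {
		int ret = tx_enqueue(q);

		if (ret < 0) {
			return ret;
		}
		(*sent)++;
	}
	return 0;
}
```

In this toy model, nothing drains the queue, so any burst longer than the limit loses the excess packets. Only the `send_checked` variant gives the sender a chance to find out.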

To Reproduce

Run zephyr/samples/net/zperf on an nRF7002dk, with a reduced CONFIG_NRF70_MAX_TX_PENDING_QLEN or a very large "rate" parameter.

Expected behavior

Packets that have been properly created, allocated, and pushed all the way down to the lowest layers of the Wi-Fi driver shouldn't be arbitrarily dropped just because a large number of packets have been previously queued.

Impact

Dropped packets, broken protocols, etc.

This also leads to very unintuitive behavior, such as a larger heap leading to increased packet loss (since the extra memory allows more packets to be pending at once).

Additional context

I'm not sure what the goal of NRF70_MAX_TX_PENDING_QLEN is supposed to be, since the memory is already allocated (mostly) at this point, and dropping packets is infinitely worse than the packet simply being sent later than you might otherwise expect.

This problem gets worse at lower SPI/QSPI bus speeds, since the ability of the driver to clear out packets is reduced.
The application is also totally unaware that this is happening, and has no way to implement any backoff mechanism.
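The effect of bus speed can be illustrated with a toy tick-based model. This is a sketch under stated assumptions, not a measurement of the real driver: `simulate`, its parameters and the per-tick packet counts are all hypothetical. Each tick the producer offers a fixed number of packets, the bus drains a fixed number, and anything offered while the queue is full is dropped.

```c
#include <assert.h>

struct sim_result {
	unsigned sent;    /* packets that made it out over the bus */
	unsigned dropped; /* packets rejected because the queue was full */
};

/*
 * Toy model (all parameters are illustrative assumptions):
 *  - offered_per_tick: packets the application pushes each tick
 *  - drained_per_tick: packets the bus can move each tick (lower for a slower SPI/QSPI bus)
 *  - max_qlen: pending-queue limit
 */
static struct sim_result simulate(unsigned ticks, unsigned offered_per_tick,
				  unsigned drained_per_tick, unsigned max_qlen)
{
	struct sim_result r = {0, 0};
	unsigned qlen = 0;

	assert(max_qlen > 0);
	for (unsigned t = 0; t < ticks; t++) {
		/* Producer side: enqueue or drop. */
		for (unsigned i = 0; i < offered_per_tick; i++) {
			if (qlen >= max_qlen) {
				r.dropped++;
			} else {
				qlen++;
			}
		}
		/* Bus side: drain as many packets as the bus speed allows. */
		unsigned n = qlen < drained_per_tick ? qlen : drained_per_tick;

		qlen -= n;
		r.sent += n;
	}
	return r;
}
```

With a drain rate that matches the offered rate, the model drops nothing. Halving the drain rate while keeping the same queue limit makes the queue fill within a few ticks, after which every tick drops the difference.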

The two current workarounds:

  • Arbitrarily add k_sleep calls between packet sends, in order to limit the rate at which packets are fed to the driver
  • Increase CONFIG_NRF70_MAX_TX_PENDING_QLEN to the point where the option stops being relevant

The second option can also lead to deadlocks, since nrf_wifi_utils_q_enqueue is trying to allocate memory with the vif_lock held (see #88781).
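For the first workaround, one way to pick a less arbitrary delay is to derive it from an assumed drain rate. The helper below is a hypothetical sketch: `pacing_delay_us` and both of its parameters are illustrative, and the achievable drain rate of a given board would have to be measured. It only computes the smallest inter-packet delay that keeps the offered rate at or below that assumed rate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest delay between packets, in microseconds, such that packets of
 * packet_bytes are not offered faster than drain_bytes_per_sec.
 * Both inputs are assumptions supplied by the caller; the result is rounded up.
 */
static uint32_t pacing_delay_us(uint32_t packet_bytes, uint32_t drain_bytes_per_sec)
{
	if (drain_bytes_per_sec == 0) {
		return UINT32_MAX; /* nothing drains: wait as long as representable */
	}

	uint64_t us = ((uint64_t)packet_bytes * 1000000ULL + drain_bytes_per_sec - 1) /
		      drain_bytes_per_sec;

	return us > UINT32_MAX ? UINT32_MAX : (uint32_t)us;
}
```

In a Zephyr application the result could be passed to `k_sleep(K_USEC(...))` between sends. This remains a workaround, since the application still gets no feedback from the driver.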

@JordanYates JordanYates added area: Wi-Fi Wi-Fi bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx labels Apr 21, 2025
@github-project-automation github-project-automation bot moved this to To triage in nRF platform Apr 21, 2025
@krish2718
Collaborator

> The application is also totally unaware that this is happening, and has no way to implement any backoff mechanism.

I remember discussing, or raising an issue about, implementing queue stop/start APIs like Linux's netif_stop_queue/netif_start_queue, which drivers use for flow control to avoid packet drops. Let me find that reference and link it here.

@krish2718
Collaborator

Sorry, it was an internal discussion, but I'm quoting it here in case it's useful. Linux API: https://elixir.bootlin.com/linux/v6.15-rc3/A/ident/netif_stop_queue

The nRF70 driver does have a pending_q, i.e. a queue used when the nRF70 is busy and can't accept any further frames, but it has a build-time limit (for memory control), so once the queue is full the driver starts dropping packets.

Typically in Linux, once the queue is 3/4 full (the first watermark), the driver notifies the stack to stop sending. The stack then starts queueing and informs the application through socket send errors (I don't remember exactly which, -EAGAIN or -ENOBUFS), so the application can start queueing as well. Once the queue has drained to 1/2 (the second watermark), the queues are started again. This back-pressure mechanism ensures no packets are dropped.
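The watermark scheme described above can be sketched in a few lines of plain C. This is an illustration of the idea only, not Linux or nRF70 driver code: `fc_queue`, `fc_enqueue`, `fc_dequeue` and the choice of `-EAGAIN` are hypothetical, and the 3/4 and 1/2 thresholds are taken from the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct fc_queue {
	size_t capacity; /* build-time queue limit */
	size_t qlen;     /* packets currently pending */
	bool stopped;    /* true while the stack has been told to stop sending */
};

static size_t high_mark(const struct fc_queue *q) { return (q->capacity * 3) / 4; }
static size_t low_mark(const struct fc_queue *q) { return q->capacity / 2; }

/* Stack side: try to hand one packet to the driver. */
static int fc_enqueue(struct fc_queue *q)
{
	assert(q != NULL);
	if (q->stopped || q->qlen >= q->capacity) {
		return -EAGAIN; /* back-pressure: the caller keeps the packet and retries */
	}
	q->qlen++;
	if (q->qlen >= high_mark(q)) {
		q->stopped = true; /* first watermark reached: stop the queue */
	}
	return 0;
}

/* Driver side: one packet was handed to the hardware. */
static void fc_dequeue(struct fc_queue *q)
{
	assert(q != NULL);
	if (q->qlen == 0) {
		return;
	}
	q->qlen--;
	if (q->stopped && q->qlen <= low_mark(q)) {
		q->stopped = false; /* second watermark reached: restart the queue */
	}
}
```

The gap between the two watermarks provides hysteresis, so the queue does not toggle between stopped and started on every packet. Because the sender is refused before the queue is full, no packet that was accepted is ever dropped.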

@danieldegrasse danieldegrasse added the priority: low Low impact/importance bug label Apr 22, 2025
@nordic-piks nordic-piks moved this from To triage to Backlog in nRF platform Apr 24, 2025
Projects
Status: Backlog
Development

No branches or pull requests

3 participants