nRF70: deadlock on memory allocation & mutex #88781

Open
JordanYates opened this issue Apr 18, 2025 · 1 comment
Labels: area: Wi-Fi, bug, platform: nRF, priority: low

Comments

@JordanYates
Collaborator

JordanYates commented Apr 18, 2025

Describe the bug

The nrf_wifi driver can block forever on memory allocation

First instance, blocking forever in tx_q (with the vif_lock mutex held in nrf_wifi_if_send):
Image
Second instance, blocking forever in nrf70_bh_wq:
Image

Other threads (nrf70_intr_wq and shell_uart) end up blocked forever as well, because the mutex is never released.

To Reproduce

Run the zperf sample application on an nRF7002dk:

  1. west build -b nrf7002dk/nrf5340/cpuapp -S wifi-ipv4 zephyr/samples/net/zperf/ -p
  2. west flash --erase
  3. On a PC: .\Downloads\iperf-2.2.1-win64.exe --server --interval 1 --udp
  4. Via shell: wifi connect -s $SSID -p $PSK -k 1
  5. Via shell: zperf udp upload $PC_IP 5001 2 1K 100M
  6. Repeat step 5 until the sample stops responding

Expected behavior

The driver should not deadlock; the zperf sample should always complete.

Impact

nRF70 driver is not suitable for real-world applications if it can randomly deadlock.

Logs and console output
Shell output while running the test:

Connection requested
Connected
[00:00:23.291,564] <wrn> net_dhcpv4: DHCP server provided more DNS servers than can be saved
[00:00:23.298,522] <wrn> net_dhcpv4: DHCP server provided more DNS servers than can be saved
[00:00:23.298,553] <inf> net_dhcpv4: Received: 192.168.20.23
[00:00:23.298,614] <inf> net_config: IPv4 address: 192.168.20.23
[00:00:23.298,614] <inf> net_config: Lease time: 86400 seconds
[00:00:23.298,614] <inf> net_config: Subnet: 255.255.255.0
[00:00:23.298,645] <inf> net_config: Router: 192.168.20.1
uart:~$ > zperf udp upload 192.168.20.70 5001 2 1K 100M
zperf udp upload 192.168.20.70 5001 2 1K 100M
Remote port is 5001
Connecting to 192.168.20.70
Duration:       2.00 s
Packet size:    1000 bytes
Rate:           100000 kbps
Starting...
Rate:           100.00 Mbps
Packet duration 78 us
-
Upload completed!
Statistics:             server  (client)
Duration:               1.92 s  (2.00 s)
Num packets:            1494    (1494)
Num packets out order:  0
Num packets lost:       0
Jitter:                 478 us
Rate:                   6.20 Mbps       (5.97 Mbps)
uart:~$ > zperf udp upload 192.168.20.70 5001 2 1K 100M
zperf udp upload 192.168.20.70 5001 2 1K 100M
Remote port is 5001
Connecting to 192.168.20.70
Duration:       2.00 s
Packet size:    1000 bytes
Rate:           100000 kbps
Starting...
Rate:           100.00 Mbps
Packet duration 78 us

> Sample deadlocks at this point

Additional Context

Memory allocation shims block forever:

static void *zep_shim_mem_alloc(size_t size)
{
	size_t size_aligned = ROUND_UP(size, 4);

	return k_heap_aligned_alloc(&wifi_drv_ctrl_mem_pool, WORD_SIZE, size_aligned, K_FOREVER);
}

static void *zep_shim_data_mem_alloc(size_t size)
{
	size_t size_aligned = ROUND_UP(size, 4);

	return k_heap_aligned_alloc(&wifi_drv_data_mem_pool, WORD_SIZE, size_aligned, K_FOREVER);
}
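
To illustrate why the K_FOREVER timeout matters, here is a small host-side model in portable, single-threaded C. It is only a sketch: `model_pool`, `model_try_alloc`, `model_send` and `vif_locked` are hypothetical stand-ins, not nrf_wifi code. The allocator fails immediately when the pool is empty, and the caller releases its lock and returns an error instead of waiting with the lock held. In Zephyr the analogous change would be passing a finite timeout such as K_NO_WAIT or K_MSEC(n) to k_heap_aligned_alloc, which returns NULL when the allocation cannot be satisfied in time. Whether every caller in the driver copes with a NULL return has not been checked here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BLOCK_SIZE 64
#define MODEL_NUM_BLOCKS 2

/* Hypothetical fixed-block pool standing in for a driver heap. */
struct model_pool {
	unsigned char blocks[MODEL_NUM_BLOCKS][MODEL_BLOCK_SIZE];
	bool used[MODEL_NUM_BLOCKS];
};

/* Non-blocking allocation: returns NULL instead of waiting when empty. */
static void *model_try_alloc(struct model_pool *pool, size_t size)
{
	if (size > MODEL_BLOCK_SIZE) {
		return NULL;
	}
	for (size_t i = 0; i < MODEL_NUM_BLOCKS; i++) {
		if (!pool->used[i]) {
			pool->used[i] = true;
			return pool->blocks[i];
		}
	}
	return NULL;
}

/* Flag standing in for the interface mutex (assumption: single-threaded model). */
static bool vif_locked;

/* Model of a send path: the lock is released on every path, including failure. */
static int model_send(struct model_pool *pool, size_t len)
{
	int ret = 0;
	void *buf;

	vif_locked = true;               /* take the lock */
	buf = model_try_alloc(pool, len);
	if (buf == NULL) {
		ret = -ENOMEM;           /* report failure to the caller */
	}
	/* On success a real driver would hand buf to the TX queue here. */
	vif_locked = false;              /* release the lock */
	return ret;
}
```

In this model the blocks are never freed, so a pool of two blocks accepts two sends and rejects the third with -ENOMEM while leaving the lock released.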

The IRQ event processor also performs ALL operations under the "IRQ spinlock", which is actually just the same mutex used everywhere, meaning that any memory allocation in the IRQ path can also deadlock the driver:
https://github.com/zephyrproject-rtos/nrf_wifi/blob/e2c9a783448d919d191591be04ae9f3dd0643027/hw_if/hal/src/system/hal_api.c#L57
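
One common way to keep allocation out of a lock's critical section is to obtain the buffer first and take the lock only for the state update. The sketch below models that ordering in portable, single-threaded C. `evt_pool`, `evt_try_alloc`, `model_process_event` and the two counters are hypothetical and do not reflect how hal_api.c is structured.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EVT_NUM_BUFS 4
#define EVT_BUF_SIZE 32

/* Hypothetical pool: free_bufs must start at EVT_NUM_BUFS or less. */
struct evt_pool {
	int free_bufs;
	unsigned char storage[EVT_NUM_BUFS][EVT_BUF_SIZE];
};

/* Non-blocking allocation: NULL when no buffer is left. */
static void *evt_try_alloc(struct evt_pool *pool)
{
	if (pool->free_bufs <= 0 || pool->free_bufs > EVT_NUM_BUFS) {
		return NULL;
	}
	pool->free_bufs--;
	return pool->storage[pool->free_bufs];
}

/* Counters that make the ordering observable in this model. */
static int lock_acquisitions;
static int events_handled;

static int model_process_event(struct evt_pool *pool)
{
	/* Step 1: allocate with no lock held; fail fast if the pool is empty. */
	void *buf = evt_try_alloc(pool);

	if (buf == NULL) {
		return -ENOMEM;
	}

	/* Step 2: short critical section that never waits on memory. */
	lock_acquisitions++;   /* stands in for taking the mutex */
	events_handled++;      /* a real handler would fill buf here */
	/* the mutex would be released here */
	return 0;
}
```

With a pool of one buffer, the first event is handled and the second returns -ENOMEM without the lock ever being taken, so a failed allocation cannot stall other users of the mutex.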

@JordanYates
Collaborator Author

Another path through TX that can deadlock with vif_lock held:
Image

@danieldegrasse danieldegrasse added the priority: low Low impact/importance bug label Apr 22, 2025
@nordic-piks nordic-piks moved this from To triage to Backlog in nRF platform Apr 24, 2025