Skip to content

USB Assert - In Endpoint #10862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AGlass0fMilk opened this issue Jun 19, 2019 · 22 comments
Closed

USB Assert - In Endpoint #10862

AGlass0fMilk opened this issue Jun 19, 2019 · 22 comments

Comments

@AGlass0fMilk
Copy link
Member

Description

After my PR #10689 was recently merged, I synced a project I'm working on up to Mbed-OS master. This project uses USB MSD.

Now, when I connect my device to a host (tried on Linux and Windows), I hit the following assert:

++ MbedOS Error Info ++                                                                                                       
Error Status: 0x80FF0144 Code: 324 Module: 255                                                                                
Error Message: Assertion failed: info->pending >= 1                                                                           
Location: 0x6549                                                                                                              
File: ./mbed-os/usb/device/USBDevice/USBDevice.cpp+955                                                                        
Error Value: 0x0                                                                                                              
Current Thread: main  Id: 0x2000155C Entry: 0x76B5 StackSize: 0x1000 StackMem: 0x200023B0 SP: 0x2003FF00                      
For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=NRF52840_DK                                               
-- MbedOS Error Info --       

I then changed back to my feature branch (https://github.com/AGlass0fMilk/mbed-os/tree/nrf52840-usbphy-implementation), rebuild and the issue is gone. My device mounts as /dev/sdb fine.

Not sure if anything changed in the USB stack that might be causing this. I don't see any other recent changes.

Essentially, my main code ends with (after setting up FAT file system from example):

USBMSD usb(&qspi_blockdevice);
while (true) {
	usb.process();
}

I am also printing to the debug UART. Other than that, there's not much else going on in the project.

I tested this on my USB Serial example here: https://github.com/AGlass0fMilk/mbed-usb-cdc-example.git

Same results. When I'm on my feature branch it works as expected. When I pull master (currently at 14b77c9) the code hits the assert as soon as I plug in USB.

I haven't had luck tracing down the commit that introduced this issue. My feature branch isn't that stale so that should help narrow it down.

Testing on nRF52840_DK with GCC_ARM toolchain.

Issue request type

[ ] Question
[ ] Enhancement
[X] Bug
@AGlass0fMilk
Copy link
Member Author

Did some testing where I rebased my feature branch against some newer commits that have been pulled into master until the issue came back. Rebasing on dc77c40 causes the issue to appear so it seems to be related to Cordio.

Maybe the USB interrupt priority needs to be higher?

@AGlass0fMilk
Copy link
Member Author

Rebased with df84eb1 (just before Cordio PR) and I can confirm it works as expected.

I tried adjusting the USB interrupt priority all the way to 0 (highest) and still the issue appeared, albeit maybe a little delayed compared to the lower priority.

I also tried removing Cordio from the build by removing the BLE feature from the nRF52840_DK and the issue still occurred. Not sure how that's happening.

@paul-szczepanek-arm Any idea what could be happening?

It'd be understandable if USB and BLE could not be used at the same time but simply linking BLE in shouldn't disable USB.

@AGlass0fMilk
Copy link
Member Author

If I revert the changes to targets.json introduced in dc77c40 the issue goes away. May be related to removing the soft device (although I don't use any soft-device aware functions that I know of. Maybe the power peripheral driver.)

The following changes to the MCU_NRF52840 target produces the issue:

image

Hopefully there isn't some weird reliance on the softdevice that the USB driver/hardware has...

@ciarmcom
Copy link
Member

Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-1302

@AGlass0fMilk
Copy link
Member Author

AGlass0fMilk commented Jun 20, 2019

My current theory is that the issue is related to how critical regions are handled when using the softdevice vs when not using the softdevice.

It seems that an IN interrupt is happening before the upper stack can officially queue a write operation. This violates the stack's logic and results in an assert. It seems to happen only on non-control endpoints? I have to do some more testing with different USB classes to see how the error changes.

Below are two screenshots of the same code (the usb serial example I linked to above). One compiled with softdevice support and one compiled without. The one without the SD abruptly stops after the assert happens.

Failure
Screenshot from 2019-06-19 22-13-55

No Failure:
Screenshot from 2019-06-19 22-14-04

@AGlass0fMilk
Copy link
Member Author

It seems the cause of the issue lies in the simplified critical region API being used by the Nordic USB driver (and presumably all Nordic drivers). Normally, with a softdevice-enabled build, the critical region API used by the Nordic drivers is redirected to sd_nvic_critical_region_enter/exit:

void app_util_critical_region_enter(uint8_t *p_nested)
{
#if __CORTEX_M == (0x04U)
ASSERT(APP_LEVEL_PRIVILEGED == privilege_level_get())
#endif
#if defined(SOFTDEVICE_PRESENT)
/* return value can be safely ignored */
(void) sd_nvic_critical_region_enter(p_nested);
#else
app_util_disable_irq();
#endif
}
void app_util_critical_region_exit(uint8_t nested)
{
#if __CORTEX_M == (0x04U)
ASSERT(APP_LEVEL_PRIVILEGED == privilege_level_get())
#endif
#if defined(SOFTDEVICE_PRESENT)
/* return value can be safely ignored */
(void) sd_nvic_critical_region_exit(nested);
#else
app_util_enable_irq();
#endif
}

This function is defined in nrf_nvic.h/c, which changes based on which softdevice "target" you enable (in this case, it's SOFTDEVICE_NONE):

uint32_t sd_nvic_critical_region_enter(uint8_t * p_is_nested_critical_region)
{
__disable_irq();
*p_is_nested_critical_region = (m_in_critical_region != 0);
m_in_critical_region++;
return NRF_SUCCESS;
}
uint32_t sd_nvic_critical_region_exit(uint8_t is_nested_critical_region)
{
m_in_critical_region--;
if (is_nested_critical_region == 0)
{
m_in_critical_region = 0;
__enable_irq();
}
return NRF_SUCCESS;
}

This is the same function used by the Mbed critical_section HAL implementation for nRF52-based targets:

void hal_critical_section_enter()
{
/* expect 1 or N calls but only lock on the first one */
if (outside_critical_region) {
sd_nvic_critical_region_enter(&outside_critical_region);
}
}
void hal_critical_section_exit()
{
/* unlock on first call */
outside_critical_region = 1;
sd_nvic_critical_region_exit(0);
}
bool hal_in_critical_section(void)
{
return !outside_critical_region;
}

However, when the macro SOFTDEVICE_PRESENT is disabled, you see in the first code block above the Nordic app_util critical region API gets retargeted to app_util_disable/enable_irq.

static uint32_t m_in_critical_region = 0;
void app_util_disable_irq(void)
{
__disable_irq();
m_in_critical_region++;
}
void app_util_enable_irq(void)
{
m_in_critical_region--;
if (m_in_critical_region == 0)
{
__enable_irq();
}
}

This has a critical region counter that is entirely unaware of the number of nested critical regions Mbed may be in at the time and so if it isn't in a nested critical region, it globally enables interrupts again and completely crashes Mbed's expectation of being in a critical section.

I tested the fix in my fork and it fixed this issue. Needs to be CI tested with all the other Nordic drivers (which I'm surprised still worked with this going on):

https://github.com/AGlass0fMilk/mbed-os/blob/f548f558da387be2ac88249c87d14dbd784ad81e/targets/TARGET_NORDIC/TARGET_NRF5x/TARGET_SDK_15_0/components/libraries/util/app_util_platform.c#L66-L101

I don't see a nice clean way of fixing this given the software architecture of Nordic's SDK. That discussion will be saved for the associated PR: #10881

@paul-szczepanek-arm
Copy link
Member

That is some sterling work, thank you.

@linlingao
Copy link
Contributor

Excellent work indeed. Looks like PRs have been merged, can you close this?

@AGlass0fMilk
Copy link
Member Author

Resolved by #10881

@facchinm
Copy link
Contributor

Hi @AGlass0fMilk ,
first of all thanks for the awesome patch.
After #10881 merge the bug is much less reproducible but I think I found a way to trigger it anyway.
Exposing as a SerialUSB (CDC) device and opening/closing the host serial port multiple times ( > 10 usually) triggers the same error path.

Here's the mbed debug output and the stacktrace

++ MbedOS Error Info ++
Error Status: 0x80FF0144 Code: 324 Module: 255
Error Message: Assertion failed: info->pending >= 1
Location: 0x123B9
File: ./mbed-os/usb/device/USBDevice/USBDevice.cpp+955
Error Value: 0x0
Current Thread: main  Id: 0x2000126C Entry: 0x13341 StackSize: 0x8000 StackMem: 0x20002040 SP: 0x2003FF04 
For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=ARDUINO_NANO33BLE
-- MbedOS Error Info --
#7  0x000123b8 in mbed_assert_internal ()
    at ./mbed-os/platform/mbed_assert.c:26
#8  0x00019136 in USBDevice::in ()
    at ./mbed-os/usb/device/USBDevice/USBDevice.cpp:955
#9  0x0001b2be in USBPhyHw::process ()
    at ./mbed-os/usb/device/targets/TARGET_NORDIC/TARGET_MCU_NRF52840/USBPhy_Nordic.cpp:528
#10 0x0001908c in USBDevice::start_process ()
    at ./mbed-os/usb/device/USBDevice/USBDevice.cpp:1555
#11 0x000172e6 in usbd_ep_data_handler ()
    at ./mbed-os/targets/TARGET_NORDIC/TARGET_NRF5x/TARGET_SDK_15_0/integration/nrfx/legacy/nrf_drv_usbd.c:1219
#12 0x0001774c in ev_epdata_handler ()
    at ./mbed-os/targets/TARGET_NORDIC/TARGET_NRF5x/TARGET_SDK_15_0/integration/nrfx/legacy/nrf_drv_usbd.c:1325
#13 0x00017fc6 in USBD_IRQHandler ()

Let me know if you need additional info or if it's enough to triage the issue 😉

@paul-szczepanek-arm I'd ask you to reopen the issue too (or I can open another one if needed), thanks!

@paul-szczepanek-arm
Copy link
Member

That's above my paygrade, @SeppoTakalo?

@AGlass0fMilk
Copy link
Member Author

Reopening as per @facchinm's comments.

@AGlass0fMilk AGlass0fMilk reopened this Jul 22, 2019
@AGlass0fMilk
Copy link
Member Author

@facchinm When reproducing the issue, how fast do you open/close the CDC port? Is it very rapid or after 10 times in an hour or something? Are you unplugging the device and plugging it back in?

I'm thinking the driver may need to be updated to deal with this in a more robust way. My current idea is:

  • Phy driver keeps track of number of pending In/Out events that should happen (just as the upper layer is doing)

  • When it encounters a situation where this assert will be triggered (eg: no pending events on some endpoint but an event is occurring anyway) it caches the event and forwards it next time the upper layers make a call and would be expecting the event.

@facchinm
Copy link
Contributor

Thanks for reopening @AGlass0fMilk !
Both fast and slow open/close trigger the issue, and I never completely unplug the device.
Your proposal to make the driver more reliable looks very reasonable, if you need any help for testing a solution just tag me on the PR 😉

@AGlass0fMilk
Copy link
Member Author

@facchinm I haven't been able to reproduce the issue yet. I am using some example code I wrote here: https://github.com/AGlass0fMilk/mbed-usb-cdc-example

I am currently building with master (0b8ae1b)

Can you provide some sample code and procedure for reproducing this issue? What software are you using to open and close the USB CDC/Serial port? What operating system is your USB host running? Can you give me any details of your hardware setup?

@facchinm
Copy link
Contributor

facchinm commented Jul 23, 2019

I'm in a very nonstandard environment (a wrapper built over mbed) but I've been able to replicate using this program (just remove the retarget if you are targeting a board with a correct porting layer)

#include "mbed.h"
#include "USBSerial.h"

USBSerial _serial(false);
PinName pin = P0_2;

mbed::UARTSerial serial(P1_3, P1_10, 115200);

namespace mbed {
FileHandle *mbed_override_console(int fd) {
  return &serial;
}

FileHandle *mbed_target_override_console(int fd) {
  return &serial;
}
}

int main() {
	_serial.connect();
	while (1) {
		if (_serial.connected()) {
			_serial.printf("%d\n", AnalogIn(pin).read_u16());
		}
		wait(0.001f);
	}
}

Moving to mbed master (and swapping target to NRF52840_DK) made it more difficult to trigger the issue, but eventually I made it. I was previously branching off 9cdfe37

++ MbedOS Error Info ++                                                                                                                            
Error Status: 0x80FF0144 Code: 324 Module: 255                                                                                                     
Error Message: Assertion failed: info->pending >= 1                                                                                                
Location: 0x1E25                                                                                                                                   
File: ./mbed-os/usb/device/USBDevice/USBDevice.cpp+955                                                                                             
Error Value: 0x0                                                                                                                                   
Current Thread: rtx_idle  Id: 0x20001888 Entry: 0x2F51 StackSize: 0x200 StackMem: 0x20001BD0 SP: 0x2003FF00                                        
For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=NRF52840_DK                                                                    
-- MbedOS Error Info --                                                                                                                            
                                                                                                                                                   
= System will be rebooted due to a fatal error =                                                                                                   
= Reboot count(=1) reached maximum, system will halt after rebooting =

As host OS I'm on Linux (kernel 5.2.2-arch1-1-ARCH), and to open/close the serial port I'm using the Arduino IDE (which uses jssc under the hood).

@AGlass0fMilk
Copy link
Member Author

So I tested this with an nRF52840_DK by modifying your code a little (my dev kit requires hardware flow control for the debug uart):

#include "mbed.h"
#include "USBSerial.h"

USBSerial _serial(false);
PinName pin = P0_2;

mbed::UARTSerial serial(USBTX, USBRX, 115200);

namespace mbed {
FileHandle *mbed_override_console(int fd) {
  serial.set_flow_control(mbed::SerialBase::RTSCTS, STDIO_UART_RTS, STDIO_UART_CTS);
  return &serial;
}
}

int main() {
	_serial.connect();
	while (1) {
		if (_serial.connected()) {
			_serial.printf("%d\n", AnalogIn(pin).read_u16());
		}
		printf("hello!\r\n");
		wait(0.001f);
	}
}

I was able to read from both the debug serial and the target's USB serial port simultaneously without issue. I was also able to open and close the target's USB serial port well over 10 times without crashing. I was using the Arduino IDE under a Windows environment.

My first guess in your case is somehow the wrapper layer or some other part of your system config/setup is affecting the interrupt latency of the USB driver.

Feel free to reach out via email if you want to discuss details of your setup that are not yet public. I noticed you're working on porting a new Arduino BLE board based on the nRF52840.

@facchinm
Copy link
Contributor

I retested both the mbed plain and the mbed derived code on Windows and, surprise, it doesn't crash at all...
Instead, both Linux and OSX trigger the issue. It looks like both the OS send multiple line coding packets on open, one for every parameter.
I'll investigate a little bit and report back, thanks for taking a look!

@AGlass0fMilk
Copy link
Member Author

@facchinm Should this issue be closed? Is the USB implementation working for the Arduino-MbedOS core?

@facchinm
Copy link
Contributor

In fact no, I added this hacky patch to the mbed tree we are using to avoid getting a million user tickets

From 788235de9cd4821884e36c98c5751902cae4c43e Mon Sep 17 00:00:00 2001
From: Martino Facchin <[email protected]>
Date: Wed, 31 Jul 2019 12:48:04 +0200
Subject: [PATCH] HACK: avoid #10862 by not firing the assert

---
 usb/device/USBDevice/USBDevice.cpp | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/usb/device/USBDevice/USBDevice.cpp b/usb/device/USBDevice/USBDevice.cpp
index 8291fbce06..da69c4c1c6 100644
--- a/usb/device/USBDevice/USBDevice.cpp
+++ b/usb/device/USBDevice/USBDevice.cpp
@@ -952,10 +952,11 @@ void USBDevice::in(usb_ep_t endpoint)
 
     endpoint_info_t *info = &_endpoint_info[EP_TO_INDEX(endpoint)];
 
-    MBED_ASSERT(info->pending >= 1);
-    info->pending -= 1;
-    if (info->callback) {
-        info->callback();
+    if (info->pending >= 1) {
+        info->pending -= 1;
+        if (info->callback) {
+            info->callback();
+        }
     }
 }

Of course it's not acceptable upstream, so I didn't either try to publish it 🙂

facchinm added a commit to arduino/mbed-os that referenced this issue Nov 13, 2020
facchinm added a commit to arduino/mbed-os that referenced this issue Nov 13, 2020
facchinm added a commit to arduino/mbed-os that referenced this issue Nov 20, 2020
facchinm added a commit to arduino/mbed-os that referenced this issue Nov 23, 2020
facchinm added a commit to arduino/mbed-os that referenced this issue Nov 24, 2020
facchinm added a commit to arduino/mbed-os that referenced this issue Dec 3, 2020
giulcioffi pushed a commit to giulcioffi/mbed-os that referenced this issue Dec 9, 2020
facchinm added a commit to arduino/mbed-os that referenced this issue Dec 10, 2020
facchinm added a commit to arduino/mbed-os that referenced this issue Dec 10, 2020
facchinm added a commit to arduino/mbed-os that referenced this issue Dec 11, 2020
facchinm added a commit to arduino/mbed-os that referenced this issue Jan 4, 2021
facchinm added a commit to arduino/mbed-os that referenced this issue Jan 4, 2021
facchinm added a commit to arduino/mbed-os that referenced this issue Jan 5, 2021
facchinm added a commit to arduino/mbed-os that referenced this issue Feb 5, 2021
pennam pushed a commit to arduino/mbed-os that referenced this issue Mar 8, 2021
giulcioffi pushed a commit to giulcioffi/mbed-os that referenced this issue Mar 11, 2021
pennam pushed a commit to arduino/mbed-os that referenced this issue May 10, 2021
pennam pushed a commit to arduino/mbed-os that referenced this issue Jun 21, 2021
pennam pushed a commit to arduino/mbed-os that referenced this issue Jun 21, 2021
pennam pushed a commit to arduino/mbed-os that referenced this issue Jun 21, 2021
facchinm added a commit to facchinm/mbed-os that referenced this issue Jun 28, 2021
facchinm added a commit to arduino/mbed-os that referenced this issue Jul 26, 2021
facchinm added a commit to arduino/mbed-os that referenced this issue Sep 22, 2021
facchinm added a commit to arduino/mbed-os that referenced this issue Feb 3, 2022
facchinm added a commit to arduino/mbed-os that referenced this issue Feb 7, 2022
facchinm added a commit to arduino/mbed-os that referenced this issue Jun 20, 2022
facchinm added a commit to arduino/mbed-os that referenced this issue Mar 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants