USB Assert - In Endpoint #10862
Comments
Did some testing where I rebased my feature branch against some newer commits that have been pulled into master until the issue came back. Rebasing on dc77c40 causes the issue to appear so it seems to be related to Cordio. Maybe the USB interrupt priority needs to be higher? |
Rebased with df84eb1 (just before Cordio PR) and I can confirm it works as expected. I tried adjusting the USB interrupt priority all the way to 0 (highest) and still the issue appeared, albeit maybe a little delayed compared to the lower priority. I also tried removing Cordio from the build by removing the @paul-szczepanek-arm Any idea what could be happening? It'd be understandable if USB and BLE could not be used at the same time but simply linking BLE in shouldn't disable USB. |
If I revert the changes to The following changes to the Hopefully there isn't some weird reliance on the softdevice that the USB driver/hardware has... |
Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-1302 |
My current theory is that the issue is related to how critical regions are handled when using the softdevice vs when not using the softdevice. It seems that an Below are two screenshots of the same code (the usb serial example I linked to above). One compiled with softdevice support and one compiled without. The one without the SD abruptly stops after the assert happens. |
It seems the cause of the issue lies in the simplified critical region API being used by the Nordic USB driver (and presumably all Nordic drivers). Normally, with a softdevice-enabled build, the critical region API used by the Nordic drivers is redirected to Lines 64 to 90 in bf78dc4
This function is defined in Lines 103 to 123 in bf78dc4
This is the same function used by the Mbed mbed-os/targets/TARGET_NORDIC/TARGET_NRF5x/TARGET_NRF52/critical_section_api.c Lines 25 to 43 in bf78dc4
However, when the macro Lines 47 to 62 in bf78dc4
This has a critical region counter that is entirely unaware of how many nested critical sections Mbed may be in at the time. So if the driver's own counter says it isn't in a nested critical region, it globally re-enables interrupts and breaks Mbed's expectation of still being in a critical section. I tested the fix in my fork and it fixed this issue. Needs to be CI tested with all the other Nordic drivers (which I'm surprised still worked with this going on): I don't see a nice clean way of fixing this given the software architecture of Nordic's SDK. That discussion will be saved for the associated PR: #10881 |
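For readers following along, the failure mode described above can be sketched on a host machine. This is only an illustrative simulation, not the code from the Mbed tree or the Nordic SDK: `irq_enabled` stands in for the core's global interrupt mask, and every function name below is hypothetical. The "fixed" variant shows one possible remedy (sharing a single nesting counter), which is an assumption and not necessarily what #10881 does.

```cpp
#include <cassert>
#include <cstdint>

// Simulated global interrupt mask (stand-in for the real core's mask bit).
static bool irq_enabled = true;

// OS-level critical section with its own nesting counter (hypothetical).
static uint32_t os_nesting = 0;
static bool os_saved_state = true;

void os_critical_enter() {
    bool was_enabled = irq_enabled;
    irq_enabled = false;
    if (os_nesting++ == 0) {
        os_saved_state = was_enabled;  // remember state only at the outermost level
    }
}

void os_critical_exit() {
    if (--os_nesting == 0) {
        irq_enabled = os_saved_state;  // restore only when fully unwound
    }
}

// Driver-level critical region with a private counter (hypothetical).
// It knows nothing about os_nesting.
static uint32_t drv_nesting = 0;

void drv_region_enter() {
    irq_enabled = false;
    ++drv_nesting;
}

void drv_region_exit() {
    if (--drv_nesting == 0) {
        irq_enabled = true;  // re-enables globally, even inside an OS section
    }
}

// Assumed remedy for illustration: delegate to the OS counter.
void drv_region_enter_fixed() { os_critical_enter(); }
void drv_region_exit_fixed()  { os_critical_exit(); }

// Returns true if interrupts end up enabled while the OS still
// believes it is inside a critical section.
bool irq_leaks_inside_os_section(bool use_fixed) {
    irq_enabled = true;
    os_nesting = 0;
    drv_nesting = 0;

    os_critical_enter();
    if (use_fixed) {
        drv_region_enter_fixed();
        drv_region_exit_fixed();
    } else {
        drv_region_enter();
        drv_region_exit();
    }
    bool leaked = irq_enabled;
    os_critical_exit();
    return leaked;
}
```

Calling `irq_leaks_inside_os_section(false)` reports the leak, while the variant that shares the OS counter keeps interrupts masked until the outer section exits.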
That is some sterling work, thank you. |
Excellent work indeed. Looks like PRs have been merged, can you close this? |
Resolved by #10881 |
Hi @AGlass0fMilk , Here's the mbed debug output and the stacktrace
Let me know if you need additional info or if it's enough to triage the issue 😉 @paul-szczepanek-arm I'd ask you to reopen the issue too (or I can open another one if needed), thanks! |
That's above my paygrade, @SeppoTakalo? |
Reopening as per @facchinm's comments. |
@facchinm When reproducing the issue, how fast do you open/close the CDC port? Is it very rapid or after 10 times in an hour or something? Are you unplugging the device and plugging it back in? I'm thinking the driver may need to be updated to deal with this in a more robust way. My current idea is:
|
Thanks for reopening @AGlass0fMilk ! |
@facchinm I haven't been able to reproduce the issue yet. I am using some example code I wrote here: https://github.com/AGlass0fMilk/mbed-usb-cdc-example I am currently building with master (0b8ae1b) Can you provide some sample code and procedure for reproducing this issue? What software are you using to open and close the USB CDC/Serial port? What operating system is your USB host running? Can you give me any details of your hardware setup? |
I'm in a very nonstandard environment (a wrapper built over mbed), but I've been able to replicate the issue using this program (just remove the retarget if you are targeting a board with a correct porting layer):
#include "mbed.h"
#include "USBSerial.h"

// Non-blocking USB serial; connect() is called explicitly in main()
USBSerial _serial(false);
PinName pin = P0_2;
mbed::UARTSerial serial(P1_3, P1_10, 115200);

// Retarget the debug console to the hardware UART
namespace mbed {
    FileHandle *mbed_override_console(int fd) {
        return &serial;
    }
    FileHandle *mbed_target_override_console(int fd) {
        return &serial;
    }
}

int main() {
    _serial.connect();
    while (1) {
        // Stream ADC readings whenever the host has the port open
        if (_serial.connected()) {
            _serial.printf("%d\n", AnalogIn(pin).read_u16());
        }
        wait(0.001f);
    }
}
Moving to mbed master (and swapping the target to NRF52840_DK) made it more difficult to trigger the issue, but eventually I managed it. I was previously branching off 9cdfe37
As host OS I'm on Linux (kernel 5.2.2-arch1-1-ARCH), and to open/close the serial port I'm using the Arduino IDE (which uses jssc under the hood). |
So I tested this with an nRF52840_DK by modifying your code a little (my dev kit requires hardware flow control for the debug uart):
I was able to read from both the debug serial and the target's USB serial port simultaneously without issue. I was also able to open and close the target's USB serial port well over 10 times without crashing. I was using the Arduino IDE under a Windows environment. My first guess in your case is somehow the wrapper layer or some other part of your system config/setup is affecting the interrupt latency of the USB driver. Feel free to reach out via email if you want to discuss details of your setup that are not yet public. I noticed you're working on porting a new Arduino BLE board based on the nRF52840. |
I retested both the mbed plain and the mbed derived code on Windows and, surprise, it doesn't crash at all... |
@facchinm Should this issue be closed? Is the USB implementation working for the Arduino-MbedOS core? |
In fact, no. I added this hacky patch to the mbed tree we are using to avoid getting a million user tickets:
From 788235de9cd4821884e36c98c5751902cae4c43e Mon Sep 17 00:00:00 2001
From: Martino Facchin <[email protected]>
Date: Wed, 31 Jul 2019 12:48:04 +0200
Subject: [PATCH] HACK: avoid #10862 by not firing the assert
---
usb/device/USBDevice/USBDevice.cpp | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/usb/device/USBDevice/USBDevice.cpp b/usb/device/USBDevice/USBDevice.cpp
index 8291fbce06..da69c4c1c6 100644
--- a/usb/device/USBDevice/USBDevice.cpp
+++ b/usb/device/USBDevice/USBDevice.cpp
@@ -952,10 +952,11 @@ void USBDevice::in(usb_ep_t endpoint)
endpoint_info_t *info = &_endpoint_info[EP_TO_INDEX(endpoint)];
- MBED_ASSERT(info->pending >= 1);
- info->pending -= 1;
- if (info->callback) {
- info->callback();
+ if (info->pending >= 1) {
+ info->pending -= 1;
+ if (info->callback) {
+ info->callback();
+ }
}
}
Of course it's not acceptable upstream, so I didn't even try to publish it 🙂 |
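To make the effect of that workaround concrete, here is a minimal host-side sketch of the same guard. It is a simplified model, not the real `USBDevice` class: `EndpointInfo`, `start_write`, and `on_in_complete` are hypothetical names, and the callback is reduced to a counter. It shows that a completion event arriving with nothing pending is silently dropped instead of tripping an assert, which hides the symptom without explaining the extra event.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified endpoint bookkeeping.
struct EndpointInfo {
    uint8_t pending = 0;    // transfers started but not yet completed
    int callbacks_run = 0;  // stands in for invoking the user callback
};

// Starting a transfer marks one completion as expected.
void start_write(EndpointInfo &info) {
    info.pending += 1;
}

// Guarded completion handler in the spirit of the patch above.
// Returns false when a completion arrives with nothing pending.
bool on_in_complete(EndpointInfo &info) {
    if (info.pending < 1) {
        return false;  // spurious event: ignore instead of asserting
    }
    info.pending -= 1;
    info.callbacks_run += 1;
    return true;
}
```

With one `start_write` followed by two completions, the first call is handled and the second is ignored, leaving `pending` at zero rather than underflowing.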
Description
After my PR #10689 was recently merged, I synced a project I'm working on up to Mbed-OS master. This project uses USB MSD.
Now, when I connect my device to a host (tried on Linux and Windows), I hit the following assert:
I then changed back to my feature branch (https://github.com/AGlass0fMilk/mbed-os/tree/nrf52840-usbphy-implementation), rebuilt, and the issue is gone. My device mounts as /dev/sdb fine. Not sure if anything changed in the USB stack that might be causing this. I don't see any other recent changes.
Essentially, my main code ends with (after setting up FAT file system from example):
I am also printing to the debug UART. Other than that, there's not much else going on in the project.
I tested this on my USB Serial example here: https://github.com/AGlass0fMilk/mbed-usb-cdc-example.git
Same results. When I'm on my feature branch it works as expected. When I pull master (currently at 14b77c9) the code hits the assert as soon as I plug in USB.
I haven't had luck tracing down the commit that introduced this issue. My feature branch isn't that stale so that should help narrow it down.
Testing on nRF52840_DK with GCC_ARM toolchain.