Regression: nRF52840 HW dies at 8 minutes 20 seconds in various samples using IPSP #11744
Comments
Assigning to @pdunaj, @andyross and @andrewboie since this looks like a commit to the kernel code.
Updated w/ full log from nrf52840_pca10056 running the lwm2m_client sample from master with no changes. It obviously never connects to the LwM2M server because the default overlay-bt.conf subnet doesn't match the gateway's IPSP subnet. But the outcome is the same.
Huh... that patch fixed a real bug, so it can't just be reverted. This is for sure a rollover condition. This is a 16kHz clock, right? So a signed rollover is going to happen at 2^23 ticks, or just about exactly eight and a half minutes. I'll take a look, though @pdunaj is likely to see it first. Note that I have another rework of this area in progress to prevent the "set timeout on idle" behavior entirely, which is essentially a parallel way to fix the original bug (with rapidly/repeatedly set timeouts bumping the next tick forward). If we can't find the rollover here, we could presumably revert this one once the other fix is ready. If, y'know, it doesn't itself introduce more bugs.
@andyross I think this is a 32KHz clock
I guess this bug is a duplicate of #11694.
Hah, even better as it can be an unsigned overflow. Oh! And I see you literally just pushed a patch. Yeah, that looks like it exactly.
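The rollover arithmetic in the comments above can be checked quickly. This is a rough sketch, not taken from the Zephyr sources: the 24-bit counter width is an assumption (the thread only mentions 2^23 and signed vs. unsigned overflow), and `rollover_seconds` is a hypothetical helper.

```python
def rollover_seconds(counter_bits: int, clock_hz: int) -> float:
    """Seconds until a free-running tick counter of the given width wraps."""
    return (1 << counter_bits) / clock_hz


# First guess in the thread: signed overflow (2^23 ticks) at a 16 kHz clock.
signed_16k = rollover_seconds(23, 16_384)

# Corrected clock rate: unsigned overflow of an assumed 24-bit counter at 32 kHz.
unsigned_32k = rollover_seconds(24, 32_768)

for label, secs in [("signed, 16384 Hz", signed_16k),
                    ("unsigned, 32768 Hz", unsigned_32k)]:
    minutes, seconds = divmod(int(secs), 60)
    print(f"{label}: {secs:.0f} s = {minutes} min {seconds} s")
```

Under these assumptions both cases come out at 512 s (8 min 32 s), which is in the same range as the reported 8 minutes 20 seconds.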
(Though AFAICT that bug predates @pdunaj's patch mentioned above. @mike-scott are you sure about that bisection?)
@andyross Absolutely. I spent quite some time working through the commits. And this bug reproduces perfectly every time.
Then either we have another bug lurking, or that patch was somehow able to trigger the rollover more reliably in your test in some subtle way that I'm not able to see. The patch in #11747 for sure should fix bugs just like this though.
Yep, I tested #11747 just now and it fixes the issue I was seeing.
I saw this with the MQTT sample also. Logging completely stops, but if you then connect over BLE and do IPSP stuff, the networking is still working: I can ping the device, and I am still getting stack init logs like
@mike-scott, @andyross, the bug that was fixed by the commit mentioned caused delayed work not to be executed. It could be that there was something in the code that was simply not running before, and now it is allowed to execute and the behavior changes. If so, it should be relatively simple to track down. ... No, it cannot be that. I checked the comment again and the PR mentioned above. It looks like an overflow issue on the counter.
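To illustrate the kind of counter overflow being discussed, here is a minimal sketch of how an elapsed-tick calculation can go wrong when a fixed-width counter wraps, and how masking the difference avoids it. It is not the fix in #11747; `COUNTER_BITS`, `elapsed_naive`, and `elapsed_wrapped` are hypothetical names, and the 24-bit width is an assumption.

```python
COUNTER_BITS = 24                       # assumed counter width, not stated in the thread
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # 0xFFFFFF


def elapsed_naive(now: int, last: int) -> int:
    """Plain subtraction: goes negative once the counter has wrapped."""
    return now - last


def elapsed_wrapped(now: int, last: int) -> int:
    """Modular subtraction: correct across a single wrap of the counter."""
    return (now - last) & COUNTER_MASK


last = COUNTER_MASK - 9   # sampled 10 ticks before the counter wraps to 0
now = 5                   # sampled 15 ticks later, after the wrap

print(elapsed_naive(now, last))    # negative: looks like time went backwards
print(elapsed_wrapped(now, last))  # 15
```

A timeout computed from the naive difference would land far in the past or future after the wrap, which is consistent with work silently no longer running, though the thread does not confirm the exact mechanism.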
Describe the bug
During automated testing of the LwM2M client in master branch, a regression was noted on both reel_board and nrf52840_pca10056. The board dies after 8 minutes and 20 seconds. Other samples which aren't public also displayed the same behavior. One was an MQTT and HTTP application. All samples tested use IPSP network connection via BLE.
Bisection led to this commit:
baea224 kernel: Always set clock expiry with sync with timeout module
Prior to this commit, the samples run normally.
To Reproduce
Steps to reproduce the behavior:
6a. NOTE: if the user wants the connection to actually work, they may need to alter the configuration of their IPSP gateway to use the 2001:db8:: subnet.
6b. The outcome is the same whether or not IPSP networking actually works.
Expected behavior
Board / sample should run longer than 8 minutes 20 seconds.
Impact
Unacceptable board behavior
Screenshots or console output
NOTE: Logging just stops here. There is no OOPS.
Environment (please complete the following information):