-
Notifications
You must be signed in to change notification settings - Fork 716
Custom BLE server characteristic only appears with log levels Debug and Verbose #109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
One thing we could try is to introduce "slowness" at various points in our code. We can selectively enable/disable logging dynamically in the apps. For example, enable logging at the start and disable logging 1/2 way through. Using this "divide by 2" approach, we might be able to quickly locate the "area" where the timing conditions are becoming problematic. I am not saying by any stretch of the imagination that there isn't an error in my usage of the ESP-IDF low level BLE APIs. The only clue I have is that all worked at one build of the ESP-IDF and stopped working at another ... but if I was doing something wrong in the older build, it might have been tolerated where now it isn't. But wrong is wrong ... and I need to snipe hunt it down. |
Well hmm ... I tried to recreate the issue ... and in the end did ... however I'm troubled. First, I built the source with 100% debugging enabled, ran the app and used nRF Connect and saw the exposed service as expected. Next, I compiled the code but switched off debugging within the application using
I re-ran the test and even though no log output was appearing in the console, nRF Connect continued to see the exposed service. Finally, I switched off debugging inside Ouch!!! This is going to be a royal pain to diagnose. If I switch on debugging (even with no-output), we work but if I switch off the code generation for debugging, we fail. This is (to me) a timing issue ... the only way we are going to solve this one is by testing with manual injection of delays until it works ... this is going to be slow. Oh well ... |
As an added note, with Log Level set to Verbose, I'm trying to add 5 characteristics, each with their own unique UUID. Each characteristic is initialized to an 8-byte NULL sequence with:
Interestingly, only 4 characteristics show up at any given time. I can identify which characteristic is missing using nRF Connect and disable one of the ones that is currently working, and then the missing one will appear. I haven't been able to identify a pattern regarding how the missing characteristic is selected to be missing. I think this is probably related to the timing issue noted above, I don't think this is a new issue. So, for now, I don't want to create a new issue. When the current issue gets resolved, I'll have to see whether this other issue is resolved as well. |
Perhaps even more interestingly, I just tested with Log Level set to No Output, and I still get 4 out of the 5 characteristics... hmm. |
How did you set log level to no Output? At compilation time or at run time? |
at compilation time in "make menuconfig"
|
When you run your tests, do you see the following error? Can you also confirm that you are running the latest ESP-IDF version (master from Github?). I want to verify that we are running the same development base. |
For our records, here is a pastebin of a trace with logging enable that works ... My purpose in capturing this is to replace the logs, one at a time, with delays to see what I can find. |
Some important progress ... I resorted to a horrible test scenario ... one I was hoping to avoid. We know that the apps work with Logging enabled and don't work with logging compiled out. So with logging enabled, I went through each log output in turn generated by the app as shown in the output and commented it out. I then re-built the app and re-tested. I must have gone through about 20 iterations before I found a big-ole-clue. There is a function called:
Its purpose is to create a GATT server advertised service. At the beginning of the function there is a log statement. When I commented it out, we fail (i.e. no characteristics shown) but if we leave it present, the characteristics remain. I demonstrated this to myself a few times. Now what remains is to focus attention there and see what the story is. I need to look at the immediate preceding calls and subsequent calls. However, that will be tomorrow night's task as it is end of day here. |
Updated: 2017-10-05:
Pretty standard stuff. I found that if I removed the debug statement, no service would be found while if I left the debug statement in, the service would be found. Ok ... sounds like a race condition. So I commented out the debug statement and replaced with a sleep for 1000msecs ... just as a test. When I ran it, we delayed for a second as expected ... but there was no service found. I put the debug statement back in, tested again, and the service was found. This made me think that what we were hunting here may not be a race condition. My next test was to leave the debug statement in but remove the parameters to it. I recompiled and re-ran, and again no service was found. Aha!!! This would seem to say that something in the evaluation of the parameters is causing us problems. Ok ... so now let us focus on the parameters. It is a string print of:
With the debug commented out but the above as a source line, I recompiled again. This time ... with no debug but this line ... the service was present. I changed the statement to:
retested and still the service was present. Finally I ended up with just:
and still the service was present. I removed that line ... and no service!!! OOooh!! Ok ... so what have we here? the getUUID() returns an instance of BLEUUID.... and that's it. Its late and I'm done again for this evening ... but I'm not understanding what is going on yet. What ever is happening is not a race condition. |
Offtopic, sorry Neil, but it is funny. I thought it may be
but i didnt expected it can be only getUUID(). |
I'm truly perplexed on this one. Its causing me a lot of head scratching. I've never seen a bug quite like this one. Obviously, it isn't local environment specific (i.e. my system isn't the only one that is broke) because it is being reported by mr @coder543 . Because it didn't manifest with debugging enabled, it was pure brute force to get to this point where we can enable/disable the bug with a one line change (and a lot of luck). I'm not anywhere close to being satisfied with what would appear to be a workaround (i.e. coding a do-nothing call to getUUID()). Friday is a working day for me on my job but Sat/Sun are coming soon and I'll be able to devote contiguous hours to the study. However I have no "science" to yet apply ... I feel like an alchemist who will try and produce gold by mixing different chemicals together "just to see what will happen". My hope is that as I poke it more with a stick, something will show up ... but I'm not confident. I'll be looking for colleagues like yourself who would be willing to invest a little time to hop on Skype with me just so you can watch for 10 minutes while I recreate the problem and show you the symptoms in the hope that:
|
Im short of time this days, a lot work at my job, but if you have test code i have spare couple hours and i can do some tests. |
Next session I get on this puzzle I'll put together a "goody package". Simple instructions, a sample app and a description of how to reproduce. I'll then post a link to that in this issue. The only think I won't supply is the other end of the BLE story ... I'll assume that a tester has access to an Android phone running "nRF Connect". But that of itself might be a good test ... for example using a different peer tool to see if that plays a part ... for example "bluetoothctl" or "gatttool" on Linux. |
I will try with SampleServer from BLEtests with custom service/characteristic UUIDs. The rest i need is described here. (I think) |
Its problem with adding 5th and next characteristics to service:
|
I dont know if something been changed in BLE stack, but i can see characteristics (still max 4) with log level ERROR setup in menuconfig. |
I just checked the logs over here, and I'm seeing the same error message @chegewara is seeing on my 5th characteristic. Somehow I didn't notice this before. So, I guess my issue of having more than 4 characteristics is a separate bug from the one this issue is tracking. I'll go ahead and open another ticket for it. |
Yes indeed ... please do separate the concept of > 4 characteristics causing an issue from the notion of having to have debug logging enabled in order to see any services/characteristics. |
ATM i cant recreate issue with log level causing not showing characteristics. I can see all 4 characteristics with log level VERBOSE and ERROR. |
@chegewara have you tried using Sample1.cpp with different log levels? I can consistently reproduce it over here. |
I still owe you a full written story to recreate plus sample artifacts. That is number one on my task list when my day job ends in about 4 hours and my weekend starts. Because this story is such a mysterious puzzle, we will need to spell out all the environmental components. In summary, and before I have written up the full story, the salient points are:
Output from:
|
Ok then, we are in different environment.
SampleServer just a little bit modified is my test app. I will try with Sample1. |
Got it last night, thanks.
I dont know if this is related somehow but with 2 and more services ping is over 1200 ms. |
Got back to studying this tonight ... found that the following doesn't work:
But the following does:
The pattern I seem to be seeing is that "accessing" the |
Digging further, one more clue ... when changing the "make menuconfig" option: Compiler options -> Optimization level from "-Og" to "-Os" the problem seems to go away completely. There was nothing "thoughtful" in my discovery of this ... purely a guess. The underlying SDKCONFIG flag appears to be |
At cpp_utils/BLEService.cpp line 75, you didn't set srvc_id.is_primary. So the services you created may be Second Service and will not show. In release mode, srvc_id.is_primary will be initialed to a rand value, mostly not 0, so it works well. But in debug mode, it will be initialed to 0, causing the service not showing. Please add this line and test again, thanks! |
Looks like its it. Many thanks. |
Using the old toolchain the latest ESP-IDF master, I'm able to get
Sample1
working... when the log level is Debug or Verbose. When the log level is Info, the custom characteristic doesn't actually appear through the nRF Connect app, but the device is there and named correctly.For obvious reasons, I'm having trouble debugging this, since the issue appears to be timing dependent, and the log statements slow the code down just enough to avoid the issue.
Possibly related to #82 or espressif/esp-idf#857
The text was updated successfully, but these errors were encountered: