Skip to content

Custom BLE server characteristic only appears with log levels Debug and Verbose #109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
coder543 opened this issue Oct 3, 2017 · 29 comments
Assignees

Comments

@coder543
Copy link

coder543 commented Oct 3, 2017

Using the old toolchain the latest ESP-IDF master, I'm able to get Sample1 working... when the log level is Debug or Verbose. When the log level is Info, the custom characteristic doesn't actually appear through the nRF Connect app, but the device is there and named correctly.

For obvious reasons, I'm having trouble debugging this, since the issue appears to be timing dependent, and the log statements slow the code down just enough to avoid the issue.

Possibly related to #82 or espressif/esp-idf#857

@nkolban
Copy link
Owner

nkolban commented Oct 3, 2017

One thing we could try is to introduce "slowness" at various points in our code. We can selectively enable/disable logging dynamically in the apps. For example, enable logging at the start and disable logging 1/2 way through. Using this "divide by 2" approach, we might be able to quickly locate the "area" where the timing conditions are becoming problematic.

I am not saying by any stretch of the imagination that there isn't an error in my usage of the ESP-IDF low level BLE APIs. The only clue I have is that all worked at one build of the ESP-IDF and stopped working at another ... but if I was doing something wrong in the older build, it might have been tolerated where now it isn't. But wrong is wrong ... and I need to snipe hunt it down.

@nkolban
Copy link
Owner

nkolban commented Oct 4, 2017

Well hmm ... I tried to recreate the issue ... and in the end did ... however I'm troubled.

First, I built the source with 100% debugging enabled, ran the app and used nRF Connect and saw the exposed service as expected.

Next, I compiled the code but switched off debugging within the application using

esp_log_level_set("*", ESP_LOG_ERROR);

I re-ran the test and even though no log output was appearing in the console, nRF Connect continued to see the exposed service.

Finally, I switched off debugging inside make menuconfig, re-compiled and re-ran and NOW we don't see the exposed service through nRF Connect as you described in the original report.

Ouch!!! This is going to be a royal pain to diagnose. If I switch on debugging (even with no-output), we work but if I switch off the code generation for debugging, we fail. This is (to me) a timing issue ... the only way we are going to solve this one is by testing with manual injection of delays until it works ... this is going to be slow. Oh well ...

@coder543
Copy link
Author

coder543 commented Oct 4, 2017

As an added note, with Log Level set to Verbose, I'm trying to add 5 characteristics, each with their own unique UUID. Each characteristic is initialized to an 8-byte NULL sequence with:

BLECharacteristic *characteristic = service->createCharacteristic(
	UUID, perms
);

char value[8] = {0};
characteristic->setValue((uint8_t*)&value, 8);

Interestingly, only 4 characteristics show up at any given time. I can identify which characteristic is missing using nRF Connect and disable one of the ones that is currently working, and then the missing one will appear. I haven't been able to identify a pattern regarding how the missing characteristic is selected to be missing.

I think this is probably related to the timing issue noted above, I don't think this is a new issue. So, for now, I don't want to create a new issue. When the current issue gets resolved, I'll have to see whether this other issue is resolved as well.

@coder543
Copy link
Author

coder543 commented Oct 4, 2017

Perhaps even more interestingly, I just tested with Log Level set to No Output, and I still get 4 out of the 5 characteristics... hmm.

@nkolban
Copy link
Owner

nkolban commented Oct 4, 2017

How did you set log level to no Output? At compilation time or at run time?

@coder543
Copy link
Author

coder543 commented Oct 4, 2017 via email

@nkolban
Copy link
Owner

nkolban commented Oct 5, 2017

When you run your tests, do you see the following error?

espressif/esp-idf#1077

Can you also confirm that you are running the latest ESP-IDF version (master from Github?). I want to verify that we are running the same development base.

@nkolban
Copy link
Owner

nkolban commented Oct 5, 2017

For our records, here is a pastebin of a trace with logging enable that works ...

https://pastebin.com/YAMm9DwA

My purpose in capturing this is to replace the logs, one at a time, with delays to see what I can find.

@nkolban
Copy link
Owner

nkolban commented Oct 5, 2017

Some important progress ... I resorted to a horrible test scenario ... one I was hoping to avoid. We know that the apps work with Logging enabled and don't work with logging compiled out. So with logging enabled, I went through each log output in turn generated by the app as shown in the output and commented it out. I then re-built the app and re-tested.

I must have gone through about 20 iterations before I found a big-ole-clue.

There is a function called:

BLEService::executeCreate()

Its purpose is to create a GATT server advertised service. At the beginning of the function there is a log statement. When I commented it out, we fail (i.e. no characteristics shown) but if we leave it present, the characteristics remain. I demonstrated this to myself a few times.

Now what remains is to focus attention there and see what the story is. I need to look at the immediate preceding calls and subsequent calls. However, that will be tomorrow night's task as it is end of day here.

@nkolban
Copy link
Owner

nkolban commented Oct 6, 2017

Updated: 2017-10-05:
After the work last night, I thought the resolution for this was going to be easy. Here is the start of the function called "executeCreate" in BLEService:

void BLEService::executeCreate(BLEServer *pServer) {
   ESP_LOGD(LOG_TAG, ">> executeCreate() - Creating service (esp_ble_gatts_create_service) service uuid: %s", getUUID().toString().c_str());
   m_pServer          = pServer;
   ...

Pretty standard stuff. I found that if I removed the debug statement, no service would be found while if I left the debug statement in, the service would be found. Ok ... sounds like a race condition. So I commented out the debug statement and replaced with a sleep for 1000msecs ... just as a test. When I ran it, we delayed for a second as expected ... but there was no service found. I put the debug statement back in, tested again, and the service was found.

This made me think that what we were hunting here may not be a race condition.

My next test was to leave the debug statement in but remove the parameters to it. I recompiled and re-ran, and again no service was found. Aha!!! This would seem to say that something in the evaluation of the parameters is causing us problems. Ok ... so now let us focus on the parameters. It is a string print of:

getUUID().toString().c_str()

With the debug commented out but the above as a source line, I recompiled again. This time ... with no debug but this line ... the service was present.

I changed the statement to:

getUUID().toString();

retested and still the service was present.

Finally I ended up with just:

getUUID();

and still the service was present. I removed that line ... and no service!!!

OOooh!! Ok ... so what have we here?

the getUUID() returns an instance of BLEUUID.... and that's it.

Its late and I'm done again for this evening ... but I'm not understanding what is going on yet. What ever is happening is not a race condition.

@chegewara
Copy link
Collaborator

chegewara commented Oct 6, 2017

Offtopic, sorry Neil, but it is funny. I thought it may be

getUUID().toString().c_str()

but i didnt expected it can be only getUUID().

@nkolban
Copy link
Owner

nkolban commented Oct 6, 2017

I'm truly perplexed on this one. Its causing me a lot of head scratching. I've never seen a bug quite like this one. Obviously, it isn't local environment specific (i.e. my system isn't the only one that is broke) because it is being reported by mr @coder543 . Because it didn't manifest with debugging enabled, it was pure brute force to get to this point where we can enable/disable the bug with a one line change (and a lot of luck). I'm not anywhere close to being satisfied with what would appear to be a workaround (i.e. coding a do-nothing call to getUUID()). Friday is a working day for me on my job but Sat/Sun are coming soon and I'll be able to devote contiguous hours to the study. However I have no "science" to yet apply ... I feel like an alchemist who will try and produce gold by mixing different chemicals together "just to see what will happen". My hope is that as I poke it more with a stick, something will show up ... but I'm not confident. I'll be looking for colleagues like yourself who would be willing to invest a little time to hop on Skype with me just so you can watch for 10 minutes while I recreate the problem and show you the symptoms in the hope that:

  1. You can see something I'm missing?
  2. You can think up new explanations and tests?
  3. You can recreate on your system and we can parallelize tests and theories?

@chegewara
Copy link
Collaborator

Im short of time this days, a lot work at my job, but if you have test code i have spare couple hours and i can do some tests.

@nkolban
Copy link
Owner

nkolban commented Oct 6, 2017

Next session I get on this puzzle I'll put together a "goody package". Simple instructions, a sample app and a description of how to reproduce. I'll then post a link to that in this issue. The only think I won't supply is the other end of the BLE story ... I'll assume that a tester has access to an Android phone running "nRF Connect". But that of itself might be a good test ... for example using a different peer tool to see if that plays a part ... for example "bluetoothctl" or "gatttool" on Linux.

@chegewara
Copy link
Collaborator

I will try with SampleServer from BLEtests with custom service/characteristic UUIDs. The rest i need is described here. (I think)

@chegewara
Copy link
Collaborator

Its problem with adding 5th and next characteristics to service:

D (1467) BLEDevice: gattServerEventHandler [esp_gatt_if: 3] ... ESP_GATTS_ADD_CHAR_EVT
D (1475) BLEUtils: GATT ServerEvent: ESP_GATTS_ADD_CHAR_EVT
D (1481) BLEUtils: [status: ESP_GATT_ERROR, attr_handle: 0 0x00, service_handle: 40 0x28, char_uuid: 00005234-0000-1000-8000-00805f9b34fb]
D (1493) BLEServer: >> handleGATTServerEvent: ESP_GATTS_ADD_CHAR_EVT
D (1500) BLECharacteristic: >> setHandle: handle=0x00, characteristic uuid=00005234-0000-1000-8000-00805f9b34fb
D (1510) BLECharacteristic: << setHandle
D (1514) BLEServer: << handleGATTServerEvent
D (1518) BLECharacteristic: << executeCreate
D (1522) BLECharacteristic: >> executeCreate()
D (1527) BLECharacteristic: Registering characteristic (esp_ble_gatts_add_char): uuid: 00006234-0000-1000-8000-00805f9b34fb, service: UUID: 4fafc201-1fb5-459e-8fcc-c5c9c331914b, handle: 0x28
D (1544) FreeRTOS: Semaphore taking: name: CreateEvt (0x3ffe3ca8), owner: <N/A> for executeCreate
D (1553) FreeRTOS: Semaphore taken: name: CreateEvt (0x3ffe3ca8), owner: executeCreate
E (1561) BT: btc_gatts_arg_deep_copy 8 no mem

D (1566) BLEDevice: gattServerEventHandler [esp_gatt_if: 3] ... ESP_GATTS_ADD_CHAR_EVT
D (1574) BLEUtils: GATT ServerEvent: ESP_GATTS_ADD_CHAR_EVT
D (1580) BLEUtils: [status: ESP_GATT_ERROR, attr_handle: 0 0x00, service_handle: 40 0x28, char_uuid: 00006234-0000-1000-8000-00805f9b34fb]
D (1592) BLEServer: >> handleGATTServerEvent: ESP_GATTS_ADD_CHAR_EVT
D (1599) BLECharacteristic: >> setHandle: handle=0x00, characteristic uuid=00006234-0000-1000-8000-00805f9b34fb
D (1609) BLECharacteristic: << setHandle
D (1613) BLEServer: << handleGATTServerEvent
D (1618) BLECharacteristic: << executeCreate
D (1622) FreeRTOS: Semaphore taking: name: StartEvt (0x3ffd7fe8), owner: <N/A> for start
D (1630) FreeRTOS: Semaphore taken: name: StartEvt (0x3ffd7fe8), owner: start
D (1638) BLEDevice: gattServerEventHandler [esp_gatt_if: 3] ... ESP_GATTS_START_EVT
D (1645) BLEUtils: GATT ServerEvent: ESP_GATTS_START_EVT
D (1650) BLEUtils: [status: ESP_GATT_OK, service_handle: 0x28]

@chegewara
Copy link
Collaborator

I dont know if something been changed in BLE stack, but i can see characteristics (still max 4) with log level ERROR setup in menuconfig.

@coder543
Copy link
Author

coder543 commented Oct 6, 2017

I just checked the logs over here, and I'm seeing the same error message @chegewara is seeing on my 5th characteristic. Somehow I didn't notice this before.

So, I guess my issue of having more than 4 characteristics is a separate bug from the one this issue is tracking. I'll go ahead and open another ticket for it.

@nkolban
Copy link
Owner

nkolban commented Oct 6, 2017

Yes indeed ... please do separate the concept of > 4 characteristics causing an issue from the notion of having to have debug logging enabled in order to see any services/characteristics.

@chegewara
Copy link
Collaborator

ATM i cant recreate issue with log level causing not showing characteristics. I can see all 4 characteristics with log level VERBOSE and ERROR.

@coder543
Copy link
Author

coder543 commented Oct 6, 2017

@chegewara have you tried using Sample1.cpp with different log levels? I can consistently reproduce it over here.

@nkolban
Copy link
Owner

nkolban commented Oct 6, 2017

I still owe you a full written story to recreate plus sample artifacts. That is number one on my task list when my day job ends in about 4 hours and my weekend starts. Because this story is such a mysterious puzzle, we will need to spell out all the environmental components. In summary, and before I have written up the full story, the salient points are:

  1. Building on Linux Ubuntu
  2. Using the latest ESP-IDF from Git master
  3. Using the tool chain known as:
gcc version 5.2.0 (crosstool-NG crosstool-ng-1.22.0-61-gab8375a) 

Output from:

$ xtensa-esp32-elf-gcc -v
  1. Using the BLE C++ code as retrieved from master for cpp_utils (latest)

@chegewara
Copy link
Collaborator

chegewara commented Oct 6, 2017

Ok then, we are in different environment.

  1. Linux Ubuntu 14.04,
  2. Latest esp-idf
  3. latest toolchain (1.22.0-73) - with modified pthread.h
  4. A little bit modified to debug, i will revoke all changes and try again

SampleServer just a little bit modified is my test app. I will try with Sample1.

@nkolban
Copy link
Owner

nkolban commented Oct 7, 2017

Attached to this issue we now have 109.pdf which is my write-up on how to recreate as easily as possible and how to test and expected (and unexpected) results. The 109.zip contains the source files I tested with.

109.pdf
109.zip

@chegewara
Copy link
Collaborator

Got it last night, thanks.
Ive found something, maybe this can help you. Still dont work like expected but:

  • if you add 2nd service then this service is advertised, but 1st service is still invisible

I dont know if this is related somehow but with 2 and more services ping is over 1200 ms.

This was referenced Oct 13, 2017
@nkolban
Copy link
Owner

nkolban commented Oct 15, 2017

Got back to studying this tonight ... found that the following doesn't work:

void BLEService::executeCreate(BLEServer *pServer) {
	m_pServer          = pServer;
	m_semaphoreCreateEvt.take("executeCreate"); 

But the following does:

void BLEService::executeCreate(BLEServer *pServer) {
	char x[1000];
	memcpy(x, &m_uuid, sizeof(m_uuid));
	m_pServer          = pServer;
	m_semaphoreCreateEvt.take("executeCreate"); 

The pattern I seem to be seeing is that "accessing" the m_uuid data is causing "something" to happen.... but what!!!

@nkolban
Copy link
Owner

nkolban commented Oct 15, 2017

Digging further, one more clue ... when changing the "make menuconfig" option: Compiler options -> Optimization level from "-Og" to "-Os" the problem seems to go away completely. There was nothing "thoughtful" in my discovery of this ... purely a guess.

The underlying SDKCONFIG flag appears to be CONFIG_OPTIMIZATION_LEVEL_RELEASE=y

@blueMoodBHD
Copy link

At cpp_utils/BLEService.cpp line 75, you didn't set srvc_id.is_primary. So the services you created may be Second Service and will not show.

In release mode, srvc_id.is_primary will be initialed to a rand value, mostly not 0, so it works well. But in debug mode, it will be initialed to 0, causing the service not showing.

Please add this line and test again, thanks!
srvc_id.is_primary = true;

@chegewara
Copy link
Collaborator

Looks like its it. Many thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants