-
Notifications
You must be signed in to change notification settings - Fork 7.4k
Bluetooth: Mesh: Fix pb-adv link close timeout #61551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bluetooth: Mesh: Fix pb-adv link close timeout #61551
Conversation
@@ -50,7 +50,7 @@ LOG_MODULE_REGISTER(bt_mesh_pb_adv); | |||
|
|||
#define RETRANSMIT_TIMEOUT K_MSEC(CONFIG_BT_MESH_PB_ADV_RETRANS_TIMEOUT) | |||
#define BUF_TIMEOUT K_MSEC(400) | |||
#define CLOSING_TIMEOUT 3 | |||
#define CLOSING_TIMEOUT 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5.3.1.4.3
A Link Close message is used to close a link. Since this message is not acknowledged, the sender shall repeat this message at least three times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message is sent three times. It is ensured by RETRANSMITS_LINK_CLOSE
. CLOSING_TIMEOUT
defines the, at least intended, wait time before the link should be closed in seconds. I think the default RETRANSMIT_TIMEOUT
is 500 msec, so in testing the message is retransmitted 4-5 times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a counter could be used instead of timeout?
subsys/bluetooth/mesh/pb_adv.c
Outdated
@@ -672,12 +641,40 @@ static int bearer_ctl_send_unacked(struct net_buf *buf, void *user_data) | |||
prov_clear_tx(); | |||
k_work_reschedule(&link.prot_timer, PROTOCOL_TIMEOUT); | |||
|
|||
bt_mesh_adv_send(buf, &buf_sent_cb, user_data); | |||
bt_mesh_adv_send(buf, NULL, user_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite weird. You do not provide callback but provide callback data. Something is actually wrong here.
Since user_data is actively used to pass link statuses or close reason after removing callbacks there is no functionality that applies user_data to the final destination. This change, seems, broke it. The functionality requires much more refactoring or should be reverted.
This is quite strange why it wasn't caught by bsim tests. Probably we need bsim tests to check provisioning adv bearer behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it is wierd. The only purpose the CB's had was to check if the link was closing and to close it if it was. Other than that I could not see another purpose for the CB's. I was not quite sure if I could remove the user_data, so I left them to be discussed here instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I just made an attempt to fix this with very little knowledge of the prov layers. It is quite possible that we need a major refactoring to fully resolve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understood it: The buf_sent_cb
was only used from within bearer_ctl_send_unacked
. All calls to bearer_ctl_send_unacked
passes a link status as user data. Originally the user data/status was only used when passed to the close_link
call in the buf_sent
CB. buf_start
only purpose was to ensure that buf_sent
was called if an error occured during sending. When I implemented another scheme for closing the link that respected the CLOSING_TIMEOUT
define, these CBs did not serve any purpose anymore and could be removed. Am I missing something?
@@ -926,11 +923,9 @@ static void prov_link_close(enum prov_bearer_link_status status) | |||
* be restransmitted at least three times. Retransmit the LINK_CLOSE | |||
* message until CLOSING_TIMEOUT has elapsed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You moved CLOSING_TIMEOUT
usage to another place but left a comment nonrelavant. Please keep the code clean for better maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sry, my bad 👍
@@ -1034,6 +1034,7 @@ static void test_reprovisioning_provisioner(void) | |||
|
|||
LOG_INF("Provisioning a remote device..."); | |||
ASSERT_OK(k_sem_take(&prov_sem, K_SECONDS(40))); | |||
k_sleep(K_MSEC(200)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment here about why this is 200 milliseconds and why this is required.
When I add some huge changes like PSA support or another settings backend those change timing significantly such kind no name delays confuse a lot. Why they were added? and why exact this digit?
Sometimes it looks like an attempt to relax test conditions to hide issues or bugs that raise from time to time. It is very difficult to maintain such kinds of tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you have done is how it was working originally, but then was changed here: #36498 Do you think it won't cause the issue described in that PR?
Another question is that if we really need to make every provisioning 2 seconds longer? The Link Close message can still be missed and the result will be the same on the provisionee, but the provisioner will wait for additional 2 seconds, while it could start provisioning the next device.
Alters the closing timout implementation of the pb-adv so that the provisioner does not close the link until the closing timeout has elapsed. Reduces the timeout from 3 to 2 seconds. Adds minor delay in reprovision persistence test to avoid failure after changes. Signed-off-by: Anders Storrø <[email protected]>
d1302ba
to
dd39c8e
Compare
Hmm... I need to check. The reason I started this was because the final step of provisioning with PB-adv failed like 80% of the time in test_shell |
@PavelVPV after my latest changes I do not think that the issues from the previous PR will be relevant. If the device receives a When it comes to the question of if we should do this due to increased execution time, I'm not sure. From what I have seen the prov procedure with pb-adv really stuggles to pass the finalizing messages in the procedure, causing the provisionee to retransmit PROV complete messages for the If we decide to keep the implementation as is I would like to clean up a bit in the code since all use of |
ctl_buf_create(LINK_CLOSE, &link.close_link.reason, 1, RETRANSMITS_LINK_CLOSE); | ||
|
||
if (!buf) { | ||
LOG_ERR("Failed to create link close buffer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call close_link
here in case of error or retry?
link.close_link.start = k_uptime_get(); | ||
|
||
bearer_ctl_send_link_close(); | ||
k_work_reschedule(&link.tx.retransmit, RETRANSMIT_TIMEOUT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move k_work_reschedule
inside bearer_ctl_send_link_close
?
net_buf_unref(buf); | ||
|
||
return 0; | ||
} | ||
|
||
static void bearer_ctl_send_link_close(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to link_close_send
?
@@ -50,7 +50,7 @@ LOG_MODULE_REGISTER(bt_mesh_pb_adv); | |||
|
|||
#define RETRANSMIT_TIMEOUT K_MSEC(CONFIG_BT_MESH_PB_ADV_RETRANS_TIMEOUT) | |||
#define BUF_TIMEOUT K_MSEC(400) | |||
#define CLOSING_TIMEOUT 3 | |||
#define CLOSING_TIMEOUT 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a counter could be used instead of timeout?
} | ||
|
||
prov_clear_tx(); | ||
k_work_reschedule(&link.prot_timer, PROTOCOL_TIMEOUT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be started only once? And is there any need in it if protocol_timeout
doesn't do anything if link is not ADV_LINK_ACTIVE
?
Something has changed between d11c0a(The main commit where I discovered this issue) and the current main commit (2218e8). Now it seems the pb-adv is able to complete the full provisioning procedure without failure. I have tried to investigate what the reason is but have not found it this far. Should we just close this PR then @alxelax & @PavelVPV? Or is these changes still of interest? |
Lets postpone changing the code until we are certain what is an issue. |
I'd suggest covering this in bsim tests. Seems pb-adv is not covered explicitly. Only over provisioning tests. |
For the record, 2fe4493 seems to have fixed this issue (for me). |
Alters the closing timout implementation of the pb-adv so that the provisioner does not close the link until the closing timeout has elapsed. Reduces the timeout from 3 to 2 seconds.
Adds minor delay in reprovision persistence test to avoid failure after changes.