-
Notifications
You must be signed in to change notification settings - Fork 7.3k
zbus: Add runtime observers pool #88834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
zbus: Add runtime observers pool #88834
Conversation
57113c9
to
1c5ab6d
Compare
1c5ab6d
to
4fd5b20
Compare
4fd5b20
to
d91b186
Compare
How does the current behavior not act like other types of observers?
zbus is not a stable API, and while that is not an argument for changing API's for no reason, it also is not a reason to not make changes. What is the advantage of hiding the memory allocation inside the library? The goal of zbus (or at least one of them) is for connecting various components of an application together without necessarily needing to understand the deep internals of each one. In such a situation, how large should To me, the "overhead" of needing to declare an extra variable somewhere is vastly preferable to needing to mess around with some variably sized Kconfig option after every update. Honestly, the above is my biggest pain-point with Zephyr as a whole. Every time dynamic allocation from a shared resource creeps in somewhere, you have no clue whether an application will work after an update until you do exhaustive testing on every path in your application. |
Original signature, the one I would like to keep: ZBUS_CHAN_ADD_OBS(channel, observer, priority);
zbus_chan_add_obs(const struct zbus_channel *chan, const struct zbus_observer *obs, k_timeout_t timeout); In the original signature, developers only need to pay attention to the essential thing channel, observers, and priority for static and timeout for runtime. You can call both arbitrarily if using dynamic allocation, which is the regular usage, as message subscribers rely on the heap, and they are usually enabled. The limitation is the heap size. The static allocation for zbus (message subscribers and runtime observers) is an exceptional usage. That is meant for applications where dynamic allocation is prohibited due to safety reasons, memory constraints, or other reasons. In this scenario, it is normal for developers to calculate array sizes for almost everything, as they need to be statically allocated. The changed signature affected the API (unnecessarily), added an unrelated variable (node), and relied on an error-prone approach. We are bringing the complexity only necessary for the exceptional usage (heap prohibited) for all users unnecessarily. int zbus_chan_add_obs(const struct zbus_channel *chan, const struct zbus_observer *obs,
struct zbus_observer_node *node, k_timeout_t timeout); Pain points of the changed signature:
Usually, in programming, we avoid changing APIs unnecessarily. Several developers already use that as it is. Why change the API if we can keep it the same and still deliver the feature?
Make that easier to use. With the node variable, the developers need to deal with more complexities: the meaning of the variable (since that is not related to the feature), the scope of the variable, and whether the variable is already used.
It is easy. To be safe, you can count in the code how many "zbus_chan_add_obs" you have and put that number in the configuration. You need to analyze the "add" and "rm" sequence calls to optimize them. It may not be worth it since each extra node will take only 8 bytes. Again, let the user use the heap for flexibility and the stack when they need it, all without changing the way, and without adding unnecessary usage complexities. |
d91b186
to
341f11f
Compare
Nothing you describe is a behavioral change, just consequences from the extra parameter in the function.
I disagree that this is "easy". Doing a grep for
And sure, you could just set the value to something like 16 and almost never hit an issue, but 128 bytes is not nothing.
How does the user use the stack in this proposal? From what I can tell the only options are the system heap or the new pool you are proposing. |
The suggested approach works for almost every application. It is not usual to call that functions (add or rm) in loops. Flow control is more feasible and easier to manage. You can suggest/use another approach if you like. The truth is that the node approach would not make the loop or flow control easier, since this lifetime would be an issue. You will end up needing an array of nodes...
If 128 bytes is an issue in your application, you can reduce it to the appropriate value. You just need to analyze or test the code more.
The pool is an array of nodes statically allocated in the stack and managed by the memory slab. |
Talking about the pain points of the node. Take a look at the sample code. This could be easly found in developers code: /*
* Copyright (c) 2022 Rodrigo Peixoto <[email protected]>
* SPDX-License-Identifier: Apache-2.0
*/
#include "messages.h"
#include <stdint.h>
#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>
#include <zephyr/zbus/zbus.h>
LOG_MODULE_DECLARE(zbus, CONFIG_ZBUS_LOG_LEVEL);
ZBUS_CHAN_DEFINE(chan_foo, struct msg_foo, NULL, NULL, ZBUS_OBSERVERS_EMPTY, ZBUS_MSG_INIT(.x = 0));
static void lis_bar_callback(const struct zbus_channel *chan)
{
const struct msg_foo *msg = zbus_chan_const_msg(chan);
LOG_INF(" Channel %s foo message value %d", zbus_chan_name(chan), msg->x);
}
ZBUS_LISTENER_DEFINE(lis_bar, lis_bar_callback);
void system_init()
{
/* This can be a usual error for new users. The node variable goes out of scope... */
struct zbus_observer_node obs_node;
zbus_chan_add_obs(&chan_foo, &lis_bar, &obs_node, K_MSEC(200));
LOG_INF("System started");
}
int main(void)
{
system_init();
struct msg_foo msg = {.x = 1};
while (1) {
zbus_chan_pub(&chan_foo, &msg, K_FOREVER);
msg.x += 1;
k_msleep(1000);
}
return 0;
} The error retuner by that is: west build -t run
-- west build: running target run
[0/1] To exit from QEMU enter: 'CTRL+a, x'[QEMU] CPU: qemu32,+nx,+pae
SeaBIOS (version zephyr-v1.0.0-0-g31d4e0e-dirty-20200714_234759-fv-az50-zephyr)
Booting from ROM..
I: System started
E: Page fault at address 0x103cf4 (error code 0x3)
E: Access violation: supervisor thread not allowed to write
E: PTE: 0x103000 -> 0x0000000000103000: US A
E: EAX: 0x001120e0, EBX: 0x00103ce4, ECX: 0xffffffff, EDX: 0x00103cf4
E: ESI: 0x0011f3dc, EDI: 0xffffffff, EBP: 0x0011f394, ESP: 0x0011f394
E: EFLAGS: 0x00000002 CS: 0x0008 CR3: 0x00111000
E: EIP: 0x00105de2
E: call trace:
E: 0: 0x00103f67
E: 1: 0x001024f7
E: 2: 0x001025f3
E: 3: 0x001028ca
E: 4: 0x0010031c
E: 5: 0x00103ce4
E: 6: 0x001006c6
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
E: Current thread: 0x1120e0 (unknown)
E: Halting system No clues, nothing to help developers see where to start looking for the defect. |
Memory slabs and stacks have nothing to do with each other, its certainly not where the memory comes from, but w/e.
No clue, apart from the call trace, program counter and the If we're talking about consistency, the node method is how every other "notification" mechanism I have seen in Zephyr works. |
The other pain point is reusing the same node. In this sample, that causes a zbus execution issue: /*
* Copyright (c) 2022 Rodrigo Peixoto <[email protected]>
* SPDX-License-Identifier: Apache-2.0
*/
#include "messages.h"
#include <stdint.h>
#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>
#include <zephyr/zbus/zbus.h>
LOG_MODULE_DECLARE(zbus, CONFIG_ZBUS_LOG_LEVEL);
ZBUS_CHAN_DEFINE(chan_foo, struct msg_foo, NULL, NULL, ZBUS_OBSERVERS_EMPTY, ZBUS_MSG_INIT(.x = 0));
static void lis_bar_callback(const struct zbus_channel *chan)
{
const struct msg_foo *msg = zbus_chan_const_msg(chan);
LOG_INF("[BAR] Channel %s, message value %d", zbus_chan_name(chan), msg->x);
}
ZBUS_LISTENER_DEFINE(lis_bar, lis_bar_callback);
static void lis_baz_callback(const struct zbus_channel *chan)
{
const struct msg_foo *msg = zbus_chan_const_msg(chan);
LOG_INF("[BAZ] Channel %s, message value %d", zbus_chan_name(chan), msg->x);
}
ZBUS_LISTENER_DEFINE(lis_baz, lis_baz_callback);
static void lis_regular_callback(const struct zbus_channel *chan)
{
const struct msg_foo *msg = zbus_chan_const_msg(chan);
LOG_INF("[Regular] Channel %s, message value %d", zbus_chan_name(chan), msg->x);
}
ZBUS_LISTENER_DEFINE(lis_regular, lis_regular_callback);
ZBUS_CHAN_ADD_OBS(chan_foo, lis_regular, 3);
int main(void)
{
LOG_INF("System started");
struct zbus_observer_node obs_node;
/* Reuse the same obs_node. This can also be a usual issue for new developers */
zbus_chan_add_obs(&chan_foo, &lis_bar, &obs_node, K_MSEC(200));
zbus_chan_add_obs(&chan_foo, &lis_baz, &obs_node, K_MSEC(200));
struct msg_foo msg = {.x = 1};
while (1) {
zbus_chan_pub(&chan_foo, &msg, K_FOREVER);
msg.x += 1;
k_msleep(1000);
}
return 0;
} The correct output if you have one node per addition is: I: System started
I: [Regular] Channel chan_foo, message value 1
I: [BAR] Channel chan_foo, message value 1
I: [BAZ] Channel chan_foo, message value 1
I: [Regular] Channel chan_foo, message value 2
I: [BAR] Channel chan_foo, message value 2
I: [BAZ] Channel chan_foo, message value 2
I: [Regular] Channel chan_foo, message value 3
I: [BAR] Channel chan_foo, message value 3
I: [BAZ] Channel chan_foo, message value 3
I: [Regular] Channel chan_foo, message value 4
I: [BAR] Channel chan_foo, message value 4
I: [BAZ] Channel chan_foo, message value 4
... <continue> However, the output for this code is: I: System started
I: [BAZ] Channel chan_foo, message value 1
I: [BAZ] Channel chan_foo, message value 1
I: [BAZ] Channel chan_foo, message value 1
I: [BAZ] Channel chan_foo, message value 1
I: [BAZ] Channel chan_foo, message value 1
I: [BAZ] Channel chan_foo, message value 1
I: [BAZ] Channel chan_foo, message value 1
I: [BAZ] Channel chan_foo, message value 1
I: [BAZ] Channel chan_foo, message value 1
... <continue> So again, no warnings, nothing to give the user a clue of how to solve the issue. The issue affected the zbus VDED in a way that prevented it from delivering any messages. I suspect it is in a loop. It needs further investigation. |
35fc548
to
bc40168
Compare
Stack vs heap: If the mutable data is not in the heap, it is in the stack, right? I am not talking about the stack kernel object.
Rephrasing that to "no friendly clues", mainly for newcomers.
I made my point. Let's see what the others think about that. Thanks for the discussion. |
eb2c38f
to
420f71b
Compare
A previous PR merged (to remove runtime observers' dependency with heap) added inconsistencies and compatibility breaks to the zbus. This commit improves that by removing the inconsistencies and still attending to the features requested by the community. Signed-off-by: Rodrigo Peixoto <[email protected]>
Reverse the changes the affected API and add test to the new dynamic and static variations. Signed-off-by: Rodrigo Peixoto <[email protected]>
Reverse the changes the affected API and add adjust the sample to the new dynamic and static variations. Signed-off-by: Rodrigo Peixoto <[email protected]>
No migration necessary for this feature. Signed-off-by: Rodrigo Peixoto <[email protected]>
Add documentation of the runtime observers pool. Signed-off-by: Rodrigo Peixoto <[email protected]>
Add runtime observers static allocation feature to release notes. Signed-off-by: Rodrigo Peixoto <[email protected]>
420f71b
to
ec6dcec
Compare
There are pros/cons to both approaches, maybe it could be an option to have both?
These are my two cents, I just didn't like the |
Neither do I at first. Now I'm used to it, as it's being used in a very controlled way. I hear you. I was thinking of an approach that lets the developer allocate that and still provides an abstraction layer to make it simple and less error-prone. I will return with something soon. |
The aim of PR #84271 (#79777) was to eliminate the need for the heap during runtime when adding a runtime observer. That was accomplished. However, the change impacted the API and the consistency of the runtime observers, which, after the PR, did not behave like the other types of observers.
After some analysis, community comments, and questions, I could confirm that the feature is proper, but the implementation has broken API and behavior consistency, negatively impacting the zbus quality.
As we are in the same cycle (Zephyr 4.1.99), I believe that it would be better to adjust that to implement the feature but with no effect on the API or the runtime observer's behavior.