Skip to content

device/drivers: RFC Update API sys, safer with multi iface #48817

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

bjarki-andreasen
Copy link
Collaborator

@bjarki-andreasen bjarki-andreasen commented Aug 9, 2022

device/drivers: RFC Update API sys, safer with multi iface

This draft shows how to update the API subsystem to be safer
and support multiple APIs pr device instance, at the cost of
some ROM

The update is entirely backwards compatible with the current
solution, which is to store a pointer to the API
implementation inside the struct device which uses it.

The new solution creates a device struct for each API in
iterable sections in ROM. The new device structs are
copies of the device struct defined by DEVICE_DT_DEFINE,
yet they have the API member overwritten to the specified
API. This allows for using them with the current API wrappers
without any modification.

To get the normal device, with its default API, use
DEVICE_DT_GET(node)

To get the same device, but for use with the i2c API, if
implemented for it, use DEVICE_DT_API_GET(i2c, node)

Both return a const struct device * which can be used
transparently with the appropriate API wrapper

This allows for a device to define multiple APIs for
itself.

It also ensures that only an API which is in
fact implemented for a specific device instance can
be used, giving a linker error if fx. the i2c API is
not implemented for a device using DEVICE_DT_API_DEFINE,
and DEVICE_DT_API_GET(i2c, node) is used.

The DEVICE_DT_API_GET should be wrapped in the API headers
to I2C_DEVICE_DT_API_GET(node) etc. for clarity, which becomes
DEVICE_DT_API_GET(i2c, node)

Inside the device drivers, the instanciation for an i3c device
device will go from this:

struct i3c_driver_api;

DEVICE_DT_DEFINE(node_id, ..., &i3c_api)

to:

struct i3c_driver_api;

DEVICE_DT_DEFINE(node_id, ..., NULL)
I3C_DEVICE_DT_API_DEFINE(node_id, &i3c_api)

and can be expanded to:

struct i2c_driver_api;
struct i3c_driver_api;

DEVICE_DT_DEFINE(node_id, ..., NULL)
I2C_API_DT_DEFINE(node_id, i2c_driver_api)
I3C_API_DT_DEFINE(node_id, i3c_driver_api)

The two APIs can then be accessed like

struct device *i2c_dev = I2C_DEVICE_DT_API_GET(node_id);
struct device *i3c_dev = I3C_DEVICE_DT_API_GET(node_id);

allowing for reuse of existing APIs without modification outside of
updating the DEVICE_DT_DEFINE and GET

The APIs must also be updated to use the new solution.
This is done in few steps:

Add iterable section to
common-rom-kernel-device-driver-apis.ld

Add convenience wrappers for
DEVICE_DT_API_DEFINE, DEVICE_DT_API_GET etc.
to appropriate API header files.

Done.

In the future, we may change the DEVICE_DT_DEFINE macro to
set the struct device api member to NULL, and only allow
for the use of DEVICE_DT_API_GET etc. which is safer and
allows for iterating through all devices which support a
specific API (which is useful for shell for example)

The Location API is designed to provide the essential location
data which can be produced from sources like SUPL. GNSS, WIFI-
location, cellular location, propriatary solutions etc.

It is primed for use with the upcomming multi interface device
model, and for implementation into the upcomming GNSS API

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
@henrikbrixandersen
Copy link
Member

How come this includes a location API?

@bjarki-andreasen
Copy link
Collaborator Author

How come this includes a location API?

It is just an example, I could have used most other APIs as well

@andyross
Copy link
Collaborator

andyross commented Aug 9, 2022

One aesthetic problem here is that we now have two "kinds" of API fields in struct devie: the primary pointer and the "impl" array. Would be nice to see them unified in the final version, though that's going to involve changing a lot of code that is currently banging the api field directly.

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Aug 9, 2022

One aesthetic problem here is that we now have two "kinds" of API fields in struct devie: the primary pointer and the "impl" array. Would be nice to see them unified in the final version, though that's going to involve changing a lot of code that is currently banging the api field directly.

That is the plan, keep both in the transition period, once all APIs have been updated to use the array, we can use a script to update the DEVICE_DT_DEFINE() API argument to the matching API_DT_DEFINE() in the in-tree drivers, then remove the api field from the struct device, and update the wrapper to only use the api, which can also be done automatically with a script

@henrikbrixandersen
Copy link
Member

As mentioned on todays Architecture WG meeting, my main concerns with this approach are:

  • The added execution overhead when making API calls (due to the lookup).
  • While the -ENOSYS error checking may seem nice, gracefully handling calling e.g. a SPI API function on a UART device is not really needed in Zephyr due to how Zephyr is used.
  • This does not solve the issue for one struct device exposing multiple interfaces of the same device class (e.g. one CAN controller with multiple CAN network ports).
  • I think we need to look into a broader long-term change/solution to how we wish to handle devices in Zephyr (devicetree node vs. struct device vs. API struct ...).

@galak
Copy link
Collaborator

galak commented Aug 9, 2022

So I wonder about changing the model to have "typed" devices instead as a solution.

So having a struct location_device that embeds a struct device.

Than having something like DEVICE_DT_GET_SUBSYS(<NODE>, <SUBSYS>) so something like:
DEVICE_DT_GET(<NODE>, location)

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Aug 11, 2022

As mentioned on todays Architecture WG meeting, my main concerns with this approach are:

* The added execution overhead when making API calls (due to the lookup).

Yeah, but realistically, the list is gonna be 1 to 5 entries long pr API, so with the features it brings and the simplicity of it I will argue it is an acceptable overhead.

* While the -ENOSYS error checking may seem nice, gracefully handling calling e.g. a SPI API function on a UART device is not really needed in Zephyr due to how Zephyr is used.

I know SPI and UART APIs are improbable on a single device, it's just an example. If one, by accident, uses a SPI API wrapper on a UART device, one would get undefined behavior with the current solution, this solution gives you a meaningful error, it's not just nice in my opinion, its essential.

* This does not solve the issue for one `struct device` exposing multiple interfaces of the _same_ device class (e.g. one CAN controller with multiple CAN network ports).

That is correct, that is solved using this solution #48932 since each CAN needs its own device handle, which we essentially have to get from the device tree.

The larger issue is that a subnode, like an UART needs to be on a UART bus node. Consider the following:
An uart_ic is on a SPI bus, yet contains 4 UARTs. A slave device is on one of the nodes. How does the slave device know where to call uart_poll_out?

&spi {
    uart_ic {
        gnss {

        };
    };
};

We have to create a node in the device tree for the slave, or the application, to use. Creating it in the device tree seems the most obvious location to place it, it is clear what is happening, and it is collected in once location. Otherwise, we need to create another file or format, where we create these bindings, which just seems overly complicated when the device tree works in almost all cases for most drivers, the few outliers like your CAN device or this hypothetical uart ic seem like an okay compromise for me, i would go as far as saying its not even a compromise, but a good solution :)

&spi {
    uart_ic {
        uart_ic_0 {
            gnss {

            };
        }
    };
};
* I think we need to look into a broader long-term change/solution to how we wish to handle devices in Zephyr (devicetree node vs. `struct device` vs. API struct ...).

I think we have a perfectly adequate solution which with the enhancement from this PR will provide all the functionality we need in a timely fashion, with very small changes, which would give plenty of time for a broader long-term change/solution to be thought up, without blocking development until such a time comes.

@str4t0m
Copy link
Collaborator

str4t0m commented Aug 11, 2022

Yeah, but realistically, the list is gonna be 1 to 5 entries long pr API, so with the features it brings and the simplicity of it I will argue it is an acceptable overhead.

The main issue here is probably that the runtime overhead is not introduced for multi-iface devices, but for interfaces with several instances like GPIO ports.
STM32 devices have up to 11 ports, so it can be necessary to integrate over all of them just to set the state of a single pin.

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented Aug 11, 2022

So I wonder about changing the model to have "typed" devices instead as a solution.

So having a struct location_device that embeds a struct device.

Than having something like DEVICE_DT_GET_SUBSYS(<NODE>, <SUBSYS>) so something like: DEVICE_DT_GET(<NODE>, location)

Something like it could be a solution, we still need a struct device for the API wrappers, and creating a struct device for each API is just solution 1 again. So we also need to update all API wrappers.

We can update the driver APIs to use the node to perform API calls to statically defined API implementations. Say uart_poll_out(struct device, ...) would become UART_POLL_OUT(node_id, ...) This is safe as it will return a linker error if the implementation does not exist in ROM, and allows for multiple APIs

It is also compatible with the current solution since uart_poll_out and UART_POLL_OUT can coexist

@bjarki-andreasen bjarki-andreasen force-pushed the rfc_device_driver_api_multi_iface branch from 4c00ee4 to 6b36f39 Compare August 15, 2022 17:37
@bjarki-andreasen
Copy link
Collaborator Author

Update to the design of this solution, please reread the first comment and review the new solution. In short, APIs are now statically allocated in lists, but are accessed through a function similar to DEVICE_DT_GET (DEVICE_DT_API_GET(node, api_type)) @galak

There is also no need to change any API wrapper since the new solution simply creates copies of the original struct device, but with the API field changed to the appropriate API implementation

Copy link
Collaborator

@tbursztyka tbursztyka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be clearer if you invert the order of patches. Add the generic device API additions, then put an example. You could actually have a test adding a dumb API, so it would make adding location API unnecessary in this PR.

I am not entirely convinced about "api_type" paramater. Maybe the actual type would be better directly (without struct, i.e. in your example: location_driver_api). Too bad typeof() cannot be used at pre-processor level.

@carlescufi carlescufi mentioned this pull request Aug 16, 2022
11 tasks
@bjarki-andreasen bjarki-andreasen force-pushed the rfc_device_driver_api_multi_iface branch 2 times, most recently from ab7ccc7 to 3747342 Compare August 17, 2022 19:17
@bjarki-andreasen bjarki-andreasen force-pushed the rfc_device_driver_api_multi_iface branch from 3747342 to 476a428 Compare August 17, 2022 19:18
* @{
*/

#define LOCATION_DEVICE_API_DEFINE(node_id, api_ptr) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not seeing any code that uses any of these macros.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a draft :) The real PR will be created once we agree on the design ;)

@@ -0,0 +1,289 @@
/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to split the driver code from the use of driver test. This makes it more like a real driver case, and shows any issues with things like extern's that might be needed to access various structs, etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, i should not have included the location API in this draft, although i did get some good feedback for it :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't concern me, its the need to show how this works for a set of drivers and users.

For example, w/o having a test driver that is standalone we aren't seeing what I think is a N^2 issue with handling the externs to the API structs.

Comment on lines 1030 to 1044
#define DEVICE_DT_API_DEFINE(node_id, api_type, api_ptr) \
const Z_DECL_ALIGN(struct device) \
DEVICE_DT_API_NAME_GET(node_id, api_type) \
__attribute__((__section__(".z_device_api_" # api_type))) = { \
.name = DEVICE_DT_GET(node_id)->name, \
.config = DEVICE_DT_GET(node_id)->config, \
.api = (api_ptr), \
.state = DEVICE_DT_GET(node_id)->state, \
.data = DEVICE_DT_GET(node_id)->data, \
.handles = DEVICE_DT_GET(node_id)->handles, \
COND_CODE_1(CONFIG_PM_DEVICE, (.pm = DEVICE_DT_GET(node_id)->pm), ()) \
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, this is defining a 2nd device that is basically a copy of the first one except the API pointer, thus allowing to share the same state. Not a bad idea, the only downside is memory usage. Compared to having independent devices, one will now have to define N+1 devices for a multi-API device. Even though this defines a new device (only for impl reasons), I'd change the name to something like DEVICE_DT_API_(ASSIGN|ADD).

It is important to note that with this change, an application user needs to know whether a device is multi-API or not. For a location device that is a regular device, one can just do DEVICE_DT_GET(...), but for a multi-API this is no longer possible, because the pointer to the peudo-device with the location API pointer is another one. How can we build generic code, then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end goal is to move away from the DEVICE_DT_GET entirely, no generic device access. There will be a transition period where all legacy drivers and apps will work with generic devices, but all devices should be updated to use these new macros, DEVICE_DT_DEFINE will be updated to either require api_type, or have api_ptr removed. This way, all devices, multi-api or not will work the same way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end goal is to move away from the DEVICE_DT_GET entirely, no generic device access. There will be a transition period where all legacy drivers and apps will work with generic devices, but all devices should be updated to use these new macros, DEVICE_DT_DEFINE will be updated to either require api_type, or have api_ptr removed. This way, all devices, multi-api or not will work the same way.

So if the end goal is to modify all devices, I'd like to see the full picture before buying this approach. This is providing a simple solution, but eventually, it assumes we'll refactor everything because DEVICE_DT_GET is not enough. Will the final solution still require 2 struct devices?

Copy link
Collaborator Author

@bjarki-andreasen bjarki-andreasen Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in the end, all device structs will be created by API type, DEVICE_DT_DEFINE(node_id, ..., api_ptr, api_type), and additional struct devices will be created if DEVICE_DT_API_DEFINE is used. In ROM, there will be a section for initializing devices, and a section containing iterable sections with device structs sorted by API type.

const Z_DECL_ALIGN(struct device) \
DEVICE_DT_API_NAME_GET(node_id, api_type) \
__attribute__((__section__(".z_device_api_" # api_type))) = { \
.name = DEVICE_DT_GET(node_id)->name, \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will break name uniqueness

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if you try to define the same API type to a node twice, which is not allowed, and will give an error as expected.

the unique name for uart api is _device_api_dts_ord DT_DEP_ORD(node_id) _uart
and for i2c would be _device_api_dts_ord DT_DEP_ORD(node_id) _i2c

Or am i missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a string, used by e.g. device_get_binding or to print device name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, the name property.
It should now become the string name with "_<api_type>" appended to it,
so "uart1" should become "uart1_uart" for the uart API type

Comment on lines +319 to +326
#define LOCATION_DEVICE_API_DEFINE(node_id, api_ptr) \
DEVICE_DT_API_DEFINE(node_id, location, api_ptr)

#define LOCATION_DEVICE_DT_GET(node_id) \
DEVICE_DT_API_GET(node_id, location)

#define LOCATION_DEVICE_DT_INST_GET(inst) \
DEVICE_DT_INST_API_GET(inst, location)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment here: not convinced about the semantics. We could avoid helper macros by just defining the subsystem token, e.g. #define LOCATION_API location.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

@bjarki-andreasen bjarki-andreasen force-pushed the rfc_device_driver_api_multi_iface branch from 476a428 to b26d2c0 Compare August 18, 2022 14:44
This draft shows how to update the API subsystem to be safer
and support multiple APIs pr device instance, at the cost of
some ROM

The update is entirely backwards compatible with the current
solution, which is to store a pointer to the API
implementation inside the struct device which uses it.

The new solution creates a device struct for each API in
iterable sections in ROM. The new device structs are
copies of the device struct defined by DEVICE_DT_DEFINE,
yet they have the API member overwritten to the specified
API. This allows for using them with the current API wrappers
without any modification.

To get the normal device, with its default API, use
DEVICE_DT_GET(node)

To get the same device, but for use with the location API, if
implemented for it, use DEVICE_DT_API_GET(node, location)

Both return a const struct device * which can be used
transparently with the appropriate API wrapper

This allows for a device to define multiple APIs for
itself.

It also ensures that only an API which is in
fact implemented for a specific device instance can
be used, giving a linker error if fx. the uart API is
not implemented for a device using DEVICE_DT_API_DEFINE,
and DEVICE_DT_API_GET(node, uart) is used.

The DEVICE_DT_API_GET should be wrapped in the API headers
to UART_DEVICE_DT_API_GET(node) etc. for clarity

Inside the device drivers, the instanciation for an uart
device will go from this:

DEVICE_DT_DEFINE(node_id, ..., uart_api_impl)

to:

DEVICE_DT_DEFINE(node_id, ...)
UART_DEVICE_DT_API_DEFINE(node_id, uart_api_impl)

and can be expanded to:

DEVICE_DT_DEFINE(node_id, ...)
UART_API_DT_DEFINE(node_id, uart_api_impl)
LOCATION_API_DT_DEFINE(node_id, loc_api_impl)
...

The APIs must also be updated to use the new solution.
This is done in one step:

Add convenience wrappers for
DEVICE_DT_API_DEFINE, DEVICE_DT_API_GET etc.
to appropriate API header files.

Done, API devices can now be defined and accessed.

In the future, we may change the DEVICE_DT_DEFINE macro to
set the struct device api member to NULL, and only allow
for the use of DEVICE_DT_API_GET etc. which is safer since
it is guaranteed that the API is implemented for the device

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
@github-actions
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Oct 19, 2022
@github-actions github-actions bot closed this Nov 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants