Skip to content

PM_DEVICE_POWER_DOMAIN refactor #89372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

bjarki-andreasen
Copy link
Collaborator

@bjarki-andreasen bjarki-andreasen commented May 1, 2025

This PR refactors PM_DEVICE_POWER_DOMAIN to be simpler and more scalable, moving power domain and device constraint logic from device drivers to PM_DEVICE_RUNTIME.

There are a few bug fixes and patches as well which where discovered during refactoring :)

decsny
decsny previously requested changes May 2, 2025
Copy link
Member

@decsny decsny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks really good to me. The change request is specifically over some DT related nits. And I have one question about one of the other power domain drivers which is not being touched on this PR at the moment (power_domain_soc_state_change). Because that one in particular seems like the biggest refactor target of any of the existing in tree ones, given the goal of this PR is to remove code about calling turn on/off actions on domain children. That's basically all that driver's code seems to be for from what I can tell.

Comment on lines 12 to 19
/* Disable unused nodes on power_mode3_domain */
&dma0 {
status = "disabled";
};

&flexcomm1 {
status = "disabled";
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is fine, and it's maybe a bigger ask from this PR but since this RW is one of the few in tree users of power domain, would it make sense on this PR to fix that soc since the power domain is described incorrectly right now?

If not I guess nxp maintainers will fix it later.. but to be clear what I'm seeing is maybe an issue on that platform which I was not aware of before because I was not involved here, is that PM3 on that chip is not a domain but a power state, and should not be described as a domain? I'm not sure about this though because I am actually basically just now getting called in to get a handle on the power architecture of Zephyr due to problems we are having with this specific chip in Zephyr when users enable CONFIG_PM.... apparently the current state of things is not working out

It seems like instead there should be a domain representing the digital region of the chip that I think should use the new default driver you are adding in this PR? And that domain should mark the zephyr suspend (called pm2 on chip) and standby (called pm3 on the chip) states as the states that disable the power domain in DT? And all of these peripherals in this overlay being disabled here are part of that domain (except the analog dac and adc, but for now we can pretend they are at minimum because the digital domain is the most affected by power state transitions on this chip)

I actually think this current driver we have written here for our current conception of the "power domain" is maybe completely backwards and maybe not even valid due to it is representing a specific mode transition and not a physical (or conceptual) domain/power region? https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/power_domain/power_domain_soc_state_change.c . I noticed also that you refactored some of the other power domain drivers regarding this children handling on/off due to the redudancy/reimplementing. I think maybe this particular driver I linked doesn't even need to exist now.

I understand if you don't want to touch that power domain driver I linked on this PR, since I think it's kind of a vendor specific mess, even though it looks it was given a generic zephyr compatible, but at least if you have any guidance or clarity about this at least I will greatly appreciate it.

Comment on lines 762 to 779
out_dt_define(f"{node.z_path_id}_POWER_DOMAIN_CHILDREN_LEN_OKAY", len(pd_child_z_path_ids))
out_dt_define(f"{node.z_path_id}_POWER_DOMAIN_CHILDREN_FOREACH_OKAY(fn)",
" ".join(
f"fn(DT_{z_path_id})"
for z_path_id in pd_child_z_path_ids))
out_dt_define(f"{node.z_path_id}_POWER_DOMAIN_CHILDREN_FOREACH_OKAY_SEP(fn, sep)",
" DT_DEBRACKET_INTERNAL sep ".join(
f"fn(DT_{z_path_id})"
for z_path_id in pd_child_z_path_ids))
out_dt_define(f"{node.z_path_id}_POWER_DOMAIN_CHILDREN_FOREACH_OKAY_VARGS(fn, ...)",
" ".join(
f"fn(DT_{z_path_id}, __VA_ARGS__)"
for z_path_id in pd_child_z_path_ids))
out_dt_define(f"{node.z_path_id}_POWER_DOMAIN_CHILDREN_FOREACH_OKAY_SEP_VARGS(fn, sep, ...)",
" DT_DEBRACKET_INTERNAL sep ".join(
f"fn(DT_{z_path_id}, __VA_ARGS__)"
for z_path_id in pd_child_z_path_ids))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this addition should warrant update the macros.bnf description for this change similar to like pinctrl has done for example

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added entry to macros.bnf

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is what vnd,reserved-compat is specifically for already, so this new one shouldn't be needed

Copy link
Collaborator Author

@bjarki-andreasen bjarki-andreasen May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, changed to use existing binding

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vnd,reserved-node node is specifically used to test devices with status reserved :) Added back the device-base one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did you see that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zassert_false(DT_HAS_COMPAT_STATUS_OKAY(vnd_reserved_compat), "");

Comment on lines 2948 to 2961
* @brief Variant of DT_POWER_DOMAIN_CHILDREN_FOREACH_OKAY with VARGS
*
* @see DT_POWER_DOMAIN_CHILDREN_FOREACH_OKAY()
*
* @details The macro @p fn must take one parameter, which will be a node
* identifier, followed by variadic args. The macro is expanded once for
* every power domain child with status "okay".
*
* @param node_id Power domain's node identifier
* @param fn Macro to invoke
* @param ... Variadic args passed to @p fn
*/
#define DT_POWER_DOMAIN_CHILDREN_FOREACH_OKAY_VARGS(node_id, fn, ...) \
DT_CAT(node_id, _POWER_DOMAIN_CHILDREN_FOREACH_OKAY_VARGS)(fn, __VA_ARGS__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other subsystems put the DT macros in include/zephyr/devicetree/ subfolder with their own header, can't this go that route as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to its own header

@decsny
Copy link
Member

decsny commented May 2, 2025

And one more question , is are you targeting this PR to merge before 4.2 freeze point?

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented May 2, 2025

@decsny I'm all for refactoring the soc_state_change power domain and the board using it as part of this PR, it is indeed "backwards" compared to the power domain behavior I envision with this PR. I just did not want to "rock the boat" too much at once :)

@bjarki-andreasen
Copy link
Collaborator Author

And one more question , is are you targeting this PR to merge before 4.2 freeze point?

I'm working with an "as soon as possible" timeline to refactor pm device in general, so probably yes :D

static const struct device *Z_PM_DEVICE_DOMAIN_CHILDREN_NAME(dev_id) \
[Z_PM_DEVICE_DOMAIN_DYNAMIC_NUM];

#define Z_PM_DEVICE_DEFINE_DOMAIN_CHILDREN_DT(node_id, dev_id) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure, but doesn't this break builds if you have a device with status = "okay" but the driver isn't compiled in? The preprocessor has no way of knowing whether a device will actually exist at link-time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was one of the reasons for using device ordinals in the original implementation (the other being the ordinals are half the size). Using integers allows the initial link stage to complete successfully, at which point our hacky scripts can run a second pass using the first linker output to determine which devices actually exist in the build.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That breaks yes, just like any device in zephyr with DT_DEVICE_GET()

Copy link
Collaborator Author

@bjarki-andreasen bjarki-andreasen May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that its not any child device, only child devices which point the parent with power-domains = <>. This power domain has to have a driver for the child device to work so the app would fail at runtime anyway if we ignored that no driver was applied to the power domain :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 4bd44cf

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using DEVICE_DEPS, a missing driver would be silently ignored at runtime. We spent additional time iterating every child, checking if that childs pm domain was the parent which children we are iterating, then calling TURN_ON. This is no longer needed with this refactor, only actual power domain children are in the list, and all power domains needed have drivers, ensured at build time

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That breaks yes, just like any device in zephyr with DT_DEVICE_GET()

So a regression in behaviour. The standard control for drivers is to enable the subsystem, whereas this adds a significant new requirement. Imagine a board with an internal power domain which the ADC hangs off. Suddenly either CONFIG_ADC=n breaks all builds, or CONFIG_ADC=y also requires a board specific overlay to actually work. The same logic applies to external power domains and other subsystems as well.

Not an acceptable state of affairs in my view.

Using DEVICE_DEPS, a missing driver would be silently ignored at runtime.

Thats just describing the behavior of any piece of hardware without a compiled in driver, of course it is ignored at runtime.

We spent additional time iterating every child, checking if that childs pm domain was the parent

I suspect the number of cases where a devicetree node is a child of a power domain node, but not because of a power-domain = <> node, is essentially 0. So there is basically no overhead, because in 99% of cases the two lists would just be the same.

Copy link
Collaborator Author

@bjarki-andreasen bjarki-andreasen May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I will revert back to using DEVICE_DEPS and DEVICE_DEPS_DYNAMIC to track children.

My future vision is to move to a single linked list, and have children register themselves from pm_device_driver_init(), this removes the whole _DYNAMIC_NUM and DEVICE_DEPS deal, and makes it trivial to remove a device from a domain, like in the case of device_deinit() :) This of course requires all drivers to actually use pm_device_driver_init() so we can take a look at it again then :)

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented May 6, 2025

Is that commit not applying the actions to all children of every device? So for example, resuming a SPI controller would now run ACTION_TURN_ON for all SPI devices on that bus, even if they aren't powered?

no, only child devices which point to the parent with power-domains = <>.

Validate that device dep children indeed belong to domain when
iterating dependent devices from pm_device_children_action_run().

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
Introduce helpers pm_device_domain_children_turn_on() and
pm_device_domain_children_turn_off() to be called from
PM_DEVICE_RUNTIME when after resuming and before suspending a
power domain.

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
The APIs pm_policy_device_power_lock_get and
pm_policy_device_power_lock_put are not mocked properly as they
should be excluded based on CONFIG_PM_POLICY_DEVICE_CONSTRAINTS,
not CONFIG_PM.

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
Move device power domain and pm policy device logic to
PM_DEVICE_RUNTIME. Specifically:

- When resuming a device, if successful, get pm policy power
  lock for device, and notify any device power domain children
  of the new state of the device by calling TURN_ON.
- When suspending a device, first notify power domain children
  that device will be suspended by calling TURN_OFF, then put
  pm policy device lock, then try to suspend the device. If
  failure, notify children that device will remain resumed,
  and get pm policy device lock again.

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
PM_DEVICE_RUNTIME now implements power domain child actions. Remove
this duplicate logic from the device power domain driver.

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
PM_DEVICE_RUNTIME now implements power domain child actions. Remove
this duplicate logic from the device power domain driver.

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
Add default power domain driver which is useful for locking system
sleep states based on a group of devices.

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
Use the default power domain driver instead of creating a bespoke
one within the test suite.

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
Add support for pm device runtime to sample. Simply resume the
temperature device before using it. If PM_DEVICE_RUNTIME is not
enabled, device is assumed resumed on boot.

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
Implement pm device runtime for p3t1755 sensor.

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
Update board to use default power domain for the "peripheral"
domain, which when resumed prevents SoC from entering suspend and
standby states as these power down peripherals.

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
Enable PM_DEVICE_RUNTIME and explicitlly enable dependencies
required for the power domains and device drivers power management
to function. For now, this requires manually enabling
PM_DEVICE_RUNTIME for every device used in the sample by adding the
devicetree prop zephyr,pm-device-runtime-auto to make the drivers
actually adhere to PM_DEVICE_RUNTIME...

Signed-off-by: Bjarki Arge Andreasen <[email protected]>
@bjarki-andreasen bjarki-andreasen force-pushed the pm-power-domain-refactor branch from 8ca8d57 to 0fa6876 Compare May 12, 2025 11:31
@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented May 12, 2025

Update:

  • Reverted changes to power domain child tracking, along with changes required to RW612 to deal with the enabled but unused nodes
  • Enabled PM_DEVICE_RUNTIME for RW612 used in samples/sensor/thermometer using devicetree overlay
  • Added check to pm_device_foreach_child_run() to ensure child device is indeed on domain before running action. This was present in some power domains, but not pm/device.c

Currently testing on the RW612 to make sure it works as expected with the samples/sensor/thermometer, I will add a pm_device_runtime_stats feature similar to pm_stats to track devices states, checking how long they are resumed, which should help diagnose which devices are keeping an SoC from entering specific sleep states :)

Copy link

@decsny
Copy link
Member

decsny commented May 21, 2025

@bjarki-andreasen I don't know if this PR is the right place to bring this up, but I think we require a distinction between "turned off" and "disabled" for devices, and power domains seem like they might be the right place to handle this.

What I am understanding right now about the general spirit of the whole PM rework as in #89031 is that we want to decouple the state of the devices from the state of the system, ie suspend and resume decisions should not have anything to do with system mode transitions. But obviously it's recognized that devices are affected by the system modes in hardware nonetheless, and it appears that we are making the power domains the architectural glue between device and system.

One of the main limitations that I am seeing with the architecture (current and proposed) for our chip, RW612, basically seems to mostly boil down to the fact that there is no distinction between devices that are powered off and losing state and devices which are disabled but not losing state due to the properties of the hardware domain they are on as it relates to the system modes.

I am seeing that we basically have these valid situations with your proposal (correct me if I included or excluded a situation by mistake, except for other system power levels, I am only interested in active, suspend to idle, and standby) for an arbitrary device on a domain (which may have other devices) within an SOC (system):

System Mode Domain Device Status (software) Device Status (hardware)
Active On Active Active
Active On Suspended Up to driver
Active Off Off Lost state
Suspend On Active Active
Suspend On Suspended Up to driver
Suspend Off Off Lost state
Standby On Suspended Up to driver
Standby Off Off Lost state

I am going by the description of the power states right now which says that suspend to idle may put peripherals in low power state (I am assuming this means domains may or may not be turned off then), and standby does put all peripherals into low power states (except maybe some exceptions, I recognize that it gets a bit hairy per platform).

The issue here for our chip (RW612) is that there is two levels of effect by two different low power modes, the first will disable the peripherals to conserve power but not lose state, and the second will depower completely the peripherals and they will lose state.

The issue here then is that since "suspend" is now meant to be almost orthogonal to system transitions, we can't use it to mean "disabled", since this has nothing to do with software control and actually the device driver's handling of "suspend" might do different things on a device level than what happens at a system level where the entire domain is just disabled (you can think of it maybe conceptually as all the clocks are gated, which is part of what actually happens). From my understanding, we are trying to get away from system power transitions causing devices to suspend. Probably the driver when it gets a suspend action should disable its own clock. But I think what we need to know is if some external event caused the device to be disabled due to system wide state even if it wasnt suspended at the device/domain level.

So I am not sure exactly how to handle this, but one idea is that another sort of state or action is added to the PM device or domain state machine, so then the valid possible situations would expand at least to this (not necessarily all possible on every platform, depends on domain glue):

System Mode Domain Device Status (software) Device Status (hardware)
Active On Active Active
Active On Suspended Up to driver
Active Low Disabled Inactive but retain state
Active Off Off Lost state
Suspend On Active Active
Suspend On Suspended Up to driver
Suspend Low Disabled Inactive but retain state
Suspend Off Off Lost state
Standby On Suspended Up to driver
Standby Low Disabled Inactive but retain state
Standby Off Off Lost state

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented May 21, 2025

@decsny

The states are actually quite a bit simpler :)

X here means "don't care"

System Mode Domain Device Status (Software) Device Status (Hardware)
X On Active Active
X On Suspended Active
X Off Off X

All the power domain does is prevent a system from transitioning to a state which will disable the device hardware. When and if the system actually enters said state is not actually relevant to the device (from the perspective of PM_DEVICE_RUNTIME) :)

When the PM_DEVICE_RUNTIME calls suspend on a device, it means two things:

  1. User no longer needs to use device, so go to the lowest possible power mode.
  2. If device is on power domain, the power domain may now be suspended at any time, at which point the device hardware may loose power, so do act as if it already has.

Now, if the device is on a power domain, and all devices on the power domain are suspended, the power domain will be suspended. Once it is, it will release the system pm state lock if it has one, thus, and do note, no longer preventing the system from entering sleep modes. Child devices will get the turn_off action immediately when the power domain is suspended, not when and if the system actually transitions to a sleep state.

The system can have states locked for any number of reasons, like to meet some minimum latency requirement. Every single device and power domain may be suspended, but if something has requested a pm policy latency of 0, the system will not enter a sleep state, and all device hardware remains Active :)

With this said, a device may still opt to monitor for and react to pm state transitions if necessary, that is totally fine, but as the table above suggests, PM_DEVICE_RUNTIME does not care.

@decsny
Copy link
Member

decsny commented May 21, 2025

All the power domain does is prevent a system from transitioning to a state which will disable the device hardware. When and if the system actually enters said state is not actually relevant to the device (from the perspective of PM_DEVICE_RUNTIME) :)

Yes, that was my understanding. But what I'm saying is that for us there basically is what seems like more than 2 binary states (on, off) for our domain, there is also a situation where the domain is on but everything that is part of it is disabled but not off, which has real implications for the system, and these domain states in the hardware are affected by the system mode. Right now we have no way to differentiate at the domain level between a system state that will cause the domain to turn completely off vs cause the domain to be in this low power state that disables things.

When the PM_DEVICE_RUNTIME calls suspend on a device, it means two things:

1. User no longer needs to use device, so go to the lowest possible power mode.

2. If device is on power domain, the power domain may now be suspended at any time, at which point the device hardware **may** loose power, so do act as if it already has.

This was also my understanding but I do wonder what implication this has for driver development. Like how is the driver supposed to act or what should it do if it loses power, it's not clear. Right now for example on the uart driver we have, we save off which interrupts are enabled so that upon a turn on action, we re-enable them. Or is that wrong and app should reconfigure? Or should the driver do more than this? Because right now it just seems to be a wild west of band-aids of different varieties for different system designs and use cases. It's unclear to me but maybe outside the scope of this PR, although we are talking about the domain-related actions specifically (on and off).

Now, if the device is on a power domain, and all devices on the power domain are suspended, the power domain will be suspended. Once it is, it will release the system pm state lock if it has one, thus, and do note, no longer preventing the system from entering sleep modes. Child devices will get the turn_off action immediately when the power domain is suspended, not when and if the system actually transitions to a sleep state.

Yes, I understood that as well, that is the best part of this PR. It makes the domains and device PM a lot more solid and clear. This is what I have been explaining that this PR is doing to other people.

The system can have states locked for any number of reasons, like to meet some minimum latency requirement. Every single device and power domain may be suspended, but if something has requested a pm policy latency of 0, the system will not enter a sleep state, and all device hardware remains Active :)

I think now I am confused what you mean by active here, because I thought that if a power domain is "suspended" (as you call it, although I am confused by the terminology because I thought that domains are only either on or off), that all the devices on that domain are also off, or at least suspended, but I can see in the PM device state machine that off and suspended are different states, and I thought the off action is the one that a domain would cause...

With this said, a device may still opt to monitor for and react to pm state transitions if necessary, that is totally fine, but as the table above suggests, PM_DEVICE_RUNTIME does not care.

And I think you are referring to something like what I did on #89760, and I get that the monitoring of the system state transitions is fine, but, what I am trying to say, is that the constraints are not distinct enough. Right now the PM constraints come from the Devicetree property zephyr,disabling-states, but there is no way to tell apart in our case these states that will only disable the device (or domain) vs depower it. So what I am basically wondering is if this DT property should be split into two more specific ones, which are like disabling-states and depowering-states, and have those two separate types of PM policy constraints to mechanize, which seem to already be based around system modes, and not related to runtime device PM.

@JordanYates
Copy link
Collaborator

When the PM_DEVICE_RUNTIME calls suspend on a device, it means two things:

  1. User no longer needs to use device, so go to the lowest possible power mode.
  2. If device is on power domain, the power domain may now be suspended at any time, at which point the device hardware may loose power, so do act as if it already has.

This is pretty explicitly NOT the correct behaviour. SUSPEND requires that context is preserved, OFF is used when context is lost. Requiring all drivers to be updated such that SUSPEND could lose context is both a large piece of work and a bad idea IMO.

If a device is in SUSPEND and context is about to be lost, it is expected that ACTION_TURN_OFF is called beforehand.

@decsny
Copy link
Member

decsny commented May 22, 2025

When the PM_DEVICE_RUNTIME calls suspend on a device, it means two things:

  1. User no longer needs to use device, so go to the lowest possible power mode.
  2. If device is on power domain, the power domain may now be suspended at any time, at which point the device hardware may loose power, so do act as if it already has.

This is pretty explicitly NOT the correct behaviour. SUSPEND requires that context is preserved, OFF is used when context is lost. Requiring all drivers to be updated such that SUSPEND could lose context is both a large piece of work and a bad idea IMO.

If a device is in SUSPEND and context is about to be lost, it is expected that ACTION_TURN_OFF is called beforehand.

This is why I was asking on my last comment what did he mean by suspend, I think the term is being overloaded. As far as I know this PR is not changing the meaning of suspend state for devices but changing the behavior of the domains so that if all of the children of the domain are suspended then it will also turn them all off (through the turn off action)

@decsny
Copy link
Member

decsny commented May 28, 2025

@JordanYates is your change request still relevant? This PR makes a lot of sense to me in it's current state, I would actually like to support it's merge sooner than later, so do we need an arbitration?

@bjarki-andreasen I'm not sure what are the twister failures here but is this PR finished otherwise?

@bjarki-andreasen
Copy link
Collaborator Author

@JordanYates is your change request still relevant? This PR makes a lot of sense to me in it's current state, I would actually like to support it's merge sooner than later, so do we need an arbitration?

@bjarki-andreasen I'm not sure what are the twister failures here but is this PR finished otherwise?

Not sure what the CI failures are about, those should be fast addressed :) Got sidelined by a whole lot of other tasks, will get back to this Monday, I think its a couple of days work until ready to merge :)

Copy link
Collaborator

@JordanYates JordanYates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is still doing a bunch of things that are not coupled to each other:

  • Fixing a GPIO driver
  • Updating the implementation of existing power domains
  • Adding a new type of power domain
  • Multiple commits related to one particular sensor sample
  • Implementing PM on a random sensor
  • Changing power domain setups on one SoC

Not all of these need to be their own PR, but the core of this PR (based on the title) is changing where power domain logic runs. That can be done without any of the extra commits that have crept in here.

I don't think its that hard to split this up to be more manageable (and faster for the trivial aspects)

static int invoke_suspend(const struct device *dev)
{
pm_policy_device_power_lock_put(dev);
pm_device_domain_children_turn_off(dev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT this pattern is still adding a chunk of overhead to all PM calls.

Every suspend/resume cycle is iterating over every child node regardless of whether the device has any capability of acting as a power domain. This is particularly problematic for PM_DEVICE_FLAG_ISR_SAFE, which is running all of these operations in an ISR context.

Should this not be using PM_DEVICE_FLAG_PD in order to determine whether to run these functions in the first place?
This would also allow dropping the pm: device: pm_device_children_action_run: validate child domain commit entirely, as I can't see any way in which a device could be a child of a power domain without being on that power domain.

if (dev->pm_base->action_cb(dev, PM_DEVICE_ACTION_SUSPEND) < 0) {
pm_device_domain_children_turn_on(dev);
pm_policy_device_power_lock_get(dev);
return -EAGAIN;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also overrides return values from the drivers, which was not previously the case

@bjarki-andreasen
Copy link
Collaborator Author

bjarki-andreasen commented May 29, 2025

@JordanYates some parts can be moved to their own PRs sure, but this PR is reworking power domains, which means devices relying on previous power domain logic need to be updated to not introduce regressions with this PR to then be fixed by a later PR, that's usually how refactors are done.

@JordanYates
Copy link
Collaborator

which means devices relying on previous power domain logic need to be updated

I'm not complaining about the commits that update power domain drivers, they are obviously fine.
Its the assortment of other changes, some of which are the cause of CI failures.
Other parts (e.g. the PM policy changes) are not obvious that they are even changed in this PR from the title or description, and people that care about PM policy may have simply skipped the PR as not relevant to them.

@bjarki-andreasen
Copy link
Collaborator Author

which means devices relying on previous power domain logic need to be updated

I'm not complaining about the commits that update power domain drivers, they are obviously fine. Its the assortment of other changes, some of which are the cause of CI failures. Other parts (e.g. the PM policy changes) are not obvious that they are even changed in this PR from the title or description, and people that care about PM policy may have simply skipped the PR as not relevant to them.

I get your point, I will create new PRs for those issues and cherry-pick them here :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants